


Panoramic Stereo Video Textures

Vincent Couture, Université de Montréal

[email protected]

Michael S. Langer, McGill University

[email protected]

Sébastien Roy, Université de Montréal, [email protected]

Abstract

A panoramic stereo (or omnistereo) pair of images provides depth information from stereo up to 360 degrees around a central observer. Because omnistereo lenses or mirrors do not yet exist, synthesizing omnistereo images requires multiple stereo camera positions and baseline orientations. Recent omnistereo methods stitch together many small field of view images called slits, which are captured by one or two cameras following a circular motion. However, these methods produce omnistereo images for static scenes only. The situation is much more challenging for dynamic scenes, since stitching needs to occur over both space and time and should synchronize the motion between left and right views as much as possible. This paper presents the first method for synthesizing panoramic stereo video textures. The method uses full frames rather than slits, and uses blending across seams rather than smoothing or matching based on graph cuts. The method produces loopable panoramic stereo videos that can be displayed up to 360 degrees around a viewer.

1. Introduction

Stereo cameras capture two images of a scene from slightly different viewpoints. When the stereo pair is displayed to a human viewer, one image to each eye, the images are fused and the disparities provide strong cues to scene depth, thereby enhancing the immersive experience. Stereo capture and display has a long and exciting history, and we are now seeing a resurgence in popularity, especially in digital 3D cinema. The resurgence is, to a large extent, the result of recent advances in computer vision and computer graphics that have been incorporated into many steps of the production pipeline [7].

Traditional stereo video uses two roughly parallel cameras, which maximizes the available stereo information near the optical axes. This paper addresses the more challenging problem of capturing stereo video over a much wider field of view, up to 360 degrees, and synthesizing the videos into a stereo panorama. One application of such omnistereo videos is display screens with a very wide field of view. In the extreme case of a 360 degree cylindrical screen, observers would be able to turn their gaze in any direction, and more than one observer could be present, with different observers looking in different directions at the same time. A second and more everyday application of a 360 degree stereo video panorama would be to use a standard display, such as a stereo computer monitor, and to allow the user to pan over the 360 degree view. An example would be a stereo-video extension of Google Street View, where the motion could be texture such as waves on a river or lake, trees blowing in the wind, or a flag waving.

To capture stereo video in a wide range of directions, one could extend multi-camera systems. For example, the commercially available Ladybug [12] has five cameras to cover 360 degrees. One could extend such systems to stereo by doubling the number of cameras. Alternatively, one could attempt to combine previous computational approaches for static omnistereo and dynamic panoramas. We argue in Section 2, however, that one faces fundamental difficulties in doing so, related to stereo-motion synchronization. This leads us to take a different approach.

The approach we introduce in this paper uses a conventional stereo video rig where the two cameras each follow a circular motion and capture a space-time volume of images. We show how to combine the full frames of these videos into left and right panoramic video textures. The method is most effective with localized stochastic motions such as leaves moving in the wind or waves on a lake. Because our method uses full frames, most of the scene is viewed by both cameras simultaneously, which guarantees stereo synchronization for these points. The method produces omnistereo video textures that are several seconds long and are loopable in time [14].

The paper is organized as follows. Section 2 gives a brief overview of previous work on omnistereo and dynamic panoramas. Section 3 gives the main details of our approach and outlines key differences from previous approaches. Section 4 presents example results using our method. We conclude in Section 5.



Figure 1. Half a frame (180° field of view out of 360°) of an omnistereo video of a field, shown in red/cyan anaglyph format.

2. Previous Work

Traditional omnistereo methods are designed to give equal stereo resolution for all visual directions in a static scene [5, 6, 8, 9, 10, 15]. These methods typically gather several small field of view images called slits from one or two cameras rotating off-axis, and then make a mosaic of these slits. The slits typically cover one or two degrees, so 200-400 slits are used to capture 360 degrees. Figure 2 illustrates an omnistereo system that uses two cameras. An alternative omnistereo configuration uses a single camera with two slits, one to the left of center and one to the right [9]. This corresponds to two virtual cameras following a circle.

Figure 2. An omnistereo method that uses a rotating stereo pair of parallel slit-cameras. (a) A pair of slits. (b) The slits of each camera are stitched together into a mosaic that covers 360° (only three slits are shown).

It is not clear how, or if, one could generalize these static omnistereo methods to dynamic scenes. In particular, one has to deal with a well-known problem in stereo imaging, namely that objects at the edge of one camera's field of view often are not seen in the other camera [7]. This problem is especially difficult to solve if one uses slits, since every point is near a slit boundary. Using a two-camera omnistereo system (with one slit per camera), the only way that a point could be visible in both slits at the same time is if the point happened to have a disparity similar to that of the slit. For one-camera omnistereo, the slits cannot be synchronized anywhere, since at any time the left and right slits are capturing different parts of the scene.

We next turn to dynamic monocular panoramas. An example is the dynamosaicing method [13], which makes a video mosaic by using graph cuts to compute a time-evolving surface in a video's space-time volume. The surfaces are then stitched together to yield a video. A second graph-cut based approach [2] is panoramic video textures. This method renders a video seen by a rotating camera. Rather than selecting and stitching together slices in the space-time volume, it selects and stitches small space-time blocks. In addition to using graph cut matching to make the seams less visible, it also uses gradient smoothing. This method has been shown to be effective for dynamic panoramas that contain waves on a lake, a flag in the wind, etc.

Regardless of whether one stitches together surfaces in XYT or small blocks, the key problem remains of how to synchronize the left and right views. If one were to compute dynamic monocular panoramic textures for the left and right eye independently, using the above methods, there is no reason why the resulting panoramas would be synchronized; that is, there is no reason why the same scene events (e.g. a leaf blowing to the right) would appear at the same time in the left and right views and would have the correct disparity. One might try to extend the dynamic panorama methods to also enforce a constraint on stereo motion consistency, but such an extension is not obvious and would significantly increase the (already large) computational complexity of these methods.

The approach that we take differs from previous approaches in two fundamental ways. First, we use full video frames rather than slits or small blocks. Using full frames reduces the stereo-motion synchronization problem, which arises only near the boundaries of the regions being stitched together; that is, using full frames rather than slits or small blocks reduces the percentage of pixels that lie near the boundaries. The second difference is that, rather than using graph-cut based stitching or gradient smoothing to reduce the visual seam boundaries between regions, our method simply blends neighbouring regions together. In many cases the blended regions are not visually salient in practice, namely for motions that are stochastic and localized, such as water flows or leaves in the wind. Although the blending is visible in some cases if one scrutinizes the video, it is typically not visible in casual viewing.


3. Our approach

The panoramic stereo video problem begins with a stereo video pair which has been captured by rotating a stereo rig around a vertical axis. Each camera follows a circular path similar to Figure 2. In each of the examples we present in Section 4, we capture a full 360 degrees. The videos are about 2 minutes each, i.e. a few thousand stereo frames.

3.1. Camera calibration and video registration

Given the left and right videos, we first calibrate the stereo camera rig, both for the camera internals (focal length) and externals (position and orientation in each frame). This calibration allows us to map pixels in the two cameras in each frame to pixels on a cylindrical projection surface (one for each camera). This yields left and right XYT volumes, composed of frames that shift over time as the cameras move. The camera calibration method and the mapping from camera pixels to the cylindrical projection pixels use standard computer vision techniques; the mapping is sketched below, and the calibration details are briefly summarized afterwards.
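For concreteness, the following is a minimal sketch of the kind of warp that takes one camera frame onto the cylindrical projection surface. It is our own illustration rather than the paper's code: the function name, the inverse-mapping-with-bilinear-sampling strategy, and the use of OpenCV's remap are assumptions, and the focal length, yaw and cylinder resolution would come from the calibration described next.

```python
import numpy as np
import cv2  # OpenCV, assumed available


def warp_to_cylinder(frame, f, yaw, cyl_width, cyl_height, px_per_rad):
    """Resample one calibrated camera frame onto a cylindrical strip.

    frame      : H x W x 3 image from one camera
    f          : focal length in pixels (from calibration)
    yaw        : camera orientation for this frame, in radians
    px_per_rad : angular resolution of the cylinder, in pixels per radian
    """
    H, W = frame.shape[:2]
    cx, cy = W / 2.0, H / 2.0

    # Cylinder pixel grid: column -> viewing angle, row -> height coordinate.
    cols, rows = np.meshgrid(np.arange(cyl_width), np.arange(cyl_height))
    theta = cols / px_per_rad - yaw            # angle relative to the optical axis
    h = (rows - cyl_height / 2.0) / f          # normalized height on the cylinder

    # Inverse mapping: where does each cylinder pixel come from in the frame?
    map_x = np.clip(f * np.tan(theta) + cx, -1, W).astype(np.float32)
    map_y = np.clip(f * h / np.cos(theta) + cy, -1, H).astype(np.float32)

    warped = cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR,
                       borderMode=cv2.BORDER_CONSTANT)
    # Directions more than 90 degrees away from the optical axis are not imaged.
    warped[np.abs(theta) >= np.pi / 2] = 0
    return warped
```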

As a first approximation, we estimate camera parameters by ignoring camera translation and sub-sampling the frame sequence in time. We compute SIFT features in these frames and compute homographies between frames using RANSAC for robustness. We then estimate camera parameters (rotation, focal length) and perform bundle adjustment, taking radial distortions into account [4]. The next step is to improve and complete the previous estimates by considering all the frames. We track features between frames, allowing for small camera translation, and perform another bundle adjustment that triangulates features in 3D [16].
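As a point of reference, the first pass (before any bundle adjustment) can be approximated with standard OpenCV building blocks. The sketch below is a hypothetical stand-in, not the authors' implementation: it matches SIFT features between two neighbouring frames and fits a homography with RANSAC; the ratio-test threshold and RANSAC tolerance are assumptions.

```python
import cv2
import numpy as np


def pairwise_homography(img_a, img_b, ratio=0.75):
    """Estimate the homography relating two neighbouring (sub-sampled) frames."""
    sift = cv2.SIFT_create()
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    kp_a, des_a = sift.detectAndCompute(gray_a, None)
    kp_b, des_b = sift.detectAndCompute(gray_b, None)

    # Lowe's ratio test on 2-nearest-neighbour matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_a, des_b, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]

    src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC discards outlier matches (mismatches, independently moving points).
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H, inliers
```

From a chain of such homographies, the rotation and focal length estimates and the subsequent bundle adjustments of [4, 16] would then refine the full set of camera parameters.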

The next section describes a few theoretical observations underlying our method. We then present the method itself. For the sake of clarity, we begin with a simple case in which the stereo rig rotates at constant angular velocity. This would be the case if the camera motion were driven by a motor, for example. With uniform rotation, the boundary directions of the calibrated space-time volume are two diagonal planes of constant x-slope, namely the lines in Figure 3. In Section 3.5 we will return to the more general case that the camera rotation may be non-uniform.

3.2. Motion paths and parallax

As the stereo rig rotates, the projection of a fixed point in the scene moves across the image. This image motion is a combination of the motion of the scene point and the camera's rotation and translation. To understand this image motion, first consider two static scene points PZmin and P∞ that enter the field of view at the same time and at the same position, namely at one edge of the frame. See Figure 4. As the rig rotates, motion parallax occurs and the x-pixel positions of these two points diverge slightly as they cross the frame.

Figure 3. A sequence of N frames captured by the left or right camera performing a full turnaround, after calibration and registration. For simplicity, we assume in this figure that the rig rotates at constant speed. The dashed and red lines represent the paths followed by two static points, one far and one close, respectively (see Fig. 4), that enter and exit the field of view at the same time. The two thick black lines represent the entry and exit frames [3].

Figure 4. Only the right camera is shown. Let the camera rig rotate clockwise. (a) Two points PZmin and P∞ are shown that enter the camera frame at the right edge at the same time (top). These points also exit the frame at the same time (right). As the camera moves, the positions of the points drift across the frames. The depth difference of the two points leads to motion parallax; see the expansion of the yellow circle in (b). The figure is not to scale: the camera baseline is typically much smaller than the distance to scene points, so parallax is typically very small. In this example, the camera field of view is 90°, but the argument about coincidence at the left and right edges holds for any field of view size.

The x positions converge and meet again when the points leave the field of view at the opposite edge, namely when the camera center again lies on the line connecting the two points. In practice, the separation due to this motion parallax is maximum at the center of the frame and is only a few pixels [3]. This parallax magnitude is comparable to that seen by a person who translates his head slightly, as in a normal posture adjustment.

The space-time paths of the two points are sketched in Figure 3. The dashed vertical line traces the constant visual direction of the point at infinity P∞. The red curve traces the changing visual direction of the point PZmin, which is at a finite distance.

In addition to the horizontal parallax just discussed, there can also be a slight vertical parallax. For example, a point at infinity that enters the field of view at the top right corner of a frame will leave the frame at the top left corner, but if a point at a finite distance were to enter at the top right corner at the same time, then it would leave the frame earlier, namely at the vertical edge (before it reaches the top left corner). This vertical parallax is zero for points on the horizontal mid-line and increases to a few pixels toward the upper and lower corners.¹

A few other observations about motion paths are worth mentioning. First, our arguments above follow from the geometry illustrated in Figure 4 and do not require that the camera rotation speed is uniform; e.g. the paths of the two points in Figure 3 meet at the boundaries regardless of whether the boundaries are straight or curved. Second, the above argument considers static scene points only. For scene points that are moving, the image paths will depend on the parallax just discussed and on the scene motion. Only the latter creates synchronization problems at frame boundaries, as we discuss in the next section. Third, the motion paths discussed above were for one camera only. How are the motion paths for the two cameras related? Points that are a finite distance away will enter each camera's field of view at slightly different frames. This is the well-known monocular problem at frame boundaries, where some points are seen by only one of the two cameras because of image disparities. In addition, the shape of the corresponding red curves will be slightly different for the two eyes, which causes disparities to vary slightly over time. The parallax effects are very small and in our examples are visible only under careful scrutiny.
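To make the parallax argument concrete, the following self-contained sketch traces the image x-coordinate of a near point and a far point that enter the field of view together, under our own simplified top-down model of the rig (camera centre offset tangentially from the rotation axis, optical axis pointing radially outward). The specific numbers (frame width, field of view, offset radius, near depth) are illustrative assumptions, not the paper's capture parameters; the printed parallax grows to a few pixels at mid-frame and vanishes again at the exit edge, as described above.

```python
import numpy as np

# Illustrative numbers (assumptions): HD-like frame, 60 degree field of view,
# camera centre about 3.3 cm from the rotation axis, nearest scene point at 2 m.
W = 1920                                   # frame width in pixels
fov = np.deg2rad(60.0)
f = (W / 2) / np.tan(fov / 2)              # focal length in pixels
r = 0.033                                  # offset of the camera centre (metres)
z_near = 2.0                               # depth of the near point (metres)


def camera(theta):
    """Top-down pose at rig angle theta: centre, optical axis, image-x direction."""
    axis = np.array([np.cos(theta), np.sin(theta)])      # looks radially outward
    right = np.array([np.sin(theta), -np.cos(theta)])    # image x direction
    return r * right, axis, right                        # centre offset tangentially


def pixel_x(point, theta):
    """Image x-coordinate of a static 2-D scene point at rig angle theta."""
    centre, axis, right = camera(theta)
    v = point - centre
    return f * np.dot(v, right) / np.dot(v, axis)


# At the entry angle, both points lie on the ray through the right frame edge.
c0, a0, r0 = camera(0.0)
ray = a0 + np.tan(fov / 2) * r0
ray /= np.linalg.norm(ray)
p_near = c0 + z_near * ray                 # near point, entering at the right edge
p_far = c0 + 1e6 * ray                     # stand-in for a point at infinity

# Clockwise rotation: theta decreases and both points drift toward the left edge.
for theta in np.linspace(0.0, -fov, 7):
    dx = pixel_x(p_near, theta) - pixel_x(p_far, theta)
    print(f"theta={np.degrees(theta):6.1f} deg  x_far={pixel_x(p_far, theta):8.1f}"
          f"  x_near={pixel_x(p_near, theta):8.1f}  parallax={dx:5.1f} px")
```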

3.3. Partition and alignment of space-time blocks

Suppose at this point that calibration and registration have been done, so that the pixels in each frame have been remapped to the cylindrical projection surface.

¹ The amount of horizontal and vertical parallax depends on the camera resolution, the field of view and the range of scene depths. For HD cameras having a 60 degree field of view and scene depths ranging from 2 m to infinity, it can be shown that the maximum parallax is about 5 pixels wide and 7 pixels high.

Let the two videos have N frames each. In the case that the camera turns 360 degrees, frame N is where the camera completes the full turnaround. Thus, frame N would be registered with frame 0.

At this stage, if we were to display the image sequence in stereo on a cylindrical projection screen, we would see the scene through a window translating slowly over time; that is, we would see the scene in stereo as captured by the rotating camera and projected in the correct direction. At any time, however, we would see only the field of view of the stereo camera. The problem that we are solving is to take this stereo video and make a panoramic stereo video from it, one that is defined over the entire cylinder and at every time.

The main idea of our method is to partition each of the stereo XYT volumes into blocks (parallelepipeds), and then to stitch the blocks together to form left and right video textures. To explain how this partitioning and stitching works, we continue for now with the simplified case that the camera rotation is uniform.

Suppose that it takes T frames for any point to enter the field of view at the right edge and exit the field of view at the left edge. (This time is constant when the camera rotation speed is constant and the scene point is static.) We partition the entire image sequence into a set of consecutive blocks, each of T frames. We then re-align the blocks so that they all start at the same frame in the video texture. See Figure 5. In this example, the entire video is 120 seconds and is partitioned into five blocks that are 24 seconds each.
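The partition-and-align step reduces to simple index bookkeeping. The sketch below is a minimal illustration under assumptions of our own: uniform rotation, frames already warped onto the cylinder (here single-channel), and a known horizontal panorama offset for each frame. It ignores blending entirely (later blocks simply overwrite earlier ones where they overlap), which is addressed in Section 3.4.

```python
import numpy as np


def build_looping_texture(frames, offsets, T, pano_width):
    """Partition a registered frame sequence into blocks of T frames and
    re-align the blocks so that they all start at output time 0 (Figure 5).

    frames     : list of N cylinder-warped frames, each H x W
    offsets    : horizontal panorama position (pixels) of each frame's left edge
    T          : number of frames a point needs to cross the field of view
    pano_width : width of the full panorama in pixels
    """
    H, W = frames[0].shape[:2]
    n_blocks = len(frames) // T
    texture = np.zeros((T, H, pano_width), dtype=frames[0].dtype)

    for k in range(n_blocks):            # block index, i.e. angular sector
        for t in range(T):               # output (looping) time
            src = frames[k * T + t]
            x0 = int(offsets[k * T + t]) % pano_width
            cols = (np.arange(W) + x0) % pano_width   # wrap at the 360 degree seam
            texture[t][:, cols] = src
    return texture
```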

Figure 5. A frame sequence captured by a camera performing a full turnaround of N = 5T frames at constant speed. (a) The full original space-time volume, divided into five non-overlapping blocks. (b) The blocks are aligned to start at the same time.


Consider a scene point at infinity that enters the field of view somewhere on the right diagonal of the first block. Since this point is within the field of view for T frames, its path extends beyond the first block. When the blocks are aligned so that they all start at the same frame, the vertical path followed by this point wraps around from frame T to frame 0 and again forms a vertical line. See Figure 6.

Recalling the arguments of Section 3.2, if a static scene point at a finite depth were to enter the space-time volume at the same frame, then it would take a curved path instead of a vertical path. The curved path would also wrap around from frame T to frame 0, and rise again to meet the vertical dashed line at the diagonal boundary. Thus, the paths of static points in the scene would be continuous both at their temporal boundaries (allowing the video to loop with a period of T frames) and also at the seams that define the frame's spatial boundaries, i.e. the diagonals.

Figure 6. Motion paths of two static objects, one far away (central dashed vertical line) and one close by (red curve). In both cases, the motion paths are continuous and loop. The video goes from frame 0 to T−1 and then loops, so that frame T equals frame 0.

The continuity at the temporal boundary (looping) does not depend on any assumption that the points are static in the scene, nor does it depend on the camera rotation speed being constant. This looping property at the temporal boundaries always holds.² The continuity at the frame boundary, though, does not always hold exactly. Discontinuities can occur when there is scene motion (see Fig. 7), as just discussed, and also when there is vertical parallax, when the lighting changes over time, or when the exposure changes due to a camera aperture change, e.g. if one is in shutter priority mode.

How can one avoid such visual seams? Existing monocular methods that render dynamic mosaics [2, 13] attempt to minimize seams both in space and time by using graph cut matching and/or gradient smoothing.

² The only exception occurs when the frame at 360° loops to the frame at 0°. In this case, if there are moving scene points at these limit frames and/or the lighting changes, then the video will not be loopable at these points. The problem could be lessened by starting the capture in a direction in which the scene is static, or by using a blending technique similar to the one we discuss next. Similarly, if the panorama covers less than 360 degrees and the scene has motion at frame 0 (or N), our method will not produce looping there.

Figure 7. For an object moving in time in a small area (a leaf, for instance), the motion is continuous at the temporal boundary (horizontal edge), but there will be a motion discontinuity at the spatial boundary (seam), namely the diagonal edge.

As we discussed in Section 2, however, it is unclear whether such methods could be extended to dynamic stereo, since such methods use thin slits or small blocks, and there are fundamental difficulties in stereo-motion synchronization in these cases. Our approach is to avoid boundaries as much as possible, by using full frames rather than slits or small blocks. We still need to stitch boundaries together, however, and for this we use blending, as we describe next.

3.4. Blending adjacent blocks

To blend adjacent blocks, we decrease the duration T of each block and shift the block by the number of pixels covered during that decrease in duration. See Figure 8 for an illustration of what happens for static scene points, and Figure 9 for the case of a moving scene point. In these figures, T has been decreased from 24 to 20 seconds. In the overlap region, we blend the frames together using a simple linear ramp function (see Figure 10).
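For a single output frame, the linear-ramp blend over the overlap region can be written in a few lines. The sketch below is illustrative only and makes assumptions of our own: single-channel float images, and two strips that have already been placed so that their last and first overlap columns cover the same panorama columns.

```python
import numpy as np


def ramp_blend(left_block, right_block, overlap):
    """Blend two horizontally adjacent strips over `overlap` columns using a
    linear ramp (cf. Figure 10), for one output frame of the video texture."""
    Wl = left_block.shape[1]

    # Weight of the left strip falls linearly from 1 to 0 across the overlap.
    w = np.linspace(1.0, 0.0, overlap)[None, :]
    blended = (w * left_block[:, Wl - overlap:]
               + (1.0 - w) * right_block[:, :overlap])

    return np.concatenate([left_block[:, :Wl - overlap],
                           blended,
                           right_block[:, overlap:]], axis=1)
```

In the full method, the same ramp is applied at every output time, so that in the overlap region one block's copy of the scene fades out while the neighbouring block's copy fades in.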

Figure 8. Images are blended near the boundaries of two blocks.

The reason we use blending, rather than a more sophisticated technique such as gradient-based smoothing [11], is that it has proven sufficient. Although blending does lead to a duplication of points, or “ghosting”, the duplication is typically not perceived unless one is looking for it. There are several reasons for this. First, the blending is continuous over time, with one copy fading out and the other copy fading in, so there are no salient low-level features that grab the eye, such as block edges or temporal popping.


Figure 9. For an object moving in time (a leaf, for instance), the motion is blended over the overlap between the two blocks.

Figure 10. Blending function for overlapping frames.

Second, in the case of motion texture such as leaves or grass blowing, or waves evolving, the “texels” often change over time or undergo occlusions, and so are not visually isolated and trackable entities. Third, the blending window translates, which results in further variation that may mask the duplication of texels.

3.5. Non-uniform camera rotation and blending

We next turn to the more general case in which the camera rig rotates at a non-uniform speed, so that the boundaries of the space-time volume are not straight (see Fig. 11). To handle the non-uniform speed, we continue to use a constant duration T for all blocks, but we now vary the blending overlap. The blending width depends on the frame-to-frame overlap at each boundary, which corresponds to the distance between two adjacent diagonal curves in Figure 11(b).

To ensure that there is some overlap between each pair of adjacent frames, T must be chosen carefully. Let d(i, i′) be the angular distance (in units of pixels on the cylindrical projection surface) travelled by the camera between frames i and i′. To be conservative, we require that, for all frames j, the distance d(j, j+T) is less than or equal to some chosen fraction α of the width W of the original frame. This ensures a blending overlap of at least (1 − α)W pixels between frames. Given α and W, a sufficient condition on T that ensures some overlap is that, for all frames j,

    ∑_{i=j}^{j+T−1} d(i, i+1) ≤ αW.

In our experiments, we chose α = 0.8, which ensures a minimum overlap of 20%. The result is that the overlap is smaller when the stereo rig rotates faster and larger when the rig rotates slower.
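Given the per-frame angular distances, a suitable T can be found directly from the inequality above. The helper below is a hypothetical sketch (the function name, the prefix sums and the binary search are ours): it returns the largest T such that every window of T frames travels at most αW panorama pixels.

```python
def choose_block_length(d, alpha, W):
    """Largest block length T (in frames) such that, for every start frame j,
    the camera travels at most alpha * W panorama pixels over the next T
    frames. This guarantees a blending overlap of at least (1 - alpha) * W
    pixels between adjacent blocks.

    d : list where d[i] is the angular distance, in panorama pixels,
        travelled by the camera between frame i and frame i + 1.
    """
    n = len(d)
    budget = alpha * W

    # Prefix sums make each window sum an O(1) lookup.
    prefix = [0.0]
    for di in d:
        prefix.append(prefix[-1] + di)

    def ok(T):
        return all(prefix[j + T] - prefix[j] <= budget for j in range(n - T + 1))

    # Feasibility is monotone in T (distances are non-negative), so binary search.
    lo, hi = 1, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ok(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

With α = 0.8, for example, the returned T guarantees at least a 20% overlap at every block boundary, with larger overlaps wherever the rig rotated more slowly.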

Figure 11. Similar to Figure 5, except that here the rotational velocity of the camera rig is not constant. The blocks are no longer aligned, and blending over frame boundaries is used to reduce the visibility of seams.

3.6. Stereo motion synchronization

We have discussed how we partition and blend the blocks of each camera's video. But to what extent are the resulting left and right panoramic videos synchronized? For scene points that are imaged simultaneously by the left and right cameras, stereo motion is automatically synchronized, since we are using the full frames. Asynchrony occurs only near the frame boundaries, namely for points visible to one camera but not the other at a given time. Note, however, that not all points near the frame boundaries are monocular and asynchronous. For example, if the cameras are exactly parallel, then points at infinity will always be binocular. It follows that the vast majority of visible points in each frame will be synchronized between the left and right cameras.

Finally, for those points that are imaged near a frame boundary at some given time, it is important to distinguish synchronization issues from blending issues. Blending can introduce duplicated scene points in either camera's video. But if these duplicated points are seen simultaneously in the left and right views, then they will be synchronized; that is, there will be two blended but distinct stereo copies of the scene points, and the disparities of each copy will be correct. We have found that this ghosting is visible mainly when the duplicated object is a visually isolated and trackable scene feature such as a single tree branch. But even then, it is often only noticeable when one is looking for it.

4. Results

In our experiments, we used two Canon HFS11 cameras on a fixed tripod. This allowed camera rotation around an almost vertical axis (the y axis). The distance between the centers of the two lenses was about 6.5 cm, similar to the typical distance between human eyes. To synchronize frame capture as well as zoom, both cameras were controlled through the LANC protocol. (A LANC Shepherd was connected to the cameras by RA-V1 remote control adaptors.)

To speed up the experiments, we down-sampled the original HD content from 1920 × 1080 to 960 × 540. The final panoramic video is high resolution, at about 6500 × 540 pixels per eye. A GPU was used for the dense frame calibration and the blending. Each example took about an hour to render, split roughly evenly between calibration and blending, on a laptop with an NVidia GeForce 8400M graphics card, an Intel dual-core T7500 2.2 GHz CPU and 2 GB of RAM. Both steps could be accelerated by having a separate thread handle disk operations (loading and saving frames). Moreover, both the calibration and the blending steps have low memory requirements. Every output frame of the video texture can be blended in parallel, which allows the method to render very high-resolution 360 degree textures. This contrasts with other approaches [2, 13] that require solving a large minimization over the whole space-time volume.

We present two examples: one containing a river and another containing a field of tall grass blowing in the wind. See Figs. 1 and 12 for half of a single frame of each video (the full videos are available online at [1]). The reason we show only half a frame is that the 12:1 (horizontal:vertical) aspect ratio of the entire frame is very large, and the vertical dimension would be excessively squeezed.

Figure 12 shows a single frame from the left camera's panoramic video texture and compares (a) no blending versus (b) blending. At first glance, the seams in (b) appear slightly visible when seen below (a). However, this is an illusory contour effect; the reader should cover up (a) when examining (b).

To fully appreciate the stereo effects, the videos should be displayed with correct perspective. We have projected them on a cylindrical screen made of a silver fabric that maintains light polarization. The screen is about 1.5 m high with a 4.5 m diameter. A multi-projector system [18, 17] was set up with half the projectors polarized horizontally and the other half polarized vertically, and viewed with glasses for polarized projection. To our knowledge, this is the first time that a 360 degree panoramic stereo video texture has been captured, computed and displayed.

Finally, although our method is motivated by the problem of stereo video panoramas, it also applies to the more specific problems of static omnistereo and dynamic monocular panoramas. For example, we tested our method on monocular input videos from [2], which were shot by a single camera rotating on a tripod. (These sequences do not have parallax since the camera undergoes pure rotation.) The result for the Yachts sequence is available online at [1]. In this example, the camera rotation stops at a few discrete positions, which causes the blending overlaps to increase considerably. Nonetheless, our method produces very good results.

5. Conclusion

This paper has introduced a method for computing panoramic stereo video textures using a pair of off-the-shelf consumer video cameras. There are several key ideas to the method. First, it uses the full video frames, which gives automatic stereo motion synchronization for the vast majority of the visible points. This synchronization would be difficult to achieve if one were to use slits or small blocks, as in previous stereo or motion panorama methods. Second, rather than using graph cuts and/or smoothing to stitch together seams from different parts of each camera's XYT volume, we use simple blending. While blending can create duplicates of points, this duplication is typically not visible, since the blending is continuous and the blended points are part of a texture.

The main limitations of our method are similar to those faced by other stereo capture methods and mosaicing methods. For stereo capture, there is always the problem of what to do with monocular points at the frame boundary. For mosaicing, there is always the problem of how to stitch boundaries together so that the image is smooth. For the latter problem, one might try to improve our method using a more sophisticated smoothing technique rather than blending, though in many cases this is unnecessary since blending works well. Finally, like other dynamic mosaicing methods, we assume motion texture (defined loosely) rather than structured aperiodic motion. When the latter is present, for example an isolated moving object, it would be smoothly blended in or out as the frame boundary passes over it.

References

[1] http://vision3d.iro.umontreal.ca/en/projects/omnistereo/.

[2] A. Agarwala, K. Zheng, C. Pal, M. Agrawala, M. Cohen, B. Curless, D. Salesin, and R. Szeliski. Panoramic video textures. ACM Transactions on Graphics, 24(3):821–827, 2005.


Figure 12. Half a frame (180° field of view out of 360°) of one camera's panoramic video with (a) no overlap between the blocks and (b) a minimum overlap of 20% between blocks. The overlap blends motion discontinuities as well as lighting changes.

[3] V. Couture, M. S. Langer, and S. Roy. Capturing non-periodic omnistereo motions. In 10th Workshop on Omnidirectional Vision, Camera Networks and Non-classical Cameras (OMNIVIS), Zaragoza, Spain, 2010.

[4] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2004.

[5] H.-C. Huang and Y.-P. Hung. Panoramic stereo imaging system with automatic disparity warping and seaming. Graphical Models and Image Processing, 60(3):196–208, 1998.

[6] H. Ishiguro, M. Yamamoto, and S. Tsuji. Omni-directional stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):257–262, 1992.

[7] B. Mendiburu. 3D Movie Making: Stereoscopic Digital Cinema from Script to Screen. Focal Press, 2009.

[8] T. Naemura, M. Kaneko, and H. Harashima. Multi-user immersive stereo. IEEE International Conference on Image Processing, 1:903, 1998.

[9] S. Peleg and M. Ben-Ezra. Stereo panorama with a single camera. IEEE Conference on Computer Vision and Pattern Recognition, 1:1395, 1999.

[10] S. Peleg, M. Ben-Ezra, and Y. Pritch. Omnistereo: Panoramic stereo imaging. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(3):279–290, 2001.

[11] P. Pérez, M. Gangnet, and A. Blake. Poisson image editing. In SIGGRAPH Conference Proceedings, pages 313–318, New York, NY, USA, 2003. ACM Press.

[12] Point Grey Research. Ladybug3, 2008.

[13] A. Rav-Acha, Y. Pritch, D. Lischinski, and S. Peleg. Dynamosaicing: Mosaicing of dynamic scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10):1789–1801, 2007.

[14] A. Schödl, R. Szeliski, D. Salesin, and I. Essa. Video textures. In SIGGRAPH Conference Proceedings, pages 489–498, New York, NY, USA, 2000. ACM Press/Addison-Wesley Publishing Co.

[15] S. M. Seitz, A. Kalai, and H.-Y. Shum. Omnivergent stereo. International Journal of Computer Vision, 48(3):159–172, July 2002.

[16] N. Snavely, S. Seitz, and R. Szeliski. Photo tourism: Exploring photo collections in 3D. In SIGGRAPH Conference Proceedings, pages 835–846, New York, NY, USA, 2006. ACM Press.

[17] J.-P. Tardif and S. Roy. A MRF formulation for coded structured light. International Conference on 3-D Digital Imaging and Modeling, pages 22–29, 2005.

[18] J.-P. Tardif, S. Roy, and M. Trudeau. Multi-projectors for arbitrary surfaces without explicit calibration nor reconstruction. International Conference on 3-D Digital Imaging and Modeling, pages 217–224, 2003.