Non-Uniform Crosstalk Reduction for Dynamic Scenes

F.A. Smit∗

Center for Mathematics and Computer Science

R. van Liere†

Center for Mathematics and Computer Science

B. Froehlich‡

Bauhaus-Universität Weimar

ABSTRACT

Stereo displays suffer from crosstalk, an effect that reduces or even inhibits the viewer's ability to correctly perceive depth. Previous work on software crosstalk reduction focused on the preprocessing of static scenes viewed from a fixed viewpoint. However, in virtual environments scenes are dynamic, and are viewed from various viewpoints in real-time on large display areas.

In this paper, three methods are introduced for reducing crosstalk in virtual environments. A non-uniform crosstalk model is described, which can be used to accurately reduce crosstalk on large display areas. In addition, a novel temporal algorithm is used to address the problems that occur when reducing crosstalk in dynamic scenes. This way, high-frequency jitter caused by the erroneous assumption of static scenes can be eliminated. Finally, a perception-based metric is developed that allows us to quantify crosstalk. We provide a detailed description of the methods, discuss their trade-offs, and compare their performance with existing crosstalk reduction methods.

Keywords: Crosstalk, Ghosting, Stereoscopic display

Index Terms: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Virtual Reality; I.3.3 [Computer Graphics]: Picture/Image Generation—Display Algorithms

1 INTRODUCTION

Stereoscopic display systems allow the user to see three-dimensional images in virtual environments. For active stereo with CRT monitors, Liquid Crystal Shutter (LCS) glasses are used in combination with a frame sequential display of left and right images. When the left image is displayed the right eye cell of the LCS glasses goes opaque and the left eye cell becomes clear, and vice versa.

Stereo systems suffer from a disturbing effect called crosstalk or ghosting. Crosstalk occurs when one eye receives a stimulus which was intended for the other eye. This produces a visible shadow on the image that reduces, or even inhibits, the viewer's ability to correctly perceive depth. The effect is most noticeable at high-contrast boundaries with large disparities.

Three main sources of crosstalk can be identified: phosphor afterglow, LCS leakage and LCS timing [9]. Typical phosphors used in CRT monitors do not extinguish immediately after excitation by the electron beam, but decay slowly over time. Therefore, some of the pixel intensity in one video frame may still be visible in the subsequent video frame. Also, LCS glasses do not go completely opaque when occluding an eye, but still allow a small percentage of light to leak through. Finally, due to inexact timing and non-instantaneous switching of the LCS glasses, one eye may perceive some of the light intended for the other eye. The combination of these effects causes an undesirable increase in intensity in the image, which is visible as crosstalk. The amount of crosstalk increases

∗e-mail: [email protected]
†e-mail: [email protected]
‡e-mail: [email protected]

drastically towards the bottom of the display area in a non-linear fashion. Hence, crosstalk is non-uniform over the display area.

To enhance depth perception we want to eliminate or reduce the effect of crosstalk. One way to achieve this is by using better, specialized hardware, such as display devices with faster phosphors and higher quality LCS glasses. However, this type of hardware is relatively expensive and might not eliminate crosstalk completely. An alternative solution is to reduce the effect of crosstalk in software by processing and adjusting the image frames that are to be displayed.

The governing idea is to subtract an amount of intensity from each pixel in the displayed image to compensate for the leakage of intensity from the previous video frame. A precondition here is that the displayed pixels have enough initial intensity to subtract from. If this is not the case, the overall intensity of the image has to be artificially increased, thereby losing contrast. As such, there is a tradeoff between the loss of contrast and the amount of crosstalk reduction possible.

Previous work on software crosstalk reduction mostly focused on the preprocessing of static scenes with a fixed viewpoint [4, 5]. However, what is needed is crosstalk reduction in virtual environments, i.e. for dynamic scenes, various viewpoints, and in real-time. Since large screens with different viewpoints are used, crosstalk can no longer be assumed to be uniform over the entire display area. Furthermore, as virtual environments are interactive, crosstalk reduction algorithms must be fast enough to run in real-time.

After having implemented an initial version of the crosstalk reduction algorithm, we discovered high-frequency jitter for fast-moving objects. This effect was not visible when crosstalk reduction was disabled, and no such problem had previously been mentioned in the literature. After further investigation, we found this to be a defect in the crosstalk reduction algorithm due to the assumption of static scenes. All previous crosstalk reduction methods correct a single pair of left and right application frames, which is valid only for static scenes. For dynamic scenes the application frames change slightly over time, and are displayed in sequence with preceding application frames. Therefore, crosstalk induced by previous application frames needs to be considered. Upon closer inspection this turns out to be a problem related to the video refresh rate (e.g. 100 Hz) being different from the application frame rate (e.g. 20 Hz). In this case there is no longer a one-to-one correspondence between video frames and application frames.

In this paper, our contribution is threefold:

• A non-uniform crosstalk model, along with a procedure for interactive user calibration of the model parameters. The method is based on a combination of the non-uniform crosstalk characteristics described by Woods and Tan [9] and the calibration procedure proposed by Konrad et al. [4]. This addresses the problem of non-uniform crosstalk over the display area.

• A real-time, temporal crosstalk reduction algorithm, which addresses the problem of changing frames in dynamic scenes. We developed an efficient approach to account for the temporal high-frequency jitter, and were able to completely remove it. The resulting algorithm runs entirely on the GPU and is fast enough to operate in real-time.

• A quantitative evaluation methodology for the perceptual quality of crosstalk reduction. Such a method is useful for the development and comparison of various crosstalk reduction techniques. The proposed evaluation method measures the amount of perceptually disturbing crosstalk, allowing us to make quantitative assessments about the quality of various crosstalk reduction algorithms.

The methods in this paper have been applied to the Personal Space Station [7], a desktop virtual environment that consists of a head-tracked user behind a 22 inch CRT monitor using active stereo. However, we will show that these methods can be beneficial to other virtual environments, in particular those with large display areas.

2 RELATED WORK

Lipscomb and Wooten [5] described a method to reduce crosstalk in software. The background intensity is artificially increased, after which crosstalk is reduced by decreasing pixel intensities according to a specifically constructed function. The screen is divided into 16 horizontal bands, and the amount of crosstalk reduction is adjusted for each band to account for the non-uniformity of crosstalk over the screen. Also, some additional adjustments are made in the case of thin lines. Although a non-uniform model is used, the difficulty with this method is determining proper function parameters that provide maximum crosstalk reduction for each of the 16 discrete bands. We propose a non-uniform model that is continuous and can be interactively calibrated.

A calibration-based method was proposed by Konrad et al. [4]. First, the amount of crosstalk caused by an unintended stimulus on a prespecified intended stimulus is measured by a psychovisual user experiment. The viewer has to match two rectangular regions in the center of the screen in color; one contains crosstalk and the other does not. After matching the color, the actual amount of crosstalk can be determined. The procedure is repeated for several values of intended and unintended stimuli, resulting in a two-dimensional look-up table. This look-up table is inverted in a preprocessing stage, after which crosstalk can be reduced by decreasing pixel intensities according to the inverted table values. Optionally, pixel intensities are artificially increased by a contrast-reducing mapping to allow for a greater amount of crosstalk reduction. A drawback of this method is that it assumes crosstalk is uniform over the height of the screen. Our method uses a similar calibration procedure, but is based on a non-uniform crosstalk model.

Klimenko et al. [3] implemented real-time crosstalk reduction for passive stereo systems. Three separate layers are combined using hardware texture blending functions. The first layer contains the unmodified left or right image frame to be rendered. The second layer is a simple intensity-increasing, additive layer to allow for subsequent subtraction. Finally, the left image is rendered onto the right image as a subtractive alpha layer, and vice versa. The alpha values are constant but different for each color channel. This is a linearized version of the subtractive model of Lipscomb and Wooten [5]. Although the method works in real-time, it does not take into account the interdependencies between subsequent image frames; we have improved on this with a temporal model. Also, with constant alpha values the model is uniform over the screen, and some effort is needed to determine the proper alpha values. Our method is continuous, non-uniform, and can be interactively calibrated. Furthermore, their model and implementation are completely linear due to hardware restrictions. By using modern GPU fragment programs and frame buffer objects we avoid such restrictions, and are able to implement a non-linear correction model completely on the GPU.

Woods and Tan [9] studied the various causes and characteristics of crosstalk. They showed that most CRT display devices use phosphors with very similar characteristics, such as spectral response and decay times. However, there was a considerable amount of variation in the quality of LCS glasses. The resulting crosstalk model shows that the amount of crosstalk due to phosphor afterglow increases heavily towards the bottom end of the screen. Crosstalk due to leakage of the LCS glasses is almost uniform over the first 80% of the display, but slightly reduces in the bottom 20%. Finally, to evaluate the real amount of crosstalk, photographs of the screen were taken through the LCS glasses using a digital camera. Our quantitative evaluation method to determine the quality of crosstalk reduction is based on this approach. Also, we examined the validity of our non-uniform model by comparing it with the described crosstalk characteristics.

Daly [1] proposed the Visible Differences Predictor (VDP) to estimate the perceptual difference between two still images. Two images are taken as input, and an output image containing the probability of perceivable difference for each pixel is produced. The algorithm operates using a frequency-domain weighting with the human contrast sensitivity function, followed by a series of detection mechanisms based on the human visual system. We will compare digital photographs taken through the LCS glasses in this manner to estimate the perceptual quality of crosstalk reduction methods.

3 METHODS

In this section we give a detailed technical description of our methods. First, we describe the non-uniform crosstalk model and its calibration procedure in Section 3.1. Next, we describe how to use the calibration data to implement crosstalk reduction for dynamic scenes in Section 3.2. Finally, our quantitative evaluation method is described in Section 3.3.

3.1 Crosstalk Model and Calibration

The amount of perceived crosstalk can vary depending on a number of factors, such as lighting, monitor brightness and different hardware. Most crosstalk reduction methods, for example the method described by Lipscomb and Wooten [5], need to be adjusted for varying conditions. However, it might not be immediately clear to the user exactly how to do this. Therefore, a simple way of calibrating the crosstalk reduction algorithm is desirable.

To estimate the amount of crosstalk correction required in a given situation, we make use of a calibration method based on a similar psychovisual experiment by Konrad et al. [4]. The latter experiment assumes crosstalk to be uniform across the screen. However, as was shown by Woods and Tan [9], this is not the case, and the assumption results in poor crosstalk reduction towards the bottom of the screen. What is needed is a crosstalk model, combined with an easy-to-use calibration method, that can handle the non-uniform nature of crosstalk.

3.1.1 Non-uniform Crosstalk Model

We first describe the non-uniform nature of crosstalk using a simple function of screen height y. This function need not be exact; it only serves as an indication of the shape of the non-uniformity for crosstalk reduction purposes. We separate the crosstalk into crosstalk due to leakage of the LCS glasses, and crosstalk due to phosphor afterglow. The former can almost be approximated as being uniform over screen height, while the latter exhibits a strong non-linear shape [9]. To approximate this shape, we modeled the typical decay of CRT phosphors using a power-law decay function [2]

φ(t, i, y) = i · (t − y + 1)^(−γ)

where t ∈ [0,∞) stands for the time after the first vertical blank (VBL) signal, i is the intensity of the phosphor in the next video frame (at t = 1), y ∈ [0,1] is the screen height, and γ is a constant indicating the speed of decay. As the amount of crosstalk is constant over a specific scanline, φ(t, i, y) disregards the pixel column x and only depends on y. It is assumed that the first frame is displayed for t ∈ [0,1), the second for t ∈ [1,2), etc.

Uniform setup (Left):
                  Left half      Right half
  Left frame:     Iintended      Iadjust
  Right frame:    Iunintended    I0

Non-uniform setup (Right):
                  Left half      Right half
  Left frame:     Idesired       Iadjust
  Right frame:    I0             Iunintended

Table 1: (Left) Calibration setup for a uniform crosstalk model. The columns show the two halves of the screen, while the rows show the sequentially displayed left and right eye frames. By altering Iadjust the amount of crosstalk between Iintended and Iunintended can be determined. (Right) An alternate calibration setup for our non-uniform model. By altering Iadjust we find the intensity to display, given Iunintended, such that the user perceives Idesired.

To estimate the perceived intensity of the phosphor in the second frame, we integrate φ(t, i, y) over t:

c(i, y) = ∫_{1.05}^{2.0} φ(t, i, y) dt = i · ( 1/(γ(2.05 − y)^γ) − 1/(γ(3 − y)^γ) )

We integrate over t ∈ [1.05, 2] because the electron beam spends a small amount of time in the vertical retrace (VRT) state, at which time the LCS glasses are still opaque. We have also experimented with exponential decay functions, but found that a power-law decay function better approximates reality. Still, c(i, y) is only an approximation of shape, and does not model the physical reality exactly.

Suppose that we fix y to the screen height of a given pixel we wish to correct. We can now vary two parameters, γ and i, to estimate the amount of crosstalk. Also, a constant term needs to be added, which is a combination of a constant amount of crosstalk for the phosphor decay and for the LCS glasses. This leads to the following calibration function

ct(σ, ε, γ) = σ + ε · ( 1/(γ(2.05 − y)^γ) − 1/(γ(3 − y)^γ) )

Note that when γ is fixed, the rightmost term can be precalculated, as it only depends on y. The value of σ describes the amount of crosstalk at the top of the screen, ε at the bottom, and γ is a parameter indicating the curvature of the function. The three parameters can be set to closely match the shape of the crosstalk curves found by Woods and Tan [9].

3.1.2 Uniform Crosstalk Calibration

To calibrate the display we use a procedure similar to the one used by Konrad et al. [4]. We shall first describe Konrad's method in more detail, after which we describe the changes required for our non-uniform model.

First the screen is divided into a left and a right half. On each screen half a certain constant intensity is displayed for the right and left eye in stereo mode, as shown in Table 1. As stated earlier, only the center of the screen is used here, and crosstalk is assumed to be uniform. When the screen is viewed through LCS glasses where the left eye is shuttering normally and the right eye is always opaque, the following can be seen on the screen:

Left: Iintended + θ(Iunintended, Iintended)
Right: Iadjust + θ(I0, Iadjust) = Iadjust

where θ(u, i) is a function describing the amount of crosstalk caused by an unintended intensity u in the other eye on an intended intensity i. When the unintended intensity is zero it is assumed there is no crosstalk, so θ(I0, i) = 0. When the user matches the two screen halves in intensity by adjusting Iadjust, this results in the equality Iintended + θ(Iunintended, Iintended) = Iadjust. Hence, the amount of crosstalk between an intended and unintended intensity can be specified as:

θ(Iunintended, Iintended) = Iadjust − Iintended

The procedure is repeated for several values of Iintended and Iunintended, and for each red, green and blue color channel. This results in three look-up tables, one for each channel, that map from R² → R. Next, these tables are linearly interpolated and inverted by an iterative procedure to be able to correct for crosstalk. This results in a function that maps a given unintended and desired perceived intensity to the intensity to be displayed:

θ⁻¹(Iunintended, Idesired) = Idisplayed
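
A sketch of how such an iterative inversion might be implemented is shown below. The measured table here is synthetic, and the bisection solver is our own illustration; the paper does not specify the exact numerical procedure used.

```python
import numpy as np

levels = np.linspace(0.0, 1.0, 9)  # calibrated intensity levels (example grid)
# Synthetic stand-in for one channel's measured table theta[u, i]: crosstalk
# added by an unintended intensity u to an intended intensity i.
theta_tab = 0.04 * levels[:, None] * (1.0 - 0.5 * levels[None, :])

def theta(u, i):
    """Bilinear interpolation of the measured crosstalk table."""
    n = len(levels) - 1
    x, y = u * n, i * n
    x0, y0 = min(int(x), n - 1), min(int(y), n - 1)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * theta_tab[x0, y0]
            + fx * (1 - fy) * theta_tab[x0 + 1, y0]
            + (1 - fx) * fy * theta_tab[x0, y0 + 1]
            + fx * fy * theta_tab[x0 + 1, y0 + 1])

def theta_inv(u, desired, iters=24):
    """Solve d + theta(u, d) = desired for the displayed intensity d by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + theta(u, mid) < desired:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Note that if the desired intensity is below the crosstalk floor, the solver clamps toward zero, which mirrors the limitation of subtractive correction discussed in Section 3.2.1.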

3.1.3 Non-uniform Crosstalk Calibration

In this section we shall describe the changes required to calibrate the non-uniform crosstalk model. The parameter space of our non-uniform model is multi-dimensional, as opposed to one-dimensional. Our calibration procedure returns function parameters σ, ε, γ for a function depending on the screen height y of the pixel. Therefore, interpolating and inverting the resulting calibration tables is not a straightforward task. To avoid this problem, we change the calibration setup slightly, as shown in Table 1. Instead of determining the amount of crosstalk an unintended intensity produces on an intended intensity, we directly estimate the intensity to be displayed, given a desired intensity and an unintended intensity. This is equivalent to calibrating for θ⁻¹.

To match both screen halves in intensity, the user first adjusts σ in order to match the top part of the screen. Then, ε is adjusted to match the bottom part of the screen. Finally, γ is adjusted for the proper curvature. The user-adjustable intensity is set to:

Iadjust = 1 − min(1, ct(σ, ε, γ))

This is an approximation of the inverse of ct(σ, ε, γ) up to a constant. Hence, when displaying ct⁻¹, the crosstalk is canceled out. The approximation results in slightly too little crosstalk correction towards the bottom of the screen, for values of y close to 1. However, as was shown by Woods and Tan [9], the LCS glasses produce slightly less crosstalk at the bottom of the screen. We previously estimated this LCS leakage to be constant, so the small error we make by estimating the inversion compensates for this assumption.

When the two halves match, the following can be seen:

Left: Idesired + θ(I0, Idesired) = Idesired
Right: Iadjust + θ(Iunintended, Iadjust) = 1 − min(1, ct(σ, ε, γ)) + θ(Iunintended, Iadjust)

If the amount of crosstalk has been modeled correctly, that is θ ≈ ct, the values of Iadjust and θ(Iunintended, Iadjust) will compensate each other up to the constant Idesired. In other words, given an unintended intensity, we know what intensity to display for the user to perceive the desired intensity. Note that Iadjust occurs both by itself and in θ(Iunintended, Iadjust); therefore we interactively solve a complex equality. Also, all equations depend on the height y of the current pixel.

User-guided calibration of three correction parameters, on a two-dimensional grid of intended and unintended intensities, for each color channel, is a tedious and time-consuming process. After some initial experimentation, we found that the value of γ could be fixed at 6.5 in all cases on our hardware. We expect this to be true for most hardware, as γ depends on the type of phosphors used, and CRT monitors are known to use very similar phosphors [9]. This reduces the parameter space to σ and ε only. To interpolate the calibration grid for uncalibrated values of Idesired and Iunintended we used linear interpolation on the remaining function parameters σ and ε.
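
The sketch below illustrates how a sparse calibration grid of (σ, ε) pairs might be expanded into the dense per-channel tables used in Section 3.2.1. Grid size, values, and the names sigma_tab/eps_tab are our own assumptions for illustration:

```python
import numpy as np

G = 5    # sparse calibration grid: G x G calibrated (unintended, desired) pairs
N = 256  # dense table resolution, as used for the calibration textures

# Hypothetical calibrated parameters for one channel, indexed by unintended (u)
# and desired (v) intensity levels; real values would come from the user procedure.
sigma_grid = np.random.default_rng(0).uniform(0.0, 0.05, (G, G))
eps_grid = np.random.default_rng(1).uniform(0.0, 0.30, (G, G))

def densify(grid, n=N):
    """Bilinearly interpolate a G x G grid up to an n x n table."""
    src = np.linspace(0.0, 1.0, grid.shape[0])
    dst = np.linspace(0.0, 1.0, n)
    # Interpolate along columns first, then along rows.
    rows = np.stack([np.interp(dst, src, grid[k, :]) for k in range(grid.shape[0])])
    return np.stack([np.interp(dst, src, rows[:, j]) for j in range(n)], axis=1)

sigma_tab, eps_tab = densify(sigma_grid), densify(eps_grid)
```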

Figure 1: A simple crosstalk reduction pipeline for static scenes. Left and right scene textures are taken as input and are processed on the GPU using the calibration table textures. The output is written directly to the left and right back buffers using MRT-2.

3.2 Application of Temporal Crosstalk Reduction

In this section we describe the implementation of the temporal crosstalk reduction algorithm on the GPU. First, we describe the simple case of crosstalk reduction, assuming a static scene. Then the changes required for dynamic scenes are discussed.

3.2.1 Crosstalk Reduction for Static Scenes

The algorithm starts by rendering the stereo scene for the left and right eye to two separate textures. Next, crosstalk reduction is performed, and finally the resulting left and right textures are displayed as screen-sized quads. The only requirements are modern video hardware capable of running fragment programs, and an existing pipeline that is able to render to textures using frame buffer objects (FBOs). An overview of the static crosstalk reduction pipeline is given in Figure 1.

Once the scene has been rendered to the left and right floating-point textures, they are bound as input to the reduction algorithm. Note that the texture targets are bound to video hardware buffers; therefore everything runs entirely on the GPU, and no texture downloads by the CPU are required. The calibration grid is interpolated and stored in three separate 256x256 input textures, one for each of the red, green and blue calibration colors. This results in three calibration textures, TRed, TGreen and TBlue. The textures have a 32-bit floating point, 4-channel RGBA format.

The calibration textures map a pair of pixel intensities (Iunintended, Idesired) to the two parameters σ and ε of the calibration function (see Section 3.1). As this only requires two of the four available texture channels, we also store the inverted pair of intensities as an optimization. For c ∈ {Red, Green, Blue}, the calibration textures have the following layout:

Tc(u, v) → (σ^c_{u,v}, ε^c_{u,v}, σ^c_{v,u}, ε^c_{v,u})

The left and right scene textures, as well as the three calibration textures, are then passed to a hardware fragment program by drawing a screen-sized quad. In this way, screen pixels can be processed in parallel on the GPU. When drawing a pixel intended for the left application frame, Idesired is fetched from the left scene texture and Iunintended from the right scene texture, and vice versa. Next, the corresponding values of σ^c and ε^c are fetched from the calibration textures for all three color channels.

From the fetched calibration parameters, and the assumed constant γ = 6.5, a correction function depending on the screen height y of the pixel can be constructed. We perform crosstalk reduction by setting:

I^c_displayed = 1 − min(1, σ^c_{u,v} + ε^c_{u,v} · f(y))

where c ∈ {Red, Green, Blue} as before, u = I^c_unintended, v = I^c_desired, and

f(y) = 1/(γ(2.05 − y)^γ) − 1/(γ(3 − y)^γ)

Note that, as f(y) does not depend on either σ or ε, it needs to be calculated only once for a fixed value of y. Also, all of the above calculations can be done in the fragment program.
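
As a CPU-side reference for what the fragment program computes, the following hedged NumPy sketch applies the correction to one color channel of one eye. The arrays sigma_tab, eps_tab and f_y are the (assumed) tables built in the earlier sketches; the GPU version fetches filtered textures, whereas we use nearest-neighbor lookups for brevity:

```python
import numpy as np

def reduce_crosstalk(desired, unintended, sigma_tab, eps_tab, f_y):
    """Per-pixel reduction for one color channel of one eye.

    desired, unintended: HxW float images in [0, 1]
    sigma_tab, eps_tab:  NxN calibration tables indexed by (unintended, desired)
    f_y:                 per-scanline height factor of length H
    """
    n = sigma_tab.shape[0] - 1
    u = np.clip((unintended * n).astype(int), 0, n)  # nearest-neighbor lookup
    v = np.clip((desired * n).astype(int), 0, n)
    sigma, eps = sigma_tab[u, v], eps_tab[u, v]
    ct = sigma + eps * f_y[:, None]  # broadcast f(y) across each scanline
    return 1.0 - np.minimum(1.0, ct)
```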

Instead of rendering the crosstalk-corrected left and right application frames separately in two passes, they can be rendered in a single pass using hardware Multiple Render Targets (MRT). This way, the left and right back buffers are bound simultaneously for writing.

In the dual-pass case, correcting a single pixel for either the left or right application frame takes five texture lookups: two to fetch the desired and unintended pixel intensities from the left and right scene textures, and three to fetch the corresponding calibration parameters. This results in a total of ten texture lookups when rendering the left and right application frames.

However, when using MRT both sets of calibration parameters can be fetched simultaneously due to the structure of the calibration textures. Only five texture lookups are required in this case: two to fetch the pixel intensities from the left and right scene textures, and three to fetch the calibration parameters for (I^c_left, I^c_right) and (I^c_right, I^c_left) simultaneously from Tc. Also, both corrected pixels can be written to their corresponding left or right back buffer immediately.

It may appear that we make an error because we do not compensate the crosstalk reduction for the crosstalk reduction itself in the previous frame; i.e., it seems we should correct the left frame n+1 with respect to the corrected right frame n and not to the right frame n+1. However, for static scenes this is not the case, as our calibration procedure already determines the proper intensity to be displayed given a desired perceived intensity. As the calibration procedure performs a similar correction itself, the correct parameters are found implicitly.

An inherent problem with subtractive crosstalk correction methods is that one cannot compensate the crosstalk when the desired intensity is too low compared to the unintended intensity. When the crosstalk estimate for a color channel is higher than the desired intensity for that channel, the best one can do is to set the channel to zero. To avoid this problem, the desired image intensity can be artificially increased. One way of doing this is by the following linear mapping for α ∈ [0,1]: Idesired = α + Idesired(1 − α). We optionally allow this kind of intensity increase in our implementation. However, a big drawback of increasing intensity this way is that it significantly reduces contrast.
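
A one-line sketch of this optional remapping (α is a user-chosen trade-off parameter; the function name is ours):

```python
def lift_intensity(i, alpha):
    """Linear remap i' = alpha + i*(1 - alpha): raises the intensity floor so
    subtraction headroom exists everywhere, at the cost of contrast."""
    return alpha + i * (1.0 - alpha)
```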

3.2.2 Crosstalk Reduction for Dynamic Scenes

So far we have assumed static scenes in the application of crosstalk reduction. However, in virtual environments this is not the case. In an interactive application, subsequent application frames are usually slightly different. For frame-sequential, active stereo the application frames are displayed sequentially as L^a_n, R^a_n, L^a_{n+1}, R^a_{n+1}, ..., where L^a and R^a stand for the left and right application frames respectively. The problem lies in the fact that R^a_n and R^a_{n+1} are slightly different due to animation. The static crosstalk reduction algorithm combines L^a_{n+1} with R^a_{n+1} to produce a corrected left frame, while it should combine L^a_{n+1} with R^a_n. This results in incorrect crosstalk reduction at the differing regions of R^a_n and R^a_{n+1}.

The problem is complicated by the fact that rendering an application frame usually takes much longer than the display of an individual video frame. At a monitor refresh rate of 100 Hz in active stereo mode, i.e. 50 Hz per eye, application frames need to be rendered in under approximately 5 ms per eye. Typical applications cannot maintain this frame rate, and thus many sequential video frames are displayed per application frame. The situation is sketched in Figure 2.

Figure 2: Time sequence diagram of video and application frames in dynamic scenes. A refresh rate of 100 Hz is assumed here. During the time it takes to render the left and right application frame N, eight left and right video frames are displayed. At the first buffer swap, the last right video frame causes crosstalk on the first left video frame after the buffer swap. These two video frames belong to different application frames N−1 and N. The curved arrows on top indicate the pairs of video frames our algorithm corrects between. The straight arrows at the bottom show which frames are rendered when. At the second buffer swap the right and left video frames belong to the same application frame N. This figure shows the unoptimized case, where all four video frames are corrected to the previous corrected video frame.

As crosstalk only exists between video frames, the slow rendering of application frames needs to be taken into account as well.

When application frames can be rendered at the same speed as video frames, the crosstalk reduction algorithm can easily be modified to take dynamic scenes into account. In addition to the left and right scene textures S^L_{n+1} and S^R_{n+1} for application frame n+1, the corrected right output C^R_n for application frame n is used as input as well. Now S^L_{n+1} is corrected with respect to C^R_n, resulting in C^L_{n+1}. Next, S^R_{n+1} is corrected with respect to C^L_{n+1}, resulting in C^R_{n+1}. The procedure can now be repeated for the next application frame n+2.

When application frames cannot be rendered as fast as video frames, the procedure is no longer correct. Suppose that during the time to render a left and right application frame, four left and four right video frames are displayed. In this case, the above algorithm corrects S^L_n with respect to C^R_{n−1}. However, this is only correct for the first time this video frame is displayed. The same left application frame will be displayed three more times, and those times it should have been corrected with respect to C^R_n.

When rendering an application frame takes much longer than the display of a video frame, we cannot properly correct every video frame on a single video card. This is due to the fact that video hardware executes rendering commands in a sequential manner. At some point the application frame's drawing commands need to be executed by the hardware, at which time there is no way to correct the video frames displayed during this period. An alternative solution would be the use of two video cards: one for rendering application frames, and one for correcting every video frame. Assuming that the transmission and correction of frames can be done fast enough, this would allow the crosstalk to be reduced correctly for every video frame.

Our implementation partially addresses this problem by drawing ahead two pairs of left and right images for the same application frame, which are displayed as subsequent video frames. In this way we correct between new video frames and the previously corrected video frame: S^L_{n+1}−C^R_n, S^R_{n+1}−C^L_{n+1}, S^L_{n+1}−C^R_{n+1} and S^R_{n+1}−C^L_{n+1}. However, as more than four video frames are usually displayed per application frame, the last two corrected video frames will be repeated multiple times. Also, for example, C^R_{n+1} is not very different from S^R_{n+1}. Therefore, we optimize this procedure and use the following correction sequence: C^L_n = S^L_n − C^R_{n−1}, C^R_n = S^R_n − S^L_n, C^L_{n+1} = S^L_n − S^R_n, and C^R_{n+1} = S^R_n − S^L_n.

This allows us to use the optimization to correct S^L_n and S^R_n both ways using five texture lookups, as discussed in the previous section. Also, C^R_{n+1} can be set to this same result, after which it is used as input to the next application frame. The four corrected frames can be drawn to four textures in a single pass using MRT-4, after which the first two textures can be displayed, followed by the last two textures after a buffer swap. This is shown in Figure 3.
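
A sketch of this per-application-frame schedule, with reduce() standing for the per-pixel correction of Section 3.2.1 (the function names and structure are our own illustration; the four outputs correspond to the MRT-4 targets):

```python
def temporal_correct(s_left, s_right, c_right_prev, reduce):
    """Optimized correction sequence for one application frame n.

    s_left, s_right: scene textures S^L_n, S^R_n
    c_right_prev:    corrected right output C^R_{n-1} of the previous frame
    reduce(d, u):    crosstalk reduction of desired image d against unintended u
    """
    c_l = reduce(s_left, c_right_prev)  # C^L_n: first left frame after the swap
    c_r = reduce(s_right, s_left)       # C^R_n
    c_l_rep = reduce(s_left, s_right)   # C^L_{n+1}: repeated left video frames
    c_r_rep = c_r                       # C^R_{n+1}: identical operands, reuse
    # Display c_l and c_r, swap buffers, then display c_l_rep and c_r_rep;
    # c_r_rep is carried over as input to the next application frame.
    return (c_l, c_r, c_l_rep, c_r_rep), c_r_rep
```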

Figure 3: Crosstalk reduction pipeline for dynamic scenes. Again the left and right scene textures are taken as input, together with the output of the previous run. Crosstalk reduction is then performed on the GPU, and the output is written to four textures using MRT-4. The first pair is then displayed, followed by a buffer swap, after which the second pair is displayed. The display sequence conforms to Figure 2. This figure shows the unoptimized case; in the optimized case the correction methods directly use the scene textures, instead of the output of the previous correction.

3.3 Evaluation Method

To evaluate the quality of different crosstalk reduction algorithms, we developed a way to quantitatively compare the amount of crosstalk reduction. One of the problems is the fact that crosstalk is only visible to an external viewer and cannot be detected in the rendering system itself. Therefore, some way of registering the real amount of generated crosstalk is required for both corrected and uncorrected scenes.

Another important issue is that we only wish to measure the amount of crosstalk that is perceptually disturbing. Crosstalk increases the overall intensity of the display in a non-uniform way; however, when the intensity increase is gradual it is not perceptually disturbing or even discernible. Two images can differ a lot in their pixel values without showing any perceptually discernible differences. Therefore, statistical comparison methods, such as the root mean squared error (RMSE), do not provide us with any usable information. This shortcoming of statistically based comparison methods has been noted before by various authors (an overview is given by McNamara [6]).

To externally register the amount of crosstalk in a given scene, we take digital photographs of the monitor screen through the LCS glasses. The digital camera is placed in front of the left eye of the LCS glasses, which are shuttering in the normal frame-synchronized manner. This setup is similar to the one used by Woods and Tan [9].

Figure 4: A photograph of the reference frame where no crosstalk is present. The photo was taken through the left eye of activated LCS glasses.

Using this setup, we take a photograph of a reference application frame Pref, where the left scene is displayed in the left eye and the right-eye display is kept blank. In this way no crosstalk is present in the reference frame. A sample reference photograph is shown in Figure 4. Next, we take a photograph Pct of the same application frame, where the left and right scenes are displayed as normal in the left and right eyes. This photo includes crosstalk for the left scene, originating from the right scene. Finally, we take another photo Pcor in the same manner, but this time with the crosstalk reduction algorithm enabled. All photographs are taken with identical camera settings and a relatively slow shutter speed.

The crosstalk in Pct and Pcor can be isolated by comparing these photographs to Pref, which is clear of crosstalk. When no crosstalk is present, the left-eye image should be equal to Pref. To evaluate the perceptually disturbing differences between (Pref, Pct) and (Pref, Pcor) we make use of the Visible Differences Predictor (VDP) by Daly [1].

To estimate the perceptual differences between two images, the VDP first applies a non-linear response function to each image to estimate the relation between brightness sensation and luminance. Next, the image is converted into the frequency domain and weighted by the human contrast sensitivity function (CSF), resulting in local contrast information. Additional sub-band processing based on the human visual system (HVS), in combination with masking functions, provides scaled contrast differences between the two images. Finally, these contrast differences are used as input to a psychometric function to determine the probability of perceptual difference. More specific implementation details of the method are described by Daly [1].

The output of the VDP algorithm is a probability map that indicates for each pixel the probability that a human viewer will perceive a difference between the corresponding two pixels in the input images. To quantify the results, we determine the percentage of pixels that differ with a probability of over 95%. The threshold could be chosen liberally because the typical VDP output per pixel was either very high (>95%) or very low (<5%), with hardly any values in between. This provides us with a measure of the amount of perceptually disturbing crosstalk.
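
The final quantification step is then a simple thresholded pixel count; a minimal sketch (the VDP itself is an external component and is not reimplemented here):

```python
import numpy as np

def percent_disturbing(vdp_prob, threshold=0.95):
    """Percentage of pixels whose VDP detection probability exceeds the threshold.

    vdp_prob: HxW array of per-pixel difference probabilities in [0, 1].
    """
    return 100.0 * np.mean(vdp_prob > threshold)
```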

Crosstalk increases the intensity of the entire display area, as even the background causes crosstalk on itself. However, these gradual differences in intensity over larger areas do not cause perceptually disturbing artifacts. For the crosstalk photos compared to the reference photos, the RMSE per scanline at the bottom of the screen is very large, as the crosstalk photo is much brighter there. However, the VDP output shows this does not cause a disturbing or even perceivable difference. In this way, by using a human-perception based comparison, we only quantify the truly disturbing visible differences between the crosstalk and reference images.

4 RESULTS

In this section we describe experimental results, and compare our crosstalk reduction algorithm to the case where no crosstalk reduction is performed, and to Konrad's uniform reduction model [4]. To evaluate the amount of disturbing crosstalk we use the evaluation method described in Section 3.3.

NuVision LCS glasses were fixed in front of a Canon A520 digital camera to take photographs of an Iiyama Vision Master Pro 512 22 inch CRT screen. The monitor displayed a frame-by-frame animation of the optimization procedure for globally optimal Fekete point configurations [8]. Each application frame was displayed in reference mode (Pref), with normal crosstalk (Pct), with non-uniform crosstalk reduction enabled (Pcor), and with Konrad's uniform crosstalk reduction (Puni) in sequence. This allowed us to take photographs of all application frames, for each of the display modes. The setup was previously described in Section 3.3.

In this manner we took photographs of 800 animation frames, resulting in a total of 3200 digital color photos of 2048x1536 resolution. The photos were taken with the following fixed camera settings: shutter speed 0.5 s, aperture F2.8, fixed manual focus, fixed white balance, ISO 50, and the highest quality settings. The camera was operated automatically over USB in a darkened room, ensuring frame synchronization and unchanged conditions.

For each animation frame, we compared each of the photos Pcor, Pct and Puni to the corresponding reference frame Pref using the previously described evaluation method. Figure 5 shows the three photos and the output of the VDP comparison with the reference photo for animation frame 570. The reference photo for frame 570 was shown earlier in Figure 4. The percentage of perceptually different pixels for each frame and correction method is plotted in Figure 6.

From Figure 6 it can be seen that both uniform and non-uniform crosstalk reduction provide significantly better results than no crosstalk reduction. Also, non-uniform reduction is significantly better than uniform reduction, especially as the animation progresses. This can be explained by the fact that when the animation has been running for some time, more geometry becomes visible at the bottom of the screen. The parameters of the uniform reduction model have been calibrated only for a specific location on the display, in this case the center. The amount of crosstalk becomes larger at the bottom of the screen and the uniform model does not correct for this. The non-uniform model, however, takes this into account and corrects for crosstalk everywhere. This can also be seen in the photos of frame 570 in Figure 5.

When the desired pixel intensity to be displayed is very low, and the corresponding unintended intensity due to crosstalk is fairly high, no adequate crosstalk reduction can be performed. We distinguish between two types of crosstalk: object-to-background and object-to-object. Object-to-background crosstalk is caused by a geometric object and is visible on the background. This is generally the most perceptually disturbing kind of crosstalk. Object-to-object crosstalk is the crosstalk that is visible on the surface of an object, often caused by the object itself.

In most cases object-to-background crosstalk can be corrected, as the animation background intensity of 0.6 for all three color channels is fairly high. However, object-to-object crosstalk could not be corrected in all cases. This is due to the fact that the object often does not have a high enough desired intensity in a specific color channel to reduce the crosstalk caused in that channel, for example when a blue object region causes blue crosstalk on a red object region. In this case the desired intensity of the blue channel is zero, while the unintended intensity is not.

Figure 5: Photographs and evaluation for frame 570 of the animation. The top row shows the photos as captured by the camera. The bottom row shows the corresponding perceptual differences with the reference photo in red. From left to right the images represent: no crosstalk reduction, uniform crosstalk reduction, and non-uniform crosstalk reduction. Similar MPEG videos of the entire animation sequence are available.

Figure 6: Percentage of perceptually different pixels compared to the corresponding reference photo (Y-axis), plotted against the animation frame number (X-axis), for normal crosstalk, uniform crosstalk reduction, and non-uniform crosstalk reduction.

Figure 7: Percentage of perceptually disturbing crosstalk with varying background intensity, with and without crosstalk reduction.

Therefore, a second experiment consisted of determining the influence of the background intensity on the ability to perform crosstalk reduction. This time we took a sequence of photographs of frame 570 with varying background intensity. For each background intensity we took a reference photo, a photo with normal crosstalk, and a photo with non-uniform crosstalk reduction enabled. The perceptual difference evaluation was done as before and is shown in Figure 7.

For very low background intensities, almost no correction is possible and crosstalk reduction does not provide a much better result. However, when the background intensity reaches 0.5, almost all object-to-background crosstalk can be corrected. It can also be seen that higher background intensities reduce crosstalk in general. This is due to the fact that the human visual system is less sensitive to intensity differences when intensities are large, and the phosphors reach a peak intensity value.

Finally, we experimentally evaluated the effect of different background/foreground ratios, where the ratio is defined as the number of background pixels divided by the number of foreground, or object, pixels.

Figure 8: Percentage of perceptually disturbing crosstalk with varying background/foreground ratio, with and without crosstalk reduction. The X-axis shows an increasing geometry scale factor, which corresponds to a decreasing background/foreground ratio.

Using a constant background intensity of 0.6, we varied the size of the geometry in frame 570. For larger geometry sizes the background/foreground ratio becomes smaller, resulting in more object-to-object and less object-to-background crosstalk. Again, reference, crosstalk, and non-uniform crosstalk reduction photos were taken. The results are shown in Figure 8.

When the background/foreground ratio becomes smaller, more cases of uncorrectable object-to-object crosstalk occur. This explains why the non-uniform reduction algorithm performs worse with increasing geometry size. However, as it performs the best possible approximate correction, it still produces better quality than no correction. In the case where no correction is performed, the background/foreground ratio has less effect: increasing the geometry size simply replaces object-to-background crosstalk with object-to-object crosstalk, neither of which is corrected anyway. Finally, note that when the geometry becomes very small it becomes more difficult to perceive the crosstalk regions, as can be seen in the uncorrected curve. The reduction algorithm is not affected by this behavior, as it is capable of correcting the crosstalk regardless.

5 DISCUSSION

Our crosstalk reduction method works well in many cases. However, in the case of heavy object-to-object crosstalk the quality of reduction is suboptimal. This is due to the fact that the amount of crosstalk reduction possible depends on the desired display intensity; i.e., when for any color channel Iunintended ≫ Idesired, no adequate reduction can be performed. For object-to-object crosstalk this is often the case.

Unfortunately we could not quantitatively examine the effect of the temporal reduction model, but one can imagine the effect. Without the temporal model there are left video frames immediately after a buffer swap for which the crosstalk reduction is incorrect. In fact, two types of errors are introduced:

• The left video frames are not corrected for the preceding right application frames. This leaves some amount of uncorrected crosstalk, resulting in areas that are too bright.

• Instead, these left video frames are corrected for the upcoming right application frame. This applies crosstalk reduction in areas where it is not required, introducing incorrectly darkened areas.

Manual inspection showed that this is particularly relevant for moving objects, where the edges are slightly different from frame to frame. These are exactly the regions where static crosstalk reduction methods err in their assumption of static scenes. The effect is not noticeable every frame, as application frames are drawn at a slower rate than video frames. Thus, the effect is noticeable only when application frames change. This typically happens at a rate of approximately 15-20 Hz and explains the visual jitter, which is completely removed with the temporal model. We did not run a user study on this effect, but several people using our setup confirmed this observation.

Monitor-based passive stereo systems make use of a frame-synchronized polarizing screen in front of the CRT display and glasses containing different polarization filters for each eye. The nature of crosstalk is very similar to active stereo in this case; however, the layered construction of the polarizing screen introduces a view-dependent amount of non-uniform crosstalk. This kind of crosstalk can only be compensated for by using accurate head tracking to determine the amount of correction required.

Currently the calibration of the non-uniform model is done interactively by user inspection. The procedure is tedious and error-prone. However, in combination with our quantitative evaluation method it would be possible to automate this task by using a computer-controlled camera. Automatic calibration would be especially desirable in virtual environments with large screens. It is known that the amount of crosstalk also depends on the angle of view through the LCS glasses [9]. An automated calibration procedure could calibrate for many different viewing angles, making use of head-tracking information.

6 CONCLUSIONS

We have proposed a new non-uniform crosstalk model. To determine the parameters of this model, we used an interactive calibration procedure. Also, to address the problems caused by interactive applications in virtual environments, we introduced a temporal crosstalk reduction model for dynamic scenes. This way, high-frequency jitter caused by the erroneous assumption of static scenes could be eliminated. The algorithm was implemented on the GPU and runs in real-time. Finally, we proposed a quantitative evaluation methodology to assess the quality of different crosstalk reduction algorithms.

We compared our non-uniform reduction method to no crosstalk reduction, and to a uniform crosstalk reduction method. This was done by finding perceptual differences between photographs of an animation. It was shown that our method provides better quality crosstalk reduction. Finally, the effects of background intensity and background/foreground ratio were examined.

REFERENCES

[1] S. Daly. The visible differences predictor: an algorithm for the assessment of image fidelity, 1993.
[2] Electronic Industries Alliance Tube Electron Panel Advisory Council (EIA-TEPAC). Optical characteristics of cathode-ray tube screens, TEP116-C, 1993.
[3] S. Klimenko, P. Frolov, L. Nikitina, and I. Nikitin. Crosstalk reduction in passive stereo-projection systems. Eurographics, 2003.
[4] J. Konrad, B. Lacotte, and E. Dubois. Cancellation of image crosstalk in time-sequential displays of stereoscopic video. IEEE Transactions on Image Processing, 9(5):897-908, 2000.
[5] J. S. Lipscomb and W. L. Wooten. Reducing crosstalk between stereoscopic views. In Proc. SPIE Vol. 2177, pages 92-96, 1994.
[6] A. McNamara. STAR: Visual perception in realistic image synthesis. In Eurographics 2000, August 2000.
[7] J. D. Mulder and R. van Liere. The Personal Space Station: Bringing interaction within reach. In VRIC, pages 73-81, 2002.
[8] R. van Liere, J. Mulder, J. Frank, and J. de Swart. Virtual Fekete point configurations: A case study in perturbing complex systems. In VR '00: Proceedings of the IEEE VR 2000, page 189, 2000.
[9] A. J. Woods and S. S. Tan. Characteristic sources of ghosting in time-sequential stereoscopic video displays. In Proc. SPIE Vol. 4660, pages 66-77, 2002.