
Fast and Robust High Dynamic Range Image Generation with Camera and Object Movement

Thorsten Grosch

Institute for Computational Visualistics, University of Koblenz-Landau

Universitätsstr. 1, 56070 Koblenz, Germany
Email: [email protected]

Abstract

High dynamic range (HDR) images play an important role in today's computer graphics: Typical applications are the improved display of a photograph with a tone mapper and the relighting of a virtual object in a natural environment using an HDR environment map. Generating an HDR image usually requires a sequence of photographs with different exposure times which are combined into a single image. In practice, two problems occur when taking an image sequence: the camera moves, or objects move during the sequence. The camera movement results in a blurry HDR image, and moving objects appear multiple times as ghost images in the combined image. In this paper, we present a simple and fast algorithm for robust HDR generation that removes these artifacts without prior knowledge of the camera response function.

1 Introduction and Previous Work

Nowadays, high dynamic range images are an essential part of computer graphics. Especially for natural illumination, images are required that contain a radiance value at each pixel. A photograph taken with an ordinary camera cannot capture the whole dynamic range of the visible radiances. There are always saturated regions, underexposed regions, or both. Moreover, each digital camera uses an unknown non-linear scaling function for converting the incoming sensor exposure to a pixel color. Determining this camera response function was first mentioned by Mann and Picard [4]. Here, a simple gamma function is used to describe the radiometric response function of a camera.

A more general calculation was presented by Debevec and Malik [1]. They use a series of images with different exposure times to combine the images into a single HDR image and to recover the camera response function by solving a linear system of equations. The camera curve is described as a table of 256 exposure values, one for each possible grey value. An alternative algorithm, based on a probabilistic approach, was presented by Robertson et al. [9]. Mitsunaga and Nayar [5] calculate the camera curve as a polynomial function of fixed degree. Their method is able to determine the response function with only rough estimates of the exposure times. A robust method for determining the camera curve was also presented by Grossberg and Nayar [2]. Instead of comparing pixel values, they compare the brightness histograms of the images to determine the camera response function. By contrast, only a few hardware solutions for capturing HDR images exist, such as the Spheron camera, HDRC (IMS Chips), or the digital micromirror array introduced by Nayar et al. [6]. At present, the conventional way to create an HDR image is to take a sequence of images with different exposure times and to combine them into a single HDR image with one of the methods described in [1, 2, 4, 5, 9].

Two problems occur when combining these images:

1. If the images are taken with a hand-held camera, the combined HDR image will look blurry because of small camera movements. This can largely be avoided by placing the camera on a tripod, but even pressing the capture button or changing the exposure time can result in an offset of a few pixels.

2. Ghost images appear in the combined image because of moving objects while capturing the image series. This is not a problem in indoor environments, but often unavoidable in outdoor environments. Typical examples are moving people, clouds, or trees waving in the wind.

One example with both camera and object motion is shown in Figure 1; the combined image without alignment is shown in Figure 2.

Ward [10] presented a method to solve the first problem by assuming a pure translational camera movement. Each image of the sequence is converted to a so-called median threshold bitmap (MTB). These binary images are almost identical for all exposure times and can be used for aligning the images (Figure 3). The algorithm starts with a low-resolution image to get an initial estimate for the translation and refines the estimate for the high-resolution images. The method is very fast because of many hardware optimizations.
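The median threshold bitmap itself is simple to construct: every pixel brighter than the median grey value of the image becomes white, everything else black. The following sketch shows one possible CPU implementation, assuming an 8-bit grayscale input; the function name and data layout are illustrative and not taken from [10].

#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build a median threshold bitmap (MTB) from an 8-bit grayscale image.
// Pixels brighter than the median grey value map to 1, all others to 0.
// (Illustrative sketch; names and layout are not from the paper.)
std::vector<uint8_t> medianThresholdBitmap(const std::vector<uint8_t>& grey)
{
    // Histogram of grey values.
    std::array<std::size_t, 256> hist{};
    for (uint8_t g : grey)
        ++hist[g];

    // Find the median grey value: the smallest value whose cumulative
    // count covers half of the pixels.
    std::size_t half = grey.size() / 2;
    std::size_t accum = 0;
    int median = 0;
    for (int g = 0; g < 256; ++g) {
        accum += hist[g];
        if (accum >= half) { median = g; break; }
    }

    // Threshold at the median to obtain the binary bitmap.
    std::vector<uint8_t> mtb(grey.size());
    for (std::size_t i = 0; i < grey.size(); ++i)
        mtb[i] = (grey[i] > median) ? 1 : 0;
    return mtb;
}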

The second problem is addressed by Kang et al. [3] for creating an HDR video. Here, a video sequence is recorded with two alternating exposure times. Neighbouring images in a video sequence with known exposure times are first globally registered and then locally registered: the global registration computes an image warp which aligns the two images optimally, and the local registration is then computed using a hierarchical homography to determine the optical flow of small pixel movements. The authors show that their method is also applicable to still images taken with a digital camera, with convincing results. However, their algorithm requires a precomputed camera response function and it is quite time-consuming. They report that generating an HDR image from a sequence of five images takes about 30 seconds. Since we do not expect that the response function is known for all possible camera adjustments, we decided to develop an algorithm for robust and fast HDR generation without prior knowledge of the response function. Our algorithm can be summarized as follows:

1. Calculate Median Threshold Bitmaps
2. Translational Alignment
3. Translational and Rotational Alignment
4. Calculate Camera Response Function
5. Calculate Error Map
6. Combine HDR Image
7. Manual Corrections

Each step is described in the following sections: Section 2 explains the alignment of the images, and Section 3 describes our method for removing ghost images. Results are presented in Section 4. An intuitive tool for manual correction of the remaining artifacts is presented in Section 5 before we conclude in Section 6.

Figure 1: Image sequence taken with a hand-held camera

Figure 2: Combined HDR image without alignment

Figure 3: Median threshold bitmaps for two images of the car sequence. The bitmaps are almost identical except for the moving objects.

2 Image Registration

Since we have to solve two problems, we assume that we have a static scene with a few moving objects. We compute the correct alignment first and remove the ghost images in a second step.


2.1 Basic Ideas

Our alignment method is based on the algorithm presented by Ward [10], but we include an image rotation. During our test series, we found that a pure translation is often not sufficient. Several image sequences contain a small rotation, especially when the camera is held vertically. We use an optimization algorithm to calculate the best rotation angle for aligning two binary threshold images. The starting value for the algorithm is the translation determined with the method described in [10] and a rotation angle of zero.

When XOR-combining both bitmaps, we count the number of remaining white pixels, which is used as the function value for the optimization algorithm. The Downhill Simplex method [7] is used to find the best transformation since it is known to be a robust optimization algorithm.
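For illustration, the cost function minimized by the optimizer could look as follows on the CPU; the paper evaluates it on the GPU as described in Section 2.2, and all names, the row-major bitmap layout, and the rotation about the image center are assumptions of this sketch.

#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of differing (XOR = 1) pixels between the fixed bitmap mtb1 and the
// bitmap mtb2 after translating it by (tx, ty) and rotating it by 'angle'
// (radians) around the image center. Pixels whose counterpart falls outside
// mtb2 are ignored, mimicking the restriction to the intersection region.
std::size_t xorCost(const std::vector<uint8_t>& mtb1,
                    const std::vector<uint8_t>& mtb2,
                    int width, int height,
                    double tx, double ty, double angle)
{
    const double cx = 0.5 * width, cy = 0.5 * height;
    const double ca = std::cos(angle), sa = std::sin(angle);
    std::size_t count = 0;

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Map the pixel back into the untransformed second bitmap:
            // undo the translation, then the rotation about the center.
            double ux = (x - tx) - cx, uy = (y - ty) - cy;
            int sx = static_cast<int>(std::lround( ca * ux + sa * uy + cx));
            int sy = static_cast<int>(std::lround(-sa * ux + ca * uy + cy));
            if (sx < 0 || sy < 0 || sx >= width || sy >= height)
                continue;  // outside the intersection region

            if (mtb1[y * width + x] != mtb2[sy * width + sx])
                ++count;  // XOR of the two binary values is 1
        }
    }
    return count;
}

The Downhill Simplex optimizer then varies (tx, ty, angle) to minimize this count, starting from Ward's translational estimate and a zero rotation angle.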

2.2 Graphics Hardware

We use the graphics hardware for a fast calculation of the XOR combination of two binary threshold bitmaps. The basic idea is to draw both images as rectangular, screen-filling polygons with the median threshold image as a texture. The first rectangle is kept fixed, while the second rectangle is translated and rotated around the view axis, as shown in Figure 4. After a logical XOR combination, the resulting bitmap can be read from the framebuffer to main memory for counting the white pixels. Because reading pixels back to main memory is quite slow, we developed an optimized version, which is described in the next section.


Figure 4: Image alignment: The first image is kept fixed while the second image is translated and rotated around the view axis.

2.3 Fast Summation of White Pixels

To avoid reading the pixels from the framebuffer back to main memory, we do not use the logical operations included in OpenGL. Instead, we use a special fragment program for the XOR combination of both bitmaps. Both median threshold bitmaps are represented as textures, but we draw only the transformed bitmap and calculate the superposed texture pixels in the fragment program. As can be seen in Figure 5, the texture pixel of the transformed bitmap is accessed by the texture coordinates (s,t). The corresponding texture coordinates for the first bitmap are simply the window coordinates (x,y) of the current fragment.


Figure 5: Texture coordinates (left) and intersection region (right) of both median threshold bitmaps.

Because current fragment programs do not support logical operations, we discard the current fragment in case of identical texture pixels to simulate the XOR combination. The exclusion bitmaps are used as textures, too. If at least one of them is zero, the fragment is discarded. The fragment program is shown in Figure 6.

To count the white pixels, we use an occlusion query when drawing the transformed rectangle. Because we discard the black pixels in our fragment program, the occlusion query returns the number of white pixels.
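On the host side, such a counter can be realized with a standard OpenGL occlusion query wrapped around the draw call of the transformed rectangle. The sketch below assumes GLEW for extension loading and takes the actual drawing code (texture binds, fragment program, quad) as a callback; it is an illustration, not the paper's implementation.

#include <GL/glew.h>
#include <functional>

// Count the fragments that survive the XOR fragment program by wrapping the
// draw call of the transformed rectangle in an occlusion query.
GLuint countWhitePixels(const std::function<void()>& drawTransformedQuad)
{
    GLuint query = 0;
    glGenQueries(1, &query);

    // All fragments not discarded by the fragment program are counted.
    glBeginQuery(GL_SAMPLES_PASSED, query);
    drawTransformedQuad();
    glEndQuery(GL_SAMPLES_PASSED);

    // Read back the number of passed samples (i.e. white pixels).
    GLuint passed = 0;
    glGetQueryObjectuiv(query, GL_QUERY_RESULT, &passed);
    glDeleteQueries(1, &query);
    return passed;
}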

2.4 Intersection Region

When combining both images, care must be taken to count only pixels in the intersection of both rectangles, as can be seen in Figure 5. Because we draw only the transformed rectangle, we are automatically restricted to the intersection region. Regions outside the viewport are removed by clipping (dotted regions), and fragments outside the transformed rectangle (striped regions) are never accessed.


void combineFP(out float4 color : COLOR,
               float4 tc   : TEXCOORD0,
               float4 wpos : WPOS,
               uniform samplerRECT mtbSampler1,
               uniform samplerRECT mtbSampler2,
               uniform samplerRECT ebSampler1,
               uniform samplerRECT ebSampler2)
{
    // discard identical pixels to simulate the XOR operation
    if (texRECT(mtbSampler1, wpos.xy).r == texRECT(mtbSampler2, tc.xy).r)
        discard;

    // discard the fragment if at least one exclusion bitmap is zero
    if (texRECT(ebSampler1, wpos.xy).r == 0)
        discard;
    if (texRECT(ebSampler2, tc.xy).r == 0)
        discard;

    // return white color
    color = float4(1.0, 1.0, 1.0, 1.0);
}

Figure 6: Fragment program: Both median threshold bitmaps and exclusion bitmaps are used as textures. When drawing the transformed rectangle, the bitmaps are combined and the fragment is discarded in case of a black pixel.

2.5 Comparison

The correct alignment could be found in most of our test cases, even in the case of object motion during the series. Figure 7 shows the XOR combination of two median threshold bitmaps before and after the optimization; the aligned HDR image is shown in Figure 8.

Without a rotational alignment, several artifacts appear in the HDR image, as can be seen in Figure 9. The translation and rotation values for this image are listed in Table 1.

3 Removing Ghost Images

After aligning the images, we are able to remove the ghost images. To safely detect the affected pixels, we calculate the camera response function first.

Figure 7: XOR combination of the median threshold bitmaps. Left: Original images. Right: Aligned images after 41 iterations of the Downhill Simplex algorithm.

Figure 8: Aligned HDR image: The alignment works despite the moving objects, which appear as ghost images.

3.1 Camera Response Function

Most algorithms for calculating the camera response function require static scenes and a direct correspondence between the pixels. Because we have a non-aligned sequence with object motion, we use the algorithm presented by Grossberg and Nayar [2]. This algorithm calculates the response function based on cumulative histograms and is mostly unaffected by camera or object motion. In the following, we denote the camera response function f as a conversion of sensor exposure X to pixel color z: f(X) = z.
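Independently of how it is estimated, the recovered response can be stored as a small lookup table, in the spirit of the 256-entry representation mentioned in Section 1. The sketch below shows such a tabulated response together with the inverse lookup f^{-1} needed for the color prediction in the next section; the struct and its members are illustrative, not the paper's data structures.

#include <array>
#include <cstdint>

// Tabulated camera response: f maps a (relative) exposure X to a grey value z.
// We store the inverse table exposureOf[z] = f^{-1}(z), one entry per possible
// 8-bit grey value, assumed monotonically increasing.
struct ResponseFunction
{
    std::array<double, 256> exposureOf;

    // f^{-1}(z): exposure that produced grey value z.
    double inverse(uint8_t z) const { return exposureOf[z]; }

    // f(X): grey value produced by exposure X, found as the first entry of
    // the monotone table that reaches X (binary search).
    uint8_t apply(double X) const
    {
        int lo = 0, hi = 255;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (exposureOf[mid] < X) lo = mid + 1; else hi = mid;
        }
        return static_cast<uint8_t>(lo);
    }
};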

3.2 Predicted Pixel Colors

With a known camera response function f, we can predict the pixel color from one image to another. Suppose we have a pixel with color z_{i,j} at position i, taken with exposure time Δt_j. In a second image, taken with exposure time Δt_k, the corresponding pixel color is z_{i,k}.


Figure 9: Difference between pure translational alignment (left) and alignment including a rotation (right). Without a small image rotation, the combined image contains a slight blur. Moreover, several parts of the image appear twice, like the tops of the towers shown in the insets.

The resulting exposure values at sensor i are:

f^{-1}(z_{i,j}) = X_{i,j} = E_i · Δt_j        (1)

f^{-1}(z_{i,k}) = X_{i,k} = E_i · Δt_k        (2)

where E_i is the sensor irradiance. Rearranging these equations leads to a predicted pixel color ẑ_{i,k} for the second image:

ẑ_{i,k} = f((Δt_k / Δt_j) · f^{-1}(z_{i,j}))        (3)

For each pair of consecutive images, we test whether the real color z_{i,k} in the second image is well approximated by the predicted color ẑ_{i,k} from the first image:

|ẑ_{i,k} − z_{i,k}| < ε        (4)

The user parameter ε depends on the amount of noise in the camera images. A significant difference between the two colors indicates object motion. In this case, we mark the pixel as invalid in a special error map. Pixels set to invalid are not used when the HDR image is combined from the image sequence. Here, we use the idea introduced in [3] and fill the invalid pixels with a reference image. The difference is that our reference image is selected by counting the number of underexposed and saturated pixels only in the invalid regions marked in the error map. The image with the lowest count is selected as the reference image. In this way, we minimize the loss of dynamic range. The resulting error map for the car sequence is shown in Figure 10.
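A possible implementation of this reference selection is sketched below: for every exposure we count, only inside the invalid regions of the error map, how many pixels are underexposed or saturated, and pick the exposure with the smallest count. The thresholds 20 and 240 follow the typical range given at the end of this section; all type and function names are illustrative.

#include <cstddef>
#include <cstdint>
#include <vector>

// One exposure of the sequence as 8-bit grey values (illustrative type).
using GreyImage = std::vector<uint8_t>;

// Select the reference image: the exposure with the fewest underexposed or
// saturated pixels inside the invalid regions of the error map.
// errorMap[i] == 0 marks pixel i as invalid (object motion detected).
std::size_t selectReferenceImage(const std::vector<GreyImage>& images,
                                 const std::vector<uint8_t>& errorMap)
{
    std::size_t best = 0;
    std::size_t bestCount = static_cast<std::size_t>(-1);

    for (std::size_t k = 0; k < images.size(); ++k) {
        std::size_t count = 0;
        for (std::size_t i = 0; i < errorMap.size(); ++i) {
            if (errorMap[i] != 0)
                continue;                      // only look at invalid regions
            uint8_t z = images[k][i];
            if (z < 20 || z > 240)             // underexposed or saturated
                ++count;
        }
        if (count < bestCount) { bestCount = count; best = k; }
    }
    return best;  // index of the reference image
}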

Because a direct insertion from one image increases the amount of noise, we test whether some of the other images can be used as well. Therefore, we calculate the predicted color from the reference image for all the other images (Equation 3) and average all pixels that fulfil Equation 4.

The color prediction only works if we are not in a saturated or underexposed region, where the pixel color is always black or white. We therefore use two threshold values to exclude the black and white pixels. Typically, we only check predicted colors if at least one of the two pixel colors is in the range [20..240].
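Putting Equations 3 and 4 together with this range check, the error map for a pair of consecutive, already aligned exposures can be built roughly as follows. The sketch operates on grey values, reuses the tabulated ResponseFunction sketched in Section 3.1, and all names are illustrative.

#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Build the error map for two consecutive, aligned exposures img1 and img2
// with exposure times dt1 and dt2. A pixel is marked invalid (0) if the color
// predicted from the first image disagrees with the observed color in the
// second image by at least epsilon. Pixels outside [20..240] in both images
// are skipped, since prediction is unreliable there.
std::vector<uint8_t> buildErrorMap(const std::vector<uint8_t>& img1, double dt1,
                                   const std::vector<uint8_t>& img2, double dt2,
                                   const ResponseFunction& f, int epsilon)
{
    std::vector<uint8_t> errorMap(img1.size(), 1);  // 1 = valid

    for (std::size_t i = 0; i < img1.size(); ++i) {
        uint8_t z1 = img1[i], z2 = img2[i];

        // Require at least one of the two pixels to be in the usable range.
        bool usable1 = (z1 >= 20 && z1 <= 240);
        bool usable2 = (z2 >= 20 && z2 <= 240);
        if (!usable1 && !usable2)
            continue;

        // Equation 3: predicted color in the second image.
        uint8_t predicted = f.apply((dt2 / dt1) * f.inverse(z1));

        // Equation 4: mark the pixel invalid on a significant difference.
        if (std::abs(static_cast<int>(predicted) - static_cast<int>(z2)) >= epsilon)
            errorMap[i] = 0;
    }
    return errorMap;
}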

Table 1: Alignment values for the image shown in Figure 9. The top rows show the translational alignment; the bottom rows show the alignment with a rotation. Note that the best translation values for alignment are floating point numbers.

Alignment    1 - 2    2 - 3    3 - 4    4 - 5
tx           -3       -7       -5        7
ty            5        3       -4        7
tx           -3.0     -6.83    -5.06     7.09
ty            5.0      3.35    -3.71     6.9
α             0.0     -0.31    -0.17     0.09

4 Results

We tested our algorithm with 180 hand-held image sequences. In 79 cases, the visual image quality could be improved with an additional image rotation. In the remaining series, the rotation angle was too small for a visible difference. Because we use the translational alignment as an initial guess for our algorithm, we never decrease the quality of the alignment. In a few cases, a loss of visual image quality could be observed in certain regions when using a rotation. Nevertheless, the overall appearance of the image could be improved.


Figure 10: Error map for the car sequence. All pixels that violate the predicted color condition are marked in black. Note that all moving cars have been detected. Some pixels inside the cars are not marked due to a similar background color.

One such example is shown in the middle row of Figure 13 in the color image section.

We observed alignment problems with 13 series. These problem cases contain large object movements, smoke or fire with changing illumination during the series, and images with noisy bitmaps. After manually adjusting the threshold from 50% to a different value, seven of these series could be aligned. Despite these problems, the alignment based on median threshold bitmaps proved to be very robust. Several problematic sequences containing water (rivers, fountains), reflective or refractive objects (light probes, glass sculptures), or wind (waving trees and flags) were aligned properly.

In our test series, the average translation between two hand-held photographs is 8.19 pixels (at a resolution of 1024 x 768 pixels), and the average rotation angle is 0.16 degrees.

We compare our alignment with the alignment used in Photoshop CS2. In the majority of cases, we observed better image quality with our alignment, as can be seen in the color images. On the other hand, Photoshop CS2 achieves better results on the few problematic image series.

The computation time for creating an HDR image from a sequence of five (1024 x 768) images, together with the camera response function, is about four seconds on a dual Athlon PC running at 2.21 GHz, equipped with a GeForce 7800 GTX graphics card. For comparison, the software implementation of the optimization algorithm takes about 20 seconds for the same task.

Figure 11: Aligned HDR image without ghost images. Note that the cars are not completely removed because we replace defective regions from a single image.

5 Manual Adjustments

Most problems in the combined HDR image can be solved with the algorithm described in the preceding sections. However, some problems still remain:

• Pale ghost images because of similar colors of moving objects and background

• Clipped colors because of over- or underexposed regions in the reference image

• Small image alignment errors due to highly varying distances of the visible objects

Recently, Ward [8] proposed heuristics for ghost removal using all images, but this algorithm cannot guarantee the complete elimination of all ghost images. Since we believe that there is no way to ensure a consistent HDR image automatically, we decided to create a tool for interactive correction of the combined HDR image. Nowadays, some image manipulation programs support HDR images, like Photoshop CS2 or IDRUNA Photogenics. However, we developed our own tool because we can use the following information, collected during HDR combination, to aid the user with the correction:

• The aligned images

• The error map

• The camera response function

The main idea is to select a different reference image for a problem region and copy the information manually. Therefore, saturated or possibly damaged regions can be visualized by showing the error map as red pixels. The main user interaction is the selection of one image from the sequence which is suitable for a defective region. The region is corrected with a special clone brush which copies the information from the selected image to the HDR image. The camera response function is used for the conversion from the LDR image to the HDR image. Figure 12 shows a typical example: The left image is the combined HDR image with a few ghost images. In the center image, possible problem regions are shown as red pixels. One of the original images is selected, and the problem region is filled with the information from the selected image with a few mouse strokes. If none of the images fits for a replacement, we use a standard clone brush to copy information inside the HDR image, or simple painting tools.
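The conversion performed by the clone brush is a direct application of the response function: the copied LDR grey value is mapped back to exposure and divided by the exposure time of the selected source image, as in Equation 1. A minimal sketch, reusing the tabulated ResponseFunction from Section 3.1 (applied per color channel in practice; names are illustrative):

#include <cstdint>

// Radiance value written into the HDR image when the clone brush copies a
// pixel with grey value z from a source image taken with exposure time dt.
// Follows Equation 1: E = f^{-1}(z) / dt.
double cloneBrushRadiance(uint8_t z, double dt, const ResponseFunction& f)
{
    return f.inverse(z) / dt;
}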

Figure 12: Left: Image region with a few ghost images. In this case, the reference image was too dark for a replacement. Center: Visualization of the error map. Right: Image correction by painting information from a different reference image with a clone brush.

6 Conclusion and Future Work

We presented an algorithm for robust high dynamic range image generation that can handle both camera and object motion without a precomputed camera response function. We extended the translational alignment based on median threshold bitmaps with an additional rotation, which improved the image quality significantly in more than 40% of our test series. By exploiting fast graphics hardware, we are able to align a sequence of five images in about four seconds. Regions containing ghost images are detected by testing a predicted color for each pixel and are filled with a reference image that minimizes the loss of dynamic range. For the remaining problem regions, we developed an intuitive tool for manual corrections.

As future work, we are searching for methods to further improve the HDR image quality. In comparison with the original LDR images, the HDR image often contains a slight blur. Moreover, images with a very short or long exposure time are often misaligned. In these cases, a different threshold value has to be used for the bitmaps instead of the median. Currently, we select this value manually; an automatic detection is still an open problem.

7 Acknowledgements

We would like to thank Markus Geimer, Jens Krueger, Matthias Biedermann, Oliver Abert, Tobias Ritschel, Rodja Trappe, and Angelika Braeunig de Colan for helpful suggestions and for proofreading the paper.

References

[1] P. Debevec and J. Malik. Recovering High Dynamic Range Radiance Maps from Photographs. In Proceedings of SIGGRAPH 1997, pp. 369-378, 1997.

[2] M. Grossberg and S. Nayar. What can be known about the radiometric response from images? In Proc. of European Conference on Computer Vision (ECCV) 2002, pp. 189-205, 2002.

[3] S. Kang, M. Uyttendaele, S. Winder, and R. Szeliski. High dynamic range video. In ACM Trans. on Graphics, pp. 319-325, 2003.

[4] S. Mann and R. Picard. Being 'undigital' with digital cameras: Extending dynamic range by combining differently exposed pictures. In Proc. of IS&T's 48th Annual Conference 1995, pp. 422-428, 1995.

[5] T. Mitsunaga and S. Nayar. Radiometric self calibration. In Proc. of IEEE Conference on CVPR 1999, pp. 374-380, 1999.

[6] S.K. Nayar, V. Brazoi, and T.E. Boult. Programmable Imaging Using a Digital Micromirror Array. In Proc. of IEEE Conference on CVPR 2004, pp. 436-443, 2004.

[7] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C++. Cambridge University Press, 2002.

[8] E. Reinhard, G. Ward, S. Pattanaik, and P. Debevec. High Dynamic Range Imaging. Morgan Kaufmann, 2005.

[9] M.A. Robertson, S. Borman, and R.L. Stevenson. Dynamic range improvement through multiple exposures. In Proc. of International Conference on Image Processing 1999, pp. 159-163, 1999.

[10] G. Ward. Fast, robust image registration for compositing high dynamic range photographs from hand-held exposures. Journal of Graphics Tools, Volume 8, 2003, pp. 17-30.


Figure 13: Comparison between the different alignment techniques: Pure translational alignment (left), our method with translation and rotation (center), and Photoshop CS2 alignment (right).

Figure 14: No alignment (left), alignment (center), and ghost removal (right).