
FAST NEAR INFRARED FUSION-BASED ADAPTIVE ENHANCEMENT OF VISIBLE IMAGES

Ahmed Elliethy, Hussein A. Aly

Dept. of Computer Engineering, Military Technical College, Cairo, Egypt, 11766
{[email protected], [email protected]}

ABSTRACT

Visible (VS) and near infra-red (NIR) band sensors provide digital images that capture complementary spectral radiations from a scene. Since NIR radiations propagate well through haze, mist, or fog, the captured NIR image contains better scene details than the VS image in such cases. However, NIR radiations are material dependent and provide little information about the color or texture of the scene's objects. To exploit the complementary details provided by VS and NIR images, we propose a fusion approach that adaptively injects missing spatial details from the NIR image into the VS image while preserving the spectral contents of the VS image. The spatial details are adaptively weighted based on the relative difference between the local contrasts of the NIR and VS images. Thus, the proposed approach prevents unnecessary modification of colors or amplification of scene details that would result in an unrealistic fused image. Moreover, the proposed approach is non-iterative, fast with a low complexity of O(n), and suitable for implementation on embedded camera hardware. Experimental fusion results obtained on natural NIR and VS image pairs show the effectiveness of the proposed approach compared with two alternatives.

Index Terms— Image fusion, near infrared, image enhancement

1. INTRODUCTION

Digital images represent captured spectral radiations from a scene in the visible band (VS), with wavelengths λ ∈ [400, 700] nm, in the form of three color components (red, green, and blue). Challenging imaging conditions such as haze, mist, fog, and overwhelming or poor lighting may degrade the quality of the captured VS images. On the other hand, a near-infrared (NIR) sensor, sensitive to wavelengths λ ∈ [650, 1650] nm, can capture complementary radiations from the scene and provide a see-through mechanism under the aforementioned challenging imaging conditions [1]. However, NIR radiation is material dependent [2, 3], and therefore some details about objects made from the same material may be lost. Figure 1 shows examples of VS and NIR image pairs captured of the same scene that demonstrate the complementary details provided by the VS and NIR bands.

Several VS-NIR fusion approaches that exploit the complementary details provided by VS and NIR images have been proposed to tackle different kinds of problems, such as VS image enhancement [4–6] and de-hazing [7, 8]. In [4], a fused image is obtained from the VS image by replacing either a color plane or the luminance plane of the VS image with the NIR one. In [5], a contrast-preserving mapping model is proposed that alters the pixel values of the NIR image to match their corresponding pixels in the luminance plane of the VS image while preserving the local contrast of the NIR image. The altered NIR image is then used along with other color information from the VS image to construct the fused image. These fusion approaches provide an effective enhancement of the VS image when the NIR image contains more details than the VS image. However, when the NIR image suffers from loss of details in some areas, as shown in Fig. 1 (b), the fusion may deteriorate these areas and degrade the VS image. To overcome this problem, the fusion approach proposed in [6] uses an adaptive smoothness constraint based on gradient and color correlations between the VS and NIR images. However, that fusion approach is iterative and computationally intensive.

Fig. 1. Sample VS and NIR image pairs from [4] captured of the same scene that demonstrate the complementary details provided by the VS and NIR bands; the VS and NIR images are shown in the first and second rows, respectively. The image pair in column (a) shows the effectiveness of the NIR in the presence of haze (at the mountain area), while column (b) shows that the NIR image suffers from some loss of details (at the water area) compared to the VS image.

In this paper, we propose a fusion approach that adaptively injects spatial details from the NIR image into the VS image without altering the colors of the VS image. The proposed approach has three stages. First, the local contrasts of both the NIR and VS images are computed. Then, the spatial details of the NIR image are extracted using a carefully designed high pass filter. Finally, the extracted spatial details are weighted according to the relative difference between the computed local contrasts and injected into the VS image. The proposed approach offers two key advantages over the aforementioned prior approaches. First, it incorporates only the missing spatial details from the NIR image into the VS image without introducing unnecessary modification of colors or amplification of details that may result in an unrealistic fused image. Second, it is non-iterative and fast, with a low complexity of O(n); it is therefore suitable for implementation on embedded camera hardware.

Fig. 2. Block diagram of the proposed approach (luminance extraction, local contrast (LC) estimation, high pass filtering (HPF), fusion map estimation, and per-channel fusion). The local contrasts of the NIR image I^NIR and the luminance plane Y of the VS image I^RGB are first computed. Then, the spatial details of the NIR image are extracted using a high pass filter. Finally, the spatial details are weighted according to a fusion map F and injected into the VS image to obtain the enhanced image J^RGB. Note that a constant is added to the spatial details for better visualization.

The rest of the paper is organized as follows. In Sec. 2, we detail the proposed adaptive VS-NIR fusion approach. In Sec. 3, we present a visual comparison of the fusion results obtained by the proposed approach and by the methods in [4] and [5]. Finally, the paper is concluded in Sec. 4.

2. PROPOSED ADAPTIVE VISIBLE AND NIR FUSION APPROACH

The proposed approach is designed based on the following propositions:

  • The spatial details that are apparent only in the NIR image I^NIR and lost in the VS image I^RGB should be incorporated into the fused image J^RGB.

  • The spectral contents (colors) of I^RGB should be preserved after fusion.

Based on these propositions, we designed the proposed approach as shown in the block diagram in Fig. 2. The local contrasts of the NIR image I^NIR and the luminance plane Y of the VS image I^RGB are first computed to estimate a fusion map F. Then, the spatial details of the NIR image are extracted using a high pass filter. Finally, the spatial details are weighted according to the fusion map F and injected into the VS image to obtain the enhanced image J^RGB.

The fusion map F is the key that determines the regions that suffer from missing spatial details in I^RGB compared to I^NIR. To estimate F, we first extract the luminance plane Y from I^RGB; then F is defined as the relative difference in local contrast between I^NIR and Y. Specifically,

F(x) = \frac{\max\left(0,\; LC\!\left(I^{NIR}(x)\right) - LC\!\left(Y(x)\right)\right)}{LC\!\left(I^{NIR}(x)\right)}, \qquad (1)

where LC(I(x)) is the local contrast of the image I at the spatial location x = [x, y]^T. Inspired by [9], our local contrast is defined as

LC(I(x)) = \alpha \left( \max_{x' \in N(x)} I(x') - \min_{x' \in N(x)} I(x') \right) + (1 - \alpha) \left( \max_{x' \in N(x)} \left\| \nabla I(x') \right\| \right), \qquad (2)

where N(x) is an S × S neighborhood around x, ∇I is the spatial gradient of I, and α = 0.5 is a constant. Note that F has large values in the regions that have better spatial details in I^NIR than in I^RGB, and low values (or zeros) in the other regions where the spatial details of I^RGB are better. Hence, F serves as our adaptive selector of the amount of fusion (injection) of the spatial details of I^NIR to produce J^RGB.
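For concreteness, the following minimal sketch (in Python with NumPy/SciPy; the paper's own implementation is in C++) computes the local contrast of Eq. (2) using S × S maximum/minimum filters and the fusion map of Eq. (1). The function names, the use of scipy.ndimage filters, and the small eps guard against division by zero are our own assumptions, not part of the paper.

import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def local_contrast(img, S=5, alpha=0.5):
    # Eq. (2): local range plus maximum gradient magnitude over an S x S neighborhood.
    rng = maximum_filter(img, size=S) - minimum_filter(img, size=S)
    gy, gx = np.gradient(img)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    return alpha * rng + (1.0 - alpha) * maximum_filter(grad_mag, size=S)

def fusion_map(nir, lum, S=5, alpha=0.5, eps=1e-6):
    # Eq. (1): relative local-contrast deficit of the VS luminance w.r.t. the NIR image.
    lc_nir = local_contrast(nir, S, alpha)
    lc_vis = local_contrast(lum, S, alpha)
    return np.maximum(0.0, lc_nir - lc_vis) / (lc_nir + eps)  # eps: our guard, not in the paper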

We designed a high pass filter g to extract the higher-frequency contents (spatial details) of I^NIR as g = δ − h, where δ is the unit impulse filter and h is a prototype Gaussian filter with radial cut-off frequency Ω_cut cycles/picture height (c/ph) and a kernel size of k × k. More details about how we set the parameters of g are presented in Sec. 3.
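A minimal sketch of one way to construct such a filter is given below. Since the paper specifies h only through its radial cut-off frequency, the mapping from Ω_cut to the Gaussian standard deviation sigma is an assumption left unspecified here.

import numpy as np

def highpass_kernel(k=19, sigma=4.0):
    # g = delta - h, with h a k x k Gaussian low-pass prototype normalized to unit DC gain.
    # sigma is a stand-in parameter; the paper specifies h via its cut-off frequency instead.
    ax = np.arange(k) - (k - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    h = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    h /= h.sum()
    g = -h
    g[k // 2, k // 2] += 1.0  # add the unit impulse delta
    return g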

With the estimated fusion map F and the extracted spatial details g ∗ I^NIR (where ∗ denotes convolution), we propose the fusion process that generates¹ the image J^RGB as

J^{RGB}(x) = I^{RGB}(x) + F(x) \left( g * I^{NIR} \right)(x). \qquad (3)

Note that only the higher-frequency contents (g ∗ I^NIR) are injected into I^RGB, while the base-band contents of I^RGB are left intact. Additionally, in the regions where the captured spatial details of I^RGB are attenuated compared to their counterparts in I^NIR, F is large and boosts the injected high-frequency contents from I^NIR. On the contrary, in the other regions where the spatial details of I^RGB are better, F → 0 and the second term in Eq. (3) vanishes or has very little effect. Therefore, the proposed fusion in Eq. (3) complies with our propositions.
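Putting the pieces together, a minimal per-channel fusion sketch following Eq. (3) could look as follows. It reuses local_contrast, fusion_map, and highpass_kernel from the sketches above; the Rec. 601 luminance weights and the final clipping are our own assumptions, since the paper does not state how Y is extracted or how out-of-range values are handled.

import numpy as np
from scipy.signal import fftconvolve

def fuse(vis_rgb, nir, k=19, sigma=4.0, S=5, alpha=0.5):
    # Eq. (3), applied to every channel of the VS image (see footnote 1).
    lum = vis_rgb @ np.array([0.299, 0.587, 0.114])   # assumed Rec. 601 luminance
    F = fusion_map(nir, lum, S, alpha)                # Eq. (1)/(2), sketched earlier
    g = highpass_kernel(k, sigma)                     # g = delta - h
    details = fftconvolve(nir, g, mode="same")        # g * I_NIR
    fused = vis_rgb + F[..., None] * details[..., None]
    return np.clip(fused, 0.0, 1.0)                   # assumed [0, 1] image range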

The proposed fusion approach is non-iterative and has low computational complexity. Due to space constraints, we summarize the operations required for each equation of the proposed approach in Table 1. From the table, the computational complexity C of the proposed approach, applied to an image I^RGB with a total number of n pixels, is given by

C(n) = n \left( A\left(k^2 + 13\right) + M\left(k^2 + 12\right) + C\left(3S^2 + 1\right) \right) = O(n). \qquad (4)

This fast method, which involves no iterative process, can be implemented in hardware on a system on chip and integrated into the camera hardware.

¹Note that the fusion in Eq. (3) is performed on every channel (red, green, and blue) of I^RGB.


Eq.   add/sub (A)   mult/div (M)   comparison (C)
(1)   1             1              1
(2)   9             10             3S^2
(3)   k^2 + 3       k^2 + 1        0

Table 1. Computational complexity analysis for the proposed fusion approach.
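As a quick sanity check on Eq. (4) and Table 1, substituting the parameter values used in the experiments of Sec. 3 (k = 19, S = 5) gives the per-pixel operation counts

A: k^2 + 13 = 374 add/sub,   M: k^2 + 12 = 373 mult/div,   C: 3S^2 + 1 = 76 comparisons,

i.e., roughly 823 fixed operations per pixel, which is linear in the number of pixels n.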

3. EXPERIMENTAL RESULTS

We evaluated the proposed approach on the dataset from [4], which consists of 477 pairs of VS-NIR images organized into 9 categories. The images were captured using a modified SLR camera with an IR-block or IR-pass filter in front of the camera's lens. We set S = 5, Ω_cut = 0.05, and k = 19 in all experiments. The parameter Ω_cut was determined by performing a linear search over the range Ω_cut : 0.01 → 0.5; for each value of Ω_cut, we first applied the proposed fusion approach to a large number of images from the dataset, then subjectively evaluated every fused image, and finally picked the value of Ω_cut that results in high-quality fused images. Similarly, we performed a linear search for k starting from k = 5 until we obtained minimum ripples in both the stop and pass bands of the frequency response of g. The magnitude of the frequency response of g with the above parameters is shown in Fig. 3.
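As an aside, the frequency response used for this inspection can be obtained by zero-padding g and taking a 2-D FFT; a minimal sketch (our own, with an arbitrary padding size) is:

import numpy as np

def freq_response_db(g, N=512):
    # Magnitude (dB) of the 2-D frequency response of g on an N x N grid,
    # e.g. to inspect pass/stop-band ripples as in Fig. 3. N is our own choice.
    G = np.fft.fftshift(np.fft.fft2(g, s=(N, N)))
    return 20.0 * np.log10(np.abs(G) + 1e-12)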

Fig. 3. Magnitude of the frequency response of the high pass filter g with Ω_cut = 0.05 (axes u and v in c/ph; magnitude in dB).

We compare the proposed approach with (a) the coloring approach in [4] and (b) the contrast-preserving mapping approach in [5]. The experiments were performed on a large number of image pairs from the dataset, and we present a few of them in Fig. 5 and Fig. 6. The columns of both figures, from left to right, show I^RGB, I^NIR, the fused image obtained using [4], the fused image obtained using [5], and the fused image J^RGB obtained using the proposed approach, respectively. Additionally, the fusion maps corresponding to the image pairs in the first two rows of Fig. 5 and the first two rows of Fig. 6 are shown in Fig. 4 (a) and (b), respectively.

As shown in Fig. 5, the fused images J^RGB obtained by the proposed approach show a great enhancement of the scene details compared to I^RGB while preserving the spectral contents of I^RGB. For example, the majority of the detail blurred in I^RGB shown in the first row of the figure is restored in the fused image. As another example, the details of the hazy distant regions missing in I^RGB (shown in the last three rows of the figure) are much better in the fused image than in I^RGB. The reason behind the obtained enhancements is that the fusion map automatically determines the regions in I^RGB where scene details are missing (as shown in Fig. 4 (a)) and, accordingly, our adaptive fusion approach incorporates the details from I^NIR into I^RGB without introducing unnecessary modification of colors or amplification of scene details that may result in an unrealistic fused image.

Fig. 4. Corresponding fusion maps for sample image pairs from Fig. 5 and Fig. 6. Specifically, the fusion maps in (a) and (b) correspond to the image pairs in the first two rows of Fig. 5 and the first two rows of Fig. 6, respectively.

Note that, for all image pairs in Fig. 5, the NIR images capture better scene details than the corresponding VS images. Therefore, the fused images obtained using [4] and [5] also show better scene details than I^RGB, since these approaches maintain a high fidelity between the fused and the NIR images. However, those fused images have noticeably different spectral contents compared to I^RGB and appear unnatural.

The problem of obtaining unnatural fused images using [4] and [5] is very apparent in Fig. 6. This is because the approaches in [4] and [5] blindly maintain high fidelity between the fused and the NIR images even when the NIR images lack scene details, as shown in the figure. On the other hand, the proposed fusion approach avoids this problem by adaptively estimating the fusion map, which in this case has small values (as shown in Fig. 4 (b)); therefore, the injection from the NIR into the VS image is restrained.

We implemented the proposed approach using C++. The running time for the images included in the results (of size 682 × 1024) was 0.7 seconds on a laptop with a Core i7 processor and 12 GB of memory. The proposed approach is 2.5× faster than the running time² of the method in [5].

4. CONCLUSIONS

In this paper, we propose a fast fusion approach to enhance a VS image by adaptively injecting spatial details from a co-registered NIR image without altering the colors of the visible image. Specifically, the proposed approach first estimates a fusion map from the local contrasts of the NIR and VS images. Then, the spatial details of the NIR image are extracted using a carefully designed high pass filter. Finally, the extracted spatial details are weighted according to the fusion map and injected into the VS image. The proposed approach offers two key advantages. First, the adaptive fusion incorporates only the missing spatial details from the NIR image into the VS image without introducing any unnecessary modification of colors or amplification of details that may result in an unrealistic fused image. Second, the proposed approach has low computational complexity. These advantages are highlighted with several fusion results obtained from natural NIR and VS image pairs. The proposed approach takes only about 0.7 seconds to fuse an image of size 682 × 1024 and shows better image enhancement compared to two alternative approaches.

²Note that the speed-up is reported according to our own implementation of the method in [5].

Fig. 5. Sample VS and NIR images and their fusion results. The columns from left to right show I^RGB, I^NIR, the fused image obtained using [4], the fused image obtained using [5], and the fused image obtained using the proposed approach, respectively. Note that, for all image pairs, the NIR images capture better scene details than the corresponding VS images. Images are best viewed electronically.

Fig. 6. Sample VS and NIR images and their fusion results. The columns are organized similarly to those of Fig. 5. Note that, for all image pairs, the NIR images lack scene details compared to the corresponding VS images. Images are best viewed electronically.


5. REFERENCES

[1] C. Colvero, M. Cordeiro, G. De Faria, and J. Von der Weid, "Experimental comparison between far- and near-infrared wavelengths in free-space optical systems," Microwave and Optical Technology Letters, vol. 46, no. 4, pp. 319–323, 2005.

[2] N. Salamati, C. Fredembach, and S. Susstrunk, "Material classification using color and NIR images," Color and Imaging Conference, vol. 2009, no. 1, pp. 216–222, 2009.

[3] N. Salamati and S. Susstrunk, "Material-based object segmentation using near-infrared information," Color and Imaging Conference, vol. 2010, no. 1, pp. 196–201, 2010.

[4] C. Fredembach and S. Susstrunk, "Colouring the near-infrared," Color and Imaging Conference, vol. 2008, no. 1, pp. 176–182, 2008.

[5] C. H. Son, X. P. Zhang, and K. Lee, "Near-infrared coloring via a contrast-preserving mapping model," in IEEE Global Conf. on Signal and Information Proc. (GlobalSIP), Dec 2015, pp. 677–681.

[6] D. Sugimura, T. Mikami, H. Yamashita, and T. Hamamoto, "Enhancing color images of extremely low light scenes based on RGB/NIR images acquisition with different exposure times," IEEE Trans. Image Proc., vol. 24, no. 11, pp. 3586–3597, Nov 2015.

[7] C. Feng, S. Zhuo, X. Zhang, L. Shen, and S. Susstrunk, "Near-infrared guided color image dehazing," in IEEE Intl. Conf. Image Proc., Sept 2013, pp. 2363–2367.

[8] L. Schaul, C. Fredembach, and S. Susstrunk, "Color image dehazing using the near-infrared," in IEEE Intl. Conf. Image Proc., Nov 2009, pp. 1629–1632.

[9] Y.-W. Tai and M. S. Brown, "Single image defocus map estimation using local contrast prior," in IEEE Intl. Conf. Image Proc., Nov 2009, pp. 1797–1800.
