Infrared and visible image fusion using a
novel deep decomposition method
Hui Li Xiao-Jun Wu *
Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence,
Jiangnan University, Wuxi, China, 214122.
Abstract: Infrared and visible image fusion is an important image fusion task that has been widely
applied in many fields. To better preserve the useful information in the source images, we propose an
effective image fusion framework using a novel deep decomposition method based on Latent Low-Rank
Representation(LatLRR). This decomposition method is named DDLatLRR. Firstly, LatLRR is utilized to learn
a project matrix which is used to extract salient features. Then, the base part and multi-level detail parts are
obtained by DDLatLRR. The base parts and the detail parts are fused with adaptive fusion strategies.
Finally, the fused image is obtained by combining the fused base part and the fused detail parts. In experimental
comparisons, the proposed algorithm achieves better fusion performance than state-of-the-art fusion methods in
both subjective and objective evaluation. The code of our fusion method is available at
https://github.com/exceptionLi/imagefusion_deepdecomposition .
Keywords: image fusion;deep decomposition;latent low-rank representation; infrared image; visible image;
1 Introduction
In the multi-sensor image fusion field, infrared and visible image fusion is an important task.
It has been widely used in many applications, such as surveillance, object detection and target
recognition. The main purpose of image fusion is to generate a single image which contains the
complementary information from multiple images of the same scene[1]. In infrared and visible image
fusion, a key problem is to extract the salient objects from the infrared image and the visible image,
and many fusion methods have been proposed in recent years.
The most commonly used methods in image fusion are multi-scale transforms, such
as the discrete wavelet transform(DWT)[2], contourlet transform[3], shift-invariant shearlet
transform[4] and quaternion wavelet transform[5]. Because the conventional transform methods do
not preserve enough detail, Luo et al.[6] proposed a fusion method based on contextual
statistical similarity and the nonsubsampled shearlet transform, which can obtain the local structure
information from source images. For infrared and visible image fusion, Bavirisetti et al.[7]
proposed a fusion method based on two-scale decomposition and saliency detection. They used a mean
filter and a median filter to extract the base layers and detail layers, and visual saliency is used
to obtain weight maps. Then, the fused image is obtained by combining these three parts.
Besides the above methods, Zhang et al.[8] proposed a morphological gradient based fusion method.
This method used different morphological gradient operators to obtain the focus region, the defocus
region and the focus boundary region, respectively. Then the fused image is obtained by using an
appropriate fusion strategy.
Among representation learning based algorithms, the most common methods are based on sparse
representation(SR). Zong et al.[9] proposed a novel medical image fusion method based on SR. The
Histogram of Oriented Gradient(HOG) features are used to classify the image patches and several
sub-dictionaries are learned. The 𝑙1-norm and a choose-max strategy are utilized to reconstruct the fused
image. In addition, there are many algorithms which combine SR with other tools, such as the pulse
coupled neural network(PCNN)[10], low-rank representation(LRR)[11] and the shearlet transform[12].
Moreover, joint sparse representation[13] and cosparse representation[14] have also been applied
to the image fusion field.
With the rise of deep learning, deep features of source images are used to reconstruct the
fused image. In [15], Yu Liu et al. proposed a fusion method based on convolutional sparse
representation(CSR). The CSR is different from deep learning methods, but the features extracted by
CSR are multi-scale and multi-layer features, which are similar to deep features. In addition, Yu
Liu et al.[16] also proposed a convolutional neural network(CNN)-based fusion method. Image patches
which contain different blurred versions are used to train a network and obtain a decision map. The
fused image is obtained by using the decision map and the source images. At ICCV 2017, Prabhakar
et al.[17] proposed a novel CNN-based fusion framework for the multi-exposure image fusion task.
However, these deep learning-based methods still have drawbacks. The network is difficult
to train when the training data is not sufficient, especially in the infrared and visible image fusion
task, and CNN-based methods only work on a specific image fusion task.
To address these drawbacks, Li et al.[18] proposed a novel fusion framework based on a pretrained
network(VGG-19[19]). Firstly, the detail parts and base parts are obtained by an optimization
method[20]. An average strategy is used to fuse the base parts and a deep learning framework is utilized
to obtain the fused detail parts. The VGG-19 is used to extract multi-layer deep features from the detail
parts. With a multi-layer fusion strategy and a choose-max operator, the final fused detail parts are
obtained. The final fused image is reconstructed by combining the fused detail parts and the fused base
parts.
Although the SR and deep learning based methods obtain good fusion performance, these methods
still have drawbacks: 1)In SR based methods, dictionary learning is a very time consuming
operation, especially online dictionary learning; 2)In the deep features based method[18], the
decomposition method is very simple, and different decomposition methods will influence the
fusion result differently.
So in this paper, we propose a novel fusion framework based on a deep decomposition
method(DDLatLRR) for infrared and visible image fusion. Firstly, a project matrix is learned by the
Latent Low-Rank Representation(LatLRR)[21], and the project matrix is used to map the input data to
a salient feature space. Secondly, image patches are obtained by dividing the source images with a sliding
window technique. These image patches are stretched into a source matrix in which each column
is an image patch. The DDLatLRR is used to decompose the source matrix level by level to
extract salient features, which are named detail parts, and the base part. Then, the fused detail parts
are reconstructed by an adaptive fusion strategy and a reshape operator, and the fused base part is
obtained by an average strategy. Finally, the fused image is reconstructed by combining the fused
detail parts and the fused base part. Compared with state-of-the-art fusion methods, our fusion
framework achieves better fusion performance in both subjective and objective evaluation.
This paper is structured as follows. In Section 2, we give a brief introduction to related
works. In Section 3, the proposed fusion framework based on a deep decomposition method is
introduced in detail. Section 4 introduces how to learn a project matrix. The experimental results
are shown in Section 5. Finally, Section 6 draws the conclusions.
2 Related works
Latent Low-Rank Representation:
In 2010, Liu et al.[22] proposed the LRR theory, in which the input data matrix is chosen as the
dictionary, but this method cannot achieve good performance when the input data is insufficient or
corrupted. So in 2011, the authors proposed the LatLRR theory[21], with which the low-rank structure
and the salient structure can be extracted from raw data.
In reference [21], the LatLRR problem is reduced to solving the following optimization problem,
$$\min_{Z,L,E} \|Z\|_* + \|L\|_* + \lambda\|E\|_1, \quad \text{s.t. } X = XZ + LX + E, \tag{1}$$
where 𝜆 > 0 is the balance coefficient, ‖∙‖∗ denotes the nuclear norm, which is the sum of the
singular values of a matrix, and ‖∙‖1 is the 𝑙1-norm. X denotes the observed data matrix, Z is the
low-rank coefficient matrix, L is a project matrix, which is also named the salient coefficients, and E is
a sparse noise matrix. Eq.(1) is solved by the inexact Augmented Lagrangian Multiplier (ALM) method[22]. Then the salient
component LX is obtained from the solution of Eq.(1).
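To make the terms of Eq.(1) concrete, the following minimal Python/NumPy sketch evaluates the objective and the constraint residual for given Z, L and E. It is not a solver and not the authors' code; the function names are hypothetical, and ‖E‖1 is assumed to be the entry-wise sum of absolute values.

```python
import numpy as np

def latlrr_objective(Z, L, E, lam):
    """Value of the Eq. (1) objective: ||Z||_* + ||L||_* + lam * ||E||_1."""
    nuclear_Z = np.linalg.norm(Z, ord="nuc")  # sum of singular values of Z
    nuclear_L = np.linalg.norm(L, ord="nuc")  # sum of singular values of L
    l1_E = np.abs(E).sum()                    # assumption: entry-wise l1 norm
    return nuclear_Z + nuclear_L + lam * l1_E

def latlrr_residual(X, Z, L, E):
    """Frobenius norm of the constraint violation X - (XZ + LX + E)."""
    return np.linalg.norm(X - (X @ Z + L @ X + E))

# Toy check: Z = I, L = 0, E = 0 satisfies the constraint trivially.
rng = np.random.default_rng(0)
N, M = 4, 6                       # N pixels per patch, M patches
X = rng.standard_normal((N, M))
Z, L, E = np.eye(M), np.zeros((N, N)), np.zeros((N, M))
obj = latlrr_objective(Z, L, E, lam=0.4)
res = latlrr_residual(X, Z, L, E)
```

The trivial point above is feasible but not optimal; an ALM solver searches for feasible points with a smaller objective.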
Fig.1. Latent low-rank representation. The observed data matrix X, low-rank coefficients Z and project matrix L
Why choose project matrix L:
As shown in Fig.1, assume that the source image is divided into M image patches and the size of
each image patch is 𝑛 × 𝑛, with 𝑁 = 𝑛 × 𝑛. X is the observed matrix and each column denotes an
image patch, so X is 𝑁 × 𝑀, Z is 𝑀 × 𝑀 and L is 𝑁 × 𝑁. The size of Z is related to the number of
image patches, which depends on the size of the source image. If Z were applied in the fusion framework,
the low-rank coefficients would have to be calculated for every image in the test phase, which is time consuming.
However, the size of the project matrix L is only related to the image patch size. In this case, once
the project matrix is learned by LatLRR, it can be used to process other images of arbitrary
size.
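The size argument can be checked with a few lines of Python. The sketch below is only illustrative: the helper name and the 256 × 256 and 512 × 512 image sizes are hypothetical, and a sliding window with step one is assumed.

```python
def latlrr_shapes(height, width, n, step=1):
    """Shapes of X, Z and L when an image is cut into n x n patches."""
    N = n * n                                            # pixels per patch
    M = ((height - n) // step + 1) * ((width - n) // step + 1)  # patch count
    return {"X": (N, M), "Z": (M, M), "L": (N, N)}

small = latlrr_shapes(256, 256, n=16)   # hypothetical 256 x 256 image
large = latlrr_shapes(512, 512, n=16)   # hypothetical 512 x 512 image
```

Under these assumptions Z grows from 58081 × 58081 to 247009 × 247009 as the image gets larger, while L stays at 256 × 256.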
So in our fusion method, Eq.(1) is used to learn a project matrix L from training data(infrared
and visible images). Then, observed data matrices(X) are decomposed into detail parts and base
parts by the pre-trained project matrix. The details of this operation are introduced in the next
section.
3 The Proposed Fusion Method
In this section, the proposed fusion method is presented in detail. The deep decomposition
method and the fusion strategies for the detail parts and the base part are presented in the following
subsections.
Assume that 𝐼1 and 𝐼2 indicate the input images(infrared and visible images). The index in
𝐼𝑘(𝑘 ∈ {1, 2}) is independent of the type of input image. The fusion algorithm is the same when
there are more than two input images. The framework of our fusion algorithm is shown in Fig.2.
Fig.2. The framework of the proposed fusion method. Source images are decomposed into detail parts($V_{dk}^{1:r}$) and a base
part($I_{bk}^{r}$). Then, with adaptive fusion strategies, the fused image($I_f$) is reconstructed from the fused detail
parts($I_{df}^{1:r}$) and the fused base part($I_{bf}$).
As shown in Fig.2, the input image($I_k$) is divided into many image patches by the sliding window
technique with overlapping(the step is one). These image patches are reshuffled into a source
matrix($V_{dk}^{0}$) in which each column is an image patch. The detail parts and the base parts are
calculated by Eq.(2),
$$V_{dk}^{i} = L \times P(I_{bk}^{i-1}), \quad I_{bk}^{i} = I_{bk}^{i-1} - R(V_{dk}^{i}), \tag{2}$$
$$\text{s.t. } I_{bk}^{0} = I_k, \quad k \in \{1,2\}, \quad i = 1,2,\cdots,r,$$
where $r$ and $L$ denote the decomposition level and the project matrix learned by LatLRR,
respectively. $V_{dk}^{i}$ is the decomposition result obtained from the previous base part $I_{bk}^{i-1}$, $P(\cdot)$ denotes the
sliding window technique and reshuffling operator, and $R(\cdot)$ is the operator which is utilized
to reconstruct the detail image from the detail part. As shown in Eq.(2), the detail part is
generated from $L$, $P(\cdot)$ and the input image $I_{bk}^{i-1}$. Then, the base part is obtained by subtracting the detail
image from the input image.
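One level of Eq.(2) can be sketched in Python/NumPy as follows. This is an illustration under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, patches are flattened row by row, and R(⋅) is assumed to average the pixels where patches overlap, which the text does not specify.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def P(image, n, step=1):
    """Sliding-window patches, one flattened n x n patch per column."""
    windows = sliding_window_view(image, (n, n))[::step, ::step]
    return windows.reshape(-1, n * n).T

def R(V, shape, n, step=1):
    """Put patch columns back in place, averaging overlapping pixels
    (the averaging rule is an assumption of this sketch)."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    j = 0
    for y in range(0, shape[0] - n + 1, step):
        for x in range(0, shape[1] - n + 1, step):
            acc[y:y + n, x:x + n] += V[:, j].reshape(n, n)
            cnt[y:y + n, x:x + n] += 1
            j += 1
    return acc / np.maximum(cnt, 1)

def dlatlrr_level(base_prev, L, n, step=1):
    """One level of Eq. (2): returns (detail matrix, next base part)."""
    V = L @ P(base_prev, n, step)
    return V, base_prev - R(V, base_prev.shape, n, step)
```

With this choice of R(⋅), applying R to P of an image returns the image itself when the step is one, so the base part is exactly the input minus the reconstructed detail image.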
After $r$ levels of decomposition, each input image($I_k$) is decomposed
into $r$ detail part matrices($V_{dk}^{1:r}$) and one base part($I_{bk}^{r}$), which gives $r$ pairs of detail part
matrices and one pair of base parts for the two input images. For each pair of detail
parts, an adaptive fusion strategy is used to fuse them column by column. Then, $r$ fused
detail images($I_{df}^{1:r}$) are obtained. The fused detail images are calculated by Eq.(3),
$$I_{df}^{i} = R\left(FS(V_{d1}^{i}, V_{d2}^{i})\right), \quad i = 1,2,\cdots,r, \tag{3}$$
where $r$ indicates the decomposition level, $R(\cdot)$ denotes the operator which is used to reconstruct a
salient feature image from the detail part matrix and $FS(\cdot)$ is the fusion strategy which will be
introduced in the next subsection.
Because the base part contains more contour and brightness information, in our fusion method a
weighted average strategy is utilized to obtain the fused base part.
After the fused detail images and the fused base part are obtained by the adaptive strategies, the fused image is
reconstructed from those parts.
In the next subsections, the details of the deep decomposition method, the fusion strategies and
the reconstruction are presented.
3.1 Deep Decomposition based on LatLRR(DDLatLRR)
Firstly, we introduce the LatLRR based decomposition method(DLatLRR). As we discussed for
Eq.(2), once the project matrix L is learned by LatLRR, it can be used to extract detail parts and
base parts from input images by DLatLRR. The process of DLatLRR is shown in Fig.3.
Fig.3. The process of DLatLRR.
In our decomposition method, the input image($I_{bk}^{i-1}$) is divided into image patches, which are reshuffled
into vectors by the operator $P(\cdot)$. Then, the detail part matrix($V_{dk}^{i}$) is calculated from the project matrix $L$
and $P(I_{bk}^{i-1})$. The salient features are shown in the detail image($R(V_{dk}^{i})$), which is reconstructed by $R(\cdot)$.
For just one level($r = 1$) of our decomposition method, the detail part and the base part can be
calculated by Eq.(2).
If $r > 1$, a multi-level decomposition is used, which we call the deep decomposition
method(DDLatLRR). The framework of DDLatLRR is shown in Fig.4.
Fig.4. Deep decomposition based on LatLRR(DDLatLRR). DLatLRR indicates our decomposition method for one level, and 𝑟
denotes the number of decomposition levels.
In Fig.4, $I_k$ represents the source image, and $V_{dk}^{i}$ and $I_{bk}^{i}$ denote the detail part matrix and the base part
obtained by decomposing the input image with DLatLRR, $i = 1,2,\cdots,r$. As we can see from Fig.4, each
previous base part is decomposed by DLatLRR. Then, for an $r$-level decomposition, we get $r$ detail
part matrices $V_{dk}^{1:r}$ and one base part $I_{bk}^{r}$, and $I_k$ can be reconstructed by adding $I_{bk}^{r}$ and the $r$ detail images.
Then, adaptive strategies are utilized to fuse the detail parts and the base part.
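The level-by-level loop and the reconstruction property can be sketched as follows. This is a toy illustration, not the authors' implementation: the function name is hypothetical, P(⋅) and R(⋅) are passed in as callables, the demo treats a whole 4 × 4 image as a single patch, and the matrix L is random rather than learned.

```python
import numpy as np

def ddlatlrr(image, L, r, P, R):
    """Decompose `image` into r detail matrices and one base part (Eq. (2))."""
    base = image.astype(float)
    details = []
    for _ in range(r):
        V = L @ P(base)       # detail part matrix at this level
        base = base - R(V)    # the remainder is the input of the next level
        details.append(V)
    return details, base

# Toy operators: the whole 4 x 4 image is one 16-pixel patch (assumption).
P = lambda img: img.reshape(-1, 1)
R = lambda V: V.reshape(4, 4)

rng = np.random.default_rng(0)
image = rng.random((4, 4))
L = 0.1 * rng.standard_normal((16, 16))   # stand-in for a learned matrix
details, base = ddlatlrr(image, L, r=3, P=P, R=R)
rebuilt = base + sum(R(V) for V in details)
```

Because each base part is defined as the previous one minus a detail image, the sum telescopes and `rebuilt` equals `image` up to rounding, for any L.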
3.2 Fusion Strategies
Once the detail parts and the base part are obtained by DDLatLRR, we choose adaptive strategies
to fuse these parts.
3.2.1 For base part
The base parts of the input images contain more common features, redundant information and brightness
information. So, in our fusion method, we use a weighted average strategy to obtain the fused base
part. The fused base part is calculated by Eq.(4),
$$I_{bf}(x,y) = w_{b1} I_{b1}^{r}(x,y) + w_{b2} I_{b2}^{r}(x,y), \quad \text{s.t. } w_{b1} = w_{b2} = 0.5, \tag{4}$$
where $(x,y)$ denotes the corresponding position in the base parts($I_{b1}^{r}$, $I_{b2}^{r}$) and the fused base part($I_{bf}$).
3.2.2 For detail parts
In contrast to the base part, the detail parts preserve more structural information and salient
features. So, the fusion strategy(𝐹𝑆(⋅)) for the detail parts should be chosen more carefully. In our
method, the nuclear norm is used to calculate the weights for each pair of corresponding image patches, which
are extracted by 𝑃(⋅) from the input images. This strategy is shown in Fig.5.
Fig.5. Fusion strategy based on nuclear norm. ‘reshape’ means the reconstruction operator which reverses the vector
to an image patch. $\|\cdot\|_*$ indicates the nuclear-norm. $w_{dk}^{i,j}$ indicates the weight for each column. $V_{df}^{i}$ denotes the fused
detail part.
In our fusion strategy, $V_{dk}^{i,j}$ and $V_{df}^{i,j}$ indicate the vectors which are the $j$-th columns of the detail part
matrix $V_{dk}^{i}$ and the fused detail part matrix $V_{df}^{i}$, respectively, where $i$ is the decomposition level and
$k \in \{1,2\}$. Firstly, for each pair of corresponding columns, the weights $w_{dk}^{i,j}$ are calculated by Eq.(5),
$$w_{dk}^{i,j} = \frac{\tilde{w}_{dk}^{i,j}}{\sum_{p=1}^{P} \tilde{w}_{dp}^{i,j}}, \quad \text{s.t. } \tilde{w}_{dk}^{i,j} = \|re(V_{dk}^{i,j})\|_*, \quad P = 2, \tag{5}$$
where $re(\cdot)$ indicates the reshape operator which is used to reconstruct an image patch from the vector
$V_{dk}^{i,j}$, and $\|\cdot\|_*$ indicates the nuclear-norm, which is the sum of the singular values of a matrix.
Then, the fused detail part vector $V_{df}^{i,j}$ is obtained from the weights $w_{dk}^{i,j}$ and the detail part vectors $V_{dk}^{i,j}$, as
shown in Eq.(6),
$$V_{df}^{i,j} = \sum_{p=1}^{P} w_{dp}^{i,j} \times V_{dp}^{i,j}, \quad P = 2. \tag{6}$$
This fusion strategy is applied at each level, and $r$ fused detail part matrices $V_{df}^{1:r}$ are
obtained. Then, the fused detail image($I_{df}^{i}$) for each of the $r$ levels is calculated by Eq.(7),
$$I_{df}^{i} = R(V_{df}^{i}), \tag{7}$$
where $R(\cdot)$ indicates the reconstruction operator introduced in Eq.(2).
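Eqs.(5) and (6) can be sketched for two detail part matrices as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, columns are assumed to be row-major flattened 𝑛 × 𝑛 patches, and the equal-weight fallback for the case where both nuclear norms are zero is an added assumption that the text does not cover.

```python
import numpy as np

def fuse_detail(V1, V2, n, eps=1e-12):
    """Column-wise nuclear-norm weighted fusion of two (n*n, M) matrices."""
    def column_nuclear_norms(V):
        patches = V.T.reshape(-1, n, n)            # re(.) applied per column
        return np.linalg.svd(patches, compute_uv=False).sum(axis=1)

    w1 = column_nuclear_norms(V1)                  # unnormalized weights
    w2 = column_nuclear_norms(V2)
    total = w1 + w2
    safe = np.where(total > eps, total, 1.0)       # avoid division by zero
    # Assumption: use equal weights when both patches have zero norm.
    w1n = np.where(total > eps, w1 / safe, 0.5)    # normalized as in Eq. (5)
    w2n = 1.0 - w1n
    return w1n * V1 + w2n * V2                     # Eq. (6), column by column
```

A column whose patch has the larger sum of singular values therefore contributes more to the fused column, and the two weights always sum to one.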
3.3 Reconstruction
Once we have the fused detail images and the fused base part, the fused image is generated by
Eq.(8),
$$I_f(x,y) = I_{bf}(x,y) + \sum_{i=1}^{r} I_{df}^{i}(x,y). \tag{8}$$
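The base part fusion of Eq.(4) and the reconstruction of Eq.(8) amount to a weighted average followed by a pixel-wise sum, as the short Python/NumPy sketch below shows. The function names are hypothetical and the arrays in the demo are synthetic.

```python
import numpy as np

def fuse_base(base1, base2, w1=0.5, w2=0.5):
    """Eq. (4): weighted average of the two final base parts."""
    return w1 * base1 + w2 * base2

def reconstruct(fused_base, fused_details):
    """Eq. (8): fused base part plus the sum of all fused detail images."""
    return fused_base + np.sum(fused_details, axis=0)

# Synthetic example with r = 2 detail levels on a 2 x 2 image.
b1 = np.full((2, 2), 4.0)
b2 = np.full((2, 2), 2.0)
d = [np.ones((2, 2)), 0.5 * np.ones((2, 2))]
fused = reconstruct(fuse_base(b1, b2), d)
```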
4 Learning the Project Matrix 𝑳
As we discussed before, a project matrix 𝐿 is learned by LatLRR, and the size of 𝐿 is only
related to the image patch size. The training data[23] are shown in Fig.6 and contain five pairs of infrared
and visible images. In Fig.6, the first row and the second row are infrared images and visible images,
respectively.
Fig.6. Five pairs of infrared and visible images, which are used to learn L by LatLRR.
In our learning phase, all these images are divided into image patches by the sliding window
technique without overlapping. We choose three different image patch sizes 𝑛 × 𝑛, with 𝑛 ∈
{8, 16, 32}. Then, 1200 image patches are randomly chosen to generate an input matrix 𝑋 in which each
column contains all pixels of one image patch. The size of 𝑋 is 𝑁 × 𝑀, where 𝑁 = 𝑛 × 𝑛 and
𝑀 = 1200.
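The construction of the training matrix 𝑋 can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name and the fixed random seed are hypothetical, the images in the demo are random stand-ins for the training pairs of Fig.6, and sampling without replacement is an assumption.

```python
import numpy as np

def build_training_matrix(images, n, num_patches=1200, seed=0):
    """Cut images into non-overlapping n x n patches and sample columns of X."""
    rng = np.random.default_rng(seed)
    patches = []
    for img in images:
        H, W = img.shape
        for y in range(0, H - n + 1, n):          # step n: no overlap
            for x in range(0, W - n + 1, n):
                patches.append(img[y:y + n, x:x + n].reshape(-1))
    patches = np.stack(patches, axis=1)           # shape (n*n, total patches)
    # Assumption: patches are drawn without replacement.
    idx = rng.choice(patches.shape[1], size=num_patches, replace=False)
    return patches[:, idx]

# Ten random 128 x 128 arrays stand in for the five training image pairs.
rng = np.random.default_rng(1)
images = [rng.random((128, 128)) for _ in range(10)]
X = build_training_matrix(images, n=8)            # X is 64 x 1200
```

The images must provide at least `num_patches` patches in total, otherwise the sampling step raises an error.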
As we discussed in Section 2, the project matrix 𝐿 can be learned by LatLRR and
ALM. In LatLRR, 𝜆 is set to 0.4. With three different image patch sizes, three types of 𝐿 are
obtained. Since 𝐿 is 𝑁 × 𝑁 with 𝑁 = 𝑛 × 𝑛, their sizes are 64 × 64, 256 × 256 and 1024 × 1024.
In this case, if we choose the same image patch size(𝑛 × 𝑛) in the sliding window technique for
the testing images, 𝐿 can be reused to extract features from any of them.
5 Experimental Results and Analysis
The aims of our experiments are to discuss why the nuclear-norm and the overlapping operator are
utilized in our method, and to evaluate the fusion performance in comparison with other existing fusion
methods.
5.1 Experimental setting
Our testing data are available at [24] and were collected from [25] and [26]. There are 21
pairs of infrared and visible images. A sample of these infrared and visible images is shown in
Fig.7.
Fig.7. Five pairs of source images. The top row contains infrared images, and the second row contains visible
images.
Firstly, we discuss why the overlapping operator and the nuclear-norm are chosen in our fusion
strategy. For non-overlapping, three types of 𝐿(𝐿8, 𝐿16, 𝐿32) are used, which means the decomposition
size is 8 × 8, 16 × 16 and 32 × 32, respectively.
However, with the overlapping operator every pixel position generates an image patch, so when the
image patch size is set to 32 × 32 the input source matrix becomes extremely large and even
calculating one level is very slow. So, we only choose 8 × 8 and 16 × 16 as the
decomposition size when the overlapping operator is used.
For the deep decomposition method, the number of decomposition levels ranges from 1 to 4, which means 𝑟 =
1, 2, 3, 4.
For comparison, nine recent and classical fusion methods are chosen to perform the same
experiment, including: cross bilateral filter method(CBF)[27], discrete cosine harmonic wavelet
transform method(DCHWT)[28], joint sparse representation-based method(JSR)[29], the JSR model with
saliency detection fusion method(JSRSD)[13], the gradient transfer fusion method(GTF)[30], weighted
least square optimization-based method(WLS)[25], convolutional sparse representation(ConvSR)[15],
VGG-19 and multi-layers fusion strategy-based method(VggML)[18] and a CNN-based fusion
method(DeepFuse)[17].
For the quantitative comparison between our method and other existing fusion methods,
seven quality metrics are utilized. These are: entropy(En); mutual information(MI); 𝑄𝑎𝑏𝑓[31],
which reflects the quality of the visual information obtained from the fusion of input images; 𝐹𝑀𝐼𝑑𝑐𝑡 and
𝐹𝑀𝐼𝑤[32], which calculate fast mutual information (FMI) for the discrete cosine and wavelet
features, respectively; a modified structural similarity 𝑆𝑆𝐼𝑀𝑎[18]; and MS_SSIM[33], a
modified structural similarity which focuses only on structural information. For all of these metrics,
a larger value indicates better fusion performance.
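As an illustration of the simplest of these metrics, En is commonly computed as the Shannon entropy of the gray-level histogram. The sketch below assumes an 8-bit grayscale image and a base-2 logarithm; the function name is hypothetical and the exact implementation used for the tables is not given in the text.

```python
import numpy as np

def entropy(image_u8):
    """Shannon entropy (in bits) of the gray-level histogram of a uint8 image."""
    hist = np.bincount(image_u8.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()        # gray-level probabilities
    p = p[p > 0]                 # skip empty bins, since 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

flat = np.zeros((8, 8), dtype=np.uint8)                    # one gray level
two_level = np.array([[0, 255], [255, 0]], dtype=np.uint8) # two equally likely levels
en_flat, en_two = entropy(flat), entropy(two_level)
```

Under this definition the value ranges from 0 for a constant image to 8 for an image that uses all 256 gray levels equally often.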
All the experiments are implemented in MATLAB R2017b on a 2.8 GHz Intel(R) Core(TM) i5-8400 CPU
with 16 GB RAM.
5.2 Why choose nuclear-norm
Firstly, in this experiment, we use both the non-overlapping and the overlapping operator to divide
testing images with the sliding window technique, and the 𝑙1-norm and the nuclear-norm are
compared. The 𝑙1-norm based fusion strategy is shown in Fig.8.
Fig.8. Fusion strategy based on 𝑙1-norm. || ⋅ ||1 indicates 𝑙1-norm
The average values of the seven quality metrics for the 21 fused images obtained by our method
using non-overlapping and the two norm-based strategies are shown in Table 1, and Table 2 shows the
quality metric values for the overlapping operator. 𝐿8, 𝐿16 and 𝐿32 indicate that 𝐿 is learned with the
image patch size set to 8 × 8, 16 × 16 and 32 × 32, respectively. For overlapping, the input matrix
for 32 × 32 patches is extremely large and the fusion is very slow, so 𝐿32 is discarded.
In our tables, the best and second-best values are denoted in bold and red, respectively.
Table 1 The average values of seven metrics for 21 fused images which are obtained by our method with non-overlapping
and two norms.
Non-overlapping
Norm          L    Level    En       MI        Qabf     𝐹𝑀𝐼𝑑𝑐𝑡   𝐹𝑀𝐼𝑤    𝑆𝑆𝐼𝑀𝑎   MS_SSIM
𝑙1-norm       L8   level-1  6.20354  12.40708  0.38865  0.38595  0.40588  0.77641  0.87710
𝑙1-norm       L8   level-2  6.22697  12.45395  0.40803  0.36816  0.40117  0.77376  0.87760
𝑙1-norm       L8   level-3  6.24534  12.49069  0.41648  0.35635  0.39954  0.77195  0.87649
𝑙1-norm       L8   level-4  6.25966  12.51932  0.42051  0.34838  0.39927  0.77086  0.87494
𝑙1-norm       L16  level-1  6.21365  12.42731  0.40794  0.38040  0.40357  0.77271  0.88806
𝑙1-norm       L16  level-2  6.24910  12.49820  0.44080  0.35591  0.39256  0.76579  0.89484
𝑙1-norm       L16  level-3  6.27704  12.55409  0.45630  0.33877  0.38487  0.76019  0.89725
𝑙1-norm       L16  level-4  6.29874  12.59748  0.46306  0.32631  0.37932  0.75595  0.89776
𝑙1-norm       L32  level-1  6.20968  12.41936  0.39557  0.37721  0.39942  0.77327  0.88553
𝑙1-norm       L32  level-2  6.24695  12.49390  0.42609  0.34732  0.38379  0.76650  0.89219
𝑙1-norm       L32  level-3  6.27939  12.55878  0.44334  0.32513  0.37142  0.76025  0.89521
𝑙1-norm       L32  level-4  6.30711  12.61421  0.45275  0.30891  0.36169  0.75485  0.89660
nuclear-norm  L8   level-1  6.20768  12.41537  0.40342  0.38946  0.40813  0.77563  0.88093
nuclear-norm  L8   level-2  6.23431  12.46861  0.42628  0.37347  0.40334  0.77255  0.88336
nuclear-norm  L8   level-3  6.25529  12.51057  0.43494  0.36214  0.40116  0.77047  0.88322
nuclear-norm  L8   level-4  6.27169  12.54337  0.43833  0.35383  0.40030  0.76919  0.88211
nuclear-norm  L16  level-1  6.21417  12.42834  0.41765  0.38391  0.40652  0.77234  0.88931
nuclear-norm  L16  level-2  6.25079  12.50157  0.45573  0.36196  0.39716  0.76491  0.89754
nuclear-norm  L16  level-3  6.28051  12.56103  0.47401  0.34606  0.38998  0.75868  0.90125
nuclear-norm  L16  level-4  6.30429  12.60857  0.48237  0.33409  0.38451  0.75379  0.90285
nuclear-norm  L32  level-1  6.20754  12.41508  0.39922  0.37922  0.40077  0.77323  0.88493
nuclear-norm  L32  level-2  6.24321  12.48642  0.43149  0.35092  0.38599  0.76648  0.89161
nuclear-norm  L32  level-3  6.27483  12.54966  0.44965  0.32948  0.37403  0.76020  0.89485
nuclear-norm  L32  level-4  6.30227  12.60454  0.45990  0.31356  0.36458  0.75469  0.89652
Table 2 The average values of seven metrics for 21 fused images which are obtained by our method with overlapping
and two norms.
Overlapping
Norm          L    Level    En       MI        Qabf     𝐹𝑀𝐼𝑑𝑐𝑡   𝐹𝑀𝐼𝑤    𝑆𝑆𝐼𝑀𝑎   MS_SSIM
𝑙1-norm       L8   level-1  6.19811  12.39622  0.38558  0.39976  0.41494  0.77830  0.87583
𝑙1-norm       L8   level-2  6.41292  12.82583  0.49336  0.39962  0.42733  0.76452  0.91792
𝑙1-norm       L8   level-3  6.68753  13.37506  0.48276  0.38757  0.43273  0.70326  0.92864
𝑙1-norm       L8   level-4  6.57591  13.15182  0.37264  0.33866  0.41473  0.60962  0.88210
𝑙1-norm       L16  level-1  6.21917  12.43833  0.42405  0.40229  0.41881  0.77632  0.89272
𝑙1-norm       L16  level-2  6.45665  12.91330  0.52704  0.40060  0.43029  0.75556  0.94603
𝑙1-norm       L16  level-3  6.75216  13.50432  0.49299  0.38505  0.43449  0.68956  0.93230
𝑙1-norm       L16  level-4  6.98423  13.96846  0.35764  0.36075  0.43119  0.60014  0.86392
nuclear-norm  L8   level-1  6.20284  12.40569  0.40245  0.40343  0.41767  0.77772  0.88031
nuclear-norm  L8   level-2  6.42438  12.84876  0.50919  0.40609  0.43028  0.76149  0.92561
nuclear-norm  L8   level-3  6.70183  13.40366  0.49195  0.39115  0.43481  0.69669  0.93517
nuclear-norm  L8   level-4  6.60205  13.20411  0.37112  0.34020  0.41535  0.60163  0.88644
nuclear-norm  L16  level-1  6.21857  12.43714  0.43640  0.40623  0.42214  0.77573  0.89399
nuclear-norm  L16  level-2  6.45277  12.90554  0.53936  0.40784  0.43479  0.75241  0.94796
nuclear-norm  L16  level-3  6.74924  13.49849  0.49937  0.39279  0.43944  0.68260  0.93285
nuclear-norm  L16  level-4  6.98267  13.96534  0.35134  0.36741  0.43546  0.59045  0.86227
From Table 1 and Table 2, the nuclear-norm obtains most of the best and second-best values of the seven
quality metrics, both with non-overlapping and with overlapping. For non-overlapping, the nuclear-norm obtains
four best values(Qabf, 𝐹𝑀𝐼𝑑𝑐𝑡, 𝐹𝑀𝐼𝑤, MS_SSIM) and five second-best values(except 𝐹𝑀𝐼𝑑𝑐𝑡, 𝐹𝑀𝐼𝑤).
For overlapping, the nuclear-norm still has an advantage in both best and second-best values.
These results indicate that the nuclear-norm can achieve better performance with or without the
overlapping operator. The 𝑙1-norm based strategy only considers the magnitudes of the values in the detail parts,
so the structural information is ignored. In contrast, the nuclear-norm is the sum of the singular values
of a matrix, which reflects the structural information of an image patch. Another reason is that
our fusion framework is based on LatLRR, which is itself formulated with the nuclear-norm. For these reasons the
nuclear-norm obtains better performance than the 𝑙1-norm, and it is used in our fusion framework.
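The difference between the two norms can be seen on a toy pair of patches. The Python/NumPy sketch below uses two synthetic 4 × 4 patches (chosen for illustration, not taken from the experiments) that have the same 𝑙1-norm but a different arrangement of values.

```python
import numpy as np

flat = np.ones((4, 4))       # constant patch, rank 1
diag = 4.0 * np.eye(4)       # same total magnitude on the diagonal, rank 4

l1_flat = np.abs(flat).sum()                 # 16.0
l1_diag = np.abs(diag).sum()                 # 16.0
nuc_flat = np.linalg.norm(flat, ord="nuc")   # one singular value of 4
nuc_diag = np.linalg.norm(diag, ord="nuc")   # four singular values of 4
```

An 𝑙1-norm based weight cannot tell these patches apart, whereas the nuclear-norm assigns them 4 and 16, so the resulting weights depend on how the values are arranged within the patch and not only on their magnitudes.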
5.3 Why choose overlapping operator
In this section, we analyze the influence of the overlapping operator on our method, with the fusion
strategy based on the nuclear-norm. The same seven quality metrics are used to evaluate the
performance. The values are shown in Table 3, where the best and second-best values are denoted in
bold and red, respectively.
Table 3 The average values of seven metrics for 21 fused images which are obtained by our method with nuclear-norm and
the non-overlapping and overlapping operators.
nuclear-norm
Operator         L    Level    En       MI        Qabf     𝐹𝑀𝐼𝑑𝑐𝑡   𝐹𝑀𝐼𝑤    𝑆𝑆𝐼𝑀𝑎   MS_SSIM
Non-overlapping  L8   level-1  6.20768  12.41537  0.40342  0.38946  0.40813  0.77563  0.88093
Non-overlapping  L8   level-2  6.23431  12.46861  0.42628  0.37347  0.40334  0.77255  0.88336
Non-overlapping  L8   level-3  6.25529  12.51057  0.43494  0.36214  0.40116  0.77047  0.88322
Non-overlapping  L8   level-4  6.27169  12.54337  0.43833  0.35383  0.40030  0.76919  0.88211
Non-overlapping  L16  level-1  6.21417  12.42834  0.41765  0.38391  0.40652  0.77234  0.88931
Non-overlapping  L16  level-2  6.25079  12.50157  0.45573  0.36196  0.39716  0.76491  0.89754
Non-overlapping  L16  level-3  6.28051  12.56103  0.47401  0.34606  0.38998  0.75868  0.90125
Non-overlapping  L16  level-4  6.30429  12.60857  0.48237  0.33409  0.38451  0.75379  0.90285
Non-overlapping  L32  level-1  6.20754  12.41508  0.39922  0.37922  0.40077  0.77323  0.88493
Non-overlapping  L32  level-2  6.24321  12.48642  0.43149  0.35092  0.38599  0.76648  0.89161
Non-overlapping  L32  level-3  6.27483  12.54966  0.44965  0.32948  0.37403  0.76020  0.89485
Non-overlapping  L32  level-4  6.30227  12.60454  0.45990  0.31356  0.36458  0.75469  0.89652
Overlapping      L8   level-1  6.20284  12.40569  0.40245  0.40343  0.41767  0.77772  0.88031
Overlapping      L8   level-2  6.42438  12.84876  0.50919  0.40609  0.43028  0.76149  0.92561
Overlapping      L8   level-3  6.70183  13.40366  0.49195  0.39115  0.43481  0.69669  0.93517
Overlapping      L8   level-4  6.60205  13.20411  0.37112  0.34020  0.41535  0.60163  0.88644
Overlapping      L16  level-1  6.21857  12.43714  0.43640  0.40623  0.42214  0.77573  0.89399
Overlapping      L16  level-2  6.45277  12.90554  0.53936  0.40784  0.43479  0.75241  0.94796
Overlapping      L16  level-3  6.74924  13.49849  0.49937  0.39279  0.43944  0.68260  0.93285
Overlapping      L16  level-4  6.98267  13.96534  0.35134  0.36741  0.43546  0.59045  0.86227
From Table 3, the overlapping-based framework obtains all the best and second-best
values compared with the non-overlapping-based framework. These results indicate that with the overlapping
operator, our deep decomposition method can preserve more information in the detail parts, which
improves the performance of our fusion method. So, the overlapping operator is utilized to divide the input
images in our deep decomposition method.
5.4 Subjective evaluation
Due to the space limit, we only show the fused results for one pair of source images(“street”).
These results are obtained by nine existing fusion methods and by our algorithm(DDLatLRR) with
different 𝐿 and decomposition levels. The fused results are shown in Fig.9.
Fig.9. Experiment on “street” images. (a) Infrared image; (b) Visible image; (c) CBF; (d) DCHWT; (e) JSR; (f)
JSRSD; (g) GTF; (h) WLS; (i) ConvSR; (j) VggML; (k) DeepFuse; (l-o) DDLatLRR(𝐿8) with level 1 to 4; (p-s)
DDLatLRR(𝐿16) with level 1 to 4.
As we can see from Fig.9, the fused images obtained by CBF and DCHWT contain more
artifacts and their salient features are not clear. The fused images
obtained by JSR, JSRSD and GTF contain many ringing artifacts around the salient features and their
detail information is also not clear. In contrast, the fused images obtained by WLS,
ConvSR, VggML, DeepFuse and the proposed fusion method contain more salient features and preserve
more detail information.
On the other hand, as the decomposition level increases, the salient features are
enhanced by our fusion method, as shown in Fig.9(n), Fig.9(o) and Fig.9(r), Fig.9(s).
5.5 Objective Evaluation
To better demonstrate the effectiveness of our fusion method, the seven quality metrics are also used to
evaluate the fusion performance of the nine existing fusion methods and of our algorithm.
In this section, the average values of the seven quality metrics for the 21 pairs of source images are
shown in Table 4. The best and second-best values are denoted in bold and red, respectively.
Table 4 The average values of seven quality metrics for 21 pairs source images.
Method                  En       MI        Qabf     𝐹𝑀𝐼𝑑𝑐𝑡   𝐹𝑀𝐼𝑤    𝑆𝑆𝐼𝑀𝑎   MS_SSIM
CBF(2013)               6.85749  13.71498  0.43961  0.26309  0.32350  0.59957  0.70879
DCHWT(2012)             6.56777  13.13553  0.46592  0.38568  0.40147  0.73132  0.84326
JSR(2013)               6.72263  12.72654  0.32306  0.14236  0.18506  0.54073  0.75523
JSRSD(2017)             6.72057  13.38575  0.32281  0.14253  0.18498  0.54127  0.75517
GTF(2016)               6.63433  13.26865  0.41037  0.39787  0.41038  0.70016  0.80844
WLS(2017)               6.64071  13.28143  0.50077  0.33103  0.37662  0.72360  0.93349
ConvSR(2016)            6.25869  12.51737  0.53485  0.34640  0.34640  0.75335  0.90281
VggML(2018)             6.18260  12.36521  0.36818  0.40463  0.41684  0.77799  0.87478
DeepFuse(2017)          6.69935  13.39869  0.43797  0.41501  0.42477  0.72882  0.93353
DDLatLRR(L8, level-1)   6.20284  12.40569  0.40245  0.40343  0.41767  0.77772  0.88031
DDLatLRR(L8, level-2)   6.42438  12.84876  0.50919  0.40609  0.43028  0.76149  0.92561
DDLatLRR(L8, level-3)   6.70183  13.40366  0.49195  0.39115  0.43481  0.69669  0.93517
DDLatLRR(L8, level-4)   6.60205  13.20411  0.37112  0.34020  0.41535  0.60163  0.88644
DDLatLRR(L16, level-1)  6.21857  12.43714  0.43640  0.40623  0.42214  0.77573  0.89399
DDLatLRR(L16, level-2)  6.45277  12.90554  0.53936  0.40784  0.43479  0.75241  0.94796
DDLatLRR(L16, level-3)  6.74924  13.49849  0.49937  0.39279  0.43944  0.68260  0.93285
DDLatLRR(L16, level-4)  6.98267  13.96534  0.35134  0.36741  0.43546  0.59045  0.86227
In Table 4, the proposed method achieves five best values(En, MI, Qabf, 𝐹𝑀𝐼𝑤 and MS_SSIM) and
three second-best values(𝐹𝑀𝐼𝑑𝑐𝑡, 𝐹𝑀𝐼𝑤 and 𝑆𝑆𝐼𝑀𝑎). These values indicate that the fused images
obtained by our method are more natural and contain less artificial information. In the objective
evaluation, our fusion method has better fusion performance than the compared methods.
For the quality metrics En and MI in particular, we notice that CBF obtains the second-best
values because its fused image contains more noise and artifacts, as shown in Fig.9.
In contrast, our fusion method obtains the best values on En and MI because the salient features are
enhanced by our fusion method as the decomposition level rises. So our fusion method may have
a feature enhancement ability.
6 Conclusions
In this paper, we proposed a novel infrared and visible image fusion method based on a deep
decomposition method(DDLatLRR). Firstly, the training data are used to learn a project matrix 𝐿 by
LatLRR. Then, DDLatLRR is utilized to decompose the source images into detail parts and a base part. In
DDLatLRR, the source images are divided into image patches by the sliding window technique with the
overlapping operator. After an 𝑟-level decomposition, 𝑟 detail parts and one base part are obtained.
For the base part, a weighted-average strategy is used to generate the fused base part. The nuclear-norm
based fusion strategy is utilized to fuse the detail parts. Finally, the fused image is reconstructed
by adding the fused base part and the fused detail parts. We use both subjective and objective
methods to evaluate the proposed method, and the experimental results show that the proposed method
exhibits better performance than the compared methods.
References:
[1] S. Li, X. Kang, L. Fang, J. Hu, and H. Yin, “Pixel-level image fusion: A survey of the state of
the art,” Inf. Fusion, vol. 33, pp. 100–112, 2017.
[2] A. Ben Hamza, Y. He, H. Krim, and A. Willsky, “A Multiscale Approach to Pixel-level Image
Fusion,” Integr. Comput. Aided. Eng., vol. 12, pp. 135–146, 2005.
[3] S. Yang, M. Wang, L. Jiao, R. Wu, and Z. Wang, “Image fusion based on a new contourlet packet,”
Inf. Fusion, vol. 11, no. 2, pp. 78–84, 2010.
[4] L. Wang, B. Li, and L. F. Tian, “EGGDD: An explicit dependency model for multi-modal medical
image fusion in shift-invariant shearlet transform domain,” Inf. Fusion, vol. 19, no. 1,
pp. 29–37, 2014.
[5] H. Pang, M. Zhu, and L. Guo, “Multifocus color image fusion using quaternion wavelet transform,”
2012 5th Int. Congr. Image Signal Process. CISP 2012, 2012.
[6] X. Luo, Z. Zhang, B. Zhang, and X. Wu, “Image Fusion with Contextual Statistical Similarity and
Nonsubsampled Shearlet Transform,” IEEE Sens. J., vol. 17, no. 6, pp. 1760–1771, 2017.
[7] D. P. Bavirisetti and R. Dhuli, “Two-scale image fusion of visible and infrared images using
saliency detection,” Infrared Phys. Technol., vol. 76, pp. 52–64, 2016.
[8] Y. Zhang, X. Bai, and T. Wang, “Boundary finding based multi-focus image fusion through
multi-scale morphological focus-measure,” Inf. Fusion, 2017.
[9] J.-J. Zong and T.-S. Qiu, “Medical image fusion based on sparse representation of
classified image patches,” Biomed. Signal Process. Control, vol. 34, pp. 195–205, 2017.
[10] X. Lu, B. Zhang, Y. Zhao, H. Liu, and H. Pei, “The infrared and visible image fusion algorithm
based on target separation and sparse representation,” Infrared Phys. Technol., vol. 67, pp.
397–407, 2014.
[11] H. Li and X.-J. Wu, “Multi-focus Noisy Image Fusion using Low-Rank Representation,” 2018.
[12] M. Yin, P. Duan, W. Liu, and X. Liang, “A novel infrared and visible image fusion algorithm based
on shift-invariant dual-tree complex shearlet transform and sparse representation,”
Neurocomputing, vol. 226, pp. 182–191, 2017.
[13] C. H. Liu, Y. Qi, and W. R. Ding, “Infrared and visible image fusion method based on saliency
detection in sparse domain,” Infrared Phys. Technol., vol. 83, pp. 94–102, 2017.
[14] R. Gao, S. A. Vorobyov, and H. Zhao, “Image fusion with cosparse analysis operator,” IEEE Signal
Process. Lett., vol. 24, no. 7, pp. 943–947, 2017.
[15] Y. Liu, X. Chen, R. K. Ward, and J. Wang, “Image Fusion with Convolutional Sparse
Representation,” IEEE Signal Process. Lett., vol. 23, no. 12, pp. 1882–1886, 2016.
[16] Y. Liu, X. Chen, H. Peng, and Z. Wang, “Multi-focus image fusion with a deep convolutional neural
network,” Inf. Fusion, vol. 36, pp. 191–207, 2017.
[17] K. R. Prabhakar, V. S. Srikar, and R. V. Babu, “DeepFuse: A Deep Unsupervised Approach for
Exposure Fusion with Extreme Exposure Image Pairs,” in Proceedings of the IEEE International
Conference on Computer Vision, 2017, pp. 4724–4732.
[18] H. Li, X.-J. Wu, and J. Kittler, “Infrared and Visible Image Fusion using a Deep Learning
Framework,” arXiv preprint arXiv:1804.06992, 2018.
[19] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image
Recognition,” in ICLR 2015, 2015.
[20] S. Li, X. Kang, and J. Hu, “Image fusion with guided filtering,” IEEE Trans. Image Process.,
vol. 22, no. 7, pp. 2864–2875, 2013.
[21] G. Liu and S. Yan, “Latent Low-Rank Representation for Subspace Segmentation and Feature
Extraction,” in Proceedings of the IEEE International Conference on Computer Vision, 2011, pp.
1615–1621.
[22] G. Liu, Z. Lin, and Y. Yu, “Robust Subspace Segmentation by Low-Rank Representation,” in
Proceedings of the 27th International Conference on Machine Learning, 2010, pp. 663–670.
[23] H. Li, Training data,
https://github.com/exceptionLi/imagefusion_deepdecomposition/tree/master/training_data. 2018.
[24] H. Li, Testing data,
https://github.com/exceptionLi/imagefusion_deepdecomposition/tree/master/IV_images. 2018.
[25] J. Ma, Z. Zhou, B. Wang, and H. Zong, “Infrared and visible image fusion based on visual saliency
map and weighted least square optimization,” Infrared Phys. Technol., vol. 82, pp. 8–17, 2017.
[26] Alexander Toet et al., TNO Image Fusion Dataset.
https://figshare.com/articles/TN_Image_Fusion_Dataset/1008029. 2014.
[27] B. K. Shreyamsha Kumar, “Image fusion based on pixel significance using cross bilateral filter,”
Signal, Image Video Process., 2015.
[28] B. K. Shreyamsha Kumar, “Multifocus and multispectral image fusion based on pixel significance
using discrete cosine harmonic wavelet transform,” Signal, Image Video Process., vol. 7, no. 6,
pp. 1125–1143, 2013.
[29] Q. Zhang, Y. Fu, H. Li, and J. Zou, “Dictionary learning method for joint sparse
representation-based image fusion,” Opt. Eng., vol. 52, no. 5, p. 057006, 2013.
[30] J. Ma, C. Chen, C. Li, and J. Huang, “Infrared and visible image fusion via gradient transfer and
total variation minimization,” Inf. Fusion, vol. 31, pp. 100–109, 2016.
[31] C. S. Xydeas and V. Petrovic, “Objective image fusion performance measure,” Electron. Lett.,
2000.
[32] M. Haghighat and M. A. Razian, “Fast-FMI: Non-reference image fusion metric,” in 8th IEEE
International Conference on Application of Information and Communication Technologies, AICT 2014 -
Conference Proceedings, 2014.
[33] K. Ma, K. Zeng, and Z. Wang, “Perceptual Quality Assessment for Multi-Exposure Image
Fusion,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 3345–3356, 2015.