
Supplementary File: Image Super-Resolution Using Very Deep Residual Channel Attention Networks

Yulun Zhang1, Kunpeng Li1, Kai Li1, Lichen Wang1, Bineng Zhong1, and Yun Fu1,2

1 Department of ECE, Northeastern University, Boston, USA
2 College of Computer and Information Science, Northeastern University, Boston, USA

Abstract. In this supplementary file, we first provide convergence analyses of three very deep networks to validate the effectiveness of our proposed residual in residual structure. Then we compare with over 20 state-of-the-art super-resolution methods under bicubic (BI) and blur-down (BD) degradation models. For the BI model, we not only compare with generative models for 4× and 8× SR, but also show comparisons with SR methods based on generative adversarial networks. For the BD model, we provide more quantitative and visual results. For both BI and BD models, we compare state-of-the-art methods with and without self-ensemble to further show the improvements of our proposed method. All quantitative results are evaluated in terms of PSNR and SSIM on standard datasets.

1 Experimental Results

1.1 Convergence Analyses

We first conduct analyses on three very deep residual networks, each of which has 400 residual blocks (RBs). As shown in Figure 1, the green line (R400 SS) denotes the network that simply stacks 400 RBs. The blue line (R400 LSC) denotes the network with a long skip connection (LSC). This can be viewed as MDSR [1] with its RB number set to 400 and scaling factor set to 2. The red line (R400 RIR) denotes the network that uses our proposed residual in residual (RIR) structure.

All three networks are trained from scratch. To save training time, here we set the batch size to 16 and the input size to 32×32 (different from 48×48 in our main manuscript). From Figure 1, we mainly have three observations and corresponding explanations:

(1) Simply stacking many RBs to construct a very deep network is not suitable for image SR. The green line (R400 SS) starts at relatively low performance and its training process is not stable. Its performance after 200 training epochs is also relatively low. The main reason is that information flow becomes more difficult in deeper networks. Furthermore, spatial information is very important for image SR, yet the tail convolutional (Conv) layers in a very deep network extract mostly high-level features, which are not sufficient for image SR.

Page 2: Supplementary File: Image Super-Resolution Using Very Deep ...yulunzhang.com/papers/ECCV-2018-RCAN_supp.pdf · Supplementary File: Image Super-Resolution Using Very Deep Residual

2 Yulun Zhang et al.


Fig. 1. Convergence analyses on three very deep residual networks. All three networks have 400 residual blocks (RBs). The green line (R400 SS) denotes the network simply stacking 400 RBs. The blue line (R400 LSC) denotes the network with long skip connection (LSC). This can be viewed as MDSR [1], whose RB number is set as 400. The red line (R400 RIR) denotes the network that uses our proposed RIR structure. The curves are based on the PSNR values on Set5 (2×) in 200 epochs

(2) MDSR [1] with a very large number of RBs also hardly obtains benefits from very large network depth. The blue line (R400 LSC) also starts at relatively low performance, similar to that of R400 SS. The training process of R400 LSC is more stable than that of R400 SS, and R400 LSC reaches a higher PSNR value than R400 SS after 200 epochs. This difference mainly comes from introducing LSC, which not only helps information flow, but also forwards low-level features to the tail Conv layers.

(3) Very deep networks with the RIR structure achieve better performance and a more stable training process. The red line (R400 RIR) starts at relatively high performance and converges much faster than R400 SS and R400 LSC. After 200 training epochs, R400 RIR achieves the highest performance. These improvements mainly come from our proposed RIR structure, which encourages information flow and forwards features of different levels to the tail Conv layers.

In summary, the experiments above show that simply stacking RBs or only introducing LSC is not suitable for very deep networks. To construct a trainable very deep network and obtain benefits from very large network depth, our proposed RIR structure is a good choice. The following experiments further demonstrate the superior performance of our proposed residual channel attention network (RCAN) against other state-of-the-art methods.
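To make the three configurations concrete, below is a minimal PyTorch sketch of the skip-connection structures compared above. It is illustrative only: the module names, the feature width of 64, and the split of the 400 RBs into 20 residual groups of 20 blocks are our own assumptions for this sketch; channel attention inside each block, described in the main manuscript, is omitted here since this analysis isolates the skip connections.

```python
# Illustrative sketch (not the official RCAN code) of the three skip-connection
# structures compared in the convergence analysis. Channel attention is omitted.
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Plain residual block: Conv-ReLU-Conv with an identity shortcut."""
    def __init__(self, n_feats=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class SimpleStack(nn.Module):
    """R400_SS: residual blocks stacked directly, no long skip connection."""
    def __init__(self, n_blocks=400, n_feats=64):
        super().__init__()
        self.blocks = nn.Sequential(*[ResidualBlock(n_feats) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(n_feats, n_feats, 3, padding=1)

    def forward(self, x):
        return self.tail(self.blocks(x))

class StackLSC(nn.Module):
    """R400_LSC: the same stack plus one long skip connection (MDSR-like)."""
    def __init__(self, n_blocks=400, n_feats=64):
        super().__init__()
        self.blocks = nn.Sequential(*[ResidualBlock(n_feats) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(n_feats, n_feats, 3, padding=1)

    def forward(self, x):
        return x + self.tail(self.blocks(x))   # long skip connection

class ResidualGroup(nn.Module):
    """A group of residual blocks with a short skip connection around the group."""
    def __init__(self, n_blocks, n_feats=64):
        super().__init__()
        self.body = nn.Sequential(
            *[ResidualBlock(n_feats) for _ in range(n_blocks)],
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)                # short skip connection

class StackRIR(nn.Module):
    """R400_RIR: residual in residual = residual groups (short skips) + long skip."""
    def __init__(self, n_groups=20, n_blocks_per_group=20, n_feats=64):
        super().__init__()
        self.groups = nn.Sequential(
            *[ResidualGroup(n_blocks_per_group, n_feats) for _ in range(n_groups)]
        )
        self.tail = nn.Conv2d(n_feats, n_feats, 3, padding=1)

    def forward(self, x):
        return x + self.tail(self.groups(x))   # long skip connection
```

Under these assumptions, R400 SS, R400 LSC, and R400 RIR correspond to SimpleStack(400), StackLSC(400), and StackRIR(20, 20), respectively.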

Page 3: Supplementary File: Image Super-Resolution Using Very Deep ...yulunzhang.com/papers/ECCV-2018-RCAN_supp.pdf · Supplementary File: Image Super-Resolution Using Very Deep Residual

Supplementary File: Image Super-Resolution Using Very Deep RCAN 3

Table 1. Quantitative results with BI model. Self-ensemble is NOT used to further enhance the results. Best and second best results are highlighted and underlined

Method | Scale | Set5 PSNR/SSIM | Set14 PSNR/SSIM | B100 PSNR/SSIM | Urban100 PSNR/SSIM | Manga109 PSNR/SSIM

Bicubic | ×2 | 33.66/0.9299 | 30.24/0.8688 | 29.56/0.8431 | 26.88/0.8403 | 30.80/0.9339
SelfExSR [2] | ×2 | 36.50/0.9537 | 32.23/0.9036 | 31.18/0.8855 | 29.38/0.9032 | 35.82/0.9690
SRCNN [3] | ×2 | 36.66/0.9542 | 32.45/0.9067 | 31.36/0.8879 | 29.50/0.8946 | 35.60/0.9663
FSRCNN [4] | ×2 | 37.05/0.9560 | 32.66/0.9090 | 31.53/0.8920 | 29.88/0.9020 | 36.67/0.9710
SCN [5] | ×2 | 36.58/0.9540 | 32.36/0.9036 | 31.22/0.8833 | 29.47/0.8953 | 35.34/0.9654
VDSR [6] | ×2 | 37.53/0.9590 | 33.05/0.9130 | 31.90/0.8960 | 30.77/0.9140 | 37.22/0.9750
LapSRN [7] | ×2 | 37.52/0.9591 | 33.08/0.9130 | 31.08/0.8950 | 30.41/0.9101 | 37.27/0.9740
DRRN [8] | ×2 | 37.74/0.9590 | 33.23/0.9140 | 32.05/0.8970 | 31.23/0.9190 | 37.92/0.9760
MemNet [9] | ×2 | 37.78/0.9597 | 33.28/0.9142 | 32.08/0.8978 | 31.31/0.9195 | 37.72/0.9740
EDSR [1] | ×2 | 38.11/0.9602 | 33.92/0.9195 | 32.32/0.9013 | 32.93/0.9351 | 39.10/0.9773
MDSR [1] | ×2 | 38.11/0.9602 | 33.85/0.9198 | 32.29/0.9007 | 32.84/0.9347 | 38.96/0.9769
SRMDNF [10] | ×2 | 37.79/0.9601 | 33.32/0.9159 | 32.05/0.8985 | 31.33/0.9204 | 38.07/0.9761
D-DBPN [11] | ×2 | 38.09/0.9600 | 33.85/0.9190 | 32.27/0.9000 | 32.55/0.9324 | 38.89/0.9775
RDN [12] | ×2 | 38.24/0.9614 | 34.01/0.9212 | 32.34/0.9017 | 32.89/0.9353 | 39.18/0.9780
RCAN (ours) | ×2 | 38.27/0.9614 | 34.12/0.9216 | 32.41/0.9027 | 33.34/0.9384 | 39.44/0.9786

Bicubic | ×3 | 30.39/0.8682 | 27.55/0.7742 | 27.21/0.7385 | 24.46/0.7349 | 26.95/0.8556
SelfExSR [2] | ×3 | 32.62/0.9094 | 29.16/0.8197 | 28.30/0.7843 | 26.45/0.8100 | 27.57/0.8210
SRCNN [3] | ×3 | 32.75/0.9090 | 29.30/0.8215 | 28.41/0.7863 | 26.24/0.7989 | 30.48/0.9117
FSRCNN [4] | ×3 | 33.18/0.9140 | 29.37/0.8240 | 28.53/0.7910 | 26.43/0.8080 | 31.10/0.9210
SCN [5] | ×3 | 32.59/0.9080 | 29.20/0.8175 | 28.31/0.7809 | 26.19/0.7995 | 30.12/0.9110
VDSR [6] | ×3 | 33.67/0.9210 | 29.78/0.8320 | 28.83/0.7990 | 27.14/0.8290 | 32.01/0.9340
LapSRN [7] | ×3 | 33.82/0.9227 | 29.87/0.8320 | 28.82/0.7980 | 27.07/0.8280 | 32.21/0.9350
DRRN [8] | ×3 | 34.03/0.9240 | 29.96/0.8350 | 28.95/0.8000 | 27.53/0.7640 | 32.74/0.9390
MemNet [9] | ×3 | 34.09/0.9248 | 30.00/0.8350 | 28.96/0.8001 | 27.56/0.8376 | 32.51/0.9369
EDSR [1] | ×3 | 34.65/0.9280 | 30.52/0.8462 | 29.25/0.8093 | 28.80/0.8653 | 34.17/0.9476
MDSR [1] | ×3 | 34.66/0.9280 | 30.44/0.8452 | 29.25/0.8091 | 28.79/0.8655 | 34.17/0.9473
SRMDNF [10] | ×3 | 34.12/0.9254 | 30.04/0.8382 | 28.97/0.8025 | 27.57/0.8398 | 33.00/0.9403
RDN [12] | ×3 | 34.71/0.9296 | 30.57/0.8468 | 29.26/0.8093 | 28.80/0.8653 | 34.13/0.9484
RCAN (ours) | ×3 | 34.74/0.9299 | 30.65/0.8482 | 29.32/0.8111 | 29.09/0.8702 | 34.44/0.9499

Bicubic | ×4 | 28.42/0.8104 | 26.00/0.7027 | 25.96/0.6675 | 23.14/0.6577 | 24.89/0.7866
SelfExSR [2] | ×4 | 30.33/0.8623 | 27.40/0.7518 | 26.85/0.7108 | 24.82/0.7386 | 27.83/0.8660
SRCNN [3] | ×4 | 30.48/0.8628 | 27.50/0.7513 | 26.90/0.7101 | 24.52/0.7221 | 27.58/0.8555
FSRCNN [4] | ×4 | 30.72/0.8660 | 27.61/0.7550 | 26.98/0.7150 | 24.62/0.7280 | 27.90/0.8610
SCN [5] | ×4 | 30.39/0.8628 | 27.45/0.7501 | 26.86/0.7082 | 24.51/0.7239 | 27.32/0.8535
VDSR [6] | ×4 | 31.35/0.8830 | 28.02/0.7680 | 27.29/0.0726 | 25.18/0.7540 | 28.83/0.8870
LapSRN [7] | ×4 | 31.54/0.8850 | 28.19/0.7720 | 27.32/0.7270 | 25.21/0.7560 | 29.09/0.8900
DRRN [8] | ×4 | 31.68/0.8880 | 28.21/0.7720 | 27.38/0.7280 | 25.44/0.7640 | 29.46/0.8960
SRResNet [13] | ×4 | 32.05/0.9019 | 28.49/0.8184 | 27.58/0.7620 | 26.07/0.7839 | N/A
SRGAN [13] | ×4 | 29.40/0.8472 | 26.02/0.7397 | 25.16/0.6688 | N/A | N/A
MemNet [9] | ×4 | 31.74/0.8893 | 28.26/0.7723 | 27.40/0.7281 | 25.50/0.7630 | 29.42/0.8942
SRDenseNet [14] | ×4 | 32.02/0.8930 | 28.50/0.7780 | 27.53/0.7337 | 26.05/0.7819 | N/A
ENet-E [15] | ×4 | 31.74/0.8869 | 28.42/0.7774 | 27.50/0.7326 | 25.66/0.7703 | N/A
ENet-PAT [15] | ×4 | 28.56/0.8082 | 25.77/0.6784 | 24.93/0.6270 | 23.54/0.6936 | N/A
EDSR [1] | ×4 | 32.46/0.8968 | 28.80/0.7876 | 27.71/0.7420 | 26.64/0.8033 | 31.02/0.9148
MDSR [1] | ×4 | 32.50/0.8973 | 28.72/0.7857 | 27.72/0.7418 | 26.67/0.8041 | 31.11/0.9148
SRMDNF [10] | ×4 | 31.96/0.8925 | 28.35/0.7787 | 27.49/0.7337 | 25.68/0.7731 | 30.09/0.9024
D-DBPN [11] | ×4 | 32.47/0.8980 | 28.82/0.7860 | 27.72/0.7400 | 26.38/0.7946 | 30.91/0.9137
RDN [12] | ×4 | 32.47/0.8990 | 28.81/0.7871 | 27.72/0.7419 | 26.61/0.8028 | 31.00/0.9151
RCAN (ours) | ×4 | 32.63/0.9002 | 28.87/0.7889 | 27.77/0.7436 | 26.82/0.8087 | 31.22/0.9173

Bicubic | ×8 | 24.40/0.6580 | 23.10/0.5660 | 23.67/0.5480 | 20.74/0.5160 | 21.47/0.6500
SelfExSR [2] | ×8 | 25.49/0.7030 | 23.92/0.6010 | 24.19/0.5680 | 21.81/0.5770 | 22.99/0.7190
SRCNN [3] | ×8 | 25.33/0.6900 | 23.76/0.5910 | 24.13/0.5660 | 21.29/0.5440 | 22.46/0.6950
FSRCNN [4] | ×8 | 20.13/0.5520 | 19.75/0.4820 | 24.21/0.5680 | 21.32/0.5380 | 22.39/0.6730
SCN [5] | ×8 | 25.59/0.7071 | 24.02/0.6028 | 24.30/0.5698 | 21.52/0.5571 | 22.68/0.6963
VDSR [6] | ×8 | 25.93/0.7240 | 24.26/0.6140 | 24.49/0.5830 | 21.70/0.5710 | 23.16/0.7250
LapSRN [7] | ×8 | 26.15/0.7380 | 24.35/0.6200 | 24.54/0.5860 | 21.81/0.5810 | 23.39/0.7350
MemNet [9] | ×8 | 26.16/0.7414 | 24.38/0.6199 | 24.58/0.5842 | 21.89/0.5825 | 23.56/0.7387
MSLapSRN [16] | ×8 | 26.34/0.7558 | 24.57/0.6273 | 24.65/0.5895 | 22.06/0.5963 | 23.90/0.7564
EDSR [1] | ×8 | 26.96/0.7762 | 24.91/0.6420 | 24.81/0.5985 | 22.51/0.6221 | 24.69/0.7841
MDSR [1] | ×8 | 26.97/0.7764 | 24.90/0.6416 | 24.76/0.5972 | 22.39/0.6175 | 24.55/0.7822
D-DBPN [11] | ×8 | 27.21/0.7840 | 25.13/0.6480 | 24.88/0.6010 | 22.73/0.6312 | 25.14/0.7987
RCAN (ours) | ×8 | 27.31/0.7878 | 25.23/0.6511 | 24.98/0.6058 | 23.00/0.6452 | 25.24/0.8029


Table 2. Quantitative results with BI model. Self-ensemble is used to further enhance results except for RCAN. Best and second best results are highlighted and underlined

Method | Scale | Set5 PSNR/SSIM | Set14 PSNR/SSIM | B100 PSNR/SSIM | Urban100 PSNR/SSIM | Manga109 PSNR/SSIM

Bicubic | ×2 | 33.66/0.9299 | 30.24/0.8688 | 29.56/0.8431 | 26.88/0.8403 | 30.80/0.9339
EDSR+ [1] | ×2 | 38.20/0.9606 | 34.02/0.9204 | 32.37/0.9018 | 33.10/0.9363 | 39.28/0.9776
MDSR+ [1] | ×2 | 38.17/0.9605 | 33.92/0.9203 | 32.34/0.9014 | 33.03/0.9362 | 39.16/0.9774
RDN+ [12] | ×2 | 38.30/0.9616 | 34.10/0.9218 | 32.40/0.9022 | 33.09/0.9368 | 39.38/0.9784
RCAN (ours) | ×2 | 38.27/0.9614 | 34.12/0.9216 | 32.41/0.9027 | 33.34/0.9384 | 39.44/0.9786
RCAN+ (ours) | ×2 | 38.33/0.9617 | 34.23/0.9225 | 32.46/0.9031 | 33.54/0.9399 | 39.61/0.9788

Bicubic | ×3 | 30.39/0.8682 | 27.55/0.7742 | 27.21/0.7385 | 24.46/0.7349 | 26.95/0.8556
EDSR+ [1] | ×3 | 34.76/0.9290 | 30.66/0.8481 | 29.32/0.8104 | 29.02/0.8685 | 34.52/0.9493
MDSR+ [1] | ×3 | 34.77/0.9288 | 30.53/0.8465 | 29.30/0.8101 | 28.99/0.8683 | 34.43/0.9486
RDN+ [12] | ×3 | 34.78/0.9300 | 30.67/0.8482 | 29.33/0.8105 | 29.00/0.8683 | 34.43/0.9498
RCAN (ours) | ×3 | 34.74/0.9299 | 30.65/0.8482 | 29.32/0.8111 | 29.09/0.8702 | 34.44/0.9499
RCAN+ (ours) | ×3 | 34.85/0.9305 | 30.76/0.8494 | 29.39/0.8122 | 29.31/0.8736 | 34.76/0.9513

Bicubic | ×4 | 28.42/0.8104 | 26.00/0.7027 | 25.96/0.6675 | 23.14/0.6577 | 24.89/0.7866
EDSR+ [1] | ×4 | 32.62/0.8984 | 28.94/0.7901 | 27.79/0.7437 | 26.86/0.8080 | 31.43/0.9182
MDSR+ [1] | ×4 | 32.61/0.8982 | 28.82/0.7876 | 27.78/0.7425 | 26.86/0.8082 | 31.42/0.9175
RDN+ [12] | ×4 | 32.61/0.9003 | 28.92/0.7893 | 27.80/0.7434 | 26.82/0.8069 | 31.39/0.9184
RCAN (ours) | ×4 | 32.63/0.9002 | 28.87/0.7889 | 27.77/0.7436 | 26.82/0.8087 | 31.22/0.9173
RCAN+ (ours) | ×4 | 32.73/0.9013 | 28.98/0.7910 | 27.85/0.7455 | 27.10/0.8142 | 31.65/0.9208

Bicubic | ×8 | 24.40/0.6580 | 23.10/0.5660 | 23.67/0.5480 | 20.74/0.5160 | 21.47/0.6500
EDSR+ [1] | ×8 | 27.19/0.7835 | 25.12/0.6467 | 24.90/0.6011 | 22.71/0.6284 | 24.97/0.7915
MDSR+ [1] | ×8 | 27.20/0.7847 | 25.08/0.6461 | 24.89/0.6007 | 22.71/0.6281 | 24.94/0.7910
RCAN (ours) | ×8 | 27.31/0.7878 | 25.23/0.6511 | 24.98/0.6058 | 23.00/0.6452 | 25.24/0.8029
RCAN+ (ours) | ×8 | 27.47/0.7913 | 25.40/0.6553 | 25.05/0.6077 | 23.22/0.6524 | 25.58/0.8092

1.2 Results with Bicubic (BI) Degradation Model

We compare our method with 22 state-of-the-art methods: SelfExSR [2], SRCNN [3], FSRCNN [4], SCN [5], VDSR [6], LapSRN [7], DRRN [8], SRResNet [13], SRGAN [13], MemNet [9], SRDenseNet [14], ENet-E [15], ENet-PAT [15], MSLapSRN [16], EDSR [1], MDSR [1], SRMDNF [10], D-DBPN [11], RDN [12], EDSR+ [1], MDSR+ [1], and RDN+ [12]. Similar to [1, 12, 17], we also adopt the self-ensemble strategy to further improve our RCAN and denote the self-ensembled version as RCAN+. For fair comparison, we compare our RCAN with other methods without self-ensemble, and RCAN+ with other self-ensembled methods.
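For readers unfamiliar with the strategy, the sketch below shows the usual geometric self-ensemble behind the "+" variants [1, 17]: the LR input is transformed by the eight combinations of horizontal flip and 90° rotations, each transformed input is super-resolved, the outputs are mapped back, and the eight predictions are averaged. This is our own illustrative implementation; `model` stands for any single-image SR network such as RCAN.

```python
# Illustrative sketch of geometric self-ensemble (the "+" variants such as RCAN+),
# following the strategy popularized by EDSR [1]. Assumes a PyTorch SR model that
# maps an LR tensor of shape (N, C, H, W) to an SR tensor.
import torch

def self_ensemble(model, lr):
    outputs = []
    for hflip in (False, True):
        for rot in range(4):                       # 0, 90, 180, 270 degrees
            x = lr.flip(-1) if hflip else lr
            x = torch.rot90(x, rot, dims=(-2, -1))
            with torch.no_grad():
                y = model(x)
            # undo the transform on the SR output (inverse rotation, then inverse flip)
            y = torch.rot90(y, -rot, dims=(-2, -1))
            if hflip:
                y = y.flip(-1)
            outputs.append(y)
    return torch.stack(outputs, 0).mean(0)         # average the 8 predictions
```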

Quantitative results without self-ensemble. Table 1 shows quantitative comparisons for 2×, 3×, 4×, and 8× SR. Compared with all previous methods, our RCAN performs the best on all the standard datasets for all scaling factors. It should also be noted that SRMDNF [10] and D-DBPN [11] not only use the 800 DIV2K training images as we do, but also introduce a very large number of extra images for training. Even though we use far fewer training images (i.e., only the 800 DIV2K images), our RCAN still outperforms these leading methods. One reason is that the very deep network of our RCAN has stronger representational ability.
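As a note on evaluation, PSNR and SSIM for SR benchmarks such as Table 1 are conventionally computed on the Y channel of the YCbCr color space after shaving a border of `scale` pixels; the snippet below is a minimal sketch of Y-channel PSNR under that common convention. Our exact evaluation protocol follows the main manuscript, so treat this only as an assumed illustration.

```python
# Minimal sketch of Y-channel PSNR as commonly used in SR benchmarks.
# Assumes 8-bit RGB numpy arrays of identical shape; border shaving width = scale.
import numpy as np

def rgb_to_y(img):
    """ITU-R BT.601 luma, the conversion used by most SR benchmarks."""
    img = img.astype(np.float64)
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]) / 255.0

def psnr_y(sr, hr, scale):
    sr_y = rgb_to_y(sr)[scale:-scale, scale:-scale]
    hr_y = rgb_to_y(hr)[scale:-scale, scale:-scale]
    mse = np.mean((sr_y - hr_y) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```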

Quantitative results with self-ensemble. Table 2 shows quantitative comparisons using self-ensemble [1, 12, 17]. We also include RCAN in the table to highlight its performance. RCAN+ performs the best on all datasets and for all scaling factors. Even without self-ensemble, our RCAN still ranks second best in several cases, where it outperforms


other self-ensembled methods (e.g., EDSR+, MDSR+, and RDN+). These comparisons demonstrate the effectiveness of our proposed RCAN and RCAN+.

Visual comparisons with generative network based methods. In Figure 2, we show visual comparisons with generative network based methods for 4× SR. Most of the previous leading SR methods suffer from heavy blurring artifacts and cannot recover some details. For example, in images "img 044" and "img 067", all the compared methods suffer from heavy blurring artifacts, while our proposed RCAN alleviates the blurring artifacts to some degree and recovers more details.

In some other cases (e.g., images "img 012", "img 092", and "img 093"), all the compared methods generate wrong details, such as lines with wrong directions. In contrast, our proposed RCAN recovers lines with the right directions.

In addition to the edges and lines above, textures can be even more difficult for SR methods to recover. For example, in image "img 076", all the compared methods generate disordered textures with some blurring artifacts, failing to recover the textures of the bricks. However, our proposed RCAN reconstructs finer textures that are more faithful to the ground truth.

These comparisons show that previous leading generative network based methods may suffer from heavy blurring artifacts and cannot recover some details, indicating that they have limited ability to handle more challenging cases. In contrast, our proposed RCAN alleviates the blurring artifacts to a great degree and recovers more details. These superior visual results are not only consistent with the quantitative results in Table 1, but also demonstrate the better representational ability of our RCAN.

Visual comparisons with GAN-based methods. Recently, with the use of generative adversarial networks (GANs), some image SR methods have been proposed, such as SRGAN [13] and ENet [15]. They claim that their methods can enhance the visual results, even though the quantitative results are relatively low. In Figure 3, we compare our RCAN with GAN-based methods visually.

Specifically, we compare 4× SR results with SRResNet [13], SRResNet VGG22 [13], SRGAN MSE [13], SRGAN VGG22 [13], SRGAN VGG54 [13], ENet E [15], and ENet PAT [15]. SRResNet, SRResNet VGG22, and ENet E only use generative networks, and we also include their visual results in the comparisons. The remaining methods use both generative and adversarial networks with different content losses. Readers can refer to [13, 15] for more details. All the results are released by their authors.

As shown in Figure 3, SRResNet and ENet E suffer from heavy blurring artifacts, similar to those produced by generative network based methods (see Figure 2). SRResNet VGG22 suffers from checkerboard artifacts. GAN-based methods (e.g., SRGAN MSE, SRGAN VGG22, SRGAN VGG54, and ENet PAT) produce unpleasing artifacts. For example, in image "barbara", all the compared methods fail to recover the book edges, whereas our proposed RCAN recovers much better results. In image "253027", all the compared methods generate heavy artifacts or wrong details for the horse's mouth.


In contrast, our proposed RCAN recovers the correct texture of the horse's mouth. Similar observations can be found in other images.

These comparisons show that GAN-based methods do not perform well quantitatively and usually produce unpleasing artifacts. One main reason is that adversarial training is hard to control. Based on these observations and analyses, we find that SR results obtained with adversarial training may not be faithful to the ground truth. This conclusion is consistent with the observation in LapSRN [7]. These comparisons further demonstrate the effectiveness of our proposed RCAN with its more powerful representational ability.

Visual comparisons for 8× SR. In Figure 4, we further provide visual comparisons for 8× SR, a more challenging case in image SR. When the scaling factor is very large, most compared methods suffer from heavy blurring artifacts and cannot recover details (e.g., see images "img 008", "img 014", "img 023", and "Donburakokko"). Even in such challenging cases, our proposed RCAN recovers more details than all the compared methods. In images "AkkeraKanjinchou" and "ShimatteIkouze vol01", our RCAN recovers much better results with clearer edges and other details. These comparisons demonstrate that our RCAN can obtain more useful information and produce finer results.

1.3 Results with Blur-downscale (BD) Degradation Model

We also show more results for 3× SR with the BD degradation model.

Quantitative results. As shown in Table 3, our RCAN+ achieves the best performance on each dataset. Even without self-ensemble, our RCAN obtains the second best results in most cases.

Visual results. We further show visual results in Figure 5. As we can see, our proposed RCAN recovers more details than the other compared methods. This demonstrates the effectiveness of our RCAN with the BD degradation model.
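For reference, the sketch below shows one common way to synthesize blur-downscale (BD) low-resolution inputs for 3× SR: blur the HR image with a 7×7 Gaussian kernel of standard deviation 1.6 and then downsample by the scaling factor. The kernel setting follows the BD convention used by SRMD [10] and RDN [12]; the exact degradation used in our experiments is specified in the main manuscript, so treat this only as an assumed illustration.

```python
# Illustrative sketch of a common BD (blur-downscale) degradation for x3 SR:
# 7x7 Gaussian blur (sigma = 1.6) followed by downsampling. Assumed settings,
# not necessarily identical to the exact degradation used in our experiments.
import numpy as np
from scipy.ndimage import gaussian_filter

def bd_degrade(hr, scale=3, sigma=1.6):
    """Blur an HR image of shape (H, W, C), then downsample by `scale`."""
    # truncate=2.0 with sigma=1.6 gives a radius of 3, i.e., a 7x7 kernel.
    blurred = np.stack(
        [gaussian_filter(hr[..., c], sigma=sigma, truncate=2.0) for c in range(hr.shape[-1])],
        axis=-1,
    )
    return blurred[::scale, ::scale, :]  # direct subsampling; bicubic downscaling is also used in practice
```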

Table 3. Quantitative results with BD degradation model. Best and second best results are highlighted and underlined

Method | Scale | Set5 PSNR/SSIM | Set14 PSNR/SSIM | B100 PSNR/SSIM | Urban100 PSNR/SSIM | Manga109 PSNR/SSIM

Bicubic | ×3 | 28.78/0.8308 | 26.38/0.7271 | 26.33/0.6918 | 23.52/0.6862 | 25.46/0.8149
SPMSR [18] | ×3 | 32.21/0.9001 | 28.89/0.8105 | 28.13/0.7740 | 25.84/0.7856 | 29.64/0.9003
SRCNN [3] | ×3 | 32.05/0.8944 | 28.80/0.8074 | 28.13/0.7736 | 25.70/0.7770 | 29.47/0.8924
FSRCNN [4] | ×3 | 26.23/0.8124 | 24.44/0.7106 | 24.86/0.6832 | 22.04/0.6745 | 23.04/0.7927
VDSR [6] | ×3 | 33.25/0.9150 | 29.46/0.8244 | 28.57/0.7893 | 26.61/0.8136 | 31.06/0.9234
IRCNN [19] | ×3 | 33.38/0.9182 | 29.63/0.8281 | 28.65/0.7922 | 26.77/0.8154 | 31.15/0.9245
SRMDNF [10] | ×3 | 34.01/0.9242 | 30.11/0.8364 | 28.98/0.8009 | 27.50/0.8370 | 32.97/0.9391
RDN [12] | ×3 | 34.58/0.9280 | 30.53/0.8447 | 29.23/0.8079 | 28.46/0.8582 | 33.97/0.9465
RDN+ [12] | ×3 | 34.70/0.9289 | 30.64/0.8463 | 29.30/0.8093 | 28.67/0.8612 | 34.34/0.9483
RCAN (ours) | ×3 | 34.70/0.9288 | 30.63/0.8462 | 29.32/0.8093 | 28.81/0.8647 | 34.38/0.9483
RCAN+ (ours) | ×3 | 34.83/0.9296 | 30.76/0.8479 | 29.39/0.8106 | 29.04/0.8682 | 34.76/0.9502


[Figure 2 image grids: HR crops and 4× SR results for Urban100 images "img 012", "img 044", "img 067", "img 076", "img 092", and "img 093", produced by Bicubic, SRCNN [3], FSRCNN [4], VDSR [6], LapSRN [7], MemNet [9], EDSR [1], SRMDNF [10], and RCAN, each annotated with its PSNR/SSIM.]

Fig. 2. Visual comparison for 4× SR with BI model on Urban100 dataset. The best results are highlighted. These comparisons mainly show the effectiveness of our proposed RCAN against generative network based methods


[Figure 3 image grids: HR crops and 4× SR results for Set14 images "barbara", "comic", and "ppt3" and B100 images "78004", "223061", and "253027", produced by Bicubic, SRResNet, SRResNet VGG22, SRGAN MSE, SRGAN VGG22, SRGAN VGG54, ENet E, ENet PAT, and RCAN, each annotated with its PSNR/SSIM.]

Fig. 3. Visual comparison for 4× SR with BI model on Set14 and B100 datasets. The best results are highlighted. SRResNet, SRResNet VGG22, SRGAN MSE, SRGAN VGG22, and SRGAN VGG54 are proposed in [13]; ENet E and ENet PAT are proposed in [15]. These comparisons mainly show the effectiveness of our proposed RCAN against GAN based methods


[Figure 4 image grids: HR crops and 8× SR results for Urban100 images "img 008", "img 014", and "img 023" and Manga109 images "AkkeraKanjinchou", "Donburakokko", and "ShimatteIkouze vol01", produced by Bicubic, SCN [5], SRCNN [3], VDSR [6], LapSRN [7], MemNet [9], MSLapSRN [16], EDSR [1], and RCAN, each annotated with its PSNR/SSIM.]

Fig. 4. Visual comparison for 8× SR with BI model on Urban100 and Manga109 datasets. The best results are highlighted. These comparisons mainly show the effectiveness of our proposed RCAN against generative network based methods for very large scaling factor (e.g., 8)


[Figure 5 image grids: HR crops and 3× SR results for Urban100 images "img 015", "img 033", "img 046", "img 047", "img 059", and "img 098", produced by Bicubic, SPMSR [18], SRCNN [3], FSRCNN [4], VDSR [6], IRCNN [19], SRMDNF [10], RDN [12], and RCAN, each annotated with its PSNR/SSIM.]

Fig. 5. Visual comparison for 3× SR with BD model on Urban100 dataset. The best results are highlighted. These comparisons mainly show the effectiveness of our proposed RCAN against generative network based methods with BD model


References

1. Lim, B., Son, S., Kim, H., Nah, S., Lee, K.M.: Enhanced deep residual networks for single image super-resolution. In: CVPRW. (2017)

2. Huang, J.B., Singh, A., Ahuja, N.: Single image super-resolution from transformed self-exemplars. In: CVPR. (2015)

3. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. TPAMI (2016)

4. Dong, C., Loy, C.C., Tang, X.: Accelerating the super-resolution convolutional neural network. In: ECCV. (2016)

5. Wang, Z., Liu, D., Yang, J., Han, W., Huang, T.: Deep networks for image super-resolution with sparse prior. In: ICCV. (2015)

6. Kim, J., Kwon Lee, J., Mu Lee, K.: Accurate image super-resolution using very deep convolutional networks. In: CVPR. (2016)

7. Lai, W.S., Huang, J.B., Ahuja, N., Yang, M.H.: Deep Laplacian pyramid networks for fast and accurate super-resolution. In: CVPR. (2017)

8. Tai, Y., Yang, J., Liu, X.: Image super-resolution via deep recursive residual network. In: CVPR. (2017)

9. Tai, Y., Yang, J., Liu, X., Xu, C.: MemNet: A persistent memory network for image restoration. In: ICCV. (2017)

10. Zhang, K., Zuo, W., Zhang, L.: Learning a single convolutional super-resolution network for multiple degradations. In: CVPR. (2018)

11. Haris, M., Shakhnarovich, G., Ukita, N.: Deep back-projection networks for super-resolution. In: CVPR. (2018)

12. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image super-resolution. In: CVPR. (2018)

13. Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., Shi, W.: Photo-realistic single image super-resolution using a generative adversarial network. In: CVPR. (2017)

14. Tong, T., Li, G., Liu, X., Gao, Q.: Image super-resolution using dense skip connections. In: ICCV. (2017)

15. Sajjadi, M.S., Scholkopf, B., Hirsch, M.: EnhanceNet: Single image super-resolution through automated texture synthesis. In: ICCV. (2017)

16. Lai, W.S., Huang, J.B., Ahuja, N., Yang, M.H.: Fast and accurate image super-resolution with deep Laplacian pyramid networks. arXiv:1710.01992 (2017)

17. Timofte, R., Rothe, R., Van Gool, L.: Seven ways to improve example-based single image super resolution. In: CVPR. (2016)

18. Peleg, T., Elad, M.: A statistical prediction model based on sparse representations for single image super-resolution. TIP (2014)

19. Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep CNN denoiser prior for image restoration. In: CVPR. (2017)