


Supplemental Material

Yifan Wang1,2 Federico Perazzi2 Brian McWilliams2

Alexander Sorkine-Hornung2 Olga Sorkine-Hornung1 Christopher Schroers2

1ETH Zurich 2Disney Research

module    layers                  # out
v{0,1,2}  CONV(3,3)               64
d2        CONV(3,3)-RELU          64
          CONV(3,3)-RELU-AVGPOOL
d1        CONV(3,3)-RELU          64
          CONV(3,3)-RELU-AVGPOOL
d0        CONV(3,3)-RELU          128
          CONV(3,3)-RELU-AVGPOOL
          CONV(3,3)-RELU          256
          CONV(3,3)-RELU-AVGPOOL
          CONV(3,3)-RELU-AVGPOOL  512
          CONV(3,3)-RELU-AVGPOOL
          CONV(3,3)

Table 1: Detailed architecture specification for discriminator.

PSNR          S14    B100   U100   DIV2K
2× ProSRℓ     33.93  32.32  32.81  36.42
   ProGanSR   32.49  30.94  31.36  34.78
4× ProSRℓ     28.90  27.77  26.77  30.79
   ProGanSR   26.82  25.71  25.13  28.61
8× ProSRℓ     25.23  24.97  22.94  27.17
   ProGanSR   23.56  23.10  20.71  24.75

Table 2: PSNR values of ProSRℓ and ProGanSR.

1. Extended Evaluation

Comprehensive Quantitative Comparison. In addition to the PSNR comparison, we provide the SSIM values of the proposed ProSRℓ and other state-of-the-art approaches in Table 4. When evaluating ProGanSR in terms of PSNR, we observe a drop of up to 2 dB, as shown in Table 2, which aligns with the measures reported for other GAN-extended SISR methods [6, 8].
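As a point of reference for the numbers above, PSNR can be computed as in the following minimal numpy sketch (the exact evaluation protocol, e.g. color space and border cropping, follows the respective benchmarks and is not shown here):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images in [0, peak]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, a uniform error of 25.5 on an 8-bit image corresponds to 20 dB; a roughly 2 dB drop, as observed for ProGanSR, means the mean squared error grows by a factor of about 1.6.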

Visual Comparison. We present further results in Figures 1–4. Figures 1 and 2 show additional ProGanSR results and the hallucinated fine details; Figures 3 and 4 show more examples of ProSRℓ in comparison with other approaches.

2. Implementation Details

2.1. Network Specification

In Table 3 we list the detailed architectures of the final proposed ProSRℓ and ProSRs. In the very deep model ProSRℓ, local and pyramidal residual links are adopted to facilitate gradient flow; hence the compression unit of each DCU always compresses the features to a fixed number (e.g. 160) in order to enable the element-wise additions in the residual links. In ProSRs, by contrast, we use a simple sequential connection without residual links and set the compression rate of each DCU to 0.4. The growth rates used in ProSRℓ and ProSRs are 40 and 12 respectively. The numbers of parameters in ProSRℓ for upsampling ratios 2× to 8× are 9.5M, 13.4M and 15.5M, while in ProSRs these are 1.1M, 2.1M and 3.1M respectively.
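The channel bookkeeping implied by these two design choices can be sketched as follows. This is an illustrative helper, not the authors' code; in particular, the number of dense layers per DCU is a free parameter here and the example arguments are hypothetical.

```python
def dcu_out_channels(in_ch, growth_rate, num_layers,
                     compression=None, fixed_out=None):
    # A dense compression unit (DCU): num_layers dense layers each append
    # growth_rate channels, then a compression layer reduces the width.
    concat_ch = in_ch + num_layers * growth_rate
    if fixed_out is not None:
        # ProSR_l style: always compress to a fixed width (e.g. 160) so
        # that element-wise residual additions line up.
        return fixed_out
    # ProSR_s style: compress by a fixed rate (e.g. 0.4).
    return int(concat_ch * compression)
```

For instance, `dcu_out_channels(160, 40, 8, fixed_out=160)` returns 160 regardless of the dense-block width, while `dcu_out_channels(24, 12, 20, compression=0.4)` returns 105 (here 8 and 20 dense layers are made-up values for illustration).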

ProGanSR uses ProSRℓ as the generator; the discriminator architecture is specified in Table 1. Depending on the upsampling ratio, the input is downsampled by a factor of 64, 32 or 16. The final output of the discriminator is a 512-channel feature.
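Since each AVGPOOL stage in Table 1 halves the spatial resolution, the overall downsampling factor follows directly from the number of pooling stages the input passes through; the mapping from upsampling ratio to entry module is our reading of the table, so treat this helper as illustrative:

```python
def downsample_factor(num_avgpool_layers):
    # Each 2x average pooling halves height and width, so k pooling
    # stages shrink the input by a factor of 2**k overall.
    return 2 ** num_avgpool_layers
```

Passing through 6, 5 or 4 pooling stages yields exactly the factors 64, 32 and 16 quoted above.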

2.2. Training Details

We train our networks on the DIV2K dataset [1], which consists of 800 high-resolution (2K) training images. The training patch size is 48 × 48, 40 × 40 and 32 × 32 for upsampling ratios 2×, 4× and 8× respectively. Random cropping, flipping and transposition are applied as data augmentation. Additionally, we subtract the DIV2K dataset mean and rescale the images to the range [−127.5, 127.5] as in [7]. The Adam optimizer [3] is used with an initial learning rate of 0.0001; the 1st and 2nd momentum terms are set to 0.9 and 0.999 respectively. The learning rate is halved when the evaluation result has not improved for 40 epochs.
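The augmentation and normalization steps above can be sketched as follows. This is a numpy illustration under our assumptions, not the authors' pipeline; the actual DIV2K channel means are not reproduced here, and `channel_mean` is a hypothetical argument.

```python
import numpy as np

def augment(hr, patch=48, rng=np.random):
    # Random crop to patch x patch (48/40/32 depending on the scale).
    y = rng.randint(0, hr.shape[0] - patch + 1)
    x = rng.randint(0, hr.shape[1] - patch + 1)
    out = hr[y:y + patch, x:x + patch]
    # Random horizontal flip, vertical flip, and transpose.
    if rng.rand() < 0.5:
        out = out[:, ::-1]
    if rng.rand() < 0.5:
        out = out[::-1, :]
    if rng.rand() < 0.5:
        out = out.transpose(1, 0, 2)
    return out

def normalize(img, channel_mean):
    # Subtract the dataset mean; for 8-bit input this maps values into
    # roughly [-127.5, 127.5] when the mean is near mid-gray.
    return img.astype(np.float64) - channel_mean
```

A fixed `numpy.random.RandomState` can be passed as `rng` to make the cropping and flips reproducible across runs.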



ProSRℓ                                 ProSRs
module    layers        # out          module    layers        # out
v{0,1,2}  CONV(3,3)     160            v{0,1,2}  CONV(3,3)     24
u0        9 DCUs        160            u0        4 DCUs        138
          CONV(3,3)     160                      CONV(3,3)     138
          SP-CONV(3,3)  160                      SP-CONV(3,3)  138
u1        3 DCUs        160            u1        2 DCUs        142
          CONV(3,3)     160                      CONV(3,3)     142
          SP-CONV(3,3)  160                      SP-CONV(3,3)  142
u2        1 DCU         160            u2        1 DCU         143
          CONV(3,3)     160                      CONV(3,3)     143
          SP-CONV(3,3)  160                      SP-CONV(3,3)  143
r{0,1,2}  CONV(3,3)     3              r{0,1,2}  CONV(3,3)     3

Table 3: Detailed architecture specification for ProSRℓ and ProSRs. SP-CONV denotes sub-pixel convolution layer.

SSIM             2×                          4×                          8×
                 S14   B100  U100  DIV2K    S14   B100  U100  DIV2K    S14   B100  U100  DIV2K
VDSR             0.913 0.896 0.914 0.939    0.768 0.726 0.754 0.822    0.614 0.583 0.571 0.699
DRCN             0.913 0.894 0.913 -        0.768 0.724 0.752 -        0.614 0.582 0.571 0.694
DRRN             0.914 0.897 0.919 0.941    0.772 0.728 0.764 0.827    0.622 0.587 0.583 0.704
LapSRN           0.913 0.895 0.959 0.942    0.772 0.727 0.756 0.825    0.620 0.586 0.581 0.704
MsLapSRN         0.915 0.898 0.919 0.942    0.774 0.731 0.768 0.829    0.629 0.592 0.598 0.711
EnhanceNet       -     -     -     -        0.778 0.734 0.771 -        -     -     -     -
SRDenseNet       -     -     -     -        0.778 0.734 0.782 -        -     -     -     -
ProSRs (Ours)    0.916 0.898 0.921 0.943    0.782 0.736 0.783 0.836    0.641 0.598 0.616 0.721
EDSR             0.920 0.901 0.935 0.949    0.788 0.742 0.803 0.845    0.645 0.601 0.621 0.724
MDSR             0.920 0.901 0.935 0.948    0.786 0.742 0.804 0.845    -     -     -     -
ProSRℓ (Ours)    0.921 0.902 0.935 0.948    0.790 0.743 0.809 0.846    0.652 0.606 0.645 0.731

Table 4: SSIM evaluation compared with the state-of-the-art approaches.

References

[1] E. Agustsson and R. Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.

[2] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1646–1654, 2016.

[3] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.

[4] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang. Deep Laplacian pyramid networks for fast and accurate super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.

[5] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang. Fast and accurate image super-resolution with deep Laplacian pyramid networks. arXiv preprint arXiv:1710.01992, 2017.

[6] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802, 2016.

[7] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image super-resolution. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.

[8] M. S. M. Sajjadi, B. Schölkopf, and M. Hirsch. EnhanceNet: Single image super-resolution through automated texture synthesis. CoRR, abs/1612.07919, 2016.

[9] Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep recursive residual network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.


[Image panels: Input, ProSRℓ (Ours), ProGanSR (Ours)]

Figure 1: 8× ProGanSR results compared to ProSRℓ.


[Image panels per example: Input, [6], [8], ProGanSR (Ours), HR]

Figure 2: Comparison of 4× GAN results.


[Image panels per example: LR, the methods below, and HR; reported PSNR (dB)/SSIM per crop:]

            VDSR [2]      DRRN [9]      LapSRN [4]    MsLapSRN [5]  EDSR [7]      ProSRℓ (Ours)
Example 1   22.29/0.566   22.34/0.570   22.31/0.5685  22.36/0.574   22.65/0.590   22.81/0.5984
Example 2   22.94/0.6552  23.07/0.6632  23.01/0.6602  23.26/0.6758  23.99/0.7035  24.29/0.7152
Example 3   21.31/0.600   22.05/0.611   21.45/0.611   24.91/0.777   22.05/0.645   22.5/0.671
Example 4   19.30/0.572   19.27/0.573   19.22/0.567   19.41/0.580   19.83/0.612   21.02/0.689
Example 5   15.27/0.600   15.39/0.627   15.31/0.628   15.53/0.674   16.13/0.710   16.83/0.773

Figure 3: Comparison of 8× results between ProSRℓ (Ours) and other existing PSNR-driven models.


[Image panels per example: LR, the methods below, and HR; reported PSNR (dB)/SSIM per crop:]

            VDSR [2]      DRRN [9]      LapSRN [4]    MsLapSRN [5]  EDSR [7]      ProSRℓ (Ours)
Example 1   19.23/0.581   19.51/0.623   19.34/0.604   19.77/0.633   20.50/0.6873  21.40/0.722
Example 2   17.87/0.580   18.58/0.639   18.20/0.610   18.73/0.646   19.13/0.679   19.82/0.700
Example 3   21.94/0.778   22.65/0.808   22.41/0.799   22.95/0.821   24.14/0.861   24.83/0.873
Example 4   29.18/0.902   29.15/0.903   29.18/0.902   29.31/0.906   30.16/0.919   30.28/0.919
Example 5   25.51/0.794   25.63/0.805   25.61/0.800   25.91/0.810   26.76/0.836   27.05/0.840
Example 6   -             -             -             21.49/0.524   21.79/0.553   21.86/0.557

Figure 4: Comparison of 4× results between ProSRℓ (Ours) and other existing PSNR-driven models.