CS-F441: Selected Topics from Computer Science (Deep Learning for NLP & CV) Lecture-KT-13: GoogLeNet, ResNet Dr. Kamlesh Tiwari, Assistant Professor, Department of Computer Science and Information Systems, BITS Pilani, Rajasthan-333031 INDIA Nov 13, 2019 (Campus @ BITS-Pilani July-Dec 2019)


CS-F441: SELECTED TOPICS FROM COMPUTER

SCIENCE (DEEP LEARNING FOR NLP & CV)

Lecture-KT-13: GoogLeNet, ResNet

Dr. Kamlesh Tiwari, Assistant Professor,

Department of Computer Science and Information Systems, BITS Pilani, Rajasthan-333031 INDIA

Nov 13, 2019 (Campus @ BITS-Pilani July-Dec 2019)


Recap: ImageNet ILSVRC ¹

  • ImageNet (2009): 22K categories, 14M images
  • Challenge: 1000 classes, 1,431,167 images
  • Earlier approaches: HoG, LBP, SVM, ...

¹ ImageNet Large Scale Visual Recognition Challenge, http://www.image-net.org/challenges/LSVRC/

STCS-DL4NLP&CV (CS-F441) Campus @ BITS-Pilani Lecture-KT-13 (Nov 13, 2019) 2 / 13


GoogLeNet ²

  • Recall scale invariance in SIFT
  • Using multiple filters of different sizes is a good idea
  • With a W × H × D input, an F × F × D filter, stride S = 1 and no padding, the output is of size (W − F + 1) × (H − F + 1)
  • Each output value needs F × F × D multiplications
  • Can we reduce this computation a bit?
  • The idea is to use 1 × 1 convolutions
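The output-size and cost statements above can be checked with a few lines of Python. This is only a sketch: the function names and the 28 × 28 × 192 example input are illustrative assumptions, not part of the lecture.

```python
def conv_output_size(w, h, f, stride=1, pad=0):
    """Spatial size of the output of an f x f filter on a w x h input."""
    out_w = (w - f + 2 * pad) // stride + 1
    out_h = (h - f + 2 * pad) // stride + 1
    return out_w, out_h


def conv_multiplications(w, h, d, f, num_filters=1):
    """Multiplications for stride 1 and no padding.

    Each output value costs f * f * d multiplications, and there are
    (w - f + 1) * (h - f + 1) output values per filter.
    """
    out_w, out_h = conv_output_size(w, h, f)
    return out_w * out_h * f * f * d * num_filters


# Hypothetical example: a 28 x 28 x 192 input and a single 5 x 5 x 192 filter.
print(conv_output_size(28, 28, 5))           # (24, 24)
print(conv_multiplications(28, 28, 192, 5))  # 2764800
```

The cost grows with F × F × D, which is why shrinking the depth D before applying a large filter pays off.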

² Cite: 17022, Going deeper with convolutions, Szegedy, Christian and Liu, Wei and Jia, Yangqing and Sermanet, Pierre and Reed, Scott and Anguelov, Dragomir and Erhan, Dumitru and Vanhoucke, Vincent and Rabinovich, Andrew, In: Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, CVPR-2015


1 × 1 convolution

  • A 1 × 1 filter is really 1 × 1 × D
  • Each filter produces one output value per spatial position
  • By using D1 such 1 × 1 filters, the output becomes W × H × D1
  • We choose D1 < D, so the depth is reduced
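A minimal NumPy sketch of this idea: a 1 × 1 convolution is a dot product over the depth axis at every spatial position. The helper names, the channel-last layout, and the example sizes (192 channels reduced to 16 before 32 filters of size 5 × 5) are assumptions made for illustration. The cost functions also assume padding that keeps the W × H spatial size.

```python
import numpy as np


def conv1x1(x, weights):
    """x: (W, H, D) input; weights: (D, D1), one column per 1 x 1 x D filter."""
    # Each spatial position holds a D-vector; every filter takes one dot product with it.
    return np.tensordot(x, weights, axes=([2], [0]))


def mults_direct(w, h, d, f, n):
    """n filters of size f x f x d applied directly (spatial size preserved)."""
    return w * h * f * f * d * n


def mults_with_reduction(w, h, d, d1, f, n):
    """First d1 filters of size 1 x 1 x d, then n filters of size f x f x d1."""
    return w * h * d * d1 + w * h * f * f * d1 * n


rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28, 192))
w = rng.standard_normal((192, 16))
y = conv1x1(x, w)
print(y.shape)  # (28, 28, 16)

print(mults_direct(28, 28, 192, 5, 32))
print(mults_with_reduction(28, 28, 192, 16, 5, 32))
```

With D1 < D, the second count is much smaller than the first, which is the saving the slide is after.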


Inception Block: Multiple convolutions

1. 1 × 1 convolution
2. 1 × 1 convolution followed by 3 × 3
3. 1 × 1 convolution followed by 5 × 5
4. 3 × 3 maxpool followed by 1 × 1
5. Appropriate padding is done to make all outputs the same size
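The effect of the padding can be sketched as follows. The sketch assumes stride 1, a padding of (F − 1) / 2 for each odd filter size F, and, as in the cited GoogLeNet paper, that the four branch outputs are concatenated along the depth axis. The function names are hypothetical.

```python
def same_padding(f):
    """Padding that keeps the spatial size for an odd filter size f at stride 1."""
    return (f - 1) // 2


def out_size(n, f, pad, stride=1):
    return (n - f + 2 * pad) // stride + 1


def inception_output_shape(w, h, n1x1, n3x3, n5x5, n_pool_proj):
    """Output shape (W, H, depth) of one inception block."""
    # Window sizes of the four branches: 1x1 conv, 3x3 conv, 5x5 conv, 3x3 max-pool.
    branch_windows = [1, 3, 5, 3]
    sizes = {
        (out_size(w, f, same_padding(f)), out_size(h, f, same_padding(f)))
        for f in branch_windows
    }
    # With this padding every branch keeps the same spatial size.
    assert len(sizes) == 1
    out_w, out_h = sizes.pop()
    # Depth-wise concatenation: the depths of the four branches add up.
    return out_w, out_h, n1x1 + n3x3 + n5x5 + n_pool_proj


# Example with 64, 128, 32 and 32 output filters on a 28 x 28 input.
print(inception_output_shape(28, 28, 64, 128, 32, 32))  # (28, 28, 256)
```

The 1 × 1 reductions inside branches 2 and 3 change only the intermediate depth, so they do not appear in the output shape.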


GoogLeNet

1. Input is RGB 224 × 224
2. Each inception module has a very specific configuration.

Module  Input (D × H × W)  #1×1  #3×3 reduce  #3×3  #5×5 reduce  #5×5  pool proj
(3a)    192 × 28 × 28        64           96   128           16    32         32
(3b)    256 × 28 × 28       128          128   192           32    96         64
(4a)    480 × 14 × 14       192           96   208           16    48         64
(4b)    512 × 14 × 14       160          112   224           24    64         64
(4c)    512 × 14 × 14       128          128   256           24    64         64
(4d)    512 × 14 × 14       112          144   288           32    64         64
(4e)    528 × 14 × 14       256          160   320           32   128        128
(5a)    832 × 7 × 7         256          160   320           32   128        128
(5b)    832 × 7 × 7         384          192   384           48   128        128
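A quick consistency check on the module configuration, using the filter counts from Table 1 of the cited GoogLeNet paper (Szegedy et al.). It assumes that each module's output depth is the sum of its 1 × 1, 3 × 3, 5 × 5 and pool-projection filters, and that the max-pooling between stages changes only the spatial size. The tuple layout is an illustrative choice.

```python
# (name, input depth, #1x1, #3x3 reduce, #3x3, #5x5 reduce, #5x5, pool proj)
MODULES = [
    ("3a", 192, 64, 96, 128, 16, 32, 32),
    ("3b", 256, 128, 128, 192, 32, 96, 64),
    ("4a", 480, 192, 96, 208, 16, 48, 64),
    ("4b", 512, 160, 112, 224, 24, 64, 64),
    ("4c", 512, 128, 128, 256, 24, 64, 64),
    ("4d", 512, 112, 144, 288, 32, 64, 64),
    ("4e", 528, 256, 160, 320, 32, 128, 128),
    ("5a", 832, 256, 160, 320, 32, 128, 128),
    ("5b", 832, 384, 192, 384, 48, 128, 128),
]


def output_depth(module):
    """Depth after concatenating the four branches (reduce layers are internal)."""
    _, _, n1x1, _, n3x3, _, n5x5, pool_proj = module
    return n1x1 + n3x3 + n5x5 + pool_proj


# The output depth of each module must equal the input depth of the next one.
for current, following in zip(MODULES, MODULES[1:]):
    assert output_depth(current) == following[1], (current[0], following[0])

print(output_depth(MODULES[-1]))  # 1024
```

The final depth of 1024 is the number of feature maps that reach the average pooling layer.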


GoogLeNet

  • VGGNet has a 512 × 7 × 7 volume just before the fully connected layers; connecting it to 4096 units was an issue
  • GoogLeNet applies average pooling instead. This gives a 49-fold reduction, leaving only 1024 values
  • Dropout is applied and the result is connected to the 1000 output classes
  • About 12 times fewer parameters (connections) than AlexNet
  • About 2 times more computation than AlexNet
  • Very high accuracy: top-5 error reduced from 16% to 6.7%
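The arithmetic behind the first three points can be written out directly. This is a back-of-the-envelope sketch that counts weights only (biases ignored); the variable names are illustrative.

```python
# VGGNet: the 512 x 7 x 7 volume is flattened and fully connected to 4096 units.
vgg_pre_fc_values = 512 * 7 * 7
vgg_fc_connections = vgg_pre_fc_values * 4096

# GoogLeNet: the last inception module outputs 1024 feature maps of size 7 x 7.
googlenet_before_pool = 1024 * 7 * 7
# Average pooling over each 7 x 7 map keeps one value per feature map.
googlenet_after_pool = 1024
reduction = googlenet_before_pool // googlenet_after_pool

# The 1024 pooled values are connected to the 1000 class outputs.
googlenet_fc_connections = googlenet_after_pool * 1000

print(vgg_pre_fc_values)         # 25088
print(vgg_fc_connections)        # 102760448
print(reduction)                 # 49
print(googlenet_fc_connections)  # 1024000
```

Roughly 100 million weights in the first fully connected layer of VGGNet shrink to about 1 million in GoogLeNet's classifier.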


ResNet ³

  • Suppose a shallow neural network works well. What would happen if we add more layers?
  • The deeper network should also work well (it could learn the identity in the new layers)

³ Cite: 32871, Deep residual learning for image recognition, He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian, In: IEEE conference on computer vision and pattern recognition, pages 770–778, CVPR-2016


ResNet

  • But in practice this was not happening
  • Why? The identity is only one solution in a very large space of possible solutions, so training need not find it.


Let me tell this to the network
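One way to read this, following the cited ResNet paper, is the residual connection y = F(x) + x: the block only has to learn the difference from the identity. The sketch below is a simplified assumption, with two small fully connected layers standing in for F and without the ReLU that the paper applies after the addition. The helper names are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def plain_block(x, w1, w2):
    """Two stacked layers without a shortcut: y = F(x)."""
    return w2 @ relu(w1 @ x)


def residual_block(x, w1, w2):
    """The same two layers plus an identity shortcut: y = F(x) + x."""
    return plain_block(x, w1, w2) + x


x = np.array([1.0, -2.0, 3.0])
zeros = np.zeros((3, 3))

# With all weights at zero, the plain block destroys its input ...
print(plain_block(x, zeros, zeros))     # [0. 0. 0.]
# ... while the residual block passes it through unchanged.
print(residual_block(x, zeros, zeros))  # [ 1. -2.  3.]
```

In this formulation the identity mapping corresponds to F(x) = 0, which is easy to reach by driving the weights towards zero.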


ResNet Comparison ⁴

  • The 152-layer deep net was better than human: only 3.6% error rate on ImageNet Classification (a)

(a) Better than the 2nd best system by: ImageNet Detection: 16%, ImageNet Localization: 27%, COCO Detection: 11%, COCO Segmentation: 12%

⁴ ResNet: Winner of ILSVRC 2015 (Image Classification, Localization, Detection), Sik-Ho Tsang


ResNet Hyper-parameters and Issues

  • Training takes a huge amount of time
  • Batch Normalization
  • Xavier/2 initialization
  • SGD with momentum
  • Initial learning rate 0.1
  • Mini-batch size 256
  • Weight decay
  • No Dropout
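As an illustration of the initialization item, the sketch below contrasts Xavier initialization with the Xavier/2 variant, under the common reading that Xavier draws weights with variance 1 / fan_in and Xavier/2 with variance 2 / fan_in (fan_in divided by 2). The function names and layer sizes are assumptions, and pure Python is used only to keep the example dependency-free.

```python
import math
import random
import statistics


def xavier_init(fan_in, fan_out, rng):
    """Gaussian weights scaled by 1 / sqrt(fan_in)."""
    scale = 1.0 / math.sqrt(fan_in)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(fan_out)]
            for _ in range(fan_in)]


def xavier_over_2_init(fan_in, fan_out, rng):
    """Gaussian weights scaled by 1 / sqrt(fan_in / 2), i.e. variance 2 / fan_in."""
    scale = 1.0 / math.sqrt(fan_in / 2.0)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(fan_out)]
            for _ in range(fan_in)]


def weight_std(weights):
    """Standard deviation over all entries of a weight matrix."""
    return statistics.pstdev(v for row in weights for v in row)


rng = random.Random(0)
print(weight_std(xavier_init(256, 256, rng)))         # close to sqrt(1 / 256)
print(weight_std(xavier_over_2_init(256, 256, rng)))  # close to sqrt(2 / 256)
```

The extra factor of 2 compensates for ReLU zeroing out roughly half of the activations.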


Thank You!

Thank you very much for your attention ⁵!

Queries?

⁵ Credit: Prof. Mitesh Khapra, Deep Learning (CS7015): Lec 11.5 Image Classification for NPTEL
