Deep Learning for Computer Vision

Christian Bartz, Joseph Bethge

2018-06-04

How can we detect the HPI sheep with a deep neural network?


How to Build a Sheep Recognizer

1. get data

○ if there is no data, generate it!

2. think about a suitable architecture

○ RCNN, Yolo, SSD, …

3. train the network in your favourite framework

○ in our case Chainer

4. evaluate the model in the wild

Recall: best practices as presented in the lecture (optimizers, activations, data augmentation, …)

Data Generation: Idea

● background images = i

● locations for each image (bounding boxes) = L

● objects to insert = o

Advantages:

● number of images scales: i * L * o

● ground truth is known from bounding boxes

But:

● not real data (occlusion, lighting, bad placement, ...)
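A minimal sketch of the idea, assuming images are NumPy arrays in HWC layout and boxes are (top, left, bottom, right) tuples. The helper names `paste_object` and `generate_samples` are hypothetical and not taken from the course code; the object is simply overwritten onto the background, so none of the realism problems listed above are addressed.

```python
import itertools

import numpy as np


def paste_object(background, obj, box):
    """Paste `obj` into a copy of `background` at `box` = (top, left, bottom, right)."""
    top, left, bottom, right = box
    height, width = bottom - top, right - left
    # Nearest-neighbour resize of the object to the box size, done with
    # plain index arithmetic so that the sketch needs nothing but NumPy.
    rows = np.arange(height) * obj.shape[0] // height
    cols = np.arange(width) * obj.shape[1] // width
    resized = obj[rows][:, cols]
    image = background.copy()
    image[top:bottom, left:right] = resized
    return image


def generate_samples(backgrounds, boxes_per_background, objects):
    """Yield (image, bbox) pairs for every background/location/object combination."""
    for background, boxes in zip(backgrounds, boxes_per_background):
        for box, obj in itertools.product(boxes, objects):
            # The box doubles as the ground-truth annotation.
            yield paste_object(background, obj, box), box
```

With i backgrounds, L boxes per background and o objects, the generator yields i * L * o samples, and every sample carries its ground-truth box for free.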

Object Detection and Recognition in the Wild: A Brief History

Before DL

● Sliding Windows

Compute a new regressed bounding box and classification score for all sliding window positions.

● Region Proposal Detection

Object proposal methods compute boxes which potentially contain an object.
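To make the sliding-window approach concrete, here is a small sketch that only enumerates the window positions; the classifier and box regressor that would run on each window are left out. The function name `sliding_windows` and its parameters are illustrative assumptions.

```python
def sliding_windows(image_height, image_width, window_size, stride):
    """Yield windows as (top, left, bottom, right) that fit fully inside the image."""
    window_height, window_width = window_size
    for top in range(0, image_height - window_height + 1, stride):
        for left in range(0, image_width - window_width + 1, stride):
            yield (top, left, top + window_height, left + window_width)
```

The number of windows grows quickly with smaller strides and with every additional window size, which is one reason region proposal methods try to produce far fewer candidate boxes.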


RCNN

● selective search

● a deep CNN

● SVMs for each class

Fast RCNN

● selective search proposals (bboxes) are applied on top of the conv5 feature map

● ROI pooling instead of the warping step

● multi-task training for classification & regression

● 25x faster than RCNN


Faster RCNN

● Region Proposal Network

● 250x faster than RCNN


Mask RCNN

● ROI align is more accurate than ROI pooling

● FPN (Feature Pyramid Network) provides feature maps at different scales

● instance segmentation in addition to bbox detection


YOLO - You Only Look Once

● one CNN trained end-to-end

● fixed base regions

● fast (around 45 FPS)

● 7 x 7 x 5 = 245 boxes

[Figure labels: confidence score, class prediction]

SSD - Single Shot Multibox Detector

● one CNN trained end-to-end

● combines feature maps from multiple scales

● predefined aspect ratios for the boxes of each grid cell

● fast (45 - 59 FPS)

[Figure labels: Feature Extractor, Multibox Predictor, Post Processing]

Object Detection and Recognition in the Wild: SSD vs YOLO

Object Detection and Recognition in the Wild: Faster RCNN vs SSD

Implementation

● trained model using Chainer and ChainerCV

● Chainer is very similar to LENGTH

● trained two types of models

○ SSD 512

○ SSD 300

● trained on 4 GTX 1080 Ti GPUs for 200 epochs

● training took around 7 hours for SSD512

Get the Code

Clone the code: https://github.com/HPI-DeepLearning/schaaaafrichter

● everything should be in the README.md

● GPU installation takes some effort, but is an excellent preparation for the challenge following soon

Tasks for Today

1. finish the sheep localizer and use it for live demo purposes or static evaluation

2. have fun

Your Task - Implement Sheep Localizer (sheeping/sheep_localizer.py)

1. load the model

2. preprocess input image

3. resize input image

4. perform forward pass through network

5. visualize detection result

Task 1: Build the Model

def build_model(self)

● determine correct model type

○ self.model_type is one of ["ssd300", "ssd512"]

● build correct model class

○ https://bartzi.de/chainercv/ssd

○ watch out: we only have 1 class → n_fg_class=1

● if necessary transfer model to GPU

○ https://bartzi.de/chainer/gpu

● transfer learned weights to model

○ with np.load(self.model_file) as f:
      chainer.serializers.NpzDeserializer(f).load(model)
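One possible shape for this task, written as a free function rather than a method. It is a sketch under assumptions, not the reference solution: the parameters `model_type`, `model_file` and `gpu_id` stand in for the attributes of the exercise class, and the weights are loaded on the CPU before the optional GPU transfer. `SSD300` and `SSD512` come from `chainercv.links`.

```python
SUPPORTED_MODEL_TYPES = ("ssd300", "ssd512")


def build_model(model_type, model_file, gpu_id=-1):
    """Build an SSD with one foreground class and load trained weights."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError("unknown model type: {}".format(model_type))

    # Imported here so that the argument check above works without Chainer.
    import numpy as np
    import chainer
    from chainercv.links import SSD300, SSD512

    model_class = SSD300 if model_type == "ssd300" else SSD512
    # Only one class (sheep), hence n_fg_class=1.
    model = model_class(n_fg_class=1)

    # Load the learned weights as suggested on the slide.
    with np.load(model_file) as f:
        chainer.serializers.NpzDeserializer(f).load(model)

    # A negative gpu_id is taken to mean "stay on the CPU" (assumption).
    if gpu_id >= 0:
        model.to_gpu(gpu_id)
    return model
```

Rejecting unknown model types up front gives a clear error message instead of silently building the wrong network.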

Task 2: Preprocess Input Images

def preprocess(self, image)

● reorder channels to be in format: CHW

○ https://bartzi.de/numpy/transpose

● convert type of array to float32 (ndarray.astype)

● subtract mean from image

○ why do we do that?

○ self.mean
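The three steps can be sketched with NumPy alone. The input is assumed to be an HWC array, and `ASSUMED_MEAN` is a placeholder for `self.mean`, whose actual values are not given on the slide. Subtracting the mean centers the inputs around zero, so they match what the network saw during training.

```python
import numpy as np

# Hypothetical per-channel mean, shaped (3, 1, 1) so it broadcasts over CHW.
ASSUMED_MEAN = np.array([123, 117, 104], dtype=np.float32).reshape(3, 1, 1)


def preprocess(image, mean=ASSUMED_MEAN):
    """Turn an HWC image into a mean-subtracted float32 CHW array."""
    # HWC -> CHW: move the channel axis to the front.
    chw = np.transpose(image, (2, 0, 1))
    # Convert before subtracting, otherwise uint8 arithmetic would wrap around.
    chw = chw.astype(np.float32)
    return chw - mean
```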

Task 3: Resize Input Image

def resize(self, image, is_array=True)

● resize the image, for instance with Pillow

○ https://bartzi.de/pillow/resize

○ use self.input_size
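A sketch of the resize step with Pillow. Two things are assumptions here: `input_size` is taken to be a (width, height) tuple, which is the order Pillow expects, and `is_array` is taken to mean that the input is a NumPy array in HWC layout rather than a PIL image.

```python
import numpy as np
from PIL import Image


def resize(image, input_size, is_array=True):
    """Resize an image to `input_size` = (width, height) and return an HWC array."""
    if is_array:
        image = Image.fromarray(image)
    resized = image.resize(input_size, Image.BILINEAR)
    return np.asarray(resized)
```

Note that the aspect ratio is not preserved; the image is stretched to the square input that SSD 300 and SSD 512 expect.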

Task 4: Perform Forward Pass

def localize(self, image, is_array=True)

● have a look at the documentation and find out how to use the model for predicting

○ https://bartzi.de/chainercv/ssd

○ keep predicted bboxes and class scores
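A sketch of the forward pass, relying only on the `predict` interface of ChainerCV detection links: it takes a list of images and returns three lists (bboxes, labels, scores), one entry per image. The free function `localize` with an explicit `model` argument is an illustrative assumption. Stock ChainerCV SSD links also resize and subtract the mean inside `predict`; how the exercise code coordinates that with the manual preprocessing steps is not stated on the slides.

```python
def localize(model, image):
    """Run detection on one image and return its bboxes and scores."""
    # predict works on a batch (list) of images and returns three lists.
    bboxes, labels, scores = model.predict([image])
    # There is only one class, so the labels carry no extra information.
    return bboxes[0], scores[0]
```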

Task 5: Visualize Detection Result

def visualize_results(self, image, bboxes, scores)

● draw each bbox onto input image and also show confidence score

○ a bbox is a 4-tuple of (top, left, bottom, right)

○ you can use self.color, self.font and self.font_scale

○ https://bartzi.de/opencv/drawing
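A sketch of the two parts of this task: scaling a predicted bbox from the network input size back to the original image size, and drawing it with OpenCV. The helper names, the (width, height) size convention, and the fixed color, font and font scale are assumptions; the exercise class provides `self.color`, `self.font` and `self.font_scale` instead.

```python
def scale_bbox(bbox, from_size, to_size):
    """Map a (top, left, bottom, right) bbox between two (width, height) sizes."""
    top, left, bottom, right = bbox
    scale_x = to_size[0] / from_size[0]
    scale_y = to_size[1] / from_size[1]
    return (top * scale_y, left * scale_x, bottom * scale_y, right * scale_x)


def draw_detection(image, bbox, score, color=(0, 255, 0)):
    """Draw one bbox and its confidence score onto an image in place."""
    import cv2  # imported here so scale_bbox is usable without OpenCV

    top, left, bottom, right = (int(round(v)) for v in bbox)
    # OpenCV points are (x, y), so the order is swapped relative to the bbox.
    cv2.rectangle(image, (left, top), (right, bottom), color, 2)
    cv2.putText(image, "{:.2f}".format(score), (left, max(top - 5, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
    return image
```

The swap between (top, left) and (x, y) is the easiest place to introduce a bug here, which is why it is isolated in one line.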

[Figure: scaling, prediction (example confidence 0.85), scaling back]

(Live?) Demo

Time to Hack! (sheeping/sheep_localizer.py)

1. build the model

■ build_model

2. preprocess image

■ preprocess

3. resize image

■ resize

4. perform forward pass

■ localize

5. visualize results

■ visualize_results

→ get pre-trained models from: https://bartzi.de/models/sheep


Next Time

We will generate something with GANs!

Send an email or visit us anytime with questions!

Christian: [email protected] H-1.11

Joseph: [email protected] H-1.21

Please get the students to leave the room in order to end the presentation.