Deep Learning for Computer Vision · 2018-06-04

Deep Learning for Computer Vision


Christian Bartz, Joseph Bethge


How can we detect the HPI sheep with a deep neural network?


How to Build a Sheep Recognizer

1. get data

○ if there is no data, generate it!

2. think about a suitable architecture

○ RCNN, YOLO, SSD, …

3. train the network in your favourite framework

○ in our case Chainer

4. evaluate the model in the wild

Recall: best practices as presented in the lecture (optimizers, activations, data augmentation, …)


Data Generation: Idea

● background images = i

● locations for each image (bounding boxes) = L

● objects to insert = o

Advantages:

● number of images scales as i * L * o

● ground truth is known from bounding boxes

But:

● not real data (occlusion, lighting, bad placement, ...)
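The scaling argument can be sketched in a few lines. This is only an illustration and not the generator used for the lecture: the file names and box coordinates are made-up placeholders, the same L locations are assumed for every background, and a real generator would also paste the object pixels into the background image.

```python
from itertools import product

# Hypothetical inputs; names and coordinates are placeholders.
backgrounds = ["meadow.jpg", "campus.jpg"]        # i = 2
locations = [                                     # L = 3
    (10, 20, 110, 170),                           # (top, left, bottom, right)
    (50, 60, 150, 210),
    (0, 0, 100, 150),
]
objects = ["sheep_a.png", "sheep_b.png"]          # o = 2


def enumerate_samples(backgrounds, locations, objects):
    """Yield one description per synthetic image.

    The bbox at which the object is inserted doubles as the ground truth.
    """
    for background, bbox, obj in product(backgrounds, locations, objects):
        yield {"background": background, "object": obj, "bbox": bbox}


samples = list(enumerate_samples(backgrounds, locations, objects))
print(len(samples))  # i * L * o
```

Every sample carries its own bounding box, so no manual annotation is needed.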


Object Detection and Recognition in the Wild: A Brief History

Before DL

● Sliding Windows

Compute a new regressed bounding box and classification score for all sliding window positions.

● Region Proposal Detection

Object proposal methods compute boxes which potentially contain an object.
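To get a feeling for why sliding windows are expensive, here is a minimal sketch that enumerates the window positions. The function name, the window size and the stride are assumptions for illustration; in a real detector each window would be passed to a classifier and a bounding box regressor.

```python
def sliding_windows(image_height, image_width, window_size, stride):
    """Yield (top, left, bottom, right) boxes for every window position."""
    win_h, win_w = window_size
    for top in range(0, image_height - win_h + 1, stride):
        for left in range(0, image_width - win_w + 1, stride):
            yield (top, left, top + win_h, left + win_w)


windows = list(sliding_windows(4, 6, window_size=(2, 2), stride=2))
print(len(windows))
```

The number of windows grows with the image size and shrinks with the stride, and it has to be multiplied by the number of window sizes and aspect ratios that are tried.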



RCNN

● selective search

● a deep CNN

● SVMs for each class


Fast RCNN

● selective search proposals (bboxes) are applied on top of the conv5 feature map

● ROI pooling instead of the warping step

● multi-task loss for classification & bbox regression

● 25x faster than RCNN




Faster RCNN

● Region Proposal Network

● 250x faster than RCNN





Mask RCNN

● ROI align is more accurate than ROI pooling

● FPN (Feature Pyramid Network) provides feature maps at different scales

● instance segmentation in addition to bbox detection



YOLO - ?



YOLO - You Only Look Once

● one CNN trained end-to-end

● fixed base regions

● fast (around 45 FPS)

● 7 x 7 x 5 = 245 boxes

Confidence score

Class Prediction


SSD - Single Shot Multibox Detector

● combines feature maps from multiple scales

● predefined aspect ratios for the boxes of each grid cell

● one CNN trained end-to-end

● fast (45 - 59 FPS)

Pipeline: Feature Extractor → Multibox Predictor → Post Processing
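Post processing in SSD-style detectors usually means dropping low-confidence boxes and then applying non-maximum suppression (NMS). The following is a plain Python sketch of greedy NMS, not the ChainerCV implementation; the function names and both thresholds are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (top, left, bottom, right) boxes."""
    top = max(box_a[0], box_b[0])
    left = max(box_a[1], box_b[1])
    bottom = min(box_a[2], box_b[2])
    right = min(box_a[3], box_b[3])
    intersection = max(0, bottom - top) * max(0, right - left)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_maximum_suppression(bboxes, scores,
                            iou_threshold=0.45, score_threshold=0.5):
    """Return the indices of the boxes to keep, best score first."""
    # Discard low-confidence boxes, then visit the rest by descending score.
    order = sorted(
        (i for i, score in enumerate(scores) if score >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(bboxes[i], bboxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Without this step, one sheep would typically be reported several times by neighbouring default boxes.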


Object Detection and Recognition in the Wild: SSD vs YOLO


Object Detection and Recognition in the Wild: Faster RCNN vs SSD


Implementation

● trained model using Chainer and ChainerCV

● Chainer is very similar to PyTorch

● trained two types of models

○ SSD 512

○ SSD 300


Implementation

● trained on four GTX 1080 Ti GPUs for 200 epochs

● how long did it take?

○ training took around 7 hours for SSD512


Get the Code

Clone the code: https://github.com/HPI-DeepLearning/schaaaafrichter

● everything should be in the README.md

● GPU installation takes some effort, but is an excellent preparation for the challenge following soon


Tasks for Today

1. finish the sheep localizer and use it for live demo purposes or static evaluation

2. have fun



Your Task - Implement Sheep Localizer (sheeping/sheep_localizer.py)

1. load the model

2. preprocess input image

3. resize input image

4. perform forward pass through network

5. visualize detection result


Task 1: Build the Model

def build_model(self)

● determine correct model type

○ self.model_type is one of "ssd300", "ssd512"

● build correct model class

○ https://bartzi.de/chainercv/ssd

○ watch out: we only have 1 class → n_fg_class=1

● if necessary transfer model to GPU

○ https://bartzi.de/chainer/gpu

● transfer learned weights to model

○ with np.load(self.model_file) as f:
      chainer.serializers.NpzDeserializer(f).load(model)
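Put together, the steps of Task 1 could look roughly like the sketch below. It is one possible solution and not the reference implementation from the repository: it is written as a free function, and `model_file` and `gpu_id` are hypothetical parameters standing in for the attributes of the localizer class. Chainer, ChainerCV and NumPy are imported lazily inside `build_model`, so the small lookup helper can be tried without them.

```python
SUPPORTED_MODELS = {"ssd300": "SSD300", "ssd512": "SSD512"}


def model_class_name(model_type):
    """Map the model type string to the name of the ChainerCV class."""
    try:
        return SUPPORTED_MODELS[model_type]
    except KeyError:
        raise ValueError("unsupported model type: {}".format(model_type))


def build_model(model_type, model_file, gpu_id=-1):
    """Build an SSD with one foreground class and load trained weights."""
    import chainer
    import numpy as np
    from chainercv import links

    model_class = getattr(links, model_class_name(model_type))
    model = model_class(n_fg_class=1)  # only one class: sheep

    if gpu_id >= 0:
        # Select the device and move the parameters to it.
        chainer.cuda.get_device_from_id(gpu_id).use()
        model.to_gpu()

    # Same loading code as on the slide.
    with np.load(model_file) as f:
        chainer.serializers.NpzDeserializer(f).load(model)
    return model
```

A call such as `build_model("ssd512", "trained_model.npz", gpu_id=0)` would then return a ready-to-use model; the file name here is only a placeholder.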



Task 2: Preprocess Input Images

def preprocess(self, image)

● reorder channels to be in format: CHW

○ https://bartzi.de/numpy/transpose

● convert type of array to float32 (ndarray.astype)

● subtract mean from image

○ why do we do that?

○ self.mean
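A minimal NumPy sketch of these three steps follows. The `MEAN` values and their (3, 1, 1) shape are placeholder assumptions; in the exercise the value comes from `self.mean`. Subtracting the mean centres the pixel values around zero, which matches what the network saw during training.

```python
import numpy as np

# Assumption: a per-channel mean that broadcasts over a CHW image.
MEAN = np.array([123, 117, 104], dtype=np.float32).reshape(3, 1, 1)


def preprocess(image, mean=MEAN):
    """Convert an HWC image into a mean-subtracted float32 CHW array."""
    image = np.asarray(image)
    image = image.transpose(2, 0, 1)   # HWC -> CHW
    image = image.astype(np.float32)   # e.g. uint8 -> float32
    return image - mean                # broadcasts over height and width
```

Converting to float32 before the subtraction matters: subtracting from a uint8 array directly would wrap around instead of producing negative values.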



Task 3: Resize Input Image

def resize(self, image, is_array=True)

● resize image for instance with pillow

○ https://bartzi.de/pillow/resize

○ use self.input_size
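One way to do the resizing with Pillow is sketched below. Two assumptions are made here that the slide does not state: `input_size` is passed as a (width, height) tuple, which is the order Pillow expects, and an array input is still in HWC uint8 layout, i.e. it has not gone through the preprocessing of Task 2 yet.

```python
from PIL import Image


def resize(image, input_size, is_array=True):
    """Resize an image to input_size, given as (width, height)."""
    if is_array:
        # Assumption: an HWC uint8 array, which Pillow can wrap directly.
        image = Image.fromarray(image)
    resized = image.resize(input_size, Image.BILINEAR)
    if is_array:
        import numpy as np  # only needed when arrays go in and out
        resized = np.asarray(resized)
    return resized
```

Note that resizing to a square input changes the aspect ratio of the image, so predicted boxes have to be scaled back separately for each axis.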



Task 4: Perform Forward Pass

def localize(self, image, is_array=True)

● have a look at documentation and find out how to use model for predicting

○ https://bartzi.de/chainercv/ssd

○ keep predicted bboxes and class scores
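ChainerCV detection links offer a `predict` method that takes a list of CHW float32 images and returns `(bboxes, labels, scores)`, each with one entry per image. The sketch below only shows this route and is written as a free function with the model passed in. Be aware that `predict` performs its own resizing and mean subtraction, so with manually preprocessed images the intended solution may call the network in a different way.

```python
def localize(model, image):
    """Run the detector on one image and keep bboxes and scores."""
    # predict() works on a batch, so wrap the image in a list and
    # unpack the first (and only) result afterwards.
    bboxes, labels, scores = model.predict([image])
    return bboxes[0], scores[0]
```

Since there is only one foreground class, the labels carry no information and are dropped.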



Task 5: Visualize Detection Result

def visualize_results(self, image, bboxes, scores)

● draw each bbox onto input image and also show confidence score

○ a bbox is a 4-tuple of (top, left, bottom, right)

○ you can use self.color, self.font and self.font_scale

○ https://bartzi.de/opencv/drawing
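The two details that are easy to get wrong are scaling the boxes back to the original image size and the coordinate order: a bbox is (top, left, bottom, right), while OpenCV expects (x, y) points. The sketch below shows both. The helper names, the default colour and the default font scale are assumptions standing in for `self.color`, `self.font` and `self.font_scale`, and OpenCV is imported lazily so the helpers work without it.

```python
def scale_bbox(bbox, from_size, to_size):
    """Scale a (top, left, bottom, right) bbox between image sizes.

    Both sizes are (height, width) tuples.
    """
    scale_y = to_size[0] / from_size[0]
    scale_x = to_size[1] / from_size[1]
    top, left, bottom, right = bbox
    return (top * scale_y, left * scale_x, bottom * scale_y, right * scale_x)


def to_cv_points(bbox):
    """Convert (top, left, bottom, right) to OpenCV's two (x, y) corners."""
    top, left, bottom, right = (int(round(value)) for value in bbox)
    return (left, top), (right, bottom)


def visualize_results(image, bboxes, scores,
                      color=(0, 255, 0), font_scale=0.5):
    """Draw every bbox and its confidence score onto an HWC uint8 image."""
    import cv2  # only the drawing itself needs OpenCV

    for bbox, score in zip(bboxes, scores):
        top_left, bottom_right = to_cv_points(bbox)
        cv2.rectangle(image, top_left, bottom_right, color, 2)
        cv2.putText(image, "{:.2f}".format(score), top_left,
                    cv2.FONT_HERSHEY_SIMPLEX, font_scale, color, 1)
    return image
```

A box predicted on the resized input would first go through `scale_bbox` with the network input size as `from_size` and the original image size as `to_size`.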

[Figure: scaling → prediction (bbox with confidence score 0.85) → scaling back]


(Live?) Demo


Time to Hack! (sheeping/sheep_localizer.py)

1. build the model

■ build_model

2. preprocess image

■ preprocess

3. resize image

■ resize

4. perform forward pass

■ localize

5. visualize results

■ visualize_results

→ get pre-trained models from: https://bartzi.de/models/sheep



Next Time

We will generate something with GANs!

Send an email or visit us anytime with questions!

Christian: [email protected] H-1.11

Joseph: [email protected] H-1.21



Please get the students to leave the room in order to end the presentation.
