Deep Learning for Computer Vision Slide #3
How to Build a Sheep Recognizer
1. get data
○ if there is no data, generate it!
2. think about a suitable architecture
○ RCNN, YOLO, SSD, …
3. train the network in your favourite framework
○ in our case Chainer
4. evaluate the model in the wild
Recall: best practices as presented in the lecture (optimizers, activations,
data augmentation, …)
Deep Learning for Computer Vision Slide #8
Data Generation: Idea
● background images = i
● locations for each image
(bounding boxes) = L
● objects to insert = o
Advantages:
● the number of generated images scales multiplicatively:
i * L * o
● the ground truth is known from the bounding boxes
But:
● not real data (occlusion, lighting, bad placement, ...)
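The scaling argument can be made concrete with a small sketch. The file names and box coordinates below are made-up placeholders, and the sketch assumes that every background has the same number of candidate locations L. It only enumerates the (background, location, object) combinations; it does not composite any images.

```python
from itertools import product

# Hypothetical inputs: i = 2 backgrounds, L = 3 locations each, o = 2 objects.
backgrounds = ["meadow.jpg", "hillside.jpg"]
# Each location is a bounding box (top, left, bottom, right) in pixels.
locations = {
    "meadow.jpg": [(10, 10, 60, 90), (100, 40, 160, 130), (30, 200, 80, 280)],
    "hillside.jpg": [(5, 5, 55, 85), (120, 60, 180, 150), (40, 220, 90, 300)],
}
objects = ["sheep_a.png", "sheep_b.png"]


def enumerate_samples(backgrounds, locations, objects):
    """Yield one (background, bbox, object) triple per synthetic image."""
    for background in backgrounds:
        for bbox, obj in product(locations[background], objects):
            # The box the object is pasted into doubles as the ground-truth label.
            yield background, bbox, obj


samples = list(enumerate_samples(backgrounds, locations, objects))
print(len(samples))  # i * L * o = 2 * 3 * 2 = 12
```

Adding one more object image to this toy setup would add i * L = 6 new samples, which is why a small pool of assets can produce a large training set.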
Deep Learning for Computer Vision Slide #9
Object Detection and Recognition in the Wild: A Brief History
Deep Learning for Computer Vision Slide #10
Object Detection and Recognition in the Wild: A Brief History (Before DL)
● Sliding Windows
Compute a new regressed bounding box and
classification score for all sliding window positions
● Region Proposal Detection
Object proposal methods compute boxes which
potentially contain an object.
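To illustrate why the sliding window approach is expensive, here is a minimal sketch that enumerates window positions for a single window size. The image size, window size and stride are arbitrary example values, not numbers from the lecture.

```python
def sliding_windows(image_height, image_width, window_size, stride):
    """Yield (top, left, bottom, right) for every window that fits in the image."""
    for top in range(0, image_height - window_size + 1, stride):
        for left in range(0, image_width - window_size + 1, stride):
            yield top, left, top + window_size, left + window_size


windows = list(
    sliding_windows(image_height=300, image_width=300, window_size=100, stride=50)
)
print(len(windows))  # 25 windows for one scale and one aspect ratio
```

A classifier has to be evaluated on each of these windows, and the count grows further once several scales and aspect ratios are considered. Region proposal methods reduce this to a smaller set of candidate boxes.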
Deep Learning for Computer Vision Slide #11
Object Detection and Recognition in the Wild: A Brief History
RCNN
● selective search
● a deep CNN
● SVMs for each class
Deep Learning for Computer Vision Slide #12
Object Detection and Recognition in the Wild: A Brief History
Fast RCNN
● selective search proposals are projected onto the conv5 feature map
● RoI pooling replaces the warping of each region
● multi-task loss for classification & bbox regression
● 25x faster than RCNN
Deep Learning for Computer Vision Slide #13
Object Detection and Recognition in the Wild: A Brief History
Faster RCNN
● Region Proposal Network
● 250x faster than RCNN
Deep Learning for Computer Vision Slide #14
Object Detection and Recognition in the Wild: A Brief History
Mask RCNN
● RoI align is more accurate than RoI pooling
● FPN (Feature Pyramid Network) provides feature maps at different scales
● instance segmentation in addition to bbox detection
Deep Learning for Computer Vision Slide #15
Object Detection and Recognition in the Wild: A Brief History
YOLO - ?
Deep Learning for Computer Vision Slide #16
Object Detection and Recognition in the Wild: A Brief History
YOLO - You Only Look Once
● one CNN trained end-to-end
● fixed base regions
● fast (around 45 FPS)
● 7 x 7 x 5 = 245 boxes
[Figure: per-cell network outputs, labelled "Confidence score" and "Class Prediction"]
Deep Learning for Computer Vision Slide #17
Object Detection and Recognition in the Wild: A Brief History
SSD - Single Shot Multibox Detector
● one CNN trained end-to-end
● combines feature maps from multiple scales
● predefined aspect ratios for the boxes of each grid cell
● fast (45 - 59 FPS)
Pipeline stages shown on the slides: Feature Extractor, Multibox Predictor, Post Processing
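The post processing stage of SSD-style detectors typically consists of a score threshold followed by non-maximum suppression (NMS). The following is a generic, pure-Python sketch of greedy NMS, not ChainerCV's implementation; the threshold values and the example boxes are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (top, left, bottom, right) boxes."""
    top = max(box_a[0], box_b[0])
    left = max(box_a[1], box_b[1])
    bottom = min(box_a[2], box_b[2])
    right = min(box_a[3], box_b[3])
    intersection = max(0, bottom - top) * max(0, right - left)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_maximum_suppression(bboxes, scores, score_thresh=0.5, iou_thresh=0.45):
    """Return indices of boxes that survive the score threshold and greedy NMS."""
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    # Visit boxes from the most to the least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(bboxes[i], bboxes[j]) <= iou_thresh for j in kept):
            kept.append(i)
    return kept


bboxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260), (0, 0, 20, 20)]
scores = [0.9, 0.8, 0.7, 0.2]
print(non_maximum_suppression(bboxes, scores))  # [0, 2]
```

In the example, the second box is suppressed because it overlaps almost completely with the higher-scoring first box, and the fourth box is dropped by the score threshold.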
Deep Learning for Computer Vision Slide #22
Object Detection and Recognition in the Wild: Faster RCNN vs SSD
Deep Learning for Computer Vision Slide #23
Implementation
● trained model using Chainer and ChainerCV
● Chainer is very similar to LENGTH
● trained two types of models
○ SSD 512
○ SSD 300
Deep Learning for Computer Vision Slide #24
Implementation
● trained on four GTX 1080 Ti GPUs for 200 epochs
● how long did it take?
Deep Learning for Computer Vision Slide #25
Implementation
● trained on four GTX 1080 Ti GPUs for 200 epochs
● training took around 7 hours for SSD512
Deep Learning for Computer Vision Slide #26
Get the Code
Clone the code: https://github.com/HPI-DeepLearning/schaaaafrichter
● everything should be in the README.md
● GPU installation takes some effort, but is an excellent preparation for
the challenge following soon
Deep Learning for Computer Vision Slide #27
Tasks for Today
1. finish the sheep localizer and use it for live demo purposes or static
evaluation
2. have fun
Deep Learning for Computer Vision Slide #28
Your Task - Implement Sheep Localizer (sheeping/sheep_localizer.py)
1. load the model
2. preprocess input image
3. resize input image
4. perform forward pass through network
5. visualize detection result
Deep Learning for Computer Vision Slide #29
Task 1: Build the Model
def build_model(self)
● determine correct model type
○ self.model_type is one of "ssd300", "ssd512"
● build correct model class
○ https://bartzi.de/chainercv/ssd
○ watch out: we only have 1 class → n_fg_class=1
● if necessary transfer model to GPU
○ https://bartzi.de/chainer/gpu
● transfer learned weights to model
○ with np.load(self.model_file) as f:
      chainer.serializers.NpzDeserializer(f).load(model)
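One possible shape for this step is sketched below as a standalone function rather than the method from the exercise. It assumes that Chainer and ChainerCV are installed and uses the `SSD300` and `SSD512` classes from `chainercv.links`; the helper `model_class_name` and the `gpu_id` parameter are hypothetical names introduced for this illustration, not part of the repository.

```python
def model_class_name(model_type):
    """Map the model type string from the slides to a ChainerCV class name."""
    class_names = {"ssd300": "SSD300", "ssd512": "SSD512"}
    if model_type not in class_names:
        raise ValueError("unknown model type: {}".format(model_type))
    return class_names[model_type]


def build_model(model_type, model_file, gpu_id=-1):
    """Build an SSD with one foreground class and load trained weights."""
    # Imported lazily so the helper above works without Chainer installed.
    import numpy as np
    import chainer
    import chainercv.links

    model_class = getattr(chainercv.links, model_class_name(model_type))
    model = model_class(n_fg_class=1)  # only one class: sheep

    # Assumption: a negative gpu_id means "stay on the CPU".
    if gpu_id >= 0:
        model.to_gpu(gpu_id)

    # Load the learned weights as shown on the slide.
    with np.load(model_file) as f:
        chainer.serializers.NpzDeserializer(f).load(model)
    return model
```

A call could look like `build_model("ssd512", "trained_model.npz", gpu_id=0)`, where the file name is a placeholder for one of the pre-trained models.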
Deep Learning for Computer Vision Slide #30
Task 2: Preprocess Input Images
def preprocess(self, image)
● reorder channels to be in format: CHW
○ https://bartzi.de/numpy/transpose
● convert type of array to float32 (ndarray.astype)
● subtract mean from image
○ why do we do that?
○ use self.mean
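The three preprocessing steps can be sketched with NumPy as follows. A likely reason for the mean subtraction is that the network was trained on mean-subtracted inputs, so inference has to apply the same normalization. The mean values and the dummy image below are placeholders for illustration; in the exercise the mean comes from `self.mean`.

```python
import numpy as np


def preprocess(image, mean):
    """Convert an HWC image to a mean-subtracted float32 CHW array."""
    image = np.asarray(image)
    image = image.transpose(2, 0, 1)  # HWC -> CHW
    image = image.astype(np.float32)  # integer pixels -> float32
    image = image - mean              # per-channel mean is broadcast over H, W
    return image


# Assumed per-channel mean, shaped (3, 1, 1) so that it broadcasts over H and W.
mean = np.array([123, 117, 104], dtype=np.float32).reshape(3, 1, 1)
dummy = np.full((4, 6, 3), 128, dtype=np.uint8)  # tiny HWC image
out = preprocess(dummy, mean)
print(out.shape, out.dtype)  # (3, 4, 6) float32
```

Converting to float32 before the subtraction matters: subtracting from a uint8 array directly could wrap around for pixels darker than the mean.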
Deep Learning for Computer Vision Slide #32
Task 3: Resize Input Image
def resize(self, image, is_array=True)
● resize the image, for instance with Pillow
○ https://bartzi.de/pillow/resize
○ use self.input_size
Deep Learning for Computer Vision Slide #33
Task 4: Perform Forward Pass
def localize(self, image, is_array=True)
● have a look at the documentation and find out how to use the model for
prediction
○ https://bartzi.de/chainercv/ssd
○ keep predicted bboxes and class scores
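As a rough sketch of the call pattern: ChainerCV detection links offer a `predict` method that takes a list of CHW float32 images and returns bounding boxes, labels and scores, one entry per image. Note that `predict` performs its own resizing and mean subtraction, and the slides do not state whether the exercise expects `predict` or a manual forward pass, so treat this as one option. `FakeDetector` is a made-up stand-in that lets the sketch run without Chainer.

```python
def localize(model, image):
    """Run one image through the model and return its boxes and scores."""
    # predict() works on a batch, so wrap the image in a list
    # and unpack the single result afterwards.
    bboxes, labels, scores = model.predict([image])
    return bboxes[0], scores[0]


class FakeDetector:
    """Stand-in with the same predict() return structure as a detection link."""

    def predict(self, images):
        bboxes = [[(10.0, 20.0, 110.0, 220.0)] for _ in images]
        labels = [[0] for _ in images]
        scores = [[0.85] for _ in images]
        return bboxes, labels, scores


bboxes, scores = localize(FakeDetector(), image=None)
print(bboxes, scores)
```

With only one foreground class the labels carry no extra information, which is why the sketch keeps just the boxes and scores.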
Deep Learning for Computer Vision Slide #34
Task 5: Visualize Detection Result
def visualize_results(self, image, bboxes, scores)
● draw each bbox onto input image and also show confidence score
○ a bbox is a 4-tuple of (top, left, bottom, right)
○ you can use self.color, self.font and self.font_scale
○ https://bartzi.de/opencv/drawing
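[Figure: the image is scaled to the network input size for prediction, and the predicted boxes are scaled back to the original image; example confidence score 0.85]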
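The "scaling back" step and the conversion to drawing coordinates can be sketched in plain Python. This assumes that the boxes are predicted in the coordinate system of the resized network input; the function names and the example sizes are hypothetical.

```python
def scale_bbox(bbox, input_size, image_size):
    """Scale a (top, left, bottom, right) box from network input to image size."""
    input_height, input_width = input_size
    image_height, image_width = image_size
    scale_y = image_height / input_height
    scale_x = image_width / input_width
    top, left, bottom, right = bbox
    return (top * scale_y, left * scale_x, bottom * scale_y, right * scale_x)


def to_drawing_args(bbox, score):
    """Convert a box and score into integer corner points and a text label."""
    top, left, bottom, right = (int(round(v)) for v in bbox)
    # Drawing APIs such as OpenCV expect points as (x, y), i.e. (left, top).
    return (left, top), (right, bottom), "{:.2f}".format(score)


bbox = scale_bbox((30, 60, 150, 240), input_size=(300, 300), image_size=(600, 900))
print(to_drawing_args(bbox, 0.85))  # ((180, 60), (720, 300), '0.85')
```

The two corner points can then be passed to `cv2.rectangle` and the label to `cv2.putText`, together with `self.color`, `self.font` and `self.font_scale`. The coordinate swap is the part that is easy to get wrong, because the boxes are stored as (top, left, bottom, right) while OpenCV points are (x, y).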
Deep Learning for Computer Vision Slide #36
Time to Hack! (sheeping/sheep_localizer.py)
1. build the model
■ build_model
2. preprocess image
■ preprocess
3. resize image
■ resize
4. perform forward pass
■ localize
5. visualize results
■ visualize_results
→ get pre-trained models from: https://bartzi.de/models/sheep
Deep Learning for Computer Vision Slide #37
Next Time
We will generate something with GANs!
Send an email or visit us anytime with questions!
Christian: [email protected] H-1.11
Joseph: [email protected] H-1.21