Fully Convolutional Networks for Semantic Segmentation By Jonathan Long* Evan Shelhamer* Trevor Darrell Instance-sensitive Fully Convolutional Networks By Jifeng Dai, Kaiming He, Yi Li, Shaoqing Ren, Jian Sun Presented by Zilong Bai [email protected]


Fully Convolutional Networks for Semantic Segmentation
By Jonathan Long*, Evan Shelhamer*, Trevor Darrell

Instance-sensitive Fully Convolutional Networks
By Jifeng Dai, Kaiming He, Yi Li, Shaoqing Ren, Jian Sun

Presented by Zilong Bai [email protected]


Outline

1. What problems do they attempt to solve?

2. Key Contributions

3. Network Architecture Details

4. Experimental Setup and Results

5. Strengths and Weaknesses*

6. Possible Extensions*

a. And other comments


UC Berkeley

Fully Convolutional Networks for Semantic Segmentation

Jonathan Long* Evan Shelhamer* Trevor Darrell

Content Source: https://docs.google.com/presentation/d/1VeWFMpZ8XN7OC3URZP4WdXvOGYckoFWGVN7hApoXVnc/edit#slide=id.g529579d43_3_7


Problem to solve: Image Segmentation. Pixels in, pixels out.

Semantic segmentation

Monocular depth estimation Eigen & Fergus 2015

Boundary prediction Xie & Tu 2015

Optical flow Fischer et al. 2015


Problem to solve

What is semantic segmentation?
Input: Image (2D array of pixels)
Output: Pixels clustered according to their semantic categories.

I.e. class-level, pixel-wise clustering (supervised)

NOTE: pixels of two people in the same image will be clustered together by this model. The second paper attempts to fill this gap.
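To make the NOTE concrete, here is a minimal sketch with a toy label map. The class id `PERSON` and the tiny array are hypothetical and only illustrate that a semantic label map cannot tell two people apart, while an instance map can.

```python
import numpy as np

PERSON = 15  # hypothetical class id, chosen only for illustration

# Toy instance map: 0 = background, 1 and 2 = two different people.
instance_ids = np.array([
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Semantic segmentation keeps only the class, so both people get the same label.
semantic = np.where(instance_ids > 0, PERSON, 0)

num_instances = len(np.unique(instance_ids)) - 1        # background excluded
num_semantic_labels = len(np.unique(semantic)) - 1      # background excluded
print(num_instances, num_semantic_labels)
```

Both people collapse into one semantic label, which is the gap instance-level segmentation addresses.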

Input Output


Key Contributions

1) AlexNet (VGG, GoogLeNet) -> Fully Convolutional Network

a) From image-level classification to pixel-level clustering

b) Arbitrary sized input images*

c) End-to-end learning model

2) Skip-layer structure to improve segmentation detail

a) Combine deep, coarse, semantic information with shallow, fine, appearance information.

b) WHAT (deeper layers) + WHERE (shallower layers)


“tabby cat”

1000-dim vector

< 1 millisecond

Convnets perform classification

end-to-end learning


“tabby cat”


Recall: a classification network

NOTE: Implementing layers 6 and 7 as fully connected layers fixes the size of the input images.


Recall: R-CNN

Object detection without modifying the AlexNet architecture

figure: Girshick et al.


R-CNN


Many seconds

“cat”

“dog”

Recall: R-CNN does detection

Whether using off-the-shelf methods or in-network layers for region proposals, bounding boxes are always needed in these approaches.

SLOW

~1/10 second

end-to-end learning

???


“tabby cat”


A classification network (see it again)


How to become fully convolutional

To be honest, “fully convolutional” is just another way of thinking…


Becoming fully convolutional

To be honest, “fully convolutional” is just another way of thinking…

But it makes a significant difference in training and in maintaining the network structure in implementation!
- Only convolution kernels are maintained; downsampling ratios are controlled by strides.
- Arbitrary input size
- Faster than the naive implementation

Layer 6 can be generated with a kernel of size 13 x 13 x d_5 that covers its whole input: at the original input size the kernel has a single valid position, so it does not move around.
Layer 7 can be generated with a kernel of size 1 x 1 x d_6: on the 1 x 1 output of layer 6, this is another kernel that does not move around.
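The equivalence between a fully connected layer and a convolution whose kernel covers the whole input can be checked numerically. The sketch below is an illustration in NumPy, not the authors' code: the sizes (`kh`, `kw`, `d`, `n_out`) and the helper name `fc_as_conv` are hypothetical. It shows that the reshaped weights reproduce the fully connected output at the fixed input size, and yield a spatial grid of outputs on a larger input.

```python
import numpy as np

def fc_as_conv(feature_map, weights, bias):
    """Apply FC weights as a 'valid' convolution (cross-correlation).

    feature_map: (H, W, D) input activations
    weights:     (kh, kw, D, n_out) FC weights reshaped into a kernel
    bias:        (n_out,)
    Returns an (H - kh + 1, W - kw + 1, n_out) grid of outputs.
    """
    H, W, D = feature_map.shape
    kh, kw, _, n_out = weights.shape
    flat_w = weights.reshape(-1, n_out)
    out = np.empty((H - kh + 1, W - kw + 1, n_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = feature_map[i:i + kh, j:j + kw, :].reshape(-1)
            out[i, j] = patch @ flat_w + bias
    return out

rng = np.random.default_rng(0)
kh, kw, d, n_out = 6, 6, 4, 5                  # hypothetical sizes
W_fc = rng.normal(size=(kh * kw * d, n_out))   # ordinary FC weight matrix
b_fc = rng.normal(size=n_out)
kernel = W_fc.reshape(kh, kw, d, n_out)        # same numbers, viewed as a kernel

# Fixed-size input: the kernel has exactly one valid position.
x_fixed = rng.normal(size=(kh, kw, d))
fc_out = x_fixed.reshape(-1) @ W_fc + b_fc
conv_out = fc_as_conv(x_fixed, kernel, b_fc)

# Larger input: the same kernel now slides and yields a grid of outputs.
x_large = rng.normal(size=(9, 10, d))
grid = fc_as_conv(x_large, kernel, b_fc)
print(conv_out.shape, grid.shape)
```

The larger input needs no new weights, which is why the converted network accepts arbitrary input sizes.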


Now it is fully convolutional


Upsampling output

NOTE: Upsampled output is H x W x (class number + 1)

Each H x W slice shows the heat map for one category
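As a small illustration of this output layout, the sketch below builds a random score volume of shape H x W x (class number + 1) and reads off a per-pixel label by taking the argmax over the last axis. The sizes and the random scores are placeholders, not network outputs.

```python
import numpy as np

num_classes = 20          # e.g. 20 object classes; the extra channel is background
H, W = 4, 6               # tiny hypothetical image size
rng = np.random.default_rng(0)

# Stand-in for the upsampled network output: one score per pixel per category.
scores = rng.normal(size=(H, W, num_classes + 1))

heat_map = scores[:, :, 8]             # one H x W slice = heat map of one category
label_map = scores.argmax(axis=-1)     # predicted category index for every pixel
print(heat_map.shape, label_map.shape)
```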


End-to-end & pixels-to-pixels network

Each semantic segmentation ground truth image actually needs to be divided into (class number + 1) slices and each slice corresponds to the ground truth heat map of one category.
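A minimal sketch of this slicing, together with one plausible pixelwise loss. The per-pixel softmax cross-entropy used here is an assumption for illustration (the slide only says "pixelwise output + loss"), and the helper names are hypothetical.

```python
import numpy as np

def one_hot_slices(label_map, num_slices):
    """Turn an (H, W) integer label map into (H, W, num_slices) binary slices."""
    return np.eye(num_slices, dtype=np.float64)[label_map]

def pixelwise_cross_entropy(scores, label_map):
    """Mean softmax cross-entropy over all pixels (assumed loss, see lead-in).

    scores: (H, W, C) raw class scores; label_map: (H, W) integer labels.
    """
    shifted = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    targets = one_hot_slices(label_map, scores.shape[-1])
    return float(-(targets * log_probs).sum(axis=-1).mean())

# Toy ground truth with background (0) and two classes (1, 2).
labels = np.array([[0, 0, 1],
                   [2, 1, 1]])
slices = one_hot_slices(labels, 3)      # slices[:, :, c] is the mask of category c

# With all-zero scores every class is equally likely, so the loss is log(3).
uniform_loss = pixelwise_cross_entropy(np.zeros((2, 3, 3)), labels)
print(slices.shape, uniform_loss)
```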


conv, pool, nonlinearity

upsampling

pixelwise output + loss

End-to-end, pixels-to-pixels network


stride 32

no skips

input image

If stopped right here, what could we get?

Coarse. Really, really coarse.


Spectrum of deep features

Combine where (local, shallow) with what (global, deep)

fuse features into deep jet

(cf. Hariharan et al. CVPR15 “hypercolumn”)


Skip layers

skip to fuse layers!

Interp + sum

Interp + sum

dense output

End-to-end, joint learning of semantics and location


Skip layers

Content Source: https://computing.ece.vt.edu/~f15ece6504/slides/L13_FCN.pdf

Skip layers

How exactly are layers fused?

Take FCN-16s for instance: pool4 and conv7 are fused in the following steps:

1. Add a 1 x 1 convolution layer on top of pool4 to produce additional class predictions.
   a. The predictions from pool4 are at stride 16 ("16s").
2. Upsample the conv7 predictions, which are at stride 32, by 2x.
   a. The upsampled conv7 predictions are at stride 16 as well.
3. Add these stride-16 predictions together.
4. Upsample the summed stride-16 predictions back to the image size.

NOTE: ALL the weights can be learned. The upsampling weights can be initialized with bilinear interpolation.
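The four fusion steps can be sketched in NumPy as follows. This is an illustration under assumptions, not the authors' implementation: the feature sizes, the reduced channel width and the random weights are hypothetical, and `upsample` uses a fixed bilinear kernel (the initialization mentioned in the NOTE) instead of learned weights.

```python
import numpy as np

def bilinear_kernel(size):
    """2-D bilinear interpolation weights for a size x size upsampling kernel."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    rows, cols = np.ogrid[:size, :size]
    return (1 - abs(rows - center) / factor) * (1 - abs(cols - center) / factor)

def upsample(scores, factor):
    """Upsample (h, w, c) scores by an even factor with a bilinear transposed convolution."""
    k = 2 * factor
    kernel = bilinear_kernel(k)[:, :, None]
    h, w, c = scores.shape
    out = np.zeros(((h - 1) * factor + k, (w - 1) * factor + k, c))
    for i in range(h):
        for j in range(w):
            # Each input pixel stamps a weighted copy of the kernel into the output.
            out[i * factor:i * factor + k, j * factor:j * factor + k] += kernel * scores[i, j]
    pad = factor // 2  # crop so the result is exactly factor times larger
    return out[pad:pad + h * factor, pad:pad + w * factor]

rng = np.random.default_rng(0)
num_classes, channels = 21, 32                        # channels: hypothetical reduced width
pool4 = rng.normal(size=(8, 8, channels))             # stride-16 features
conv7_scores = rng.normal(size=(4, 4, num_classes))   # stride-32 class predictions
w_1x1 = rng.normal(scale=0.01, size=(channels, num_classes))

pool4_scores = pool4 @ w_1x1                  # step 1: 1 x 1 conv = per-pixel matmul
conv7_up = upsample(conv7_scores, 2)          # step 2: stride 32 -> stride 16
fused = pool4_scores + conv7_up               # step 3: sum the stride-16 predictions
full = upsample(fused, 16)                    # step 4: back to image resolution
print(fused.shape, full.shape)
```

In a trained network the 1 x 1 weights and the upsampling kernels would be parameters updated by backpropagation; here they are fixed so that the data flow is easy to follow.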


stride 32

no skips

stride 16

1 skip

stride 8

2 skips

ground truth

input image

Skip layer refinement


Training + Testing
- Train on a full image at a time, without patch sampling
- Reshape the network to take input of any size
- Forward time is ~100 ms for a 500 x 500 x 21 output (this is really fast!)


Qualitative Results

FCN SDS* Truth Input


Relative to prior state-of-the-art SDS:

- 30% relative improvement for mean IoU

- 286× faster

*Simultaneous Detection and Segmentation Hariharan et al. ECCV14
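For reference, mean IoU and a relative improvement can be computed as in the sketch below. The toy label maps and the scores 0.50 and 0.65 are hypothetical and are not results from either paper; skipping classes that appear in neither map is one common convention and is an assumption here.

```python
import numpy as np

def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union across classes present in pred or truth."""
    # Confusion matrix: rows = ground truth class, columns = predicted class.
    conf = np.bincount(num_classes * truth.ravel() + pred.ravel(),
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    present = union > 0
    return float((intersection[present] / union[present]).mean())

def relative_improvement(new, old):
    """Relative gain of `new` over `old`, e.g. 0.30 means 30%."""
    return (new - old) / old

truth = np.array([[0, 0, 1],
                  [1, 1, 2]])
pred = np.array([[0, 1, 1],
                 [1, 1, 2]])

miou = mean_iou(pred, truth, num_classes=3)   # per-class IoU: 1/2, 3/4, 1/1
gain = relative_improvement(0.65, 0.50)       # hypothetical scores
print(miou, gain)
```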


Ghosts sitting on that boat?!!

Qualitative Results


Experimental Setup

1) AlexNet architecture
2) VGG nets: pick the VGG 16-layer net
3) GoogLeNet: use only the final loss layer, and improve performance by discarding the final average pooling layer.

*Decapitate each net by discarding the final classifier layer, and convert all fully connected layers to convolutions.


Quantitative Results

Quantitative Results

SIFT Flow    NYUDv2

PASCAL VOC 2011 8498-training


Potential Extensions

A boring extension: if we directly use shallower layers and upsample without fusing with deeper layers, how bad would it be?

An interesting, promising and intuitive extension: what the next paper attempts to address =>


Instance-sensitive Fully Convolutional Networks

Jifeng Dai, Kaiming He, Jian Sun. Microsoft Research
Yi Li. Tsinghua University (while interning at Microsoft Research)
Shaoqing Ren. University of Science and Technology of China (while interning at Microsoft Research)

Problem to solve: Instance-level Segmentation. Pixels in, pixels out.

Major Contributions

A fully convolutional network architecture that:

1) Computes a set of instance-sensitive score maps

a) Each pixel is a classifier of relative positions to an object instance

b) The score maps are assembled to output an instance candidate at each position

2) Reuses semantic segmentation results

3) Exploits image local coherence

a) w/o any high-dimensional layer related to the mask resolution (compare with DeepMask)

Major Contributions

A fully convolutional network architecture for instance-level segmentation.

Recall: Upsampling output

NOTE: Upsampled output is H x W x (class number + 1)

Each H x W slice shows the heat map for one category


> Generate instance-sensitive score maps > Assemble

Generate a set of k x k instance-sensitive score maps (for instance k = 3)

[Figure: the k x k = 9 score maps, numbered #1 to #9 in a 3 x 3 grid, and the assembled output; size label: m x m x (k x k)]


NOTE: Not every position visited by the sliding window contains an object.
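The assembling step can be sketched as follows, assuming an m x m sliding window split into k x k bins, where each bin copies its values from the score map responsible for that relative position. The function name, the window position and the toy score maps are hypothetical, and the sketch assumes m is divisible by k.

```python
import numpy as np

def assemble_instance(score_maps, top, left, m):
    """Assemble one m x m instance mask from k*k instance-sensitive score maps.

    score_maps: (k*k, H, W) array; map index by*k + bx handles bin (by, bx).
    (top, left): upper-left corner of the sliding window; m must be divisible by k.
    """
    k = int(round(np.sqrt(score_maps.shape[0])))
    bin_size = m // k
    mask = np.empty((m, m))
    for by in range(k):
        for bx in range(k):
            ys = slice(by * bin_size, (by + 1) * bin_size)
            xs = slice(bx * bin_size, (bx + 1) * bin_size)
            # Window cut out of the score map for this relative position.
            window = score_maps[by * k + bx, top:top + m, left:left + m]
            mask[ys, xs] = window[ys, xs]
    return mask

# Toy score maps: map i is filled with the value i, so the assembled mask
# shows directly which map each bin was copied from.
k, H, W = 3, 10, 10
score_maps = np.stack([np.full((H, W), float(i)) for i in range(k * k)])
mask = assemble_instance(score_maps, top=2, left=3, m=6)
print(mask)
```

Sliding the window by one pixel reuses almost all of the same score values, which is how this design exploits local coherence.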


Complete Instance-level Segmentation Network - 2 Branches
Upper: Generate instance-sensitive score maps and assemble
Bottom: Generate objectness scores

Experimental Setup

1) Use the VGG-16 network pre-trained on ImageNet as the feature extractor.
2) The 13 convolutional layers in VGG-16 are applied fully convolutionally on an input image of arbitrary size.
3) Reduce the network stride and increase feature map resolution:
a) the max pooling layer pool4 (between conv4_3 and conv5_1) is modified to have a stride of 1 instead of 2,
b) accordingly the filters in conv5_1 to conv5_3 are adjusted by the “hole algorithm”.

*Using this modified VGG network, the effective stride of the conv5_3 feature map is s = 8 pixels w.r.t. the input image.
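The stride arithmetic and the idea behind the “hole algorithm” can be checked with a short sketch. It assumes the standard VGG-16 layout in which pool1 to pool4 (stride 2 each) precede conv5_3, and it uses a 1-D signal with a hypothetical 3-tap filter to stand in for the 2-D convolutions.

```python
import numpy as np

def effective_stride(pool_strides):
    """Effective stride of a feature map = product of the strides before it."""
    return int(np.prod(pool_strides))

def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D cross-correlation with holes (dilation) between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1
    n_out = len(signal) - span + 1
    return np.array([
        sum(kernel[t] * signal[i + t * dilation] for t in range(len(kernel)))
        for i in range(n_out)
    ])

original = effective_stride([2, 2, 2, 2])   # pool1..pool4 with stride 2
modified = effective_stride([2, 2, 2, 1])   # pool4 changed to stride 1

# With pool4 at stride 1 the conv5 input is twice as dense. Inserting holes
# (dilation 2) lets the same filter taps see the same samples as before.
signal = np.arange(12, dtype=float) ** 2
kernel = np.array([1.0, -2.0, 1.0])          # hypothetical 3-tap filter
dense = dilated_conv1d(signal, kernel, dilation=2)       # filter with holes, dense input
coarse = dilated_conv1d(signal[::2], kernel, dilation=1) # original filter, subsampled input
print(original, modified, np.allclose(dense[::2], coarse))
```

Every second output of the dilated filter matches the original filter on the subsampled signal, so the pre-trained weights keep their meaning while the feature map becomes denser.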

DeepMask

Looks similar, but it does not exploit local coherence.

Quantitative Results

Ablation comparisons on the PASCAL VOC 2012 validation set

Quantitative Results

Performance evaluations on the PASCAL VOC 2012 validation set

Quantitative Results

Performance evaluations on the MS COCO validation set


Qualitative Result


Strengths and Weaknesses

Strengths:
1) Both papers address very important problems efficiently with fully convolutional networks.
2) Both papers propose novel network architectures.
3) Both papers have convincing experiments.
a) Visualization and numerical results are clear and convincing.
4) The discussion of the convolution operations in the first paper is helpful for interpreting and better understanding convolutional networks.
5) The second paper doesn’t require a separate process to generate region proposals.

Weaknesses:
1) How the training data is used is never clearly addressed.
a) What ground truth is used together with the forwarded heat maps for the loss functions?
i) The first paper is intuitive in this part, but the second paper is very confusing.
2) Several essential points are unclear in the second paper.
a) Did the second paper use skip layers?
b) Where did the second paper upsample? Or did it not upsample at all?
3) The relative location grids in the second paper work well but look strange:
a) One person’s “left” could be another’s “right”, but each channel is in charge of the relative location for all sliding windows.


Potential directions

1) Other tasks to be resolved by fully convolutional networks
a) Scene recognition?
(1) Semantic combination of objects

2) Why is the size of sliding windows fixed in the second paper?
a) Many small instances crowded together.

3) What about combining box-level object recognition with semantic segmentation?

Image Source: https://www.pinterest.com/pin/369787819374178444/ https://www.pinterest.com/pin/399553798160612769/

Backup Slides

Datasets:

+ NYUD net for multi-modal input and SIFT Flow net for multi-task output

PASCAL VOC: Table 3 gives the performance of our FCN-8s on the test sets of PASCAL VOC 2011 and 2012, and compares it to the previous state-of-the-art, SDS [17], and the well-known R-CNN [12].

NYUDv2 [33] is an RGB-D dataset collected using the Microsoft Kinect. It has 1449 RGB-D images, with pixelwise labels that have been coalesced into a 40 class semantic segmentation task by Gupta et al. [14].

SIFT Flow is a dataset of 2,688 images with pixel labels for 33 semantic categories (“bridge”, “mountain”, “sun”), as well as three geometric categories (“horizontal”, “vertical”, and “sky”).

Past and future history of fully convolutional networks


history

Convolutional Locator Network
Wolf & Platt 1994

Shape Displacement Network
Matan & LeCun 1992

Scale Pyramid, Burt & Adelson ‘83

pyramids


The scale pyramid is a classic multi-resolution representation.

Fusing multi-resolution network layers is a learned, nonlinear counterpart.


Jet, Koenderink & Van Doorn ‘87

jets

The local jet collects the partial derivatives at a point for a rich local description.

The deep jet collects layer compositions for a rich, learned description.


extensions

- more tasks
- random fields
- weak supervision

many pixelwise tasks

semantic segmentation

monocular depth estimation Eigen & Fergus 2015

boundary prediction Xie & Tu 2015

optical flow Fischer et al. 2015

fully conv. nets + random fields

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. Chen* & Papandreou* et al. ICLR 2015.

fully conv. nets + random fields

Conditional Random Fields as Recurrent Neural Networks. Zheng* & Jayasumana* et al. arXiv 2015.

[ comparison credit: CRF as RNN, Zheng* & Jayasumana* et al. ICCV 2015 ]

DeepLab: Chen* & Papandreou* et al. ICLR 2015. CRF-RNN: Zheng* & Jayasumana* et al. ICCV 2015.

fully conv. nets + weak supervision

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Pathak et al. arXiv 2015.

FCNs expose a spatial loss map to guide learning: segment from tags by MIL or pixelwise constraints.

fully conv. nets + weak supervision

BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation. Dai et al. 2015.

FCNs expose a spatial loss map to guide learning: mine boxes + feedback to refine masks.

leaderboard

== segmentation with Caffe

[Figure: leaderboard table with entries marked “FCN”]

caffeinated contemporaries

Hypercolumn, SDS
Hariharan, Arbeláez, Girshick, Malik

Zoom-Out
Mostajabi, Yadollahpour, Shakhnarovich

Convolutional Feature Masking
Dai, He, Sun