Efficient 360-degree Visual Perception
Min Sun
Assistant Professor
National Tsing Hua University
1
The Power of 360 Camera
2
The Power of 360 Camera
3
The Market of 360 Cameras is Booming
4
Applications: Virtual Reality
5
Applications: Autonomous Systems
Indoor Robot · Self-Driving Car · Drone
6
Cube Padding for Unsupervised Saliency Prediction in 360 Videos
Hsien-Tzu Cheng, Chun-Hung Chao, Jin-Dong Dong, Hao-Kai Wen, Tyng-Luh Liu, Min Sun
7
Motivation
8
Motivation
9
Our Goal – Predict Salient Regions
• Self-supervised training
• Computational efficiency
10
Our Goal – Automatic View Selection
11
Overview
Challenges of equirectangular input:
• Image distortion
• Image boundary
CNN on the Cube (see the sketch below) gives robust saliency:
• on top and bottom
• across the boundary
12
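To make "CNN on the Cube" concrete: before any learning, each equirectangular frame is resampled into six cube faces. Below is a minimal NumPy sketch of that resampling; it is my illustration, not the authors' code, and the face names and orientation conventions are assumptions.

```python
import numpy as np

def equirect_to_cube(equi, face_size=256):
    """Resample an equirectangular frame (H x W x 3) into six cube faces.

    Face names and orientations are an assumed convention, not the
    paper's; nearest-neighbour sampling keeps the sketch short.
    """
    H, W, _ = equi.shape
    # Pixel-centre coordinates of one face, in [-1, 1].
    a = (np.arange(face_size) + 0.5) / face_size * 2 - 1
    u, v = np.meshgrid(a, -a)  # v increases upward
    one = np.ones_like(u)

    # A unit-cube direction for every pixel of every face.
    dirs = {
        'front': ( one,  u,   v),
        'back':  (-one, -u,   v),
        'right': (-u,   one,  v),
        'left':  ( u,  -one,  v),
        'up':    (-v,   u,   one),
        'down':  ( v,   u,  -one),
    }

    faces = {}
    for name, (x, y, z) in dirs.items():
        r = np.sqrt(x**2 + y**2 + z**2)
        lon = np.arctan2(y, x)      # longitude in [-pi, pi]
        lat = np.arcsin(z / r)      # latitude  in [-pi/2, pi/2]
        # Spherical coordinates -> equirectangular pixel indices.
        col = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
        row = ((0.5 - lat / np.pi) * H).astype(int).clip(0, H - 1)
        faces[name] = equi[row, col]
    return faces
```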
Outline
• Our method
• Dataset
• Result
• Conclusion
13
Cube Padding
14
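Cube Padding replaces zero padding at each cube-face boundary with features copied from the adjacent face, so every convolution sees a seamless 360° signal instead of an artificial image border. A simplified PyTorch sketch (mine, not the paper's implementation) for the four side faces; a faithful version must also rotate and stitch in the top and bottom faces.

```python
import torch

def cube_pad_sides(faces, pad=1):
    """Pad the four side faces with columns from their ring neighbours.

    faces: [4, C, H, W], ordered front, right, back, left so that face i's
    right edge touches face (i+1) % 4's left edge (assumed layout).
    Top/bottom rows would analogously come from the up/down faces
    (omitted here for brevity).
    """
    left_nbr = torch.roll(faces, shifts=1, dims=0)    # neighbour on the left
    right_nbr = torch.roll(faces, shifts=-1, dims=0)  # neighbour on the right
    return torch.cat([
        left_nbr[..., -pad:],    # rightmost columns of the left neighbour
        faces,
        right_nbr[..., :pad],    # leftmost columns of the right neighbour
    ], dim=-1)                   # -> [4, C, H, W + 2 * pad]
```

Each convolutional layer pads its input this way and then convolves with padding=0, so features stay continuous across face boundaries at negligible extra cost.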
Feature Visualization
15
Model
16
Model – Static Model
17
Learning Deep Features for Discriminative Localization, Zhou et al. CVPR16
“achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation”
VGG-16
ResNet-50
[Diagram: features (B × H × W × [fc channel]) → avg. pool / CAM-conv with 1000 × [fc channel] weights → class activation maps (B × H × W × 1000) → saliency map (B × H × W × 1), collapsed over a specific class by Maximum or Weighted aggregation.]
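In code, the static model follows the CAM recipe of Zhou et al.: the ImageNet classifier's fully connected weights are applied as a 1×1 convolution ("CAM-conv") over the final feature map, and the 1000 class activation maps are collapsed into one saliency map by a maximum or a weighted sum over classes. A hedged PyTorch sketch (variable names are mine):

```python
import torch
import torch.nn.functional as F

def cam_saliency(features, fc_weight, mode='max'):
    """features:  [B, C, H, W] last conv feature map (e.g. ResNet-50, C=2048)
    fc_weight: [1000, C]     ImageNet classifier weights
    returns:   [B, 1, H, W]  saliency map
    """
    # "CAM-conv": the fully connected layer applied as a 1x1 convolution.
    cams = F.conv2d(features, fc_weight[:, :, None, None])  # [B, 1000, H, W]
    if mode == 'max':
        # Maximum: strongest class response at each location.
        sal = cams.max(dim=1, keepdim=True).values
    else:
        # Weighted: blend the maps by the predicted class distribution.
        scores = F.softmax(cams.mean(dim=(2, 3)), dim=1)    # [B, 1000]
        sal = (cams * scores[:, :, None, None]).sum(dim=1, keepdim=True)
    return sal
```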
Model – Temporal Model
21
Model – Loss function design
𝑂 (static saliency), 𝑚 (motion)
• ‖Warp(𝑂_t, 𝑚) − 𝑂_{t+1}‖² (warping consistency across frames)
• ‖𝑂_t − 𝑂_{t+1}‖² (temporal smoothness)
• ‖Mask(𝑂_t, 𝑚)‖² (motion-aware masking)
22
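A minimal PyTorch sketch of such self-supervised temporal terms. The blanks inside Warp(·, ·) and Mask(·, ·) were heatmap images in the original slides, so the exact pairings below are my reading, not the authors' definition; the motion map 𝑚 is taken as optical-flow magnitude.

```python
import torch
import torch.nn.functional as F

def warp(sal, flow):
    """Backward-warp saliency [B,1,H,W] with optical flow [B,2,H,W]."""
    B, _, H, W = sal.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    base = torch.stack((xs, ys), dim=0).float().to(sal.device)  # [2, H, W]
    target = base + flow                                        # [B, 2, H, W]
    u = target[:, 0] / (W - 1) * 2 - 1   # normalize to [-1, 1]
    v = target[:, 1] / (H - 1) * 2 - 1
    grid = torch.stack((u, v), dim=-1)   # [B, H, W, 2]
    return F.grid_sample(sal, grid, align_corners=True)

def temporal_losses(sal_t, sal_t1, flow_t):
    """sal_t, sal_t1: saliency maps O_t, O_{t+1}; flow_t: flow from t to t+1."""
    m = flow_t.norm(dim=1, keepdim=True)                 # motion magnitude
    m = m / (m.amax(dim=(2, 3), keepdim=True) + 1e-6)    # per-frame normalize
    l_warp = (warp(sal_t, flow_t) - sal_t1).pow(2).mean()  # warping consistency
    l_smooth = (sal_t - sal_t1).pow(2).mean()              # temporal smoothness
    mask = (m > 0.1).float()                 # assumed threshold for "moving"
    l_mask = (mask * (sal_t - m)).pow(2).mean()  # pull saliency toward motion
    return l_warp, l_smooth, l_mask
```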
Outline
• Our method
• Dataset
• Result
• Conclusion & Future work
27
Dataset – Wild-360
Train/test split: 60/25 clips, 50k frames in total
30 annotators in total, 80 trajectories per video
28
Outline
• Our method
• Dataset
• Result
• Conclusion
29
https://www.youtube.com/watch?v=rlR6fTvpWBg
30
Result
31
Result
[Chart: runtime comparison in frames per second (fps) across methods.]
32
Result – Human study
33
https://www.youtube.com/watch?v=AQbgHt_oU3c
34
Outline
• Our method
• Dataset
• Result
• Conclusion
36
Conclusion
• Simple and effective Cube Padding (CP) technique
• Novel spatial-temporal network
• Unsupervised training with the designed temporal loss functions
• Wild-360 dataset with videos and saliency heatmap annotations
• Outperforms state-of-the-art methods in both speed and quality
37
Self-Supervised Learning of Depth from 360° Videos
Under Submission
38
Our Goal – 360 Depth Prediction
39
Applications: Autonomous Systems
Indoor Robot · Self-Driving Car · Drone
40
Our Model
[Diagram: DNet predicts depth 𝑫𝟏 from frame 𝑰𝟏; PNet predicts the relative camera motion R, T (poses 𝑷𝟏, 𝑷𝟐) between frames 𝑰𝟏 and 𝑰𝟐; depth plus pose back-project 𝑰𝟏 into the point cloud 𝑸𝟏.]
Zhou et al., Unsupervised Learning of Depth and Ego-Motion from Video, CVPR 2017
I: Equirectangular / Cube frame, D: Depth, P: Camera motion, Q: Point cloud
41
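Following Zhou et al. (CVPR 2017), the supervision signal is view synthesis: DNet's depth 𝑫𝟏 plus PNet's relative pose (R, T) reproject frame 𝑰𝟐 into frame 𝑰𝟏's viewpoint, and the photometric difference trains both networks without any depth labels. A compact sketch for a single pinhole cube face with an assumed intrinsics matrix K (the full model works on the 360° projection):

```python
import torch
import torch.nn.functional as F

def photometric_loss(I1, I2, D1, R, T, K):
    """I1, I2: [B,3,H,W] frames; D1: [B,1,H,W] depth of view 1;
    R: [B,3,3], T: [B,3,1] pose of view 2 w.r.t. view 1; K: [B,3,3] intrinsics.
    """
    B, _, H, W = I1.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    ones = torch.ones_like(xs)
    pix = torch.stack((xs, ys, ones), 0).float().reshape(1, 3, -1).to(I1.device)
    # Back-project view-1 pixels to 3D points, then project into view 2.
    pts = D1.reshape(B, 1, -1) * (K.inverse() @ pix)   # [B, 3, H*W]
    proj = K @ (R @ pts + T)                           # [B, 3, H*W]
    uv = proj[:, :2] / (proj[:, 2:3] + 1e-6)           # pixel coords in view 2
    # Normalize for grid_sample and synthesize view 1 from I2.
    u = uv[:, 0] / (W - 1) * 2 - 1
    v = uv[:, 1] / (H - 1) * 2 - 1
    grid = torch.stack((u, v), dim=-1).reshape(B, H, W, 2)
    I1_hat = F.grid_sample(I2, grid, align_corners=True)
    return (I1_hat - I1).abs().mean()                  # L1 photometric error
```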
Dataset – PanoSUNCG
[Examples: frames with their inverse-depth maps at times 𝑡1 and 𝑡2.]
42
Our Results
https://drive.google.com/open?id=1BhTwMxtPyoNcny-qyCqJN-FgvqMuCPFA
43
Quantitative Results – Depth
44
Efficiency – Speedup Ratio
45
Qualitative Results – Real-world Videos
[Side by side: input frame and our depth prediction.]
46
Conclusion
360° saliency and depth prediction systems:
• Cube Padding (CP) technique is simple and effective
• Self-supervised training is important for both systems to scale up
• Outperform other state-of-the-art methods in both speed and quality
47
Thanks
Q & A
48