Efficient 360-degree Visual Perception
Min Sun
Assistant Professor
National Tsing Hua University
1
The Power of 360 Camera
2
The Power of 360 Camera
3
The Market of 360 Cameras is Booming
4
Applications: Virtual Reality
5
Applications: Autonomous Systems
Indoor Robot · Self-Driving Car · Drone
6
Cube Padding for Unsupervised Saliency Prediction in 360 Videos
Hsien-Tzu Cheng, Chun-Hung Chao, Jin-Dong Dong, Hao-Kai Wen, Tyng-Luh Liu, Min Sun
7
Motivation
8
Motivation
9
Our Goal – Predict Salient Regions
• Self-supervised training
• Computational efficiency
10
Our Goal – Automatic View Selection
11
Overview
Challenges of equirectangular input:
• Image distortion
• Image boundary
CNN on the Cube (see the sketch below) gives robust saliency:
• on top and bottom
• across the boundary
12
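To make "CNN on the Cube" concrete: before any learning, each equirectangular frame is resampled into six cube faces. Below is a minimal NumPy sketch of that resampling; it is my illustration, not the authors' code, and the face names and orientation conventions are assumptions.

```python
import numpy as np

def equirect_to_cube(equi, face_size=256):
    """Resample an equirectangular frame (H x W x 3) into six cube faces.

    Face names and orientations are an assumed convention, not the
    paper's; nearest-neighbour sampling keeps the sketch short.
    """
    H, W, _ = equi.shape
    # Pixel-centre coordinates of one face, in [-1, 1].
    a = (np.arange(face_size) + 0.5) / face_size * 2 - 1
    u, v = np.meshgrid(a, -a)  # v increases upward
    one = np.ones_like(u)

    # A unit-cube direction for every pixel of every face.
    dirs = {
        'front': ( one,  u,   v),
        'back':  (-one, -u,   v),
        'right': (-u,   one,  v),
        'left':  ( u,  -one,  v),
        'up':    (-v,   u,   one),
        'down':  ( v,   u,  -one),
    }

    faces = {}
    for name, (x, y, z) in dirs.items():
        r = np.sqrt(x**2 + y**2 + z**2)
        lon = np.arctan2(y, x)      # longitude in [-pi, pi]
        lat = np.arcsin(z / r)      # latitude  in [-pi/2, pi/2]
        # Spherical coordinates -> equirectangular pixel indices.
        col = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
        row = ((0.5 - lat / np.pi) * H).astype(int).clip(0, H - 1)
        faces[name] = equi[row, col]
    return faces
```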
Outline
• Our method
• Dataset
• Result
• Conclusion
13
Cube Padding
14
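Cube Padding replaces zero padding at each cube-face boundary with features copied from the adjacent face, so every convolution sees a seamless 360° signal instead of an artificial image border. A simplified PyTorch sketch (mine, not the paper's implementation) for the four side faces; a faithful version must also rotate and stitch in the top and bottom faces.

```python
import torch

def cube_pad_sides(faces, pad=1):
    """Pad the four side faces with columns from their ring neighbours.

    faces: [4, C, H, W], ordered front, right, back, left so that face i's
    right edge touches face (i+1) % 4's left edge (assumed layout).
    Top/bottom rows would analogously come from the up/down faces
    (omitted here for brevity).
    """
    left_nbr = torch.roll(faces, shifts=1, dims=0)    # neighbour on the left
    right_nbr = torch.roll(faces, shifts=-1, dims=0)  # neighbour on the right
    return torch.cat([
        left_nbr[..., -pad:],    # rightmost columns of the left neighbour
        faces,
        right_nbr[..., :pad],    # leftmost columns of the right neighbour
    ], dim=-1)                   # -> [4, C, H, W + 2 * pad]
```

Each convolutional layer pads its input this way and then convolves with padding=0, so features stay continuous across face boundaries at negligible extra cost.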
Feature Visualization
15
Model
16
Model – Static Model
17
Learning Deep Features for Discriminative Localization, Zhou et al. CVPR16
“achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation”
VGG-16
ResNet-50
[Diagram: features (B × H × W × [fc channel]) → avg. pool / CAM-conv with 1000 × [fc channel] weights → class activation maps (B × H × W × 1000) → saliency map (B × H × W × 1), collapsed over a specific class by Maximum or Weighted aggregation.]
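In code, the static model follows the CAM recipe of Zhou et al.: the ImageNet classifier's fully connected weights are applied as a 1×1 convolution ("CAM-conv") over the final feature map, and the 1000 class activation maps are collapsed into one saliency map by a maximum or a weighted sum over classes. A hedged PyTorch sketch (variable names are mine):

```python
import torch
import torch.nn.functional as F

def cam_saliency(features, fc_weight, mode='max'):
    """features:  [B, C, H, W] last conv feature map (e.g. ResNet-50, C=2048)
    fc_weight: [1000, C]     ImageNet classifier weights
    returns:   [B, 1, H, W]  saliency map
    """
    # "CAM-conv": the fully connected layer applied as a 1x1 convolution.
    cams = F.conv2d(features, fc_weight[:, :, None, None])  # [B, 1000, H, W]
    if mode == 'max':
        # Maximum: strongest class response at each location.
        sal = cams.max(dim=1, keepdim=True).values
    else:
        # Weighted: blend the maps by the predicted class distribution.
        scores = F.softmax(cams.mean(dim=(2, 3)), dim=1)    # [B, 1000]
        sal = (cams * scores[:, :, None, None]).sum(dim=1, keepdim=True)
    return sal
```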
Model – Temporal Model
21
Model – Loss function design
𝑂 (static saliency), 𝑚 (motion)
• ‖Warp(𝑂_t, 𝑚) − 𝑂_{t+1}‖² (warping consistency across frames)
• ‖𝑂_t − 𝑂_{t+1}‖² (temporal smoothness)
• ‖Mask(𝑂_t, 𝑚)‖² (motion-aware masking)
22
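A minimal PyTorch sketch of such self-supervised temporal terms. The blanks inside Warp(·, ·) and Mask(·, ·) were heatmap images in the original slides, so the exact pairings below are my reading, not the authors' definition; the motion map 𝑚 is taken as optical-flow magnitude.

```python
import torch
import torch.nn.functional as F

def warp(sal, flow):
    """Backward-warp saliency [B,1,H,W] with optical flow [B,2,H,W]."""
    B, _, H, W = sal.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    base = torch.stack((xs, ys), dim=0).float().to(sal.device)  # [2, H, W]
    target = base + flow                                        # [B, 2, H, W]
    u = target[:, 0] / (W - 1) * 2 - 1   # normalize to [-1, 1]
    v = target[:, 1] / (H - 1) * 2 - 1
    grid = torch.stack((u, v), dim=-1)   # [B, H, W, 2]
    return F.grid_sample(sal, grid, align_corners=True)

def temporal_losses(sal_t, sal_t1, flow_t):
    """sal_t, sal_t1: saliency maps O_t, O_{t+1}; flow_t: flow from t to t+1."""
    m = flow_t.norm(dim=1, keepdim=True)                 # motion magnitude
    m = m / (m.amax(dim=(2, 3), keepdim=True) + 1e-6)    # per-frame normalize
    l_warp = (warp(sal_t, flow_t) - sal_t1).pow(2).mean()  # warping consistency
    l_smooth = (sal_t - sal_t1).pow(2).mean()              # temporal smoothness
    mask = (m > 0.1).float()                 # assumed threshold for "moving"
    l_mask = (mask * (sal_t - m)).pow(2).mean()  # pull saliency toward motion
    return l_warp, l_smooth, l_mask
```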
Outline
• Our method
• Dataset
• Result
• Conclusion & Future work
27
Dataset – Wild-360
Train/test split: 60/25 clips, 50k frames in total
30 annotators in total, 80 trajectories per video
28
Outline
• Our method
• Dataset
• Result
• Conclusion
29
https://www.youtube.com/watch?v=rlR6fTvpWBg
30
Result
31
Result
[Chart: runtime comparison in frames per second (fps) across methods.]
32
Result – Human study
33
https://www.youtube.com/watch?v=AQbgHt_oU3c
34
Outline
• Our method
• Dataset
• Result
• Conclusion
36
Conclusion
• Simple and effective Cube Padding (CP) technique
• Novel spatial-temporal network
• Unsupervised training with the designed temporal loss functions
• Wild-360 dataset with videos and saliency heatmap annotations
• Outperforms state-of-the-art methods in both speed and quality
37
Self-Supervised Learning of Depth from 360° Videos
Under Submission
38
Our Goal – 360 Depth Prediction
39
Applications: Autonomous Systems
Indoor Robot · Self-Driving Car · Drone
40
Our Model
[Diagram: DNet predicts depth 𝑫𝟏 from frame 𝑰𝟏; PNet predicts the relative camera motion R, T (poses 𝑷𝟏, 𝑷𝟐) between frames 𝑰𝟏 and 𝑰𝟐; depth plus pose back-project 𝑰𝟏 into the point cloud 𝑸𝟏.]
Zhou et al., Unsupervised Learning of Depth and Ego-Motion from Video, CVPR 2017
I: Equirectangular / Cube frame, D: Depth, P: Camera motion, Q: Point cloud
41
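Following Zhou et al. (CVPR 2017), the supervision signal is view synthesis: DNet's depth 𝑫𝟏 plus PNet's relative pose (R, T) reproject frame 𝑰𝟐 into frame 𝑰𝟏's viewpoint, and the photometric difference trains both networks without any depth labels. A compact sketch for a single pinhole cube face with an assumed intrinsics matrix K (the full model works on the 360° projection):

```python
import torch
import torch.nn.functional as F

def photometric_loss(I1, I2, D1, R, T, K):
    """I1, I2: [B,3,H,W] frames; D1: [B,1,H,W] depth of view 1;
    R: [B,3,3], T: [B,3,1] pose of view 2 w.r.t. view 1; K: [B,3,3] intrinsics.
    """
    B, _, H, W = I1.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    ones = torch.ones_like(xs)
    pix = torch.stack((xs, ys, ones), 0).float().reshape(1, 3, -1).to(I1.device)
    # Back-project view-1 pixels to 3D points, then project into view 2.
    pts = D1.reshape(B, 1, -1) * (K.inverse() @ pix)   # [B, 3, H*W]
    proj = K @ (R @ pts + T)                           # [B, 3, H*W]
    uv = proj[:, :2] / (proj[:, 2:3] + 1e-6)           # pixel coords in view 2
    # Normalize for grid_sample and synthesize view 1 from I2.
    u = uv[:, 0] / (W - 1) * 2 - 1
    v = uv[:, 1] / (H - 1) * 2 - 1
    grid = torch.stack((u, v), dim=-1).reshape(B, H, W, 2)
    I1_hat = F.grid_sample(I2, grid, align_corners=True)
    return (I1_hat - I1).abs().mean()                  # L1 photometric error
```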
Dataset – PanoSUNCG
[Examples: frames with their inverse-depth maps at times 𝑡1 and 𝑡2.]
42
Our Results
https://drive.google.com/open?id=1BhTwMxtPyoNcny-qyCqJN-FgvqMuCPFA
43
Quantitative Results – Depth
44
Efficiency – Speedup Ratio
45
Qualitative Results – Real-world Videos
[Side by side: input frame and our depth prediction.]
46
Conclusion
360° saliency and depth prediction systems:
• Cube Padding (CP) technique is simple and effective
• Self-supervised training is important for both systems to scale up
• Outperform other state-of-the-art methods in both speed and quality
47
Thanks
Q & A
48