Upload
others
View
15
Download
0
Embed Size (px)
Citation preview
VISIONLAB
Scene Segmentation with Dense Reconstruction from Monocular VideoJoshua Hernandez, Konstantine Tsotsos, Gottfried Graber, Trevor Campbell, Stefano Soatto
<[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Dynamic Means clustering of objects
Results
Geometry of objects
System Overview
Ground-Truth Geometric Seg. Object Seg. Ground-Truth ComparisonSingle-frame depth-layer segmentations
Precision
Reca
ll
P/R Curve for Industrial1 A nity Matrices
Object a nityGeometric a nity
F=0.703
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
10.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
F=0.632
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Reca
ll
Segmentation Precision/Recall0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
F=0.605F=0.648
Precision0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Frame segmentationsScene segmentations
F=0.628F=0.549
Precision
Reca
ll
P/R Curve for Park A nity Matrices
Object a nityGeometric a nity
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
10.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Reca
ll
Segmentation Precision/Recall0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Precision0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
10.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
F=0.729
F=0.836
Frame segmentationsScene segmentations
Park
Industrial
Single-frame depth-layer segmentation
Dense monocular reconstruction
Scene objects are defined by a mesh-traversal geodesic distance,
−1
−0.50
0.5
1
−1
−0.5
0
0.5
1
−1
−0.5
00.5
1
Con
cavi
ty W
eigh
t
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Concavity Penalty
Frame 1. 1 1 2 2
4 4
1 1 2 21 1
4
A B
Frame 2.
Frame 3.
dL from A:
Traversal Penalty1��01��0
dL from B:
Depth-Layer Penalty
Segm
enta
tion
Hist
ory
Dist
ance
0
0.5
1
�(d L)
0
0.5
1
Obj
ect D
istan
ce
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.60
0.5
1
Geometric Distance
Combined
where
Mesh Geodesic
Batch-sequential k-means-like algorithm produces temporally-consistent segmentations as (dij) evolves.
T. Campbell, M. Liu, et al., Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture. NIPS 2013.
A. Ayvaci and S. Soatto, Detachable Object Detection: Segmentation and Depth ordering from Short Baseline Video. PAMI 2012.
Each frame is segmented into depth layers according to occlusion constraints computed from depthmaps.
Dense 3D reconstruction and per-frame camera poses are computed from monocular video of indoor and outdoor scenes.
G. Graber, Realtime 3D Reconstruction. Master’s Thesis
Wednesday, November 19, 14