1
VISIONLAB Scene Segmentation with Dense Reconstruction from Monocular Video Joshua Hernandez, Konstantine Tsotsos, Gottfried Graber, Trevor Campbell, Stefano Soatto <[email protected] > <[email protected] > <[email protected] > <[email protected] > <[email protected] > Dynamic Means clustering of objects Results Geometry of objects System Overview Ground-Truth Geometric Seg. Object Seg. Ground-Truth Comparison Single-frame depth-layer segmentations Precision Recall P/R Curve for Industrial1 A nity Matrices Object a nity Geometric a nity F=0.703 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 F=0.632 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Recall Segmentation Precision/Recall 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 F=0.605 F=0.648 Precision 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Frame segmentations Scene segmentations F=0.628 F=0.549 Precision Recall P/R Curve for Park A nity Matrices Object a nity Geometric a nity 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Recall Segmentation Precision/Recall 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Precision 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 F=0.729 F=0.836 Frame segmentations Scene segmentations Park Industrial Single-frame depth-layer segmentation Dense monocular reconstruction Scene objects are defined by a mesh-traversal geodesic distance, 1 0.5 0 0.5 1 1 0.5 0 0.5 1 1 0.5 0 0.5 1 Concavity Weight 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Concavity Penalty 1 1 2 2 4 4 1 1 2 2 1 1 4 A B d L from A: Traversal Penalty 1 0 1 0 d L from B: Depth-Layer Penalty Segmentation History Distance 0 0.5 1 σ(d L ) 0 0.5 1 Object Distance 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 0 0.5 1 Geometric Distance Combined where Mesh Geodesic Batch-sequential k-means-like algorithm produces temporally-consistent segmentations as (d ij ) evolves. T. Campbell, M. Liu, et al., Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture. NIPS 2013. A. Ayvaci and S. Soatto, Detachable Object Detection: Segmentation and Depth ordering from Short Baseline Video. PAMI 2012. Each frame is segmented into depth layers according to occlusion constraints computed from depthmaps. Dense 3D reconstruction and per-frame camera poses are computed from monocular video of indoor and outdoor scenes. G. Graber, Realtime 3D Reconstruction. Master’s Thesis Wednesday, November 19, 14

System Overview - EECS Wiki Server

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: System Overview - EECS Wiki Server

VISIONLAB

Scene Segmentation with Dense Reconstruction from Monocular VideoJoshua Hernandez, Konstantine Tsotsos, Gottfried Graber, Trevor Campbell, Stefano Soatto

<[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]>

Dynamic Means clustering of objects

Results

Geometry of objects

System Overview

Ground-Truth Geometric Seg. Object Seg. Ground-Truth ComparisonSingle-frame depth-layer segmentations

Precision

Reca

ll

P/R Curve for Industrial1 A nity Matrices

Object a nityGeometric a nity

F=0.703

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

10.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

F=0.632

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Reca

ll

Segmentation Precision/Recall0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

F=0.605F=0.648

Precision0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Frame segmentationsScene segmentations

F=0.628F=0.549

Precision

Reca

ll

P/R Curve for Park A nity Matrices

Object a nityGeometric a nity

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

10.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Reca

ll

Segmentation Precision/Recall0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Precision0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

10.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

F=0.729

F=0.836

Frame segmentationsScene segmentations

Park

Industrial

Single-frame depth-layer segmentation

Dense monocular reconstruction

Scene objects are defined by a mesh-traversal geodesic distance,

−1

−0.50

0.5

1

−1

−0.5

0

0.5

1

−1

−0.5

00.5

1

Con

cavi

ty W

eigh

t

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

Concavity Penalty

Frame 1. 1 1 2 2

4 4

1 1 2 21 1

4

A B

Frame 2.

Frame 3.

dL from A:

Traversal Penalty1��01��0

dL from B:

Depth-Layer Penalty

Segm

enta

tion

Hist

ory

Dist

ance

0

0.5

1

�(d L)

0

0.5

1

Obj

ect D

istan

ce

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.60

0.5

1

Geometric Distance

Combined

where

Mesh Geodesic

Batch-sequential k-means-like algorithm produces temporally-consistent segmentations as (dij) evolves.

T. Campbell, M. Liu, et al., Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture. NIPS 2013.

A. Ayvaci and S. Soatto, Detachable Object Detection: Segmentation and Depth ordering from Short Baseline Video. PAMI 2012.

Each frame is segmented into depth layers according to occlusion constraints computed from depthmaps.

Dense 3D reconstruction and per-frame camera poses are computed from monocular video of indoor and outdoor scenes.

G. Graber, Realtime 3D Reconstruction. Master’s Thesis

Wednesday, November 19, 14