System Overview - EECS Wiki Server

VISIONLAB

Scene Segmentation with Dense Reconstruction from Monocular VideoJoshua Hernandez, Konstantine Tsotsos, Gottfried Graber, Trevor Campbell, Stefano Soatto

<[email protected]> <[email protected]> <[email protected]> <[email protected]> <[email protected]>

Dynamic Means clustering of objects

Results

Geometry of objects

System Overview

Ground-Truth Geometric Seg. Object Seg. Ground-Truth ComparisonSingle-frame depth-layer segmentations

Precision

Reca

ll

P/R Curve for Industrial1 A nity Matrices

Object a nityGeometric a nity

F=0.703

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

10.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

F=0.632

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Reca

ll

Segmentation Precision/Recall0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

F=0.605F=0.648

Precision0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Frame segmentationsScene segmentations

F=0.628F=0.549

Precision

Reca

ll

P/R Curve for Park A nity Matrices

Object a nityGeometric a nity

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

10.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Reca

ll

Segmentation Precision/Recall0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Precision0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

10.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

F=0.729

F=0.836

Frame segmentationsScene segmentations

Park

Industrial

Single-frame depth-layer segmentation

Dense monocular reconstruction

Scene objects are defined by a mesh-traversal geodesic distance,

−1

−0.50

0.5

1

−1

−0.5

0

0.5

1

−1

−0.5

00.5

1

Con

cavi

ty W

eigh

t

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

Concavity Penalty

Frame 1. 1 1 2 2

4 4

1 1 2 21 1

4

A B

Frame 2.

Frame 3.

dL from A:

Traversal Penalty1��01��0

dL from B:

Depth-Layer Penalty

Segm

enta

tion

Hist

ory

Dist

ance

0

0.5

1

�(d L)

0

0.5

1

Obj

ect D

istan

ce

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.60

0.5

1

Geometric Distance

Combined

where

Mesh Geodesic

Batch-sequential k-means-like algorithm produces temporally-consistent segmentations as (dij) evolves.

T. Campbell, M. Liu, et al., Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture. NIPS 2013.

A. Ayvaci and S. Soatto, Detachable Object Detection: Segmentation and Depth ordering from Short Baseline Video. PAMI 2012.

Each frame is segmented into depth layers according to occlusion constraints computed from depthmaps.

Dense 3D reconstruction and per-frame camera poses are computed from monocular video of indoor and outdoor scenes.

G. Graber, Realtime 3D Reconstruction. Master’s Thesis

Wednesday, November 19, 14

Documents

System Overview - EECS Wiki Server