Page 1: Models for Multi-View Object Class Detection Han-Pang Chiu 1

Models for Multi-View Object Class Detection

Han-Pang Chiu

1

Page 2:

Multi-View Object Class Detection

2

[Figure: training-set and test-set examples contrasting three settings: multi-view same object, multi-view object class, and single-view object class.]

Page 3:

The Roadblock

3

• All existing methods for multi-view object class detection require many real training images of objects from many viewpoints.

- The learning processes for the different viewpoints of the same object class should be related.

Page 4:

The Potemkin Model

The Potemkin¹ model can be viewed as a collection of parts, which are oriented 3D primitives.

- a 3D class skeleton: the arrangement of part centroids in 3D.

- 2D projective transforms: the shape change of each part from one view to another.

¹So-called "Potemkin villages" were artificial villages, constructed only of facades. Our models, too, are constructed of facades.

4

Page 5:

The Potemkin Model

Related Approaches

From 2D to 3D (trading data-efficiency against compatibility):

- multiple 2D models [Crandall07, Torralba04, Leibe07]

- cross-view constraints [Thomas06, Savarese07, Kushal07]

- explicit 3D model [Hoiem07, Yan07]

5

Page 6:

6

Two Uses of the Potemkin Model

1. Generate virtual training data → multi-view object class detection system (2D test image → detection result)

2. Reconstruct 3D shapes of detected objects → 3D understanding

Page 7:

7

Outline

- Potemkin Model: Basic / Generalized / 3D

- Estimation: class skeleton (real training data; supervised part labeling)

- Use: virtual training data generation

Page 8:

Definition of the Basic Potemkin Model

• A basic Potemkin model for an object class with N parts consists of:

- K view bins (partitioning 3D space)

- K projection matrices

- a class skeleton (S1, S2, …, SN): class-dependent

- N·K² 2D transformation matrices

8
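As a concrete sketch, the ingredients above can be collected in a small container. All names and shapes here are illustrative, not the thesis's implementation:

```python
# Hypothetical container for the basic Potemkin model's ingredients, following
# the slide's definition: K view bins, K projection matrices, a class skeleton
# S1..SN, and N*K^2 per-part view-to-view transforms.
from dataclasses import dataclass
import numpy as np

@dataclass
class BasicPotemkinModel:
    view_bins: list    # K viewing directions (e.g. azimuth/elevation pairs)
    projections: list  # K camera projection matrices, each 3x4
    skeleton: list     # N part centroids S1..SN in 3D (class-dependent)
    transforms: dict   # (part, src_view, dst_view) -> 3x3 projective matrix

    def transform_for(self, part, src_view, dst_view):
        # The 2D projective transform mapping `part` between two view bins.
        return self.transforms[(part, src_view, dst_view)]

# With N parts and K view bins there are N*K^2 transforms.
model = BasicPotemkinModel(
    view_bins=[(az, 0.0) for az in range(0, 360, 45)],  # K = 8 bins
    projections=[np.eye(3, 4)] * 8,
    skeleton=[np.zeros(3)] * 4,                          # N = 4 parts
    transforms={(p, i, j): np.eye(3)
                for p in range(4) for i in range(8) for j in range(8)},
)
print(len(model.transforms))  # 256
```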

Page 9:

9

Estimating the Basic Potemkin Model: Phase 1

- Learn 2D projective transforms T1, T2, T3, … (8 degrees of freedom each) from a 3D oriented primitive, mapping each view to every other view.

9
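Each learned transform is a 2D projective transform (homography) with 8 degrees of freedom. A minimal sketch of applying such a transform to part outline points in homogeneous coordinates:

```python
# Applying a 3x3 homography to a set of 2D points (homogeneous coordinates).
import numpy as np

def apply_homography(H, pts):
    """Map an (n, 2) array of 2D points through a 3x3 homography H."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # lift to homogeneous
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # divide out scale

# Sanity check: the identity homography leaves points unchanged.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
out = apply_homography(np.eye(3), square)
print(np.allclose(out, square))  # True
```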

Page 10:

10

Estimating the Basic Potemkin Model Phase 2

- We compute the 3D class skeleton for the target object class.

- Each part must be visible in at least two of the view bins of interest.

- We label the view bins and the parts of objects in real training images.
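Since each part is labeled in at least two views, its 3D centroid can be recovered by standard two-view triangulation. This linear (DLT) sketch is generic, not the thesis's exact estimation procedure; the camera matrices below are assumed for illustration:

```python
# Linear (DLT) triangulation of one 3D point from two calibrated views.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """P1, P2: 3x4 projection matrices; x1, x2: 2D observations."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # null vector of A is the homogeneous point
    Xh = Vt[-1]
    return Xh[:3] / Xh[3]

# Two cameras one unit apart along x; the point (0, 0, 5) projects to
# (0, 0) in the first view and (-0.2, 0) in the second.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = triangulate(P1, P2, (0.0, 0.0), (-0.2, 0.0))
print(np.allclose(X, [0.0, 0.0, 5.0]))  # True
```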

Page 11:

11

Using the Basic Potemkin Model

Page 12:

The Basic Potemkin Model: Estimating and Using

Estimating:

- Synthetic, class-independent: 3D model → 2D synthetic views → shape primitives → generic transforms

- Real, class-specific: target object class → few labeled images → skeleton → part transforms

Using:

- All labeled images + part transforms → combine parts → virtual, view-specific images

12

Page 13:

13

Problem of the Basic Potemkin Model

[Figure: the estimated 3D class skeleton of a chair (parts 1–6 on x, y, z axes) and six views generated from it.]

Page 14:

14

Outline

- Potemkin Model: Basic / Generalized / 3D

- Estimation: class skeleton; multiple primitives (real training data; supervised part labeling)

- Use: virtual training data generation

Page 15:

Multiple Oriented Primitives

• An oriented primitive is determined by its 3D shape and its starting view bin (azimuth, elevation).

[Figure: multiple primitives generate 2D transforms and 2D views over the K view bins (View 1, View 2, …, View K).]

15

Page 16:

[Figure: each 3D shape determines a 2D transform T between a pair of views, over K view bins.]

16

Page 17:

The Potemkin Model: Estimating and Using

Estimating:

- Synthetic, class-independent: 3D model → 2D synthetic views → primitive selection → shape primitives → generic transforms

- Real, class-specific: target object class → few labeled images → skeleton → part transforms

Using:

- All labeled images → infer part indicator → part transforms → combine parts → virtual, view-specific images

17

Page 18:

Greedy Primitive Selection

- Find the best set of M primitives to model all parts.

- Four primitives are enough to model four object classes (21 object parts).

18

[Figure: quality of transformation vs. number of greedily selected primitives, for chair, bicycle, car, aircraft, and all classes; candidate transforms T_m (m = 1, …, M) map part shapes from view A to view B.]
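The greedy selection above can be sketched as forward selection: repeatedly add the primitive that yields the largest gain in total transformation quality, with each part scored by the best primitive chosen so far. The quality scores and primitive names below are invented for illustration:

```python
# Greedy forward selection of primitives. quality[m][p] is an assumed
# precomputed transformation-quality score for modeling part p with
# primitive m (all values here are made up).

def greedy_select(quality, num_select):
    """quality: dict primitive -> {part: score in [0, 1]}."""
    parts = {p for scores in quality.values() for p in scores}
    chosen = []
    for _ in range(num_select):
        def total(cand):
            # Each part is covered by its best primitive among chosen + cand.
            pool = chosen + [cand]
            return sum(max(quality[m].get(p, 0.0) for m in pool)
                       for p in parts)
        best = max((m for m in quality if m not in chosen), key=total)
        chosen.append(best)
    return chosen

quality = {
    "slab":     {"seat": 0.9, "back": 0.8, "wheel": 0.2},
    "cylinder": {"seat": 0.3, "back": 0.4, "wheel": 0.9},
    "wedge":    {"seat": 0.5, "back": 0.6, "wheel": 0.3},
}
print(greedy_select(quality, 2))  # ['slab', 'cylinder']
```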

Page 19:

19

Primitive-Based Representation

Page 20:

The Influence of Multiple Primitives

• Better prediction of what objects look like in novel views (single primitive vs. multiple primitives).

20

Page 21:

21

Virtual Training Images

Page 22:

The Potemkin Model: Estimating and Using

Estimating:

- Synthetic, class-independent: 3D model → 2D synthetic views → primitive selection → shape primitives → generic transforms

- Real, class-specific: target object class → few labeled images → skeleton → part transforms

Using:

- All labeled images → infer part indicator → part transforms → combine parts → virtual, view-specific images

22

Page 23:

23

Outline

- Potemkin Model: Basic / Generalized

- Estimation: class skeleton; multiple primitives (real training data; supervised or self-supervised part labeling)

- Use: virtual training data generation

Page 24:

Self-Supervised Part Labeling

• For the target view, choose one model object and label its parts.

• The model object is then deformed to the other objects in the target view to transfer the part labels.
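A simple stand-in for the label-transfer step (the figures on this slide show shape-context matching with deformation; here labels are transferred by nearest neighbor after alignment, purely as an illustration):

```python
# Transfer part labels from a labeled model contour to an unlabeled target
# contour. A real system would first deform the model (e.g. via shape-context
# correspondences); nearest-neighbor lookup is used here only as a sketch.
import numpy as np

def transfer_labels(model_pts, model_labels, target_pts):
    """Give each target point the part label of its nearest model point."""
    labels = []
    for q in target_pts:
        d = np.linalg.norm(model_pts - q, axis=1)
        labels.append(model_labels[int(np.argmin(d))])
    return labels

model = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
parts = ["leg", "back", "seat"]
print(transfer_labels(model, parts, np.array([[0.1, 0.9], [0.9, 0.1]])))
# ['back', 'seat']
```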

[Figure: shape-context matching between chair contours: 100 sampled points per contour, 93 and 75 correspondences in two examples, with affine and shape-context matching costs.]

24

Page 25:

Multi-View Class Detection Experiment

• Detector: Crandall's system (CVPR05, CVPR07)

• Dataset: cars (partial PASCAL), chairs (collected by LIS)

• Training images per view (real/virtual): 20/100 (chairs), 15/50 (cars)

• Task: object/no object; no viewpoint identification

25

[Figure: ROC curves (true positive rate vs. false positive rate) for chairs and cars, comparing training conditions: real images; real images from all views; real + virtual (single primitive); real + virtual (multiple primitives); real + virtual (self-supervised).]

Page 26:

26

Outline

- Potemkin Model: Basic / Generalized / 3D

- Estimation: class skeleton; multiple primitives; class planes (real training data; supervised or self-supervised part labeling)

- Use: virtual training data generation

Page 27:

27

Definition of the 3D Potemkin Model

3D Space

K view bins

- K view bins

- K projection matrices, K rotation matrices (each 3×3)

- a class skeleton (S1, S2, …, SN)

- K part-labeled images

- N 3D planes Q_i (i = 1, …, N): a_i X + b_i Y + c_i Z + d_i = 0

• A 3D Potemkin model for an object class with N parts.
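Each part plane a_i X + b_i Y + c_i Z + d_i = 0 can be estimated by a standard least-squares fit to the part's reconstructed 3D points. This SVD-based sketch is generic, not necessarily the thesis's exact estimation procedure:

```python
# Least-squares plane fit: the plane normal is the direction of least
# variance of the centered points (last right-singular vector).
import numpy as np

def fit_plane(points):
    """Fit a*X + b*Y + c*Z + d = 0 to an (n, 3) point array."""
    centroid = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - centroid)
    normal = Vt[-1]                 # unit normal of the best-fit plane
    d = -normal @ centroid          # plane passes through the centroid
    return (*normal, d)

# Points on the plane z = 1 should yield normal (0, 0, ±1) and z-intercept 1.
pts = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1.0]])
a, b, c, d = fit_plane(pts)
print(np.isclose(abs(c), 1.0) and np.isclose(d / -c, 1.0))  # True
```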

Page 28:

28

3D Representation

• Efficiently capture prior knowledge of the 3D shapes of the target object class.

• The object class is represented as a collection of parts, which are oriented 3D primitive shapes.

• This representation is only approximately correct.

Page 29:

Estimating 3D Planes

29

[Figure: 3D part planes estimated for a car, shown in six views.]

Page 30:

[Figure: reconstruction results with no occlusion handling, occlusion handling, and self-occlusion handling.]

30

Page 31:

3D Potemkin Model: Car

- Minimum requirement: four views of one instance

- Number of parts: 8 (right side, grille, hood, windshield, roof, back windshield, back grille, left side)

[Figure: the 3D car model rendered from three viewpoints.]

31

Page 32:

32

Outline

- Potemkin Model: Basic / Generalized / 3D

- Estimation: class skeleton; multiple primitives; class planes (real training data; supervised or self-supervised part labeling)

- Use: virtual training data generation; single-view 3D reconstruction

Page 33:

Single-View Reconstruction

• 3D reconstruction of (X, Y, Z) from a single 2D image point (x_im, y_im), given:

- a camera matrix (M), a 3D plane

33

x_im = (m11 X + m12 Y + m13 Z + m14) / (m31 X + m32 Y + m33 Z + m34)

y_im = (m21 X + m22 Y + m23 Z + m24) / (m31 X + m32 Y + m33 Z + m34)

a X + b Y + c Z + d = 0

where M = [m11 m12 m13 m14; m21 m22 m23 m24; m31 m32 m33 m34].
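The two projection equations plus the plane constraint form a 3×3 linear system in (X, Y, Z), which can be solved directly. A sketch, with an assumed canonical camera for the sanity check:

```python
# Back-project one image point onto a known 3D plane by solving the 3x3
# linear system formed by the two projection equations and the plane.
import numpy as np

def backproject(M, plane, x_im, y_im):
    """M: 3x4 camera matrix; plane: (a, b, c, d) for aX + bY + cZ + d = 0."""
    a, b, c, d = plane
    A = np.array([
        M[0, :3] - x_im * M[2, :3],   # row1·P = x_im * row3·P, rearranged
        M[1, :3] - y_im * M[2, :3],   # row2·P = y_im * row3·P, rearranged
        [a, b, c],                    # plane constraint
    ])
    rhs = np.array([x_im * M[2, 3] - M[0, 3],
                    y_im * M[2, 3] - M[1, 3],
                    -d])
    return np.linalg.solve(A, rhs)

# Canonical camera (x_im = X/Z, y_im = Y/Z) and the plane Z = 5:
M = np.hstack([np.eye(3), np.zeros((3, 1))])
X = backproject(M, (0, 0, 1, -5), 0.2, -0.4)
print(np.allclose(X, [1.0, -2.0, 5.0]))  # True
```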

Page 34:

Automatic 3D Reconstruction

• 3D class-specific reconstruction from a single 2D image

- a camera matrix (M), a 3D ground plane (a_g X + b_g Y + c_g Z + d_g = 0)

[Figure: pipeline from the 2D input through detection (Leibe et al. 07), segmentation (Li et al. 05), geometric context (Hoiem et al. 05), self-supervised part registration against the 3D Potemkin model (part planes P_i: a_i X + b_i Y + c_i Z + d_i = 0), and occluded part prediction (planes P1, P2 with an offset), to the 3D output.]

34

Page 35:

Application: Photo Pop-up

• Hoiem et al. classified image regions into three geometric classes (ground, vertical surfaces, and sky).

• They treat detected objects as vertical planar surfaces in 3D.

• They set a default camera matrix and a default 3D ground plane.

35

Page 36:

Object Pop-up

36

Demo videos: http://people.csail.mit.edu/chiu/demos.htm

Page 37:

Depth Map Prediction

• Match a predicted depth map against available 2.5D data • Improve performance of existing 2D detection systems

37

[Figure: predicted depth maps for detected cars in several test images.]

Page 38:

Application: Object Detection

38

• 109 test images and stereo depth maps, 127 annotated cars

[Figure: test image with its stereo depth map (Videre Designs stereo head).]

• 15 candidates/image (each candidate c_i: bounding box b_i, likelihood l_i from the 2D detector, predicted depth map z_i)

• Depth consistency between z_i and the stereo depth map z_s:
D_i = min over a1, a2 of Σ (a1 z_i + a2 − z_s)²  (a1: scale, a2: offset)

• Combined score: w log l_i + (1 − w) log exp(−D_i)  (likelihood from detector + depth consistency)
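A sketch of this scoring: fit the per-candidate scale and offset by least squares, then combine the depth residual with the detector likelihood. The weighting form follows the slide; details such as normalization are assumptions:

```python
# Depth-consistency score: align the predicted depths to the stereo depths
# with a least-squares scale and offset, then combine the residual with the
# 2D detector likelihood.
import numpy as np

def depth_consistency(z_pred, z_stereo):
    """D = min over (a1, a2) of sum((a1*z_pred + a2 - z_stereo)^2)."""
    A = np.column_stack([z_pred, np.ones_like(z_pred)])
    coeffs, _, _, _ = np.linalg.lstsq(A, z_stereo, rcond=None)
    return float(np.sum((A @ coeffs - z_stereo) ** 2))

def score(likelihood, D, w=0.5):
    # w * log(l_i) + (1 - w) * log(exp(-D))  ==  w*log(l_i) - (1 - w)*D
    return w * np.log(likelihood) - (1.0 - w) * D

# Depths that differ only by a scale and offset are perfectly consistent.
z_pred = np.array([1.0, 2.0, 3.0, 4.0])
z_stereo = 2.0 * z_pred + 0.5
D = depth_consistency(z_pred, z_stereo)
print(np.isclose(D, 0.0))  # True
```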

Page 39:

Experimental Results

39

• Number of car training/test images: 155/109

• Murphy-Torralba-Freeman detector (w = 0.5)

• Dalal-Triggs detector (w = 0.6)

[Figure: detection rate vs. false positives per image for the Murphy-Torralba-Freeman and Dalal-Triggs detectors, each with and without depth.]

Page 40:

Quality of Reconstruction

• Calibration: camera, 3D ground plane (1 m by 1.2 m table)

• 20 diecast model cars

40

              Average overlap   Centroid error   Orientation error
Potemkin      77.5%             8.75 mm          2.34°
Single Plane  —                 73.95 mm         16.26°

Ferrari F1: 26.56% overlap, 24.89 mm, 3.37°

Page 41:

Application: Robot Manipulation

• 20 diecast model cars, 60 trials

• Successful grasps: 57/60 (Potemkin), 6/60 (Single Plane)

41

Demo videos: http://people.csail.mit.edu/chiu/demos.htm


Page 43:

Occluded Part Prediction

• A basket instance

[Figure: basket skeleton (parts 1–6 on x, y, z axes) and object-centered extrinsic camera parameters (X_object, Y_object, Z_object).]

Demo videos: http://people.csail.mit.edu/chiu/demos.htm

43

Page 44:

Contributions

• The Potemkin Model:

- Provide a middle ground between 2D and 3D

- Construct a relatively weak 3D model

- Generate virtual training data

- Reconstruct 3D objects from a single image

• Applications:

- Multi-view object class detection

- Object pop-up

- Object detection using 2.5D data

- Robot manipulation

44

Page 45:

Acknowledgements

• Thesis committee members

- Tomás Lozano-Pérez, Leslie Kaelbling, Bill Freeman

• Experimental help

- LabelMe and detection system: Sam Davies

- Robot system: Kaijen Hsiao and Huan Liu

- Data collection: Meg A. Lippow and Sarah Finney

- Stereo vision: Tom Yeh and Sybor Wang

- Others: David Huynh, Yushi Xu, and Hung-An Chang

• All LIS people

• My parents and my wife, Ju-Hui

45

Page 46:

46

Thank you!