CV Reading Group: Putting Objects in Perspective


DESCRIPTION

CV reading group presentation on "Putting Objects in Perspective" (Derek Hoiem, Alexei A. Efros, Martial Hebert), July 1, 2008, with background material. PowerPoint PPT Presentation.

Transcript

AdaBoost

CV Reading Group: Putting Objects in Perspective. July 1, 2008. Background.

Putting Objects in Perspective
Derek Hoiem, Alexei A. Efros, Martial Hebert

Carnegie Mellon University, Robotics Institute

CVPR 2006

Understanding an Image

One of the ultimate problems in computer vision is scene understanding, and scene understanding requires not only estimating the individual elements of the image but capturing the interplay among them. For example, if we humans look at this image, we understand the major surfaces: what can be walked on? what can be walked into? We understand the perspective: the cars that are further away are smaller in the image, but they are actually the same size. And so we are able to put the objects into perspective and reason about them in the context of the 3D scene. How can we give computers this same ability?

Today: Local and Independent

The dominant paradigm in computer vision today is local and independent.

Local Object Detection

True detections, missed detections, and false detections. Local Detector: [Dalal-Triggs 2005]

Object Support: Surface Estimation

Surface labels for the image: support, vertical (left, center, right, porous, solid), sky. [Hoiem, Efros, Hebert ICCV 2005]. Software available online. Object? Surface? Support?


Object Size in the Image

Image ↔ World

The object size relationships in the image appear to be very complicated. For instance, it's hard to determine which is taller: the car or the person? But this question is easily answered if we can look at the relationships in the 3D scene. Fortunately, if the objects all lie on the ground plane, we only need two variables: the horizon and the camera height. But how do we determine these parameters from a single image?
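Under the ground-plane assumption above, the relation between image size and world size is simple enough to sketch. The following function is an illustrative reconstruction (the names and the row-based coordinate convention are my own, not from the paper's code): for an object resting on the ground, world height equals camera height times pixel height divided by the row distance from the object's bottom to the horizon, with image rows increasing downward.

```python
def world_height(h_img, v_bottom, v_horizon, cam_height):
    """Estimate an object's world height (same units as cam_height).

    Assumes the object rests on the ground plane and that image rows
    increase downward, so v_bottom > v_horizon for ground objects.
    """
    return cam_height * h_img / (v_bottom - v_horizon)

# A person 180 px tall whose feet are 200 rows below the horizon,
# seen from a camera 1.6 m high, comes out to about 1.44 m tall.
print(world_height(180, 400, 200, 1.6))
```

Note that everything is a ratio: the focal length cancels, which is why the horizon row and camera height are the only two viewpoint parameters needed.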

Input Image: Object Size ↔ Camera Viewpoint

Loose Viewpoint Prior

It's very difficult to estimate the viewpoint directly from an image, so to begin with, we simply have a loose prior on the viewpoint.

Object Size ↔ Camera Viewpoint

Object Positions/Sizes ↔ Viewpoint

We can use that to refine our estimate of the viewpoint, which can in turn be used to determine the rough size of other objects in the image. But, since we know neither the objects nor the viewpoint to begin with, we need to solve for both of them at once.


And, of course, the more objects that we can detect, the better our viewpoint estimate, leading to a better estimate of how big those objects should be.

Efficient Inference from Surfaces and Viewpoint

Image

P(object)

P(object | surfaces)

P(surfaces)

P(viewpoint)

P(object | viewpoint)

So far, I've described the object's relationships to surfaces and viewpoint in the scene. But how can we use support and viewpoint together to help object recognition? Suppose we're looking for pedestrians in this image. Initially, we have the uniform distribution which is assumed by local object detectors, as illustrated on the bottom. What if we have some surface information? Here, the surface confidences for support, vertical, and sky are shown in the green, red, and blue channels. The surfaces tell us where a pedestrian is likely to be in the image, but not how big he should be. Likewise, if we have some viewpoint information…

Image: P(object | surfaces, viewpoint)
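One way to read the probability maps above: if the cues are treated as independent given the object, the detector map and the surface and viewpoint conditionals simply multiply. A minimal numpy sketch of that fusion (the array shapes, names, and the independence assumption are illustrative, not the paper's exact model):

```python
import numpy as np

def combine_cues(p_det, p_obj_given_surf, p_obj_given_view):
    """Fuse per-location object probabilities from independent cues."""
    fused = p_det * p_obj_given_surf * p_obj_given_view
    return fused / fused.sum()   # renormalize to a distribution

# Toy 4x4 probability maps standing in for the detector, surface, and
# viewpoint cues (random values for illustration).
rng = np.random.default_rng(0)
maps = [rng.random((4, 4)) for _ in range(3)]
maps = [m / m.sum() for m in maps]
post = combine_cues(*maps)
```

The effect matches the narration: the surface map suppresses locations (e.g. sky) where a pedestrian cannot stand, while the viewpoint map suppresses sizes inconsistent with the horizon and camera height.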

Efficient Inference from Surfaces and Viewpoint

P(object)

P(surfaces)

P(viewpoint)

Scene Parts Are All Interconnected

Objects

3D Surfaces

Camera Viewpoint

Input to Algorithm

Surface Estimates (Surfaces: [Hoiem-Efros-Hebert 2005]); Viewpoint Prior; Local Car Detector; Local Ped Detector (Local Detector: [Dalal-Triggs 2005]); Object Detection

Approximate Model

Objects

3D Surfaces

Viewpoint

Tree-structured model: the viewpoint node at the root, connected to object nodes o1 … on and local surface nodes s1 … sn; each object node has local object evidence, and each surface node has local surface evidence.

Inference over Tree
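Because the model is a tree, exact inference is cheap: each leaf sends a message to the viewpoint root by summing out its own state, and the root multiplies the incoming messages. A toy numpy sketch of that message passing, with made-up tables (the discretization, shapes, and numbers are all hypothetical, not the paper's):

```python
import numpy as np

n_view = 5                              # discretized viewpoint states
prior_v = np.full(n_view, 1.0 / n_view)

# p_det[i, v, o]: likelihood of detection i's image evidence given
# viewpoint v and object presence o in {0, 1} (random toy numbers).
rng = np.random.default_rng(1)
p_det = rng.random((3, n_view, 2))
p_obj = np.array([0.9, 0.1])            # prior over absence/presence

# Message from each object leaf to the viewpoint root: sum out o.
messages = (p_det * p_obj).sum(axis=2)  # shape (3, n_view)

# Posterior over viewpoint: prior times the product of leaf messages.
post_v = prior_v * messages.prod(axis=0)
post_v /= post_v.sum()
```

The same root posterior can then be sent back down to each object node, which is how more detections sharpen the viewpoint and the sharpened viewpoint sharpens every detection.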

Viewpoint estimation

Figure: viewpoint prior and final viewpoint estimate, shown as distributions over horizon position and camera height (prior, likelihoods, final).

Object Identities: Local detector

Surface Geometry: Probability map

Object detection example (car and pedestrian detection; Initial Local vs. Final Global; TP/FP counts shown: 4 TP / 2 FP, 3 TP / 2 FP, 4 TP / 1 FP, 4 TP / 0 FP). Local Detector: [Dalal-Triggs 2005].

Experiments on LabelMe Dataset

Testing with LabelMe dataset: 422 images; 923 cars at least 14 pixels tall; 720 pedestrians at least 36 pixels tall.

Each piece of evidence improves performance. Local Detector from [Murphy-Torralba-Freeman 2003]. Car Detection / Pedestrian Detection.

Can be used with any detector that outputs confidences. Local Detector: [Dalal-Triggs 2005] (SVM-based).

Car Detection / Pedestrian Detection

Accurate Horizon Estimation

Median horizon error: 8.5% / 4.5% / 3.0% (90% bounds also shown), comparing the horizon prior alone with detectors from [Murphy-Torralba-Freeman 2003] and [Dalal-Triggs 2005].

Qualitative Results (Local Detector from [Murphy-Torralba-Freeman 2003]; Car and Ped TP / FP shown per image):

Initial: 2 TP / 3 FP → Final: 7 TP / 4 FP
Initial: 1 TP / 14 FP → Final: 3 TP / 5 FP
Initial: 1 TP / 23 FP → Final: 0 TP / 10 FP
Initial: 0 TP / 6 FP → Final: 4 TP / 3 FP

Geometric Context: estimate surfaces

ground: green, sky: blue, vertical: red, o: porous, x: solid

Geometric Cues

Color

Location, Texture

Perspective

Robust Spatial Support

RGB Pixels

Superpixels: oversegmentation [Felzenszwalb and Huttenlocher 2004]

Multiple Segmentations

Superpixels

Multiple Segmentations

Labeling Segments

Learn from training images: label each segment of the multiple segmentations as ground, vertical, sky, or mixed, using boosted decision trees (8 nodes per tree) with the logistic regression version of AdaBoost [Collins, Schapire, and Singer 2002].
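To make the classifier concrete, here is a from-scratch toy sketch of boosted decision stumps (depth-1 trees, a simplification of the 8-node trees above) trained with discrete AdaBoost on a synthetic 1-D problem; the final sigmoid reflects the logistic interpretation of the boosted score. The data, stump learner, and round count are all illustrative, not the paper's implementation:

```python
import numpy as np

def fit_stump(x, y, w):
    """Return (error, threshold, sign) of the best weighted stump on x."""
    best = None
    for t in np.unique(x):
        for sign in (1, -1):
            pred = np.where(x >= t, sign, -sign)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(x, y, rounds=10):
    """Discrete AdaBoost over threshold stumps; y must be in {-1, +1}."""
    w = np.full(len(x), 1.0 / len(x))
    stumps = []
    for _ in range(rounds):
        err, t, sign = fit_stump(x, y, w)
        err = np.clip(err, 1e-4, 1 - 1e-4)   # avoid log(0) and overflow
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(x >= t, sign, -sign)
        w = w * np.exp(-alpha * y * pred)    # upweight mistakes
        w /= w.sum()
        stumps.append((alpha, t, sign))
    return stumps

def predict_proba(stumps, x):
    """Probability of the +1 class via the logistic link on the score."""
    score = sum(a * np.where(x >= t, s, -s) for a, t, s in stumps)
    return 1.0 / (1.0 + np.exp(-2.0 * score))

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1, -1, -1, 1, 1, 1])
model = adaboost(x, y)
```

The logistic link is what lets the boosted scores be treated as calibrated label likelihoods in the next steps, rather than raw margins.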

Label Likelihood

Homogeneity Likelihood

Image Labeling
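A plausible reading of the labeling scheme, combining the two likelihoods above: each pixel's label distribution averages the label likelihoods of the segments that contain it, across all segmentations, weighted by each segment's homogeneity likelihood. The weighting and function names below are my own simplification, not the paper's exact formulation:

```python
import numpy as np

def pixel_label_probs(segmentations, label_lik, homog_lik):
    """Per-pixel label distribution averaged over multiple segmentations.

    segmentations[k]: int array mapping each pixel to a segment id.
    label_lik[k]:     (n_segments_k, n_labels) label likelihoods.
    homog_lik[k]:     (n_segments_k,) homogeneity weights.
    """
    n_pixels = len(segmentations[0])
    n_labels = label_lik[0].shape[1]
    acc = np.zeros((n_pixels, n_labels))
    wsum = np.zeros(n_pixels)
    for seg, ll, hl in zip(segmentations, label_lik, homog_lik):
        acc += hl[seg, None] * ll[seg]   # spread segment scores to pixels
        wsum += hl[seg]
    return acc / wsum[:, None]

# Toy example: 4 pixels, two segmentations (one fine, one coarse).
segs = [np.array([0, 0, 1, 1]), np.array([0, 0, 0, 0])]
lls  = [np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([[0.5, 0.5]])]
hls  = [np.array([1.0, 1.0]), np.array([1.0])]
probs = pixel_label_probs(segs, lls, hls)
```

The upshot is that no single segmentation has to be correct: confident, homogeneous segments dominate the average, while mixed segments contribute little.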

Labeled Segmentations → Labeled Pixels (learned from training images)

Summary & Future Work

Reasoning in 3D: object to object, scene label, object segmentation. (Figure: Ped, Ped, Car positions, axes in meters.) It's important to note that these ideas have been around for a long time.

Conclusion: Image understanding is a 3D problem, and it must be solved jointly.

This paper is a small step; much remains to be done.

In conclusion, image understanding is a 3D problem, even if all we have is a single image. And image understanding requires solving many different tasks: recognition, scene geometry, segmentation. But we can only get so far dealing with these tasks in isolation. We need to reason about them together, and not in the 2D world of the image but in the 3D world in which we live. In this paper, we've taken a small step in this direction; much more remains to be done.

We believe that now that so much progress has been made in these individual areas, the next big step is to tie these things together into the image understanding problem.

CV Reading Group: Recovering Occlusion Boundaries from a Single Image / Closing the Loop in Scene Interpretation. August 26, 2008. Background.

Recovering Occlusion Boundaries from a Single Image
Derek Hoiem, Andrew N. Stein, Alexei A. Efros, Martial Hebert

Carnegie Mellon University, Robotics Institute

ICCV 2007

Edge, region, depth

Watershed segmentation with Pb soft boundaries. Region, boundary, and 3D cues (depth: horizon + junction to ground). Boundary labeling with a conditional random field (CRF).

Results: Boundary

Object popout

Closing the Loop in Scene Interpretation
Derek Hoiem, Alexei A. Efros, Martial Hebert

Carnegie Mellon University, Robotics Institute

CVPR 2008

Recap of Putting Objects in Perspective (Initial Local vs. Final Global; TP/FP counts: 4 TP / 2 FP, 3 TP / 2 FP, 4 TP / 1 FP, 4 TP / 0 FP; Local Detector: [Dalal-Triggs 2005]).

Scene Parts Are All Interconnected

Objects

3D Surfaces

Camera Viewpoint

With Occlusions: Putting Objects in Perspective + Automatic Photo Pop-up

Occlusion, Boundary

Putting Objects

Initial: Dalal-Triggs → Iter 1: Hoiem et al. → Final: this paper (Car: up, Ped: down). Photo pop-up with occlusion and object reasoning.

Occlusion/Boundary: geometry, depth

Occlusion/Boundary

Boundary
