CV Reading Group: Putting Objects in Perspective


DESCRIPTION

CV reading group presentation on "Putting Objects in Perspective" (Derek Hoiem, Alexei A. Efros, Martial Hebert), July 1, 2008, with background material. PowerPoint PPT Presentation.

Transcript

AdaBoost

CV Reading Group: Putting Objects in Perspective. July 1, 2008. Background.

Putting Objects in Perspective
Derek Hoiem, Alexei A. Efros, Martial Hebert

Carnegie Mellon University, Robotics Institute

CVPR 2006

Understanding an Image

One of the ultimate problems in computer vision is scene understanding, and scene understanding requires not only estimating the individual elements of the image but capturing the interplay among them. For example, if we humans look at this image, we understand the major surfaces: what can be walked on? what can be walked into? We understand the perspective: the cars that are further away are smaller in the image, but they are actually the same size. And so we are able to put the objects into perspective and reason about them in the context of the 3D scene. How can we give computers this same ability?

Today: Local and Independent

The dominant paradigm in computer vision today is local and independent.

Local Object Detection

True detections, missed detections, and false detections. Local Detector: [Dalal-Triggs 2005]

Object Support: Surface Estimation

Surface labels for the image: support, vertical (left, center, right, porous, solid), sky. [Hoiem, Efros, Hebert ICCV 2005]. Software available online. Object? Surface? Support?


Object Size in the Image

Image ↔ World

The object size relationships in the image appear to be very complicated. For instance, it's hard to determine which is taller: the car or the person? But this question is easily answered if we can look at the relationships in the 3D scene. Fortunately, if the objects all lie on the ground plane, we only need two variables: the horizon and the camera height. But how do we determine these parameters from a single image?
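Under the ground-plane assumption above, the relation between image size and world size is simple enough to sketch. The following function is an illustrative reconstruction (the names and the row-based coordinate convention are my own, not from the paper's code): for an object resting on the ground, world height equals camera height times pixel height divided by the row distance from the object's bottom to the horizon, with image rows increasing downward.

```python
def world_height(h_img, v_bottom, v_horizon, cam_height):
    """Estimate an object's world height (same units as cam_height).

    Assumes the object rests on the ground plane and that image rows
    increase downward, so v_bottom > v_horizon for ground objects.
    """
    return cam_height * h_img / (v_bottom - v_horizon)

# A person 180 px tall whose feet are 200 rows below the horizon,
# seen from a camera 1.6 m high, comes out to about 1.44 m tall.
print(world_height(180, 400, 200, 1.6))
```

Note that everything is a ratio: the focal length cancels, which is why the horizon row and camera height are the only two viewpoint parameters needed.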

Input Image: Object Size ↔ Camera Viewpoint

Loose Viewpoint Prior

It's very difficult to estimate the viewpoint directly from an image, so to begin with, we simply have a loose prior on the viewpoint.

Object Size ↔ Camera Viewpoint

Object Positions/Sizes ↔ Viewpoint

We can use that to refine our estimate of the viewpoint, which can in turn be used to determine the rough size of other objects in the image. But, since we know neither the objects nor the viewpoint to begin with, we need to solve for both of them at once.


And, of course, the more objects that we can detect, the better our viewpoint estimate, leading to a better estimate of how big those objects should be.

Efficient Inference from Surfaces and Viewpoint

Image

P(object)

P(object | surfaces)

P(surfaces)

P(viewpoint)

P(object | viewpoint)

So far, I've described the object's relationships to surfaces and viewpoint in the scene. But how can we use support and viewpoint together to help object recognition? Suppose we're looking for pedestrians in this image. Initially, we have the uniform distribution which is assumed by local object detectors, as illustrated on the bottom. What if we have some surface information? Here, the surface confidences for support, vertical, and sky are shown in the green, red, and blue channels. The surfaces tell us where a pedestrian is likely to be in the image, but not how big he should be. Likewise, if we have some viewpoint information…

Image: P(object | surfaces, viewpoint)
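One way to read the probability maps above: if the cues are treated as independent given the object, the detector map and the surface and viewpoint conditionals simply multiply. A minimal numpy sketch of that fusion (the array shapes, names, and the independence assumption are illustrative, not the paper's exact model):

```python
import numpy as np

def combine_cues(p_det, p_obj_given_surf, p_obj_given_view):
    """Fuse per-location object probabilities from independent cues."""
    fused = p_det * p_obj_given_surf * p_obj_given_view
    return fused / fused.sum()   # renormalize to a distribution

# Toy 4x4 probability maps standing in for the detector, surface, and
# viewpoint cues (random values for illustration).
rng = np.random.default_rng(0)
maps = [rng.random((4, 4)) for _ in range(3)]
maps = [m / m.sum() for m in maps]
post = combine_cues(*maps)
```

The effect matches the narration: the surface map suppresses locations (e.g. sky) where a pedestrian cannot stand, while the viewpoint map suppresses sizes inconsistent with the horizon and camera height.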

Efficient Inference from Surfaces and Viewpoint

P(object)

P(surfaces)

P(viewpoint)

Scene Parts Are All Interconnected

Objects

3D Surfaces

Camera Viewpoint

Input to Algorithm

Surface Estimates (Surfaces: [Hoiem-Efros-Hebert 2005]); Viewpoint Prior; Local Car Detector; Local Ped Detector (Local Detector: [Dalal-Triggs 2005]); Object Detection

Approximate Model

Objects

3D Surfaces

Viewpoint

Tree-structured model: the viewpoint node at the root, connected to object nodes o1 … on and local surface nodes s1 … sn; each object node has local object evidence, and each surface node has local surface evidence.

Inference over Tree
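Because the model is a tree, exact inference is cheap: each leaf sends a message to the viewpoint root by summing out its own state, and the root multiplies the incoming messages. A toy numpy sketch of that message passing, with made-up tables (the discretization, shapes, and numbers are all hypothetical, not the paper's):

```python
import numpy as np

n_view = 5                              # discretized viewpoint states
prior_v = np.full(n_view, 1.0 / n_view)

# p_det[i, v, o]: likelihood of detection i's image evidence given
# viewpoint v and object presence o in {0, 1} (random toy numbers).
rng = np.random.default_rng(1)
p_det = rng.random((3, n_view, 2))
p_obj = np.array([0.9, 0.1])            # prior over absence/presence

# Message from each object leaf to the viewpoint root: sum out o.
messages = (p_det * p_obj).sum(axis=2)  # shape (3, n_view)

# Posterior over viewpoint: prior times the product of leaf messages.
post_v = prior_v * messages.prod(axis=0)
post_v /= post_v.sum()
```

The same root posterior can then be sent back down to each object node, which is how more detections sharpen the viewpoint and the sharpened viewpoint sharpens every detection.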

Viewpoint estimation

Figure: viewpoint prior and final viewpoint estimate, shown as distributions over horizon position and camera height (prior, likelihoods, final).

Object Identities: Local detector

Surface Geometry: Probability map

Object detection example (car and pedestrian detection; Initial Local vs. Final Global; TP/FP counts shown: 4 TP / 2 FP, 3 TP / 2 FP, 4 TP / 1 FP, 4 TP / 0 FP). Local Detector: [Dalal-Triggs 2005].

Experiments on LabelMe Dataset

Testing with LabelMe dataset: 422 images; 923 cars at least 14 pixels tall; 720 pedestrians at least 36 pixels tall.

Each piece of evidence improves performance. Local Detector from [Murphy-Torralba-Freeman 2003]. Car Detection / Pedestrian Detection.

Can be used with any detector that outputs confidences. Local Detector: [Dalal-Triggs 2005] (SVM-based).

Car Detection / Pedestrian Detection

Accurate Horizon Estimation

Median horizon error: 8.5% / 4.5% / 3.0% (90% bounds also shown), comparing the horizon prior alone with detectors from [Murphy-Torralba-Freeman 2003] and [Dalal-Triggs 2005].

Qualitative Results (Local Detector from [Murphy-Torralba-Freeman 2003]; Car and Ped TP / FP shown per image):

Initial: 2 TP / 3 FP → Final: 7 TP / 4 FP
Initial: 1 TP / 14 FP → Final: 3 TP / 5 FP
Initial: 1 TP / 23 FP → Final: 0 TP / 10 FP
Initial: 0 TP / 6 FP → Final: 4 TP / 3 FP

Geometric Context: estimate surfaces

ground: green, sky: blue, vertical: red, o: porous, x: solid

Geometric Cues

Color

Location, Texture

Perspective

Robust Spatial Support

RGB Pixels

Superpixels: oversegmentation [Felzenszwalb and Huttenlocher 2004]

Multiple Segmentations

Superpixels

Multiple Segmentations

Labeling Segments

Learn from training images: label each segment of the multiple segmentations as ground, vertical, sky, or mixed, using boosted decision trees (8 nodes per tree) with the logistic regression version of AdaBoost [Collins, Schapire, and Singer 2002].
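To make the classifier concrete, here is a from-scratch toy sketch of boosted decision stumps (depth-1 trees, a simplification of the 8-node trees above) trained with discrete AdaBoost on a synthetic 1-D problem; the final sigmoid reflects the logistic interpretation of the boosted score. The data, stump learner, and round count are all illustrative, not the paper's implementation:

```python
import numpy as np

def fit_stump(x, y, w):
    """Return (error, threshold, sign) of the best weighted stump on x."""
    best = None
    for t in np.unique(x):
        for sign in (1, -1):
            pred = np.where(x >= t, sign, -sign)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(x, y, rounds=10):
    """Discrete AdaBoost over threshold stumps; y must be in {-1, +1}."""
    w = np.full(len(x), 1.0 / len(x))
    stumps = []
    for _ in range(rounds):
        err, t, sign = fit_stump(x, y, w)
        err = np.clip(err, 1e-4, 1 - 1e-4)   # avoid log(0) and overflow
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(x >= t, sign, -sign)
        w = w * np.exp(-alpha * y * pred)    # upweight mistakes
        w /= w.sum()
        stumps.append((alpha, t, sign))
    return stumps

def predict_proba(stumps, x):
    """Probability of the +1 class via the logistic link on the score."""
    score = sum(a * np.where(x >= t, s, -s) for a, t, s in stumps)
    return 1.0 / (1.0 + np.exp(-2.0 * score))

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1, -1, -1, 1, 1, 1])
model = adaboost(x, y)
```

The logistic link is what lets the boosted scores be treated as calibrated label likelihoods in the next steps, rather than raw margins.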

Label Likelihood

Homogeneity Likelihood

Image Labeling
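A plausible reading of the labeling scheme, combining the two likelihoods above: each pixel's label distribution averages the label likelihoods of the segments that contain it, across all segmentations, weighted by each segment's homogeneity likelihood. The weighting and function names below are my own simplification, not the paper's exact formulation:

```python
import numpy as np

def pixel_label_probs(segmentations, label_lik, homog_lik):
    """Per-pixel label distribution averaged over multiple segmentations.

    segmentations[k]: int array mapping each pixel to a segment id.
    label_lik[k]:     (n_segments_k, n_labels) label likelihoods.
    homog_lik[k]:     (n_segments_k,) homogeneity weights.
    """
    n_pixels = len(segmentations[0])
    n_labels = label_lik[0].shape[1]
    acc = np.zeros((n_pixels, n_labels))
    wsum = np.zeros(n_pixels)
    for seg, ll, hl in zip(segmentations, label_lik, homog_lik):
        acc += hl[seg, None] * ll[seg]   # spread segment scores to pixels
        wsum += hl[seg]
    return acc / wsum[:, None]

# Toy example: 4 pixels, two segmentations (one fine, one coarse).
segs = [np.array([0, 0, 1, 1]), np.array([0, 0, 0, 0])]
lls  = [np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([[0.5, 0.5]])]
hls  = [np.array([1.0, 1.0]), np.array([1.0])]
probs = pixel_label_probs(segs, lls, hls)
```

The upshot is that no single segmentation has to be correct: confident, homogeneous segments dominate the average, while mixed segments contribute little.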

Labeled Segmentations → Labeled Pixels (learned from training images)

Summary & Future Work

Reasoning in 3D: object to object, scene label, object segmentation. (Figure: Ped, Ped, Car positions, axes in meters.) It's important to note that these ideas have been around for a long time.

Conclusion: Image understanding is a 3D problem, and it must be solved jointly.

This paper is a small step; much remains to be done.

In conclusion, image understanding is a 3D problem, even if all we have is a single image. And image understanding requires solving many different tasks: recognition, scene geometry, segmentation. But we can only get so far dealing with these tasks in isolation. We need to reason about them together, and not in the 2D world of the image but in the 3D world in which we live. In this paper, we've taken a small step in this direction; much more remains to be done.

We believe that now that so much progress has been made in these individual areas, the next big step is to tie these things together into the image understanding problem.

CV Reading Group: Recovering Occlusion Boundaries from a Single Image / Closing the Loop in Scene Interpretation. August 26, 2008. Background.

Recovering Occlusion Boundaries from a Single Image
Derek Hoiem, Andrew N. Stein, Alexei A. Efros, Martial Hebert

Carnegie Mellon University, Robotics Institute

ICCV 2007

Edge, region, depth

Watershed segmentation with Pb soft boundaries. Region, boundary, and 3D cues (depth: horizon + junction to ground). Boundary labeling with a conditional random field (CRF).

Results: Boundary

Object popout

Closing the Loop in Scene Interpretation
Derek Hoiem, Alexei A. Efros, Martial Hebert

Carnegie Mellon University, Robotics Institute

CVPR 2008

Recap of Putting Objects in Perspective (Initial Local vs. Final Global; TP/FP counts: 4 TP / 2 FP, 3 TP / 2 FP, 4 TP / 1 FP, 4 TP / 0 FP; Local Detector: [Dalal-Triggs 2005]).

Scene Parts Are All Interconnected

Objects

3D Surfaces

Camera Viewpoint

With Occlusions: Putting Objects in Perspective + Automatic Photo Pop-up

Occlusion, Boundary

Putting Objects

Initial: Dalal-Triggs → Iter 1: Hoiem et al. → Final: this paper (Car: up, Ped: down). Photo pop-up with occlusion and object reasoning.

Occlusion/Boundary: geometry, depth

Occlusion/Boundary

Boundary
