Upload
zukun
View
204
Download
0
Embed Size (px)
Citation preview
AckowledgementWe acknowledge the support of NSF CAREER #1054127 and the Gigascale Systems Research Center. We thank Mohit Bagra for collecting the Kinect dataset and Min Sun for helpful feedback.
Joint Likelihood MaximizationMain challenge: High dimensionality of unknowns => Sample P(q,u,o|C,O,Q) with MCMC
Parameter Initialization- Use object detection scale and pose to initialize cameras relative poses- Theorem: camera parameters can be estimated given: i) 3 objects with scale; ii) 2 objects with pose; iii) 1 object with scale and pose.
image 1 object det. image 2 object det.
one of camerapose initializations
one of camera poses and objects initializations
azimuth=70zenith =10
azimuth=90zenith =5
azimuth=20zenith =10
azimuth=-20zenith =9
Monte Carlo Markov Chain- Sampling starts from different initializations- Proposal distribution P(q,u,o|C,O,Q)- Combine all samples to identify the maximum
Results
1. Car Dataset [3] (available online) - Images and Dense Lidar Points - ~500 testing images in 10 scenarios
2. Kinect Office Dataset (available online) - Images and calibrated Kinect 3D range data - Mouse, Monitor, and Keyboard - 500 images in 10 scenarios
Comparison Baselines- Camera Pose Est.: Bundler [1]- Object Detection: LSVM [2]
Car Person Office
3. Person Dataset - A pair of stereo cameras - 400 image pairs in 10 scenarios
0.4 0.6 0.8 10
10
20
SSFM
Bundler
e (degree)T 2 Cam
1 Cam
Recall
False Positive per Image
3D object localizationCam. T. est. error v.s. baseline
Cam. T. est. error v.s. baseline
0 2 4 6 8 100
0.05
0.1
0.15
0.2
0.25
0.3Recall
False Positive per Image
3 Cam2 Cam1 Cam
4 Cam
1 2 3 40
20
40
SSFMBundler
e (degree)T
3D object localization
Reference[1] N. Snavely, S. M. Seitz, and R. S. Szeliski. Modeling the world from internet photo collections. IJCV. 2008.[2] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence of Pattern Analysis, 2009.[3] Gaurav Pandey, James McBride,,and Ryan Eustice,.Ford campus vision and lidar data set. International Journal of Robotics Research. 2011
Q
O
C
q
o
qo
C
SSFM Problem FormulationMeasurements- q: point features (e.g. DOG+SIFT)- u: point matches (e.g. threshold test)- o: 2D objects (e.g. [2])
Model Parameters (unknowns)- C: camera (K is known)- Q: 3D points (locations)- O: 3D objects (locations, poses, categories)
Intuition:In addition to point features, measurements of objects across views provide add-itional geometrical constraints that allow to relate cameras and scene parameters.
Model Overview{O,Q,C}=arg max P(q,u,o|C,O,Q)
=arg max P(q,u|C,Q)P(o|C,O)
Assumption: Given camera hypothesis, objects and points are independent
Point Likelihood P(q,u|C,Q)
Object Likelihood P(o|C,O)
- Estimate 3D object likelihood by 2D projection appearance:
Introduction
ConventionalStructure From Motion
2D object detection
Motivation:- Most 3D reconstruction methods do not povide semantic information.- Most recognition methds do not provide geometry and camera pose.- We propose to solve these two problem jointly.
Advantages:- Improve camera pose estimation, compared to feature-point-based SFM.- Improve object detections given multiple images, compared to independently detecting objects from each single images.- Establish object correspondences across views.
Goal: Estimate 3D location and pose of objects, 3D location of points, and camera parameters from 2 or more images.
Semantic Structure From MotionSid Yingze Bao and Silvio Savarese
Electrical and Computer Engineering, University of Michigan at Ann ArborSource code and data: http://www.eecs.umich.edu/vision/projects/ssfm/index.html
VISION LAB