Latent SVMs for Human Detection with a Locally Affine Deformation Field Ľubor Ladický 1 Phil Torr 2 Andrew Zisserman 1 1 University of Oxford 2 Oxford Brookes University

Latent SVMs for Human Detection with a Locally Affine Deformation Field

Ľubor Ladický1 Phil Torr2 Andrew Zisserman1

1 University of Oxford 2 Oxford Brookes University

Object Detection

• Find all objects of interest

• Enclose them tightly in a bounding box

HOG Detector

Dalal & Triggs CVPR05

• Sliding window using learnt HOG template

• Post-processing using non-maxima suppression
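The post-processing step can be illustrated with a minimal sketch. The slides do not say which suppression variant is used, so greedy overlap-based suppression is assumed here; the function names, the box format `(x1, y1, x2, y2)` and the 0.5 overlap threshold are all hypothetical choices for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_maxima_suppression(detections, overlap_threshold=0.5):
    """Greedy NMS (assumed variant): visit boxes from best to worst score
    and keep a box only if it does not overlap an already kept box.

    detections: list of (score, box) pairs, box = (x1, y1, x2, y2).
    """
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= overlap_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept
```

Sliding-window responses tend to fire on several neighbouring windows around the same person, which is why some form of suppression is needed before reporting bounding boxes.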

HOG Detector

Dalal & Triggs CVPR05

Classifier response:

The weights w* and the bias b* are learnt using a linear SVM:
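The equations on this slide did not survive extraction. As a rough illustration only, the sketch below uses the textbook form of a linear SVM: a response of the form w · Φ(x) + b and an objective combining an L2 regulariser with the hinge loss. The function names and the `reg` weighting are assumptions, not the notation of the slide.

```python
def classifier_response(w, b, features):
    """Linear response w . phi + b for the feature vector of one window."""
    return sum(wi * fi for wi, fi in zip(w, features)) + b


def svm_objective(w, b, samples, labels, reg=1.0):
    """Textbook soft-margin objective (assumed form): reg * ||w||^2 plus
    the hinge loss summed over labelled windows (labels are +1 or -1)."""
    regulariser = reg * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - z * classifier_response(w, b, x))
                for x, z in zip(samples, labels))
    return regulariser + hinge
```

In a sliding-window detector, `classifier_response` would be evaluated once per window position and scale, with `features` being the HOG descriptor of that window.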

HOG Detector

Dalal & Triggs CVPR05

• The learnt rigid HOG template does not fit well!

Deformable Part-based Model

Felzenszwalb et al. CVPR08

• Allows parts to move relative to the centre

• Effectively allows the template to deform

• Multiple models based on an aspect ratio

Our approach

Cells c displaced by the deformation field d:

Classifier response:

Regularisation takes the form of a pairwise MRF:

The weights w* and the bias b* are learnt using a latent linear SVM:
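The formulas for the deformed response and the regulariser are missing from the extracted slides. The sketch below shows one plausible reading: each template cell reads its features from a displaced location, and a pairwise term penalises neighbouring cells whose displacements differ. The L1 form of the pairwise term, the `smoothness` weight and all names are assumptions made for illustration.

```python
def deformed_response(weights, bias, feature_map, origin, field, smoothness=0.1):
    """Score a window whose template cells are displaced by a deformation field.

    weights:     {(i, j): [w...]} template weights per cell (row i, column j)
    feature_map: {(row, col): [f...]} image features per cell location
    origin:      (row, col) of the window's top-left cell
    field:       {(i, j): (d_row, d_col)} displacement of each template cell
    """
    data = 0.0
    for (i, j), w in weights.items():
        d_row, d_col = field[(i, j)]
        f = feature_map[(origin[0] + i + d_row, origin[1] + j + d_col)]
        data += sum(wk * fk for wk, fk in zip(w, f))

    # Pairwise MRF term (assumed form): L1 difference between the
    # displacements of vertically and horizontally adjacent cells.
    penalty = 0.0
    for (i, j), (d_row, d_col) in field.items():
        for neighbour in ((i + 1, j), (i, j + 1)):
            if neighbour in field:
                n_row, n_col = field[neighbour]
                penalty += abs(d_row - n_row) + abs(d_col - n_col)

    return data + bias - smoothness * penalty
```

With an all-zero field the function reduces to the rigid template response, which makes the relation to the plain HOG detector explicit.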

Comparison with our approach

• HOG template (no deformation)

• Part-based model (rigid movable parts)

• Our model (deformation field)

Our approach

Why hasn’t anyone tried it before?

• Latent models with many latent variables tend to overfit

• Inference not feasible for a sliding window

• Deformation field used before only for

    • Classification task (Duchenne et al. ICCV11)

    • Rescoring of detections (Ladický, PhD thesis)

Our approach

We restrict the deformation field to be locally affine ( ):

Optimisation

Weights / bias (w*, b*) and the deformation fields d_k are estimated iteratively

Given the deformation fields the problem is a standard linear SVM:

Given (w*, b*), the problem is a constrained MRF optimisation:

The last term can be decomposed as:

By defining the optimisation becomes:

Optimisation

• The locations of the cells in the first row and the first column fully determine the location of every other cell

• Any locally affine deformation field can be reached by two moves:

    • each column i moves by (Δ^c d_i^x, Δ^c d_i^y)

    • each row j moves by (Δ^r d_j^x, Δ^r d_j^y)

• Such moves preserve the locally affine property

• Both moves can be solved quickly using dynamic programming
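The column move can be sketched as a chain problem solved by min-sum dynamic programming. This is a simplified illustration, not the method from the slides: displacements are scalars rather than (x, y) pairs, the displacement of a cell is assumed to be `col_offsets[i] + row_offsets[j]`, and the smoothness term between neighbouring columns is an assumed absolute difference. All function and parameter names are hypothetical.

```python
def chain_dp(unary, pairwise):
    """Min-sum dynamic programming on a chain.

    unary[i][a]       : cost of giving node i the label a
    pairwise(i, a, b) : cost of label a on node i-1 next to label b on node i
    Returns (minimum total cost, list of chosen label indices).
    """
    num_labels = len(unary[0])
    cost = list(unary[0])
    back = []
    for i in range(1, len(unary)):
        new_cost, pointers = [], []
        for b in range(num_labels):
            best_a = min(range(num_labels),
                         key=lambda a: cost[a] + pairwise(i, a, b))
            new_cost.append(cost[best_a] + pairwise(i, best_a, b) + unary[i][b])
            pointers.append(best_a)
        cost = new_cost
        back.append(pointers)
    last = min(range(num_labels), key=lambda a: cost[a])
    labels = [last]
    for pointers in reversed(back):  # follow back-pointers to recover labels
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return cost[last], labels


def column_move(data_cost, col_offsets, row_offsets, moves, smoothness=1.0):
    """Pick one move per column, rows held fixed, minimising data + smoothness.

    data_cost(i, j, d): cost of cell (column i, row j) under displacement d.
    Returns (total cost, updated column offsets).
    """
    unary = [[sum(data_cost(i, j, col_offsets[i] + m + r)
                  for j, r in enumerate(row_offsets))
              for m in moves]
             for i in range(len(col_offsets))]

    def pairwise(i, a, b):
        # Assumed smoothness term between neighbouring columns i-1 and i.
        left = col_offsets[i - 1] + moves[a]
        right = col_offsets[i] + moves[b]
        return smoothness * abs(left - right)

    cost, labels = chain_dp(unary, pairwise)
    return cost, [c + moves[m] for c, m in zip(col_offsets, labels)]
```

A row move would be the same routine with the roles of rows and columns swapped, and alternating the two moves is one way to explore fields of the assumed additive form.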

Learning multiple poses / viewpoints

• We define a similarity measure between two training samples as:

where

• K-medoid clustering on the similarity matrix S partitions the training data into multiple models
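A minimal sketch of K-medoid clustering on a precomputed similarity matrix follows. The similarity measure itself is not recoverable from the slides, so the matrix is treated as a given input; initialising with the first k samples is a simplification assumed here, and the function name is hypothetical.

```python
def k_medoids(similarity, k, max_iter=100):
    """Cluster items from a symmetric similarity matrix (higher = more alike).

    Returns (medoid indices, cluster index per item).
    """
    n = len(similarity)

    def assign(medoids):
        # Each item joins the cluster of its most similar medoid.
        return [max(range(k), key=lambda c: similarity[i][medoids[c]])
                for i in range(n)]

    medoids = list(range(k))  # naive initialisation (assumption of this sketch)
    for _ in range(max_iter):
        assignment = assign(medoids)
        new_medoids = []
        for c in range(k):
            # Keep the old medoid if a cluster happens to be empty.
            members = [i for i in range(n) if assignment[i] == c] or [medoids[c]]
            # New medoid: the member most similar to the rest of its cluster.
            new_medoids.append(max(
                members, key=lambda m: sum(similarity[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)
```

Like other alternating schemes of this kind, the result depends on the initial medoids and can settle in a local optimum, so several restarts are a common precaution. Each resulting cluster would then train one model.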

Experiments

• Buffy dataset (Ferrari et al. CVPR08), typically used for pose estimation

• Contains a large variety of poses, viewpoints and aspect ratios

• Consists of 748 images

• Episode s5e3 used for training

• Episode s5e4 used for validation

• Episodes s5e2, s5e5 and s5e6 used for testing

Clustering of training samples

Each row corresponds to one model (out of 10 models)

Qualitative results

Quantitative results

Conclusion

• We propose

    • Novel inference for locally affine deformation field (LADF)

    • Object detector using LADF

    • Clustering using LADF

Questions?

Thank you