Learning Representations
CS331B: Representation Learning in Computer Vision
Amir R. Zamir, Silvio Savarese
(Class logistics)
● Paper assignments out
● Next week's papers:
○ Discriminative Learning of Deep Convolutional Feature Point Descriptors, Simo-Serra, E., Trulls, E., Ferraz, L., Kokkinos, I., Fua, P., & Moreno-Noguer, F., ICCV 2015
○ Data-Driven 3D Voxel Patterns for Object Category Recognition, Yu Xiang, Wongun Choi, Yuanqing Lin, and Silvio Savarese, CVPR 2015
○ Convolutional-Recursive Deep Learning for 3D Object Classification, Socher, R., Huval, B., Bath, B., Manning, C. D., & Ng, A. Y., NIPS 2012
Examples of representations: a speech "transcript", the label "Cat", the verdict "Macbeth was guilty.", a feature vector [81 20 84 64 58 39 17 54 72 15].
Representation → Mathematical Model (e.g., classifier)
Some basic concepts related to representations:
● Ill-posedness
● Readout non-linearity
● Dimensionality
● Computational complexity
● Encoding power (i.e., performance)
● Narrowness of application domain (vertical vs. horizontal representations)
Linearity
With respect to: {modeling parameters (decision), independent variables (representation)}

Decision boundary    Independent var. (x, y)    Modeling param. (a, b, c, r)
                     Linear                     non-Linear
                     Linear                     Linear

Not discussing kernels, reparametrization, etc.
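The distinction above can be checked numerically. A minimal sketch (the boundary equations here are illustrative assumptions, not from the slides): a line a·x + b·y + c = 0 is linear in the inputs (x, y), while a circle x² + y² = r² is not, which superposition exposes directly.

```python
import numpy as np

# Hypothetical illustration: linearity with respect to the independent
# variables (x, y), tested via superposition f(p + q) == f(p) + f(q).
def line(xy, a=1.0, b=2.0, c=0.0):
    # Line decision function: linear in (x, y) when c = 0.
    return a * xy[0] + b * xy[1] + c

def circle(xy, r=0.0):
    # Circle decision function: quadratic, hence non-linear in (x, y).
    return xy[0] ** 2 + xy[1] ** 2 - r ** 2

p, q = np.array([1.0, 0.5]), np.array([-0.3, 2.0])

# Superposition holds for the line's dependence on (x, y)...
assert np.isclose(line(p + q), line(p) + line(q))
# ...but fails for the circle, so it is non-linear in (x, y).
assert not np.isclose(circle(p + q), circle(p) + circle(q))
```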
Handcrafted representations:
● Color Histograms
● Histogram of Gradients (HOG)
● Deformable Part-based Models (DPM)
● Shape-based Models
Felzenszwalb et al., 2010; Dalal and Triggs, 2005; Beis and Lowe, 1997.
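As a concrete instance of the simplest handcrafted representation above, here is a minimal color-histogram sketch, assuming an 8-bit RGB image stored as an (H, W, 3) numpy array (the bin count is an arbitrary choice):

```python
import numpy as np

# Minimal sketch of a color-histogram representation: one histogram per
# RGB channel, concatenated and L1-normalized into a fixed-length vector.
def color_histogram(image, bins=8):
    hists = [np.histogram(image[..., ch], bins=bins, range=(0, 256))[0]
             for ch in range(3)]
    hist = np.concatenate(hists).astype(float)
    return hist / hist.sum()

# Usage on a random "image":
img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
feat = color_histogram(img)   # 3 channels x 8 bins = 24-dim feature
```

Note the representation is invariant to spatial layout: shuffling the pixels leaves the histogram unchanged, which is exactly the kind of design decision handcrafting forces you to make.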
Handcrafting...
● Was the only way for a long time.
● (Almost) worked for many great applications:
○ Image retrieval, structure-from-motion, face detection, identification, etc.
● Why alternatives?
○ Can't quite find the discriminative signature for a problem (e.g., why is this a sad image?).
○ The discriminative signature can be found, but is hard to approach programmatically (e.g., track the fish).
○ Too many contributing factors to the problem.
■ Fusion is non-trivial; rule-based fusion is ruled out.
■ Fusing the contributing factors is itself a comparably complex representation problem.
Two approaches to representation learning (pros and cons)
● Supervised
○ Representation constrained by task(s).
● Unsupervised
○ Representation constrained by reconstruction.
LeCun et al. 1998. Hinton et al. 2006.
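The two constraints can be contrasted in a few lines. This is a hypothetical sketch (a linear encoder, a random batch, and made-up labels, none of which come from the slides): the same representation z is scored either against labels via a task head, or against the input itself via a decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))          # batch of inputs
W = rng.normal(size=(16, 8)) * 0.1    # encoder: representation z = x @ W
z = x @ W

# Supervised: the representation feeds a task head; the loss compares
# predictions to labels (cross-entropy on a 2-class toy task).
labels = np.array([0, 1, 0, 1])
V = rng.normal(size=(8, 2)) * 0.1     # task head
logits = z @ V
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
task_loss = -np.log(probs[np.arange(4), labels]).mean()

# Unsupervised: the representation feeds a decoder; the loss compares
# the reconstruction to the input itself (no labels anywhere).
D = rng.normal(size=(8, 16)) * 0.1    # decoder
recon_loss = ((x - z @ D) ** 2).mean()
```

Everything else (architectures, losses, data) varies, but this split — what the representation is constrained by — is the structural difference between the two approaches.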
Unsupervised representation learning
● Unsupervised
○ Representation constrained by reconstruction.
● Clarification:
○ Unsupervised does not necessarily mean generic, and vice versa.
○ Many of the generic representations are actually trained discriminatively.
○ (See Lecture 12.)
Hinton et al. 2006.
Unsupervised representation learning
● Unsupervised
○ Representation constrained by reconstruction.
● Usually a reconstruction loss:
○ L2 pixel loss
○ Perceptual loss
[Figure: super-resolution comparison — Ground Truth vs. Bicubic vs. Perceptual loss]
Hinton et al. 2006. Johnson et al. 2016.
Unsupervised representation learning methods
● Sparse Coding
● Basic Autoencoders
● Restricted Boltzmann Machines
● Generative Adversarial Network-based feature learning
● ...
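Of the methods listed, the basic autoencoder is the simplest to sketch. A minimal linear version, assuming toy Gaussian data and plain gradient descent on the reconstruction error (no tied weights, no non-linearity):

```python
import numpy as np

# Minimal sketch of a basic (linear) autoencoder: encode to a 3-dim code,
# decode back to 10 dims, train on the reconstruction error.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
W_enc = rng.normal(size=(10, 3)) * 0.1
W_dec = rng.normal(size=(3, 10)) * 0.1

loss0 = ((X @ W_enc @ W_dec - X) ** 2).mean()

lr = 0.01
for _ in range(500):
    Z = X @ W_enc                          # code (the learned representation)
    err = Z @ W_dec - X                    # reconstruction error
    # Gradients of 0.5 * mean squared reconstruction error:
    W_dec -= lr * (Z.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

final_loss = ((X @ W_enc @ W_dec - X) ** 2).mean()
```

The code `Z`, not the reconstruction, is the point: after training it is the low-dimensional representation the method was run to obtain.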
Low-level matching: quantitative evaluation
● 1st-layer filters
● Notice similarity to Gabor and sparse coding filters
Zagoruyko & Komodakis, 2015.
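For the comparison above, a Gabor filter is just a sinusoid under a Gaussian envelope. A small sketch with assumed parameter values (size, sigma, wavelength are arbitrary choices for illustration):

```python
import numpy as np

# Generate a 2D Gabor filter: a cosine carrier at orientation `theta`,
# windowed by an isotropic Gaussian envelope. Learned first-layer CNN
# filters often resemble such filters.
def gabor(size=11, sigma=2.0, theta=0.0, wavelength=4.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)           # rotated coordinate
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

filt = gabor()   # 11x11 filter, peak response at the center
```

Varying `theta` and `wavelength` produces the bank of oriented, band-pass filters that learned first layers tend to rediscover.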
Course webpage: http://web.stanford.edu/class/cs331b/
http://www.cs.stanford.edu/~amirz/
http://cvgl.stanford.edu/silvio/