An Introduction to Sparse Coding, Sparse Sensing, and Optimization

Speaker: Wei-Lun Chao

Date: Nov. 23, 2011

DISP Lab, Graduate Institute of Communication Engineering, National Taiwan University

An introduction to Sparse coding, Sparse sensing, and ...disp.ee.ntu.edu.tw/~pujols/An introduction to... · An Introduction to Sparse Coding, Sparse Sensing, and Optimization Speaker:




Outline

• Introduction

• The fundamentals of optimization

• The idea of sparsity: coding vs. sensing

• The solution

• The importance of the dictionary

• Applications


Introduction


Introduction

• What is sparsity?

• Usage:

Compression

Analysis

Representation

Fast / sparse sensing

[Figure: projection bases and reconstruction bases]


Introduction

• Why do we use the Fourier transform and its variants for image and audio compression?

Differentiability (theoretical)

Intrinsic sparsity (data-dependent)

Human perception (human-centric)

• Better bases for compression or representation?

Wavelets

How about data-dependent bases?

How about learning?
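As a small illustration of intrinsic sparsity, the sketch below (Python with NumPy; the test signal and the threshold are assumptions chosen for the example) builds a signal from two cosines, shows that only four of its 64 DFT coefficients are nonzero, and reconstructs the signal from those four alone.

```python
import numpy as np

# Hypothetical test signal: two cosines whose frequencies fall exactly on DFT bins.
n = 64
t = np.arange(n)
signal = np.cos(2 * np.pi * 3 * t / n) + 0.5 * np.cos(2 * np.pi * 10 * t / n)

coeffs = np.fft.fft(signal)
# Count coefficients that are not numerically zero.
nonzero = int(np.sum(np.abs(coeffs) > 1e-8))
print(nonzero)  # 4: bins 3 and 10 plus their mirror bins 61 and 54

# Compression: keep only the k largest-magnitude coefficients and zero the rest.
k = 4
keep = np.argsort(np.abs(coeffs))[-k:]
compressed = np.zeros_like(coeffs)
compressed[keep] = coeffs[keep]
reconstruction = np.fft.ifft(compressed).real
print(np.max(np.abs(reconstruction - signal)) < 1e-10)  # True
```

Real images and audio are only approximately sparse, so in practice keeping the largest coefficients gives a small, rather than zero, reconstruction error.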


Introduction

• Optimization

Frequently faced in algorithm design

Used to implement your creative ideas

• Issue

Which mathematical forms, together with their corresponding optimization algorithms, guarantee convergence to a local or global optimum?


The Fundamentals of Optimization


A Warm-up Question

• How do you solve the following problems?

(1) $\min_{w} f(w) = (w-5)^2$

(2)

[Figure: a curve with its local minima and global minimum marked; $w = 5$ marked on the axis]

(a) Plot

(b) Take derivatives and check where they equal 0
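For problem (1), setting the derivative $2(w-5)$ to zero gives $w = 5$ directly. When no closed form is available, the same first-order information drives gradient descent. A minimal Python sketch (the step size and iteration count are assumptions):

```python
def f(w):
    return (w - 5) ** 2

def df(w):
    # Derivative of (w - 5)^2; solving df(w) = 0 analytically gives w = 5.
    return 2 * (w - 5)

def gradient_descent(w0, step=0.1, iters=200):
    w = w0
    for _ in range(iters):
        w -= step * df(w)  # move against the gradient
    return w

w_star = gradient_descent(w0=0.0)
print(round(w_star, 6))  # 5.0
```

Each step here shrinks the distance to 5 by a factor of 0.8, so 200 steps are far more than enough.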


An Advanced Question

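• How about the following questions?

(3) $\min_{w} \sum_{n=1}^{N} f_n(w)$

(a) Plot? (b) Take the derivative and set it to 0?

(4) $\min_{w} f(w) = (w-5)^2$, s.t. $w \le 3$

[Figure: $f(w) = (w-5)^2$ with $w = 3$ and $w = 5$ marked]

Derivative?

(5) $\min_{\mathbf{w}} \sum_{n=1}^{N} f_n(\mathbf{w})$, s.t. constraints on the $w_i$ involving $b$

How to solve it?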
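For a constrained problem like (4), the minimizer can sit on the boundary of the feasible set, where the derivative is not zero. A minimal sketch, assuming the constraint is $w \le 3$, uses projected gradient descent (one standard method among several): take a gradient step, then clip back into the feasible set.

```python
def df(w):
    # Derivative of the objective (w - 5)^2.
    return 2 * (w - 5)

def projected_gradient_descent(w0, upper=3.0, step=0.1, iters=200):
    w = w0
    for _ in range(iters):
        w = w - step * df(w)   # unconstrained gradient step
        w = min(w, upper)      # project back onto the feasible set {w <= upper}
    return w

w_star = projected_gradient_descent(w0=0.0)
print(w_star, df(w_star))  # 3.0 -4.0
```

The optimum is $w = 3$, yet the derivative there is $-4$, which is why "take the derivative and set it to 0" fails once constraints are present.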


Illustration

• 2-D case:

$\min_{w_1, w_2} f(w_1, w_2)$, s.t. $g(w_1, w_2) = b$

[Figure: contours $f(w_1, w_2) = 1, 2, \dots, 6$ in the $(w_1, w_2)$ plane, together with the constraint curve $g(w_1, w_2) = b$]
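A worked instance of the 2-D picture, with a hypothetical $f(w_1, w_2) = w_1^2 + w_2^2$, $g(w_1, w_2) = w_1 + w_2$, and $b = 1$. At the constrained optimum the contour of $f$ is tangent to the curve $g = b$, i.e. $\nabla f = \lambda \nabla g$, which is exactly what the Lagrange conditions encode:

```latex
\begin{aligned}
\mathcal{L}(w_1, w_2, \lambda) &= w_1^2 + w_2^2 - \lambda\,(w_1 + w_2 - 1) \\
\frac{\partial \mathcal{L}}{\partial w_1} &= 2w_1 - \lambda = 0, \qquad
\frac{\partial \mathcal{L}}{\partial w_2} = 2w_2 - \lambda = 0, \qquad
w_1 + w_2 = 1 \\
&\Rightarrow\; w_1 = w_2 = \tfrac{1}{2}, \qquad \lambda = 1, \qquad f(w_1, w_2) = \tfrac{1}{2}
\end{aligned}
```

Check: $\nabla f = (2w_1, 2w_2) = (1, 1)$ and $\nabla g = (1, 1)$, so $\nabla f = 1 \cdot \nabla g$.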


How to Solve?

• Thanks to:

Lagrange multipliers

Linear programming, quadratic programming, and, more recently, convex optimization

• Standard form:

$\min_{\mathbf{w}} f_0(\mathbf{w})$

s.t. $h_i(\mathbf{w}) \le b_i, \; i = 1, \dots, m$

s.t. $g_i(\mathbf{w}) = c_i, \; i = 1, \dots, n$
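When the objective is quadratic and only equality constraints are present, the Lagrange conditions reduce to a single linear system (the KKT system). A minimal NumPy sketch under that assumption; `solve_eq_qp` and the example numbers are hypothetical, and inequality constraints $h_i$ would need an iterative QP or convex solver instead.

```python
import numpy as np

def solve_eq_qp(P, q, G, c):
    """Minimize 0.5 * w^T P w + q^T w  subject to  G w = c.

    Setting the gradient of L = 0.5 w^T P w + q^T w + nu^T (G w - c) to zero,
    together with feasibility G w = c, gives one linear system in (w, nu).
    """
    n, m = P.shape[0], G.shape[0]
    kkt = np.block([[P, G.T], [G, np.zeros((m, m))]])
    rhs = np.concatenate([-q, c])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:n], sol[n:]  # (w, nu); nu is the Lagrange multiplier vector

# Hypothetical instance: min w1^2 + w2^2  s.t.  w1 + w2 = 1
P = 2 * np.eye(2)
q = np.zeros(2)
G = np.array([[1.0, 1.0]])
c = np.array([1.0])
w, nu = solve_eq_qp(P, q, G, c)
print(w, nu)  # approximately w = [0.5, 0.5], nu = [-1.0]
```

The sign of `nu` depends on whether the constraint term is added or subtracted in the Lagrangian; here it is added.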


Fallacy

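• A quadratic programming problem with constraints

$\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2$, where $A = [\mathbf{a}_1 \; \mathbf{a}_2 \; \cdots \; \mathbf{a}_N]$, $\mathbf{x} = [x_1, \dots, x_N]^T$, and $x_i \ge 0$

$\mathbf{a}_i$: nutrient content of each food

$\mathbf{b}$: personal nutrient need

$\mathbf{x}$: the importance of each food

(1) Take the derivative (✗): $\mathbf{x} = (A^T A)^{-1} A^T \mathbf{b}$, then choose the $x_i$ with $x_i \ge 0$

(2) Quadratic programming (✓): $\arg\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2$, s.t. $x_i \ge 0$

(3) Sparse coding (✓)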
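The fallacy in approach (1) can be checked numerically. The sketch below (NumPy; the 3×2 data are made up for illustration) compares clipping the unconstrained least-squares solution with projected gradient descent on the constrained problem, which is one simple way to solve this quadratic program:

```python
import numpy as np

def objective(A, b, x):
    """Squared residual ||Ax - b||_2^2."""
    r = A @ x - b
    return float(r @ r)

def naive_clip(A, b):
    """Approach (1): solve the unconstrained problem, then clip negatives to 0."""
    x = np.linalg.solve(A.T @ A, A.T @ b)  # (A^T A)^{-1} A^T b
    return np.maximum(x, 0.0)

def projected_gradient(A, b, iters=500):
    """Minimize ||Ax - b||^2 subject to x >= 0 by projected gradient descent."""
    L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the gradient of 0.5*||Ax - b||^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = np.maximum(x - grad / L, 0.0)  # gradient step, then project onto x >= 0
    return x

# Hypothetical data: 3 nutrients, 2 foods.
A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
b = np.array([2.0, 1.0, -1.0])

x_clip = naive_clip(A, b)
x_qp = projected_gradient(A, b)
print(x_clip, objective(A, b, x_clip))
print(x_qp, objective(A, b, x_qp))
```

On this data the unconstrained solution is (2, -1); clipping gives (2, 0) with squared residual 2, whereas the constrained optimum is (1.5, 0) with squared residual 1.5, so "solve, then clip" is not optimal.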


The Idea of Sparsity


What is Sparsity?

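• Think about the following problem:

$\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2$, where $A = [\mathbf{a}_1 \; \mathbf{a}_2 \; \cdots \; \mathbf{a}_N] \in \mathbb{R}^{d \times N}$ and $\mathbf{x} \in \mathbb{R}^N$

Assume A has full rank and N > d.

Then many x achieve $\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$. Which one do you want?

Choose the x with the fewest nonzero components:

$\mathbf{x}^* = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0$, s.t. $\|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$, where $\|\mathbf{x}\|_0$ counts the nonzero entries of x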
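A quick numerical check of "many x achieve zero error" (NumPy; the 2×4 matrix is a made-up example): the minimum l2-norm solution from the pseudo-inverse fits b exactly but has three nonzeros, while the vector that generated b has only one.

```python
import numpy as np

# Hypothetical under-determined system: d = 2 equations, N = 4 unknowns.
A = np.array([[1.0, 0.0, 1.0,  1.0],
              [0.0, 1.0, 1.0, -1.0]])
x_sparse = np.array([0.0, 3.0, 0.0, 0.0])  # a single nonzero entry
b = A @ x_sparse

# Minimum l2-norm solution: exact, but spread over several entries.
x_min_norm = np.linalg.pinv(A) @ b

def l0(x, tol=1e-9):
    """Number of entries that are not numerically zero, i.e. ||x||_0."""
    return int(np.sum(np.abs(x) > tol))

print(np.allclose(A @ x_min_norm, b), l0(x_min_norm))  # True 3
print(np.allclose(A @ x_sparse, b), l0(x_sparse))      # True 1
```

Here the pseudo-inverse returns (0, 1, 1, -1): both vectors satisfy $A\mathbf{x} = \mathbf{b}$, and the l0 criterion is what singles out the sparse one.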


Why Sparsity?

• The more concise, the better.

• In some domains, there naturally exists a sparse latent vector that controls the data we see (e.g., MRI, music).

• In some domains, samples from the same class have the sparse property.

• The domain can be learned.

$\mathbf{b} = [\mathbf{a}_1 \; \mathbf{a}_2 \; \cdots \; \mathbf{a}_N]\,[0, \dots, x_i, \dots, x_j, \dots, 0]^T \; (+\,\text{noise})$

A k-sparse domain means that each b can be constructed from a vector x with at most k nonzero elements.


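Sparse Sensing vs. Sparse Coding

• Assume that we have $A \in \mathbb{R}^{d \times N}$ with $N > d$. Now an observation $\mathbf{b} \in \mathbb{R}^d$ comes in.

Sparse coding: $\mathbf{b} = A\mathbf{x}$, with $\mathbf{x}$ sparse

$\mathbf{x}^* = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0$, s.t. $\|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$

Sparse sensing: $\mathbf{y} = W\mathbf{b} = WA\mathbf{x} = Q\mathbf{x}$, with $\mathbf{x}$ sparse, $W \in \mathbb{R}^{p \times d}$, $p < d$, $\mathbf{y} \in \mathbb{R}^p$, and $Q = WA$

$\mathbf{x}^{**} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0$, s.t. $\|Q\mathbf{x} - \mathbf{y}\|_2^2 = 0$

$\mathbf{x}^* = \mathbf{x}^{**}$

Note: p is chosen based on the sparsity of the data (on k).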


Sparse Sensing

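[Figure: the sparse sensing pipeline. $\mathbf{b} \in \mathbb{R}^d$ with $\mathbf{b} = A\mathbf{x}$, $\mathbf{x}$ sparse; $\mathbf{y} = W\mathbf{b} = WA\mathbf{x} = Q\mathbf{x} \in \mathbb{R}^p$ with $W \in \mathbb{R}^{p \times d}$, $p < d$; recovery gives $\mathbf{x}^{**} = \mathbf{x}^*$]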


Sparse Sensing vs. Sparse Coding

• Sparse sensing (compressed sensing):

Acquiring b costs much time or money, so we acquire y first and then recover b.

• Sparse coding (sparse representation):

We believe that the sparse property exists in the data; otherwise, sparse representation means nothing.

x is used as the feature of b.

x can be used to efficiently store and reconstruct b.
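The two settings can be compared in a toy experiment. The sketch below (NumPy; the sizes, the random Gaussian A and W, and the helper `l0_recover` are assumptions) solves the l0 problem by exhaustive search over supports, which is only feasible at this tiny scale. It recovers x* from b (sparse coding) and x** from the shorter measurement y = Wb (sparse sensing):

```python
import itertools
import numpy as np

def l0_recover(Q, y, max_k, tol=1e-8):
    """Exhaustive search for the sparsest x with Q x = y (smallest supports first)."""
    n = Q.shape[1]
    for k in range(1, max_k + 1):
        for support in itertools.combinations(range(n), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(Q[:, cols], y, rcond=None)
            if np.linalg.norm(Q[:, cols] @ coef - y) < tol * max(1.0, np.linalg.norm(y)):
                x = np.zeros(n)
                x[cols] = coef
                return x
    return None

rng = np.random.default_rng(0)
d, N, p = 6, 8, 4                   # signal length, dictionary size, number of measurements
A = rng.standard_normal((d, N))     # hypothetical dictionary
x_true = np.zeros(N)
x_true[[1, 5]] = [1.5, -2.0]        # k = 2 nonzero entries
b = A @ x_true                      # the (expensive) full observation

W = rng.standard_normal((p, d))     # hypothetical random sensing matrix, p < d
y = W @ b                           # the cheap measurements
Q = W @ A

x_coding = l0_recover(A, b, max_k=2)   # x*  computed from b
x_sensing = l0_recover(Q, y, max_k=2)  # x** computed from y only
b_rec = A @ x_sensing                  # recover b from the measurements
print(np.allclose(x_coding, x_true), np.allclose(x_sensing, x_true))
```

Because any four columns of a random Gaussian Q are linearly independent with probability 1, a 2-sparse x is the unique sparsest solution here, so p = 4 measurements suffice, in line with the note that p depends on k.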


The Solution


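How to Get the Sparse Solution?

• No known algorithm other than exhaustive search solves the following problem in general (it is NP-hard):

$\mathbf{x}^* = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0$, s.t. $\|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$

• In some situations (e.g., a special form of A), however, the solution of l1 minimization approaches that of l0 minimization:

$\mathbf{x}^{***} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_1$, s.t. $\|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$, where $\|\mathbf{x}\|_1 = \sum_{n=1}^{N} |x_n|$, and then $\mathbf{x}^{***} = \mathbf{x}^*$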
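To see why exhaustive search does not scale, count the candidate supports: with N coefficients and at most k nonzeros there are $\sum_{j=1}^{k} \binom{N}{j}$ of them, each requiring its own least-squares fit. A short Python sketch (the (N, k) pairs are arbitrary examples):

```python
from math import comb

def num_supports(N, k):
    """Number of supports with between 1 and k nonzeros among N coefficients."""
    return sum(comb(N, j) for j in range(1, k + 1))

print(num_supports(8, 2))  # 36
for N, k in [(64, 5), (256, 10), (1024, 20)]:
    print(N, k, f"{num_supports(N, k):.3e}")
```

Already for N = 256 and k = 10 the count exceeds $10^{17}$, which is why convex relaxations such as l1 minimization matter.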


Why l1?

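• Question 1: Why can l1 result in a sparse solution?

$\arg\min_{\mathbf{x}} \|\mathbf{x}\|_1$, s.t. $\|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$, or equivalently (for a suitable c) $\arg\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2$, s.t. $\|\mathbf{x}\|_1 \le c$

[Figure: in the $(w_1, w_2)$ plane, the solution set of $\|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$ together with the l1 ball $\|\mathbf{x}\|_1 \le c$ and the l2 ball $\|\mathbf{x}\|_2 \le c$]

The l1 ball has corners on the coordinate axes, so the solution set usually first touches it at a corner, where some coordinates are exactly zero; the round l2 ball has no corners and typically yields dense solutions.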
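A hand-worked 2-D instance of Question 1, with the made-up constraint $w_1 + 2w_2 = 2$ (so $\mathbf{a} = (1, 2)$, $b = 2$): the minimum-l2 point on the line is dense, while the minimum-l1 point lies on an axis.

```latex
\begin{aligned}
\ell_2:\quad \mathbf{w} &= \frac{b}{\|\mathbf{a}\|_2^2}\,\mathbf{a} = \tfrac{2}{5}(1, 2) = (0.4,\ 0.8), \qquad \|\mathbf{w}\|_1 = 1.2 \\
\ell_1:\quad \|\mathbf{w}\|_1 &= |2 - 2w_2| + |w_2| =
\begin{cases}
2 - 3w_2, & w_2 < 0 \\
2 - w_2, & 0 \le w_2 \le 1 \\
3w_2 - 2, & w_2 > 1
\end{cases} \\
&\Rightarrow\; \text{minimum value } 1 \text{ at } w_2 = 1, \text{ i.e. } \mathbf{w} = (0,\ 1)
\end{aligned}
```

The l1 solution has one zero coordinate; the l2 solution has none.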


Why l1?

• Question 2: Why does the sparse solution achieved by l1 minimization approach the one of l0 minimization?

This is a matter of mathematics.

Regardless, sparse representation based on l1 minimization has been widely used for pattern recognition.

In addition, if one only uses the sparse solution as a representation (feature), it seems acceptable for the two solutions to differ:

$\mathbf{b} = A\mathbf{x}^*$, $\mathbf{b} = A\mathbf{x}^{***}$


Noise

• Sometimes, the data are observed with noise:

$\mathbf{b} = [\mathbf{a}_1 \; \mathbf{a}_2 \; \cdots \; \mathbf{a}_N]\,\mathbf{x}^* + \text{noise}$, with $\mathbf{x}^*$ sparse

Does l0 (l1) minimization on the noisy b still give $\mathbf{x} = \mathbf{x}^*$?

• The answer seems to be negative: the solution of $\arg\min_{\mathbf{x}} \|\mathbf{x}\|_1$, s.t. $\|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$ is usually not equal to $\mathbf{x}^*$, and it is possibly not sparse.
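A small experiment makes the negative answer concrete (NumPy; the random 4×8 dictionary, the noise level, and the helper `sparsest_exact_support_size` are assumptions). Exhaustive search finds the smallest support that fits b exactly: for the clean observation it is the true support of size 2, but after adding noise an exact fit needs all d = 4 dimensions, so the exact solution is no longer sparse.

```python
import itertools
import numpy as np

def sparsest_exact_support_size(A, b, tol=1e-8):
    """Smallest k such that some k columns of A reproduce b exactly (exhaustive search)."""
    n = A.shape[1]
    for k in range(1, n + 1):
        for support in itertools.combinations(range(n), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.linalg.norm(A[:, cols] @ coef - b) < tol * max(1.0, np.linalg.norm(b)):
                return k
    return None

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 8))        # hypothetical dictionary, d = 4, N = 8
x_true = np.zeros(8)
x_true[[2, 6]] = [1.0, -1.0]           # 2-sparse ground truth
b_clean = A @ x_true
b_noisy = b_clean + 0.1 * rng.standard_normal(4)

k_clean = sparsest_exact_support_size(A, b_clean)
k_noisy = sparsest_exact_support_size(A, b_noisy)
print(k_clean, k_noisy)
```

A generic noisy vector does not lie in any subspace spanned by fewer than d columns, so insisting on $\|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$ forces a dense x.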


Noise

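• Several ways to overcome this:

$\arg\min_{\mathbf{x}} \|\mathbf{x}\|_1$, s.t. $\|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$ → $\arg\min_{\mathbf{x}} \|\mathbf{x}\|_1$, s.t. $\|A\mathbf{x} - \mathbf{b}\|_2 \le c$

$\arg\min_{\mathbf{x}} \|\mathbf{x}\|_1$, s.t. $\|A\mathbf{x} - \mathbf{b}\|_1 \le c$

$\arg\min_{\mathbf{x}} \|\mathbf{x}\|_1$, s.t. $\|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$ → $\arg\min_{\mathbf{z}} \|\mathbf{z}\|_1$, s.t. $\|A'\mathbf{z} - \mathbf{b}\|_2^2 = 0$, where $A' = [A \mid I]$ and $\mathbf{z} = [\mathbf{x}^T, \mathbf{t}^T]^T$ (t absorbs the noise)

• What is the difference between $\|A\mathbf{x} - \mathbf{b}\|_2 \le c$ and $\|A\mathbf{x} - \mathbf{b}\|_1 \le c$?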


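Equivalent Forms

• You may also see several forms of the problem:

$\arg\min_{\mathbf{x}} \|\mathbf{x}\|_1$, s.t. $\|A\mathbf{x} - \mathbf{b}\|_1 \le c$

$\arg\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_1 + \lambda\|\mathbf{x}\|_1$

$\arg\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_1$, s.t. $\|\mathbf{x}\|_1 \le d$

• These forms are equivalent (for corresponding values of c, λ, and d) and are derived from Lagrange multipliers.

• Several publications address how to solve the l1 minimization problem.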
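A common way to solve the penalized form, in its widely used squared-l2 variant $\frac{1}{2}\|A\mathbf{x} - \mathbf{b}\|_2^2 + \lambda\|\mathbf{x}\|_1$ (the LASSO), is the iterative shrinkage-thresholding algorithm (ISTA). The NumPy sketch below implements that variant, not the l1 data term written above; the iteration count and the test data are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink every entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, iters=500):
    """ISTA for  min_x 0.5 * ||A x - b||_2^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)       # gradient step on the data term
        x = soft_threshold(x - grad / L, lam / L)  # shrinkage step from the l1 term
    return x

# With an orthonormal A the minimizer is known in closed form: soft_threshold(A^T b, lam).
A = np.eye(4)
b = np.array([3.0, -0.5, 0.2, -2.0])
x_hat = ista(A, b, lam=1.0)
print(x_hat)  # approximately [2, 0, 0, -1]
```

Entries of b smaller than λ in magnitude are set exactly to zero, which is where the sparsity comes from.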


The Importance of the Dictionary


Dictionary Generation

• In the preceding sections, we generally assumed that the (over-complete) basis A exists and is known.

• In practice, however, we usually need to build it:

Wavelet + Fourier + Haar + ……

Learning based on data

• How to learn?

Given a training set $\{\mathbf{b}^{(i)} \in \mathbb{R}^d\}_{i=1}^{N}$, form $B = [\mathbf{b}^{(1)} \; \mathbf{b}^{(2)} \; \cdots \; \mathbf{b}^{(N)}]$:

$A^*, X^* = \arg\min_{A, X} \|B - AX\|_F^2 + \lambda\|X\|_{1,1}$, where $X = [\mathbf{x}^{(1)} \; \mathbf{x}^{(2)} \; \cdots \; \mathbf{x}^{(N)}]$

• May result in over-fitting
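One simple way to attack this joint problem is to alternate: fix A and compute sparse codes X, then fix X and refit A by least squares, similar in spirit to the method of optimal directions (MOD). The NumPy sketch below is an illustration under assumptions (squared-Frobenius data term with a factor 0.5, ISTA for the codes, a tiny ridge term, unit-norm atoms, synthetic data), not a tuned learner; K-SVD and online dictionary learning are common alternatives.

```python
import numpy as np

def soft_threshold(v, t):
    """Entry-wise shrinkage toward zero (proximal operator of t * l1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_codes(A, B, lam, iters=100):
    """ISTA on every column of B: min_X 0.5*||B - A X||_F^2 + lam*||X||_{1,1}."""
    L = np.linalg.norm(A, 2) ** 2
    X = np.zeros((A.shape[1], B.shape[1]))
    for _ in range(iters):
        X = soft_threshold(X - A.T @ (A @ X - B) / L, lam / L)
    return X

def learn_dictionary(B, n_atoms, lam=0.1, rounds=20, seed=0):
    """Alternate a sparse-coding step (X) and a least-squares dictionary step (A)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((B.shape[0], n_atoms))
    A /= np.linalg.norm(A, axis=0)  # start from unit-norm atoms
    for _ in range(rounds):
        X = sparse_codes(A, B, lam)
        # Least-squares update of A with X fixed; the tiny ridge keeps X X^T invertible.
        A = B @ X.T @ np.linalg.inv(X @ X.T + 1e-6 * np.eye(n_atoms))
        A /= np.maximum(np.linalg.norm(A, axis=0), 1e-12)  # renormalize atoms
    return A, sparse_codes(A, B, lam)

# Hypothetical training set: 200 signals in R^8, each mixing 2 atoms of a hidden dictionary.
rng = np.random.default_rng(1)
A_true = rng.standard_normal((8, 12))
A_true /= np.linalg.norm(A_true, axis=0)
X_true = np.zeros((12, 200))
for i in range(200):
    X_true[rng.choice(12, size=2, replace=False), i] = rng.standard_normal(2)
B = A_true @ X_true

A_learned, X_learned = learn_dictionary(B, n_atoms=12)
rel_err = np.linalg.norm(B - A_learned @ X_learned) / np.linalg.norm(B)
print(A_learned.shape, X_learned.shape, round(rel_err, 3))
```

With too many atoms or too small a λ, such a learner can fit the training set closely without generalizing, which is the over-fitting risk mentioned above.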


Applications


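Back to the Problem We Have

• A quadratic programming problem with constraints

$\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2$, where $A = [\mathbf{a}_1 \; \mathbf{a}_2 \; \cdots \; \mathbf{a}_N]$, $\mathbf{x} = [x_1, \dots, x_N]^T$, and $x_i \ge 0$

$\mathbf{a}_i$: nutrient content of each food

$\mathbf{b}$: personal nutrient need

$\mathbf{x}$: the importance of each food

(1) Take the derivative (✗): $\mathbf{x} = (A^T A)^{-1} A^T \mathbf{b}$, then choose the $x_i$ with $x_i \ge 0$

(2) Quadratic programming (✓): $\arg\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2$, s.t. $x_i \ge 0$

(3) Sparse coding (✓)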


Face Recognition (1)


Face Recognition (2)


An Important Issue

• When using sparse representation for feature extraction, you may wonder: even if the data have the sparsity property, do sparse features really give better results? Do they carry any semantic meaning?

• Successful areas:

Face recognition

Digit recognition

Object recognition (with careful design):

Ex. K-means, sparse representation


De-noising

33

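Learn a patch dictionary. For each patch, compute the sparse representation, then use it to reconstruct the patch.

$\mathbf{x}^* = \arg\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_1 + \lambda\|\mathbf{x}\|_1$, $\hat{\mathbf{b}} = A\mathbf{x}^*$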
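The denoising loop can be sketched on a single 1-D signal. To keep the example short and exact, this NumPy sketch swaps the learned patch dictionary for a fixed orthonormal DCT-II basis (an assumption) and uses the squared-l2 data term; for an orthonormal dictionary the sparse code then has the closed form "soft-threshold $D^T\mathbf{b}$". The signal, noise level, and λ are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix; column k is the k-th cosine atom."""
    t = np.arange(n)[:, None]
    k = np.arange(n)[None, :]
    D = np.cos(np.pi * (t + 0.5) * k / n)
    D[:, 0] *= np.sqrt(1.0 / n)
    D[:, 1:] *= np.sqrt(2.0 / n)
    return D

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def denoise(D, b, lam):
    """Sparse-code b in the orthonormal dictionary D, then reconstruct it."""
    # Exact minimizer of 0.5*||D x - b||^2 + lam*||x||_1 when D is square and orthonormal.
    x_star = soft_threshold(D.T @ b, lam)
    return D @ x_star

n = 256
D = dct_matrix(n)
clean = 4.0 * D[:, 3] - 3.0 * D[:, 7]          # hypothetical signal, 2-sparse in the DCT basis
rng = np.random.default_rng(0)
noisy = clean + 0.1 * rng.standard_normal(n)
denoised = denoise(D, noisy, lam=0.3)
print(np.linalg.norm(noisy - clean), np.linalg.norm(denoised - clean))
```

Noise spreads over all 256 coefficients, mostly below the threshold, while the signal concentrates in two large ones, so thresholding removes most of the noise and keeps the signal.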


Detection based on reconstruction

34

Learn a patch dictionary for a specific object. For each patch in the image, compute the sparse representation and use it to reconstruct the patch. Check the error for each patch, and identify those with small error as the detected object.

$\mathbf{x}^* = \arg\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_1 + \lambda\|\mathbf{x}\|_1$, $\hat{\mathbf{b}} = A\mathbf{x}^*$

Check $\|\mathbf{b} - \hat{\mathbf{b}}\|_2^2$

The dictionary here may not be over-complete.

Other cases: foreground-background detection, pedestrian detection, ……
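A minimal sketch of the detection rule (NumPy; the dictionary, the two patches, and the threshold are hypothetical). Since the dictionary "may not be over-complete", this sketch swaps the l1 sparse code for a plain least-squares fit; the decision rule, thresholding the reconstruction error $\|\mathbf{b} - \hat{\mathbf{b}}\|_2^2$, is the same.

```python
import numpy as np

def reconstruction_error(A, b):
    """Squared error ||b - A x||_2^2 of the best least-squares fit of b by A."""
    x, *_ = np.linalg.lstsq(A, b, rcond=None)  # stands in for the l1 sparse code
    return float(np.sum((b - A @ x) ** 2))

rng = np.random.default_rng(0)
A = rng.standard_normal((16, 3))                  # hypothetical object dictionary (3 atoms)
object_patch = A @ np.array([1.0, 0.0, -2.0])     # lies in the span of the atoms
background_patch = rng.standard_normal(16)        # generic patch

threshold = 1e-6
for name, patch in [("object", object_patch), ("background", background_patch)]:
    err = reconstruction_error(A, patch)
    print(name, err < threshold)  # True means "detected as the object"
```

In a real detector the error of an object patch is small rather than zero, so the threshold would be chosen on validation data.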


Conclusion


What you should know

• What is the standard form of an optimization problem?

• What is sparsity?

• What are sparse coding and sparse sensing?

• What kinds of optimization methods can solve them?

• Try to use it!


Thank you for listening
