Sparse Coding: An Overview

Brian Booth

SFU Machine Learning Reading Group

November 12, 2013

The aim of sparse coding

Every column of D is a prototype

Similar to, but more general than, PCA

Example: Sparse Coding of Images

Sparse Coding in V1

Example: Image Denoising

Example: Image Restoration

Sparse Coding and Acoustics

Inner ear (cochlea) also does sparse coding of frequencies

Sparse Coding and Natural Language Processing

Outline

1 Introduction: Why Sparse Coding?

2 Sparse Coding: The Basics

3 Adding Prior Knowledge

4 Conclusions

The aim of sparse coding, revisited

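We assume our data $x$ satisfies

\[ x \approx \sum_{i=1}^{n} \alpha_i d_i = D\alpha \]

where $d_i$ is the $i$-th column (prototype) of $D$.

Learning: Given training data $x_j$, $j \in \{1, \dots, m\}$, learn the dictionary $D$ and the sparse codes $\alpha_j$.

Encoding: Given test data $x$ and a dictionary $D$, learn the sparse code $\alpha$.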
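As a small illustration of the model above, the sketch below rebuilds a signal as a sparse combination of dictionary atoms. The atoms, the code values, and the helper name `reconstruct` are invented for the example, and each atom is assumed to be one column of D, stored as a plain Python list.

```python
def reconstruct(atoms, alpha):
    """Return sum_i alpha[i] * atoms[i], skipping zero coefficients."""
    dim = len(atoms[0])
    x = [0.0] * dim
    for a_i, d_i in zip(alpha, atoms):
        if a_i == 0.0:
            continue  # sparse code: most atoms are unused
        for r in range(dim):
            x[r] += a_i * d_i[r]
    return x


# Hypothetical over-complete dictionary: 4 atoms in a 2-D signal space.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
alpha = [0.0, 0.0, 2.0, 0.0]  # only one active atom

x_hat = reconstruct(atoms, alpha)
print(x_hat)  # [2.0, 2.0]
```

Because the dictionary has more atoms than signal dimensions, the same signal could also be written with two active atoms; the sparse code is the one that uses fewer.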

Learning: The Objective Function

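Dictionary learning involves optimizing:

\[ \arg\min_{d_i, \alpha_j} \sum_{j=1}^{m} \Big\| x_j - \sum_{i=1}^{n} \alpha_{i,j} d_i \Big\|^2 + \beta \sum_{j=1}^{m} \sum_{i=1}^{n} |\alpha_{i,j}| \]

subject to $\|d_i\|^2 \le c$, $\forall i = 1, \dots, n$.

In matrix notation:

\[ \arg\min_{D, A} \|X - DA\|_F^2 + \beta \sum_{i,j} |\alpha_{i,j}| \]

subject to $\sum_{i} D_{i,j}^2 \le c$, $\forall j = 1, \dots, n$,

where the columns of $X$, $D$ and $A$ are the data $x_j$, the atoms $d_i$ and the codes $\alpha_j$.

Split the optimization over D and A in two.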
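To make the two terms of the objective concrete, here is a minimal sketch that evaluates it for a toy problem. The function names, the data, and the values of `beta` and `c` are all assumptions chosen for the example; data points, atoms and codes are stored as plain lists rather than matrices.

```python
def objective(xs, atoms, codes, beta):
    """Squared reconstruction error plus beta times the L1 norm of the codes.

    xs[j] is data point x_j, atoms[i] is atom d_i, and codes[j][i] is the
    coefficient of atom i in the code of x_j.
    """
    total = 0.0
    for x, alpha in zip(xs, codes):
        for r in range(len(x)):
            approx = sum(a * d[r] for a, d in zip(alpha, atoms))
            total += (x[r] - approx) ** 2
        total += beta * sum(abs(a) for a in alpha)
    return total


def atoms_feasible(atoms, c):
    """Check the norm constraint ||d_i||^2 <= c for every atom."""
    return all(sum(v * v for v in d) <= c for d in atoms)


# Hypothetical toy problem: 3 atoms, 2 data points.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
xs = [[3.0, 3.0], [2.0, 0.0]]
dense_codes = [[3.0, 3.0, 0.0], [2.0, 0.0, 0.0]]
sparse_codes = [[0.0, 0.0, 3.0], [2.0, 0.0, 0.0]]
beta = 0.5

print(objective(xs, atoms, dense_codes, beta))   # 4.0
print(objective(xs, atoms, sparse_codes, beta))  # 2.5
print(atoms_feasible(atoms, c=2.0))              # True
```

Both codes reconstruct the data exactly, so the difference in the objective comes entirely from the L1 penalty, which favours the code with fewer and smaller coefficients.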

Step 1: Learning the Dictionary

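Reduced optimization problem:

\[ \arg\min_{D} \|X - DA\|_F^2 \]

subject to $\sum_{i} D_{i,j}^2 \le c$, $\forall j = 1, \dots, n$.

Introduce Lagrange multipliers:

\[ \mathcal{L}(D, \lambda) = \operatorname{tr}\left( (X - DA)^T (X - DA) \right) + \sum_{j=1}^{n} \lambda_j \left( \sum_{i} D_{i,j}^2 - c \right) \]

where each $\lambda_j \ge 0$ is a dual variable...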

Step 1: Moving to the dual

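From the Lagrangian

\[ \mathcal{L}(D, \lambda) = \operatorname{tr}\left( (X - DA)^T (X - DA) \right) + \sum_{j=1}^{n} \lambda_j \left( \sum_{i} D_{i,j}^2 - c \right) \]

minimize over D to obtain the Lagrange dual

\[ \mathcal{D}(\lambda) = \min_{D} \mathcal{L}(D, \lambda) = \operatorname{tr}\left( X^T X - X A^T \left( A A^T + \Lambda \right)^{-1} \left( X A^T \right)^T - c\,\Lambda \right) \]

where $\Lambda = \operatorname{diag}(\lambda)$.

The dual can be optimized using conjugate gradient

Only n values of λ to optimize, compared with the n × k entries of D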
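The minimization over D can be filled in with a short derivation. This is a sketch under stated assumptions: data points are the columns of $X$ (so the reconstruction is $DA$), $\Lambda = \operatorname{diag}(\lambda)$, and $AA^T + \Lambda$ is invertible.

```latex
\begin{align*}
\mathcal{L}(D, \lambda)
  &= \|X - DA\|_F^2 + \operatorname{tr}\!\left(D \Lambda D^T\right) - c\,\operatorname{tr}(\Lambda) \\
\frac{\partial \mathcal{L}}{\partial D}
  &= -2\,(X - DA)\,A^T + 2\,D\Lambda = 0 \\
\Rightarrow\quad D\left(AA^T + \Lambda\right) &= XA^T \\
\Rightarrow\quad D &= XA^T\left(AA^T + \Lambda\right)^{-1} \\
\mathcal{D}(\lambda)
  &= \operatorname{tr}\!\left(X^T X\right)
   - \operatorname{tr}\!\left(XA^T\left(AA^T + \Lambda\right)^{-1}\left(XA^T\right)^T\right)
   - c\,\operatorname{tr}(\Lambda)
\end{align*}
```

The second line uses $\sum_j \lambda_j \sum_i D_{i,j}^2 = \operatorname{tr}(D \Lambda D^T)$; the last line follows by substituting the minimizer back into the Lagrangian and simplifying with $D(AA^T + \Lambda) = XA^T$.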

Step 1: Dual to the Dictionary

With the optimal $\Lambda$, our dictionary is

\[ D^T = \left( A A^T + \Lambda \right)^{-1} \left( X A^T \right)^T \]

Key point: Moving to the dual reduces the number of optimization variables, speeding up the optimization.
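The closed form above can be tried on a tiny example. The sketch below is an illustration only: it assumes data points are the columns of X (so X ≈ DA), takes the dual variables as given rather than optimizing them with conjugate gradient, and uses hand-written helpers (`matmul`, `transpose`, `solve`, `dictionary_from_dual`) whose names are invented here.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(P):
    return [list(col) for col in zip(*P)]


def solve(M, B):
    """Solve M Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[r]) + list(B[r]) for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]  # assumed non-zero, i.e. M is invertible
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dictionary_from_dual(X, A, lam):
    """Return D with D^T = (A A^T + Lambda)^{-1} (X A^T)^T, Lambda = diag(lam)."""
    At = transpose(A)
    M = matmul(A, At)                # A A^T, one row and column per atom
    for j, l in enumerate(lam):
        M[j][j] += l                 # add Lambda on the diagonal
    rhs = transpose(matmul(X, At))   # (X A^T)^T
    return transpose(solve(M, rhs))  # D, one column per atom


# Hypothetical data generated exactly from a known dictionary.
D_true = [[1.0, 1.0], [0.0, 1.0]]            # atoms are the columns
A = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]       # one code per column
X = matmul(D_true, A)

print(dictionary_from_dual(X, A, [0.0, 0.0]))  # recovers D_true up to rounding
print(dictionary_from_dual(X, A, [1.0, 1.0]))  # larger multipliers shrink the atoms
```

With all multipliers at zero the formula reduces to ordinary least squares; positive multipliers penalize the atom norms, which is how the constraint is enforced.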

Step 2: Learning the Sparse Code

With D now fixed, optimize for A

\[ \arg\min_{A} \|X - DA\|_F^2 + \beta \sum_{i,j} |\alpha_{i,j}| \]

Unconstrained, convex optimization (quadratic loss plus an ℓ1 penalty)

Many solvers for this (e.g. interior point methods, in-crowd algorithm, fixed-point continuation)

Note:

Same problem as the encoding problem.

Runtime of optimization in the encoding stage?
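For a feel of what such a solver does, here is a minimal sketch of ISTA (iterative shrinkage-thresholding) for a single signal. ISTA is not one of the solvers named above; it is used here only because it fits in a few lines. The function names, iteration count, and example values are assumptions.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(x, atoms, beta, n_iter=200):
    """Minimize ||x - sum_i alpha_i d_i||^2 + beta * sum_i |alpha_i| over alpha."""
    # Step size 1/L, where L = 2 * (squared Frobenius norm of D) is an upper
    # bound on the Lipschitz constant of the gradient of the quadratic term.
    L = 2.0 * sum(v * v for d in atoms for v in d)
    step = 1.0 / L
    alpha = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Residual r = D alpha - x.
        r = [sum(a * d[k] for a, d in zip(alpha, atoms)) - x[k] for k in range(len(x))]
        # Gradient of the quadratic term: 2 * D^T r.
        grad = [2.0 * sum(dk * rk for dk, rk in zip(d, r)) for d in atoms]
        # Gradient step followed by the proximal step of the L1 penalty.
        alpha = [soft_threshold(a - step * g, step * beta) for a, g in zip(alpha, grad)]
    return alpha


# Hypothetical check with orthonormal atoms, where the minimizer is known:
# each coefficient is the projection of x, soft-thresholded at beta / 2.
atoms = [[1.0, 0.0], [0.0, 1.0]]
code = ista([3.0, 0.5], atoms, beta=2.0)
print(code)  # close to [2.0, 0.0]
```

The small second coefficient is set exactly to zero rather than merely shrunk, which is the mechanism by which the ℓ1 penalty produces sparse codes.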

Speeding up the testing phase

Fair amount of work on speeding up the encoding stage:

H. Lee et al., Efficient sparse coding algorithms, http://ai.stanford.edu/~hllee/nips06-sparsecoding.pdf

K. Gregor and Y. LeCun, Learning Fast Approximations of Sparse Coding, http://yann.lecun.com/exdb/publis/pdf/gregor-icml-10.pdf

S. Hawe et al., Separable Dictionary Learning, http://arxiv.org/pdf/1303.5244v1.pdf

Relationships between Dictionary atoms

Dictionaries are over-complete bases

Dictate relationships between atoms

Example: Hierarchical dictionaries

Example: Image Patches

Example: Document Topics

Problem Statement

Goal: Have sub-groups of the sparse code α all be non-zero (or zero).

Hierarchical:

If a node is non-zero, its parent must be non-zero

If a node's parent is zero, the node must be zero

Implementation:

Change the regularization

Enforce sparsity differently...
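The hierarchical rule can be stated as a check on a code's support. The sketch below is illustrative: the tree, the `parent` array encoding, and the function name `respects_hierarchy` are assumptions made for the example.

```python
def respects_hierarchy(alpha, parent):
    """True if every non-zero entry of alpha has a non-zero parent.

    parent[i] is the index of node i's parent, or None for the root.
    """
    for i, value in enumerate(alpha):
        p = parent[i]
        if value != 0.0 and p is not None and alpha[p] == 0.0:
            return False
    return True


# Hypothetical tree over 5 code entries:
#        0
#       / \
#      1   2
#     / \
#    3   4
parent = [None, 0, 0, 1, 1]

print(respects_hierarchy([1.5, 0.7, 0.0, 0.2, 0.0], parent))  # True
print(respects_hierarchy([1.5, 0.0, 0.0, 0.2, 0.0], parent))  # False: node 3 is active, its parent 1 is not
```

A single test suffices because the two rules are contrapositives of each other: "non-zero node implies non-zero parent" is the same statement as "zero parent implies zero node".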

Grouping Code Entries

Level k included in k + 1 groups

Add $|\alpha_i|$ to the objective function once for each group
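One grouping consistent with the count above puts each node in a group with all of its descendants, so a node at depth k belongs to its own group and to the groups of its k ancestors. The sketch below assumes that grouping, which the slides do not spell out; the tree and the function names are invented for the example.

```python
def descendant_groups(parent):
    """One group per node: the node together with all of its descendants."""
    groups = [[] for _ in parent]
    for i in range(len(parent)):
        node = i
        while node is not None:
            groups[node].append(i)  # i joins its own group and each ancestor's
            node = parent[node]
    return groups


def depth(i, parent):
    """Number of edges between node i and the root."""
    d = 0
    while parent[i] is not None:
        i = parent[i]
        d += 1
    return d


# Hypothetical tree: 0 is the root, 1 and 2 are its children, 3 and 4 are children of 1.
parent = [None, 0, 0, 1, 1]
groups = descendant_groups(parent)
print(groups)  # [[0, 1, 2, 3, 4], [1, 3, 4], [2], [3], [4]]

for i in range(len(parent)):
    memberships = sum(i in g for g in groups)
    print(i, depth(i, parent), memberships)  # memberships equals depth + 1
```

Deeper nodes appear in more groups and are therefore penalized more often, which pushes the code toward switching off leaves before their ancestors.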

Group Regularization

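Updated objective function:

\[ \arg\min_{D, \alpha_j} \sum_{j=1}^{m} \left[ \|x_j - D\alpha_j\|^2 + \beta\,\Omega(\alpha_j) \right] \]

where

\[ \Omega(\alpha) = \sum_{g \in P} w_g \,\|\alpha|_g\| \]

$\alpha|_g$ are the code values for group g.

$w_g$ weights the enforcement of the hierarchy

Solve using proximal methods.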
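The penalty Ω is simple to evaluate once the groups are listed. The sketch below assumes the Euclidean norm for each group term (the slides leave the norm unspecified), unit weights, and a hypothetical set of groups; `group_penalty` is a name chosen for the example.

```python
import math


def group_penalty(alpha, groups, weights):
    """Omega(alpha) = sum over groups g of w_g * ||alpha restricted to g||_2."""
    total = 0.0
    for g, w in zip(groups, weights):
        total += w * math.sqrt(sum(alpha[i] ** 2 for i in g))
    return total


# Hypothetical groups over 5 code entries, with unit weights.
groups = [[0, 1, 2, 3, 4], [1, 3, 4], [2], [3], [4]]
weights = [1.0] * len(groups)

print(group_penalty([3.0, 4.0, 0.0, 0.0, 0.0], groups, weights))  # 9.0
print(group_penalty([3.0, 0.0, 0.0, 0.0, 0.0], groups, weights))  # 3.0
```

In the first call entry 1 contributes to two group terms (5.0 and 4.0); setting it to zero removes the second term entirely and reduces the first, so the penalty rewards zeroing whole groups at once.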

Other Examples

Other examples of structured sparsity:

M. Stojnic et al., On the Reconstruction of Block-Sparse Signals With an Optimal Number of Measurements, http://dx.doi.org/10.1109/TSP.2009.2020754

J. Mairal et al., Convex and Network Flow Optimization for Structured Sparsity, http://jmlr.org/papers/volume12/mairal11a/mairal11a.pdf

Summary

Two interesting directions:

Increasing speed of the testing phase

Optimizing dictionary structure
