Sampling graph signals via determinantal point processes
Nicolas Tremblay, Simon Barthelme, Pierre-Olivier Amblard
CNRS, Gipsa-lab, Grenoble
Introduction
The graph sampling problem and existing methods
Sampling via DPP
Conclusion
What's a graph signal?
Why sample?
General reason: reduce dimensions (and thereby costs) for
• statistics estimation: mean, moments, etc.
• perfect or lossy reconstruction (compression)
• running costly algorithms in smaller dimensions (e.g. community detection)
• cases where measuring each node is costly (e.g. sensor networks)
• etc.
Three useful matrices
The adjacency matrix, the degree matrix, and the Laplacian matrix of the example graph:

W = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \quad
D = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad
L = D - W = \begin{pmatrix} 2 & -1 & -1 & 0 \\ -1 & 3 & -1 & -1 \\ -1 & -1 & 2 & 0 \\ 0 & -1 & 0 & 1 \end{pmatrix}
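For concreteness, a small numpy sketch (my own illustration, not part of the slides) building these three matrices for the 4-node example graph:

```python
import numpy as np

# Adjacency matrix of the 4-node example graph
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

D = np.diag(W.sum(axis=1))  # degree matrix
L = D - W                   # combinatorial Laplacian, matches the matrix above
```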
Notations
L = D − W = UΛUᵀ
• U is a Fourier basis of the graph [Hammond ’11]
• the Fourier transform of a signal x reads: x̂ = Uᵀx
• Λ = diag(λ₁, λ₂, …, λ_N) is the spectrum

(Figure: a low-frequency Fourier mode vs. a high-frequency Fourier mode on the graph.)
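A minimal sketch of the graph Fourier transform defined by this notation (numpy, dense diagonalization, fine for small graphs only):

```python
import numpy as np

W = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

lam, U = np.linalg.eigh(L)        # L = U diag(lam) U^T, eigenvalues in ascending order
x = np.random.randn(4)            # an arbitrary graph signal
x_hat = U.T @ x                   # graph Fourier transform
assert np.allclose(U @ x_hat, x)  # U is orthonormal, so the inverse GFT recovers x
```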
Given a graph signal x ∈ R^N, sampling consists in:
1. choosing a subset of nodes S = {s₁, …, s_m}
2. measuring x on S: y = Mx + n ∈ R^m.
How to reconstruct the original signal x from its measurement y?
Basically, we need :
1. a (low-dimensional) model for the signal to sample
2. a method to choose the nodes to sample
3. a “decoder” that exactly recovers the signal given its samples
M = \begin{pmatrix} \delta_{s_1}^\intercal \\ \delta_{s_2}^\intercal \\ \vdots \\ \delta_{s_m}^\intercal \end{pmatrix} \in \mathbb{R}^{m \times N} \qquad (1)
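As an illustration of Eq. (1) (a sketch with made-up node indices), M simply stacks the canonical basis vectors δ_{s_i}ᵀ of the sampled nodes:

```python
import numpy as np

N = 8
S = [2, 5, 7]                       # hypothetical sampled nodes s_1, ..., s_m
M = np.eye(N)[S]                    # row i is delta_{s_i}^T, shape (m, N)

x = np.random.randn(N)              # a graph signal
n = 0.01 * np.random.randn(len(S))  # measurement noise
y = M @ x + n                       # noisy measurement y = Mx + n on S
```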
Low-dimensional model: smoothness assumption
In 1D signal processing, a smooth signal has most of its energy at low frequencies.
(Figure: a smooth signal in time and its Fourier transform, whose energy is concentrated at low frequencies.)
Definition (Bandlimited graph signal [Chen '15, Anis '16, Marques '16, Puy '16]).
A k-bandlimited signal x ∈ R^N on G is a signal that satisfies, for some α ∈ R^k:

x = U_k \alpha = \sum_{i=1}^{k} \alpha_i u_i
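A minimal sketch of generating such a signal, assuming a dense Laplacian L is at hand (the coefficients α are arbitrary here):

```python
import numpy as np

def bandlimited_signal(L, k, rng=np.random.default_rng(0)):
    """Return x = U_k alpha, a k-bandlimited signal on the graph with Laplacian L."""
    lam, U = np.linalg.eigh(L)          # eigenvalues in ascending order
    Uk = U[:, :k]                       # first k Fourier modes (lowest frequencies)
    alpha = rng.standard_normal(k)      # arbitrary low-frequency coefficients
    return Uk @ alpha, Uk
```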
The sampling problem
Recall that :
1. y = Mx + n ∈ Rm is the noisy measurement of x on the sampled nodes S.
2. x is supposed approximately bandlimited: x = U_k α + n′, such that:
y = MU_k α + ñ,
where ñ = Mn′ + n encompasses the measurement noise and the distance to the model.
Our objective: tight sampling for perfect reconstruction.
Two cases:
• U_k is computable.
• U_k is too expensive to compute: only partial information is accessible.
Known Uk case
Decoder. x_rec = argmin_{z ∈ span(U_k)} ‖Mz − y‖₂ = U_k (MU_k)^† y.

Theorem. Reconstruction (up to the noise level) is perfect iff σ₁(MU_k) > 0. In this case, (MU_k)^† = (U_kᵀ Mᵀ M U_k)⁻¹ U_kᵀ Mᵀ, and:
x_rec = x + (MU_k)^† n.
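In numpy this decoder is a few lines (a sketch assuming U_k has been computed; pinv stands in for the closed-form inverse above):

```python
import numpy as np

def decode(Uk, S, y):
    """Recover a k-bandlimited signal from its (noisy) samples y on the node set S."""
    MUk = Uk[S, :]                   # M @ U_k: rows of U_k at the sampled nodes
    alpha = np.linalg.pinv(MUk) @ y  # (M U_k)^dagger y, valid when sigma_1(M U_k) > 0
    return Uk @ alpha                # x_rec = U_k (M U_k)^dagger y
```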
Choosing m = k, there are in general many possible sets S such that σ1(MUk ) > 0.
Optimality measures [Chen '15, Anis '16, Tsitsvero '16]:

1. Worst-case error: S_WCE = argmax_{S : |S|=k} σ₁²,
2. Mean-square error: S_MSE = argmin_{S : |S|=k} \sum_{i=1}^{k} 1/σ_i²,
3. Maximum volume: S_MV = argmax_{S : |S|=k} \prod_{i=1}^{k} σ_i² = argmax_{S : |S|=k} (det(MU_k))².
Practical algorithm. These combinatorial problems are NP-complete [Civril '09].
→ Greedy approximate solutions S are computed with a cost of O(Nk⁴) (a naive sketch of this greedy baseline follows below).
✓ 1st contribution: a DPP-based algorithm finding an approximate S_MV in O(Nk²).
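For reference, a naive sketch of the greedy max-volume baseline (my own illustration, not the authors' implementation): at each step it adds the node whose row of U_k most increases the volume of the sampled submatrix.

```python
import numpy as np

def greedy_max_volume(Uk):
    """Greedily select k = Uk.shape[1] nodes approximately maximizing det(M U_k)^2."""
    N, k = Uk.shape
    S = []
    for _ in range(k):
        best_i, best_vol = None, -np.inf
        for i in set(range(N)) - set(S):
            rows = Uk[S + [i], :]
            vol = np.linalg.det(rows @ rows.T)  # squared volume of the selected rows
            if vol > best_vol:
                best_i, best_vol = i, vol
        S.append(best_i)
    return S
```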
Unknown Uk case
First option : Uncorrelated random sampling [Puy ’16, Tremblay ’16].
1. Associate to each node i the probability p_i = ‖U_kᵀ δ_i‖₂² / k of drawing this node.
2. Draw the sampling set S of size m independently with replacement from p.
3. Theorem (Restricted Isometry Property). With high probability, reconstruction (up to the noise) is perfect, provided that m > O(k log k).
✓ Up to the log factor, it is tight.
✓ There is an efficient algorithm that estimates p in O(|E| log N).
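A sketch of this first option when U_k happens to be available, just to make the probabilities p_i concrete (the fast O(|E| log N) estimation of p is not shown):

```python
import numpy as np

def leverage_score_sampling(Uk, m, rng=np.random.default_rng(0)):
    """Draw m nodes i.i.d. with replacement, with p_i = ||U_k^T delta_i||_2^2 / k."""
    N, k = Uk.shape
    p = (Uk ** 2).sum(axis=1) / k                 # graph leverage scores, sum to 1
    S = rng.choice(N, size=m, replace=True, p=p)  # i.i.d. sampling with replacement
    return S, p
```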
Second option : approximate the “known Uk” algorithms with spectral proxies.
1. approximate the greedy algorithms, e.g. [Anis '16], in O(mk|E|);
2. 2nd contribution: approximate the DPP sampling algorithm, in O(m|E| + Nm²). See also [Chamon '17].
In all cases, regularized decoder (O(|E|) with gradient descent):

x_rec = argmin_{z ∈ R^N} ‖P_Ω^{-1/2}(Mz − y)‖₂² + γ zᵀ L^r z.
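A dense sketch of this regularized decoder (solving the normal equations directly rather than by gradient descent; P_Ω is the diagonal matrix of the sampling probabilities restricted to S, and r = 1 by default; these choices are mine for illustration):

```python
import numpy as np

def regularized_decode(L, S, y, p, gamma=1e-2, r=1):
    """Solve (M^T P^-1 M + gamma L^r) z = M^T P^-1 y (small graphs, dense algebra)."""
    N = L.shape[0]
    M = np.eye(N)[S]                              # sampling matrix
    P_inv = np.diag(1.0 / p[S])                   # P_Omega^{-1}
    A = M.T @ P_inv @ M + gamma * np.linalg.matrix_power(L, r)
    b = M.T @ P_inv @ y
    return np.linalg.solve(A, b)
```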
Determinantal point processes (DPP)
Definition [Kulesza '12]. Consider a point process, i.e., a process that randomly draws a subset A ⊆ [N]. It is determinantal if, for every S,

P(S ⊆ A) = det(K_S),

where K ∈ R^{N×N} is a positive semidefinite matrix such that 0 ⪯ K ⪯ I.
Negative correlation: P(x_i and x_j ∈ A) = K_ii K_jj − K_ij² ≤ P(x_i ∈ A) P(x_j ∈ A).
Illustration with a Gaussian kernel in the plane: K_ij = exp(−‖x_i − x_j‖² / σ²).

(Figure – a: uniform i.i.d. points, b: DPP with Gaussian kernel, c: distribution of interpoint distances for DPP vs. i.i.d. sampling.)
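A quick numerical check of the negative-correlation property with a Gaussian kernel on random points in the plane (hypothetical data; det(K_S) ≤ K_ii K_jj is exactly the inequality above):

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(size=(50, 2))                          # random points in the unit square
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
K = np.exp(-d2 / 1.0 ** 2)                               # Gaussian kernel, sigma = 1

i, j = 3, 7
p_pair = np.linalg.det(K[np.ix_([i, j], [i, j])])        # det(K_S) = K_ii K_jj - K_ij^2
print(p_pair <= K[i, i] * K[j, j])                       # True: nearby points repel each other
```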
A DPP kernel for graph sampling
In the context of graph sampling, consider the kernel

K_k = U_k U_kᵀ
Recall that:

S_MV = argmax_{S : |S|=k} \prod_{i=1}^{k} σ_i² = argmax_{S : |S|=k} det(M U_k U_kᵀ Mᵀ) = argmax_{S : |S|=k} det(K_{k,S})
⇒ SMV is the most probable sample from the DPP associated to Kk .
Case 1: known U_k. Sample from K_k: the result is close to S_MV, but faster to compute: O(Nk²) vs O(Nk⁴).

Case 2: unknown U_k. Two possible options:
• find spectral proxies to approximate the DPP sampling with K_k, in O(m|E| + Nm²);
• use loop-erased random walks to efficiently sample from a DPP with a kernel close to K_k [Tremblay '17].
DPP sampling with projective kernel K = U_k U_kᵀ

Input: K = U_k U_kᵀ
S ← ∅, let p₀ = diag(K) ∈ R^N, p ← p₀
for n = 1, …, k do:
· draw s_n with probability P(s) = p(s) / Σ_i p(i)
· S ← S ∪ {s_n}
· update p: ∀i, p(i) = p₀(i) − K_{S,i}ᵀ K_S^{-1} K_{S,i}
end for
Output: S.
Cost: O(n³ + Nn²) at step n, i.e. O(Nk³) overall.

Equivalent algorithm

Input: K = U_k U_kᵀ = [k₁, …, k_N]
S ← ∅, let p = diag(K) ∈ R^N
for n = 1, …, k do:
· draw s_n with probability P(s) = p(s) / Σ_i p(i)
· S ← S ∪ {s_n}
· compute f_n = k_{s_n} − Σ_{l=1}^{n−1} f_l f_l(s_n)
· normalize f_n ← f_n / √(f_n(s_n))
· update p: ∀i, p(i) ← p(i) − f_n(i)²
end for
Output: S.
Cost: O(Nn) at step n, i.e. O(Nk²) overall.
1st advantage: a complexity gain of a factor k, for an added memory cost of O(Nk).
2nd advantage: this formulation enables polynomial approximations. All we need is:
• an estimate of diag(K),
• estimates of m columns of K.
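Here is a numpy sketch of the right-hand O(Nk²) algorithm, assuming U_k is available (my reading of the pseudocode above, not the authors' released code):

```python
import numpy as np

def sample_projective_dpp(Uk, rng=np.random.default_rng(0)):
    """Draw |S| = k nodes from the projection DPP with kernel K = U_k U_k^T."""
    N, k = Uk.shape
    p = (Uk ** 2).sum(axis=1)              # p = diag(K)
    F = np.zeros((k, N))                   # stores f_1, ..., f_k
    S = []
    for n in range(k):
        s = rng.choice(N, p=p / p.sum())
        S.append(s)
        # f_n = k_s - sum_{l<n} f_l f_l(s), where k_s = K delta_s = U_k (U_k^T delta_s)
        f = Uk @ Uk[s, :] - F[:n].T @ F[:n, s]
        f /= np.sqrt(f[s])                 # normalize: f_n <- f_n / sqrt(f_n(s_n))
        F[n] = f
        p = np.maximum(p - f ** 2, 0.0)    # update residual leverage scores (floor at 0)
    return S
```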
Polynomial approximation
1. Note that:

k_i = K δ_i = U_k U_kᵀ δ_i = U h_{λ_k}(Λ) Uᵀ δ_i,

with h_{λ_k} the ideal low-pass filter cutting at λ_k.

(Figure: the ideal low-pass response h_{λ_k}(λ) and its polynomial approximations of increasing order.)
The polynomial approximation h̃_{λ_k}(λ) = \sum_{i=1}^{p} α_i λ^i enables to write:

k_i = U h_{λ_k}(Λ) Uᵀ δ_i ≃ U h̃_{λ_k}(Λ) Uᵀ δ_i = \sum_{i=1}^{p} α_i L^i δ_i

→ Estimating a column of K costs O(|E|p).
2. Consider a Gaussian random matrix R ∈ R^{N×r} with entries of mean 0 and variance 1/r, and its approximate filtered version R_h ≃ KR. One has, if r = O(log N):

Tr(R_h R_hᵀ) ≃ Tr(K R Rᵀ K) ≃ Tr(K) = k.

→ In O(p|E| log N), one may estimate λ_k by dichotomy, and diag(K).
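A rough sketch of step 1, estimating one column of K by polynomial filtering of a Kronecker delta (the coefficients α come here from a naive least-squares fit of the ideal response on a grid; in practice Chebyshev or Jackson-Chebyshev coefficients would be used, and lam_max should be a cheap upper bound on the spectrum of L):

```python
import numpy as np

def approx_kernel_column(L, lam_k, i, p=10, lam_max=None):
    """Estimate k_i = K delta_i as sum_j alpha_j L^j delta_i, using only L @ vector products."""
    N = L.shape[0]
    if lam_max is None:
        lam_max = np.linalg.eigvalsh(L).max()    # in practice: a cheap spectral upper bound
    # Fit alpha to the ideal low-pass response h(lambda) = 1_{lambda <= lam_k} on a grid
    grid = np.linspace(0.0, lam_max, 200)
    V = np.vander(grid, p + 1, increasing=True)  # columns: 1, lambda, ..., lambda^p
    alpha, *_ = np.linalg.lstsq(V, (grid <= lam_k).astype(float), rcond=None)

    # Accumulate col = sum_j alpha_j L^j delta_i; cost O(|E| p) when L is sparse
    v = np.zeros(N); v[i] = 1.0
    col = np.zeros(N)
    for a in alpha:
        col += a * v
        v = L @ v
    return col
```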
Exact sampling from K_k = U_k U_kᵀ

Input: K_k = U_k U_kᵀ = [k₁, …, k_N]
S ← ∅, let p = diag(K) ∈ R^N
for n = 1, …, k do:
· draw s_n with probability P(s) = p(s) / Σ_i p(i)
· S ← S ∪ {s_n}
· compute f_n = k_{s_n} − Σ_{l=1}^{n−1} f_l f_l(s_n)
· normalize f_n ← f_n / √(f_n(s_n))
· update p: ∀i, p(i) ← p(i) − f_n(i)²
end for
Output: S.
Cost: O((|E|k + Nk²)I) for the partial diagonalization + O(Nk²) for the sampling = O((|E| + Nk)kI).

Approximate sampling from K_k

Input: L, k, p, m
Estimate λ_k, the k-th eigenvalue of L
S ← ∅, estimate p ≃ diag(K) ∈ R^N
for n = 1, …, m do:
· draw s_n with probability P(s) = p(s) / Σ_i p(i)
· S ← S ∪ {s_n}
· estimate k_{s_n} ≃ K δ_{s_n}
· compute f_n = k_{s_n} − Σ_{l=1}^{n−1} f_l f_l(s_n)
· normalize f_n ← f_n / √(f_n(s_n))
· update p(i) ← p(i) − f_n(i)²
end for
Output: S of size m.
Cost: O(p|E|I log N) to estimate λ_k and diag(K) + O(p|E|m) to estimate m columns of K + O(Nm²) additional sampling cost = O(p|E|I + Nm²).
Speed / accuracy trade-off:
1. At small k, very good approximation, but not much faster than exact sampling.
2. At large k, speed is much improved, but the approximation error increases.
Toy experiments
Known U_k case, on the SBM: comparison of greedy vs. DPP with K_k (m = k samples); reconstruction error ‖x_rec − x‖₂ as a function of ε/ε_c.

(Figure – Median reconstruction performance over 100 realisations of 10-bandlimited signals, as a function of the number of samples m, on a) a transportation graph, b) a 3D mesh graph, c) a realisation of the SBM. Legend: uniform i.i.d., i.i.d. with p = diag(K_k), i.i.d. with p ≃ diag(K_k), our method (Alg. 3), DPP with K_k.)
To sum up: various strategies to sample k-bandlimited graph signals with the objective of perfect reconstruction.
diago. cost                  | algorithm              | # nec. samples                       | sampling cost
Known U_k: O(|E|k + Nk²I)    | leverage score         | m = O(k log k) †                     | O(Nm)
                             | greedy w/ WCE          | m = k †                              | O(Nk⁴)
                             | greedy w/ MSE          | m = k †                              | O(Nk⁴)
                             | greedy w/ MV           | m = k †                              | O(Nk⁴)
                             | DPP w/ K_k             | m = k †                              | O(Nk²)
Unknown U_k: N/A             | random uniform         | m = O(N max_i ‖U_kᵀδ_i‖₂² log k) †   | O(m)
                             | lev. score w/ proxies  | m ≳ O(k log k) ∗                     | O(|E| log N + Nm)
                             | greedy w/ proxies      | m ≳ k ∗                              | O(mk|E|)
                             | K_k-DPP w/ proxies     | m ≳ k ∗                              | O(m|E| + Nm²)
                             | LERW                   | m ≳ k ∗                              | ≃ O(|E|d/q)

∗: heuristics, †: provably
Conclusion
✓ DPP with K_k improves over greedy optimization of the "max volume" set S_MV
✓ One can use polynomial approximations to avoid computing U_k explicitly
✓ One can add prior information on the noise structure to improve reconstruction
✓ The authors of [Chamon '17] show empirically very good performance (in fact, close to the exact solution) on different types of graphs
× In the approximate sampling framework, lack of control over the speed/precision trade-off + lack of reconstruction guarantees
Future questions
1. truly "graph-based" algorithms?
2. distributed algorithms?
3. applying this work to related clustering problems, such as coresets.
References
[Hammond '11] Wavelets on graphs via spectral graph theory, ACHA
[Chen '15] Discrete Signal Processing on Graphs: Sampling Theory, TSP
[Anis '16] Efficient Sampling Set Selection for Bandlimited Graph..., TSP
[Marques '16] Sampling of graph signals with successive local..., TSP
[Puy '16] Random sampling of bandlimited signals..., ACHA
[Tsitsvero '16] Signals on Graphs: Uncertainty Principle and Sampling, TSP
[Civril '09] On selecting a maximum volume..., Theoretical Comp. Science
[Chamon '17] Greedy Sampling of Graph Signals, arXiv:1704.01223
[Tremblay '17] Graph sampling with determinantal processes, EUSIPCO
[Tremblay '16] Compressive spectral clustering, ICML
[Kulesza '12] Determinantal Point Processes..., Found. and Trends in ML
[Avena '13] Random spanning forests, Markov matrix..., arXiv:1310.1723