
DIPLOMARBEIT

Titel der Diplomarbeit

Gabor Analysis of Structured Sparsity and some Applications

Verfasser

Dominik Fuchs

angestrebter akademischer Grad

Magister der Naturwissenschaften (Mag. rer. nat.)

Studienkennzahl: A 405
Studienrichtung: Mathematik
Betreuer: Ao. Univ.-Prof. Dr. Hans Georg Feichtinger

Wien, April 2013


Acknowledgments

I want to thank my supervisor Prof. Hans Georg Feichtinger from the Numerical Harmonic Analysis Group (NuHAG) for his guidance, support and experienced advice during my work.

Special thanks to Monika Dörfler for giving me an understanding of structured sparsity and for her time and patience while guiding me through this theory, to Kai Siedenburg for his help concerning MATLAB™ routines, and to Markus Schaner for sharing his experience in the field of audio engineering with me.

Last but not least I want to thank my study colleagues and friends as well as my beloved family for their support and for believing in me.

I should also thank my guitar, which always psyched me up when I was dejected.


Abstract

“Thresholding” is well known in the field of audio engineering and is often used to obtain noise-suppressed signals. There are many ways to secure the analog path of a signal and to eliminate disturbing noise, but once a signal has already been recorded, suppressing or removing the noise afterwards becomes a very difficult task. This motivates the idea for this thesis, which is mainly based on [Siedenburg1]. This thesis differs from the latter mainly in the aspect that the access to structured sparsity is simplified for newcomers to signal processing, i.e. more prior knowledge is elucidated. In addition, a new operator type called empirical Wiener estimation is introduced.

‘Sparsity’ is a very new field in mathematics which provides powerful tools for applications in signal processing. Denoising is the most researched application, but it turns out that structured sparsity entails many other advantages which offer solutions for further applications such as declipping or signal decomposition. In this context the background of sparsity is the representation of a signal by using as few non-zero coefficients as possible. It stands to reason to connect this task with Gabor analysis, i.e. with frames, since this field is based on redundant signal expansions. Frames generalize bases with the consequence of non-uniqueness, while the energy stays preserved.

The first chapter gives a quick overview of the mathematical tools needed to understand the basics of audio processing. The focus is on the Fourier transform, which maps a signal into its frequency space, i.e. a more exact investigation of signals concerning their inherent frequencies can be done. Furthermore, a typical application in audio engineering is explained to show the importance of convolution, which is an operation for filtering a signal. Finally, the essential short-time Fourier transform is explained. It uses window functions shifted over the signal in order to perform a Fourier transform in each step and build an image in which time and frequency are juxtaposed.

One special kind of STFT is the Gabor transform, which is discussed in Chapter 2. Gabor analysis uses frames instead of bases. The dictionary, which is an alternative name for a frame, is built from time-frequency shifts. It turns out that windowed Fourier and cosine bases perform very well in audio processing and are additionally easy to interpret. Since frames are redundant, they are a good playground for structured sparsity.

The third chapter gives an understanding of sparsity, regularization and thresholding, first for bases in order to generalize them subsequently to frames. This chapter is all about the minimization of the so-called Lagrangian. This problem contains the two aspects of minimizing the discrepancy of the synthesis and minimizing the number of non-zero coefficients depending on a threshold function. For the latter problem the ℓ1 norm can be used,


which yields the Lasso (least absolute shrinkage and selection operator), but it is shown that replacing this norm by so-called weighted mixed norms provides further flexibility and opportunities. Finally, ISTA (the iterative soft-thresholding algorithm) and its improved version FISTA are introduced for frames, while in the case of a basis only one thresholding step is necessary.

In the fourth chapter an additional improvement is presented which takes the coefficients' neighborhoods into account; they can be incorporated by convolution with the threshold function. As an alternative to the discussed operators, the empirical Wiener estimation, considered at the end of this chapter, works with estimation risks between the original and the reconstructed signal. Additionally, we obtain some suggestions for choosing the threshold level.

At the end of this thesis, in Chapters 5 and 6, some examples of applications are presented as well as a small outlook on further research possibilities.

Some papers and experiments as well as the download link for the “StrucAudioToolbox” used in this thesis are collected on the webpage:

http://homepage.univie.ac.at/monika.doerfler/StrucAudio.html.

4

Page 7: DIPLOMARBEIT - univie.ac.at · DIPLOMARBEIT Titel der Diplomarbeit Gabor Analysis of Structured Sparsity and some Applications Verfasser Dominik Fuchs angestrebter akademischer Grad

Zusammenfassung

“Thresholding” is widespread in the field of audio engineering and is often used to obtain noise-suppressed signals. There are many ways to secure the analog path of a signal and to remove the disturbing noise. Once the signal has already been recorded, however, it becomes a difficult task to suppress or remove the noise afterwards. This motivates the idea behind this thesis, which follows [Siedenburg1]. This thesis differs from the latter mainly in the aspect that the access to structured sparsity is made easier for newcomers to signal processing; in particular, more prior knowledge is explained. In addition, a new operator type called ‘empirical Wiener estimation’ is introduced.

‘Structured sparsity’ is a new field of mathematics that provides powerful tools for applications in signal processing. Denoising is the most investigated task in this respect, but it turns out that structured sparsity brings many advantages which offer solutions for further applications such as declipping or signal decompositions. In this context the background of sparsity consists in representing a signal using as few non-zero coefficients as possible. It makes sense to connect this task with Gabor analysis, in particular with frames, since this field is based on redundant signal expansions. Frames generalize bases with the consequence of non-uniqueness, while the energy is preserved.

The first chapter gives a quick overview of the mathematical methods needed to understand the basics of audio processing. The main focus lies on the Fourier transform, which maps a signal into its frequency space or, more precisely, allows signals to be investigated more exactly with respect to the frequencies they contain. Furthermore, a typical application from audio engineering is explained in order to show the importance of convolution, which is an operation for filtering signals. Finally, the essential short-time Fourier transform is explained. It uses window functions which are shifted over the signal so that the Fourier transform is carried out in each step, producing an image in which time and frequency are juxtaposed.

A special form of the STFT is the Gabor transform, which is treated in Chapter 2. Gabor analysis uses frames instead of bases. The ‘dictionary’, which is an alternative name for a frame, is constructed from time-frequency shifts. It has turned out that windowed Fourier and cosine bases hold their own very well in audio processing and are, moreover, easy to interpret. Since frames are redundant, they provide a good foundation for structured sparsity.

The third chapter conveys the basic understanding of sparsity, regularization and


thresholding, first for bases, in order to generalize them subsequently to frames. This chapter revolves around the minimization problem of the so-called Lagrangian. This problem contains the two aspects of minimizing the discrepancy of the synthesis as well as minimizing the number of non-zero coefficients, which depends on the threshold function. For the latter problem the ℓ1 norm can be used, which yields the Lasso (‘least absolute shrinkage and selection operator’), but it turns out that further flexibility and possibilities arise by replacing this norm with so-called weighted mixed norms. Finally, ISTA (the ‘iterative soft-thresholding algorithm’) and its improved version FISTA are introduced for frames, while in the case of bases only one thresholding step is necessary.

In the fourth chapter an additional possible improvement is presented which takes the neighborhood of the coefficients into account. This can be incorporated by convolution with the threshold function. As an alternative to the operators treated so far, the empirical Wiener estimation, considered at the end of this chapter, works with the estimation risk between the original and the reconstructed signal. Additionally, we obtain a few suggestions for choosing the threshold level.

At the end of this thesis, in Chapters 5 and 6, a few application examples are presented as well as a small outlook on further research possibilities.

Some papers and experiments, as well as the download link for the “StrucAudioToolbox” used in this thesis, are collected on the following webpage:

http://homepage.univie.ac.at/monika.doerfler/StrucAudio.html.


Contents

1 Basics
    1.1 Banach and Hilbert Spaces
    1.2 Introduction to Signals and Processing

2 Introduction to Gabor Analysis
    2.1 Frames and Bases
    2.2 Gabor Frames
    2.3 Discrete Gabor analysis

3 Structured Sparse Recovery
    3.1 Sparsity
    3.2 Sparse Regularization
    3.3 Thresholding with Mixed Norms
    3.4 Thresholding Algorithms for Frames

4 Improvements and Threshold Selection
    4.1 Persistence and Neighborhoods
    4.2 Empirical Wiener Estimation

5 Applications and Experiments

6 Conclusion

Bibliography

Appendix

Curriculum Vitae


List of Figures

1.1 Sine and cosine oscillation with frequency k and wavelength λ
1.2 Frequency-shift: sine oscillation with original and doubled frequency
1.3 Time-shift: original sine oscillation and phase-shifted signal
1.4 Logarithmic sine sweep with frequency 20-22000 Hz
1.5 System chain for generating an impulse. The grey boxes show the system components.
1.6 Signal with 3 different frequencies represented in the time label. [Dopfner]
1.7 Signal from 1.6 represented in the frequency label. [Dopfner]
1.8 Signal from 1.6 represented in the time-frequency plane. [Dopfner]
1.9 Signal short-time-Fourier transformed with a wide resp. a small window. [Dopfner]

2.1 Chamber pitch ‘a’ with 440 Hz
2.2 Spectrogram of the famous Star Wars Theme

3.1 Unit balls of ℓq for: (a) q = 0, (b) q = 0.5, (c) q = 1, (d) q = 1.5, (e) q = 2
3.2 Unit balls of the mixed norms, with horizontal group Γ1 = {x, y} and elevation group Γ2 = {z}. Top left: ℓ1 (Lasso), top right: ℓ2 (Tikhonov); bottom left: ℓ2,1 (Group-Lasso), bottom right: ℓ1,2 (Elitist-Lasso). [Siedenburg1]
3.3 (a) Standard Gaussian, (b) standard Gaussian with thresholding value λ = 0.2, (c) after soft-thresholding, (d) after hard-thresholding
3.4 Sketch of the stepwise intended partition of Γ

4.1 Sketch of different neighborhoods. Rectangular or triangular windows can be used as well as differently chosen centers.
4.2 Non-rectangular neighborhood. Not implemented in the Toolbox.

5.1 Clean, noisy and denoised signal coefficients
5.2 Transformation in every step with a bad shift causes an increasing number of coefficients and therefore noise
5.3 Different shift values for fixed window length w = 1024
5.4 Lasso with alternating shift
5.5 Relative error in each iteration step for different shrinkage levels
5.6 Table of necessary iteration steps until the algorithm aborts
5.7 SNR in each iteration step for different shrinkage levels
5.8 Reconstructed voice with loud background
5.9 Reconstructed voice with loud background; group label in time
5.10 Multi-layered decomposition from noisy ‘musical clock’


1 Basics

We will start with some basic knowledge of signals. This chapter deals with the definition of a signal in the one-dimensional case together with its most important properties and leads to the most important tools of simple signal and image processing. For this we first need some statements and definitions from functional analysis in order to know on which spaces we operate. These results can be found in nearly every work that covers the field of functional analysis; a good reference book is e.g. [Heuser].

1.1 Banach and Hilbert Spaces

We consider a vector space V over the scalar field K. In this work the relevant scalar fields are R and C, hence we expect K to be one of them.

Definition 1.1. A mapping ‖.‖ : V → R+ with the properties

• ‖x‖ = 0 ⇒ x = 0

• ‖α · x‖ = |α| · ‖x‖

• ‖x + y‖ ≤ ‖x‖ + ‖y‖

for vectors x, y ∈ V and scalars α ∈ K is called a norm.

A vector space V together with a norm is called a normed space; its notation is (V, ‖.‖V).

Definition 1.2. For a subset W ⊆ V of a vector space we call

(i) W a subspace of V, if for all x, y ∈ W and α ∈ K we have x + y ∈ W and αx ∈ W;

(ii) W a dense subspace of V, if for any x ∈ V there is a sequence (an) ⊂ W with lim_n an = x;

(iii) the linear span of W, span(W), the smallest subspace of V which contains W. This coincides with the set of all finite linear combinations of elements of W. The closed linear span of W is the closure of span(W).

Definition 1.3. A normed space (X, ‖.‖) is said to be separable if there exists a countable dense subset Y ⊆ X.

Definition 1.4. Let (an) be a sequence in V.

(i) (an) converges to a in V if ∀ε > 0 ∃N ∈ N : ‖an − a‖ < ε ∀n > N.

(ii) (an) is a Cauchy sequence in V if ∀ε > 0 ∃N ∈ N : ‖an − am‖ < ε ∀n, m > N.

Definition 1.5 (Banach Space). Let (B, ‖.‖B) be a normed space. B is said to be complete if every Cauchy sequence in B converges in B. A Banach space B is a complete normed space. It is called separable if it contains a countable dense subset.


Two examples of important Banach spaces in the context of the basic Fourier transform are:

(i) ℓp(I) := {a = {ai}i∈I : ai ∈ C, ‖a‖p < ∞} with ‖a‖p := (∑_{i∈I} |ai|^p)^{1/p}. We call this the space of p-summable sequences. We obtain a special case for p = ∞, the space of all bounded sequences ℓ∞(I), equipped with the infinity norm ‖a‖∞ := sup_{i∈I} |ai|.

(ii) Lp(Rd) := {f : ‖f‖p < ∞} with ‖f‖p := (∫_{Rd} |f(x)|^p dx)^{1/p}. These Banach spaces are called the spaces of p-integrable functions, referring to the Lebesgue integral.

We will see that some of the most important signal spaces are Banach spaces, as for example Feichtinger's algebra S0(Rd) from Definition 1.16.

Definition 1.6 (Inner Product). For a vector space V over R resp. C, we call a mapping 〈.|.〉 : V × V → K an inner product if it has the following properties:

• 〈v|w〉 = \overline{〈w|v〉}  (the bar denotes complex conjugation)

• 〈v1 + λv2|w〉 = 〈v1|w〉 + λ〈v2|w〉

• 〈v|v〉 ≥ 0; 〈v|v〉 = 0 ⇔ v = 0

for vectors v, w ∈ V and scalars λ ∈ K.

Definition 1.7 (Hilbert Space). A vector space V together with an inner product 〈.|.〉 is called an inner product space. With the induced norm ‖x‖V := √(〈x|x〉), V is also a normed space. An inner product space that is additionally complete with respect to the induced norm is called a Hilbert space.

In fact Hilbert spaces are special cases of Banach spaces. We can also see this by considering two very important examples.

(i) ℓ2(I) := {x = {xi}i∈I : xi ∈ C, ∑_{i∈I} |xi|^2 < ∞} with the inner product 〈x, y〉 := ∑_{i∈I} xi \overline{yi}.

(ii) L2(Rd) := {f : ∫_{Rd} |f|^2 < ∞} with the inner product 〈f, g〉 := ∫_{Rd} f(x) \overline{g(x)} dx.

The most important Hilbert spaces we will need in the field of structured sparsity are the synthesis space Hs and the coefficient space Hc. But first we will discuss some other very important function spaces in the context of the Fourier transform and the Short-Time Fourier transform, such as the space S0(Rd).

Definition 1.8 (Bounded linear operators). Let V, W be normed spaces. A linear operator T : V → W with finite operator norm

|||T||| := sup_{‖x‖_V ≤ 1} ‖Tx‖_W = sup_{‖x‖_V = 1} ‖Tx‖_W < ∞

is called bounded. We denote the set of these operators by L(V, W).


Theorem 1.1. Let (V, ‖.‖V) and (W, ‖.‖W) be normed spaces over K and T : V → W a linear operator. Then the following statements are equivalent:

(i) T is continuous.

(ii) T is continuous in 0.

(iii) T is bounded.

Proof: Can be found in [Heuser].

Definition 1.9 (Self-adjoint Operator). Let V, W be Hilbert spaces and T ∈ L(V, W). Then there exists a unique operator T* ∈ L(W, V) satisfying

〈Tx|y〉_W = 〈x|T*y〉_V.

T* is called the adjoint operator of T. If V = W and T = T*, then T is called self-adjoint.


1.2 Introduction to Signals and Processing

In this section the fundamental properties of signals and of their processing are given. The definitions and results are based on [Haltmeier] and [Grochenig].

A 1-dimensional signal is a mapping

f : D → C

where D ⊂ R. The choice of D determines which kind of signal we observe:

• For D = R, f is a continuous signal.

• For D = [a, b] bounded, f : [a, b] → C is a finite signal.

• For D = Z, f : Z → C is a discrete signal.

Analogously, we can look at signals of higher dimensions; for example, dimension 2 gives us an image. In this work we consider only 1-dimensional signals for audio processing.

Two of the simplest signals we can think of are the sine and cosine functions.

Figure 1.1: Sine and cosine oscillation with frequency k and wavelength λ

It is important to understand how and with which parameters a signal is built, so we will take a short look at these. We handle signals by using these values:

• Wavelength: after the distance λ the function repeats its shape of length λ.

• Frequency: defined as k = 1/λ, the number of oscillation periods in [0, 1] (also known as the wavenumber). See Figure 1.2.

• Phase: ϕ = (∆x/λ) · 2π is a constant we add to the argument to get a shift in time. See Figure 1.3.

• Amplitude: by multiplying with the parameter r we scale the function (change the volume of the sound).


Figure 1.2: Frequency-shift: sine oscillation with original and doubled frequency

Figure 1.3: Time-shift: original sine oscillation and phase-shifted signal

We see that we can modify a given signal with simple operations. The operators realizing these shifts are defined in the following way.

Definition 1.10. Let x, ω ∈ Rd.

(i) The translation operator (time shift) Tx : L2(Rd) → L2(Rd) is defined by

Txf(t) := f(t − x).   (1.1)

(ii) The modulation operator (frequency shift) Mω : L2(Rd) → L2(Rd) is defined by

Mωf(t) := e^{2πitω} f(t).   (1.2)

They satisfy the property that

TxMωf(t) = e^{−2πixω} MωTxf(t).
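To illustrate the two operators and their commutation relation, the following minimal MATLAB sketch (added here for illustration, not part of the thesis code) uses the discrete analogues on Cⁿ, i.e. a circular shift and pointwise multiplication with a complex exponential:

% Discrete illustration (assuming circular/periodic indexing on C^n) of
% T_x, M_w and the relation T_x M_w f = e^(-2*pi*i*x*w/n) M_w T_x f.
n = 16; x = 3; w = 2;
f = randn(n,1) + 1i*randn(n,1);
t = (0:n-1)'/n;                          % normalized time axis

T = @(h, s) circshift(h, s);             % translation (time shift)
M = @(h, m) exp(2i*pi*m*t).*h;           % modulation (frequency shift)

lhs = T(M(f, w), x);
rhs = exp(-2i*pi*x*w/n) * M(T(f, x), w);
norm(lhs - rhs)                          % ~ 0 up to round-off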


The combination of these two operators will later be used to define a so-called ‘atom’. Atoms are the smallest building blocks for the Gabor transform discussed in Chapter 2; that is the reason for the choice of their name. For completeness, we define the time-frequency shift operator.

Definition 1.11. Let λ = (x, ω) ∈ Rd × Rd = R2d. The time-frequency shift π(λ) : L2(Rd) → L2(Rd) is defined as

π(λ) := MωTx.

Therefore

π(λ2)π(λ1) = e^{−2πix2ω1} π(λ1 + λ2).

Of course this is not sufficient for useful processing; we need more information about the signal. This leads us to the most important definition for dealing with signals, the Fourier transform.

Definition 1.12 (Fourier transform). For f ∈ L1(Rd) and ω ∈ Rd we set

(Ff)(ω) := f̂(ω) = ∫_{Rd} f(x) e^{−2πixω} dx.   (1.3)

The mapping f̂ : ω → f̂(ω), f ∈ L1(Rd), is called the Fourier transform of f.

We observe signals in terms of complex analysis, thus each point x ∈ Rd is associated with a complex value f(x) ∈ C. For applications it might be more useful to consider a signal as a mapping f : time → amplitude. To obtain the additional information of frequency, the Fourier transform has been developed. By introducing this tool we get the possibility to carry our time domain into a frequency domain and change our mapping into f̂ : frequency → magnitude. This tells us which frequencies occur in the whole signal and what their magnitudes are.

Remark 1.2.1. The time values t are measured in seconds and the frequency values ω in hertz.

Remark 1.2.2. Note that the frequency space tells us which frequencies exist at all, but not at which time! For this task consider the Short-Time Fourier transform below.

In other words we could say we partition our signal into its basic oscillations. The proofs for all following results can be found in [Haltmeier].

Theorem 1.2. The Fourier transform

F : L1(Rd) → C0(Rd), f ↦ f̂

is well-defined, linear and continuous, and f̂ vanishes at infinity. Furthermore it satisfies the inequality

‖f̂‖∞ ≤ ‖f‖1.


Theorem 1.3 (Inversion Formula). For f ∈ L1(Rd), if f̂ ∈ L1(Rd), then

f(x) = ∫_{Rd} f̂(ω) e^{2πixω} dω   (1.4)

at all points where f is continuous.

It turns out that by extending F to a unitary operator on L2(R) we get an energy-preserving operation on signals. This is ensured by the theorem of Plancherel.

Theorem 1.4 (Plancherel). For f ∈ L1(R) ∩ L2(R) we have

‖f‖2 = ‖f̂‖2,

and for f, g ∈ L2(R) Parseval's formula holds:

〈f|g〉 = 〈f̂|ĝ〉.

The Fourier transform is usually first defined on L1. By Plancherel's theorem it is shown that we can extend it to L2. It is even possible to extend the definition to Lp or even bigger classes of functions, i.e. the Schwartz class. In [Feichtinger2] the Fourier transform is first defined for bounded measures in order to cover the full background. This one is often referred to as the Fourier-Stieltjes transform since it can be performed over R by using Riemann-Stieltjes integrals.

We should keep in mind that for applications we always use the discrete analogue. Recalling (1.3), the Fourier transform on Cn is normalized as a unitary operator ν ↦ ν̂ in Cn,

ν̂(k) = (1/√n) ∑_{l=0}^{n−1} ν(l) e^{−2πikl/n}   (1.5)

for k = 0, ..., n − 1, which is called the discrete Fourier transform (DFT). An algorithm has been implemented to compute the discrete Fourier transform in a very fast way; this algorithm is called the Fast Fourier Transform (FFT).
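As a small illustration (not part of the thesis code), the unitary DFT (1.5) can be checked directly against MATLAB's built-in fft, which omits the 1/√n factor:

n  = 8;
nu = randn(n,1) + 1i*randn(n,1);          % arbitrary vector in C^n
nuhat = zeros(n,1);
for k = 0:n-1                              % naive evaluation of (1.5)
    for l = 0:n-1
        nuhat(k+1) = nuhat(k+1) + nu(l+1)*exp(-2i*pi*k*l/n);
    end
end
nuhat = nuhat/sqrt(n);                     % unitary normalization
norm(nuhat - fft(nu)/sqrt(n))              % ~ 0; fft computes the same sum without 1/sqrt(n)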

Definition 1.13 (Convolution). The convolution of two functions f, g ∈ L1(Rd) is defined by

(f ∗ g)(x) := ∫_{Rd} f(y) g(x − y) dy   (1.6)

and satisfies

‖f ∗ g‖1 ≤ ‖f‖1 ‖g‖1

and

F(f ∗ g) = Ff · Fg.   (1.7)

Remark 1.2.3.

(f ∗ g)(x) = ∫_{Rd} f(y) g(x − y) dy = ∫_{Rd} f(x − y) g(y) dy.


Equation (1.7) tells us that convolution in the time domain corresponds to multiplication in the Fourier (frequency) domain. The analogous statement also holds for a multiplication in time and a convolution in frequency.
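For finite sequences this correspondence is easy to verify numerically; the following minimal sketch (not from the thesis) compares a linear convolution with a pointwise multiplication of zero-padded FFTs:

f = randn(64,1);
g = randn(16,1);
N = length(f) + length(g) - 1;             % length of the linear convolution
c_time = conv(f, g);                       % convolution in the time domain
c_freq = ifft(fft(f, N).*fft(g, N));       % multiplication in the frequency domain
norm(c_time - c_freq)                      % ~ 0 up to round-off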

Convolution is a very important tool for modifying a signal. A well-known example is the so-called "convolution reverb". In this special case a (music) signal gets convolved with an impulse response obtained by recording the response of a room, in order to capture its reverb characteristics; this gives us the original signal with additional computed reverb. In fact many plug-ins for audio recording and engineering programs work with impulse responses applied by convolution. The following example shows in more detail how this can be used in the field of audio engineering.

Example 1.2.1. It is possible to generate an impulse of an existing system in order to simulate a 'real-life' system. To realize this one can use, for example, the freeware "Voxengo Deconvolver" (http://www.voxengo.com). This program gives us the opportunity to capture the system behavior, i.e. the previously mentioned convolution reverb or the frequency response of a guitar power amp with speaker cabinet. For demonstration purposes we take a look at the latter.

First we build a sine sweep with the help of the Voxengo Deconvolver (also possible to compute in MATLAB™ with the routine 'chirp') and import the obtained audio file into the digital audio workstation. A sine sweep is a sine oscillation with increasing frequency over time; in other words, we sweep through the human audible range from 20 Hz to 22 kHz, here with logarithmic behavior. The corresponding spectrogram (see short-time Fourier transform) is shown in Figure 1.4.

Figure 1.4: Logarithmic sine sweep with frequency 20-22000 Hz.


Figure 1.5: System chain for generating an impulse. The grey boxes show the systemcomponents.

Figure 1.5 lists, in order, the components (and the reference models used for this experiment) that can be used to generate an impulse. It is necessary to have a full-duplex audio interface for playing and recording audio simultaneously. The recorded sweep response is then used to create the intended impulse by deconvolution of the sweep with the sweep response. That means for a sine sweep s and its response h, we are looking for the impulse g in

s ∗ g = h.

Thus in plug-ins like "Poulin LeCab" (http://lepouplugins.blogspot.co.at/) our impulse g can be loaded in order to be convolved with every recorded signal f. By performing this convolution f ∗ g we finally find a way to simulate the system characteristics for a signal. So when a suitable impulse is given, this is a promising way to obtain non-linear sounds, e.g. guitar distortion, digitally.
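A rough MATLAB sketch of this deconvolution step (added for illustration only; the thesis itself uses the Voxengo Deconvolver, and the toy impulse below is made up) divides the spectra, with a small regularization constant to avoid division by near-zero bins. The routine chirp requires the Signal Processing Toolbox.

fs = 44100;
t  = (0:5*fs-1)'/fs;
s  = chirp(t, 20, t(end), 20000, 'logarithmic');     % test sweep s
g_true = [1; zeros(199,1); 0.5; zeros(299,1); 0.25]; % toy impulse (assumption)
h  = conv(s, g_true);                                % simulated sweep response h = s*g

N = length(h);
S = fft(s, N);
G = fft(h, N).*conj(S)./(abs(S).^2 + 1e-8);          % regularized spectral division
g = real(ifft(G));                                   % estimated impulse, g(1:501) ~ g_true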


Now it is time to fish for more information from our given signal. We have seen so far that we are able to look at the signal either in time or in frequency. Although it is good to know which frequencies appear, it would also be nice to know when these frequencies occur. This makes sense since, for example, if we have a piano piece, we want to know when a tone 'C' is played, not just how often it is played in this piece. In order to obtain this information, time-frequency analysis is the right setting for us.

The idea is quite simple: localize f in time at a point x by multiplying with a suitable window function g centered at x and then perform the already known Fourier transform. In this way it should be possible to obtain information about the frequencies contained in the area around x. By bringing this into a mathematical statement we get the Short-Time Fourier transform.

Definition 1.14 (Short-Time Fourier transform). Let g ∈ L2(Rd), g ≠ 0, be fixed. Then the short-time Fourier transform (STFT) of a function f with respect to g is defined as

Vgf(x, ω) := F(f · Tx\overline{g})(ω)   (1.8)
          = ∫_{Rd} f(t) \overline{g(t − x)} e^{−2πitω} dt   (1.9)
          = ∫_{Rd} f(t) \overline{π(λ)g(t)} dt   (1.10)

for x, ω ∈ Rd and λ = (x, ω).
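In the discrete setting the STFT can be written down in a few lines. The following naive MATLAB sketch (added for illustration, not the thesis implementation) shifts a window g over the signal in steps of a samples and performs an FFT in every step:

function C = naive_stft(f, g, a)
% Naive discrete STFT: columns index the time positions, rows the frequencies.
f = f(:); g = g(:);
M = length(g);                            % window length = FFT length
K = floor((length(f) - M)/a) + 1;         % number of window positions
C = zeros(M, K);
for k = 0:K-1
    seg = f(k*a + (1:M)).*conj(g);        % localize f around position k*a
    C(:, k+1) = fft(seg);                 % Fourier transform of the windowed piece
end
end

A spectrogram as in Figure 1.9 is then obtained by plotting 20·log10|C|, for example with imagesc, using a (Gaussian) window of the desired width.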

Figure 1.6: Signal with 3 different frequencies represented in the time label. [Dopfner]

Figure 1.7: Signal from 1.6 represented in the frequency label. [Dopfner]


Figure 1.8: Signal from 1.6 represented in the time-frequency plane. [Dopfner]

What the different representations of a signal look like is shown in Figures 1.6, 1.7 and 1.8. The x-axis corresponds to time, the y-axis to frequency, and each pixel represents the value of the coefficients' energy in decibel (= 10 log10(ck,j) in the discrete Gabor notation discussed in Chapter 2). For the sake of continuity and to avoid unwanted noise the standard window used for the STFT is the smooth Gaussian. Furthermore the Gaussian minimizes the uncertainty principle mentioned below.

Definition 1.15 (Gaussian). Let

φa(x) = e^{−πx²/a}   (1.11)

denote the non-normalized Gaussian function with width a > 0, a ∈ R.

So does this give us the full and exact information about time and frequency? Unfortunately not. There is a theorem about the relation of time and frequency in the signal's picture, called the uncertainty principle.

Theorem 1.5 (Uncertainty Principle). For f ∈ L2(R) and a, b ∈ R,

(∫_R (x − a)² |f(x)|² dx)^{1/2} (∫_R (ω − b)² |f̂(ω)|² dω)^{1/2} ≥ (1/(4π)) ‖f‖_2^2   (1.12)

with equality if and only if f is a multiple of TaMbφc(x) = e^{2πib(x−a)} · e^{−π(x−a)²/c} for some a, b ∈ R and c > 0.

Proof: See [Grochenig].

The consequences of the uncertainty principle are illustrated in Figure 1.9. If we choose a wide window the frequency approximation is more accurate, but unfortunately the accuracy in time gets smeared. Conversely, for small windows the time representation gets better and the frequency representation suffers.

The last definition in this chapter, which will not be discussed here in detail, is Feichtinger's algebra S0(Rd) ⊂ L2, which is the appropriate window class for time-frequency analysis. For more information, see [FeichGroch] or [Feichtinger2].


Figure 1.9: Signal short-time-Fourier transformed with a wide resp. a small window.[Dopfner]

Definition 1.16. Let g be the normalized Gaussian g(x) = φ1(x) = e^{−πx²}; then the so-called Feichtinger's algebra S0(Rd) is defined by

S0(Rd) := {f ∈ L1(Rd) : ‖Vgf‖_{L1(Rd×Rd)} < ∞}.   (1.13)

Remark 1.2.4. This definition works equivalently if any arbitrary fixed Schwartz class window g ≠ 0 is used. See [Dopfner].

We will stop here, since our aim was to give a fundamental comprehension of signals. Further important definitions and properties, e.g. for the Schwartz space S(Rd) resp. S0(Rd), can be found in [Grochenig]. Another important tool would be the Banach-Gelfand triple, which is discussed for example in [CorFeiLu], [Feichtinger1] or [Bannert]. We jump to the next chapter, discussing fundamentals of Gabor analysis, which provides the transform used in the further context of structured sparsity.


2 Introduction to Gabor Analysis

We learned in the first chapter about the Short-Time Fourier transform, which represents a signal simultaneously in both time and frequency. We will now discuss a specification of this, the so-called Gabor analysis. The further theory of structured sparsity will be discussed on the "playground" of Gabor analysis; of course this field is very extensive, so we try to concentrate on the most important definitions and a fundamental grasp of them.

As a short introduction, we show one of the implementations from Peter Søndergaard's LTFAT toolbox (http://ltfat.sourceforge.net/). The contained function 'sgram', which provides the spectrogram of a signal according to its settings, uses a discrete Gabor transform. The following example uses this function in combination with a small excursion on how to generate synthetic signals in MATLAB™. The MATLAB™ code can be found at the end of this thesis.

Example 2.0.1. First of all we can generate a simple signal by specifying a sinusoid. If we do this with a frequency a human is able to hear, we can play this sound in MATLAB™ by using the command 'sound'. The spectrogram is shown in Figure 2.1.

Figure 2.1: Chamber pitch ‘a’ with 440Hz


We can build any kind of melody by composing different sinusoids. The chamber pitch ‘a’ may be computed as

a = sin(2π · 440 · H/8000)

where 440 (Hz) is the chosen frequency and the divisor 8000 is the sampling rate with sequence points H. H should be chosen based on the desired tone length; a whole note in this setting corresponds to the sequence H = (h)i, i = 1, 2, 3, ..., 8000.

An example for this may be the famous ‘Star Wars’ theme depicted in Figure 2.2. We will see that we can clearly identify every note of the melody as a frequency. It should be noted that at the beginning (the onset) and the end of a digitally generated signal there will always be a blur. By generating more than one sinusoidal signal and putting them together in one matrix we also inherit the onset blurs, a small side effect we have to accept. We could try to mitigate this problem by applying envelopes, but this will not be considered now. Still, this is a very simple method to get a simple melody just by programming.

Figure 2.2: Spectrogram of the famous Star Wars Theme
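A minimal MATLAB sketch of this example (the full thesis code is in the appendix; 'sgram' assumes that LTFAT is installed and initialized via ltfatstart):

fs = 8000;                       % sampling rate used above
H  = 1:fs;                       % one second of samples, i.e. a "whole note"
a  = sin(2*pi*440*H/fs);         % chamber pitch 'a'
sound(a, fs);                    % play the tone
sgram(a, fs);                    % spectrogram computed via a discrete Gabor transform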


2.1 Frames and Bases

The definitions and results given in this and the next section are mainly taken from [Grochenig] and [Dopfner]. Another good reference for Gabor analysis is [Dorfler3]. We start right away with the definition of frames.

Definition 2.1 (Frame). Let H be a separable Hilbert space and Γ a countable index set. The sequence of functions (ϕγ)γ∈Γ ⊂ H is called a frame for H if there exist constants A, B > 0 such that

A‖f‖² ≤ ∑_{γ∈Γ} |〈f|ϕγ〉|² ≤ B‖f‖²   ∀f ∈ H.   (2.1)

The scalars A and B are called frame bounds. A frame is called tight if A = B. If the frame additionally constitutes a basis, i.e. the (ϕγ)γ∈Γ are linearly independent and the expansion coefficients are unique, the frame is called a Riesz basis. We could say: if a frame is a basis, then it is a Riesz basis.

Definition 2.2 (Riesz basis). A Riesz basis for a Hilbert space H is a family of the form {fk}_{k=1}^∞ = {Uek}_{k=1}^∞, where {ek}_{k=1}^∞ is an orthonormal basis for H and U : H → H is a bounded bijective operator.

Remark 2.1.1. Note that this is not the usual definition taught in lectures. Commonly an equivalent one, cf. [Heuser], is given for a complete sequence {fk}_{k=1}^∞ ⊂ H and constants A, B > 0 such that for every finite scalar sequence (ck) we have

A ∑ |ck|² ≤ ‖∑ ck fk‖² ≤ B ∑ |ck|².

Let us briefly discuss the fact that frames generalize bases. In fact the energy stays preserved in the expansion. Furthermore the inequality (2.1) implies that the closed linear span of (ϕγ)γ∈Γ is H. The idea behind frames is to get higher flexibility than with bases, since we might not always find a basis. Unfortunately frames have the negative connotation that we lose the uniqueness of the analysis coefficients, but the expansion property stays preserved. We know that we obtain a tight frame for equal frame bounds A and B. Any orthonormal basis is a tight frame with A = B = 1, a union of two orthonormal bases is a tight frame with A = B = 2, and so on. This tells us that the union of finitely many frames is again a frame.

We now introduce the earlier mentioned signal space Hs resp. coefficient space Hc. These two play a decisive role in this work. Both are assumed to be separable Hilbert spaces. In continuous time we take Hs = L2(R) as the function space and Hc = ℓ2(Γ) as the sequence space of finite energy. In discrete terms we think of Hs = CL and Hc = Cp with p ≥ L.

The corresponding operators that will help us in the further context of structured

sparsity are given in the following definition.


Definition 2.3. For the frame (ϕγ)γ∈Γ with atoms ϕγ the synthesis operator Φ : Hc → Hs with c = (cγ)γ∈Γ ∈ Hc is defined as

Φc = ∑_{γ∈Γ} cγ ϕγ.   (2.2)

Its adjoint operator Φ* : Hs → Hc with

Φ*f = (〈f|ϕγ〉)γ∈Γ   (2.3)

is called the analysis operator. This operator fulfills

〈f|Φc〉 = 〈f| ∑_{γ∈Γ} cγ ϕγ〉 = ∑_{γ∈Γ} \overline{cγ} 〈f|ϕγ〉 = 〈Φ*f|c〉.

Since it is much easier to handle coefficients for processing, it is common first to analyze a signal, then to do our computations in the coefficient space (e.g. the sparse recovery introduced in Chapter 3), and finally to synthesize these coefficients to recover the modified signal.

Definition 2.4. For the synthesis operator Φ and the analysis operator Φ* the frame operator is defined by

S := ΦΦ* : Hs → Hs   (2.4)

and fulfills the frame inequality

A‖f‖² ≤ 〈Sf|f〉 ≤ B‖f‖²   ∀f ∈ Hs.   (2.5)

Remark 2.1.2. In the finite/discrete case we can treat all these operators as matrices. In this sense the atoms ϕγ constitute the columns of Φ. The smallest and largest eigenvalues of S represent the optimal frame bounds, and the ratio B/A corresponds to the numerical condition number responsible for its stability.

We will later consider constrained minimization problems to obtain sparsity of a signal. They are of the form

min_c (1/2)‖c‖²   s.t.   f = Φc

and have the solution

c = Φ*S^{−1}f.   (2.6)

Solution (2.6) is often called the ‘method of frames’. In fact this minimization problem is analogous to the Tikhonov regularization introduced in Chapter 3.
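For a small finite frame the 'method of frames' (2.6) can be checked directly; the following minimal MATLAB sketch (added for illustration) uses a generic full-rank matrix as synthesis operator:

L = 4; p = 7;
Phi = randn(L, p);               % columns = atoms of a frame for R^L (full rank a.s.)
f   = randn(L, 1);
S = Phi*Phi';                    % frame operator
c = Phi'*(S\f);                  % minimum-norm coefficients, Eq. (2.6)
norm(Phi*c - f)                  % ~ 0: perfect reconstruction
norm(c - pinv(Phi)*f)            % ~ 0: coincides with the pseudoinverse solution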

We look at one more corollary before we finally discuss Gabor frames.


Corollary 2.1.1. If (ϕγ)γ∈Γ is a tight frame with bound A, then the canonical dual frame is (A^{−1}ϕγ)γ∈Γ and

f = A^{−1}ΦΦ*f   ∀f ∈ Hs.

Remark 2.1.3. Analogous to the canonical dual frame, there exists a canonical tight frame.

This corollary tells us that the consequence of tight frames is that analysis and synthesis use the same frame. This is very useful since it makes it possible to reduce the effort. To be more precise, g/√A, for example, is a tight atom reproducing the identity, which is convenient in the case of Gabor multipliers. In fact constant functions provide multiples of the identity operator, i.e. the real case gives symmetric operators, for which we can find eigenvalues resp. eigenvectors. If it were necessary to use a different dual window for synthesis, the Gabor multipliers would not be symmetric anymore. For more details see [Dopfner].

Proposition 2.1.1. Let (ϕγ)γ∈Γ be a frame and S the associated frame operator. Then (S^{−1/2}ϕγ)γ∈Γ is a tight frame with frame bounds A = B = 1, which is called the canonical tight frame.

Remark 2.1.4. Note that S^{−1/2} is self-adjoint and S^{−1/2} S S^{−1/2} = Id.

For Gabor frames, this result corresponds to a canonical tight window g^t := S_{α,β}^{−1/2} g0, which is closest to the original window g0, for the frame operator S_{α,β} with given lattice values.
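For finite frames the canonical tight frame of Proposition 2.1.1 can be computed explicitly with a matrix square root. A minimal MATLAB sketch (added for illustration, not from the thesis):

L = 4; p = 7;
Phi = randn(L, p);                  % atoms of a frame for R^L
S   = Phi*Phi';                     % frame operator
PhiTight = sqrtm(inv(S))*Phi;       % atoms S^(-1/2)*phi_gamma
norm(PhiTight*PhiTight' - eye(L))   % ~ 0, i.e. a tight frame with A = B = 1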

2.2 Gabor Frames

From the point of view of time-frequency analysis the two most important specifications are wavelet analysis and Gabor analysis. Two differences should be mentioned: while wavelet transforms focus on time-scale representations, the Gabor transform gives us superpositions of time-frequency atoms. The other important point is that the wavelet case can be generated with orthonormal bases, in contrast to Gabor analysis, where we have to rely on frames. It is said that wavelets may be better for image processing and Gabor analysis for audio processing, but this depends of course on the respective application.

We already defined in (1.1) and (1.2) the two main tools that generate a Gabor frame, the translation and the modulation operator, or more precisely their combination TaMb. But this time we apply a fixed window function along a discrete subset of Rd, i.e. Hs = L2(R) and Hc = ℓ2(Γ) for Γ = αZ × βZ.

Definition 2.5. Let g ∈ L2(R) be a non-zero window function. The set of functions

G(g, α, β) := (TαnMβmg)_{(n,m)∈Z×Z}

is called a Gabor system. If G(g, α, β) constitutes a frame, it is called a Gabor frame.


Remark 2.2.1. The window function g is typically non-negative, centered at the origin and symmetric.

In connection with the STFT we can write Vgf(x, ω) = 〈f, MωTxg〉 = e^{−2πixω}〈f, TxMωg〉, restricted to a discrete lattice (x, ω) ∈ αZ × βZ. The frame operator takes the form

Sf = ∑_{m,n∈Z} 〈f|TαnMβmg〉 TαnMβmg = ∑_{m,n∈Z} Vgf(αn, βm) MβmTαng.

Notation: We usually denote the atoms corresponding to a Gabor frame by ϕm,n = TαnMβmg.

Proposition 2.2.1. Let G(g, α, β) be a frame for L2(R). Then there exists a dual window g̃ = S^{−1}g such that G(g̃, α, β) is the dual frame of G(g, α, β). Furthermore every f ∈ L2(R) satisfies

f = ∑_{m,n∈Z} 〈f|TαnMβm g̃〉 TαnMβm g = ∑_{m,n∈Z} 〈f|TαnMβm g〉 TαnMβm g̃.

So up to a constant factor the dual frame is close to the original frame.

Theorem 2.1. Let g ∈ L2(R), α, β > 0 and let G(g, α, β) be a frame. Then αβ ≤ 1. If αβ = 1, then G(g, α, β) is a Riesz basis.

This theorem gives us the connection between frames and Riesz bases. It seems ideal to use this connection for finding bases, but it turns out that Riesz bases do not have the expected good properties. The theorem of Balian and Low (see [Grochenig]) is a kind of uncertainty principle for Gabor frames which shows the weakness of a basis in this context. It makes us aware of the importance of redundancy and of frames, since there exists no basis of L2(R) with the structure of a Gabor system that has good time-frequency resolution.

So the conclusion is that if a window g is well time-frequency localized, then we have αβ < 1. Furthermore, αβ < 1 is fulfilled for all Gabor frames whose window belongs to Feichtinger's algebra S0(Rd). In [Kaiblinger] it is discussed to what extent we can use the discrete setting to approximate the continuous setting.

Remark 2.2.2. There exists a generalization of Gabor frames, the so-called non-stationary Gabor frames. Their special advantage is that they allow adaptivity of windows and lattice in either time or frequency. It is clear that this offers more flexibility and better results. They are discussed for example in [DorfMatu] or [VelHoliDorfGrill].


2.3 Discrete Gabor analysis

Since for applications we can only perform computations in the discrete setting, we try to translate some facts into their discrete analogues.

For this task we usually observe signals f ∈ CL, L ∈ N. This appears to make more sense when we imagine this as an embedding into the space of L-periodic sequences with complex values, i.e. CL ≅ ℓ2(Z/LZ).

Definition 2.6. For f ∈ CL the

• discrete translation operator is defined by

(Tkf)[n] := f[n − k],

• discrete modulation operator is defined by

(Mjf)[n] := e^{2πijn/L} f[n]

for k, j ∈ Z (with indices taken modulo L).

By time-frequency shifting only one window we generate a discrete Gabor system with atoms

ϕk,j := MjbTkag

for g ∈ CL, k = 0, ..., K − 1, j = 0, ..., J − 1 and Ka = Jb = L. Note that a gives the time shift in samples, i.e. the window is shifted over the signal in steps of a samples, and J = L/b determines the number of frequency channels in each step of the Fourier transform. Summarizing, we obtain with our Gabor atoms the Gabor coefficients

ck,j = 〈f, ϕk,j〉.
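A direct and deliberately naive MATLAB sketch of these coefficients, added here for illustration — the window and lattice values are made up, and in practice one would use an FFT-based routine such as LTFAT's dgt instead of the double loop:

L = 64; a = 8; b = 8;                 % lattice parameters with K*a = J*b = L
K = L/a; J = L/b;
f = randn(L,1);
g = exp(-pi*(((0:L-1)'-L/2)/8).^2);   % some window, here a Gaussian bump

c = zeros(K, J);
for k = 0:K-1
    for j = 0:J-1
        phi = exp(2i*pi*j*b*(0:L-1)'/L).*circshift(g, k*a);   % M_{jb} T_{ka} g
        c(k+1, j+1) = sum(f.*conj(phi));                      % c_{k,j} = <f, phi_{k,j}>
    end
end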

Remark 2.3.1. In case ab = L the discrete Gabor frame corresponds to a basis, and if ab < L the frame is oversampled with redundancy L/(ab). The discrete analogue of the STFT is the Gabor transform with a = b = 1, i.e. a maximal redundancy of L.

It is advisable to choose windows of length l << L. Therefore we get a frequency lattice constant L/l and a reduced redundancy L/(ab) = l/a with a ≤ l. The best results have been achieved with redundancies of 2 to 8 in practical applications.

In order to constitute a frame from a discrete Gabor system it is necessary that r := KJ ≥ L and that the matrix Φ = (ϕk,j)k,j ∈ C^{L×r} has full rank. Furthermore the frame operator S has diagonal form if |supp(g)| ≤ J, where g denotes the window and J the length of the FFT (mentioned in the first chapter); corresponding to the Walnut representation given in [Dorfler1] (original: [HeilWalnut])

S_{p,q} = J ∑_{k=0}^{K−1} T_{ka}g(p) \overline{T_{ka}g(q)}   if |p − q| mod J = 0,   and 0 else,


we get

Sf[n] = J (∑_{k=0}^{K−1} T_{ka}|g|²[n]) f[n]

for n = 1, ..., L, and thus the calculation of the dual window

g̃[n] = g[n] / (J ∑_{k=0}^{K−1} T_{ka}|g|²[n]).

For more details on the complexity of the discrete Gabor transform, see [Holighaus]. Note that in applications all operations are implemented on the basis of the FFT.
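In this "painless" situation the dual window is obtained by a pointwise division, which takes only a few lines in MATLAB. A minimal sketch under the above assumptions (window support of at most J samples; the normalization factor J follows the formula given here):

L = 256; a = 32; J = 64;                 % hop size a, J frequency channels, K = L/a shifts
K = L/a;
g = zeros(L,1);
g(1:J) = 0.5 - 0.5*cos(2*pi*(0:J-1)'/J); % Hann-type window supported on J samples

diagS = zeros(L,1);                      % diagonal of S: J * sum_k T_{ka}|g|^2
for k = 0:K-1
    diagS = diagS + circshift(abs(g).^2, k*a);
end
gd = g./(J*diagS);                       % canonical dual window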


3 Structured Sparse Recovery

Up to now we have explained how to represent a given signal. These results will now be used to consider some methods for processing realistic signals. If you have ever tried to record an instrument or someone's singing by making use of a simple microphone, you might have noticed that there always seems to be a background noise, the well-known 'air rustle'. So we now have to consider a signal with additive noise e. Let f be the wanted signal and y the observed data from the recording. Hence we can model the observed signal as

y = f + e.   (3.1)

The aim of our further work concerns the recovery of the signal f, at least approximately. We will only operate on two special Hilbert spaces:

• Hs ... signal space

• Hc ... coefficient space

Hs in applications can be seen as L2(R) or CL. We represent f as a linear combination of frame elements, f = ∑_γ cγϕγ, with coefficients c ∈ Hc and corresponding index set γ ∈ Γ. So c ∈ Hc can be thought of as lying in ℓ2(Γ), or also in CL in applications. Φ : Hc → Hs is known as the synthesis operator with Φ = (ϕ1, ..., ϕγ, ...). We need to search for the coefficients which minimize the discrepancy

∆(c) := (1/2) ‖y − Φc‖_2^2   (3.2)

between the given data y and the image of c. As [[Siedenburg1], p. 16] says: “This task can be considered as a linear inverse problem, as we seek to infer c from its noisy image under the linear operator Φ.” Unfortunately the linear problem does not have a unique solution and, furthermore, does not depend continuously on the data. So we are led to the notion of regularization. By adding some constraints on the coefficients in the form of a penalty measure Ψ : Hc → R0+ we obtain the regularized functional, the so-called Lagrangian

L(c) := L_{y,λ}(c) := ∆(c) + λΨ(c)   (3.3)

and seek c ∈ Hc such that

c = argmin_c L(c).   (3.4)


The value λ > 0 is called the Lagrange multiplier. A more common name for λ is sparsity level, since it gives the weight assigned to the penalty term. The bigger λ, the more the penalty will be taken into account, and vice versa.

3.1 Sparsity

The idea behind structured sparsity is to represent a signal with as little information as possible, which is what 'sparse' actually means.

Example 3.1.1. As an example for sparsity we can look at a matrix containing many zeros:

2  0   0  0
0  7   0  0
0  0   5  0
3  0  14  1

Since it is unnecessary to store the whole matrix (it would need more memory in computational applications and thus an application might get slower), we could also store only the relevant entries which are non-zero. In MATLAB™ we get the representation

(1,1)   2
(4,1)   3
(2,2)   7
(3,3)   5
(4,3)  14
(4,4)   1

with the command sparse applied to the matrix.
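The corresponding MATLAB commands (a small illustration added here) are:

M = [2 0 0 0; 0 7 0 0; 0 0 5 0; 3 0 14 1];
S = sparse(M);     % stores only the non-zero entries together with their indices
disp(S)            % prints the (row,column)/value list shown above
nnz(S)             % number of non-zero entries, here 6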

As in the matrix example, we try to compute an approximation of the signal by using as few atoms ϕγ as possible. Indeed, large classes of signals seem to be sparsely represented by using a suited dictionary; this accounts for the importance of this method.

A useful example for sparsity in the following sense is the separation of an audio signal into a model of the form signal = tonal + transient + noise. We call this a multi-layer decomposition. Intuitively, we can think of the tonal part as a melody composed of frequencies and of the transient part as a composition of drum beats or the attack of an electric guitar. We can connect the tonal layer to the stationary part of sounds; this one is sparsely represented in a dictionary realized with a window of long support. In contrast, the transients, which are the non-stationary parts, will be represented in a Gabor frame with short support. This kind of decomposition is shown with an example in Chapter 5.

A more natural example is the already mentioned application of denoising. In this thesis our focus will lie on this application, which is directly connected to the notion of structured sparsity.


This application works miracles if one is trying to make a 'noisy' signal clean. Just think of the crunchy sound known from old phonograph records.

A third very important example is declipping. Clipping is a kind of waveform distortion that occurs when an amplifier is overdriven and attempts to output a signal current beyond its maximum capability. This engenders an unwanted 'click' where the signal is clipped. But often deliberate overdriving of a signal is exactly what is wanted. Many electric guitar players, for example, intentionally overdrive their guitar amplifiers to cause clipping in order to get a desired sound, a so-called 'overdrive' resp. 'distortion' effect.

The fact is that penalty measures Ψ which promote sparsity offer a solution which seems more natural and provides a higher resolution of the recovery. In this chapter this kind of problem will be discussed based on [Siedenburg1]. An overview of how research on structured sparsity developed can also be found there.

3.2 Sparse Regularization

The problem we have to solve is stated in (3.4). The next step is to find a suitable penalty measure Ψ for the Lagrangian defined in (3.3). In fact, all operators defined in the following serve this purpose.

To realize this with respect to a sparse representation, it is clear that the aim is the minimization of the number of non-zero coefficients, given by ‖c‖0 := #{cγ : cγ ≠ 0}, i.e. Ψ(c) = ‖c‖0. "This approach was also called ideal atomic decomposition by [Donoho] since it yields the most efficient representation of a signal", as mentioned in [Siedenburg1].

But there is one little problem: using the ℓ0-penalty leads to a non-convex problem, which is NP-hard. We make a small digression in the next paragraph, which can be skipped without loss for our further work.

A decision problem is a question in some formal system whose answer can only be 'yes' or 'no', depending on the values of some input parameters; for example, 'given two numbers x and y, does x evenly divide y?'. NP stands for non-deterministic polynomial time and is the set of all such decision problems for which the instances where the answer is 'yes' have efficiently verifiable proofs of that fact; these proofs have to be verifiable in polynomial time. NP-hard is a class of problems that are 'at least as hard as the hardest problems in NP'. For the detailed theory of NP-completeness the book [GareyJohnson] is recommended.

The only thing we should keep in mind is that this non-convex problem cannot be solved efficiently in general. The ℓ0 norm will nevertheless remain important for comparing sparsity properties.

We continue looking for a solution. To obtain an easier and more robust computation, we use the ℓ1-norm instead of the ℓ0-norm, i.e. Ψ(c) = ‖c‖1. It was shown that the ℓ0-solution can be recovered uniquely by ℓ1-minimization under the condition of a


sufficiently sparse signal; see [Donoho].

Figure 3.1: Unit balls of ℓq for (a) q = 0, (b) q = 0.5, (c) q = 1, (d) q = 1.5, (e) q = 2.

We could say the ℓ1-norm is a convexification of ℓ0; this point may not be immediately clear in [Siedenburg1]. It can be visualized by approximating the ℓ0-norm by the ℓq pseudo-norm

‖c‖q = ( ∑_γ |cγ|^q )^{1/q},   (3.5)

which is non-convex for 0 < q < 1, and convex and thus a norm for q ≥ 1. If we take a look at Figure 3.1, we can observe in two dimensions that for decreasing q the unit balls of vectors ‖c‖q ≤ 1 approach the ℓ0 unit ball, which corresponds to the two axes. Analogously, the ℓ1 Lagrangian minimization 'convexifies' the ℓ0 Lagrangian minimization. Based on these considerations, our problem (3.4) turns into the convex problem

min_c { (1/2)‖y − Φc‖₂² + λ‖c‖1 }.   (3.6)

This problem is known as Basis Pursuit Denoising (BPDN); an equivalent formulation was given by [Tibshirani] in the field of statistics under the name LASSO (least absolute shrinkage and selection operator). [Siedenburg1] notes that the deterministic linear-inverse problem and the stochastically motivated regression problem coincide. We will discuss the problem from our deterministic point of view.

Remark 3.2.1. To get a sense for BP, we take a look at another notion, mentioned in [WeinWakin]. One of the most promising fields for recovering certain signal information is compressive sensing (see [Mallat] or [Candles]). Compressive sensing (also known as compressed sensing, compressive sampling, or sparse sampling) is a signal processing technique for efficiently acquiring and reconstructing a signal by finding solutions to underdetermined linear systems. It takes advantage of the signal's sparseness or compressibility in some domain, which allows the entire signal to be determined from relatively few measurements. We can therefore conclude that it plays an important role in the application of denoising. With this in mind, we see BP as the canonical CS method for recovering a sparse signal. The main difference between BP and the Lasso is that the former only works for


underdetermined systems, whereas the Lasso is also tailored to overdetermined systems. The latter is realized by minimizing the squared error rather than constraining it to be equal to zero. As mentioned before, the problem equivalent to the LASSO is BPDN.

3.3 Thresholding with Mixed Norms

With the Lasso, we have a first operator that provides a way towards sparse regularization. However, there may be a more natural way of considering (3.6). Since we look at problems from the point of view of Gabor analysis, with atoms ϕγ generated by translation and modulation and therefore ordered along two dimensions, it makes sense to split our indices into groups and members. The way to realize this is to replace the ℓ1-penalty by a weighted mixed norm. As a result, the global (group) level and the local (member) level are treated differently.

Definition 3.1 (Weighted Mixed Norm). Let Γ be a doubly labelled index set and K, J1, J2, ..., Jk, ... be countable index sets such that, with Γk := {(k, j) : j ∈ Jk} for all k ∈ K, we have Γ = ⋃_{k∈K} Γk, in other words Γ is the disjoint union of the groups of indices Γk. Let w = (wγ)γ∈Γ be a positive sequence of weights. The weighted mixed norm ℓw,p,q on Hc for 1 ≤ p, q < ∞ is defined by

‖c‖w,p,q := ( ∑_{k∈K} ( ∑_{j∈Jk} wk,j |ck,j|^p )^{q/p} )^{1/q}.   (3.7)

So each of these groups consists of one fixed index k together with all associated indices j. We will see that these groups can be arranged along the time label as well as the frequency label, to achieve different results.

Remark 3.3.1. We assume our index sets always to be countable.

Note that weighted mixed norms are a generalization of the regular weighted norms: for p = q we have

ℓw,p,q = ℓw,p = ℓw,q.

These norms fulfill the known norm properties from Definition 1.1 since they are equivalent to a composition of ℓp and ℓq:

‖c‖w,p,q = ‖ { ‖ (ck,j (wk,j)^{1/p})_{j∈Jk} ‖_p }_{k∈K} ‖_q.   (3.8)

Notation: In the following, the index notation is used depending on the context: γ refers to the non-partitioned set Γ, while (k, j) refers to the explicit group-member form Γ = ⋃_k Γk.

With the help of this structure we can emphasize sparsity on either the member or the group level. The penalties we will use from now on are the already known regular Lasso ℓw,1,1 = ℓw,1 and the new mixed penalties


Figure 3.2: Unit balls of the mixed norms, with horizontal group Γ1 = {x, y} and elevation group Γ2 = {z}. Top left: ℓ1 (Lasso), top right: ℓ2 (Tykhonov); bottom left: ℓ2,1 (Group-Lasso), bottom right: ℓ1,2 (Elitist-Lasso). [Siedenburg1]


• ℓw,2,1 ... Group-Lasso

• ℓw,1,2 ... Elitist-Lasso

The norm ℓw,2,1, first discussed in [YuanLin], enforces sparsity on the group level while retaining diversity on the level of members; ℓw,1,2 reasonably does the opposite. The name 'Elitist Lasso' first appears in [Kowalski]. As the names suggest, either a whole group of coefficients is considered important and is kept or discarded, or the coefficients are treated completely individually and discarded only according to their size. To put the latter simply: 'only the strongest survive'. It turns out that each of these strategies has advantages and disadvantages. To get an impression, Figure 3.2 sketches the unit balls corresponding to p, q ∈ {1, 2}.

Reformulating the Lagrangian by setting Ψ(c) = (1/q)‖c‖^q_{w,p,q} gives us

Lw,p,q(c) := (1/2)‖y − Φc‖₂² + (1/q)‖c‖^q_{w,p,q}.

Thus our sparse recovery problem (3.4) becomes

min_c Lw,p,q(c).   (3.9)

Remark 3.3.2. The information carried by λ is w.l.o.g. absorbed into the weights w.

The following definition is the most elementary tool for proceeding in sparse

regularization. After the definition of the generalized thresholding operator we will

derive the four cases with the underlying mixed norms.

Definition 3.2 (Generalized Thresholding Operator). For z, w ∈ Hc, wγ > δ > 0 and a non-negative function ξ = ξγ,w : Hc → [0,∞], the generalized thresholding operator is defined component-wise by

Sξ(zγ) := zγ (1 − ξγ,w(z))₊,   (3.10)

where for b ∈ ℝ, b₊ := max(b, 0). ξ is called the threshold function.

Notation: We usually write Sξ(z) := (Sξ(zγ))γ∈Γ, and the subscript is adjusted to the respective dependencies. For example, Sw,p,q highlights the relation to ξγ,w corresponding to the weighted mixed norm ℓw,p,q (see below).

In fact there are two possibilities for defining a threshold operator. The one we already know is the so-called soft-thresholding operator S^ST, which can be written as

S^ST_λ(z) := z (1 − λ/|z|)₊ = { z − λ for z ≥ λ;  0 for |z| < λ;  z + λ for z ≤ −λ },   i.e. ξ^ST = λ/|z|,

for z ∈ ℂ (the piecewise form referring to real-valued z).


The second type is the hard-thresholding operator S^HT:

S^HT_λ(z) := { z for |z| > λ;  0 for |z| ≤ λ }.

An interesting connection between the two operators is

ξ^HT(z) = lim_{k→∞} (ξ^ST(z))^k.

Figure 3.3 shows how the two thresholding operators work. While the hard-thresholding operator simply sets every value below the shrinkage level λ to zero and leaves the rest untouched, the soft-thresholding operator additionally lowers the remaining values by λ. Soft-thresholding provides a smoother result, which in most cases performs better.
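As a minimal MATLAB™ sketch (variable names are our own and not part of any toolbox), both operators can be applied element-wise to a coefficient vector:

  % Soft and hard thresholding of a coefficient vector z with shrinkage level lambda
  z      = [0.05, -0.3, 0.15+0.2i, 1.2];      % example coefficients
  lambda = 0.2;
  soft = z .* max(1 - lambda ./ abs(z), 0);   % soft thresholding: shrink towards zero by lambda
  hard = z .* (abs(z) > lambda);              % hard thresholding: keep or discard, no shrinkage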

Figure 3.3: (a) Standard Gaussian, (b) standard Gaussian with thresholding value λ = 0.2, (c) after soft-thresholding, (d) after hard-thresholding.

[[Kowalski], Theorem 3] gives the following theorem for the thresholding operators

corresponding to each of our weighted norms. A proof can also be found there.


Theorem 3.1. Let Φ be the unitary synthesis operator and w = (wγ)γ a strictly positive sequence of weights. For γ = (k, j), let xk := (xk,j)j denote the subsequence of members and ‖xk‖p its respective ℓp-norm. Then the minimizer of Lw,p,q from (3.9) is given by the generalized soft thresholding operation

c = Sξ(Φ∗y),   (3.11)

which is defined component-wise by ξw for zk,j ≠ 0 in the following cases:

(i) p = q = 1: ξw(zγ) = wγ / |zγ| (Lasso)

(ii) p = q = 2: ξw(zγ) = wγ / (1 + wγ) (Tykhonov regularization)

(iii) p = 2, q = 1, wk,j = wk for all k, j: ξw(zk,j) = √wk / ‖zk‖₂ (Group-Lasso)

(iv) p = 1, q = 2: ξw(zk,j) = (wk,j / (1 + Wwk)) · |||zk|||wk / |zk,j| (Elitist-Lasso)

where Wwk := ∑_{jk=1}^{Jk} w²k,jk and |||zk|||wk := ∑_{jk=1}^{Jk} wk,jk |zk,jk|; for every k, (jk) is a sequence of indices such that rk,jk := |zk,jk| / wk,jk is decreasing in jk, and Jk is the quantity verifying

rk,Jk+1 ≤ ∑_{jk=1}^{Jk+1} w²k,jk (rk,jk − rk,Jk+1)   and   rk,Jk > ∑_{jk=1}^{Jk} w²k,jk (rk,jk − rk,Jk).

Proof: The proof is taken from [[Siedenburg1], Theorem 2.6]. The aim is the minimization of

(1/2)‖y − Φc‖₂² + (1/q)‖c‖^q_{w,p,q}.   (3.12)

The cases of subsequences ck = 0 for p = 2, q = 1 and of components ck,j = 0 for p = 1, q ∈ {1, 2} are treated separately, since problem (3.12) becomes non-differentiable there.

We define y′ := Φ∗y, θγ := arg(y′γ) and ϑγ := arg(cγ). From the unitarity of Φ it follows that

‖y − Φc‖₂² = ‖y′ − c‖₂² = ∑_γ |y′γ − cγ|² = ∑_γ ( |y′γ|² + |cγ|² − 2|y′γ||cγ| cos(θγ − ϑγ) ).

This tells us that we can carry out the minimization component-wise. Next we fix γ = (k, j) with ck,j ≠ 0 and differentiate (3.12) to obtain

|ck,j| = |y′k,j| cos(θk,j − ϑk,j) − wk,j |ck,j|^{p−1} ( ∑_{j∈Jk} wk,j |ck,j|^p )^{(q−p)/p}   (3.13)

0 = 2|cγ||y′γ| sin(θγ − ϑγ)   (3.14)

for θγ = ϑγ + lπ with l ∈ {0, 1}. Note that the differentiation is carried out with respect to the modulus


and the argument of cγ. From (3.13) it can be derived that l = 1 is impossible, so θγ = ϑγ must hold. Writing the inner sum on the right-hand side of (3.13) as the group norm yields the variational equations

|ck,j| = |y′k,j| − wk,j |ck,j|^{p−1} ‖ck‖^{q−p}_{wk,p}   (3.15)

arg(cγ) = arg(y′γ).   (3.16)

We now treat the four cases separately:

(i) p = q = 1. For wγ < |y′γ|, we have

|cγ| = |y′γ| − wγ.   (3.17)

For wγ ≥ |y′γ|, (3.15) cannot be satisfied, which implies cγ = 0. Combining this with (3.16) and (3.17) finally yields cγ = y′γ (1 − wγ/|y′γ|)₊.

(ii) p = q = 2. This is the simplest case, since the problem is differentiable everywhere, and we obtain cγ = y′γ/(1 + wγ) = y′γ (1 − wγ/(1 + wγ))₊.

(iii) p = 2, q = 1. Since the weights w are assumed to be constant over each subsequence, we can neglect the second index j and write wk = wk,j. For the case ck ≠ 0 mentioned at the beginning of the proof, (3.15) implies

|ck,j| = |y′k,j| − wk |ck,j| / ‖ck‖wk,2.   (3.18)

The index j can w.l.o.g. be replaced by an independent index l, because wk is constant within the group, as mentioned before. So we can write

wk / ‖ck‖wk,2 = (|y′k,l| − |ck,l|) / |ck,l|.

Inserting this term into

|ck,j| = |y′k,j| / (1 + wk / ‖ck‖wk,2)

we obtain

|ck,j| = (|y′k,j| / |y′k,l|) |ck,l|   ⟺   |ck,l| = |ck,j| |y′k,l| / |y′k,j|   for all j, l.

We use the last identity to decouple the coefficients in the group norm of c:

‖ck‖wk,2 = √( ∑_l wk |ck,l|² ) = √( ∑_l wk |y′k,l|² |ck,j|² / |y′k,j|² ) = ( √wk |ck,j| / |y′k,j| ) ‖y′k‖₂.

Inserting this into equation (3.18) yields


|ck,j| = |y′k,j| ( 1 − √wk / ‖y′k‖₂ ),

and since (3.18) does not hold for √wk > ‖y′k‖₂, except for ck = 0, we obtain

ck,j = y′k,j ( 1 − √wk / ‖y′k‖₂ )₊.

(iv) p = 1, q = 2. In this case we decouple the entries of c in order to re-sort the coefficients; in this way an explicit expression for the thresholding can be achieved. We start by rewriting (3.15) for all k and j, which takes the form

|ck,j| = |y′k,j| − wk,j ‖ck‖wk,1.   (3.19)

Rearranging, this equation changes to

‖ck‖wk,1 = (|y′k,j| − |ck,j|) / wk,j   for all k, j.   (3.20)

Then, for all k, j, l with ck,j ≠ 0 and ck,l ≠ 0: |ck,l| = |y′k,l| − wk,l (|y′k,j| − |ck,j|) / wk,j. To keep the overview we use the abbreviations

Wwk := ∑_{l: |ck,l|≠0} w²k,l,   |||y′k|||wk := ∑_{l: |ck,l|≠0} wk,l |y′k,l|   and   ρk := ‖ck‖wk,1,

and consider

ρk = ∑_l wk,l |ck,l| = ∑_{l: |ck,l|≠0} wk,l ( |y′k,l| − wk,l (|y′k,j| − |ck,j|) / wk,j )   (3.21)
   = ∑_{l: |ck,l|≠0} wk,l |y′k,l| − Wwk (|y′k,j| − |ck,j|) / wk,j = |||y′k|||wk − Wwk ρk,

which is equivalent to

ρk = |||y′k|||wk / (1 + Wwk).   (3.22)

Note that this is still an implicit form which depends on the support set {l : |ck,l| ≠ 0} = {l : |y′k,l| > wk,l ρk}. To obtain an explicit expression for the support, we consider for every k a sequence of indices jk such that the sequence rk,jk := |y′k,jk| / wk,jk decreases. Next we choose Jk such that rk,Jk+1 ≤ ρk < rk,Jk, i.e. jk = 1, ..., Jk should belong to the support of c. Now we use (3.21) to obtain


rk,Jk+1 ≤ ∑_{jk=1}^{Jk+1} w²k,jk (rk,jk − ρk)   and   rk,Jk > ∑_{jk=1}^{Jk} w²k,jk (rk,jk − ρk),

and after inserting ρk,

rk,Jk+1 ≤ ∑_{jk=1}^{Jk+1} w²k,jk (rk,jk − rk,Jk+1)   and   rk,Jk > ∑_{jk=1}^{Jk} w²k,jk (rk,jk − rk,Jk).

We have finally attained an (implicit) expression for Jk which is independent of c. We note that

Wwk = ∑_{jk=1}^{Jk} w²k,jk   and   |||y′k|||wk = ∑_{jk=1}^{Jk} wk,jk |y′k,jk|.

Combining (3.22), (3.20) and (3.16), we arrive at the formula we were looking for:

ck,j = y′k,j ( 1 − (wk,j / (1 + Wwk)) · |||y′k|||wk / |y′k,j| )₊.

Remark 3.3.3. The Tykhonov regularization with p = q = 2 is in fact no thresholding in our sense, since ξw(zγ) = wγ/(1 + wγ) < 1, i.e. Sξ,w(z) ≠ 0 for all z ≠ 0. That means no non-zero coefficients are actually set to zero, and thus the solution exhibits no sparsity. But it still has an advantage: a closed-form solution for general linear operators, c = (Φ∗Φ + Iw)⁻¹ Φ∗y.

These formulas seem quite complicated. It is not essential to understand them in detail if we simply regard them as a tool for thresholding, i.e. for the algorithms discussed later. The following corollary from [Kowalski] gives a comparison between mixed-norm regularization and the weighted ℓ1 norm, which simplifies the handling of mixed norms.

Corollary 3.3.1. Let Φ be unitary and w = (wγ)γ > 0. Then there exists a strictly positive sequence u = (uγ)γ∈Γ depending on y′ := Φ∗y, such that the minimizers and minima of Lw,p,q and Lu,1 coincide, i.e. for the Lw,p,q-minimizer c and the Lu,1-minimizer c we have

‖c‖u,1 = ‖c‖^q_{w,p,q}.   (3.23)

The proof can also be found in [Siedenburg1].


3.4 Thresholding Algorithms for Frames

It is time to present some methods for obtaining the intended minimizer of (3.9) for frames. The most obvious way is to find algorithms which converge to this solution. For the case of orthonormal bases there is an algorithm called block coordinate relaxation for finite dimensions, proposed by [Sardy]. Unfortunately, Gabor analysis, in which we want to operate, does not rest on an orthonormal basis as, for example, the modified discrete cosine transform (MDCT) does. We already know that we can use an alternative, namely Gabor frames. Therefore we want to generalize the soft-thresholding results to frames or general linear operators. We should keep in mind that in the case of an orthonormal basis only one thresholding step is needed, while in the Gabor representation an iterative process is required.

To obtain the desired iterations, we start from the so-called Landweber iteration for the approximate solution of inverse operator problems and modify it, since it does not contain a thresholding step.

First we discuss the case of the LASSO with ℓ1 and then generalize it to the other cases. The following theorem, similar to [Daubechies], guarantees the convergence of the sequence generated by the iterative algorithm.

Theorem 3.2. Let Φ : Hc → Hs be a bounded linear operator with ‖Φ‖ < 1 and let the weights w = (wγ)γ be uniformly bounded from below, i.e. wγ ≥ w > 0. Then, for arbitrary c0 ∈ Hc, the sequence (χⁿ(c0))n∈ℕ ⊂ Hc with

χ(c) = Sw(c + Φ∗(y − Φc))   (3.24)

converges to a minimizer of the Lagrangian Lw,1.

Adapting this result we obtain the iterative soft-thresholding algorithm (ISTA), also called the thresholded Landweber iteration:

cn+1 = χ(cn) = Sw(cn + Φ∗(y − Φcn)).   (3.25)
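For illustration, here is a minimal MATLAB™ sketch of the iteration (3.25) for the ℓ1 (Lasso) case, with the frame written as a matrix Phi; the function name, the fixed number of iterations and the zero starting value are our own choices and do not reproduce the StrucAudioToolbox interface:

  % ISTA sketch for min_c 0.5*||y - Phi*c||_2^2 + lambda*||c||_1,
  % assuming ||Phi|| < 1 (rescale Phi and y otherwise).
  function c = ista_lasso(Phi, y, lambda, nIter)
      c = zeros(size(Phi, 2), 1);                 % start from c^0 = 0
      for n = 1:nIter
          b = c + Phi' * (y - Phi * c);           % Landweber (gradient) step
          c = b .* max(1 - lambda ./ abs(b), 0);  % soft thresholding S_lambda
      end
  end

For a unitary Φ one such thresholding step already yields the minimizer, in accordance with Theorem 3.1.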

We will not reproduce the proof of Theorem 3.2, and hence of the convergence of the ISTA for ℓ1 penalization, in this work, since it is long and technical; but the idea should be sketched. In fact, [Opial] gives the following theorem, which serves as a tool for proving Theorem 3.2. The generalized ISTA for multi-layer decompositions can subsequently be treated analogously. All of these proofs can be found in [Siedenburg1].


Theorem 3.3. Let H be a Hilbert space and let χ : H → H satisfy the conditions

(i) χ is non-expansive, i.e. ‖χ(c) − χ(a)‖₂ ≤ ‖c − a‖₂ for all c, a ∈ H;

(ii) χ is asymptotically regular, i.e. for all c ∈ H: ‖χⁿ⁺¹(c) − χⁿ(c)‖₂ → 0 for n → ∞;

(iii) the set Fix(χ) of fixed points of χ is non-empty.

Then the sequence (χⁿ(c0))n∈ℕ converges weakly to a fixed point in Fix(χ), for all c0 ∈ H.

It turns out that the ISTA converges even strongly.

Theorem 3.4 ([Daubechies]). Under the assumptions of Theorem 3.2, the sequence of iterates (cn)n∈ℕ with

cn+1 = Sw(cn + Φ∗(y − Φcn))

converges strongly to a minimizer of Lw,1.

In order to obtain the implication from weak to strong convergence we use the following Lemma 3.4.1. To be able to prove the lemma, two further results proven in [Daubechies] are needed, namely that for c⋆ := w-limₙ cⁿ and h := c⋆ + Φ∗(y − Φc⋆) we have

• ‖Φ(cⁿ − c⋆)‖₂² → 0 for n → ∞;

• ‖Sw(h + (cⁿ − c⋆)) − Sw(h) − (cⁿ − c⋆)‖₂ → 0 for n → ∞.

Lemma 3.4.1. If a ∈ Hc and (bⁿ)n∈ℕ ⊂ Hc with weak limit w-limₙ bⁿ = 0, and limₙ ‖Sw(a + bⁿ) − Sw(a) − bⁿ‖₂ = 0, then ‖bⁿ‖₂ → 0 for n → ∞.

Proof: This proof is taken from [Siedenburg1] with small alterations. The idea is the following: the index set Γ is partitioned, as sketched in Figure 3.4, into Γ0, Γ₁ⁿ and Γ̄₁ⁿ for every n, in order to show strong convergence on each of these sets; in particular, Γ̄₁ⁿ vanishes for sufficiently large n.

Figure 3.4: Sketch of the stepwise intended partition of Γ.


For w := inf_γ wγ, choose a finite set Γ0 ⊂ Γ such that for Γ1 := Γ \ Γ0,

∑_{γ∈Γ1} |aγ|² ≤ (w/4)².

Since bⁿ converges weakly, we have strong convergence on the finite set Γ0, i.e.

∑_{γ∈Γ0} |bⁿγ|² → 0 for n → ∞.

The next sets are defined as Γ₁ⁿ := {γ ∈ Γ1 : |bⁿγ + aγ| < wγ} and Γ̄₁ⁿ := Γ1 \ Γ₁ⁿ. If γ ∈ Γ₁ⁿ, we have 0 = Swγ(aγ + bⁿγ) = Swγ(aγ), where the last equality holds because |aγ| ≤ w/4 ≤ wγ. Thus |Swγ(aγ + bⁿγ) − Swγ(aγ) − bⁿγ| = |bⁿγ|, implying

∑_{γ∈Γ₁ⁿ} |bⁿγ|² ≤ ∑_{γ∈Γ} |Swγ(aγ + bⁿγ) − Swγ(aγ) − bⁿγ|² → 0 for n → ∞.

Strong convergence is thus shown for the first two sets. If we succeed in showing that Γ̄₁ⁿ vanishes for large n, i.e. ∑_{γ∈Γ̄₁ⁿ} |bⁿγ|² → 0, the proof is completed. We can write Γ̄₁ⁿ as the set {γ ∈ Γ1 : |aγ + bⁿγ| ≥ wγ, wγ − w/4 > |aγ|}. For such γ we estimate

|bⁿγ − Swγ(aγ + bⁿγ) + Swγ(aγ)| = |bⁿγ − Swγ(aγ + bⁿγ)|
= |bⁿγ − e^{i arg(aγ+bⁿγ)} (|aγ + bⁿγ| − wγ)|
≥ | |e^{i arg(aγ+bⁿγ)} wγ| − |e^{i arg(aγ+bⁿγ)} |aγ + bⁿγ| − bⁿγ| |
= |wγ − |aγ||
> w/4,

and therefore

∑_{γ∈Γ̄₁ⁿ} |Swγ(aγ + bⁿγ) − Swγ(aγ) − bⁿγ|² ≥ (w/4)² ρ,

where ρ := #Γ̄₁ⁿ. By assumption, ∑_{γ∈Γ̄₁ⁿ} |Swγ(aγ + bⁿγ) − Swγ(aγ) − bⁿγ|² → 0 for n → ∞, so ρ = 0 for sufficiently large n. Hence the result of Theorem 3.4 follows.

We now generalize the results obtained so far to the so-called multi-layer decomposition, i.e. to multi-frame/penalty expansions and simultaneously to mixed-norm penalties. An example is discussed in Chapter 5. First we adapt the notation to this generalized case, similar to [Siedenburg1].


Notation: We consider the coefficient space Hc = Hc,1 × · · · × Hc,M for M ∈ ℕ. The corresponding coefficients are of the form c = (c[1], ..., c[M])ᵀ ∈ Hc. The synthesis operator is denoted by Φ = ⊕_{i=1}^{M} Φi : Hc → Hs with Φc = ∑_i Φi c[i], where the Φi : Hc,i → Hs are frames. Furthermore, w = (w[1], ..., w[M]) is the sequence of strictly positive weights, and the multi-indices p = (p1, ..., pM), q = (q1, ..., qM) with pi, qi ∈ {1, 2} will be needed. The multi-layered Lagrangian is

Lw,p,q(c) := (1/2)‖y − ∑_i Φi c[i]‖₂² + ∑_i (1/qi)‖c[i]‖^{qi}_{w[i],pi,qi}.   (3.26)

Sw,p,q = (Sw[1],p1,q1, ..., Sw[M],pM,qM) is to be understood as the soft-thresholding operator that acts independently on each c[i] via the respective Sw[i],pi,qi.

Theorem 3.5 ([Kowalski]). Let M ∈ ℕ and let Φ, Ψ, c, w, p, q be the concatenated multi-layer operators, coefficients, weights and parameters as defined above. Let each Φi be a bounded linear operator such that ‖Φ‖ < 1, each w[i] a positive sequence strictly bounded from below, and pi, qi ∈ {1, 2} for i = 1, ..., M. Then for any c0 ∈ Hc the iterative sequence

cn+1 = Sw,p,q(cn + Φ∗(y − Φcn))   (3.27)

converges strongly to the minimizer of Lw,p,q.

The proof given in [Siedenburg1] proceeds in several steps. The idea: first, an approach analogous to Theorem 3.3 is carried out for the extended case to obtain weak convergence. Next it is claimed and proven that, under the assumptions of Theorem 3.5, there is a strictly positive sequence u = (uγ)γ∈Γ, depending on the weak limit c⋆ of (3.27), such that the ISTAs associated to

Lw,p,q(c) = (1/2)‖y − Φc‖₂² + ∑_i (1/qi)‖c[i]‖^{qi}_{w[i],pi,qi}   and   Lu,1(c) = (1/2)‖y − Φc‖₂² + ‖c‖u,1

reach their minima at the same point c⋆. Finally, by combining the weak convergence result with this claim, strong convergence can be proven.

The algorithm for applications is as follows.

Algorithm 1 (Multi-layered ISTA)
  initialize (c0[1], ..., c0[M])ᵀ ∈ Hc
  repeat
    for i = 1:M do
      cn+1[i] = Sw[i],pi,qi(cn[i] + Φ∗i(y − Φcn))
    end for
  until convergence


Remark 3.4.1. For the sake of easier implementation, the non-negative threshold functions ξ = ξ(g,m),λ corresponding to the weighted mixed norms, which define the generalized thresholding operator Sλ,ξ(zg,m) = zg,m (1 − ξ(z))₊, where (g, m) refers to the group-member structure, can be represented equivalently as

(i) p = q = 1: ξL(cg,m) = λ / |cg,m| (Lasso)

(ii) p = 2, q = 1: ξGL(cg,m) = λ / (∑_m |cg,m|²)^{1/2} (Group-Lasso)

(iii) p = 1, q = 2: ξEL(cg,m) = (λ / (1 + Mg λ)) · ‖c̃g‖1 / |cg,m| (Elitist-Lasso)

where c̃g = (c′g,1, ..., c′g,M) and {c′g,m′}m′ denotes for each group the sequence of the scalars |cg,m| in descending order. Mg denotes a natural number depending on the magnitudes of the coefficients in the group (cg,1, ..., cg,M).

The implementation of the 'StrucAudioToolbox' used in Chapter 5 is based on this representation.
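A minimal MATLAB™ sketch of the first two threshold functions, applied to a coefficient matrix C whose rows play the role of the groups g and whose columns are the members m (the Elitist-Lasso additionally requires the sorting step described above and is therefore omitted here); all names are ours and this is not the toolbox code:

  C      = randn(6, 10);      % placeholder coefficients: 6 groups with 10 members each
  lambda = 0.3;
  % Lasso: xi_L = lambda / |c_{g,m}|, applied entry by entry
  C_L = C .* max(1 - lambda ./ abs(C), 0);
  % Group-Lasso: xi_GL = lambda / ||c_g||_2, one threshold factor per group (row)
  gnorm = sqrt(sum(abs(C).^2, 2));             % column vector of group norms
  scale = max(1 - lambda ./ gnorm, 0);         % shrinkage factor per group
  C_GL  = C .* repmat(scale, 1, size(C, 2));   % apply the factor to every member of the group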

We have finally established the algorithm for every case discussed before. But since in applications the speed of an algorithm is just as important as its convergence, a simple modification of the ISTA has been made: the so-called fast ISTA (FISTA), given below, performs best in almost all situations.

The ISTA is constructed by setting cn = S(bn) with bn = cn−1 + Φ∗(y − Φcn−1). The modification that gives the FISTA its speed is the choice of bn, which is replaced by a linear combination of cn and cn−1. For comparison: while the ISTA converges sub-linearly in the objective like O(1/n), the FISTA converges at the rate O(1/n²).

Algorithm 2 (FISTA)
  S = Sw,p,q
  c0 = b1 ∈ Hc, t1 = 1
  repeat
    cn = S(bn + Φ∗(y − Φbn))
    tn+1 = (1/2)(1 + √(1 + 4tn²))
    bn+1 = cn + ((tn − 1)/tn+1)(cn − cn−1)
  until convergence
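A minimal MATLAB™ sketch of Algorithm 2 for the ℓ1 case (again with a matrix Phi standing in for the frame; names and stopping rule are our own):

  function c = fista_lasso(Phi, y, lambda, nIter)
      soft = @(z) z .* max(1 - lambda ./ abs(z), 0);   % soft-thresholding operator
      c = zeros(size(Phi, 2), 1);    % c^0
      b = c;                         % b^1
      t = 1;                         % t_1
      for n = 1:nIter
          cOld = c;
          c    = soft(b + Phi' * (y - Phi * b));     % thresholded Landweber step at b^n
          tNew = (1 + sqrt(1 + 4 * t^2)) / 2;        % t_{n+1}
          b    = c + ((t - 1) / tNew) * (c - cOld);  % extrapolated point b^{n+1}
          t    = tNew;
      end
  end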


4 Improvements and Threshold Selection

This chapter presents a further improvement called persistence, which concerns the threshold function ξ and depends on a coefficient's neighborhood. After that, a short look at a new method called empirical Wiener estimation is given.

4.1 Persistence and Neighborhoods

We recall what we have done so far. We defined the general problem (3.4), which made it possible to deal with thresholded sparsity. In a second step, in order to optimize our problem, we introduced mixed norms (3.7) and replaced the penalty term Ψ by them. The minimizers of Lw,p,q in (3.9) for p, q ∈ {1, 2} were obtained by generalized soft-thresholding with a threshold function ξ. This soft-thresholding operator, introduced in Definition 3.2, has the form S(z) := z(1 − ξ(z))₊ together with one of the threshold functions derived in Theorem 3.1. All of this was done solely for orthonormal bases. Since the analysis coefficients of frames are not unique, we had to derive an iterative approach in order to minimize the problem. Still using the threshold operator, and therefore a threshold function, the IST algorithm resp. its faster version FISTA, discussed in the last section, realizes this.

We now want to go a step further and adjust the framework in order to cover a wider spectrum of audio signals. Observing some 'real' signal examples, we may recognize that most parts are sparse in time but persistent in frequency, resp. vice versa. We want to take advantage of this fact. For this task, the threshold function ξ is considered again. This approach is known from [KowBruno] as persistent generalized thresholding. The new operators will be evaluated in terms of Gabor analysis.

Notation: Since we state the following notions for the general case of mixed norms, and for the sake of brevity, we write ξ = ξw = ξw,p,q.

Definition 4.1 (Time-Frequency Neighborhood). For the countable index set Γ, the time-frequency neighborhood weights are defined as non-negative sequences vγ = (vγ(γ̃))γ̃∈Γ, γ ∈ Γ, which fulfill the following properties:

• ‖vγ‖₂ = 1;

• ∑_γ vγ(γ̃) ≤ C < ∞ for all γ̃;

• vγ(γ) > 0 for all γ.


Figure 4.1: Sketch of different neighborhoods. Rectangular or triangular windows can be used, as well as differently chosen centers.

Nγ := supp(vγ) = {γ̃ ∈ Γ : vγ(γ̃) > 0} is called the time-frequency neighborhood of γ. For given neighborhood weights vγ, the neighborhood-smoothing functional η : Hc → ℝ₀⁺ is defined component-wise by

η(cγ) := ( ∑_{γ̃∈Γ} vγ(γ̃) |cγ̃|² )^{1/2}.   (4.1)

For c ∈ Hc, we set η(c) := (η(cγ))γ∈Γ.
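On a finite time-frequency coefficient matrix, (4.1) can be computed by a two-dimensional convolution. The following minimal MATLAB™ sketch assumes, only for illustration, that the same kernel v is used for every γ:

  C   = randn(512, 100);                    % placeholder coefficients (frequency x time)
  v   = ones(3, 7);  v = v / norm(v(:));    % rectangular neighborhood with ||v||_2 = 1
  eta = sqrt(conv2(abs(C).^2, v, 'same'));  % eta(c_gamma) for every gamma, cf. (4.1)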

The advantage of neighborhoods in contrast to the groups of GL and EL is that

neighborhoods can be modeled flexibly, e.g. using weighting and overlap.

Equipped with these definitions, we are now able to extend the soft-thresholding operator via convolution.

Definition 4.2 (Persistent Soft-Thresholding Operator). For neighborhood weights vγ, the persistent soft-thresholding operator is defined as

S*p,q(c) := Sp,q,v(c) := c (1 − ξ*p,q(c))₊   (4.2)

with threshold function ξ*p,q := ξp,q ∗ η.

The neighborhood of an index γ can have any positive size. In particular, it may happen that the neighborhood comprises only one coefficient index, i.e. Nγ = {γ}. Then η(cγ) = vγ(γ)|cγ| = |cγ|, since vγ(γ) = 1. This kind of neighborhood choice can be seen in the middle of Figure 4.1. In fact, the regular operators turn out to be a special case of the persistent soft-thresholding operators, since in this case the neighborhood weighting disappears and the single coefficient, i.e. the center, stands alone.

Remark 4.1.1. It might be a good idea to choose a neighborhood with a non-rectangular shape; an example is shown in Figure 4.2. But the effects of this kind of neighborhood have not


been researched yet. We note that in the 'StrucAudioToolbox' only rectangular neighborhoods are implemented, since they are represented in the form of matrices.

Figure 4.2: Non-rectangular neighborhood. Not implemented in Toolbox.

We list all threshold functions used for our different thresholding operators:

ξL = ξ1,1 (Lasso, L)
ξGL = ξ2,1 (Group-Lasso, GL)
ξEL = ξ1,2 (Elitist-Lasso, EL)
ξWGL = ξ*1,1 = ξL ∗ ηN (windowed Group-Lasso, WGL)
ξPGL = ξ*2,1 = ξGL ∗ ηN (persistent Group-Lasso, PGL)
ξPEL = ξ*1,2 = ξEL ∗ ηN (persistent Elitist-Lasso, PEL)

Recall that there was another threshold function, the Tykhonov case with p = q = 2, which for the reasons mentioned before is not useful in this context and is therefore not discussed further.

Remark 4.1.2. It might be confusing that we use the term 'windowed Group-Lasso' instead of 'persistent Lasso'. As discussed in [KowBruno], the persistent Lasso (PL) is often considered a modification of the GL, and thus the name WGL was used. In detail: if the neighborhoods Nγ are chosen as a partition of Γ, i.e. Γ = ∪_k Nk, and the neighborhood and penalty weights are constant over each neighborhood Nk, then the PL/WGL coincides with the GL. It might also be appropriate to use the label PL, but to avoid confusion in connection with other references and for further research we use the label WGL.


4.2 Empirical Wiener Estimation

Before we finally come to applied examples and experiments, we take a short look at an alternative denoising operator based on neighborhood-smoothed shrinkage, acting similarly to Wiener filters. Within this concept it is possible to find resp. compute a suitable selection of the threshold λ. A consequence is that in the iterative process of Gabor analysis the optimal λ can be determined in each step, but this will not be part of this work.

Some definitions from [Siedenburg2] are presented in this section, without further computations, to give an impression. For more details it is recommended to read this paper as well as [Kowalski].

Since the results are easier to discuss in terms of orthonormal bases, we work here with the MDCT, which is a special case of an orthogonal time-frequency transform, given by the atoms

ϕl,k(n) = gl(n) √(2/L) cos[ (π/L)(k + 1/2)(n + nl) ],   where nl = (L + 1)/2 − lL and k = 0, ..., L − 1,

and the window

gl(n) = sin[ (π/2L)(n − lL + L/2) ].

The hop size L is half the window length, since the MDCT is critically sampled: although the windows overlap by 50%, the number of coefficients equals the number of signal samples, and an individual frame transformed and transformed back differs from the original (time-domain aliasing), while the overlap-add of all frames reconstructs the signal. Note that the MDCT is orthonormal on the global signal and not on local frames, i.e. Φ∗Φ = I, where I denotes the identity matrix. An important consequence is that white noise is again transformed into white noise.

The aim is to minimize the estimation risk

R(f̃, f) = E[‖f̃ − f‖²] = E[ ∑_{n=1}^{p} |f̃(n) − f(n)|² ]   (4.3)

by choosing an appropriate diagonal operator D = diag(d1, ..., dp) with non-negative dγ ≥ 0. Here f̃ = ΦDy∗ is the diagonal estimate of f, i.e. a reconstruction from dγ-weighted analysis coefficients. After some computations we obtain the estimator

ξ̃γ := σ⁻² y∗γ² − 1   (4.4)

of the a priori SNR (signal-to-noise ratio; explained in Chapter 5) ξγ, i.e. E[ξ̃γ] = ξγ, and therefore the shrinkage weights dγ = (1 − σ²/|y∗γ|²)₊ with noise level σ, where (x)₊ = max(x, 0) and 1/0 = ∞. Since they are similar to Wiener filters, the name empirical Wiener attenuation (EW) was chosen. The empirical Wiener diagonal estimate is thus given component-wise by

S^EW(zγ) := zγ dγ.
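A minimal MATLAB™ sketch of this shrinkage applied to analysis coefficients ystar with known noise level sigma (all names chosen for illustration only):

  ystar = [0.1, -0.4, 0.9, -0.05, 1.3];          % placeholder analysis coefficients
  sigma = 0.5;                                   % noise level
  d     = max(1 - sigma^2 ./ abs(ystar).^2, 0);  % shrinkage weights d_gamma
  c_EW  = ystar .* d;                            % empirical Wiener estimate S^EW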


The question is now how the empirical Wiener operator is related to the operators discussed in Chapter 3. Generalizing the soft-thresholding operator yields

S^α_λ(y∗) := y∗ ( 1 − [λ/‖y∗‖⋆]^α )₊.   (4.5)

Recalling the representation of the threshold functions in Remark 3.4.1, we see that (4.5) corresponds to the Lasso for α = 1 and ‖y∗‖⋆ = |y∗γ|. Furthermore it seems natural to choose the noise level σ for λ. Setting α = 2 and λ = σ we obtain S^EW = S²_σ.

Remark 4.2.1. There are several ways of choosing an appropriate threshold λ.

(i) The first and most natural (non-adaptive) choice seems to be the noise level, i.e. λ = σ. The disadvantage is that for a zero signal this would imply retaining around one third of the overall noise, which is not a desirable result.

(ii) Another possibility is the universal threshold λ = σ√(2 ln(p)). Slowly increasing with the signal length, it often produces estimates which are too sparse.

(iii) The most common but also most complicated choice is the Stein unbiased risk estimate (SURE). This is a tool for adapting the threshold to the actual data in order to minimize the estimation risk. For very sparse signals, however, it is useful to replace the SURE by the universal threshold.

A way of automatically choosing the threshold is also discussed in [Siedenburg2].

This approach works for the Lasso, the WGL and the EL. In the previous section we learned about operators that make use of their neighborhood, so it remains to consider a combination of neighborhood persistence and empirical Wiener shrinkage. The resulting operator is called persistent empirical Wiener (PEW). Analogous to (4.4), the estimate of the persistent SNR is

ξ̃*γ := σ⁻² ∑_{γ′∈Γ} wγγ′ |y∗γ′|² − 1   (4.6)

with the sequence wγ = (wγγ′)γ′∈Γ of non-negative and normalized neighborhood weights for each γ fulfilling ∑_{γ′∈Γ} wγγ′ = 1 and wγγ > 0; in other words, the requirements from Definition 4.1 are fulfilled. The PEW is then given coordinate-wise by

S(y∗γ) := y∗γ ( 1 − σ² / ∑_{γ′∈Γ} wγγ′ |y∗γ′|² )₊.   (4.7)

Note that the new operator differs from the WGL only in the change of the exponent from α = 1 to α = 2 in (4.5). Furthermore, to obtain S^PEW = S²_σ we only have to set

‖y∗‖⋆ = √( ∑_{γ′∈Γ} wγγ′ |y∗γ′|² ).

Again, analogous to the relations discussed in the previous section, the PEW coincides with the EW for neighborhood weights with single-coefficient support supp(wγ) = {γ}.
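On a coefficient matrix, (4.7) can again be realized with a two-dimensional convolution; a minimal MATLAB™ sketch with a normalized kernel along the time direction (the kernel and all names are our own illustrative choices):

  C     = randn(1024, 200);                 % placeholder coefficients (frequency x time)
  sigma = 0.5;                              % noise level
  kern  = ones(1, 9) / 9;                   % neighborhood weights along time, summing to 1
  E     = conv2(abs(C).^2, kern, 'same');   % neighborhood-smoothed energy
  C_pew = C .* max(1 - sigma^2 ./ E, 0);    % persistent empirical Wiener shrinkage (4.7)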


Finally we are able to compare the operators EW and PEW by considering the squared error risk (cf. (4.3)) of the estimators ξ̃*γ and ξ̃γ. For any estimator ξ̃ it holds that

R(ξ̃γ, ξγ) = E[(ξ̃γ − ξγ)²] = Var(ξ̃) + Bias(ξ̃)²

with

Var(ξ̃) = 2 ∑_{γ′∈Γ} w²γγ′ + 4σ⁻² ∑_{γ′∈Γ} w²γγ′ c*γ′²,
Bias(ξ̃)² = σ⁻⁴ ( ∑_{γ′∈Γ} wγγ′ c*γ′² − c*γ² )²

in the persistent case, and otherwise

R(ξ̃γ, ξγ) = 2 + 4σ⁻² c*γ².

Comparing R(ξ̃γ, ξγ) and R(ξ̃*γ, ξγ), the operator with the smaller risk is preferred for the given signal.


5 Applications and Experiments

We will now give some examples of the operations discussed, as well as some observations on performance differences. For this task we use the StrucAudioToolbox implemented by Kai Siedenburg. First of all, we need to know how we can evaluate the results of the different methods of structured sparsity. The best tool for this is the so-called signal-to-noise ratio (SNR)

SNR(f, f̃) = 20 log₁₀( ‖f‖ / ‖f − f̃‖ ),

where f denotes the clean signal and f̃ the noisy signal. The SNR is measured in decibels (dB). It compares the amount of signal to the amount of noise: an SNR of 100 dB means that the level of the signal is 100 dB above the level of the noise, which is clearly a better value than, for example, an SNR of 90 dB.

Furthermore, the relative error is used as a stopping condition for the iterative process. It is defined as err = ‖f − f̃‖/‖f‖. We could say the algorithm stops when the relative difference between the two signals is sufficiently small.
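A minimal MATLAB™ sketch of these two measures (the test signal is an arbitrary placeholder):

  snr_db = @(f, fn) 20 * log10(norm(f) / norm(f - fn));   % SNR in dB, f clean, fn noisy
  relerr = @(f, fn) norm(f - fn) / norm(f);               % relative error
  f  = sin(2 * pi * 440 * (0:1/8000:1)');    % placeholder "clean" signal
  fn = f + 0.01 * randn(size(f));            % noisy version with noise level 0.01
  fprintf('SNR = %.2f dB, rel. error = %.4f\n', snr_db(f, fn), relerr(f, fn));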

To use a particular operator type, e.g. Lasso, GL, PEW, etc., we first need to set the corresponding parameters. A description of the toolbox is included in the download. We give an example for obtaining the Lasso.

Example 5.0.1. A signal has to be imported, in this case the recording of a slowly played E major guitar chord, where neck and bridge pickups are used simultaneously. For research purposes it is common to add artificially created noise to (clean) signals. In MATLAB™ this can be realized by adding the term l·randn(length(signal), 1), where l is the noise level; for this task we chose 0.01. Note that we work with mono signals, i.e. with vectors. After getting the default values with [settings] = thresholding('settings');, the parameters corresponding to the Lasso have to be adjusted. Since we do not use an EW, we set the exponent α = 1, and no neighborhood is taken into account, so we set the neighborhood matrix N = 1, which is of course also the center. This already guarantees the usage of the Lasso. For comparison, the default values of the StrucAudioToolbox would be α = 2 and the row vector N = (1 1 1 1 1 1 1* 1 1), where 1* denotes the center, i.e. a PEW with an asymmetric neighborhood in the time label. The last change concerns the shrinkage level, which is set to λ = 0.001, and only one iteration is used. The rest keeps its default values, e.g. a Gabor transform with a tight window is used with M = 1024 frequency channels and shift parameter s = 256. We can plot the coefficients of the


clean, noisy and denoised signal as in Figure 5.1.

Figure 5.1: Clean, noisy and denoised signal coefficients.

With these presets the reconstructed signal sounds a little muffled, i.e. the high frequencies have been erased. Better results can be obtained by using other operators, neighborhoods and shrinkage levels, respectively other numbers of iterations.

Remark 5.0.2. One may have noticed that the default setting of the toolbox recommends a tight Hann window, which is defined by

gH(x) := (1 + cos(2πx))/2 · 1_{[−1/2, 1/2]}(x)

with the indicator function 1 of the translated unit interval. From [[Holighaus], Lemma 3.2.6] it can be inferred that the Gabor system G(gH, 1/4, 1) forms a tight frame with bound A = 4‖gH‖². For this application the Hann window is preferable since, in contrast to the Gaussian, it is zero outside the interval without having a sharp cutoff, and it provides "more continuity". It has not yet been researched what the consequences of choosing different windows are. More on this problem can be found in [Holighaus] resp. [Siedenburg1].


It is clear that many errors can occur for bad parameter selections. For example, it is necessary to run the iterative process on the coefficients obtained from a single transform. If one instead performs a forward and backward transform in every step, with a very small hop size that is not suited to the window length w, the shifted windows overlap strongly and generate more and more coefficients. The result for the Lasso is shown in Figure 5.2.

Figure 5.2: Performing a transform in every step with an unsuitable shift causes an increasing number of coefficients and therefore noise.

So if we avoid this mistake and transform once before the iteration and back afterwards, the concept of overlapping windows remains compatible with the procedure. For a fixed window length, e.g. w = 1024, and different shifts a, again performed with the Lasso, we see that the smaller the shift, the bigger the relative error, as shown in Figure 5.3. The iteration steps (Iter), the elapsed time and the highest relative error during the iteration are listed in Table 5.4. This observation becomes clear when recalling the redundancy of frames: the smaller the hop size, the more often the same areas, including their noise, are taken into account. In contrast, for large shifts the resolution becomes coarse, which implies that fewer coefficients are available for thresholding.


Figure 5.3: Different shift values for fixed window length w = 1024.

Shift      32            64           128          256          512
Iter       849           504          363          254          211
Time       3315.461599   993.200849   357.043096   138.336217   73.177419
ErrorMax   0.042640      0.035324     0.029582     0.024477     0.016181

Figure 5.4: Lasso with varying shift.

We already discussed in Chapter 4 that the choice of the threshold level λ is very important. Choosing λ too big or too small can lead to unpleasant reconstructions. In Example 5.0.1 we saw that λ was chosen a little too big, which resulted in a muffled reconstruction; an even bigger choice would increase this effect. Observing the iterative process, there also seems to be the small side effect that a bigger λ causes more noise in the first few steps, but of course it has the advantage of being faster. In Figure 5.5 the relative error is plotted against the number of iterations for four different shrinkage levels to illustrate the difference. The number of necessary iterations (Iter) is listed in Table 5.6.


Figure 5.5: Relative error in each iteration step for different shrinkage levels.

Shrink   0.001   0.005   0.01   0.05
Iter     437     327     255    138

Figure 5.6: Table of necessary iteration steps until the algorithm stops.

For λ = 0.005 resp. λ = 0.001 we can hear and see that the reconstructed signal is still very noisy, even though the relative error has become small. We take a look at the corresponding SNR curves in Figure 5.7.


Figure 5.7: SNR in each iteration step for different shrinkage levels.

The reconstruction seems to improve after some steps. This observation is left undiscussed here, since many more experiments on different signals would be needed. In fact, this was, using the example of the Lasso, just a small glimpse of how the threshold selection can be investigated.


We will give two more examples of what can be done with suitable settings.

Example 5.0.2 (Party). This example shows a very useful application of iterative thresholding. It is not only possible to remove stationary noise from a signal; it even filters 'real-life noise'. In this example a recording of a woman reciting a famous 'Giotto' commercial is used, which was recorded at a party with a lot of noise from other people in the background. We use the GL in the frequency label, which means that groups of coefficients at certain frequencies are taken into account. This seems to be a good idea, since the woman's voice has a different (higher) frequency range than the background and therefore stands out. In addition, the shrinkage level is set to λ = 0.003 and 5 iterations are used, while the neighborhood is set to the default mentioned in Example 5.0.1; in fact we thus have a PGL. Figure 5.8 shows the original signal and below it its denoised version, which sounds quite good, although one can still hear some artifacts which can easily be removed by playing with the settings.

Figure 5.8: Reconstructed voice with loud background.

When working with the GL it is important to know which label should be used. Setting the GL in the time label causes certain time groups to be removed: the moments when the woman pauses speaking are the 'weakest', and the reconstructed signal switches between voice with loud background and total silence, which is not the intended result. The coefficients of such a reconstruction are shown in Figure 5.9. We will see that


using the time label in this context provides the perfect tool for the transient reconstructions discussed in the following Example 5.0.3.

Figure 5.9: Reconstructed voice with loud background; group label in time.

Example 5.0.3 (Multilayer Decomposition). Another very useful application of multilayer decomposition is the separation into tonal, transient and noise parts. Multilayer decomposition, as discussed in Chapter 3, allows us to combine different operators, which makes this task possible. We make use of the uncertainty principle, i.e. the window for the transformation can be chosen in order to yield more precise time resp. frequency information. Therefore it makes sense to use a wide window for the tonal parts, while a short window is used to obtain the transient parts. In this example a tight Hann window with length w = 4096 and hop size a = 1024 are the parameters for the tonal analysis, and for the transient analysis w = 128 and a = 32. In [Siedenburg1] some experiments were carried out on which operator performs best in each case. It turned out that for tonal purposes the WGL with neighborhood extension in time is preferable, and for transients a simple GL with time as group index. Here we use a PGL which additionally respects one coefficient to the left of its center, to obtain some additional persistence in frequency. Summarizing, in the notation of (3.26) we can write for the corresponding Gabor frames Φ1 and Φ2 the


Lagrangian

(1/2)‖y − Φ1c[1] − Φ2c[2]‖₂² + ‖c[1]‖w[1],1 + ‖c[2]‖w[2],2,1.   (5.1)

(5.1) is then minimized by c⋆ = (c⋆[1], c⋆[2]), where Φ1c⋆[1] is the tonal and Φ2c⋆[2] the transient signal layer. The algorithm given in the Appendix is realized by setting the tonal parameters and performing the thresholding with a suitable shrinkage level on the signal. Next, the separate transient parameters are set, and thresholding is applied to the signal minus the tonal reconstruction; this yields the transient reconstruction. Adding the two signal reconstructions gives back the original signal without noise. The different coefficient representations are shown in Figure 5.10, and a schematic sketch of the procedure follows after the figure.

Figure 5.10: Multi-layered decomposition from noisy ‘musical clock’.
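The following MATLAB™ sketch only illustrates the structure of the procedure just described: the two Gabor systems are abstracted as synthesis matrices Phi1 and Phi2, and a plain Lasso shrinkage stands in for the WGL/PGL operators actually used in this example, so this is not the appendix code:

  function [tonal, transient] = twolayer_sketch(Phi1, Phi2, y, lam1, lam2, nIter)
      soft = @(z, lam) z .* max(1 - lam ./ abs(z), 0);
      % tonal layer: iterative thresholding on the full signal
      c1 = zeros(size(Phi1, 2), 1);
      for n = 1:nIter
          b  = c1 + Phi1' * (y - Phi1 * c1);
          c1 = soft(b, lam1);
      end
      tonal = Phi1 * c1;
      % transient layer: thresholding applied to the residual y - tonal
      c2 = zeros(size(Phi2, 2), 1);
      for n = 1:nIter
          b  = c2 + Phi2' * ((y - tonal) - Phi2 * c2);
          c2 = soft(b, lam2);
      end
      transient = Phi2 * c2;
  end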


6 Conclusion

This thesis was designed to give the reader a quick understanding of structured sparsity, starting with a discussion of the most important tools of signal processing. In the first chapter the necessary mathematical tools for understanding what actually defines a signal, and what the main results for processing signals are, were given, i.e. functional analytic basics, time and frequency shifts, the Fourier transform, convolution, as well as the short-time Fourier transform.

The second chapter introduced the idea behind the special field of Gabor analysis and the fact that bases are generalized by frames, where the latter provide redundancy. Therefore Gabor frames serve as a good playground for sparsity.

The introduction to the main topic of structured sparsity was given in Chapter 3. The minimization of the so-called Lagrangian had high priority. This linear inverse optimization problem contained two aspects. The first was the minimization of the discrepancy in order to obtain a good synthesis. The second was the reduction of the number of non-zero coefficients by an additional penalty term, with the threshold function determining how this reduction is taken into account. For the latter the ℓ1 norm was used, which gave us the Lasso (least absolute shrinkage and selection operator). Replacing this norm by weighted mixed norms provided the further operators Elitist-Lasso and Group-Lasso. By introducing the ISTA (iterative soft-thresholding algorithm) and its improved version FISTA, thresholding was adapted to frames. The properties of the FISTA have barely been researched yet, so this would be a good starting point for further research. In the fourth chapter, persistent operators were discussed; they provide a way of taking coefficients in the neighborhood into account. Furthermore, with the empirical Wiener estimate an alternative to the previous operators was offered.

The last chapter made clear how large the area of numerical research on structured sparsity still is. Three applications have been mentioned: denoising, declipping and signal decomposition. Many other applications of this theory can be found.

Furthermore, considering the Gabor transform, different windows can be chosen with different lengths, shifts and numbers of frequency channels. It is still not known in detail how these choices affect sparse recovery. For the selection of the shrinkage level some proposals were mentioned in Chapter 4. They seem to work fine, but there is still no rule for which kind of signal resp. application the different choices give good results. In fact, the kind of signal is an important point.

Another point: it is known that a Gabor transform with varying windows, the so-called non-stationary Gabor transform (NSGT), gives very good results for signal representations. It may be useful to extend the concepts of structured sparsity to non-stationary Gabor frames. The problem is that no regular time-frequency lattice is generated, so it will be


necessary to develop novel strategies.

It is obvious that there is plenty of room for research in this whole field, and since structured sparsity in the sense of harmonic analysis is not very old, researchers can approach this work optimistically.


Bibliography

[Bannert] Severin Bannert: Banach-Gelfand Triples and Applications in Time-Frequency Analysis, Master's thesis, University of Vienna, 2010.

[Candles] E.J. Candes: Compressive sampling, in Proc. Int. Congr. Math., vol. 17, no. 4,

Spain, 2006.

[CorFeiLu] Elena Cordero, Hans G. Feichtinger, Franz Luef: Banach Gelfand triples for Gabor analysis, in "Pseudo-differential Operators", Lecture Notes in Mathematics, Vol. 1949, pp. 1-33, Springer, Berlin, 2008.

[Daubechies] Ingrid Daubechies, Michel Defrise, Christine de Mol: An iterative

thresholding algorithm for linear inverse problems with a sparsity constraint,

Communication on Pure and Applied Mathematics, 2004.

[Donoho] David Donoho: For most large underdetermined systems of linear equations

the minimal `1-norm solution is also the sparsest solution, Communication on Pure

and Applied Mathematics, 59(6):797-829, 2006.

[Dopfner] Kirian Dopfner: Quality of Gabor Multipliers for Operator Approximation,

Diplomarbeit, Universitat Wien, 2013.

[Dorfler1] Monika Dorfler: Time-frequency analysis for music signals A mathematical

approach, Journal of New Music Research, Vol.30 No.1, p.3-12, 2001.

[Dorfler2] Monika Dorfler: What Time-Frequency Analysis Can Do to Music Signals,

”Matematica e Cultura 2003”, Springer Italia, 2003.

[Dorfler3] Monika Dorfler: Gabor Analysis for a Class of Signals called Music,

Dissertation, Universitat Wien, 2002.

[DorfMatu] Monika Dorfler, Ewa Matusiak: Nonstationary Gabor Frames - Existence and

Construction, preprint, submitted, http://arxiv.org/abs/1112.5262, 2012.

[FeichGroch] Hans Georg Feichtinger, Karlheinz Grochenig: Gabor frames and

time-frequency analysis of distributions, J. Funct. Anal. 146(2), 464-495, 1997.

[Feichtinger1] Hans Georg Feichtinger: Banach Gelfand triples for applications in

physics and engineering, [inproceedings] Amer. Inst. Phys., AIP Conf. Proc., AIP

Conf. Proc., Vol.1146 No.1, p.189-228, 2009.

[Feichtinger2] Hans Georg Feichtinger: A Functional Analytic Approach to Applied

Analysis, Script NuHAG, Autumn 2012.


[FeichLuef] Hans G. Feichtinger, Franz Luef: Gabor analysis and time-frequency methods, in Encyclopedia of Applied and Computational Mathematics, 2012.

[GareyJohnson] Michael R. Garey, David S. Johnson: Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, New York, 1985.

[Grochenig] Karlheinz Gröchenig: Foundations of Time-Frequency Analysis, Applied and Numerical Harmonic Analysis, Birkhäuser, Boston, 2001.

[Haltmeier] Markus Haltmeier: Bild- und Signalverarbeitung, lecture notes, Universität Wien, 2011.

[HeilWalnut] C. Heil, D. Walnut: Continuous and discrete wavelet transforms, SIAM Review, 31(4), 628-666, 1989.

[Heuser] Harro Heuser: Funktionalanalysis: Theorie und Anwendung, Vieweg+Teubner Verlag, 2006.

[Holighaus] Nicki Holighaus: Zeit-Frequenz-Analyse mit Methoden der Gabor-Analysis, Master's thesis, Universität Giessen, 2010.

[Kaiblinger] Norbert Kaiblinger: Approximation of the Fourier Transform and the Dual Gabor Window, Journal of Fourier Analysis and Applications, Vol. 11, No. 1, pp. 25-42, 2005.

[Kowalski] Matthieu Kowalski: Sparse regression using mixed norms, Applied and Computational Harmonic Analysis, Vol. 27, No. 3, pp. 303-324, 2009.

[KowBruno] Matthieu Kowalski, Bruno Torrésani: Sparsity and persistence: mixed norms provide simple signal models with dependent coefficients, Signal, Image and Video Processing, Vol. 3, No. 3, pp. 251-264, 2008.

[KowSidDorf] Matthieu Kowalski, Kai Siedenburg, Monika Dörfler: Social Sparsity! Neighborhood Systems Enrich Structured Shrinkage Operators, IEEE Trans. Signal Process., 2013.

[Krieger] J. Krieger: Stoffzusammenfassung zur Bild- und Signalverarbeitung, Universität Heidelberg, 2006.

[Mallat] Stéphane Mallat: A Wavelet Tour of Signal Processing: The Sparse Way, 2008.

[Missbauer] Andreas Missbauer: Gabor Frames and the Fractional Fourier Transform, Master's thesis, University of Vienna, 2012.

[Opial] Zdzislaw Opial: Weak convergence of the sequence of successive approximations for nonexpansive mappings, Bulletin of the American Mathematical Society, 73:591-597, 1967.

[Sardy] Sylvain Sardy, Andrew G. Bruce, Paul Tseng: Block coordinate relaxation methods for nonparametric wavelet denoising, Journal of Computational and Graphical Statistics, 2000.


[Siedenburg1] Kai Siedenburg: Structured Sparsity in Time-Frequency Analysis, Diplomarbeit, Humboldt-Universität zu Berlin, 2011.

[Siedenburg2] Kai Siedenburg: Persistent Empirical Wiener Estimation with Adaptive Threshold Selection for Audio Denoising, Proceedings of the 9th Sound and Music Computing Conference, Copenhagen, July 11-14, 2012.

[SiedenburgDorfler] Kai Siedenburg, Monika Dörfler: Structured Sparsity for Audio Signals, Proceedings of the 14th International Conference on Digital Audio Effects, Paris, 2011.

[Tibshirani] Robert Tibshirani: Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 1996.

[VelHoliDorfGrill] Gino Angelo Velasco, Nicki Holighaus, Monika Dörfler, Thomas Grill: Constructing an invertible constant-Q transform with non-stationary Gabor frames, Proceedings of DAFx-11, 2011.

[WeinWakin] Alejandro J. Weinstein, Michael B. Wakin: Recovering a Clipped Signal in Sparseland, to appear in Sampling Theory in Signal and Image Processing, 2011.

[YuanLin] Ming Yuan, Yi Lin: Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society, Series B (Statistical Methodology), 68:49-67, 2006.


Appendix

MATLAB Files

This code shows how to create a melody in MATLAB™, using the 'Star Wars' theme as an example. The spectrogram is produced with a routine from the toolbox LTFAT (http://ltfat.sourceforge.net/).

%% STAR WARS

function starwars

H=linspace(1,8000,8000); %half note
V=linspace(1,4000,4000); %quarter note
T=linspace(1,1333,1333); %triplet

w1=2*sin(2*pi*233.081*H/8000); %notes for the melody
w2=2*sin(2*pi*349.228*H/8000);
w3=2*sin(2*pi*311.1*T/8000);
w4=2*sin(2*pi*293.7*T/8000);
w5=2*sin(2*pi*261.6*T/8000);
w6=2*sin(2*pi*466.2*H/8000);
w7=2*sin(2*pi*349.228*V/8000);
w8=2*sin(2*pi*311.1*T/8000);
w9=2*sin(2*pi*293.7*T/8000);
w10=2*sin(2*pi*261.6*T/8000);
w11=2*sin(2*pi*466.2*H/8000);
w12=2*sin(2*pi*349.228*V/8000);
w13=2*sin(2*pi*311.1*T/8000);
w14=2*sin(2*pi*293.7*T/8000);
w15=2*sin(2*pi*311.1*T/8000);
w16=2*sin(2*pi*261.6*H/8000);

song=[w1,w2,w3,w4,w5,w6,w7,w8,w9,w10,w11,w12,w13,w14,w15,w16]; %composed melody
%wavwrite(song,'starwars'); %creates a wave file if wished
sound(song) %plays melody

figure
fs=8000;
sgram(song,fs,90,'wlen',round(20/200*fs)); %plots spectrogram with LTFAT
axis([0 7.495 0 1000]);
shg

%% further observations
figure
test=[w1,w2]; %discontinuity problem,
plot(test) %can be solved by shifting or smoothing
axis([7900 8100 -2.5 2.5]); xlabel('Time'); ylabel('Amplitude')

figure
subplot(211) %comparison of frequencies of 1st and 6th note
plot(w1)
axis([0 1000 -2.5 2.5]); xlabel('Time'); ylabel('Amplitude')
subplot(212)
plot(w6)
axis([0 1000 -2.5 2.5]); xlabel('Time'); ylabel('Amplitude')
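A possible remedy for the discontinuity mentioned in the comments, given here as a minimal sketch and not as part of the original routine: apply a short raised-cosine fade at the note boundary before concatenating. The fade length of 200 samples is an assumed value.

fade = 200;                                   % fade length in samples (assumed value)
win  = 0.5*(1 - cos(pi*(0:fade-1)/fade));     % raised-cosine ramp from 0 to 1
w1(end-fade+1:end) = w1(end-fade+1:end) .* fliplr(win);  % fade the first note out
w2(1:fade)         = w2(1:fade)         .* win;          % fade the second note in
test = [w1, w2];                              % the boundary is now much smoother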


All of the following code is based on the routines of the toolboxes LTFAT and StrucAudioToolbox (http://homepage.univie.ac.at/monika.doerfler/StrucAudio.html). Most of the MATLAB™ files perform the respective operation(s) with fixed presettings.

%% LASSO

% get the noisy signal:
[sig, fs] = wavread('CleanNB.wav');
sig_noisy = sig + 0.01*randn(length(sig),1);

% settings for Lasso
[settings]=thresholding('settings');
settings.shrink.expo = 1;
settings.shrink.neigh = 1;
settings.shrink.center = [1 1];

% changeable coefficients
settings.shrink.lambda = 0.01;
settings.trans.M = 1024;
settings.trans.shift = 256;

settings.iter.maxit = 1; % set number of iterations
settings.iter.disp = 1; % display relative error

% denoising
G = trafo(sig, settings.trans); % clean analysis coefficients
Gn = trafo(sig_noisy, settings.trans); % noisy analysis coefficients

[sig_rec, Gs] = thresholding(sig_noisy, settings);

subplot(131);
imagesc(20*log10(abs(G))); axis off; title('Clean');
subplot(132);
imagesc(20*log10(abs(Gn))); axis off; title('Noisy');
subplot(133);
imagesc(20*log10(abs(Gs))); axis off; title('Denoised');
shg
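A quick, optional way to quantify the denoising result numerically, added here for illustration; it assumes that sig_rec has the same length as sig, otherwise truncate both to a common length first:

snr_noisy = 20*log10(norm(sig) / norm(sig_noisy - sig));   % SNR before denoising
snr_rec   = 20*log10(norm(sig) / norm(sig_rec   - sig));   % SNR after denoising
fprintf('SNR: %.2f dB (noisy) -> %.2f dB (denoised)\n', snr_noisy, snr_rec);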


%% PARTY NOISE

% get the noisy signal:
[sig, fs] = wavread('giotto.wav');

[settings]=thresholding('settings');
settings.shrink.expo = 1;
settings.shrink.lambda = 0.003;
settings.shrink.type='gl';
settings.shrink.glabel='frequency'; %'time' causes bad results

% disp
settings.iter.maxit = 5; % set the number of iterations here
settings.iter.disp = 1; % display the relative error
%settings.shrink

% thresholding algorithm
G = trafo(sig, settings.trans); % original analysis coefficients

[sig_rec, Gs] = thresholding(sig, settings);

subplot(211);
imagesc(20*log10(abs(G))); axis off; title('Original');
subplot(212);
imagesc(20*log10(abs(Gs))); axis off; title('Denoised');
shg


%% MULTILAYER - DECOMPOSITION

% get the noisy signal:
[sig, fs] = wavread('spieluhr.wav');
sig = sig + 0.01*randn(length(sig),1);

% settings for tonal layer:
[set_ton] = thresholding('settings', 'transtype', 'gab', 'M', 4096);
set_ton.shrink.neigh = ones(1,12); % persistence in time
set_ton.shrink.center = [1, 10]; % center point non-symmetric
set_ton.shrink.lambda = 0.01; % adjust tonal threshold

% settings for transient layer:
[set_tra] = thresholding('settings', 'transtype', 'gab', 'M', 128);
set_tra.shrink.type = 'gl'; % group EW
set_tra.shrink.glabel = 'time'; % group labels in time
set_tra.shrink.neigh = ones(1,2); % some additional persistence in frequency
set_tra.shrink.center = [1,2]; % non-symmetry
set_tra.shrink.lambda = 0.0044; %0.0048; % adjust transient threshold

% perform multilayer decomposition:
[sig_ton, Gs_ton] = thresholding(sig, set_ton); % estimate tonal layer
[sig_tra, Gs_tra] = thresholding(sig - sig_ton, set_tra); % estimate transient layer from residual
sig_mult = sig_ton + sig_tra;

% plots:
[set_ana] = thresholding('settings'); % get 'neutral' analysis settings
G = trafo(sig, set_ana.trans); % get Gabor analysis coeffs with trafo.m
G_mult = trafo(sig_mult, set_ana.trans); % get Gabor analysis coeffs from both layers
subplot(2,2,1); imagesc(20*log10(abs(G))); axis off; title('Original');
subplot(2,2,2); imagesc(20*log10(abs(Gs_ton))); axis off; title('Tonal');
subplot(2,2,3); imagesc(20*log10(abs(Gs_tra))); axis off; title('Transient');
subplot(2,2,4); imagesc(20*log10(abs(G_mult))); axis off; title('Multilayer');
shg
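To audition the individual layers (an optional addition, not part of the original script), the estimated signals can simply be played back one after the other:

sound(sig_ton, fs);                % tonal layer
pause(length(sig_ton)/fs + 0.5);   % wait until playback has finished
sound(sig_tra, fs);                % transient layer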


Curriculum Vitae

Personal information

Name: Dominik Torsten Fuchs

Nationality: Austria

Education

1991-1995 Primary School Jagdgasse, Vienna

1997-2006 High School, Gymnasium Laaerberg, Vienna

Graduation June 2006

2012-2013 Programmer at Phonicscore GmbH

2007-2013 Study of mathematics at the University of Vienna
