Independent Component Analysis via Distance Covariance

David S. Matteson
Department of Statistical Science, Cornell University
[email protected]/~matteson

Joint work with: Ruey S. Tsay, Booth School of Business, University of Chicago
Sponsorship: National Science Foundation
2014 June 3
David S. Matteson ([email protected]) dCovICA 2014 June 3 1 / 38
Introduction
- Notation
- Measuring Mutual Multivariate Independence
- A Pairwise Measure: Distance Covariance
- Extending the Pairwise Measure
- An Empirical Measure: U-statistics
- Independent Component Analysis
- Estimation of Independent Components: dCovICA
- Strong Consistency
- Testing for the Existence of Independent Components
- Examples
Notation
- Let ⟨t, s⟩ denote the scalar product of vectors t, s ∈ R^d
- For complex-valued functions φ(·): the complex conjugate of φ is denoted φ̄, and the absolute square |φ|² is defined as φφ̄
- The Euclidean norm of x ∈ R^d is denoted simply |x|
- Primed variables X′ and X″ are each an independent copy of X; that is, X, X′, and X″ are independent and identically distributed (iid)
Characteristic Functions and Mutual Independence
- A random vector X ∈ R^d with distribution F_x has a unique characteristic function φ_x(t) = E(e^{i⟨t,X⟩}) for t ∈ R^d
- The characteristic function always exists: e^{i⟨t,X⟩} is bounded, and |φ_x(t)| ≤ 1 for all t
- Let X_1, ..., X_c denote a given partition of the elements of X into c components, for some 2 ≤ c ≤ d, and let t = (t_1, ..., t_c)
- Then X_1, ..., X_c are mutually independent if and only if, for all t:
\[
\phi_x(t) = E\Big(e^{i\sum_{j=1}^{c}\langle t_j, X_j\rangle}\Big) = E\Big(\prod_{j=1}^{c} e^{i\langle t_j, X_j\rangle}\Big) = \prod_{j=1}^{c} E\big(e^{i\langle t_j, X_j\rangle}\big) = \phi_{x_1}(t_1)\cdots\phi_{x_c}(t_c)
\]
A Pairwise Measure of Multivariate Independence
For random vectors X ∈ R^{d_x} and Y ∈ R^{d_y}:
- φ_{xy} denotes the joint characteristic function of (X, Y)
- φ_x and φ_y denote the marginal characteristic functions of X and Y

A pairwise measure of multivariate independence may be defined as
\[
\mathcal{V}^2(X, Y; w) = \int_{\mathbb{R}^{d_x + d_y}} |\phi_{xy}(t, s) - \phi_x(t)\phi_y(s)|^2\, w(t, s)\, dt\, ds,
\]
in which w(t, s) denotes an arbitrary positive weight function. We consider
\[
w^*(t, s) = \left( \frac{\pi^{(1+d_x)/2}}{\Gamma((1+d_x)/2)} \, \frac{\pi^{(1+d_y)/2}}{\Gamma((1+d_y)/2)} \, |t|^{1+d_x} |s|^{1+d_y} \right)^{-1},
\]
in which Γ(·) is the complete gamma function.
Distance Covariance
Suppose (X, Y), (X′, Y′), (X″, Y″) iid ∼ F_{x,y}. Let
\[
\mathcal{I}(X, Y) = E|X - X'||Y - Y'| - E|X - X'||Y - Y''| - E|X - X''||Y - Y'| + E|X - X'|\,E|Y - Y'|
\]

Theorem
For any pair of random vectors X ∈ R^{d_x} and Y ∈ R^{d_y}:
- If E(|X| + |Y|) < ∞
- Then I(X, Y) = V²(X, Y; w*)
- And I(X, Y) ∈ [0, ∞), with I(X, Y) = 0 if and only if X ⊥⊥ Y

See Szekely and Rizzo (2007) and Matteson (2014).
An Empirical Pairwise Multivariate Measure
Let (X, Y) = {(X_i, Y_i) : i = 1, ..., n} be an iid sample from the joint distribution of random vectors X ∈ R^{d_x} and Y ∈ R^{d_y}, with E(|X| + |Y|) < ∞.

Define I_n(X, Y) = T_{1,n} + T_{2x,n} T_{2y,n} − T_{3,n}, in which
\[
T_{1,n} = \binom{n}{2}^{-1} \sum_{i<j} |X_i - X_j||Y_i - Y_j|, \quad
T_{2x,n} = \binom{n}{2}^{-1} \sum_{i<j} |X_i - X_j|, \quad
T_{2y,n} = \binom{n}{2}^{-1} \sum_{i<j} |Y_i - Y_j|,
\]
\[
T_{3,n} = \binom{n}{3}^{-1} \sum_{i<j<k} \big( |X_i - X_j||Y_i - Y_k| + |X_i - X_k||Y_i - Y_j| \big).
\]
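The U-statistics above can be computed directly from pairwise distance matrices. Below is a brute-force sketch (the function name `empirical_dcov` is mine); the O(n³) triple sum mirrors the displayed formula for T_{3,n} and is meant for illustration, not for large n.

```python
import numpy as np
from itertools import combinations

def empirical_dcov(X, Y):
    """Empirical measure I_n(X, Y) = T1 + T2x * T2y - T3.
    X, Y: arrays of shape (n,) or (n, d).  Brute-force O(n^3)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    n = len(X)
    # Pairwise Euclidean distance matrices |X_i - X_j| and |Y_i - Y_j|
    A = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    B = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
    pairs = list(combinations(range(n), 2))
    T1 = np.mean([A[i, j] * B[i, j] for i, j in pairs])
    T2x = np.mean([A[i, j] for i, j in pairs])
    T2y = np.mean([B[i, j] for i, j in pairs])
    # Triple sum over i < j < k, exactly as displayed on the slide
    T3 = np.mean([A[i, j] * B[i, k] + A[i, k] * B[i, j]
                  for i, j, k in combinations(range(n), 3)])
    return T1 + T2x * T2y - T3
```

For independent samples the statistic hovers near zero; for strongly dependent samples it is clearly positive, which is the basis of the tests that follow.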
An Alternative Measure
- I_n(X, Y) depends on the marginal distributions
- Apply the probability integral transformation (PIT): with marginal CDFs F_{x_i}, F_{y_j} : R → [0, 1], define U_i = F_{x_i}(X_i) and V_j = F_{y_j}(Y_j)
- I(U, V) = 0 if and only if X and Y are independent
- The F are unknown
- Use estimates F̂, e.g. marginal ranks: let Û_i = F̂_{x_i}(X_i) and V̂_j = F̂_{y_j}(Y_j)

Lemma
I_n(Û, V̂) → I(U, V) almost surely, and under H_0 : X ⊥⊥ Y, n I_n(Û, V̂) converges in distribution to a nondegenerate random variable.
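The rank-based PIT step is a one-liner; a minimal sketch (function name `pit_ranks` is mine), ignoring ties since the distributions here are assumed continuous:

```python
import numpy as np

def pit_ranks(x):
    """Rank-based estimate of the probability integral transform:
    U_hat_i = F_hat(x_i), with F_hat the marginal empirical CDF
    (ranks scaled into (0, 1]).  Ties are ignored: the slides assume
    continuous distributions, so ties occur with probability zero."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1  # 1, ..., n
    return ranks / len(x)
```

Applying `pit_ranks` columnwise to X and Y yields the (Û, V̂) fed into I_n above.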
A Simultaneous Test for Multivariate Serial Dependence
- Assuming y_t ∈ R^d is strictly stationary and E|y_t| < ∞, I(y_t, y_{t−k}) measures lag-k multivariate serial dependence
- Let Y_{t−k} = {y_{t−1}, ..., y_{t−k}}; then I(y_t, Y_{t−k}) jointly measures multivariate serial dependence up to lag k
- Joint hypothesis for multivariate serial dependence:
\[
H_0:\; \phi_{y_t, Y_{t-k}} = \phi_{y_t}\phi_{Y_{t-k}}
\;\overset{\text{stationarity}}{\Longleftrightarrow}\;
\phi_{y_t, y_{t-1}, \ldots, y_{t-k}} = \phi_{y_t}\phi_{y_{t-1}}\cdots\phi_{y_{t-k}}
\;\overset{\text{PIT}}{\Longleftrightarrow}\;
\phi_{u_t, u_{t-1}, \ldots, u_{t-k}} = \phi_{u_t}\phi_{u_{t-1}}\cdots\phi_{u_{t-k}}
\]
- We define our test statistic as Q_d(Y, k) = (n − k) I_n(û_t, Û_{t−k})
- Approximate p-values via permutation
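The permutation approximation works because permuting the observations destroys serial dependence while preserving the marginals. A generic sketch follows; the statistic used here (`lag1_stat`, squared lag-1 sample autocorrelation, both names mine) is only a simple stand-in for the slides' dCov-based Q_d — the resampling scheme is what carries over.

```python
import numpy as np

def lag1_stat(y):
    """Placeholder serial-dependence statistic: squared lag-1 sample
    autocorrelation.  A stand-in for Q_d, not the dCov statistic."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    return (np.dot(y[1:], y[:-1]) / np.dot(y, y)) ** 2

def permutation_pvalue(y, stat=lag1_stat, n_perm=499, seed=0):
    """Approximate p-value: compare the observed statistic with its
    null distribution under random permutations of the observations."""
    rng = np.random.default_rng(seed)
    observed = stat(y)
    null = [stat(rng.permutation(y)) for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in null)) / (n_perm + 1)
```

For a strongly autocorrelated series the observed statistic exceeds essentially every permuted value, driving the p-value toward its minimum 1/(n_perm + 1).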
Asymptotic Distribution
Let ζ_k(a, b) denote a mean-zero complex Gaussian process with covariance function
\[
R_k(c, c_0) = \big(\phi_u(a - a_0) - \phi_u(a)\phi_u(a_0)\big)\big(\phi_{u_k}(b - b_0) - \phi_{u_k}(b)\phi_{u_k}(b_0)\big)
\]
for c = (a, b), c_0 = (a_0, b_0) ∈ R × R^k
- φ_u is the characteristic function of u_t, and φ_{u_k} = φ_u ⋯ φ_u (k-fold product)
- Under H_0, Q_d(Y, k) converges in distribution to ||ζ_k(a, b)||²_ω
- For univariate time series (d = 1) this is an asymptotically distribution-free test
Seasonally Adjusted Monthly Unemployment Rates (%)
CA, FL, IL, MI, OH, and WI, from January 1976 through August 2010
[Figure: monthly unemployment rate (%) by year, 1975–2010, ranging roughly 4–16%, one series per state.]
Standardized Change in Monthly Unemployment Rate (%)
First difference series, scaled by monthly standard deviations
[Figure: standardized change in monthly unemployment rate by year, 1975–2010, ranging roughly −4 to 6, one series per state.]
Testing for Serial Dependence
- Transform to stationary y_t: first difference series, scaled by monthly standard deviations
- Q_6(Y, k = 12) = 30.92 with p-value ≈ 0
- Fit a vector autoregression of order three, via OLS: y_t = β_0 + β_1 y_{t−1} + β_2 y_{t−2} + β_3 y_{t−3} + e_t
- Calculate residuals ê_t
- Q_6(Ê, k = 12) = 0.11 with p-value ≈ 0.1
- ⇒ A linear model is sufficient
Residual Series, Monthly Unemployment Rate (%)
No significant serial dependence
[Figure: VAR(3) residual series by year, 1975–2010, ranging roughly −4 to 4, one series per state.]
Residual Distribution: Univariate and Bivariate
[Figure: scatterplot matrix of the VAR(3) residual series — diagonal panels show univariate rug plots (panels labeled MI, OH, CA, and WI), off-diagonal panels the corresponding bivariate scatterplots.]
Extending the Pairwise Independence Measure
For a random vector X ∈ R^d:
- Let X_1, ..., X_c denote a partition of X into 2 ≤ c ≤ d components
- Let X_{k+} = (X_{k+1}, ..., X_c), t = (t_1, ..., t_c), and t_{k+} = (t_{k+1}, ..., t_c)
- Define
\[
\mathcal{I}(X) = \sum_{k=1}^{c-1} \mathcal{I}(X_k, X_{k+})
\]

Theorem
For any random vector X ∈ R^d with disjoint components X_1, ..., X_c, if E|X| < ∞, then I(X) ∈ [0, ∞), with I(X) = 0 if and only if X_1, ..., X_c are mutually independent.

See Matteson (2014), and note that for every t ∈ R^d
\[
|\phi_x(t) - \phi_{x_1}(t_1)\cdots\phi_{x_c}(t_c)| \le \sum_{k=1}^{c-1} |\phi_{x_k, x_{k+}}(t_k, t_{k+}) - \phi_{x_k}(t_k)\,\phi_{x_{k+}}(t_{k+})|
\]
Independent Component Analysis
For iid vector observations y_t, assume independent components (ICs) s_t exist such that y_t = M s_t
- M denotes the mixing matrix
- The validity of this assumption is tested

For simplicity:
- O is an uncorrelating matrix
- z_t = O y_t are uncorrelated observations

Then s_t = M^{−1} y_t = M^{−1} O^{−1} z_t ≡ W z_t, in which W = M^{−1} O^{−1} is referred to as the separating matrix.
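The slides leave the choice of uncorrelating matrix O open; a common choice is the symmetric inverse square root of the sample covariance, sketched below (function name `whiten` is mine).

```python
import numpy as np

def whiten(Y):
    """Uncorrelate observations: z_t = O y_t with O = Cov(y)^(-1/2),
    via an eigendecomposition of the sample covariance.
    Y: array of shape (n, d), assumed centered.  One choice of O;
    the slides do not fix a particular uncorrelating matrix."""
    cov = np.cov(Y, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # cov = V diag(vals) V^T
    O = vecs @ np.diag(vals ** -0.5) @ vecs.T  # symmetric inverse square root
    return Y @ O.T, O
```

By construction the sample covariance of the whitened data Z is the identity, so any orthogonal W applied to z_t preserves unit variances — the reduction exploited on the next slide.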
Assumptions
- y_t = (y_{1t}, ..., y_{dt})^T is a d-dimensional random vector
- y_t has a continuous distribution function, with y_t iid ∼ F_y
- E|y_t|² < ∞ and E(y_t) = 0
- s_t = (s_{1t}, ..., s_{dt})^T is a random vector of ICs, with E(s_{it}) = 0 and Var(s_{it}) = 1 for all i
- The separating matrix W is orthogonal: I = Cov(s_t) = W Cov(z_t) W^T = W W^T
- W is parameterized by a p = d(d − 1)/2 vector θ of rotation angles, W_θ
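The rotation-angle parameterization can be realized as a product of d(d − 1)/2 Givens (plane) rotations, one angle per coordinate pair. A sketch (function names mine; the ordering of the product is a convention the slides do not fix):

```python
import numpy as np
from itertools import combinations

def givens(d, i, j, theta):
    """d x d Givens rotation by angle theta in the (i, j) plane."""
    G = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = -s
    G[j, i] = s
    return G

def W_theta(theta, d):
    """Orthogonal matrix W_theta built from p = d(d-1)/2 plane
    rotations, one angle per unordered coordinate pair (i, j)."""
    W = np.eye(d)
    for (i, j), ang in zip(combinations(range(d), 2), theta):
        W = givens(d, i, j, ang) @ W
    return W
```

Any product of plane rotations is orthogonal, so the constraint W W^T = I is enforced automatically and optimization can run unconstrained over θ ∈ R^p.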
dCovICA Estimator
- Define s_t(θ) = W_θ z_t, and S(θ) = [s_t(θ)] (n × d)
- Let k+ = {k + 1, ..., d}
- Define the dCovICA objective function as
\[
\mathcal{I}_n(\theta; \mathbf{Z}) = \mathcal{I}_n(S(\theta)) = \sum_{k=1}^{d-1} \mathcal{I}_n(S_k(\theta), S_{k+}(\theta))
\]
- The dCovICA estimator is θ̂_n = argmin_θ I_n(θ; Z)
- Define Ŵ_n = W_{θ̂_n} and ŝ_t = s_t(θ̂_n) = Ŵ_n z_t
An Alternative Estimator
- The empirical measure depends on the marginal distributions
- For continuous random variables, apply the probability integral transformation (PIT): with marginals F_{s_k} : R → [0, 1], define u_{k,t} = F_{s_k}(s_{k,t}) for each s_k
- The F_{s_k} are unknown
  - The empirical counterpart uses marginal ranks
  - However, the objective function is then not continuous
  - Estimators must depend on the locations {s_{k,t}}_{t=1}^n, not just their relative order
- Apply kernel smoothing to approximate F_{s_k} with a continuous function. Let
\[
\hat F_{s_k, n, h_n}(s) = \frac{1}{n}\sum_{t=1}^{n} G\!\left(\frac{s - s_{k,t}}{h_n}\right)
\]
with G the integral of a density kernel and h_n a random, data-dependent bandwidth.
PITdCovICA Estimator
Let s_t(θ) = W_θ z_t, û_{k,t}(θ) = F̂_{s_k(θ), n, h_n}(s_{k,t}(θ)), and Û(θ) = [û_t(θ)]
- Objective function:
\[
\mathcal{J}_n(\theta; \mathbf{Z}) = \sum_{k=1}^{d-1} \mathcal{I}_n(\hat U_k(\theta), \hat U_{k+}(\theta))
\]
- Estimator: θ̃_n = argmin_θ J_n(θ)

Let u_k(θ) = F_{s_k(θ)}(s_k(θ)) and
\[
\mathcal{J}(\theta; \mathbf{Z}) = \sum_{k=1}^{d-1} \mathcal{I}(u_k(\theta), u_{k+}(\theta))    (1)
\]

Assumption
The random bandwidth h_n is a measurable function of {y_{k,t}}_{t=1}^n such that h_n → 0 almost surely; further, the kernel function G is Lipschitz continuous.

Theorem
If there exists a minimizer θ_0 of Equation (1), then θ̃_n → θ_0 almost surely.
PITdCovICA Estimator
Let x_1, ..., x_n be a sample from an (unknown) distribution F ∈ F, in which F denotes the class of all continuous distribution functions. Let
\[
F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{(-\infty, x]}(x_i)
\]
denote the empirical cumulative distribution function (ECDF). Let x_{1:n} ≤ ... ≤ x_{n:n} denote the order statistics from the sample x_1, ..., x_n, and for some b < n let
\[
h_{n,b} = \min\{x_{j:n} - x_{(j-b):n} : j = b+1, \ldots, n\}    (2)
\]
denote the minimum bth-order spacing among the order statistics. Define the kernel estimator
\[
\hat F_{n,b}(x) = \frac{1}{n}\sum_{i=1}^{n} G\!\left(\frac{x - x_i}{h_{n,b}}\right),
\]
in which we assume G(x) = 0 for x ≤ −1/2, G(x) = 1 for x ≥ 1/2, G(0) = 1/2, and G(x) is continuous and nondecreasing on (−1/2, 1/2).
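The kernel estimator above, with the Φ(logit(x + 1/2)) choice of G mentioned on the next slide and the minimum bth-order spacing bandwidth of Equation (2), can be sketched directly (function names mine):

```python
import math

def G(x):
    """G(x) = Phi(logit(x + 1/2)): 0 for x <= -1/2, 1 for x >= 1/2,
    continuous and nondecreasing in between, with G(0) = 1/2.
    Phi is the standard Gaussian CDF, computed via math.erf."""
    if x <= -0.5:
        return 0.0
    if x >= 0.5:
        return 1.0
    p = x + 0.5
    logit = math.log(p / (1.0 - p))
    return 0.5 * (1.0 + math.erf(logit / math.sqrt(2.0)))

def F_hat(x, sample, b):
    """Kernel-smoothed ECDF F_hat_{n,b}(x) using the minimum b-th
    order spacing bandwidth h_{n,b} of Equation (2)."""
    xs = sorted(sample)
    n = len(xs)
    h = min(xs[j] - xs[j - b] for j in range(b, n))  # min b-th order spacing
    return sum(G((x - xi) / h) for xi in xs) / n
```

Because G is continuous, so is x ↦ F̂_{n,b}(x), which restores the continuity of the objective function that raw marginal ranks destroy.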
Then, for i = 1, ..., n, we note that |F̂_{n,b}(x_{i:n}) − F_n(x_{i:n})| ≤ b/(2n), and
\[
\sup_{x \in \mathbb{R}} |\hat F_{n,b}(x) - F(x)| = o_P(n^{-1}).
\]
This Lemma extends the result of Zielinski (2007), and its proof follows from similar arguments. Examples for G(x) include Φ(logit(x + 1/2)) and Φ(tan(xπ)), in which Φ(x) denotes the standard Gaussian CDF. We use the former in our simulations and applications. We found that b = ⌊√n⌋ worked well in our simulations, with little sensitivity to this choice.

If the assumptions of the previous Theorem and Lemma hold; if F_{y_k}(y) is twice continuously differentiable for all k, with first and second derivatives f_{y_k}(y) and f′_{y_k}(y), respectively; if E|f_{y_k}(y_k)|² < ∞ and E|f′_{y_k}(y_k)|² < ∞ for all k; and if E[∂/∂θ J_n(θ)|_{θ=θ_0}] = 0, then |θ̃_n − θ_0| = O_P(n^{−1/2}).
Statistical Inference
Although the minimizers θ̂_n and θ̃_n always exist, an important question for all ICA methods is: do the independent components exist or not?

To evaluate this issue statistically, we construct a test of the null hypothesis
H_0 : Y = S M^T,
in which M is non-singular and S_1, ..., S_d are mutually independent.
- Recall that n I_n(S) converges in distribution under mutual independence
- M is unknown in practice, hence S is not directly observed
- The limiting distribution of n I_n(Ŝ) differs from that of n I_n(S)
Inference Based on Resampling
Define M̂_n = Û_n^{−1} W_{θ̂_n}^{−1} as the estimated mixing matrix
- Û_n is the estimated uncorrelating matrix
- θ̂_n is either the dCovICA or the PITdCovICA estimator

The proposed resampling scheme consists of the following steps:
1. For each k = 1, ..., d, jointly sample the entire sequence S*_k = (s*_{1,k}, ..., s*_{n,k})^T by randomly permuting the n elements of Ŝ_k
2. Let Y* = S* M̂_n^T
3. Replace the sample Y with Y*
4. Given Y*, estimate M̂* via the same procedure used to estimate M̂_n
5. Define Ŝ* = Y* M̂*^{−T}
6. Calculate I*_n(S) = I_n(Ŝ*)
A Test for the Existence of Independent Components
- The Y* are generated according to H_0: the components of Y* M̂_n^{−T} are genuine independent components
- Under H_0, given the sample Y: n I*_n(S) is approximately distributed as n I_n(Ŝ)
- Repeat the resampling scheme N times:
  - Reject H_0 if n I_n(Ŝ) is greater than the (Nα)th largest value of the n I*_n(S), in which α ∈ (0, 1) is the size of the test
  - This accounts for uncertainty in estimating the independent components given Z_n, and for uncertainty in estimating Z_n
- The procedure is independent of the estimation method and may be used with any ICA method
PCA vs. ICA: Filtered Unemployment Rate ê_t
Test statistic and approximate p-value for the joint test of ICs:

n I_n(·)           y_t      ê_t     z_t     ŝ_t
Test Statistic     39.70    5.27    0.41    −0.42
Approx. p-value    0        0       0       0.91
PCA vs. ICA: Probability Integral Transformation
[Figure: four scatterplots — PCA components Z1 vs. Z2 and ICA components S1 vs. S2 (top row), and their probability integral transforms F(Z1) vs. F(Z2) and F(S1) vs. F(S2) on the unit square (bottom row).]
Interpretation of ICs
CA : e_1 = −0.89 s_1 − 0.11 s_2 + 0.36 s_4 + 0.23 s_6
FL : e_2 = −0.24 s_1 − 0.10 s_2 − 0.83 s_5 + 0.48 s_6
IL : e_3 = −0.32 s_2 − 0.87 s_3 + 0.27 s_4 + 0.24 s_6
MI : e_4 = −0.33 s_1 − 0.85 s_2 − 0.12 s_3 − 0.16 s_5 − 0.34 s_6
OH : e_5 = −0.11 s_1 − 0.65 s_2 − 0.19 s_4 + 0.45 s_5 + 0.56 s_6
WI : e_6 = −0.48 s_2 + 0.32 s_3 + 0.81 s_4
- s_2 appears in every state's equation
- s_2 has a positive relationship with seasonally adjusted GDP
- This supports the hypothesis that −s_2 is a national component of the unemployment rate
Simulation
- Compare dCovICA and PITdCovICA with SymR-est, AsyR-est, FastICA, and KDICA
- 18 source distributions: Student-t, uniform, exponential, mixtures, ...
- n = 1,000; random mixing M_0; pre-whitened via PCA; 1,000 replications
- Error metric (Ilmonen et al. 2010):
\[
D(M_0, \hat M) = \frac{1}{\sqrt{d-1}} \inf_{C \in \mathcal{C}} \|C \hat M^{-1} M_0 - I_d\|_F    (3)
\]
with || · ||_F the Frobenius norm
- The infimum is taken such that D is invariant to the three ambiguities associated with ICA, by defining C = {C ∈ M : C = P± B for some P± and B}
  - M: the set of d × d non-singular matrices
  - P±: a signed permutation matrix
  - B: a diagonal matrix with positive elements
- R package JADE (Nordhausen et al. 2011)
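The infimum in Equation (3) can be computed by searching the permutation: for any fixed row assignment the optimal sign and positive scale have closed forms, reducing the inner problem to matching rows of G = M̂^{−1} M_0 to coordinate axes. A brute-force sketch for small d (function name `ica_error` is mine; not how the JADE package computes it):

```python
import numpy as np
from itertools import permutations

def ica_error(M0, M_hat):
    """Sketch of the Ilmonen et al. (2010) index
    D(M0, M_hat) = (d-1)^(-1/2) inf_C ||C M_hat^{-1} M0 - I||_F,
    with C a signed permutation times a positive diagonal.  For each
    row-to-axis assignment, the optimal sign and scale leave residual
    1 - cos^2(row, axis); only the permutation is searched."""
    G = np.linalg.inv(M_hat) @ M0
    d = G.shape[0]
    row_norm2 = np.sum(G ** 2, axis=1)
    best = np.inf
    for perm in permutations(range(d)):
        # perm[i] = j: row j of G matched to coordinate axis i
        sse = sum(1.0 - G[j, i] ** 2 / row_norm2[j]
                  for i, j in enumerate(perm))
        best = min(best, sse)
    return np.sqrt(max(best, 0.0) / (d - 1))  # clamp tiny negatives
```

The metric is zero exactly when M̂ recovers M_0 up to the three ICA ambiguities (order, sign, and scale of the components).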
Simulation: d = 2
[Figure: logarithm of mean error from the true M (± standard error) for each of the 18 source distributions (a–r); methods shown: dCovICA, PITdCovICA, Sym R-est ICA via dCovICA, Asy R-est ICA via dCovICA, FastICA, and KDICA.]
Simulation: d = 4, 8, 16

Table: Mean error distance (×100), Equation (3), approximate standard error, and mean computation time in seconds (s) for N = 1,000 simulations in R^4, R^8, and R^16 with sample size n = 1,000, randomly selecting with replacement from the 18 distributions. The first four columns are joint-estimation methods; the last is asymmetric R-est ICA.

                        dCovICA   PITdCovICA   FastICA   KDICA    Asy R-est ICA
R^4   Mean Error        8.075     8.036        19.381    9.951    17.664
      Standard Error    0.150     0.132        0.512     0.362    0.415
      Mean Time (s)     8.96      23.77        0.02      0.18     1.67
R^8   Mean Error        8.600     8.628        32.070    20.439   N/A
      Standard Error    0.040     0.039        0.476     0.479    N/A
      Mean Time (s)     26.77     64.80        0.07      5.54     N/A
R^16  Mean Error        8.884     8.878        48.396    39.930   N/A
      Standard Error    0.019     0.019        0.364     0.473    N/A
      Mean Time (s)     66.56     124.86       0.12      156.61   N/A
Interest Rate Yields
[Figure: daily Treasury rates (%), September 1998 through August 2008, ranging roughly 1–6%.]
Yield Curve
[Figure: yield curve.]
Volatility Modeling
\[
\Sigma_t = M\,\operatorname{Cov}\{s_t \mid \mathcal{F}_{t-1}\}\,M' = M\,\operatorname{diag}\{\sigma^2_{it}\}\,M'
\]
[Figure: six time-series panels over roughly 2,500 observations.]
ICA of fMRI, ADHD-200 Global, & Acknowledgments
[Figure: fMRI component maps in panels labeled TYP, ADHD, and DIFF.]

Benjamin B. Risk and David Ruppert, Cornell University

ADHD-200 Global Competition winning team, Johns Hopkins University: Brian Caffo, Ciprian Crainiceanu, Ani Eloyan, Fang Han, Han Liu, John Muschelli, Mary Beth Nebel, Tuo Zhao

The Neuro Bureau: neurobureau.projects.nitrc.org/ADHD200
Conclusions
- A joint test for multivariate serial dependence
- A measure of mutual multivariate independence
- A statistical framework for independent component analysis
- A statistical test for the existence of independent components
- We combine a nonparametric probability integral transformation with a generalized nonparametric whitening method
- Limiting properties of the proposed estimator under weak conditions

Future Work
- Generalize to dependent data
- Extend to high-dimensional data
- Derive asymptotic critical values for general test statistics
- Explore new applications
Bibliography
Bach, F., and Jordan, M. (2003), "Kernel Independent Component Analysis," The Journal of Machine Learning Research, 3, 1–48.
Chen, A. (2006), "Fast Kernel Density Independent Component Analysis," in Proceedings of the 6th International Conference on Independent Component Analysis and Blind Signal Separation, Springer-Verlag, pp. 24–31.
Chen, A., and Bickel, P. (2005), "Consistent Independent Component Analysis and Prewhitening," IEEE Transactions on Signal Processing, 53(10), 3625–3632.
Eriksson, J., and Koivunen, V. (2003), "Characteristic-Function-Based Independent Component Analysis," Signal Processing, 83, 2195–2208.
Hallin, M., and Mehta, C. (2013), "R-Estimation for Asymmetric Independent Component Analysis," arXiv preprint arXiv:1304.3073.
Hastie, T., and Tibshirani, R. (2003), "Independent Components Analysis Through Product Density Estimation," Advances in Neural Information Processing Systems, 15, 665–672.
Hastie, T., and Tibshirani, R. (2010), ProDenICA: Product Density Estimation for ICA Using Tilted Gaussian Density Estimates, R package version 1.0.
Hyvarinen, A., and Oja, E. (1997), "A Fast Fixed-Point Algorithm for Independent Component Analysis," Neural Computation, 9(7), 1483–1492.
Ilmonen, P., Nordhausen, K., Oja, H., and Ollila, E. (2010), "A New Performance Index for ICA: Properties, Computation and Asymptotic Analysis," Latent Variable Analysis and Signal Separation, pp. 229–236.
Ilmonen, P., and Paindaveine, D. (2011), "Semiparametrically Efficient Inference Based on Signed Ranks in Symmetric Independent Component Models," Annals of Statistics, 39(5), 2448–2476.
Matteson, D. S., and Tsay, R. S. (2011), "Dynamic Orthogonal Components for Multivariate Time Series," Journal of the American Statistical Association, 106(496), 1450–1463.
Matteson, D. S., and Tsay, R. S. (2012), "Independent Component Analysis via U-Statistics," under review.
Nordhausen, K., Cardoso, J.-F., Oja, H., and Ollila, E. (2011), JADE: JADE and ICA Performance Criteria, R package version 1.0-4.
Nordhausen, K., Oja, H., and Paindaveine, D. (2009), "Signed-Rank Tests for Location in the Symmetric Independent Component Model," Journal of Multivariate Analysis, 100(5), 821–834.
R Development Core Team (2010), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria.
Szekely, G. J., and Rizzo, M. L. (2009), "Brownian Distance Covariance," Annals of Applied Statistics, 3(4), 1236–1265.
Szekely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007), "Measuring and Testing Dependence by Correlation of Distances," Annals of Statistics, 35(6), 2769–2794.
Zielinski, R. (2007), "Kernel Estimators and the Dvoretzky–Kiefer–Wolfowitz Inequality," Applicationes Mathematicae, 34(4), 401.