1
Optimal Copula Transport for Clustering Time Series Gautier Marti 1,2 , Frank Nielsen 2 , Philippe Donnat 1 1 Hellebore Capital Limited & 2 Ecole Polytechnique Clustering Time Series Which Dependence Measure? For Which Dependence? Many bivariate dependence measures are avail- able. Usually, they aim at measuring: any deviation from independence, any deviation from co/counter-monotonicity. Motivation: What if we aim at specific dependence, and try to “ignore” some others? Dependence to detect (ρ ij := 1) Dependence to ignore (ρ ij := 0) Problem: A dependence measure powerful enough to detect y = f (x 2 ) will also detect y = g (x), f increasing, g decreasing. Copulas & Dependence Sklar’s Theorem: F (x i ,x j )= C ij (F i (x i ),F j (x j )) C ij , the copula, encodes the dependence structure Fréchet-Hoeffding bounds: max {u i + u j - 1, 0}≤ C ij (u i ,u j ) min{u i ,u j } Bivariate dependence measures: deviation from lower and upper bounds Spearman’s ρ S , Gini’s γ deviation from independence u i u j Spearman, Copula MMD, Schweizer-Wolff’s σ , Hoeffding’s Φ 2 Figure 1: (left) lower-bound copula, (mid) independence copula, (right) upper-bound copula Optimal Transport Wasserstein metrics: W p p (μ, ν ) := inf γ Γ(μ,ν ) Z M ×M d(x, y ) p dγ (x, y ) In practice, the distance W 1 is estimated on discrete data by solving the following linear program with the Hungarian algorithm: EMD(s 1 ,s 2 ) := min f X 1k,l n p k - q l f kl subject to f kl 0, 1 k,l n, n X l =1 f kl w p k , 1 k n, n X k =1 f kl w q l , 1 l n, n X k =1 n X l =1 f kl =1. It is called the Earth Mover Distance (EMD) in the CS literature. A target-oriented dependence coefficient Build the independence copula C ind Build the target-dependence copulas {C k } k Compute the empirical copula C ij from x i ,x j TDC(C ij )= EMD(C ind ,C ij ) EMD(C ind ,C ij ) + min k EMD(C ij ,C k ) Figure 2: Dependence is measured as the relative distance from independence to the nearest target-dependence EMD between Copulas Probability integral transform of a variable x i : F T (x k i )= 1 T T X t=1 I (x t i x k i ), i.e. computing the ranks of the realizations, and normalizing them into [0,1] Why the Earth Mover Distance? Figure 3: Copulas C 1 ,C 2 ,C 3 encoding a correlation of 0.5, 0.99, 0.9999 respectively; Which pair of copulas is the nearest? For Fisher-Rao, Kullback-Leibler, Hellinger and re- lated divergences: D(C 1 ,C 2 ) D(C 2 ,C 3 ); EMD(C 2 ,C 3 ) EMD(C 1 ,C 2 ) Benchmark: Power of Estimators Our coefficient can robustly target complex depen- dence patterns such as the ones displayed in Fig. 4. x-axis measures the noise added to the sample y-axis measures the frequency the coefficient is able to discern between the dependent sample and the independent one Basic check: no coefficient can discern between the “dependent” sample (with no dependence) and the independent sample. 0.0 0.4 0.8 0.0 0.4 0.8 cor dCor MIC ACE RDC TDC 0.0 0.4 0.8 0 20 40 60 80 100 0.0 0.4 0.8 0 20 40 60 80 100 Noise Level Power Figure 4: Dependence estimators power as a function of the noise for several deterministic patterns + noise. Their power is the percentage of times that they are able to distinguish between dependent and independent samples. Clustering of Credit Default Swaps We use the two targets from Fig. 2 Clustering distance: D ij = q (1 - TDC(C ij ))/2 Figure 5: Impact of different measures on clusters Conclusion The methodology presented is non-parametric, robust, deterministic. It has some scalability issues: in dimension, non-parametric density estimation; in time, EMD is costly to compute. Approximation schemes or parametric modelling can alleviate these issues. Information Web: www.datagrapple.com Email: [email protected]

Optimal Copula Transport for Clustering Time Series · Optimal Copula Transport for Clustering Time Series GautierMarti1,2,FrankNielsen2,PhilippeDonnat1 1HelleboreCapitalLimited&2EcolePolytechnique

  • Upload
    others

  • View
    32

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Optimal Copula Transport for Clustering Time Series · Optimal Copula Transport for Clustering Time Series GautierMarti1,2,FrankNielsen2,PhilippeDonnat1 1HelleboreCapitalLimited&2EcolePolytechnique

Optimal Copula Transport for Clustering Time SeriesGautier Marti1,2, Frank Nielsen2, Philippe Donnat1

1Hellebore Capital Limited & 2Ecole Polytechnique

Clustering Time SeriesWhich Dependence Measure?

For Which Dependence?

Many bivariate dependence measures are avail-able. Usually, they aim at measuring:• any deviation from independence,• any deviation from co/counter-monotonicity.Motivation: What if•we aim at specific dependence,• and try to “ignore” some others?

Dependence to detect (ρij := 1)

Dependence to ignore (ρij := 0)

Problem: A dependence measure powerfulenough to detect y = f (x2) will also detecty = g(x), f increasing, g decreasing.

Copulas & Dependence

•Sklar’s Theorem:F (xi, xj) = Cij(Fi(xi), Fj(xj))

•Cij, the copula, encodes the dependence structure•Fréchet-Hoeffding bounds:

max{ui + uj − 1, 0} ≤ Cij(ui, uj) ≤ min{ui, uj}•Bivariate dependence measures:

• deviation from lower and upper bounds• Spearman’s ρS, Gini’s γ

• deviation from independence uiuj• Spearman, Copula MMD, Schweizer-Wolff’s σ, Hoeffding’s Φ2

Figure 1: (left) lower-bound copula, (mid) independence copula,(right) upper-bound copula

Optimal Transport

Wasserstein metrics:W p

p (µ, ν) := infγ∈Γ(µ,ν)

∫M×M

d(x, y)pdγ(x, y)

In practice, the distanceW1 is estimated on discretedata by solving the following linear program withthe Hungarian algorithm:

EMD(s1, s2) := minf

∑1≤k,l≤n

‖pk − ql‖fkl

subject to fkl ≥ 0, 1 ≤ k, l ≤ n,n∑l=1fkl ≤ wpk, 1 ≤ k ≤ n,

n∑k=1

fkl ≤ wql, 1 ≤ l ≤ n,

n∑k=1

n∑l=1fkl = 1.

It is called the Earth Mover Distance (EMD) in theCS literature.

A target-oriented dependencecoefficient

•Build the independence copula Cind

•Build the target-dependence copulas {Ck}k•Compute the empirical copula Cij from xi, xj

TDC(Cij) = EMD(Cind, Cij)EMD(Cind, Cij) + mink EMD(Cij, Ck)

Figure 2: Dependence is measured as the relative distance fromindependence to the nearest target-dependence

EMD between Copulas

•Probability integral transform of a variable xi:

FT (xki ) = 1T

T∑t=1I(xti ≤ xki ),

i.e. computing the ranks of the realizations, andnormalizing them into [0,1]

Why the Earth Mover Distance?

Figure 3: Copulas C1, C2, C3 encoding a correlation of0.5, 0.99, 0.9999 respectively; Which pair of copulas is thenearest? For Fisher-Rao, Kullback-Leibler, Hellinger and re-lated divergences: D(C1, C2) ≤ D(C2, C3); EMD(C2, C3) ≤EMD(C1, C2)

Benchmark: Power of Estimators

Our coefficient can robustly target complex depen-dence patterns such as the ones displayed in Fig. 4.

• x-axis measures the noise added to the sample• y-axis measures the frequency the coefficient isable to discern between the dependent sampleand the independent one

•Basic check: no coefficient can discern betweenthe “dependent” sample (with no dependence)and the independent sample.

0.0

0.4

0.8

xvals

pow

er.cor[typ,]

xvals

pow

er.cor[typ,]

0.0

0.4

0.8

xvals

pow

er.cor[typ,]

xvals

pow

er.cor[typ,]

cordCorMICACERDCTDC

0.0

0.4

0.8

xvals

pow

er.cor[typ,]

xvals

pow

er.cor[typ,]

0 20 40 60 80 100

0.0

0.4

0.8

xvals

pow

er.cor[typ,]

0 20 40 60 80 100

xvals

pow

er.cor[typ,]

Noise Level

Pow

er

Figure 4: Dependence estimators power as a function of thenoise for several deterministic patterns + noise. Their power isthe percentage of times that they are able to distinguish betweendependent and independent samples.

Clustering of Credit Default Swaps

•We use the two targets from Fig. 2•Clustering distance: Dij =

√(1− TDC(Cij))/2

Figure 5: Impact of different measures on clusters

Conclusion

The methodology presented is•non-parametric, robust, deterministic.It has some scalability issues:• in dimension, non-parametric density estimation;• in time, EMD is costly to compute.Approximation schemes or parametric modellingcan alleviate these issues.

Information•Web: www.datagrapple.com•Email: [email protected]