A closer look at correlations

Paris Machine Learning Meetup #3 Season 4

G. Marti, S. Andler, F. Nielsen, P. Donnat

Hellebore Capital

November 9, 2016

1 Introduction

2 Standard correlation coefficients
    Pearson correlation coefficient
    Spearman correlation coefficient

3 A metric space for copulas
    On the importance of the normalization
    Which metric? (Regularized) Optimal Transport
    A customizable dependence coefficient: TFDC

4 Applications
    Explore the correlations with clustering
    Query your dataset about correlations with TFDC

5 Conclusion

What is correlation?

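$$\frac{E[X_i X_j] - E[X_i]\,E[X_j]}{\sqrt{\left(E[X_i^2] - E[X_i]^2\right)\left(E[X_j^2] - E[X_j]^2\right)}} \in [-1, 1]$$

$$\frac{\sum_{k=1}^{N} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{N} (x_{ik} - \bar{x}_i)^2 \, \sum_{k=1}^{N} (x_{jk} - \bar{x}_j)^2}} \in [-1, 1]$$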

import numpy as np

np.corrcoef(x_i, x_j)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
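As a sanity check, the sample formula above can be evaluated by hand and compared with `np.corrcoef`. This is a minimal sketch: the helper name `pearson` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def pearson(x_i, x_j):
    """Sample Pearson correlation computed directly from the formula."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    d_i = x_i - x_i.mean()  # deviations from the sample mean
    d_j = x_j - x_j.mean()
    return float((d_i * d_j).sum() / np.sqrt((d_i ** 2).sum() * (d_j ** 2).sum()))

# Toy data (illustrative only): a noisy linear relationship.
rng = np.random.default_rng(0)
x_i = rng.normal(size=200)
x_j = 0.5 * x_i + rng.normal(size=200)

rho_manual = pearson(x_i, x_j)
rho_numpy = float(np.corrcoef(x_i, x_j)[0, 1])  # same quantity via NumPy
print(rho_manual, rho_numpy)
```

Both values should agree up to floating-point rounding, since `np.corrcoef` normalizes the sample covariance in the same way.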

Pearson correlation

Pearson correlation with outliers

Spearman correlation: Pearson on ranks
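The slide title can be read literally: replace each observation by its rank, then apply the Pearson formula. The sketch below assumes there are no ties; the helper names `ranks` and `spearman` and the toy data are illustrative.

```python
import numpy as np

def ranks(x):
    """Ranks 1..N of the entries of x (assumes no ties)."""
    x = np.asarray(x)
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def spearman(x_i, x_j):
    """Spearman correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x_i), ranks(x_j))[0, 1])

# A monotonic but non-linear relationship (toy data).
x = np.linspace(0.1, 3.0, 50)
y = np.exp(x)

rho_pearson = float(np.corrcoef(x, y)[0, 1])
rho_spearman = spearman(x, y)
print(rho_pearson, rho_spearman)
```

Because `y` is an increasing function of `x`, the two rank vectors coincide and the Spearman coefficient reaches 1, while the Pearson coefficient stays below 1.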

Spearman correlation with outliers
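The figures from the outlier slides are not reproduced here. As a rough illustration of the usual observation, the sketch below appends a single extreme point to two independent samples and compares how each coefficient reacts. The data, the seed and the outlier value are assumptions.

```python
import numpy as np

def spearman(x_i, x_j):
    """Pearson correlation of the ranks (assumes no ties)."""
    r_i = np.argsort(np.argsort(x_i))
    r_j = np.argsort(np.argsort(x_j))
    return float(np.corrcoef(r_i, r_j)[0, 1])

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(size=100)  # drawn independently of x

# Append one extreme observation to otherwise independent samples.
x_out = np.append(x, 50.0)
y_out = np.append(y, 50.0)

pearson_clean = float(np.corrcoef(x, y)[0, 1])
pearson_out = float(np.corrcoef(x_out, y_out)[0, 1])
spearman_clean = spearman(x, y)
spearman_out = spearman(x_out, y_out)

print(f"Pearson : {pearson_clean:+.2f} -> {pearson_out:+.2f}")
print(f"Spearman: {spearman_clean:+.2f} -> {spearman_out:+.2f}")
```

The single point dominates the sums in the Pearson formula, whereas in rank space it is just one more observation with the highest rank.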

From ranks to empirical copula

Sklar’s Theorem [3]

For $(X_i, X_j)$ having continuous marginal cdfs $F_{X_i}, F_{X_j}$, its joint cumulative distribution $F$ is uniquely expressed as

$$F(X_i, X_j) = C(F_{X_i}(X_i), F_{X_j}(X_j)),$$

where $C$ is known as the copula of $(X_i, X_j)$.
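One possible way to go from ranks to an empirical copula is to normalize the ranks to (0, 1), which approximates $F_{X_i}(X_i)$ and $F_{X_j}(X_j)$, and to histogram the resulting pairs on a grid. This is a sketch under assumptions: the function name `empirical_copula`, the 10 x 10 grid and the toy data are illustrative choices, not taken from the slides.

```python
import numpy as np

def empirical_copula(x_i, x_j, n_bins=10):
    """Gridded empirical copula of (x_i, x_j): probability mass per cell of [0, 1]^2."""
    n = len(x_i)
    # Normalized ranks in (0, 1); the 0.5 offset keeps points off the bin edges.
    u_i = (np.argsort(np.argsort(x_i)) + 0.5) / n
    u_j = (np.argsort(np.argsort(x_j)) + 0.5) / n
    hist, _, _ = np.histogram2d(u_i, u_j, bins=n_bins, range=[[0, 1], [0, 1]])
    return hist / n  # cell masses sum to 1

# Toy data: x_j is a noisy, non-linear, increasing function of x_i.
rng = np.random.default_rng(0)
x_i = rng.normal(size=1000)
x_j = np.exp(x_i) + 0.1 * rng.normal(size=1000)

c_ij = empirical_copula(x_i, x_j)
print(c_ij.sum(axis=0))  # margins are uniform by construction
```

Only the ranks enter the computation, so any increasing transformation of `x_i` or `x_j` leaves `c_ij` unchanged. That is the normalization this section relies on.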

Gautier Marti A closer look at correlations

HELLEBORECAPITAL

IntroductionStandard correlation coefficients

A metric space for copulasApplications

Conclusion

On the importance of the normalizationWhich metric? (Regularized) Optimal TransportA customizable dependence coefficient: TFDC

Minimum, Independence, Maximum copulas

Fréchet–Hoeffding copula bounds

For any copula $C : [0, 1]^2 \to [0, 1]$ and any $(u, v) \in [0, 1]^2$, the following bounds hold:

$$W(u, v) \le C(u, v) \le M(u, v),$$

where $W$ is the copula for counter-monotonic random variables, and $M$ is the copula for co-monotonic random variables.
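In closed form (standard definitions, not spelled out on the slide), $W(u, v) = \max(u + v - 1, 0)$, $\Pi(u, v) = uv$ and $M(u, v) = \min(u, v)$. The sketch below checks the bounds numerically for the independence copula $\Pi$ on a grid. The grid size and the small tolerance are arbitrary choices.

```python
import numpy as np

def W(u, v):
    """Lower bound: copula of counter-monotonic variables."""
    return np.maximum(u + v - 1.0, 0.0)

def Pi(u, v):
    """Independence copula."""
    return u * v

def M(u, v):
    """Upper bound: copula of co-monotonic variables."""
    return np.minimum(u, v)

u, v = np.meshgrid(np.linspace(0, 1, 101), np.linspace(0, 1, 101))
tol = 1e-12  # guards against floating-point rounding
bounds_hold = bool(
    np.all(W(u, v) <= Pi(u, v) + tol) and np.all(Pi(u, v) <= M(u, v) + tol)
)
print(bounds_hold)
```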

Figure: six panels over $(u_i, u_j) \in [0, 1]^2$: $w(u_i, u_j)$, $W(u_i, u_j)$, $\pi(u_i, u_j)$, $\Pi(u_i, u_j)$, $m(u_i, u_j)$, $M(u_i, u_j)$.

A metric space for copulas

Which metric? (Regularized) Optimal Transport

Distance is the minimum cost of transportation to transform one pile of dirt into another one, i.e. the amount of dirt moved times the distance by which it is moved.

$$\mathrm{EMD} = |x_1 - x_2|$$

$$\mathrm{EMD} = \frac{1}{6}|x_1 - x_3| + \frac{1}{6}|x_2 - x_3|$$
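In one dimension the earth mover's distance has a simple form: the mass that must cross each gap between consecutive points, times the width of that gap. The sketch below uses this to reproduce both expressions above. The slide pictures are not available, so the point locations and the second configuration (one sixth of the mass at each of $x_1$ and $x_2$, the remainder already at $x_3$) are assumptions chosen to be consistent with the formulas.

```python
import numpy as np

def emd_1d(points, a, b):
    """Earth mover's distance between histograms a and b on sorted 1D points."""
    points = np.asarray(points, dtype=float)
    cdf_gap = np.cumsum(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    # Mass that must cross each gap between consecutive points, times gap width.
    return float(np.sum(np.abs(cdf_gap[:-1]) * np.diff(points)))

# A unit pile at x1 moved to x2: cost |x1 - x2|.
x1, x2 = 0.0, 3.0
emd_single = emd_1d([x1, x2], [1.0, 0.0], [0.0, 1.0])

# Assumed configuration: 1/6 of the mass at x1 and at x2 is moved onto x3.
x1, x2, x3 = 0.0, 1.0, 3.0
emd_two = emd_1d([x1, x2, x3], [1 / 6, 1 / 6, 4 / 6], [0.0, 0.0, 1.0])
print(emd_single, emd_two)
```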

Gautier Marti A closer look at correlations

HELLEBORECAPITAL

IntroductionStandard correlation coefficients

A metric space for copulasApplications

Conclusion

On the importance of the normalizationWhich metric? (Regularized) Optimal TransportA customizable dependence coefficient: TFDC

Which metric? (Regularized) Optimal Transport

Its geometry has good properties in general [1], and for copulas [2].
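The regularized variant referred to in the slide title can be computed with the Sinkhorn iterations of [1]. The slides do not show an implementation, so the following is only a minimal NumPy sketch: the function name `sinkhorn`, the regularization strength, the iteration count and the toy histograms are all assumptions.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.05, n_iter=500):
    """Entropy-regularized optimal transport between histograms a and b.

    Returns the transport plan and its transport cost <plan, cost>.
    """
    K = np.exp(-cost / reg)      # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)        # rescale to match the column marginals
        u = a / (K @ v)          # rescale to match the row marginals
    plan = u[:, None] * K * v[None, :]
    return plan, float(np.sum(plan * cost))

# Two histograms on a 1D grid (toy data): bumps around 0.2 and 0.7.
grid = np.linspace(0.0, 1.0, 50)
a = np.exp(-((grid - 0.2) ** 2) / 0.005)
a /= a.sum()
b = np.exp(-((grid - 0.7) ** 2) / 0.005)
b /= b.sum()
cost = np.abs(grid[:, None] - grid[None, :])  # ground distance |x - y|

plan, dist = sinkhorn(a, b, cost)
print(dist)
```

For copulas, the same routine would be applied to flattened two-dimensional grids, with `cost` holding the pairwise distances between cell centers. Smaller values of `reg` approximate the unregularized distance more closely but need more iterations and more careful numerics.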

Figure: copula densities over $[0, 1]^2$, including panels titled "Bregman barycenter copula" and "Wasserstein barycenter copula".

The Target/Forget Dependence Coefficient (TFDC)

Now, we can define our bespoke dependence coefficient:

Build the forget-dependence copulas $\{C^F_l\}_l$

Build the target-dependence copulas $\{C^T_k\}_k$

Compute the empirical copula $C_{ij}$ from $x_i, x_j$

$$\mathrm{TFDC}(C_{ij}) = \frac{\min_l D(C^F_l, C_{ij})}{\min_l D(C^F_l, C_{ij}) + \min_k D(C_{ij}, C^T_k)}$$
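The three steps and the formula above can be sketched as follows. To keep the example short, the distance $D$ is replaced by a plain Euclidean distance between gridded copulas. That is a stand-in, not the optimal transport distance discussed earlier. The choice of independence as the forget copula and of the two perfect-dependence copulas as targets is also an illustrative assumption.

```python
import numpy as np

def tfdc(c_ij, forget, target, dist):
    """Target/Forget Dependence Coefficient of the gridded copula c_ij.

    forget, target: lists of reference copulas; dist: a distance D between copulas.
    """
    d_forget = min(dist(c_f, c_ij) for c_f in forget)
    d_target = min(dist(c_ij, c_t) for c_t in target)
    return d_forget / (d_forget + d_target)

def l2_dist(c_a, c_b):
    """Stand-in distance (assumption): Euclidean distance between gridded copulas."""
    return float(np.linalg.norm(c_a - c_b))

n = 10
independence = np.full((n, n), 1.0 / n**2)   # forget: no dependence
comonotone = np.eye(n) / n                   # target: perfect positive dependence
countermonotone = np.fliplr(np.eye(n)) / n   # target: perfect negative dependence

forget = [independence]
target = [comonotone, countermonotone]

score_indep = tfdc(independence, forget, target, l2_dist)  # sits on a forget copula
score_comon = tfdc(comonotone, forget, target, l2_dist)    # sits on a target copula
mixture = 0.5 * independence + 0.5 * comonotone
score_mix = tfdc(mixture, forget, target, l2_dist)         # halfway in between
print(score_indep, score_comon, score_mix)
```

By construction the coefficient is 0 on a forget copula and 1 on a target copula. Changing the two lists changes which dependence patterns it rewards.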

TFDC Power

Figure: Power of several dependence coefficients (cor, dCor, MIC, ACE, MMD, CMMD, RDC, TFDC) as a function of the noise level in eight different scenarios. Insets show the noise-free form of each association pattern. The coefficient power was estimated via 500 simulations with sample size 500 each. Axes: Noise Level (horizontal), Power (vertical).

Clustering of empirical copulas

Financial correlations - Stocks CAC 40

Figure: Stocks: More mass in the bottom-left corner, i.e. lower tail dependence. Stock prices tend to plummet together.

Financial correlations - Credit Default Swaps

Figure: Credit default swaps: More mass in the top-right corner, i.e. upper tail dependence. The cost of insurance against entities’ default tends to soar in stressed markets.

Financial correlations - FX rates

Figure: FX rates: Empirical copulas show that the dependence between FX rates varies. For example, rates may exhibit either strong dependence or independence while being anti-correlated during extreme events.

Associations between features in UCI datasets

Dependence patterns (= clustering centroids) found between features in UCI datasets

Figure: five centroid copulas, each plotted over $[0, 1]^2$, for each of four datasets: Breast Cancer (wdbc), Libras Movement, Parkinsons, Gamma Telescope.

The Art of formulating questions about correlations

Encode your dependence hypothesis as a copula, and your query as a “k-NN search”.
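A query of this kind could look like the sketch below. The hypothesis "perfect negative dependence" is encoded as a gridded counter-monotonic copula, and the pairs of variables whose empirical copulas lie closest to it are returned. The variable names, the toy data and the Euclidean stand-in for the distance between copulas are all assumptions made for illustration.

```python
import numpy as np
from itertools import combinations

def empirical_copula(x_i, x_j, n_bins=10):
    """Gridded empirical copula: probability mass per cell of [0, 1]^2."""
    n = len(x_i)
    u_i = (np.argsort(np.argsort(x_i)) + 0.5) / n
    u_j = (np.argsort(np.argsort(x_j)) + 0.5) / n
    hist, _, _ = np.histogram2d(u_i, u_j, bins=n_bins, range=[[0, 1], [0, 1]])
    return hist / n

def l2_dist(c_a, c_b):
    """Stand-in distance (assumption): Euclidean distance between gridded copulas."""
    return float(np.linalg.norm(c_a - c_b))

def knn_query(hypothesis, copulas, dist, k=1):
    """Keys of the k copulas closest to the hypothesis copula under dist."""
    return sorted(copulas, key=lambda key: dist(hypothesis, copulas[key]))[:k]

# Toy dataset with four variables.
rng = np.random.default_rng(0)
z = rng.normal(size=2000)
data = {
    "a": z,
    "b": z + 0.1 * rng.normal(size=2000),   # strongly increasing in a
    "c": -z + 0.1 * rng.normal(size=2000),  # strongly decreasing in a
    "d": rng.normal(size=2000),             # independent of the others
}
copulas = {
    (p, q): empirical_copula(data[p], data[q]) for p, q in combinations(data, 2)
}

# Hypothesis: perfect negative dependence (counter-monotonic copula on the grid).
hypothesis = np.fliplr(np.eye(10)) / 10
nearest = knn_query(hypothesis, copulas, l2_dist, k=2)
print(nearest)
```

Swapping `hypothesis` for another gridded copula, for instance one with extra mass in a single corner, turns the same search into a query about tail dependence.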

Summary

Designing data-driven tailored correlation coefficients

Take Home Message

Internships at Hellebore

If you are interested in an internship at Hellebore

in applied machine learning for finance (NLP, text classification, information extraction), please contact:

stage@helleboretech.com

in ML/finance research (copulas, Bayesian inference, clustering, time series analysis), please contact:

gmarti@helleborecapital.com

[1] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.

[2] Gautier Marti, Sebastien Andler, Frank Nielsen, and Philippe Donnat. Optimal transport vs. Fisher-Rao distance between copulas for clustering multivariate time series. In IEEE Statistical Signal Processing Workshop, SSP 2016, Palma de Mallorca, Spain, June 26-29, 2016, pages 1–5, 2016.

[3] A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Université Paris 8, 1959.
