HELLEBORE CAPITAL
A closer look at correlations
Paris Machine Learning Meetup #3, Season 4
G. Marti, S. Andler, F. Nielsen, P. Donnat
November 9, 2016
Gautier Marti A closer look at correlations
1 Introduction
2 Standard correlation coefficients
    Pearson correlation coefficient
    Spearman correlation coefficient
3 A metric space for copulas
    On the importance of the normalization
    Which metric? (Regularized) Optimal Transport
    A customizable dependence coefficient: TFDC
4 Applications
    Explore the correlations with clustering
    Query your dataset about correlations with TFDC
5 Conclusion
What is correlation?
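$$\rho_{ij} = \frac{\mathbb{E}[X_i X_j] - \mathbb{E}[X_i]\,\mathbb{E}[X_j]}{\sqrt{\left(\mathbb{E}[X_i^2] - \mathbb{E}[X_i]^2\right)\left(\mathbb{E}[X_j^2] - \mathbb{E}[X_j]^2\right)}} \in [-1, 1]$$

$$\hat{\rho}_{ij} = \frac{\sum_{k=1}^{N} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{N} (x_{ik} - \bar{x}_i)^2 \sum_{k=1}^{N} (x_{jk} - \bar{x}_j)^2}} \in [-1, 1]$$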
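import numpy as np
# x_i, x_j: 1-D arrays of N observations; np.corrcoef returns the 2x2 correlation
# matrix, whose off-diagonal entry is the Pearson coefficient above.
rho_ij = np.corrcoef(x_i, x_j)[0, 1]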
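A minimal sketch, assuming NumPy, that evaluates the sample formula above term by term and checks it against `np.corrcoef`; the helper `pearson` and the synthetic data are illustrative only, not taken from the slides.

```python
import numpy as np

def pearson(x_i, x_j):
    """Sample Pearson correlation, computed exactly as in the sum formula."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    d_i = x_i - x_i.mean()  # x_ik - mean(x_i)
    d_j = x_j - x_j.mean()  # x_jk - mean(x_j)
    return np.sum(d_i * d_j) / np.sqrt(np.sum(d_i ** 2) * np.sum(d_j ** 2))

rng = np.random.default_rng(0)
x_i = rng.normal(size=1000)
x_j = 0.5 * x_i + rng.normal(size=1000)  # linear relation plus noise

rho_manual = pearson(x_i, x_j)
rho_numpy = np.corrcoef(x_i, x_j)[0, 1]  # off-diagonal entry of the 2x2 matrix
print(rho_manual, rho_numpy)  # equal up to floating-point rounding
```

The 1/N factors of the empirical moments cancel between numerator and denominator, which is why the sample formula needs no normalization constant.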
2 Standard correlation coefficients
Pearson correlation
Pearson correlation with outliers
Spearman correlation: Pearson on ranks
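Spearman's coefficient replaces each observation by its rank and then applies the Pearson formula. A minimal NumPy sketch, assuming continuous data without ties (tied values would need average ranks); `ranks` and `spearman` are illustrative helper names.

```python
import numpy as np

def ranks(x):
    """Ranks 1..N of a 1-D sample (assumes no ties)."""
    x = np.asarray(x)
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def spearman(x_i, x_j):
    """Spearman correlation: Pearson correlation of the ranks."""
    return np.corrcoef(ranks(x_i), ranks(x_j))[0, 1]

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=500)
y = np.exp(5 * x)  # increasing, but far from linear

print(np.corrcoef(x, y)[0, 1])  # Pearson: clearly below 1
print(spearman(x, y))           # Spearman: 1 up to rounding, the ranks coincide
print(spearman(x, -y))          # Spearman: -1 up to rounding
```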
Spearman correlation with outliers
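The outlier effect can be reproduced numerically: a single extreme point placed against the trend can flip the sign of Pearson's coefficient, while it moves Spearman's only slightly, because that point's rank is bounded by N. A minimal NumPy sketch; the data and the outlier location (50, -50) are hypothetical.

```python
import numpy as np

def ranks(x):
    """Ranks 1..N of a 1-D sample (assumes no ties)."""
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)  # clean sample with a strong positive association

# One gross outlier placed against the trend.
x_out = np.append(x, 50.0)
y_out = np.append(y, -50.0)

results = {}
for label, a, b in [("clean", x, y), ("with outlier", x_out, y_out)]:
    pearson = np.corrcoef(a, b)[0, 1]
    spearman = np.corrcoef(ranks(a), ranks(b))[0, 1]  # Pearson on ranks
    results[label] = (pearson, spearman)
    print(f"{label:>12}: Pearson={pearson:+.3f}  Spearman={spearman:+.3f}")
```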
3 A metric space for copulas
From ranks to empirical copula
Sklar’s Theorem [3]
For $(X_i, X_j)$ with continuous marginal cdfs $F_{X_i}, F_{X_j}$, the joint cumulative distribution function $F$ is uniquely expressed as
$$F(x_i, x_j) = C\big(F_{X_i}(x_i), F_{X_j}(x_j)\big),$$
where $C$ is known as the copula of $(X_i, X_j)$.
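In practice the marginal cdfs are unknown, so each observation is replaced by its normalized rank $u_{ik} = \mathrm{rank}(x_{ik})/N$ (a pseudo-observation), and the empirical copula is the empirical joint cdf of the pairs $(u_{ik}, u_{jk})$. A minimal NumPy sketch, assuming no ties; `pseudo_observations` and `empirical_copula` are hypothetical helper names.

```python
import numpy as np

def pseudo_observations(x):
    """Normalized ranks u_k = rank(x_k) / N in (0, 1] (assumes no ties)."""
    x = np.asarray(x)
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r / len(x)

def empirical_copula(u_i, u_j, u, v):
    """C_N(u, v) = (1/N) * #{k : u_ik <= u and u_jk <= v}."""
    return float(np.mean((u_i <= u) & (u_j <= v)))

rng = np.random.default_rng(3)
x_i = rng.normal(size=2000)
x_j = np.exp(x_i) + 0.3 * rng.normal(size=2000)  # non-linear dependence

u_i, u_j = pseudo_observations(x_i), pseudo_observations(x_j)
print(empirical_copula(u_i, u_j, 0.3, 1.0))  # 0.3: the margins are uniform
print(empirical_copula(u_i, u_j, 0.5, 0.5))  # well above 0.25, the value under independence

# Ranks ignore increasing transformations of the margins, and so does the copula.
same = np.array_equal(pseudo_observations(x_j), pseudo_observations(np.exp(x_j)))
print(same)  # True
```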
Minimum, Independence, Maximum copulas
Fréchet–Hoeffding copula bounds
For any copula $C : [0, 1]^2 \to [0, 1]$ and any $(u, v) \in [0, 1]^2$, the following bounds hold:
$$W(u, v) \le C(u, v) \le M(u, v),$$
where $W(u, v) = \max(u + v - 1, 0)$ is the copula of counter-monotonic random variables and $M(u, v) = \min(u, v)$ is the copula of co-monotonic random variables. The independence copula $\Pi(u, v) = uv$ lies in between.
[Figure: for the lower-bound, independence and upper-bound copulas, their densities $w$, $\pi$, $m$ and their copula functions $W$, $\Pi$, $M$, plotted over $(u_i, u_j) \in [0, 1]^2$.]
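The bounds can be checked numerically. A minimal NumPy sketch, assuming the standard closed forms $W(u,v)=\max(u+v-1,0)$, $\Pi(u,v)=uv$ and $M(u,v)=\min(u,v)$; the data-generating process, the grid and the $2/N$ tolerance (ranks put the empirical copula on a $1/N$ lattice) are choices made for this illustration.

```python
import numpy as np

# Closed forms of the three reference copulas on [0, 1]^2.
def W(u, v):   # counter-monotonic copula (lower Fréchet–Hoeffding bound)
    return np.maximum(u + v - 1.0, 0.0)

def Pi(u, v):  # independence copula
    return u * v

def M(u, v):   # co-monotonic copula (upper Fréchet–Hoeffding bound)
    return np.minimum(u, v)

def empirical_copula_grid(x_i, x_j, grid):
    """Empirical copula C_N(u, v) on grid x grid, from normalized ranks (no ties assumed)."""
    n = len(x_i)
    u_i = (np.argsort(np.argsort(x_i)) + 1) / n
    u_j = (np.argsort(np.argsort(x_j)) + 1) / n
    return np.array([[np.mean((u_i <= a) & (u_j <= b)) for b in grid] for a in grid])

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
y = np.sin(3 * x) + 0.5 * rng.normal(size=1000)  # an arbitrary non-monotonic dependence

grid = np.linspace(0.0, 1.0, 21)
U, V = np.meshgrid(grid, grid, indexing="ij")
C = empirical_copula_grid(x, y, grid)

# Ranks put C_N on a 1/N lattice, so the bounds hold up to a small tolerance.
tol = 2.0 / len(x)
print(bool(np.all(W(U, V) - tol <= C) and np.all(C <= M(U, V) + tol)))  # True
```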
A metric space for copulas
Which metric? (Regularized) Optimal Transport
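The (earth mover's) distance between two distributions is the minimum cost of transporting one pile of dirt so that it becomes the other, where the cost is the amount of dirt moved times the distance over which it is moved.
In the two illustrated examples: $\mathrm{EMD} = |x_1 - x_2|$ and $\mathrm{EMD} = \frac{1}{6}|x_1 - x_3| + \frac{1}{6}|x_2 - x_3|$.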
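On the real line the EMD has a closed form: it is the area between the two cumulative distribution functions. A minimal NumPy sketch of that 1-D special case; the point positions and the second configuration are hypothetical, chosen so that the optimal plan moves 1/6 of the mass from $x_1$ to $x_3$ and 1/6 from $x_2$ to $x_3$.

```python
import numpy as np

def emd_1d(xs, p, ys, q):
    """Earth mover's distance between two discrete distributions on the real line.

    On the line, with ground cost |x - y|, the optimal transport cost equals the
    area between the two cumulative distribution functions.
    """
    xs, p, ys, q = (np.asarray(a, dtype=float) for a in (xs, p, ys, q))
    support = np.union1d(xs, ys)  # sorted breakpoints of both cdfs
    F = np.array([p[xs <= s].sum() for s in support])  # cdf of p at each breakpoint
    G = np.array([q[ys <= s].sum() for s in support])  # cdf of q at each breakpoint
    # Both cdfs are constant between consecutive breakpoints.
    return float(np.sum(np.abs(F - G)[:-1] * np.diff(support)))

x1, x2, x3 = 0.0, 1.0, 4.0

# All the dirt sits at x1 and must end up at x2: EMD = |x1 - x2|.
print(emd_1d([x1], [1.0], [x2], [1.0]))  # 1.0

# Hypothetical configuration where 1/6 of the mass travels x1 -> x3 and 1/6 travels x2 -> x3.
d = emd_1d([x1, x2, x3], [1/3, 1/3, 1/3], [x1, x2, x3], [1/6, 1/6, 2/3])
print(d, abs(x1 - x3) / 6 + abs(x2 - x3) / 6)  # the two values agree
```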
Its geometry has good properties in general [1], and for copulas [2].
[Figure: four example copula densities on $[0, 1]^2$, together with their Bregman barycenter copula and their Wasserstein barycenter copula.]
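The regularized variant cited in [1] adds an entropy term to the transport problem, which can then be solved by alternately rescaling the rows and columns of the kernel $K = e^{-\mathrm{cost}/\varepsilon}$ (Sinkhorn iterations). A minimal NumPy sketch, assuming copulas are compared as 10 × 10 histograms of normalized ranks with a squared Euclidean ground cost between bin centres; `eps`, `n_iter`, the binning and the helper names are illustrative choices, not the settings used in the slides.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.02, n_iter=1000):
    """Entropy-regularized optimal transport (Sinkhorn iterations).

    a (n,), b (m,): nonnegative weights summing to 1; cost (n, m): ground cost.
    Returns the transport cost <P, cost> and the plan P = diag(r) K diag(c).
    """
    K = np.exp(-cost / eps)  # Gibbs kernel
    r = np.ones_like(a)
    for _ in range(n_iter):
        c = b / (K.T @ r)  # rescale columns towards marginal b
        r = a / (K @ c)    # rescale rows to marginal a
    P = r[:, None] * K * c[None, :]
    return float(np.sum(P * cost)), P

def copula_histogram(x, y, bins=10):
    """Binned empirical copula: 2-D histogram of normalized ranks, flattened."""
    n = len(x)
    u = (np.argsort(np.argsort(x)) + 0.5) / n
    v = (np.argsort(np.argsort(y)) + 0.5) / n
    h, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 1], [0, 1]])
    return h.ravel() / n

# Squared Euclidean ground cost between bin centres of [0, 1]^2.
bins = 10
centres = (np.arange(bins) + 0.5) / bins
cu, cv = np.meshgrid(centres, centres, indexing="ij")
pts = np.column_stack([cu.ravel(), cv.ravel()])
cost = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)

rng = np.random.default_rng(8)
x = rng.uniform(size=2000)
comonotonic = copula_histogram(x, x)
noisy = copula_histogram(x, x + 0.05 * rng.normal(size=2000))
independent = copula_histogram(x, rng.uniform(size=2000))

d_near, P = sinkhorn(comonotonic, noisy, cost)
d_far, _ = sinkhorn(comonotonic, independent, cost)
print(d_near, d_far)  # the noisy co-monotonic copula is much closer than the independent one
```

The reported value is the transport cost of the regularized plan, so it is small but not exactly zero even between identical histograms.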
The Target/Forget Dependence Coefficient (TFDC)
Now, we can define our bespoke dependence coefficient:
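Build the forget-dependence copulas $\{C^F_l\}_l$
Build the target-dependence copulas $\{C^T_k\}_k$
Compute the empirical copula $C_{ij}$ from $x_i, x_j$
$$\mathrm{TFDC}(C_{ij}) = \frac{\min_l D(C^F_l, C_{ij})}{\min_l D(C^F_l, C_{ij}) + \min_k D(C_{ij}, C^T_k)}$$
where $D$ is a distance between copulas, such as the (regularized) optimal transport distance discussed above. TFDC lies in $[0, 1]$; for a distance that vanishes only between identical copulas, it equals 0 when $C_{ij}$ coincides with a forget-dependence copula and 1 when it coincides with a target-dependence copula.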
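The three steps can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: $D$ is taken to be the Sinkhorn (entropy-regularized OT) cost between 10 × 10 binned empirical copulas with a squared Euclidean ground cost, the forget set is {independence}, and the target set is {co-monotonic $M$, counter-monotonic $W$}. All names and parameter values are hypothetical.

```python
import numpy as np

def sinkhorn_cost(a, b, cost, eps=0.02, n_iter=1000):
    """Entropy-regularized OT cost <P, cost> between flat histograms a and b."""
    K = np.exp(-cost / eps)
    r = np.ones_like(a)
    for _ in range(n_iter):
        c = b / (K.T @ r)  # match the column marginal b
        r = a / (K @ c)    # match the row marginal a
    return float(np.sum(r[:, None] * K * c[None, :] * cost))

def copula_histogram(x, y, bins):
    """Binned empirical copula: 2-D histogram of normalized ranks, flattened."""
    n = len(x)
    u = (np.argsort(np.argsort(x)) + 0.5) / n
    v = (np.argsort(np.argsort(y)) + 0.5) / n
    h, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 1], [0, 1]])
    return h.ravel() / n

BINS = 10
centres = (np.arange(BINS) + 0.5) / BINS
cu, cv = np.meshgrid(centres, centres, indexing="ij")
pts = np.column_stack([cu.ravel(), cv.ravel()])
COST = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean

# Hypothetical choice of reference copulas, as binned histograms:
FORGET = [np.full(BINS * BINS, 1.0 / BINS ** 2)]      # {C^F_l}: independence
TARGET = [(np.eye(BINS) / BINS).ravel(),               # {C^T_k}: co-monotonic M
          (np.fliplr(np.eye(BINS)) / BINS).ravel()]    #          counter-monotonic W

def tfdc(x_i, x_j):
    """TFDC(C_ij) = min_l D(C^F_l, C_ij) / (min_l D(C^F_l, C_ij) + min_k D(C_ij, C^T_k))."""
    c_ij = copula_histogram(x_i, x_j, BINS)
    d_forget = min(sinkhorn_cost(cf, c_ij, COST) for cf in FORGET)
    d_target = min(sinkhorn_cost(c_ij, ct, COST) for ct in TARGET)
    return d_forget / (d_forget + d_target)

rng = np.random.default_rng(9)
x = rng.normal(size=1000)
t_increasing = tfdc(x, x ** 3)                    # copula equals M
t_decreasing = tfdc(x, -x)                        # copula equals W
t_independent = tfdc(x, rng.normal(size=1000))    # copula close to independence
print(t_increasing, t_decreasing, t_independent)  # first two near 1, last near 0
```

Because the regularized cost is not exactly zero between identical histograms, the sketch yields values close to, rather than exactly at, 0 and 1. Changing `FORGET` and `TARGET` is what makes the coefficient customizable.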
TFDC Power
[Figure: eight panels of power (0 to 1) versus noise level (0 to 100) for cor, dCor, MIC, ACE, MMD, CMMD, RDC and TFDC.]
Figure: Power of several dependence coefficients as a function of the noise level in eight different scenarios. Insets show the noise-free form of each association pattern. The coefficient power was estimated via 500 simulations with sample size 500 each.
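A power estimate of this kind can be sketched as follows, assuming the usual protocol of rejecting independence when the coefficient exceeds the (1 − α) quantile of its values on permuted (independent) samples. The scenario generators, noise scales and the reduced 200 simulations of size 200 are illustrative, not the settings behind the figure; the absolute Pearson correlation plays the coefficient under test.

```python
import numpy as np

def power(coef, make_pair, n=200, n_sim=200, alpha=0.05, seed=0):
    """Monte Carlo power of a dependence coefficient at level alpha.

    The rejection threshold is the (1 - alpha) quantile of the coefficient on
    samples whose pairing is destroyed by a random permutation (independence);
    the power is the fraction of dependent samples that exceed it.
    """
    rng = np.random.default_rng(seed)
    null, alt = [], []
    for _ in range(n_sim):
        x, y = make_pair(rng, n)
        alt.append(coef(x, y))
        null.append(coef(x, rng.permutation(y)))
    threshold = np.quantile(null, 1 - alpha)
    return float(np.mean(np.array(alt) > threshold))

def abs_pearson(x, y):
    """Coefficient under test: absolute Pearson correlation."""
    return abs(np.corrcoef(x, y)[0, 1])

def linear(noise):
    """Hypothetical scenario: y = x + noise * e, with x and e uniform on [0, 1]."""
    def make_pair(rng, n):
        x = rng.uniform(size=n)
        return x, x + noise * rng.uniform(size=n)
    return make_pair

def quadratic(noise):
    """Hypothetical scenario: y = (x - 1/2)^2 + noise * e, a non-monotonic pattern."""
    def make_pair(rng, n):
        x = rng.uniform(size=n)
        return x, (x - 0.5) ** 2 + noise * rng.uniform(size=n)
    return make_pair

powers = {noise: power(abs_pearson, linear(noise)) for noise in (0.0, 5.0, 20.0)}
quad_power = power(abs_pearson, quadratic(0.0))
print(powers)      # power decreases as the noise level grows
print(quad_power)  # stays low even without noise: |r| misses this pattern
```

Swapping `abs_pearson` for another coefficient yields one curve per coefficient and scenario, which is the structure of the figure.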
4 Applications
Clustering of empirical copulas
Financial correlations - Stocks CAC 40
Figure: Stocks: more mass in the bottom-left corner, i.e. lower tail dependence. Stock prices tend to plummet together.
Financial correlations - Credit Default Swaps
Figure: Credit default swaps: more mass in the top-right corner, i.e. upper tail dependence. The cost of insurance against entities' default tends to soar together in stressed markets.
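A crude numerical reading of these pictures, assuming normalized ranks as pseudo-observations: compare the share of points in the bottom-left and top-right $q \times q$ corners of the empirical copula with the $q^2$ expected under independence. The `corner_mass` helper, the 10% corner size and the simulated "crash" regime are hypothetical; this is an illustration, not a formal tail-dependence estimator.

```python
import numpy as np

def corner_mass(x, y, q=0.1):
    """Share of normalized-rank pairs in the bottom-left and top-right q x q corners.

    Under independence each corner holds about q**2 of the points; an excess in the
    bottom-left (top-right) corner hints at lower (upper) tail dependence.
    """
    n = len(x)
    u = (np.argsort(np.argsort(x)) + 1) / n
    v = (np.argsort(np.argsort(y)) + 1) / n
    lower = float(np.mean((u <= q) & (v <= q)))
    upper = float(np.mean((u > 1 - q) & (v > 1 - q)))
    return lower, upper

rng = np.random.default_rng(5)
n = 5000
x, y = rng.normal(size=n), rng.normal(size=n)  # independent in normal times
# Hypothetical crash regime: on about 10% of the days both series drop together.
crash = rng.uniform(size=n) < 0.1
shock = rng.normal(size=crash.sum())
x[crash] = -3.0 + 0.3 * shock + 0.3 * rng.normal(size=crash.sum())
y[crash] = -3.0 + 0.3 * shock + 0.3 * rng.normal(size=crash.sum())

lower, upper = corner_mass(x, y)
print(lower, upper, 0.1 ** 2)  # lower corner well above q**2, upper corner close to it
```

For CDS spreads, the same check would be expected to flag the top-right corner instead.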
Financial correlations - FX rates
Figure: FX rates: empirical copulas show that the dependence between FX rates takes many forms. For example, rates may exhibit either strong dependence or independence, while being anti-correlated during extreme events.
Associations between features in UCI datasets
Dependence patterns (= clustering centroids) found between features in UCI datasets
[Figure: five centroid copulas on $[0, 1]^2$ for each dataset: Breast Cancer (wdbc), Libras Movement, Parkinsons, Gamma Telescope.]
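A minimal sketch of the clustering step, assuming binned empirical copulas as feature vectors. Note the simplification: it runs plain Euclidean k-means with farthest-point initialization, whereas the slides rely on the optimal transport geometry between copulas [2]; the synthetic pairs and all helper names are hypothetical. Each centroid is itself a binned copula, i.e. a "dependence pattern (= clustering centroid)".

```python
import numpy as np

def copula_histogram(x, y, bins=8):
    """Binned empirical copula: 2-D histogram of normalized ranks, flattened."""
    n = len(x)
    u = (np.argsort(np.argsort(x)) + 0.5) / n
    v = (np.argsort(np.argsort(y)) + 0.5) / n
    h, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 1], [0, 1]])
    return h.ravel() / n

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means on the rows of X with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # squared distance of every row to its nearest already-chosen centroid
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # new centroid = mean of its members (kept unchanged if the cluster is empty)
        centroids = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                              else centroids[c] for c in range(k)])
    return labels, centroids

rng = np.random.default_rng(6)
features, truth = [], []
for _ in range(30):
    x = rng.normal(size=500)
    patterns = [x + 0.3 * rng.normal(size=500),   # positive dependence
                -x + 0.3 * rng.normal(size=500),  # negative dependence
                rng.normal(size=500)]             # independence
    for kind, y in enumerate(patterns):
        features.append(copula_histogram(x, y))
        truth.append(kind)
X, truth = np.array(features), np.array(truth)

labels, centroids = kmeans(X, k=3)
patterns_found = centroids.reshape(3, 8, 8)  # each centroid viewed as an 8 x 8 copula
```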
The Art of formulating questions about correlations
Encode your dependence hypothesis as a copula, and your query as a
“k-NN search”.
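A minimal sketch of such a query, assuming binned empirical copulas and, for brevity, a plain Euclidean distance between them as a stand-in for the copula distance $D$. The hypothesis copula (perfect co-monotonic dependence), the candidate pairs and the helper names are hypothetical.

```python
import numpy as np

def copula_histogram(x, y, bins=8):
    """Binned empirical copula: 2-D histogram of normalized ranks, flattened."""
    n = len(x)
    u = (np.argsort(np.argsort(x)) + 0.5) / n
    v = (np.argsort(np.argsort(y)) + 0.5) / n
    h, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 1], [0, 1]])
    return h.ravel() / n

def knn_query(target, candidates, k=2):
    """Names of the k candidates whose copula is closest to the target copula.

    Euclidean distance between histograms is a simple stand-in for the copula distance D.
    """
    dist = {name: float(np.linalg.norm(h - target)) for name, h in candidates.items()}
    return sorted(dist, key=dist.get)[:k]

bins = 8
# Dependence hypothesis encoded as a copula: perfect co-monotonicity (mass on the diagonal).
target = (np.eye(bins) / bins).ravel()

rng = np.random.default_rng(7)
x = rng.normal(size=1000)
candidates = {
    "strong_positive": copula_histogram(x, x + 0.1 * rng.normal(size=1000), bins),
    "monotone_nonlinear": copula_histogram(x, np.exp(x), bins),
    "negative": copula_histogram(x, -x + 0.1 * rng.normal(size=1000), bins),
    "independent": copula_histogram(x, rng.normal(size=1000), bins),
    "quadratic": copula_histogram(x, x ** 2, bins),
}
print(knn_query(target, candidates, k=2))  # ['monotone_nonlinear', 'strong_positive']
```

The monotone but non-linear pair ranks first, which a query phrased with Pearson correlation would not guarantee.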
5 Conclusion
Summary
Designing data-driven tailored correlation coefficients
Take Home Message
Internships at Hellebore
If you are interested in an internship at Hellebore
in applied machine learning for finance (NLP, text classification, information extraction), please contact:
in ML/finance research (copulas, Bayesian inference, clustering, time series analysis), please contact:
[1] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.
[2] Gautier Marti, Sebastien Andler, Frank Nielsen, and Philippe Donnat. Optimal transport vs. Fisher-Rao distance between copulas for clustering multivariate time series. In IEEE Statistical Signal Processing Workshop, SSP 2016, Palma de Mallorca, Spain, June 26-29, 2016, pages 1–5, 2016.
[3] A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Université Paris 8, 1959.