122
Robust and Nonparametric Statistical Tools for Big Data in Neuroscience Henry Laniado. Seminar PhD in Mathematical Engineering August 18, 2017

Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Robust and Nonparametric StatisticalTools for Big Data in Neuroscience

Henry Laniado.Seminar

PhD in Mathematical Engineering

August 18, 2017

Page 2: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Motivation: fMRI problem

Figure: Functional Magnetic Resonance

Page 3: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Motivation: fMRI problem

Figure: Stimulus

Page 4: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Motivation: fMRI problem

Figure: 900000 Voxels

Page 5: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Motivation: fMRI problemThe theoretical activity is compared with that observed in e achvoxel.

Figure: Activation and non-activation

Page 6: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Motivation: fMRI problemThe theoretical activity is compared with that observed in e achvoxel.

Figure: Activation and non-activation

Page 7: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Motivation: fMRI problem

Figure: Activation and non-activation

Page 8: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Motivation: fMRI problem

Figure: Activation and non-activation

Page 9: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Motivation: fMRI problem

Figure: Identification

Page 10: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Main GoalFind robust and non parametric estimators to explain lineardependence and apply them in the construction of covariancematrices to improve the significance of linear multivariatestatistical models in the presence of outliers

Page 11: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Main GoalFind robust and non parametric estimators to explain lineardependence and apply them in the construction of covariancematrices to improve the significance of linear multivariatestatistical models in the presence of outliers

Hypothesis 1Significance in linear models is too sensitive in the presenc e ofoutliers.

Page 12: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Main GoalFind robust and non parametric estimators to explain lineardependence and apply them in the construction of covariancematrices to improve the significance of linear multivariatestatistical models in the presence of outliers

Hypothesis 1Significance in linear models is too sensitive in the presenc e ofoutliers.

Hypothesis 2Significance in linear models can be improved with robustestimators or non parametric estimators

Page 13: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Outline

1. Robustness: a fast review

2. General Lineal Model (GLM)

3. Robust GLM, with an explanatory variables

4. Robust GLM, multivariate case

5. Non-parametric estimators

6. Conclusions

7. Research lines

Page 14: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Robustness:a fast review

Central Region: bivariate case

Central Region: functional case

General Linear Model (GLM)

Robust GLM, with an explanatory variableContaminating a 5%Contaminating a 10%

Robust GLM, Multiavarite caseContaminating a 5%Contaminating a 10%

Non parametric estimators

Conclusions

Research Lines

Page 15: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Population: unknown parameters. θ =∫

Page 16: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Population: unknown parameters. θ =∫

Page 17: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Population: unknown parameters. θ =∫

Sample: estimators. θ̂ =∑

Page 18: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Population: unknown parameters. θ =∫

Sample: estimators. θ̂ =∑

µ =

f(x)dx the estimator X̄ =

xi

n

Page 19: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Example

Table: Simulated data from N(0, 1)

0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273 0.7939 0.45153 0.858 0.6697

-1.838 2.5976 -0.282 0.1757 -0.5986 0.9407 -0.036-0.531 0.6208 1.4303 -1.044 -0.1349 0.2476 -0.397-1.432 1.0849 -1.12 0.6788 -1.1078 -0.245 0.5031-1.136 -0.011 1.5668 0.6654 -0.6194 -1.481 -0.4462.0077 -0.898 -0.886 -2.145 -2.435 -0.396 -0.116

Page 20: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Example

Table: Simulated data from N(0, 1)

0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273 0.7939 0.45153 0.858 0.6697

-1.838 2.5976 -0.282 0.1757 -0.5986 0.9407 -0.036-0.531 0.6208 1.4303 -1.044 -0.1349 0.2476 -0.397-1.432 1.0849 -1.12 0.6788 -1.1078 -0.245 0.5031-1.136 -0.011 1.5668 0.6654 -0.6194 -1.481 -0.4462.0077 -0.898 -0.886 -2.145 -2.435 -0.396 -0.116

X̄ =

∑49i=1 xi

49= −0.09

Page 21: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Example

Table: Data from N(0, 1). Contaminated data

0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273 8.2356 0.45153 0.858 0.6697

-1.838 2.5976 -0.282 0.1757 -0.5986 0.9407 12.266-3.82 0.6208 1.4303 -1.044 -0.1349 0.2476 -0.397

-1.432 1.0849 -1.12 0.6788 -1.1078 -0.245 0.5031-1.136 -0.011 1.5668 0.6654 -0.6194 -1.481 -0.4462.0077 -0.898 -0.886 -2.145 -5.1321 -0.396 9.266

Page 22: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Example

Table: Data from N(0, 1). Contaminated data

0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273 8.2356 0.45153 0.858 0.6697

-1.838 2.5976 -0.282 0.1757 -0.5986 0.9407 12.266-3.82 0.6208 1.4303 -1.044 -0.1349 0.2476 -0.397

-1.432 1.0849 -1.12 0.6788 -1.1078 -0.245 0.5031-1.136 -0.011 1.5668 0.6654 -0.6194 -1.481 -0.4462.0077 -0.898 -0.886 -2.145 -5.1321 -0.396 9.266

X̄ =

∑49i=1 xi

49= 0.4252

Page 23: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Example

Table: Data from N(0, 1). Contaminated data

-5.132 -3.82 -2.1448 -1.4809 -1.4318 -1.14 -1.1199-1.108 -1.044 -1.0384 -0.8979 -0.886 -0.84 -0.6194-0.599 -0.531 -0.4462 -0.3974 -0.3956 -0.39 -0.2819-0.273 -0.268 -0.2455 -0.1604 -0.1349 -0.01 0.12040.176 0.2476 0.354 0.45153 0.5031 0.621 0.66540.67 0.6788 0.7939 0.85803 0.9407 1.085 1.4303

1.567 2.0077 2.1261 2.59761 8.2356 9.266 12.266

X̄ =

∑49i=1 xi

49= 0.4252

Page 24: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Example

Table: Data from N(0, 1). Contaminated data

X X X -1.4809 -1.4318 -1.14 -1.1199-1.108 -1.044 -1.0384 -0.8979 -0.886 -0.84 -0.6194-0.599 -0.531 -0.4462 -0.3974 -0.3956 -0.39 -0.2819-0.273 -0.268 -0.2455 -0.1604 -0.1349 -0.01 0.12040.176 0.2476 0.354 0.45153 0.5031 0.621 0.66540.67 0.6788 0.7939 0.85803 0.9407 1.085 1.4303

1.567 2.0077 2.1261 2.59761 X X X

Trimmed mean

X̄α =

∑k2

i=k1x[i]

k2 − k1 + 1

k1 = [α2 n] k2 = [n(1− α2 )]

X̄0.1 = 0.0503

Page 25: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Depth

b

b

b

b

b

b

b

b b

b b

b

bb

bbb

b

b

b

b

b

b b

b

b

bb

bbb

b

b

b

b

bbb

b

bb

b b

bb

bb

b

bbb

b b bbb b

Page 26: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Depth

b

b

b

b

b

b

b

b b

b b

b

bb

bbb

b

b

b

b

b

b b

b

b

bb

bbb

b

b

b

b

bbb

b

bb

b b

bb

bb

b

bbb

b b bbb b

Page 27: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Depth

b

b

b

b

b

b

b

b b

b b

b

bb

bbb

b

b

b

b

b

b b

b

b

bb

bbb

b

b

b

b

bbb

b

bb

b b

bb

bb

b

bbb

b b bbb b

Page 28: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Depth

b

b

b

b

b

b

b

b b

b b

b

bb

bbb

b

b

b

b

b

b b

b

b

bb

bbb

b

b

b

b

bbb

b

bb

b b

bb

bb

b

bbb

b b bbb b

Page 29: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Depth

b

b

b

b

b

b

b

b b

b b

b

bb

bbb

b

b

b

b

b

b b

b

b

bb

bbb

b

b

b

b

bbb

b

bb

b b

bb

bb

b

bbb

b b bbb b

Page 30: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Dependent uniform distribution in dimension 2

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Central Region

Figure: Random Tukey Depth

Page 31: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Normal distribution in dimension 3

−4−2

02

4

−4−2

02

4−4

−2

0

2

4

Central Region

−4 −3 −2 −1 0 1 2 3−4

−3

−2

−1

0

1

2

3

4

X1

X2

Projection on X1 and X2

−4 −3 −2 −1 0 1 2 3−4

−3

−2

−1

0

1

2

3

X1

X3

Projection on X1 and X3

−4 −3 −2 −1 0 1 2 3−4

−3

−2

−1

0

1

2

3

4

X3

X2

Projection on X3 and X2

Figure: Random Tukey Depth

Page 32: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: univariate case

−4 −3 −2 −1 0 1 2 3 4 5−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Figure: Data

Page 33: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: univariate case

−4 −3 −2 −1 0 1 2 3 4 5−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Figure: Median

Page 34: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: univariate case

−4 −3 −2 −1 0 1 2 3 4 5−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Figure: Central Region 10%

Page 35: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: univariate case

−4 −3 −2 −1 0 1 2 3 4 5−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Figure: Central Region 20%

Page 36: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: univariate case

−4 −3 −2 −1 0 1 2 3 4 5−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Figure: Central Region 30%

Page 37: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: univariate case

−4 −3 −2 −1 0 1 2 3 4 5−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Figure: Central Region 40%

Page 38: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: univariate case

−4 −3 −2 −1 0 1 2 3 4 5−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Figure: Central Region 50%

Page 39: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: univariate case

−4 −3 −2 −1 0 1 2 3 4 5−1

−0.5

0

0.5

1

−4 −3 −2 −1 0 1 2 3 4 5

1

Values

Col

umn

Num

ber

Figure: Box plot

Page 40: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Robustness:a fast review

Central Region: bivariate case

Central Region: functional case

General Linear Model (GLM)

Robust GLM, with an explanatory variableContaminating a 5%Contaminating a 10%

Robust GLM, Multiavarite caseContaminating a 5%Contaminating a 10%

Non parametric estimators

Conclusions

Research Lines

Page 41: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: bivariate case

−4 −3 −2 −1 0 1 2 3 4−3

−2

−1

0

1

2

3

4

Figure: Data

Page 42: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: bivariate case

−4 −3 −2 −1 0 1 2 3 4−3

−2

−1

0

1

2

3

4

Figure: deepest point

Page 43: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: bivariate case

−4 −3 −2 −1 0 1 2 3 4−3

−2

−1

0

1

2

3

4

Figure: Central Region 10%

Page 44: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: bivariate case

−4 −3 −2 −1 0 1 2 3 4−3

−2

−1

0

1

2

3

4

Figure: Central Region 30%

Page 45: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: bivariate case

−4 −3 −2 −1 0 1 2 3 4−3

−2

−1

0

1

2

3

4

Figure: Central Region 40%

Page 46: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: bivariate case

−4 −3 −2 −1 0 1 2 3 4−3

−2

−1

0

1

2

3

4

Figure: Central Region 50%

Page 47: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Robustness:a fast review

Central Region: bivariate case

Central Region: functional case

General Linear Model (GLM)

Robust GLM, with an explanatory variableContaminating a 5%Contaminating a 10%

Robust GLM, Multiavarite caseContaminating a 5%Contaminating a 10%

Non parametric estimators

Conclusions

Research Lines

Page 48: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: functional case

0 10 20 30 40 50 60 70−2

0

2

4

6

8

10

12

Figure: Data

Page 49: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: functional case

0 10 20 30 40 50 60 70−2

0

2

4

6

8

10

12

Figure: deepest curve

Page 50: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: functional case

0 10 20 30 40 50 60 70−2

0

2

4

6

8

10

12

Figure: Central Region 10%

Page 51: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: functional case

0 10 20 30 40 50 60 70−2

0

2

4

6

8

10

12

Figure: Central Region 20%

Page 52: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: functional case

0 10 20 30 40 50 60 70−2

0

2

4

6

8

10

12

Figure: Central Region 20%

Page 53: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: functional case

0 10 20 30 40 50 60 70−2

0

2

4

6

8

10

12

Figure: Central Region 50%

Page 54: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: functional case

0 10 20 30 40 50 60 70−2

0

2

4

6

8

10

12

Figure: Central Region , Extremes

Page 55: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: functional case

0 10 20 30 40 50 60 70−2

0

2

4

6

8

10

12

Figure: Data

Page 56: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: functional case

0 10 20 30 40 50 60 70−2

0

2

4

6

8

10

12

Figure: Contaminated Data

Page 57: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Central Region: functional case

0 10 20 30 40 50 60 70−2

0

2

4

6

8

10

12

Figure: Central Region , Extremes

Page 58: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Robustness:a fast review

Central Region: bivariate case

Central Region: functional case

General Linear Model (GLM)

Robust GLM, with an explanatory variableContaminating a 5%Contaminating a 10%

Robust GLM, Multiavarite caseContaminating a 5%Contaminating a 10%

Non parametric estimators

Conclusions

Research Lines

Page 59: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

General Linear Model

0 0.2 0.4 0.6 0.8 1−0.2

0

0.2

0.4

0.6

0.8

1

1.2

x

y

Page 60: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

General Linear Model

0 0.2 0.4 0.6 0.8 1−0.2

0

0.2

0.4

0.6

0.8

1

1.2

x

y

Page 61: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

General Linear Model

0 0.2 0.4 0.6 0.8 1−0.2

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

x

y

Page 62: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

General Linear Model

0 0.2 0.4 0.6 0.8 1−0.2

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

x

y

Page 63: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Robustness:a fast review

Central Region: bivariate case

Central Region: functional case

General Linear Model (GLM)

Robust GLM, with an explanatory variableContaminating a 5%Contaminating a 10%

Robust GLM, Multiavarite caseContaminating a 5%Contaminating a 10%

Non parametric estimators

Conclusions

Research Lines

Page 64: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable

0 20 40 60 80 100 1200

0.2

0.4

0.6

0.8

1onsets

0 20 40 60 80 100 120−2

0

2

4

6Explanatory Signal (HRF)

0 20 40 60 80 100 120−5

0

5

10Explained Signal (YfMRI)

Y = X +N(0, 1)

Page 65: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 5%

0 20 40 60 80 100 120−5

0

5

10Explained Signal (YfMRI)

0 20 40 60 80 100 120−5

0

5

10Points on the time where the signal is contaminated

0 20 40 60 80 100 120−20

−10

0

10Signal contaminated

The red points will be generated by:◮ Normal (0, σ2), σ2 varying from 1 to 100.

Page 66: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 5%

0 20 40 60 80 100 120−5

0

5

10Explained Signal (YfMRI)

0 20 40 60 80 100 120−5

0

5

10Points on the time where the signal is contaminated

0 20 40 60 80 100 120−10

0

10

20

30Signal contaminated

The red points will be generated by:◮ Normal (µ, σ2), both varying from 1 to 100.

Page 67: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 5%

0 20 40 60 80 100 120−5

0

5

10Explained Signal (YfMRI)

0 20 40 60 80 100 120−5

0

5

10Points on the time where the signal is contaminated

0 20 40 60 80 100 120−10

0

10

20Signal contaminated

The red points will be generated by:◮ Normal (µ, 1), µ varying from 1 to 100.

Page 68: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 5%

0 20 40 60 80 100 120−10

0

10

20

30T−value Variations

GLMRobustGLMT−value=2

0 20 40 60 80 100 120−2

0

2

4B1 Variations

GLMRobustGLMReal−Beta1=1.01

0 20 40 60 80 100 120−4

−2

0

2

4B0 Variations

GLMRobustGLMReal−Beta0=0.1

◮ Normal (0, σ2), σ2 varying from 1 to 100.◮ X-axis represents σ2 values varying from 1 to 100.

Page 69: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 5%

0 20 40 60 80 100 1200

10

20

30T−value Variations

GLMRobustGLMT−value=2

0 20 40 60 80 100 1200

2

4

6B1 Variations

GLMRobustGLMReal−Beta1

0 20 40 60 80 100 120−2

0

2

4

6B0 Variations

GLMRobustGLMReal−Beta0

◮ Normal (µ, σ2), both varying from 1 to 100.◮ X-axis represents [µ; σ2] values both varying from 1 to 100.

Page 70: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 5%

0 20 40 60 80 100 1200

10

20

30T−value Variations

GLMRobustGLMT−value=2

0 20 40 60 80 100 1200

1

2

3B1 Variations

GLMRobustGLMReal−Beta1

0 20 40 60 80 100 1200

1

2

3B0 Variations

GLMRobustGLMReal−Beta0

◮ Normal (µ, 1), µ varying from 1 to 100.◮ X-axis represents µ values varying from 1 to 100.

Page 71: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

No Significant variable and Contaminating a 5%

0 20 40 60 80 100 120−4

−2

0

2

4T−value Variations

GLMRobustGLMT−value=2

0 20 40 60 80 100 120−4

−2

0

2B1 Variations

GLMRobustGLMReal−Beta1

0 20 40 60 80 100 120−4

−2

0

2

4B0 Variations

GLMRobustGLMReal−Beta0

◮ Normal (0, σ2), σ2 varying from 1 to 100.◮ X-axis represents σ2 values varying from 1 to 100.

Page 72: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

No Significant variable and Contaminating a 5%

0 20 40 60 80 100 120−1

0

1

2

3T−value Variations

GLMRobustGLMT−value=2

0 20 40 60 80 100 120−2

0

2

4

6B1 Variations

GLMRobustGLMReal−Beta1

0 20 40 60 80 100 120−2

0

2

4

6B0 Variations

GLMRobustGLMReal−Beta0

◮ Normal (µ, σ2), both varying from 1 to 100.◮ X-axis represents [µ; σ2] values both varying from 1 to 100.

Page 73: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

No Significant variable and Contaminating a 5%

0 20 40 60 80 100 120−1

0

1

2T−value Variations

GLMRobustGLMT−value=2

0 20 40 60 80 100 120−1

0

1

2B1 Variations

GLMRobustGLMReal−Beta1

0 20 40 60 80 100 120−1

0

1

2

3B0 Variations

GLMRobustGLMReal−Beta0

◮ Normal (µ, 1), µ varying from 1 to 100.◮ X-axis represents µ values varying from 1 to 100.

Page 74: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 10%

0 20 40 60 80 100 120−5

0

5

10Explained Signal (YfMRI)

0 20 40 60 80 100 120−5

0

5

10Points on the time where the signal is contaminated

0 20 40 60 80 100 120−20

−10

0

10

20Signal contaminated

The red points will be generated by:◮ Normal (0, σ2), σ2 varying from 1 to 100.

Page 75: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 10%

0 20 40 60 80 100 120−5

0

5

10Explained Signal (YfMRI)

0 20 40 60 80 100 120−5

0

5

10Points on the time where the signal is contaminated

0 20 40 60 80 100 120−20

0

20

40Signal contaminated

The red points will be generated by:◮ Normal (µ, σ2), both varying from 1 to 100.

Page 76: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 10%

0 20 40 60 80 100 120−5

0

5

10Explained Signal (YfMRI)

0 20 40 60 80 100 120−5

0

5

10Points on the time where the signal is contaminated

0 20 40 60 80 100 120−5

0

5

10

15Signal contaminated

The red points will be generated by:◮ Normal (µ, 1), µ varying from 1 to 100.

Page 77: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 10%

0 20 40 60 80 100 120−10

0

10

20

30T−value Variations

GLMRobustGLMT−value=2

0 20 40 60 80 100 120−4

−2

0

2

4B1 Variations

GLMRobustGLMReal−Beta1=1.01

0 20 40 60 80 100 120−10

−5

0

5

10B0 Variations

GLMRobustGLMReal−Beta0=0.1

◮ Normal (0, σ2), σ2 varying from 1 to 100.◮ X-axis represents σ2 values varying from 1 to 100.

Page 78: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 10%

0 20 40 60 80 100 120−10

0

10

20

30T−value Variations

GLMRobustGLMT−value=2

0 20 40 60 80 100 120−4

−2

0

2

4B1 Variations

GLMRobustGLMReal−Beta1=1.01

0 20 40 60 80 100 120−10

0

10

20

30B0 Variations

GLMRobustGLMReal−Beta0=0.1

◮ Normal (µ, σ2), both varying from 1 to 100.◮ X-axis represents [µ; σ2] values both varying from 1 to 100.

Page 79: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 10%

0 20 40 60 80 100 120−10

0

10

20

30T−value Variations

GLMRobustGLMT−value=2

0 20 40 60 80 100 120−1

0

1

2B1 Variations

GLMRobustGLMReal−Beta1=1.01

0 20 40 60 80 100 1200

5

10

15B0 Variations

GLMRobustGLMReal−Beta0=0.1

◮ Normal (µ, 1), µ varying from 1 to 100.◮ X-axis represents µ values varying from 1 to 100.

Page 80: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

No Significant variable and Contaminating a 10%

0 20 40 60 80 100 120−4

−2

0

2T−value Variations

GLMRobustGLMT−value=2

0 20 40 60 80 100 120−4

−2

0

2B1 Variations

GLMRobustGLMReal−Beta1=−0.0381

0 20 40 60 80 100 120−10

−5

0

5

10B0 Variations

GLMRobustGLMReal−Beta0=0.01

◮ Normal (0, σ2), σ2 varying from 1 to 100.◮ X-axis represents σ2 values varying from 1 to 100.

Page 81: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

No Significant variable and Contaminating a 10%

0 20 40 60 80 100 120−4

−2

0

2T−value Variations

GLMRobustGLMT−value=2

0 20 40 60 80 100 120−4

−2

0

2B1 Variations

GLMRobustGLMReal−Beta1=−0.0381

0 20 40 60 80 100 120−10

0

10

20B0 Variations

GLMRobustGLMReal−Beta0=0.01

◮ Normal (µ, σ2), both varying from 1 to 100.◮ X-axis represents [µ; σ2] values both varying from 1 to 100.

Page 82: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

No Significant variable and Contaminating a 10%

0 20 40 60 80 100 120−2

−1

0

1

2T−value Variations

GLMRobustGLMT−value=2

0 20 40 60 80 100 120−2

−1

0

1B1 Variations

GLMRobustGLMReal−Beta1=−0.0381

0 20 40 60 80 100 1200

5

10

15B0 Variations

GLMRobustGLMReal−Beta0=0.01

◮ Normal (µ, 1), µ varying from 1 to 100.◮ X-axis represents µ values varying from 1 to 100.

Page 83: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Robustness:a fast review

Central Region: bivariate case

Central Region: functional case

General Linear Model (GLM)

Robust GLM, with an explanatory variableContaminating a 5%Contaminating a 10%

Robust GLM, Multiavarite caseContaminating a 5%Contaminating a 10%

Non parametric estimators

Conclusions

Research Lines

Page 84: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable

0 10 20 30 40 50 60 70 800

0.2

0.4

0.6

0.8

1onsets

0 10 20 30 40 50 60 70 80−2

0

2

4

6Explanatory Signal (HRF)

0 10 20 30 40 50 60 70 80−5

0

5

10Explained Signal (YfMRI)

Y = X1 + β2X2 + · · ·+ β6X6 +N(0, 1)

Page 85: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 5%

0 10 20 30 40 50 60 70 80−5

0

5

10Explained Signal (YfMRI)

0 10 20 30 40 50 60 70 80−5

0

5

10Points on the time where the signal is contaminated

0 10 20 30 40 50 60 70 80−5

0

5

10Signal contaminated

◮ Normal (0, σ2), σ2 varying from 1 to 100.

Page 86: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 5%

0 10 20 30 40 50 60 70 80−5

0

5

10Explained Signal (YfMRI)

0 10 20 30 40 50 60 70 80−5

0

5

10Points on the time where the signal is contaminated

0 10 20 30 40 50 60 70 80−5

0

5

10Signal contaminated

◮ Normal (µ, σ2), both varying from 1 to 100.

Page 87: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 5%

0 10 20 30 40 50 60 70 80−5

0

5

10Explained Signal (YfMRI)

0 10 20 30 40 50 60 70 80−5

0

5

10Points on the time where the signal is contaminated

0 10 20 30 40 50 60 70 80−5

0

5

10

15Signal contaminated

◮ Normal (µ, 1), µ varying from 1 to 100.

Page 88: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 5%

0 10 20 30 40 50 60 70 80 90 100−10

0

10

20

30T−value Variations

GLMRobustGLMMahaGLM

TrimingR2

T−value=2

0 10 20 30 40 50 60 70 80 90 100−1

0

1

2

3B1 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta1

0 10 20 30 40 50 60 70 80 90 100−10

−5

0

5

10B0 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta0

◮ Normal (0, σ2), σ2 varying from 1 to 100.◮ X-axis represents σ2 values varying from 1 to 100.

Page 89: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 5%

0 10 20 30 40 50 60 70 80 90 100−10

0

10

20

30T−value Variations

GLMRobustGLMMahaGLM

TrimingR2

T−value=2

0 10 20 30 40 50 60 70 80 90 100−1

0

1

2

3B1 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta1

0 10 20 30 40 50 60 70 80 90 100−5

0

5

10B0 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta0

◮ Normal (µ, σ2), both varying from 1 to 100.◮ X-axis represents [µ; σ2] values both varying from 1 to 100.

Page 90: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 5%

0 10 20 30 40 50 60 70 80 90 100−10

0

10

20

30T−value Variations

GLMRobustGLMMahaGLM

TrimingR2

T−value=2

0 10 20 30 40 50 60 70 80 90 1000.8

1

1.2

1.4

1.6B1 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta1

0 10 20 30 40 50 60 70 80 90 100−2

0

2

4B0 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta0

◮ Normal (µ, 1), µ varying from 1 to 100.◮ X-axis represents µ values varying from 1 to 100.

Page 91: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

No Significant variable and Contaminating a 5%

0 10 20 30 40 50 60 70 80 90 100−2

−1

0

1

2T−value Variations

GLMRobustGLMMahaGLM

TrimingR2

T−value=2

0 10 20 30 40 50 60 70 80 90 100−2

−1

0

1

2B1 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta1

0 10 20 30 40 50 60 70 80 90 100−10

−5

0

5

10B0 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta0

◮ Normal (0, σ2), σ2 varying from 1 to 100.◮ X-axis represents σ2 values varying from 1 to 100.

Page 92: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

No Significant variable and Contaminating a 5%

0 10 20 30 40 50 60 70 80 90 100−2

−1

0

1

2T−value Variations

GLMRobustGLMMahaGLM

TrimingR2

T−value=2

0 10 20 30 40 50 60 70 80 90 100−1

0

1

2B1 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta1

0 10 20 30 40 50 60 70 80 90 100−5

0

5

10

15B0 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta0

◮ Normal (µ, σ2), both varying from 1 to 100.◮ X-axis represents [µ; σ2] values both varying from 1 to 100.

Page 93: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

No Significant variable and Contaminating a 5%

0 10 20 30 40 50 60 70 80 90 100−2

−1

0

1

2T−value Variations

GLMRobustGLMMahaGLM

TrimingR2

T−value=2

0 10 20 30 40 50 60 70 80 90 100−0.2

0

0.2

0.4

0.6B1 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta1

0 10 20 30 40 50 60 70 80 90 100−2

0

2

4B0 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta0

◮ Normal (µ, 1), µ varying from 1 to 100.◮ X-axis represents µ values varying from 1 to 100.

Page 94: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 10%

0 10 20 30 40 50 60 70 80−5

0

5

10Explained Signal (YfMRI)

0 10 20 30 40 50 60 70 80−5

0

5

10Points on the time where the signal is contaminated

0 10 20 30 40 50 60 70 80−20

−10

0

10Signal contaminated

◮ Normal (0, σ2), σ2 varying from 1 to 100.

Page 95: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 10%

0 10 20 30 40 50 60 70 80−5

0

5

10Explained Signal (YfMRI)

0 10 20 30 40 50 60 70 80−5

0

5

10Points on the time where the signal is contaminated

0 10 20 30 40 50 60 70 80−5

0

5

10

15Signal contaminated

◮ Normal (µ, σ2), both varying from 1 to 100.

Page 96: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 10%

0 10 20 30 40 50 60 70 80−5

0

5

10Explained Signal (YfMRI)

0 10 20 30 40 50 60 70 80−5

0

5

10Points on the time where the signal is contaminated

0 10 20 30 40 50 60 70 80−5

0

5

10

15Signal contaminated

◮ Normal (µ, 1), µ varying from 1 to 100.

Page 97: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 10%

0 10 20 30 40 50 60 70 80 90 100−10

0

10

20

30T−value Variations

GLMRobustGLMMahaGLM

TrimingR2

T−value=2

0 10 20 30 40 50 60 70 80 90 100−5

0

5B1 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta1

0 10 20 30 40 50 60 70 80 90 100−10

−5

0

5

10B0 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta0

◮ Normal (0, σ2), σ2 varying from 1 to 100.◮ X-axis represents σ2 values varying from 1 to 100.

Page 98: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 10%

0 10 20 30 40 50 60 70 80 90 100−10

0

10

20

30T−value Variations

GLMRobustGLMMahaGLM

TrimingR2

T−value=2

0 10 20 30 40 50 60 70 80 90 1000

2

4

6

8B1 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta1

0 10 20 30 40 50 60 70 80 90 100−20

−10

0

10B0 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta0

◮ Normal (µ, σ2), both varying from 1 to 100.◮ X-axis represents [µ; σ2] values both varying from 1 to 100.

Page 99: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Significant variable and Contaminating a 10%

0 10 20 30 40 50 60 70 80 90 100−10

0

10

20

30T−value Variations

GLMRobustGLMMahaGLM

TrimingR2

T−value=2

0 10 20 30 40 50 60 70 80 90 1000

2

4

6B1 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta1

0 10 20 30 40 50 60 70 80 90 100−6

−4

−2

0

2B0 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta0

◮ Normal (µ, 1), µ varying from 1 to 100.◮ X-axis represents µ values varying from 1 to 100.

Page 100: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

No Significant variable and Contaminating a 10%

0 10 20 30 40 50 60 70 80 90 100−4

−2

0

2

4T−value Variations

GLMRobustGLMMahaGLM

TrimingR2

T−value=2

0 10 20 30 40 50 60 70 80 90 100−2

0

2

4B1 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta1

0 10 20 30 40 50 60 70 80 90 100−10

−5

0

5

10B0 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta0

◮ Normal (0, σ2), σ2 varying from 1 to 100.◮ X-axis represents σ2 values varying from 1 to 100.

Page 101: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

No Significant variable and Contaminating a 10%

0 10 20 30 40 50 60 70 80 90 100−2

0

2

4T−value Variations

GLMRobustGLMMahaGLM

TrimingR2

T−value=2

0 10 20 30 40 50 60 70 80 90 100−2

0

2

4

6B1 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta1

0 10 20 30 40 50 60 70 80 90 100−20

−10

0

10B0 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta0

◮ Normal (µ, σ2), both varying from 1 to 100.◮ X-axis represents [µ; σ2] values both varying from 1 to 100.

Page 102: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

No Significant variable and Contaminating a 10%

0 10 20 30 40 50 60 70 80 90 100−2

0

2

4T−value Variations

GLMRobustGLMMahaGLM

TrimingR2

T−value=2

0 10 20 30 40 50 60 70 80 90 1000

2

4

6B1 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta1

0 10 20 30 40 50 60 70 80 90 100−6

−4

−2

0

2B0 Variations

GLMRobustGLMMahaGLM

TrimingR2

Real−Beta0

◮ Normal (µ, 1), µ varying from 1 to 100.◮ X-axis represents µ values varying from 1 to 100.

Page 103: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Robustness:a fast review

Central Region: bivariate case

Central Region: functional case

General Linear Model (GLM)

Robust GLM, with an explanatory variableContaminating a 5%Contaminating a 10%

Robust GLM, Multiavarite caseContaminating a 5%Contaminating a 10%

Non parametric estimators

Conclusions

Research Lines

Page 104: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Non-parametric estimation for Σ

Pearson correlation coefficientThe Pearson correlation matrix is related to the covariancematrix by

ρp(i, j) =C(i, j)

C(i, i)C(j, j)(1)

Our methodology basically changes in (1) the Pearsoncorrelation coefficient ρp(i, j) by a non parametric correlationcoefficient as:

Page 105: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Non-parametric estimation for Σ

Pearson correlation coefficientThe Pearson correlation matrix is related to the covariancematrix by

ρp(i, j) =C(i, j)

C(i, i)C(j, j)(1)

Our methodology basically changes in (1) the Pearsoncorrelation coefficient ρp(i, j) by a non parametric correlationcoefficient as:

◮ Kendall ρk(i, j)

◮ Spearman ρs(i, j)

Page 106: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Non-parametric estimation for Σ

Pearson correlation coefficientThe Pearson correlation matrix is related to the covariancematrix by

ρp(i, j) =C(i, j)

C(i, i)C(j, j)(1)

Our methodology basically changes in (1) the Pearsoncorrelation coefficient ρp(i, j) by a non parametric correlationcoefficient as:

◮ Kendall ρk(i, j)

◮ Spearman ρs(i, j)

◮ Robust version ρr(i, j)

Page 107: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Kendall

ρk(i, j) =2 ∗ (Nc −Nd)

N(N − 1)

Spearman

ρs(i, j) = 1−6∑

d2in(n2

− 1)

Robust version

ρr(i, j) =Cr(i, j)

Cr(i, i)Cr(j, j)

where,

Cr(i, j) = median{(xi − median(xi))(xi − median(xi))}

Page 108: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Non-parametric estimation for Σ

0 0.2 0.4 0.6 0.8 1−0.2

−0.15

−0.1

−0.05

0

0.05

0.1

0.15

0.2

Figure: ρp = 0.81, ρk = ρs = ρr = 1

Page 109: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Non-parametric estimation for Σ

−4 −2 0 2 4 6 8 10 12−4

−2

0

2

4

6

8

10

12

Figure: ρp = 0.86, ρk = 0.15, ρs = 0.23, ρr = 0.005

Page 110: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Non-parametric estimation for Σ

−4 −2 0 2 4 6 8 10 12−4

−3

−2

−1

0

1

2

3

4

Figure: ρp = 0.61, ρk = 0.77, ρs = 0.89, ρr = 0.83

Page 111: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Non-parametric estimation for Σ

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Figure: ρp = 0.94, ρk = 0.85, ρs = 0.93, ρr = 0.92

Page 112: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Robustness:a fast review

Central Region: bivariate case

Central Region: functional case

General Linear Model (GLM)

Robust GLM, with an explanatory variableContaminating a 5%Contaminating a 10%

Robust GLM, Multiavarite caseContaminating a 5%Contaminating a 10%

Non parametric estimators

Conclusions

Research Lines

Page 113: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Conclusions

◮ We have introduced the fMRI problem and the importance ofusing robust statistical techniques to study the brain

Page 114: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Conclusions

◮ We have introduced the fMRI problem and the importance ofusing robust statistical techniques to study the brain

◮ A fast review of different approaches to estimate thecovariance matrix has been made

Page 115: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Conclusions

◮ We have introduced the fMRI problem and the importance ofusing robust statistical techniques to study the brain

◮ A fast review of different approaches to estimate thecovariance matrix has been made

◮ The GLM is quite sensitive in the presence of outliers

Page 116: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Conclusions

◮ We have introduced the fMRI problem and the importance ofusing robust statistical techniques to study the brain

◮ A fast review of different approaches to estimate thecovariance matrix has been made

◮ The GLM is quite sensitive in the presence of outliers◮ The fMRI based on GLM has very hight false discovery rate.

Page 117: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Conclusions

◮ We have introduced the fMRI problem and the importance ofusing robust statistical techniques to study the brain

◮ A fast review of different approaches to estimate thecovariance matrix has been made

◮ The GLM is quite sensitive in the presence of outliers◮ The fMRI based on GLM has very hight false discovery rate.◮ The estimations introduced here improve the performance

of GLM

Page 118: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Robustness:a fast review

Central Region: bivariate case

Central Region: functional case

General Linear Model (GLM)

Robust GLM, with an explanatory variableContaminating a 5%Contaminating a 10%

Robust GLM, Multiavarite caseContaminating a 5%Contaminating a 10%

Non parametric estimators

Conclusions

Research Lines

Page 119: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Research lines

◮ Find good estimations for the variance and covariancematrix and its properties

Page 120: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Research lines

◮ Find good estimations for the variance and covariancematrix and its properties

◮ Implement the estimations to the fMRI problem

Page 121: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Research lines

◮ Find good estimations for the variance and covariancematrix and its properties

◮ Implement the estimations to the fMRI problem◮ Find some criterium to know the optimal cut-off in the

robust estimations

Page 122: Robust and Nonparametric Statistical Tools for Big Data in ... · Example Table: Data from N(0,1). Contaminated data 0.354 -0.838 0.1204 -0.353 -0.3853 2.1261 -0.268-0.16 -1.038 -0.273

Research lines

◮ Find good estimations for the variance and covariancematrix and its properties

◮ Implement the estimations to the fMRI problem◮ Find some criterium to know the optimal cut-off in the

robust estimations◮ To design estimators with a good computational

performance in high dimensional data and big data