36
Theil-Sen Estimator An alternative to least squares regression Srikanth K S talegari.wikidot.com [email protected] Document license: Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 1 / 14

Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Embed Size (px)

Citation preview

Page 1: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Theil-Sen EstimatorAn alternative to least squares regression

Srikanth K S

[email protected]

Document license: Creative Commons

Attribution-NonCommercial-ShareAlike 3.0 Unported License

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 1 / 14

Page 2: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

We discuss,

Basic idea of the classical Least Squares Regression in single predictor case.

Disadvantages of Least Squares Regression with practical problems.

Theil-Sen Estimator

Relative advantages of Theil-Sen Estimator over

Least Squares Regression .

An application: Using Theil-Sen Estimator as a outlier detection tool

Improvements over Theil-Sen Estimator

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 2 / 14

Page 3: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

We discuss,

Basic idea of the classical Least Squares Regression in single predictor case.

Disadvantages of Least Squares Regression with practical problems.

Theil-Sen Estimator

Relative advantages of Theil-Sen Estimator over

Least Squares Regression .

An application: Using Theil-Sen Estimator as a outlier detection tool

Improvements over Theil-Sen Estimator

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 2 / 14

Page 4: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

We discuss,

Basic idea of the classical Least Squares Regression in single predictor case.

Disadvantages of Least Squares Regression with practical problems.

Theil-Sen Estimator

Relative advantages of Theil-Sen Estimator over

Least Squares Regression .

An application: Using Theil-Sen Estimator as a outlier detection tool

Improvements over Theil-Sen Estimator

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 2 / 14

Page 5: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

We discuss,

Basic idea of the classical Least Squares Regression in single predictor case.

Disadvantages of Least Squares Regression with practical problems.

Theil-Sen Estimator

Relative advantages of Theil-Sen Estimator over

Least Squares Regression .

An application: Using Theil-Sen Estimator as a outlier detection tool

Improvements over Theil-Sen Estimator

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 2 / 14

Page 6: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

We discuss,

Basic idea of the classical Least Squares Regression in single predictor case.

Disadvantages of Least Squares Regression with practical problems.

Theil-Sen Estimator

Relative advantages of Theil-Sen Estimator over

Least Squares Regression .

An application: Using Theil-Sen Estimator as a outlier detection tool

Improvements over Theil-Sen Estimator

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 2 / 14

Page 7: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

We discuss,

Basic idea of the classical Least Squares Regression in single predictor case.

Disadvantages of Least Squares Regression with practical problems.

Theil-Sen Estimator

Relative advantages of Theil-Sen Estimator over

Least Squares Regression .

An application: Using Theil-Sen Estimator as a outlier detection tool

Improvements over Theil-Sen Estimator

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 2 / 14

Page 8: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

We discuss,

Basic idea of the classical Least Squares Regression in single predictor case.

Disadvantages of Least Squares Regression with practical problems.

Theil-Sen Estimator

Relative advantages of Theil-Sen Estimator over

Least Squares Regression .

An application: Using Theil-Sen Estimator as a outlier detection tool

Improvements over Theil-Sen Estimator

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 2 / 14

Page 9: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Least Squares Regression

We intend to fit a line to get a approximate idea of the trend.

Data −→ set of pairs (xi , yi ), 1 ≤ i ≤ n.

xi ’s −→ the predictor

yi ’s −→ the response.

We intend to find b0 and b1 such that yi := E (yi |xi ) = b0 + xib1 and∑ni=1(yi − yi )

2 (the residual) is minimum.

Standard optimization techniques yield,

b1 =SxySxx

=

∑(xi − x)(yi − y)∑

(xi − x)2, b0 = y − b1x

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 3 / 14

Page 10: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Least Squares Regression

We intend to fit a line to get a approximate idea of the trend.

Data −→ set of pairs (xi , yi ), 1 ≤ i ≤ n.

xi ’s −→ the predictor

yi ’s −→ the response.

We intend to find b0 and b1 such that yi := E (yi |xi ) = b0 + xib1 and∑ni=1(yi − yi )

2 (the residual) is minimum.

Standard optimization techniques yield,

b1 =SxySxx

=

∑(xi − x)(yi − y)∑

(xi − x)2, b0 = y − b1x

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 3 / 14

Page 11: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Least Squares Regression

We intend to fit a line to get a approximate idea of the trend.

Data −→ set of pairs (xi , yi ), 1 ≤ i ≤ n.

xi ’s −→ the predictor

yi ’s −→ the response.

We intend to find b0 and b1 such that yi := E (yi |xi ) = b0 + xib1 and∑ni=1(yi − yi )

2 (the residual) is minimum.

Standard optimization techniques yield,

b1 =SxySxx

=

∑(xi − x)(yi − y)∑

(xi − x)2, b0 = y − b1x

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 3 / 14

Page 12: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Some examples

(lsr07) (lsr11)

(lsrhetero)

lsr07 - This is a fairly good fit.

lsr11 - Outliers on the left top seem to have affected the fit.

lsrhetero - The the difference in the variance among different xi ’s has adverslyaffected the fit.

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 4 / 14

Page 13: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Some examples

(lsr07) (lsr11)

(lsrhetero)

lsr07 - This is a fairly good fit.

lsr11 - Outliers on the left top seem to have affected the fit.

lsrhetero - The the difference in the variance among different xi ’s has adverslyaffected the fit.

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 4 / 14

Page 14: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Least Squares Regression is sensitive to ...

1 Outliers affect the fit adversely. Any method based on mean is bound to beaffected by the outlier.

(lsr11) (lsr11mod)

2 Heteroscedasticity - If the variance among yi values corresponding todifferent xi values differ, fit is affected.

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 5 / 14

Page 15: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Least Squares Regression is sensitive to ...

1 Outliers affect the fit adversely. Any method based on mean is bound to beaffected by the outlier.

(lsr11) (lsr11mod)

2 Heteroscedasticity - If the variance among yi values corresponding todifferent xi values differ, fit is affected.

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 5 / 14

Page 16: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Least Squares Regression is sensitive to ...

1 Outliers affect the fit adversely. Any method based on mean is bound to beaffected by the outlier.

(lsr11) (lsr11mod)

2 Heteroscedasticity - If the variance among yi values corresponding todifferent xi values differ, fit is affected.

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 5 / 14

Page 17: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Theil-Sen Estimator

offers a non-parametric (distribution-free) method to find a fit for heteroscedasticdata with outliers1. It is named after Henri Theil(1950) and Pranab K.Sen(1968).

Computation: Let (x1, y1), . . . , (xn, yn) be the data points.

1 Find the slope of line connecting each pair of points2.

2 The median m among the slopes is the slope of the ‘fit’ line.The median ofthe set {yi −mxi | 1 ≤ i ≤ n} gives the intercept c the ‘fit’ line.

It competes well against non-robust least squares even for normallydistributed data in terms of statistical power. It has been called ”themost popular nonparametric technique for estimating a linear trend”(source:wikipedia).

1allows 29% corruption of the data2unless they have the same x-coordinate

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 6 / 14

Page 18: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Theil-Sen Estimator

offers a non-parametric (distribution-free) method to find a fit for heteroscedasticdata with outliers1. It is named after Henri Theil(1950) and Pranab K.Sen(1968).

Computation: Let (x1, y1), . . . , (xn, yn) be the data points.

1 Find the slope of line connecting each pair of points2.

2 The median m among the slopes is the slope of the ‘fit’ line.The median ofthe set {yi −mxi | 1 ≤ i ≤ n} gives the intercept c the ‘fit’ line.

It competes well against non-robust least squares even for normallydistributed data in terms of statistical power. It has been called ”themost popular nonparametric technique for estimating a linear trend”(source:wikipedia).

1allows 29% corruption of the data2unless they have the same x-coordinate

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 6 / 14

Page 19: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Comparing fits of Least Squares Regression and Theil-Sen Estimator

(07lsr) (11lsr)

(11lsrmod) (lsrhetero)

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 7 / 14

Page 20: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

A Summary of comparative advantages

Condition Least Squares regression Theil-Sen EstimatorPresence ofOutliers

Not great Handles well.

Heteroscedasticity Not great Handles Well.

RobustnessEven the presence of asingle outlier affects the fit.

allows 29% data corruption. That is,with a sufficiently large sample size,about 29% of the points must bealtered to make the estimatearbitrarily large or small.

Confidence interval(95%)

Based on Normal or student’s T.This might not give an accurate CI.

The middle 95% of the slopesform the CI

Complexity O(n)O(n2), althoughrandomized algorithms reduce itto O(n log n)

Asymptotic efficiency(sample size for areasonable fit)

Not great. High.

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 8 / 14

Page 21: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Theil-Sen Estimator to detect outliers and fit

Detecting outliers in a 2D (and higher) non-trivial task. We use

Theil-Sen Estimator ’s insensitivity to outliers to obtain a two-stepped process toremove outliers and get a better fit.

1 Identify points that lie at a large3 distance from the theil-sen line as outliers.

2 Compute Theil-Sen Estimator (or Least Squares Regression ) after

removing this data.

3farther than median + interquantilerangeSrikanth K S (talegari.wikidot.com) Theil-Sen Estimator 9 / 14

Page 22: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Theil-Sen Estimator to detect outliers and fit

Detecting outliers in a 2D (and higher) non-trivial task. We use

Theil-Sen Estimator ’s insensitivity to outliers to obtain a two-stepped process toremove outliers and get a better fit.

1 Identify points that lie at a large3 distance from the theil-sen line as outliers.

2 Compute Theil-Sen Estimator (or Least Squares Regression ) after

removing this data.

3farther than median + interquantilerangeSrikanth K S (talegari.wikidot.com) Theil-Sen Estimator 9 / 14

Page 23: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Theil-Sen Estimator to detect outliers and fit

Detecting outliers in a 2D (and higher) non-trivial task. We use

Theil-Sen Estimator ’s insensitivity to outliers to obtain a two-stepped process toremove outliers and get a better fit.

1 Identify points that lie at a large3 distance from the theil-sen line as outliers.

2 Compute Theil-Sen Estimator (or Least Squares Regression ) after

removing this data.

3farther than median + interquantilerangeSrikanth K S (talegari.wikidot.com) Theil-Sen Estimator 9 / 14

Page 24: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

lsr11mod

(lsr11mod)

- - - - Theil-Sen line before outlier removal —– Theil-Sen line after outlier removal Outlier Points

- - - - Least squares line before outlier removal —– Least squares line after outlier removal

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 10 / 14

Page 25: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

lsr02

(lsr02)

- - - - Theil-Sen line before outlier removal —– Theil-Sen line after outlier removal Outlier Points

- - - - Least squares line before outlier removal —– Least squares line after outlier removal

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 11 / 14

Page 26: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Code

R users can use the package mblm by Lukasz Komsta.

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 12 / 14

Page 27: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Code

R users can use the package mblm by Lukasz Komsta.

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 12 / 14

Page 28: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Code

R users can use the package mblm by Lukasz Komsta.

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 12 / 14

Page 29: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Code

R users can use the package mblm by Lukasz Komsta.

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 12 / 14

Page 30: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Code

R users can use the package mblm by Lukasz Komsta.

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 12 / 14

Page 31: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Improvements over Theil-Sen Estimator

A variation of the TheilSen estimator due to Siegel (1982) determines, foreach sample point, the median mi of the slopes of lines through that point,and then determines the overall estimator as the median of these medians. Ahigher breakdown point, 50%, holds for the repeated median estimator ofSiegel.

(hetero)

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 13 / 14

Page 32: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Improvements over Theil-Sen Estimator

A variation of the TheilSen estimator due to Siegel (1982) determines, foreach sample point, the median mi of the slopes of lines through that point,and then determines the overall estimator as the median of these medians. Ahigher breakdown point, 50%, holds for the repeated median estimator ofSiegel.

(hetero)

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 13 / 14

Page 33: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

A different variant pairs up sample points by the rank of their x-coordinates(the point with the smallest coordinate being paired with the first pointabove the median coordinate, etc.) and computes the median of the slopes ofthe lines determined by these pairs of points.

Variations of the Theil-Sen estimator based on weighted medians have alsobeen studied, based on the principle that pairs of samples whosex-coordinates differ more greatly are more likely to have an accurate slopeand therefore should receive a higher weight.

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 14 / 14

Page 34: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

A different variant pairs up sample points by the rank of their x-coordinates(the point with the smallest coordinate being paired with the first pointabove the median coordinate, etc.) and computes the median of the slopes ofthe lines determined by these pairs of points.

Variations of the Theil-Sen estimator based on weighted medians have alsobeen studied, based on the principle that pairs of samples whosex-coordinates differ more greatly are more likely to have an accurate slopeand therefore should receive a higher weight.

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 14 / 14

Page 35: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

References

1 Wikipedia Page(http://en.wikipedia.org/wiki/Theil%E2%80%93Sen estimator)

2 Some results on extensions and modications of the TheilSen regressionestimator - Rand R. Wilcox (British Journal of Mathematical and StatisticalPsychology (2004), 57, 265280)

3 The Theil-Sen Estimators In Linear Regression - Hanxiang Peng

4 Basic Statistics: Understanding Conventional Methods and Modern Insights -Rand Wilcox

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 15 / 14

Page 36: Theil-Sen Estimator An alternative to least squares regressiontalegari.wdfiles.com/local--files/files:_nothing/theil_free.pdf · Theil-Sen Estimator An alternative to least squares

Thank you.

The presentation is available athttps://sites.google.com/a/cmrit.ac.in/srikanth/talks

Srikanth K S (talegari.wikidot.com) Theil-Sen Estimator 16 / 14