1
Robust Regression Kush Bhatia, Prateek Jain and Purushottam Kar Microsoft Research, Bengaluru, INDIA Goal: Perform statistical analysis of large-scale data in the presence of unbounded adversarial corruptions Statistical Analysis with Corrupted Data Cartography Astronomy Face Detection Neuroimaging Regression Analysis Full Paper: http://tinyurl.com/robustreg Problem Formulation: Recover the original curve in the face of a bounded number of adversarial corruptions KEY IDEA Two unknowns: clean set of points , original curve Observation 1: given , finding original curve easy Observation 2: given , finding clean points easy Proposal: can we alternate between estimating and ? Some existing approaches Brute Force Try out all possible sets and estimate using each set Random Selection (RANSAC) Try random sets , estimate using each and choose best L 1 Relaxations TORRENT Thresholding Operator-based Robust RegrEssioN meThod 1. Start with any arbitrary curve and set timer 2. Repeat until convergence i. Create active set using points “close” to ii. Create updated* curve using active set iii. Increment time counter 3. Return final curve Theoretical Guarantees Theorem: TORRENT can resist even an all powerful adversary that corrupts after viewing data and the curve (if < 1/65 th fraction of data is corrupted) In contrast, RANSAC offers no guarantees, and L 1 relaxation based methods give very poor guarantees in the face of all-powerful adversaries TORRENT in action Real Data Experiments On regression analysis tasks, TORRENT is up to 20x faster than leading methods on low, as well as high dimensional data and can tolerate up to 40% corruption! On face recognition tasks, TORRENT is able to recover the correct identity of the person in the presence of as much as 70% corruption! Structured noise 70% S/P noise *4 update variants FC, GD , HYB for low-dimensional and HD for high-dimensional problems Res: 0 Err: 0% Res: 4.3 Err: 65% Res: 5.1 Err: 110% Res: 11.7 Err: 150% Ordinary Least Squares (OLS) Regression Discovers a curve that “best fits” the entire data Favorable cases Failure cases Exact recovery in noiseless case Faithful recovery with white noise Catastrophic failure under adversity Theorem: TORRENT converges to an -optimal solution i.e. after no more than iterations.

Full Paper: // · 2015-08-07 · 2. Repeat until convergence i. Create active set using points “close” to ii. Create updated* curve using active set iii. Increment time counter

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Full Paper: // · 2015-08-07 · 2. Repeat until convergence i. Create active set using points “close” to ii. Create updated* curve using active set iii. Increment time counter

Robust RegressionKush Bhatia, Prateek Jain and Purushottam Kar

Microsoft Research, Bengaluru, INDIA

Goal: Perform statistical analysis of large-scale data in thepresence of unbounded adversarial corruptions

Statistical Analysis with Corrupted Data

Cartography

Astronomy

Face Detection

Neuroimaging

RegressionAnalysis

Full Paper: http://tinyurl.com/robustreg

Problem Formulation: Recover the original curve in the face ofa bounded number of adversarial corruptions

KEY

ID

EATwo unknowns: clean set of points , original curve

Observation 1: given , finding original curve easy

Observation 2: given , finding clean points easy

Proposal: can we alternate between estimating and ?

Some existing approaches

Brute ForceTry out all possible sets and estimate using each set

Random Selection (RANSAC)

Try random sets , estimate using each and choose best

L1 Relaxations

TORRENT

Thresholding Operator-based Robust RegrEssioN meThod

1. Start with any arbitrary curve and set timer2. Repeat until convergence

i. Create active set using points “close” to ii. Create updated* curve using active set iii. Increment time counter

3. Return final curve

Theoretical Guarantees

Theorem: TORRENT can resist even an all powerful adversary that corrupts

after viewing data and the curve (if < 1/65th fraction of data is corrupted)

In contrast, RANSAC offers no guarantees, and L1 relaxation based methods

give very poor guarantees in the face of all-powerful adversaries

TORRENT in action

Real Data Experiments

On regression analysis tasks, TORRENT is up to 20x faster than leading methods on low, as well as high dimensional data and can tolerate up to 40% corruption!

On face recognition tasks, TORRENT is able to recover the correct identity of the person in the presence of as much as 70% corruption!

Structured noise 70% S/P noise

*4 update variants FC, GD, HYB for low-dimensional and HD for high-dimensional problems

Res: 0Err: 0%

Res: 4.3Err: 65%

Res: 5.1Err: 110%

Res: 11.7Err: 150%

Ordinary Least Squares (OLS) Regression

Discovers a curve that “best fits” the entire data

Favorable cases Failure cases

Exact recovery in noiseless case

Faithful recovery with white noise

Catastrophic failure under adversity

Theorem: TORRENT converges to an 𝜖-optimal solution i.e.

after no more than iterations.