
Page 1:

Hardness of Learning Halfspaces with Noise

Prasad Raghavendra

Advisor

Venkatesan Guruswami

Page 2:

The Spam Problem

  10 Million | Lottery | Cheap | Pharmacy | Junk | Is Spam?
  -----------|---------|-------|----------|------|---------
  YES        | YES     | NO    | YES      | NO   | SPAM
  NO         | YES     | YES   | NO       | YES  | NOT SPAM
  YES        | YES     | YES   | YES      | YES  | SPAM
  NO         | NO      | NO    | YES      | YES  | SPAM
  YES        | NO      | YES   | NO       | YES  | NOT SPAM
  NO         | YES     | NO    | NO       | NO   | SPAM

PERCEPTRON

[Figure: a perceptron with inputs (1, 1, 0, 1, 0), weights (2, 3, 3, 1, 7), and threshold 3]

  2 × 1 + 3 × 1 + 3 × 0 + 1 × 1 + 7 × 0 = 6

  6 > 3, so output: SPAM
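This computation is easy to reproduce; a minimal Python sketch using exactly the slide's weights, inputs, and threshold:

    def perceptron(weights, threshold, features):
        """Weighted vote: output SPAM iff the weighted sum exceeds the threshold."""
        total = sum(w * x for w, x in zip(weights, features))
        return "SPAM" if total > threshold else "NOT SPAM"

    # The slide's example: 2*1 + 3*1 + 3*0 + 1*1 + 7*0 = 6, and 6 > 3.
    print(perceptron([2, 3, 3, 1, 7], 3, [1, 1, 0, 1, 0]))  # -> SPAM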

Page 3:

Halfspace Learning Problem

Input: training samples

Vectors: W1, W2, …, Wm ∈ {-1,1}^n

Labels: l1, l2, …, lm ∈ {-1,1}

[Figure: positive (+) and negative (−) examples in the X-Y plane, separated by a line]

Output: separating halfspace (A, θ)

  A ∙ Wi < θ if li = -1 (NOT SPAM)
  A ∙ Wi ≥ θ if li = 1 (SPAM)

θ : threshold
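As a sketch of the output condition above (the list-of-±1-vectors encoding of samples and labels is assumed for illustration):

    def separates(A, theta, samples, labels):
        """True iff A . Wi >= theta exactly for the samples with li = 1."""
        for W, l in zip(samples, labels):
            dot = sum(a * w for a, w in zip(A, W))
            if (dot >= theta) != (l == 1):
                return False
        return True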

Page 4:

Perspective

• Perceptron classifiers are the simplest neural networks, and are widely used for classification.

• Perceptron learning algorithms can learn a halfspace if the data is perfectly separable.

[Figure: SPAM (+) and NOT SPAM (−) examples separated by a line, with a few mislabeled points on the wrong side]

Page 5:

Inseparability

• Who said halfspaces can classify SPAM vs NOT SPAM?

  Data may be inherently inseparable: agnostic learning.

• Even if the data is separable, what about noise? Noise is inherent in many forms of data.

  PAC learning

Page 6:

In Presence of Noise

Agreement : fraction of the examples classified correctly

[Figure: + and − examples in the X-Y plane with a candidate halfspace; four of the points fall on the wrong side]

Classifies 16 of the 20 examples correctly: agreement = 0.8, i.e. 80%.

‘Find the hyperplane that maximizes the agreement with training examples’

The Halfspace Maximum Agreement (HSMA) problem
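A minimal sketch of the quantity HSMA maximizes, in the same hypothetical encoding as the earlier sketch:

    def agreement(A, theta, samples, labels):
        """Fraction of the examples that the halfspace (A, theta) labels correctly."""
        correct = sum(
            1 for W, l in zip(samples, labels)
            if (sum(a * w for a, w in zip(A, W)) >= theta) == (l == 1)
        )
        return correct / len(labels)

    # On the slide's data, 16 of 20 examples are correct: agreement == 0.8.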

Page 7:

Related Work : Positive Results

Random classification noise (each label flipped with probability less than 1/2):

• [Blum-Frieze-Kannan-Vempala 96]: a PAC learning algorithm that outputs a decision list of halfspaces.
• [Cohen 97]: a proper learning algorithm (it outputs a halfspace) for learning halfspaces.

Distribution of examples:

• [Kalai-Klivans-Mansour-Servedio 05]: an algorithm that finds a close-to-optimal halfspace when the examples come from the uniform or any log-concave distribution.

In contrast, we allow both arbitrary noise and arbitrary examples; the algorithms above each assume that one of the two is randomly generated.
Page 8:

Related Work : Negative Results

• [Amaldi-Kann 98, Ben-David-Eiron-Long 92]: HSMA is NP-hard to approximate within some constant factor (261/262 and 415/418, respectively)

• [Bshouty-Burroughs 02]: HSMA is NP-hard to approximate better than 84/85

• [Arora-Babai-Stern-Sweedyk 97, Amaldi-Kann 98]: it is NP-hard to minimize disagreements within a factor of 2^{O(log n)}


Page 9:

Open Problem

Given that 99.9% of the examples are correct:
• No algorithm is known that finds a halfspace with agreement 51%.
• No hardness result rules out achieving agreement 99%.

• Closing this gap was stated as an open problem by [Blum-Frieze-Kannan-Vempala 96].
• Highlighted in recent work by [Feldman 06] on the tight (1 - ε, 1/2 + δ) hardness of learning monomials.

Page 10:

Our Result

For any ε, δ > 0, given a set of training examples, it is NP-hard to distinguish between the following two cases:

• There is a halfspace with agreement 1 - ε.
• No halfspace has agreement greater than 1/2 + δ.

Even when 99.9% of the examples are non-noisy, the best we can do is to output a random/trivial halfspace!

Page 11:

Remarks

• [Feldman-Gopalan-Khot-Ponnuswami 06] independently showed a similar result.
  – Our hardness result holds even for boolean examples in {-1,1}^n (their result holds for R^n).
  – [Feldman et al.]'s hardness result gives stronger hardness in the sub-constant regime.

• We also show: given a system of linear equations over the integers that is (1 - ε)-satisfiable, it is NP-hard to find an assignment that satisfies more than a δ fraction of the equations.

Page 12:

Linear Inequalities

Let the halfspace be

a1 x1 + a2 x2 + … + an xn ≥ θ

Suppose W1 = (-1, 1, -1, 1) and l1 = 1.

Constraint: a1(-1) + a2(1) + a3(-1) + a4(1) ≥ θ

Learning a halfspace ≡ solving a system of linear inequalities

Unknowns: A = (a1, a2, a3, a4) and θ

a1 + a2 + a3 + a4 ≥ θ

a1 + a2 + a3 - a4 < θ

a1 + a2 - a3 + a4 < θ

a1 + a2 - a3 + a4 ≥ θ

a1 - a2 + a3 - a4 ≥ θ

a1 - a2 + a3 + a4 < θ

a1 + a2 - a3 - a4 < θ
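The translation is mechanical; a small sketch (the (coefficients, relation) pair used to encode a constraint is an illustrative assumption, not the thesis's notation):

    def to_inequalities(samples, labels):
        """Each sample (Wi, li) becomes one linear constraint on (a1..an, theta):
        the row (w1..wn, -1) encodes A . Wi - theta, compared against 0."""
        return [
            (list(W) + [-1], ">= 0" if l == 1 else "< 0")
            for W, l in zip(samples, labels)
        ]

    # The slide's sample W1 = (-1, 1, -1, 1), l1 = 1 yields
    # ([-1, 1, -1, 1, -1], '>= 0'), i.e. -a1 + a2 - a3 + a4 - theta >= 0.
    print(to_inequalities([(-1, 1, -1, 1)], [1]))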

Page 13:

Label Cover Problem

U, V : sets of vertices;  E : set of edges;  {1, 2, …, R} : set of labels;  πe : constraint on edge e

An assignment A satisfies an edge e = (u, v) ∈ E if

πe(A(u)) = A(v)

[Figure: bipartite graph with vertex sets U and V; u ∈ U and v ∈ V each choose a label from {1, 2, …, R}, and the projection πe constrains the pair]

Find an assignment A that satisfies the maximum number of edges.

[Figure: an example assignment; u gets label 3 and v gets label 2, and since πe(3) = 2 the edge is satisfied]
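A small sketch of the objective (the dictionary encodings of assignments and projections are assumptions for illustration):

    def satisfied_edges(assignment, edges, projections):
        """Count the edges e = (u, v) whose constraint pi_e(A(u)) = A(v) holds."""
        return sum(
            1 for e, (u, v) in enumerate(edges)
            if projections[e][assignment[u]] == assignment[v]
        )

    # The slide's example: u labeled 3, v labeled 2, and pi_e(3) = 2.
    print(satisfied_edges({"u": 3, "v": 2}, [("u", "v")], [{3: 2}]))  # -> 1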

Page 14:

Hardness of Label Cover

There exists γ > 0 such that, given a label cover instance Γ = (U, V, E, R, π), it is NP-hard to distinguish between:

• Γ is completely satisfiable
• no assignment satisfies more than a 1/R^γ fraction of the edges

[Raz 98]

Page 15:

a1 + a2 + a3 + a4 ≥ θ

a1 + a2 + a3 - a4 < θ

a1 + a2 - a3 + a4 < θ

a1 + a2 - a3 + a4 ≥ θ

a1 - a2 + a3 - a4 ≥ θ

a1 - a2 + a3 + a4 < θ

a1 + a2 - a3 - a4 < θ

Aim

[Figure: a label cover instance on U, V]

Variables: a1, a2, a3, a4, θ

Reduce a label cover instance (SATISFIABLE vs 1/R^γ-SATISFIABLE) to homogeneous inequalities with +1, -1 coefficients.

Page 16:

Variables

For each vertex u, R variables: u1, u2, …, uR


If u is assigned label k, then uk = 1 and uj = 0 for all j ≠ k. (The same symbol u is used for the vertex and its block of variables.)

Page 17:

Equation Tuples

All vertices are assigned exactly one label:

  For all u:  u1 + u2 + … + uR = 1
  For all u, v:  (u1 + u2 + … + uR) - (v1 + v2 + … + vR) = 0

For all constraints πe and all 1 ≤ k ≤ R:

  ∑ ui = vk   (summation over all i with πe(i) = k)

  e.g.  u1 – v1 = 0,  u2 + u3 – v2 = 0

Most of the variables are zero:

  Pick t variables ui at random and require ui = 0.

Over all the random choices, these equations form an EQUATION TUPLE.
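A sketch of the equations produced for a single edge (u, v). The {variable: coefficient} encoding, the equation-as-(dict, rhs) pair, and the parameter names are illustrative assumptions:

    import random

    def tuple_equations(u, v, pi, R, t, rng=random):
        """Equations for the edge (u, v); a variable is a pair (vertex, index)."""
        eqs = []
        # Every vertex picks exactly one label.
        eqs.append(({(u, i): 1 for i in range(R)}, 1))
        eqs.append(({(u, i): 1 for i in range(R)} | {(v, i): -1 for i in range(R)}, 0))
        # Projection constraints: the sum of u_i over {i : pi(i) = k} equals v_k.
        for k in range(R):
            eq = {(u, i): 1 for i in range(R) if pi[i] == k}
            eq[(v, k)] = -1
            eqs.append((eq, 0))
        # Sparsity checks: t randomly chosen variables are asserted to be zero.
        for i in rng.sample(range(R), t):
            eqs.append(({(u, i): 1}, 0))
        return eqs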

Page 18:

Equation Tuples

SATISFIABLE: there is an assignment that satisfies most of the equation tuples.

1/R^γ-SATISFIABLE: if u2 + u3 – v2 = 0 is an equation, then |u2 + u3 – v2| > ε (u1 + u2 + … + uR).

Scaling factor: u1 + u2 + … + uR

Page 19:

Next Step

  u1 – v1 = 0
  u2 + u3 – v2 = 0
  u1 + u2 + u3 – v1 – v2 – v3 = 0
  u1 = 0
  u3 + v1 – v2 = 0

From one unsatisfied equation to: most tuples have C equations that are not even approximately satisfied.

• Introduce several copies of the variables.
• Add consistency checks between the different copies of the same variable.

Each variable appears exactly once in a tuple, with coefficient +1 or -1.

Page 20:

Recap

SATISFIABLE: most tuples are completely satisfied.
1/R^γ-SATISFIABLE: most tuples have C equations that are not even approximately satisfied.

Each variable appears exactly once in a tuple, with coefficient +1 or -1.

Using linear inequalities, distinguish between a tuple that is completely satisfied and one in which at least C of the equations are not even approximately satisfied.

Page 21:

Observation:  A – B < 0  and  A + B ≥ 0   ⟹   B > 0  and  |A| < B

Pick one of the equation tuples at random, multiply each equation by a random ±1, and add:

  u1 – v1 = 0                              × (+1)
  u4 + u5 – v2 = 0                         × (+1)
  u6 + u2 + u7 – v4 – v5 – v6 = 0          × (-1)
  u3 = 0                                   × (+1)
  u8 + v3 – v7 = 0                         × (-1)

  = u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 – v1 – v2 – v3 + v4 + v5 + v6 + v7

With scaling factor u1 + u2 + … + uR, output the two inequalities:

  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 – v1 – v2 – v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0
  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 – v1 – v2 – v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0
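A sketch of this step, reusing the (dict, rhs) equation encoding from the earlier sketch. The actual reduction also uses fresh copies of the variables so that every coefficient of the final inequalities stays in {-1, +1}:

    import random

    def inequality_pair(equations, scale_vars, rng=random):
        """Add the tuple's equations with random signs into E, then assert
        |E| < S for the scaling factor S = u1 + ... + uR via the pair of
        constraints E - S < 0 and E + S >= 0."""
        E = {}
        for eq, _rhs in equations:
            sign = rng.choice((1, -1))
            for var, coef in eq.items():
                E[var] = E.get(var, 0) + sign * coef
        minus_S, plus_S = dict(E), dict(E)
        for s in scale_vars:
            minus_S[s] = minus_S.get(s, 0) - 1  # E - S < 0
            plus_S[s] = plus_S.get(s, 0) + 1    # E + S >= 0
        return minus_S, plus_S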

Page 22:

Good Case

  u1 – v1 = 0
  u2 + u3 – v2 = 0
  u1 + u2 + u3 – v1 – v2 – v3 = 0
  u1 = 0
  u3 + v1 – v2 = 0

With high probability over the choice of tuples, every equation holds, so the random combination satisfies

  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 – v1 – v2 – v3 + v4 + v5 + v6 + v7 = 0

The assignment also satisfies u1 + u2 + … + uR = 1, hence

  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 – v1 – v2 – v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0
  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 – v1 – v2 – v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0

BOTH INEQUALITIES SATISFIED

Page 23:

Bad Case

  u1 – v1,  u2 + u3 – v2,  u1 + u2 + u3 – v1 – v2 – v3,  u1,  u3 + v1 – v2

At least C of these left-hand sides exceed ε (u1 + u2 + … + uR) in absolute value.

For large enough C, with high probability over the choice of the +1/-1 combination,

  |u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 – v1 – v2 – v3 + v4 + v5 + v6 + v7| > (u1 + u2 + … + uR)

so with high probability over the choice of equation tuple, AT MOST ONE of the two inequalities

  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 – v1 – v2 – v3 + v4 + v5 + v6 + v7 - (u1 + u2 + … + uR) < 0
  u1 - u2 + u3 + u4 + u5 - u6 - u7 - u8 – v1 – v2 – v3 + v4 + v5 + v6 + v7 + (u1 + u2 + … + uR) ≥ 0

is satisfied.

Page 24:

Interesting Set of Vectors

All possible {-1,1} combinations form an exponentially large set. Construct a polynomial-size subset S of {-1,1}^n such that:

For any vector v = (v1, v2, …, vn) with sufficiently many large coordinates (> ε), at least a 1 - δ fraction of the vectors u ∈ S satisfy |u ∙ v| > 1.

Construction: a 4-wise independent family and a random grouping of coordinates.
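One textbook way to get a polynomial-size 4-wise independent family of sign vectors, sketched under the assumption n ≤ 255: evaluate a random degree-3 polynomial over GF(2^8) and keep the low bit of each value. (The thesis's construction additionally uses the random grouping of coordinates, omitted here.)

    def gf_mul(a, b, mod=0x11B, m=8):
        """Multiplication in GF(2^8), modulus x^8 + x^4 + x^3 + x + 1."""
        r = 0
        while b:
            if b & 1:
                r ^= a
            b >>= 1
            a <<= 1
            if a & (1 << m):
                a ^= mod
        return r

    def sign_vector(coeffs, n):
        """Evaluate the degree-3 polynomial coeffs at the points 1..n; the low
        bit of each value is a 4-wise independent unbiased bit, mapped to ±1."""
        signs = []
        for x in range(1, n + 1):       # n distinct field points
            acc = 0
            for c in reversed(coeffs):  # Horner's rule, XOR is field addition
                acc = gf_mul(acc, x) ^ c
            signs.append(1 if acc & 1 else -1)
        return signs

    # The family S: one vector per coefficient tuple (a0, a1, a2, a3), i.e.
    # q^4 vectors over a field of size q, polynomial rather than 2^n.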

Page 25:

Construction

[Figure: a vector (V1, …, V7) with four coordinates of magnitude > ε; the sign pattern (-1, 1, -1, 1, -1, 1, 1) gives |-V1 + V2 - V3 + V4 - V5 + V6 + V7| > 1]

Four-wise independent family: |u ∙ v| > 1 with some constant probability.
All 2^n combinations: probability close to 1.

Page 26:

Construction

[Figure: the coordinates V1, …, V103 are randomly grouped into blocks S1, S2, …, S10, …, and a 4-wise independent set supplies the ±1 signs; annotations read "4-wise independent set", "by independence of grouping", "by Chernoff bounds", and "all 2^n combinations"]
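The final probability bound is easy to sanity-check empirically: for a vector with many coordinates of magnitude > ε, even a uniformly random sign pattern gives |u ∙ v| > 1 with probability tending to 1, and the construction above derandomizes this using 4-wise independence and the grouping. A quick Monte Carlo sketch:

    import random

    def estimate_prob(v, trials=10000, rng=random):
        """Estimate Pr[|u . v| > 1] over uniformly random u in {-1, 1}^n."""
        hits = sum(
            1 for _ in range(trials)
            if abs(sum(rng.choice((1, -1)) * x for x in v)) > 1
        )
        return hits / trials

    # With many coordinates above eps = 0.2, the estimate approaches 1 as the
    # number of large coordinates grows.
    print(estimate_prob([0.2] * 2000))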

Page 27:

Conclusion

• Either an assumption on the distribution of the examples or on the noise is necessary for efficient halfspace learning algorithms.

• [Raghavendra-Venkatesan] A similar hardness result holds for learning support vector machines in the presence of adversarial noise.

Page 28:

THANK YOU

Page 29:

Details

• All possible {-1,1} sign combinations form an exponentially large set.
  Fix: a construction using a 4-wise independent family and a random grouping of coordinates.

• No variable should occur more than once in an equation tuple, to ensure that the final inequalities all have coefficients in {-1,1}.
  Fix: use different copies of the variables for different equations, with a careful choice of consistency checks.

Page 30:

Interesting Set of Vectors

All possible {-1,1} combinations form an exponentially large set. Construct a polynomial-size subset S of {-1,1}^n such that:

For any vector v = (v1, v2, …, vn) with sufficiently many large coordinates (> ε), at most a δ fraction of the vectors u ∈ S satisfy |u ∙ v| < 1.

Construction: a 4-wise independent family and a random grouping of coordinates.

Page 31:

Equation Tuple

  u1 – v1 = 0
  u2 + u3 – v2 = 0
  u1 + u2 + u3 – v1 – v2 – v3 = 0
  u1 = 0
  u3 + v1 – v2 = 0

ε-Satisfaction: an assignment A is said to ε-satisfy an equation tuple if it approximately satisfies every equation in the tuple, e.g.

  |u2 + u3 – v2| < ε (u1 + u2 + u3)
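A checker for this definition, in the same (dict, rhs) equation encoding as the earlier sketches (the encoding itself is an illustrative assumption):

    def eps_satisfies(assignment, equations, scale_vars, eps):
        """True iff every equation's left-hand side is within eps times the
        scaling factor (the sum of the u variables) of its right-hand side."""
        scale = sum(assignment[s] for s in scale_vars)
        return all(
            abs(sum(c * assignment[var] for var, c in eq.items()) - rhs) < eps * scale
            for eq, rhs in equations
        )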