60
On the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur (Stanford University)

On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

On the Worst-Case Complexity of the k-Means Method

Sergei Vassilvitskii

David Arthur

(Stanford University)

Page 2: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Given points in split them into similar groups.

Clustering

2

Rd

n k

Page 3: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Clustering Objectives

k-Center:

3

Let be the closest cluster center to . C(x)

min maxx!X

!x " C(x)!

x

Page 4: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Clustering Objectives

k-Median:

4

Let be the closest cluster center to . C(x) x

min!

x!X

!x " C(x)!

Page 5: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Clustering Objectives

k-Median Squared:

5

Let be the closest cluster center to . C(x) x

min!

x!X

!x " C(x)!2

Much more sensitive to outliers

Page 6: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Lloyd’s Method: K-means

Initialize with random clusters

6

Page 7: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Lloyd’s Method: K-means

Assign each point to nearest center

7

Page 8: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Lloyd’s Method: K-means

Recompute optimum centers (means)

8

Page 9: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Lloyd’s Method: K-means

Repeat: Assign points to nearest center

9

Page 10: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Lloyd’s Method: K-means

Repeat: Recompute centers

10

Page 11: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Lloyd’s Method: K-means

Repeat...

11

Page 12: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Lloyd’s Method: K-means

Repeat...Until clustering does not change

12

Page 13: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Analysis

How good is this algorithm?

Finds a local optimum

13

That’s arbitrarily worse than optimal solution

Page 14: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Analysis

How fast is this algorithm?

In practice: VERY fast:

e.g. Digit Recognition dataset

with

Converges after 60 iterations

In theory: Stay Tuned.

14

n = 60, 000, d = 700

Page 15: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Previous Work

Lower Bounds

on the line

in the plane

Upper Bounds:

on the line, spread

Exponential bounds:

15

!(n2)

!(n)

O(n!2) ! =maxx,y !x " y!

minx,y !x " y!

O(kn),O(nkd)

Page 16: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Our Results

Lower Bound: 2!(

!

n)

16

Smoothed Upper Bounds:

is the smoothness factor

Diameter of the pointset

!

D

O

!

n2+2/d

"

D

!

#2

22n/d

$

O

!

nk+2/d

"

D

!

#2$

Page 17: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Rest of the Talk

Lower Bound Sketch

Upper Bound Sketch

Open Problems

17

Page 18: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Lower Bound

General Idea:

Make a “Reset Widget”:

If k-Means takes time on , create a new point set , s.t. k-Means takes time to terminate.

18

t X

X!

2t

Page 19: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Lower Bound: Sketch

Initial Clustering:

t steps

19

Page 20: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Lower Bound: Sketch

With Widget

t steps reset t steps

Page 21: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Lower Bound Details

Three Main Ideas:

Signaling - Recognizing when to start flipping the switch

Resetting - Setting the cluster centers back to original position

Odds & Ends - Clean-up to make the process recursive

21

Page 22: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Signaling

Suppose that when k-Means terminates, there is one cluster center that has never appeared before. We use this as a signal to start the reset sequence.

22

p

Page 23: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Signaling

Suppose that when k-Means terminates, there is one cluster center that has never appeared before. We use this as a signal to start the reset sequence.

23

p

Page 24: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Signaling

Suppose that when k-Means terminates, there is one cluster center that has never appeared before. We use this as a signal to start the reset sequence.

24

p

d ! ε

dBy setting we can control exactly when will switch.

ε

p

Page 25: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Signaling

Suppose that when k-Means terminates, there is one cluster center that has never appeared before. We use this as a signal to start the reset sequence.

25

p

Page 26: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Resetting

k properly placed points can reset the positions of the k current centers

Easy to compute locations of the reset points, so that new cluster centers are placed correctly:

26

current center

intended center

Page 27: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Resetting

k properly placed points can reset the positions of the k current centers

Easy to compute locations of the reset points, so that new cluster centers are placed correctly:

27

current center

intended center

add to clusterto reset mean

Page 28: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Resetting

k properly placed points can reset the positions of the k current centers

Easy to compute locations of the reset points, so that new cluster centers are placed correctly:

28

new center

Page 29: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Resetting

Easy to compute locations of the reset points, so that new cluster centers are placed correctly:

But must avoid accidentally grabbing other points

29

Page 30: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Resetting

Solution: Add two new dimensions

30

x

y

w

z

intended new position

center to reset

Page 31: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Resetting

Solution: Add two new dimensions

31

x

y

w

z

new point added

Page 32: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Resetting

Solution: Add two new dimensions

32

x

y

w

z

new point added

Page 33: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Multi-Signaling

So far have shown how to signal and reset a single cluster. Can use one signal to induce a signal from all clusters.

33

p

Page 34: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Multi-Signaling

All centers are stable before the main signaling has taken place.

34

p

d

d + !

Page 35: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Multi-Signaling

All centers are stable before the main signaling has taken place.

35

p

d

d + !

Page 36: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Multi-Signaling

Due to signaling the center moves away, now all centers absorb points above.

36

p

d + !

>d +

!

Page 37: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Multi-Signaling

Due to signaling the center q moves away, now all centers absorb points above. All clusters have previously unseen centers.

37

p

Page 38: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Put All Pieces Together

Start with a signaling configuration

Transform it, so that all clusters signal

Use the new signal to reset cluster centers (and therefore double the runtime of k-Means)

Ensure the new configuration is signaling

Repeat...

38

Page 39: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Construction in Pictures

Construction:

39Reflected Points

Page 40: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Construction in Pictures

After steps - signal by all clusters

40

t

Page 41: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Construction in Pictures

Main Clusters absorb “catalyst” points. Yellow centers move away

41

Page 42: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Construction in Pictures

The new points added are “reset” points - resetting the original cluster centers.

42

Page 43: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Construction in Pictures

Can ensure “catalyst” points leave the main clusters

43

Page 44: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Construction in Pictures

k-Means runs for another steps. The original centers will be signaling.

44

t

Page 45: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Construction Results

If we repeat the reseting widget construction times:

points in dimensions

clusters

Total running time:

45

r

O(r)

O(r2) O(r)

2!(r)

Page 46: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Construction Remarks

Currently construction has very large spread

Can use more trickery to decrease the spread to be constant, albeit with a blow up in the dimension.

As presented requires specific placement of initial cluster centers, in practice centers chosen randomly from points.

Can make construction work even in this case

Open question:

Can we decrease the dimensionality to constant d?

46

Page 47: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Outline

k-Means Intuition

Lower Bound Sketch

Upper Bound Sketch

Open Problems

47

Page 48: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Smoothed Analysis

Assume each point came from a Gaussian distribution with variance .

Data collection is inherently noisy

Or add some Gaussian noise (effect on final clustering is minimal)

48

!2

Key Fact: Probability mass inside any ball of radius is at most .

!

(!/")d

Page 49: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Potential Function

Use a potential function:

Original Potential at most

Potential decreases every step.

Reassignment reduces

Center recomputation finds optimal for the given partition

49

!(C) =!

x!X

!x " C(x)!2

nD2

x ! C(X)

!

Page 50: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Potential Decrease

Lemma Let be a pointset with optimal center and be any other point then:

50

cS c!

!(c) ! !(c!) = |S|"c ! c!"2

!(c) =!

x!S

(x ! c) · (x ! c)

=∑

x!S

(x ! c") · (x ! c

") + (c ! c") · (c ! c

") + 2(c ! c") · (x ! c

")

= !(c!) + |S|‖c − c!‖ + 2(c − c

!) ·∑

x"S

(x − c!)

=!

x∈S

(x ! c + c∗! c

∗) · (x ! c + c∗! c

∗)

Page 51: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Main Lemma

In a smoothed pointset, fix an . Then with probability at

least for any two clusters and with

optimal centers and we have that:

51

S T

!c(S) " c(T )! #!

2 min(|S|, |T |)

1 ! 22n

!

!

"

"d

! > 0

c(S) c(T )

Page 52: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Proof Sketch

Suppose and . Fix all points except for .

52

|S| < |T | x, x ! S, x "! T

x

To ensure , must lie in a ball of diameter .

x!c(S) " c(T )! # !

|S|!

Since came from a Gaussian of variance this probability is at most .

x

(|S|!"!1)d

!2

Finally, union bound the total error probability over all possible pairs of sets - .22n(!/")d

Page 53: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Potential Drop

At each iteration, examine a cluster whose center changed from to :

Therefore, the potential drops by

After iterations, the algorithm must terminate.

53

S

c c!

!c " c!! #

!

|S|

|S|!2

|S|2!

!2

4n

m =4n2D2

!2

Page 54: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

To Finish Up:

Chose . Then the total probability of

failure is: .

The total running time is

Remark: polynomial for

54

! = "n!

1

d 2!

2n

d

22n(!/")d = 1/n

d = !(n

log n)

m =4n2D2

!2= O

!

n2+2/d

"

D

"

#2

22n/d

$

Page 55: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Upper bound (2)

We used the union bound over all possible sets. However,

due to the geometry, not all sets arise. The total number

of distinct clusters that can appear is .

Carrying the same calculations through we can bound the total number of iterations as:

55

O(nkd)

O

!

nk+2/d

"

D

!

#2$

Page 56: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Remarks

The noise need not be Gaussian, need to avoid large probabilistic point masses.

e.g. Lipschitz conditions are enough.

56

Page 57: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Outline

k-Means Intuition

Lower Bound Sketch

Upper Bound Sketch

Open Problems

57

Page 58: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Conclusion - Lower Bounds

Showed super-polynomial lower bound on the execution time of k-Means:

However - construction requires many dimensions, does not preclude an upper bound

58

O(nd)

Page 59: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Conclusion - Upper Bounds

Can use smoothed analysis to reduce the best known upper bounds for k-Means.

But, is not polynomial for small values of d, or large values of k.

Even with smoothness there is an lower bound, which is never observed in practice.

59

!(n)

Page 60: On the Worst-Case Complexity of the k-Means …theory.stanford.edu/~sergei/slides/kMeans-hour.pdfOn the Worst-Case Complexity of the k-Means Method Sergei Vassilvitskii David Arthur

Thank you

Any Questions?