On K-Means Cluster Preservation using Quantization Schemes

Deepak Turaga¹, Michalis Vlachos², Olivier Verscheure¹
¹IBM T.J. Watson Research Center, NY, USA
²IBM Zürich Research Laboratory, Switzerland


Page 1:

Deepak Turaga¹, Michalis Vlachos², Olivier Verscheure¹

¹IBM T.J. Watson Research Center, NY, USA
²IBM Zürich Research Laboratory, Switzerland

On K-Means Cluster Preservation using Quantization Schemes

Page 2:

overview – what we want to do…

• Examine under what conditions compression methodologies retain the clustering outcome
• We focus on the K-Means algorithm

[Figure: original data and quantized data are each run through k-Means (cluster 1, cluster 2, cluster 3), yielding identical clustering results.]

Pages 3–6:

why we want to do that…

• Reduced Storage – the quantized data will take up less space
• Faster execution – since the data can be represented in a more compact form, the clustering algorithm will require less runtime
• Anonymization/Privacy Preservation – the original values are not disclosed
• Authentication – encode some message with the quantization

We will achieve the above and still guarantee the same results.

Page 7:

other cluster preservation techniques

• We do not transform into another space
• Space requirements stay the same – no data simplification
• Shape preservation

[Oliveira04] S. R. M. Oliveira and O. R. Zaïane. Privacy Preservation When Sharing Data for Clustering. 2004.
[Parameswaran05] R. Parameswaran and D. Blough. A Robust Data Obfuscation Approach for Privacy Preservation of Clustered Data. 2005.

[Figure: an original object and its quantized version.]

Page 8:

K-Means Algorithm:

1. Initialize k cluster centers randomly (k is specified by the user).
2. Repeat until convergence:
   a. Assign each object to the nearest cluster center.
   b. Re-estimate the cluster centers.

k-means overview
[Figure: 2-D scatter plot of points being partitioned around cluster centers.]
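For concreteness, here is a minimal sketch of the iteration described above, in Python/numpy. It is illustrative, not the authors' code; empty clusters are not handled.

    import numpy as np

    def kmeans(X, k, max_iter=100, seed=0):
        """Plain k-Means; X has shape (n_objects, n_dims)."""
        rng = np.random.default_rng(seed)
        # 1. Initialize k cluster centers randomly (here: k distinct data points).
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # 2a. Assign each object to the nearest cluster center.
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # 2b. Re-estimate centers (assumes no cluster becomes empty).
            new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_centers, centers):  # converged
                break
            centers = new_centers
        return labels, centers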

Page 9:

k-means example
[Figure: nine snapshots of successive k-means iterations on a 2-D point set, showing the centers converging.]

Pages 10–11:

k-means applications/usage

• Fast pre-clustering
• Real-time clustering (e.g. image, video effects) – color/image segmentation

Page 12:

k-means objective function

• Objective: minimize the sum of intra-cluster variances

J = Σ_k Σ_{x ∈ C_k} ‖x − c_k‖², where c_k is the centroid of cluster C_k.

After some algebraic manipulation (substituting c_k = (1/N_k) Σ_{x ∈ C_k} x and expanding the square):

J = Σ_k Σ_d [ Σ_{x ∈ C_k} x_d² − (1/N_k) ( Σ_{x ∈ C_k} x_d )² ]

with k running over clusters and d over dimensions/time instances: the objective depends only on the 2nd moment (Σ x_d²) and the 1st moment (Σ x_d) of the objects in each cluster.
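A quick numerical check of this identity on synthetic data (illustrative, not from the slides): the direct intra-cluster variance form and the moment form of the objective agree.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 4))          # 100 objects, 4 dimensions/time instances
    labels = rng.integers(0, 3, size=100)  # some assignment into k = 3 clusters

    # Direct form: sum of squared distances to each cluster centroid.
    J_direct = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                   for c in range(3))

    # Moment form: per cluster and dimension, (sum of x^2) - (sum of x)^2 / N_c.
    J_moment = sum((X[labels == c] ** 2).sum()
                   - (X[labels == c].sum(axis=0) ** 2).sum() / (labels == c).sum()
                   for c in range(3))

    assert np.isclose(J_direct, J_moment)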

Page 13:

k-means objective function

So we can preserve the k-Means outcome if:

• we maintain the cluster assignment, and
• we preserve the 1st and 2nd moments of the cluster objects (per cluster and per dimension/time instance).

Page 14:

moment preserving quantization

• 1st moment: average
• 2nd (central) moment: variance
• 3rd moment: skewness
• 4th moment: kurtosis

Page 15:

In order to preserve the first and second moments we use the following two-level quantizer. For the N values of one dimension within one cluster, with mean μ and standard deviation σ, let N_b be the number of values at or below the mean and N_a = N − N_b the number above it. The two levels are:

ℓ = μ − σ·√(N_a / N_b)   (everything below the mean value is 'snapped' here)
h = μ + σ·√(N_b / N_a)   (everything above the mean value is 'snapped' here)

By construction, the quantized values have exactly the same mean and variance as the originals.
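A sketch of this two-level moment-preserving quantizer in Python; the tie convention (values exactly at the mean going to the low level) is my assumption. It reproduces the worked example on the next page.

    import numpy as np

    def mpq_levels(x):
        """Two-level moment-preserving quantizer for one dimension of one cluster.

        Values at or below the mean snap to the low level, values above it to
        the high level; the levels are chosen so the quantized data keep the
        same mean and (population) variance as x.
        """
        mu, sigma = x.mean(), x.std()      # population std (ddof=0)
        below = x <= mu                    # tie convention: mean goes low
        n_b, n_a = below.sum(), (~below).sum()
        low = mu - sigma * np.sqrt(n_a / n_b)
        high = mu + sigma * np.sqrt(n_b / n_a)
        return np.where(below, low, high), low, high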

Page 16:

The same quantizer on a concrete example, N = 20 values of one dimension:

original:
-2.4240, -0.2238, 0.0581, -0.4246, -0.2029, -1.5131, -1.1264, -0.8150, 0.3666, -0.5861, 1.5374, 0.1401, -1.8628, -0.4542, -0.6521, 0.1033, -0.2206, -0.2790, -0.7337, -0.0645
(average = -0.4689)

Everything below the mean value is 'snapped' to ℓ = -1.4795; everything above it to h = 0.2049:

quantized:
-1.4795, 0.2049, 0.2049, 0.2049, 0.2049, -1.4795, -1.4795, -1.4795, 0.2049, -1.4795, 0.2049, 0.2049, -1.4795, 0.2049, -1.4795, 0.2049, 0.2049, 0.2049, -1.4795, 0.2049
(average = -0.4689)

The mean (and the variance) of the sequence is unchanged.
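Running the mpq_levels sketch from Page 15 on these 20 values reproduces the numbers above:

    x = np.array([-2.4240, -0.2238,  0.0581, -0.4246, -0.2029,
                  -1.5131, -1.1264, -0.8150,  0.3666, -0.5861,
                   1.5374,  0.1401, -1.8628, -0.4542, -0.6521,
                   0.1033, -0.2206, -0.2790, -0.7337, -0.0645])
    q, low, high = mpq_levels(x)
    print(low, high)           # approx -1.4795 and 0.2049
    print(x.mean(), q.mean())  # both approx -0.4689
    print(x.var(), q.var())    # the variances match as well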

Page 17:

These are the points for one dimension d (or time instance d) and for one cluster of objects.

The process is repeated for all dimensions and for all clusters: we have one quantizer per class.

Page 18:

our quantization

• One quantizer per class
• The quantized data are binary

Page 19:

our quantization

• The fact that we have one quantizer per class means that we need to run k-Means once before we quantize.
• This is not a shortcoming of the technique: we need to know the cluster boundaries so that we know how much we can simplify the data (see the pipeline sketch below).
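Putting the pieces together, a sketch of the overall pipeline (function names are mine, reusing the kmeans and mpq_levels sketches above): run k-Means once on the original data X, then fit one quantizer per cluster and per dimension.

    def quantize_dataset(X, labels, k):
        """One MPQ quantizer per cluster and per dimension (1 bit per value)."""
        Xq = np.empty_like(X)
        for c in range(k):
            members = labels == c
            for d in range(X.shape[1]):
                Xq[members, d], _, _ = mpq_levels(X[members, d])
        return Xq

    labels, centers = kmeans(X, k=3)       # cluster the original data once
    Xq = quantize_dataset(X, labels, k=3)  # then quantize within each cluster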

Pages 20–21:

why quantization works

• Why does the clustering remain the same before and after quantization?
 – The centers do not change (the averages remain the same)
 – The cluster assignment does not change, because clusters 'shrink' due to quantization

Page 22:

will it always work?

• The results will be the same for datasets with well-formed clusters
• A discrepancy in the results means that the clusters were not that dense

Page 23:

recap

• Use moment-preserving quantization to preserve the objective function, which depends only on the per-cluster, per-dimension 1st and 2nd moments
• Due to cluster shrinkage, cluster assignments will not change
• Identical results for optimal k-Means
• One quantizer per class
• 1-bit quantizer per dimension

Pages 24–26:

example: shape preservation

[Bagnall06] A. J. Bagnall, C. A. Ratanamahatana, E. J. Keogh, S. Lonardi, and G. J. Janacek. A Bit Level Representation for Time Series Data Mining with Shape Based Similarity. Data Min. Knowl. Discov. 13(1):11–40, 2006.

Page 27:

example: cluster preservation

• 3 years of Nasdaq stock ticker data
• We cluster into k = 8 clusters

Confusion matrix (original vs. quantized clustering):
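The slides do not reproduce the stock data itself, but a confusion matrix of this kind can be computed as follows (a sketch; since k-Means cluster indices are arbitrary, rows and columns should first be matched, e.g. with the Hungarian algorithm):

    import numpy as np

    def confusion_matrix(labels_orig, labels_quant, k):
        """Entry (i, j): number of objects in cluster i of the original
        data and cluster j of the quantized data."""
        cm = np.zeros((k, k), dtype=int)
        for i, j in zip(labels_orig, labels_quant):
            cm[i, j] += 1
        return cm

    # With rows and columns matched, the mislabeled fraction is everything
    # off the diagonal: 1 - np.trace(cm) / cm.sum()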

Page 28:

[Figure: the k = 8 cluster centers (panels 1–8) for original and quantized data.]

3% mislabeled data after the moment-preserving quantization.
With binary clipping: 80% mislabeled.

Page 29:

quantization levels indicate cluster spread

[Figure: the two quantization levels ℓ and h per cluster; the gap between them reflects the cluster spread.]

Page 30:

example: label preservation

• 2 datasets – contours of fish, contours of leaves
• Clustering and then k-NN voting

Leaf species: Acer platanoides, Salix fragilis, Tilia, Quercus robur.

For rotation invariance we use rotation-invariant features.

[Figure: space-time and frequency representations of the contour features.]

Page 31:

example: label preservation

• Very low mislabeling error for MPQ

• High error rate for Binary Clipping

Page 32:
Pages 33–34:

other nice characteristics

• Low sensitivity to initial centers – the mismatch when starting from different centers is around 7%
• Neighborhood preservation, even though we are not directly optimizing for that – good results because we are preserving the 'shape' of the object

Page 35:

• Size reduction by a factor of 3 when using the quantized scheme
• The compression factor decreases as K increases
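A back-of-envelope sketch of why the reduction factor shrinks as K grows (the 32-bit-per-value accounting is my assumption, not the paper's exact measurement): each value costs one bit after quantization, but every (cluster, dimension) pair must also store its two levels.

    def size_reduction(n, d, k, bits_per_value=32):
        """Original size vs. quantized size: 1 bit per value plus two
        stored levels per (cluster, dimension) pair. Illustrative only."""
        original = n * d * bits_per_value
        quantized = n * d * 1 + 2 * k * d * bits_per_value
        return original / quantized

    # The 2*k*d*bits overhead grows with K, so the reduction factor shrinks.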

Page 36:

summary

• A 1-bit quantizer per dimension is sufficient to preserve k-Means 'as well as possible'
• Theoretically the results will be identical (under conditions)
• Good 'shape' preservation

Future work:
• Multi-bit quantization
• Multi-dimensional quantization

Page 37:

end..