
Data-driven Clustering via Parameterized Lloyd's Families

Travis Dick
Joint work with Maria-Florina Balcan and Colin White

Carnegie Mellon University
NeurIPS 2018


Data-driven Clustering

• Clustering aims to divide a dataset into self-similar clusters.
• Goal: find some unknown natural clustering.
• However, most clustering algorithms minimize a clustering cost function.
• Hope: low-cost clusterings recover the natural clusters.
• There are many algorithms and many objectives.

How do we choose the best algorithm for a specific application? Can we automate this process?


Learning Model

• An unknown distribution 𝒟 over clustering instances.
• Given a sample S₁, …, S_N ∼ 𝒟, each annotated with its target clustering.
• Find an algorithm A that produces clusterings similar to the target clusterings.
• Want A to also work well for new instances from 𝒟!
• In this work:
  1. Introduce a large parametric family of clustering algorithms, (α, β)-Lloyds.
  2. Efficient procedures for finding the best parameters on a sample.
  3. Generalization: optimal parameters on the sample are nearly optimal on 𝒟.
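To make the learning objective concrete, here is a minimal sketch of how one might score an algorithm's output against a target clustering on a sampled instance. The slides only say the produced clustering should be "similar" to the target; the simple majority-label matching loss below and the helper name clustering_agreement are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np
from collections import Counter

def clustering_agreement(pred_labels, target_labels):
    """Fraction of points whose predicted cluster's majority target label
    matches their own target label. A simple stand-in for the distance
    between clusterings used in the paper."""
    pred_labels = np.asarray(pred_labels)
    target_labels = np.asarray(target_labels)
    correct = 0
    for c in np.unique(pred_labels):
        members = target_labels[pred_labels == c]
        # Credit every point in this cluster that shares the majority target label.
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(target_labels)
```

Tuning then means choosing the parameters whose produced clusterings score highest on average over the sampled, annotated instances.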


Lloyd's Method

• Maintains k centers c₁, …, c_k that define the clusters.
• Performs local search to improve the k-means cost of the centers:
  1. Assign each point to its nearest center.
  2. Update each center to be the mean of its assigned points.
  3. Repeat until convergence.
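A minimal sketch of the local search loop described above, assuming Euclidean data stored as a NumPy array; the function name lloyds_method, the iteration cap, and the convergence tolerance are illustrative choices rather than anything specified on the slides.

```python
import numpy as np

def lloyds_method(X, centers, max_iters=100, tol=1e-8):
    """Run Lloyd's local search from the given initial centers.

    X: (n, d) array of points; centers: (k, d) array of initial centers.
    Returns the final centers and each point's cluster assignment.
    """
    centers = centers.copy()
    for _ in range(max_iters):
        # 1. Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # 2. Update each center to be the mean of its assigned points.
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(len(centers))
        ])
        # 3. Repeat until convergence (centers stop moving).
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, assign
```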


Initial Centers are Important!

• Lloyd's method can get stuck if the initial centers are chosen poorly.
• Initialization is a well-studied problem with many proposed procedures (e.g., k-means++).
• The best method depends on the properties of the clustering instances.

The (", $)-Lloyds Family

The (", $)-Lloyds FamilyInitialization: Parameter "

The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)

The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)• Choose initial centers from dataset * randomly.

The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)• Choose initial centers from dataset * randomly.• Probability that point + ∈ * is center -. is proportional to & +, -/, … , -.1/ '.

The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)

" = 0: random initialization

• Choose initial centers from dataset , randomly.• Probability that point - ∈ , is center /0 is proportional to & -, /1, … , /031 '.

The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)

" = 0: random initialization " = 2: )-means++

• Choose initial centers from dataset - randomly.• Probability that point . ∈ - is center 01 is proportional to & ., 02, … , 0142 '.

The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)

" = 0: random initialization " = 2: )-means++ " = ∞: farthest first

• Choose initial centers from dataset . randomly.• Probability that point / ∈ . is center 12 is proportional to & /, 13, … , 1253 '.

The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)

" = 0: random initialization " = 2: )-means++ " = ∞: farthest first

Local search: Second parameter $ tweaks the local search. Details in paper.

• Choose initial centers from dataset . randomly.• Probability that point / ∈ . is center 12 is proportional to & /, 13, … , 1253 '.

The (", $)-Lloyds FamilyInitialization: Parameter "• Use &'-sampling (generalizing &(-sampling of )-means++)

" = 0: random initialization " = 2: )-means++ " = ∞: farthest first

Local search: Second parameter $ tweaks the local search. Details in paper.

Question: For a distribution . over tasks, what parameters give best performance?

• Choose initial centers from dataset / randomly.• Probability that point 0 ∈ / is center 23 is proportional to & 0, 24, … , 2364 '.
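Here is a minimal sketch of d^α-sampling as described above, again assuming Euclidean NumPy data; the function name d_alpha_seeding is an illustrative choice. With alpha=0 every point is equally likely (random initialization), alpha=2 recovers k-means++ seeding, and very large alpha approaches farthest-first traversal.

```python
import numpy as np

def d_alpha_seeding(X, k, alpha, rng=None):
    """Choose k initial centers from X by d^alpha-sampling.

    Each new center is a data point sampled with probability proportional
    to (distance to its nearest already-chosen center) ** alpha.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    # First center: uniformly at random.
    centers = [X[rng.integers(n)]]
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen center.
        dists = np.min(
            np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2),
            axis=1,
        )
        weights = dists ** alpha
        if weights.sum() == 0:  # every point coincides with a chosen center
            probs = np.full(n, 1.0 / n)
        else:
            probs = weights / weights.sum()
        centers.append(X[rng.choice(n, p=probs)])
    return np.array(centers)
```

Chaining this seeding with the lloyds_method sketch above gives one member of the family for a fixed α (the β-modified local search step is not shown here).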


Results

Efficient Tuning on a Sample:
• An efficient algorithm for finding the parameters with the best agreement to the targets on the sample.
• "It is algorithmically feasible to tune the parameters on a sample."

Generalization Guarantee:
• Analyze the intrinsic complexity of the (α, β)-Lloyds family.
• Show that roughly Õ(k log n / ε²) clustering instances suffice to ensure that the empirical cost of every parameter setting is within ε of its expected cost.
• "Parameters tuned on the sample will work well for new instances!"

Experiments: Evaluate the (α, β)-Lloyds family on real and synthetic data: CIFAR-10, MNIST, Mixture of Gaussians, CNAE-9.
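The paper gives an efficient procedure for finding the empirically best parameters; as a rough illustration only, the sketch below tunes α by brute-force search over a grid, scoring each candidate by its average agreement with the target clusterings on the sample (reusing the hypothetical clustering_agreement, d_alpha_seeding, and lloyds_method helpers sketched above). The grid, the fixed β, and the averaging over random seedings are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def tune_alpha(instances, k, alpha_grid, runs=5, seed=0):
    """Pick the alpha in alpha_grid with the best average agreement to the
    target clusterings over a sample of (points, target_labels) instances."""
    rng = np.random.default_rng(seed)
    best_alpha, best_score = None, -np.inf
    for alpha in alpha_grid:
        scores = []
        for X, target in instances:
            for _ in range(runs):  # average over the randomness of the seeding
                centers = d_alpha_seeding(X, k, alpha, rng)
                _, labels = lloyds_method(X, centers)
                scores.append(clustering_agreement(labels, target))
        score = float(np.mean(scores))
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score

# Example call on a sample of annotated instances (hypothetical data):
# best_alpha, _ = tune_alpha(sample, k=10, alpha_grid=np.linspace(0, 10, 21))
```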
