Unsupervised Learning
Clustering: K-Means
Recall: Key Components of Intelligent Agents
Representation Language: Graph, Bayes Nets, Linear functions
Inference Mechanism: A*, variable elimination, Gibbs sampling
Learning Mechanism: Maximum Likelihood, Laplace Smoothing, gradient descent, perceptron, k-Nearest Neighbor, many more: k-means, EM, PCA, …
Evaluation Metric: Likelihood, quadratic loss (a.k.a. squared error), regularized loss, margins, many more: 0-1 loss, conditional likelihood, precision/recall, …
Supervised vs. Unsupervised Learning
Supervised Learning: “Labeled” Data
X11 X12 … X1N Y1
X21 X22 … X2N Y2
… … … … …
XM1 XM2 … XMN YM
Unsupervised Learning: “Unlabeled” Data
X11 X12 … X1N ?
X21 X22 … X2N ?
… … … … …
XM1 XM2 … XMN ?
In supervised learning, the learning algorithm is given training examples that contain inputs (the X values) and “labels” or “outputs” (the Y values).
In unsupervised learning, the learning algorithm is given training examples that contain inputs (the X values), but no “labels” or “outputs” (no Y values).
It’s called “unsupervised” because there are no “labels” to help “supervise” the learning algorithm during the learning process, to get it to the right model.
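As a small illustration of the two data layouts above, the following sketch builds the same inputs with and without labels. The values and the label names are hypothetical and chosen only for the example.

```python
# Hypothetical toy data: M = 4 examples, N = 2 input variables each.
X = [
    [1.0, 2.0],
    [1.5, 1.8],
    [8.0, 8.0],
    [9.0, 9.5],
]

# Supervised learning: every row of X comes paired with a label Y.
Y = ["small", "small", "large", "large"]
labeled_data = list(zip(X, Y))

# Unsupervised learning: the same inputs, but no Y column at all.
unlabeled_data = X

print(labeled_data[0])    # one (inputs, label) pair
print(unlabeled_data[0])  # inputs only
```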
Example Unsupervised Problem 1
Are these data points distributed completely randomly, or do you see some structure in them?
How many clusters do you see? None, 1, 2, 3, 4, or 5?
[Figure: scatter plot of the data points; axes X1 and X2]
Answer: The data are structured: there are clusters!
Example Unsupervised Problem 2
There are 2 input variables, X1 and X2, in this space. So this is called a “2-dimensional space”.
How many dimensions are actually needed to describe this data? 0, 1, 2, or 3?
[Figure: scatter plot of the data points; axes X1 and X2]
Answer: 1 dimension captures most of the variation in this data. 2 dimensions will capture everything.
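One way to make "captures most of the variation" concrete is to compute how the variance of 2-D data splits between its two principal directions. This is the eigenvalue computation that PCA performs, written out for the 2-D case. The sketch below uses hypothetical points lying close to a line. The function names are illustrative and not taken from the slides.

```python
import math

# Hypothetical 2-D data lying close to the line X2 = 2 * X1.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

def covariance_2d(pts):
    """Return the entries (sxx, sxy, syy) of the 2x2 sample covariance matrix."""
    m = len(pts)
    mean_x = sum(p[0] for p in pts) / m
    mean_y = sum(p[1] for p in pts) / m
    sxx = sum((p[0] - mean_x) ** 2 for p in pts) / (m - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in pts) / (m - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts) / (m - 1)
    return sxx, sxy, syy

def explained_variance_fractions(pts):
    """Fraction of total variance along each principal direction, largest first."""
    sxx, sxy, syy = covariance_2d(pts)
    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]] in closed form.
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = [half_trace + radius, half_trace - radius]
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

fractions = explained_variance_fractions(points)
print(fractions)  # the first entry is close to 1 for this data
```

For data like this, the first fraction is nearly 1, which is the sense in which one dimension is "enough", while the two fractions together always sum to 1.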
Types of Unsupervised Learning
Density Estimation
- Clustering (Example 1)
- Dimensionality Reduction (Example 2)
Factor Analysis
- Blind signal separation
Example Open Problem in AI: Unsupervised Image Segmentation (and Registration)
Examples taken from (Felzenszwalb and Huttenlocher, Int. Journal of Computer Vision, 59:2, 2004). http://cs.brown.edu/~pff/segment/.
The K-Means Clustering Algorithm
Inputs:
1) Some unlabeled (no outputs) training data
2) A number K, which must be greater than 1
Output: A label between 1 and K for each data point, indicating which cluster the data point belongs to.
Visualization of K-Means
Data
1. Generate K random initial cluster centers, or “means”.
2. Assign each point to the closest “mean” point. Visually, the mean points divide the space into a Voronoi diagram.
3. Recompute the “mean” (center) of each colored set of data. Notice: “means” do not have to be at the same position as a data point, although sometimes they might be.
4. Repeat steps 2 and 3 until the “means” stop moving (convergence).
   a. Repeat step 2 (assign each point to the nearest mean)
   b. Repeat step 3 (recompute means)
Quiz: Where will the means be after the next iteration?
[Figure: answer, showing the means after the next iteration]
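A single pass of steps 2 and 3 can be sketched in plain Python as follows. The data, the starting means, and the function names are hypothetical, and clusters are numbered from 0 here rather than from 1.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign_to_nearest(points, means):
    """Step 2: for each point, the index of its closest mean (ties go to the lower index)."""
    return [
        min(range(len(means)), key=lambda k: squared_distance(p, means[k]))
        for p in points
    ]

def recompute_means(points, labels, k):
    """Step 3: average the points in each cluster; None marks an empty cluster."""
    new_means = []
    for cluster in range(k):
        members = [p for p, label in zip(points, labels) if label == cluster]
        if members:
            new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_means.append(None)
    return new_means

points = [(0.0, 0.0), (2.0, 0.0), (9.0, 9.0), (10.0, 10.0)]
means = [(0.0, 1.0), (8.0, 8.0)]

labels = assign_to_nearest(points, means)
new_means = recompute_means(points, labels, k=2)
print(labels)     # [0, 0, 1, 1]
print(new_means)  # [(1.0, 0.0), (9.5, 9.5)]
```

Note that neither of the recomputed means coincides with a data point in this example.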
Formal Description of the Algorithm
Input: 1) X11, …, X1N; … ; XM1, …, XMN
2) K
Output: Y1; …; YM, where each Yi is in {1, …, K}
1. Init: For each k in {1, …, K}, create a random point Ck.
2. Repeat until all Ck remain the same:
   Assignment (aka Expectation):
      For each Xi, let C[Xi] ← the k value for the closest Ck to Xi
   Update (aka Maximization):
      For each Ck, let Dk ← {Xi | C[Xi] = k} (the set of Xi assigned to cluster k)
         if |Dk| = 0, let Ck ← a random new point
         else let Ck ← (1 / |Dk|) Σ_{Xi ∈ Dk} Xi (the average of the points in Dk)
3. Return C[Xi] for each Xi
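The description above translates fairly directly into a short program. The following is a minimal sketch, not a reference implementation. The name `k_means` and the parameters `seed` and `max_iters` are assumptions, the initial means are drawn from the data points (the description only asks for "a random point"), and the loop stops when the assignments stop changing, which means the means would not move on the next update either. Labels run from 0 to K-1.

```python
import random

def k_means(points, k, seed=0, max_iters=100):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Returns (labels, means), with labels in 0..k-1.
    """
    rng = random.Random(seed)
    # 1. Init: start each mean at a randomly chosen data point (an assumption).
    means = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # 2a. Assignment: label each point with the index of its closest mean.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, means[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the means would not move again
        labels = new_labels
        # 2b. Update: move each mean to the average of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
            else:
                means[j] = tuple(rng.choice(points))  # empty cluster: pick a new point
    # 3. Return the cluster label of each point (and the final means).
    return labels, means

data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, means = k_means(data, k=2)
print(labels)
print(means)
```

On this hypothetical data the two groups are well separated, so the first three points end up sharing one label and the last three the other. Which group gets label 0 depends on the random initialization.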
Evaluation metric for K-means
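Loss function (or objective function) for K-means:
Within-cluster sum of squares (WCSS) loss:
\[ \mathrm{WCSS}(X_1, \dots, X_M, C_1, \dots, C_K) = \sum_{k=1}^{K} \sum_{X_i \in D_k} \lVert X_i - C_k \rVert^2 \]
where D_k is the set of points X_i assigned to cluster k.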
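The within-cluster sum of squares adds up, over all data points, the squared distance between the point and the mean of the cluster it is assigned to. A minimal sketch, with a hypothetical function name and hypothetical data, and with clusters numbered from 0:

```python
def wcss(points, labels, means):
    """Sum of squared distances from each point to the mean of its cluster."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, means[label]))
        for p, label in zip(points, labels)
    )

points = [(0.0, 0.0), (2.0, 0.0), (9.0, 9.0), (11.0, 9.0)]
labels = [0, 0, 1, 1]
means = [(1.0, 0.0), (10.0, 9.0)]
print(wcss(points, labels, means))  # 4.0
```

Every point here sits at distance 1 from its cluster mean, so the loss is 4 × 1² = 4.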
Complexity of K-Means
Finding a globally optimal solution to WCSS is known to be an NP-hard problem.
K-means is known to converge to a local minimum of WCSS.
K-means is a “heuristic” or “greedy” algorithm, with no guarantee that it will find the global optimum.
On real datasets, K-means usually converges very quickly. Often, people run it multiple times with different random initializations, and choose the best result.
In the worst case, K-means can still take exponentially many iterations to converge, even to a local minimum. However, such cases are rare in practice.
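The effect of initialization, and the common remedy of running several random restarts and keeping the lowest-WCSS result, can be seen on a tiny hypothetical example. In the sketch below all names are illustrative. `run_k_means` takes explicit starting means so that the good and bad outcomes are reproducible, and an empty cluster simply keeps its old mean.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def run_k_means(points, initial_means, max_iters=100):
    """One K-means run from the given starting means; returns (wcss, labels)."""
    means = list(initial_means)
    k = len(means)
    labels = None
    for _ in range(max_iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, means[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # in this sketch an empty cluster keeps its old mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    loss = sum(sq_dist(p, means[lab]) for p, lab in zip(points, labels))
    return loss, labels

def best_of_restarts(points, k, restarts=10, seed=0):
    """Try several random initializations and keep the lowest-WCSS result."""
    rng = random.Random(seed)
    runs = [run_k_means(points, rng.sample(points, k)) for _ in range(restarts)]
    return min(runs, key=lambda run: run[0])

# Four corners of a wide rectangle: a left/right split is far better than top/bottom.
corners = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
bad = run_k_means(corners, [(0.0, 0.0), (0.0, 1.0)])    # converges to top/bottom
good = run_k_means(corners, [(0.0, 0.0), (10.0, 0.0)])  # converges to left/right
print(bad[0], good[0])  # 100.0 1.0
print(best_of_restarts(corners, k=2))
```

Both runs converge, but the first is stuck in a local minimum whose loss is 100 times worse. Taking the minimum over several restarts makes it much more likely that at least one run finds the better split.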
Quiz
Is K-means:
Classification or Regression?
Generative or Discriminative?
Parametric or Nonparametric?
Answer
Is K-means:
Classification or Regression?
- Classification: the output is a discrete value (cluster label) for each point.
Generative or Discriminative?
- Discriminative: it has fixed input variables and output variables.
Parametric or Nonparametric?
- Parametric: the number of cluster centers (K) does not change with the number of training data points.
Quiz
Is K-means:
Supervised or Unsupervised?
Online or batch?
Closed-form or iterative?
Answer
Is K-means:
Supervised or Unsupervised?
- Unsupervised
Online or batch?
- Batch: if you add a new data point, you need to revisit all the training data to recompute the locally optimal model.
Closed-form or iterative?
- Iterative: training requires many passes through the data.
Quiz
Which of the following problems might be solved using K-Means? Check all that apply. For those that work, explain what the inputs and outputs (X and Y variables) would be.
• Segmenting an image
• Finding galaxies (dense groups of stars) in a telescope’s image of the night sky
• Identifying different species of bacteria from DNA samples of bacteria in seawater
Answer
Which of the following problems might be solved using K-Means? Check all that apply. For those that work, explain what the inputs and outputs (X and Y variables) would be.
• Segmenting an image: Yes. Inputs are the pixel intensities, outputs are segment labels.
• Finding galaxies (dense groups of stars) in a telescope’s image of the night sky: Yes. Inputs are star locations, outputs are galaxy labels.
• Identifying different species of bacteria from DNA samples of bacteria in seawater: Yes. Inputs are gene sequences, outputs are species labels.
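For the image segmentation case, the mapping from an image to inputs and outputs can be sketched as follows. The tiny grayscale image, the function name `k_means_1d`, and the fixed starting means (one dark, one bright) are all assumptions made for the example. A real segmenter would typically use more features per pixel, such as color or position.

```python
# Hypothetical 3x4 grayscale image; each value is a pixel intensity in [0, 255].
image = [
    [10, 12, 200, 210],
    [11, 14, 205, 198],
    [9, 202, 207, 201],
]

# Inputs X: one entry per pixel, here with a single feature (the intensity).
X = [float(v) for row in image for v in row]

def k_means_1d(values, means, max_iters=100):
    """K-means on scalars from the given starting means; returns one label per value."""
    means = list(means)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(len(means)), key=lambda j: (v - means[j]) ** 2) for v in values
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(means)):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                means[j] = sum(members) / len(members)
    return labels

# Outputs Y: a segment label per pixel, reshaped back to the image layout.
labels = k_means_1d(X, means=[0.0, 255.0])
width = len(image[0])
segments = [labels[r * width:(r + 1) * width] for r in range(len(image))]
for row in segments:
    print(row)
# [0, 0, 1, 1]
# [0, 0, 1, 1]
# [0, 1, 1, 1]
```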