
Page 1

Supervised Learning Recap

Machine Learning

Page 2

Last Time

• Support Vector Machines
• Kernel Methods

Page 3

Today

• Review of Supervised Learning
• Unsupervised Learning
– (Soft) K-means clustering
– Expectation Maximization
– Spectral Clustering
– Principal Components Analysis
– Latent Semantic Analysis

Page 4

Supervised Learning

• Linear Regression
• Logistic Regression
• Graphical Models
– Hidden Markov Models
• Neural Networks
• Support Vector Machines
– Kernel Methods

Page 5

Major concepts

• Gaussian, Multinomial, Bernoulli Distributions
• Joint vs. Conditional Distributions
• Marginalization
• Maximum Likelihood
• Risk Minimization
• Gradient Descent
• Feature Extraction, Kernel Methods

Page 6

Some favorite distributions

• Bernoulli

• Multinomial

• Gaussian
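The original slides rendered these densities as images; the standard definitions are:

$$\mathrm{Bernoulli:}\quad p(x \mid \theta) = \theta^x (1-\theta)^{1-x}, \quad x \in \{0,1\}$$

$$\mathrm{Multinomial:}\quad p(x_1, \dots, x_k \mid n, \theta) = \frac{n!}{x_1! \cdots x_k!} \prod_{i=1}^{k} \theta_i^{x_i}$$

$$\mathrm{Gaussian:}\quad \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$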

Page 7

Maximum Likelihood

• Identify the parameter values that yield the maximum likelihood of generating the observed data.

• Take the partial derivative of the likelihood function
• Set to zero
• Solve

• NB: maximum likelihood parameters are the same as maximum log likelihood parameters
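As a concrete example of this recipe (not on the original slide), the Bernoulli MLE: given observations x_1, …, x_N ∈ {0, 1}, the log likelihood is

$$\ell(\theta) = \sum_{i=1}^{N} \left[ x_i \log \theta + (1 - x_i) \log(1 - \theta) \right]$$

and setting its derivative to zero gives

$$\frac{\partial \ell}{\partial \theta} = \frac{\sum_i x_i}{\theta} - \frac{N - \sum_i x_i}{1 - \theta} = 0 \quad \Longrightarrow \quad \hat{\theta} = \frac{1}{N} \sum_{i=1}^{N} x_i$$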

Page 8

Maximum Log Likelihood

• Why do we like the log function?
• It turns products (difficult to differentiate) into sums (easy to differentiate)
– log(xy) = log(x) + log(y)
– log(x^c) = c log(x)
• It is monotonically increasing, so it does not move the location of the maximum

Page 9

Risk Minimization

• Pick a loss function (standard definitions below)
– Squared loss
– Linear loss
– Perceptron (classification) loss
• Identify the parameters that minimize the loss function
– Take the partial derivative of the loss function
– Set to zero
– Solve
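For reference, the standard definitions of these losses for a prediction f(x) with target y (with y ∈ {−1, +1} for the perceptron loss) are:

$$L_{\text{squared}} = (y - f(x))^2, \qquad L_{\text{linear}} = |y - f(x)|, \qquad L_{\text{perceptron}} = \max(0, -y f(x))$$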

Page 10

Frequentists v. Bayesians

• Point estimates vs. Posteriors
• Risk Minimization vs. Maximum Likelihood
• L2-Regularization
– Frequentists: Add a constraint on the size of the weight vector
– Bayesians: Introduce a zero-mean prior on the weight vector
– Result is the same!

Page 11

L2-Regularization

• Frequentists:
– Introduce a cost on the size of the weights
• Bayesians:
– Introduce a prior on the weights
(the equivalence is sketched below)
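A sketch of why the two framings coincide (a standard derivation, assumed rather than taken from the slide): a zero-mean Gaussian prior on w gives the MAP objective

$$\max_w \; \log p(\text{data} \mid w) + \log \mathcal{N}(w \mid 0, \sigma^2 I) = \max_w \; \log p(\text{data} \mid w) - \frac{1}{2\sigma^2} \|w\|^2 + \text{const}$$

which is exactly the frequentist objective with an L2 penalty of weight λ = 1/(2σ²).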

Page 12

Types of Classifiers

• Generative Models
– Highest resource requirements
– Need to approximate the joint probability
• Discriminative Models
– Moderate resource requirements
– Typically fewer parameters to approximate than generative models
• Discriminant Functions
– Can be trained probabilistically, but the output does not include confidence information

Page 13

Linear Regression

• Fit a line to a set of points
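A minimal numpy sketch of least-squares line fitting (illustrative; the data and names are made up, not from the slides):

```python
import numpy as np

# Toy data scattered around y = 2x + 1
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=50)

# Design matrix with a bias column; solve min_w ||Xw - y||^2
X = np.column_stack([x, np.ones_like(x)])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("slope, intercept:", w)  # close to (2, 1)
```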

Page 14

Linear Regression

• Extension to higher dimensions
– Polynomial fitting
– Arbitrary function fitting
  • Wavelets
  • Radial basis functions
  • Classifier output

Page 15

Logistic Regression

• Fit Gaussians to the data for each class
• The decision boundary is where the PDFs cross
• Setting the gradient to zero has no “closed form” solution, so we train by Gradient Descent (a minimal sketch follows)
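A minimal batch gradient-descent sketch for binary logistic regression (an illustration under assumed conventions, not the course's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, iters=1000):
    """Train w by gradient descent on the mean negative log likelihood.

    X: (n, d) feature matrix, y: (n,) labels in {0, 1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)             # predicted P(y = 1 | x)
        grad = X.T @ (p - y) / len(y)  # gradient of the loss
        w -= lr * grad
    return w
```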

Page 16

Graphical Models

• General way to describe the dependence relationships between variables.

• Junction Tree Algorithm allows us to efficiently calculate marginals over any variable.

Page 17

Junction Tree Algorithm

• Moralization
– “Marry the parents”
– Make all edges undirected
• Triangulation
– Add chords so that no chordless cycle of length 4 or more remains
• Junction Tree Construction
– Identify separators such that the running intersection property holds
• Introduction of Evidence
– Pass messages around the junction tree to generate marginals

Page 18

Hidden Markov Models

• Sequential Modeling
– Generative Model
• Relationship between observations and state (class) sequences
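To make the observation/state relationship concrete, here is a minimal sketch of the forward algorithm, which computes the likelihood of an observation sequence under a discrete HMM (names and conventions are illustrative assumptions):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Likelihood of an observation sequence under a discrete HMM.

    pi:  (S,)   initial state distribution
    A:   (S, S) transitions, A[i, j] = P(state j | state i)
    B:   (S, O) emissions,   B[i, o] = P(obs o | state i)
    obs: list of observation indices
    """
    alpha = pi * B[:, obs[0]]          # joint of state and first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate one step, weight by emission
    return alpha.sum()                 # marginalize over the final state
```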

Page 19

Perceptron

• Step function used for squashing.
• Classifier as Neuron metaphor.

Page 20

Perceptron Loss

• Classification Error vs. Sigmoid Error
– Loss is only calculated on mistakes
– Perceptrons use strictly classification error
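For concreteness (a standard result, not shown on the slide): because the loss is zero on correctly classified points, the perceptron updates only on mistakes,

$$w \leftarrow w + \eta \, y_i x_i \quad \text{whenever} \quad \operatorname{sign}(w^\top x_i) \neq y_i$$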

Page 21

Neural Networks

• Interconnected Layers of Perceptrons or Logistic Regression “neurons”

Page 22

Neural Networks

• There are many possible configurations of neural networks
– Vary the number of layers
– Vary the size of each layer

Page 23

Support Vector Machines

• Maximum Margin Classification

(figures: a small-margin separator vs. a large-margin separator)

Page 24

Support Vector Machines

• Optimization Function

• Decision Function
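The formulas were images on the original slides; the standard hard-margin forms are:

$$\text{Optimization:}\quad \min_{w, b} \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 \;\; \forall i$$

$$\text{Decision:}\quad f(x) = \operatorname{sign}\!\left( \sum_i \alpha_i y_i \, k(x_i, x) + b \right)$$

where the α_i are the dual variables (nonzero only for support vectors) and k is the kernel.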

Page 25


Visualization of Support Vectors

Page 26

Questions?

• Now would be a good time to ask questions about Supervised Techniques.

Page 27

Clustering

• Identify discrete groups of similar data points
• Data points are unlabeled

Page 28

Recall K-Means

• Algorithm (a runnable sketch follows this list)
– Select K, the desired number of clusters
– Initialize K cluster centroids
– For each point in the data set, assign it to the cluster with the closest centroid
– Update each centroid based on the points assigned to its cluster
– If any data point has changed clusters, repeat
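A minimal numpy sketch of the algorithm above (illustrative; the initialization and tie-breaking conventions are assumptions):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Hard k-means. X: (n, d) data; returns centroids (k, d) and labels (n,)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster with the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no point changed clusters: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels
```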

Page 29

k-means output

Page 30

Soft K-means

• In k-means, we force every data point to exist in exactly one cluster.

• This constraint can be relaxed.

– Hard assignment minimizes the entropy of the cluster assignment; soft assignments relax this
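One standard relaxation (assumed here; the slide's formula was an image) computes a responsibility of each cluster for each point via a softmax over distances, with a stiffness parameter β:

$$r_{nk} = \frac{\exp(-\beta \|x_n - \mu_k\|^2)}{\sum_{j} \exp(-\beta \|x_n - \mu_j\|^2)}, \qquad \mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}$$

As β → ∞ the responsibilities harden to 0/1 and standard k-means is recovered.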

Page 31

Soft k-means example

Page 32

Soft k-means

• We still define a cluster by a centroid, but we calculate the centroid as the weighted mean of all the data points

• Convergence is based on a stopping threshold rather than changed assignments

Page 33

Gaussian Mixture Models

• Rather than identifying clusters by “nearest” centroids, fit a set of k Gaussians to the data.

Page 34

GMM example

Page 35

Gaussian Mixture Models

• Formally, a mixture model is the weighted sum of a number of PDFs, where the weights are determined by a distribution π (standard form below).
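The standard form, written with Gaussian components as on the surrounding slides:

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \pi_k \ge 0, \quad \sum_{k} \pi_k = 1$$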

Page 36

Graphical Models with unobserved variables

• What if you have variables in a graphical model that are never observed?
– Latent Variables
• Training latent variable models is an unsupervised learning application

(figure: example graphical model over the variables laughing, amused, sweating, and uncomfortable)

Page 37

Latent Variable HMMs

• We can cluster sequences using an HMM with unobserved state variables

• We will train the latent variable models using Expectation Maximization

Page 38

Expectation Maximization

• Both the training of GMMs and Gaussian models with latent variables are accomplished using Expectation Maximization (the standard updates are sketched below)
– Step 1: Expectation (E-step)
  • Evaluate the “responsibilities” of each cluster with the current parameters
– Step 2: Maximization (M-step)
  • Re-estimate parameters using the existing “responsibilities”
• Related to k-means
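For concreteness, the standard EM updates for a GMM (the course's exact notation may differ):

E-step (responsibilities):

$$\gamma_{nk} = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$$

M-step (re-estimation), with N_k = Σ_n γ_nk:

$$\mu_k = \frac{1}{N_k} \sum_n \gamma_{nk} x_n, \qquad \Sigma_k = \frac{1}{N_k} \sum_n \gamma_{nk} (x_n - \mu_k)(x_n - \mu_k)^\top, \qquad \pi_k = \frac{N_k}{N}$$

With responsibilities hardened to 0/1 and fixed spherical covariances, these reduce to the k-means updates.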

Page 39

Questions

• One more time for questions on supervised learning…

Page 40

Next Time

• Gaussian Mixture Models (GMMs)
• Expectation Maximization