UNIT- 4 Machine Learning


Page 1: Machine learning

UNIT- 4 Machine Learning

Page 2: Machine learning

What is Machine Learning?

- Adapt to / learn from data
- To optimize a performance function

Can be used to:
- Extract knowledge from data
- Learn tasks that are difficult to formalise
- Create software that improves over time

Page 3: Machine learning
Page 4: Machine learning
Page 5: Machine learning

When to learn:
- Human expertise does not exist (navigating on Mars)
- Humans are unable to explain their expertise (speech recognition)
- Solution changes in time (routing on a computer network)
- Solution needs to be adapted to particular cases (user biometrics)

Learning involves:
- Learning general models from data
- Data is cheap and abundant; knowledge is expensive and scarce
- Example: customer transactions to consumer behaviour
- Build a model that is a good and useful approximation to the data

Page 6: Machine learning

Applications:
- Speech and hand-writing recognition
- Autonomous robot control
- Data mining and bioinformatics: motifs, alignment, …
- Playing games
- Fault detection
- Clinical diagnosis
- Spam email detection
- Credit scoring, fraud detection
- Web mining: search engines
- Market basket analysis

Applications are diverse but methods are generic

Page 7: Machine learning

Generic methods

- Learning from labelled data (supervised learning), e.g. classification, regression, prediction, function approximation
- Learning from unlabelled data (unsupervised learning), e.g. clustering, visualisation, dimensionality reduction
- Learning from sequential data, e.g. speech recognition, DNA data analysis
- Associations
- Reinforcement learning

Page 8: Machine learning

Statistical Learning

Machine learning methods can be unified within the framework of statistical learning:
- Data is considered to be a sample from a probability distribution.
- Typically, we don’t expect perfect learning but only “probably correct” learning.
- Statistical concepts are the key to measuring our expected performance on novel problem instances.

Page 9: Machine learning

Induction and inference

Induction: Generalizing from specific examples.

Inference: Drawing conclusions from possibly incomplete knowledge.

Learning machines need to do both.

Page 10: Machine learning

Inductive learning

Data is produced by a “target”. A hypothesis is learned from the data in order to “explain”, “predict”, “model” or “control” the target. Generalisation ability is essential.

Inductive learning hypothesis: “If the hypothesis works for enough data then it will work on new examples.”

Page 11: Machine learning

Example 1: Hand-written digits

Data representation: greyscale images.
Task: classification (0, 1, 2, 3, …, 9).
Problem features: highly variable inputs from the same class, including some “weird” inputs; imperfect human classification; high cost associated with errors, so a “don’t know” output may be useful.

Page 12: Machine learning


Page 13: Machine learning

Example 2: Speech recognition

Data representation: features from spectral analysis of speech signals (two in this simple example).

Problem features: highly variable data with the same classification. Good feature selection is very important. Speech recognition is often broken into a number of smaller tasks like this.

Page 14: Machine learning
Page 15: Machine learning

Example 3: DNA microarrays

DNA from ~10000 genes attached to a glass slide (the microarray).

Green and red labels attached to mRNA from two different samples.

mRNA is hybridized (stuck) to the DNA on the chip and green/red ratio is used to measure relative abundance of gene products.

Page 16: Machine learning
Page 17: Machine learning

DNA microarrays

Data representation: ~10000 Green/red intensity levels ranging from 10-10000.

Tasks: Sample classification, gene classification, visualisation and clustering of genes/samples.

Problem features: High-dimensional data but relatively small number of examples. Extremely noisy data (noise ~ signal). Lack of good domain knowledge.

Page 18: Machine learning

Projection of 10000 dimensional data onto 2D using PCA effectively separates cancer subtypes.
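As a rough illustration only (the actual microarray data is not included in this transcript), here is a minimal Python sketch of projecting high-dimensional samples onto two principal components with scikit-learn; the synthetic samples and the two hypothetical subtypes are assumptions:

# Minimal sketch: project high-dimensional samples onto 2 principal components.
# The data here is synthetic; the real microarray data is not reproduced.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 40 samples of two hypothetical subtypes, each ~10000-dimensional.
subtype_a = rng.normal(loc=0.0, scale=1.0, size=(20, 10000))
subtype_b = rng.normal(loc=0.5, scale=1.0, size=(20, 10000))
X = np.vstack([subtype_a, subtype_b])

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)           # shape (40, 2): the 2D projection
print(pca.explained_variance_ratio_)  # variance captured by the two components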

Page 19: Machine learning

Probabilistic models

A large part of the module will deal with methods that have an explicit probabilistic interpretation:
- Good for dealing with uncertainty, e.g. is a handwritten digit a three or an eight?
- Provides interpretable results
- Unifies methods from different fields

Page 20: Machine learning


Face Detection
1. Image pyramid used to locate faces of different sizes
2. Image lighting compensation
3. Neural network detects rotation of face candidate
4. Final face candidate de-rotated ready for detection

Page 21: Machine learning


Face Detection (Con’t)

5. Submit image to the neural network
   a. Break image into segments
   b. Each segment is a unique input to the network
   c. Each segment looks for certain patterns (eyes, mouth, etc.)
6. Output is the likelihood of a face

Page 22: Machine learning

Supervised Learning: Uses

- Prediction of future cases
- Knowledge extraction
- Compression of data & knowledge

Page 23: Machine learning

Unsupervised Learning

Clustering: grouping similar instances.
Example applications:
- Customer segmentation in CRM
- Learning patterns in bioinformatics
- Clustering items based on similarity
- Clustering users based on interests

Page 24: Machine learning

Reinforcement Learning

- Learning a policy: a sequence of outputs
- No supervised output, but delayed reward
- Credit assignment problem
- Game playing
- Robot in a maze
- Multiple agents, partial observability

Page 25: Machine learning

ID3 Decision Tree

It is particularly interesting for:
- Its representation of learned knowledge
- Its approach to the management of complexity
- Its heuristic for selecting candidate concepts
- Its potential for handling noisy data

Page 26: Machine learning

ID3 Decision Tree

Page 27: Machine learning

ID3 Decision Tree

The previous table can be represented as the following decision tree:

Page 28: Machine learning

ID3 Decision Tree

In a decision tree:
- Each internal node represents a test on some property
- Each possible value of that property corresponds to a branch of the tree
- Leaf nodes represent classifications, such as low or moderate risk

Page 29: Machine learning

ID3 Decision Tree

A simplified decision tree for credit risk management

Page 30: Machine learning

ID3 Decision Tree

ID3 constructs decision trees in a top-down fashion.

ID3 selects a property to test at the current node of the tree and uses this test to partition the set of examples

The algorithm recursively constructs a sub-tree for each partition

This continues until all members of the partition are in the same class

Page 31: Machine learning

ID3 Decision Tree

For example, ID3 selects income as the root property for the first step

Page 32: Machine learning

ID3 Decision Tree

Page 33: Machine learning

ID3 Decision Tree

How to select the 1st node? (and the following nodes)

ID3 measures the information gained by making each property the root of the current subtree

It picks the property that provides the greatest information gain

Page 34: Machine learning

ID3 Decision Tree

If we assume that all the examples in the table occur with equal probability, then:
P(risk is high) = 6/14
P(risk is moderate) = 3/14
P(risk is low) = 5/14

Page 35: Machine learning

ID3 Decision Tree

I[6,3,5] = Info(D) = -(6/14)·log2(6/14) - (3/14)·log2(3/14) - (5/14)·log2(5/14) = 1.531

This is based on the general definition of information:

I(M) = -Σ(i=1 to n) p(m_i)·log2(p(m_i))

Page 36: Machine learning

ID3 Decision Tree

The information gain from income is:
Gain(income) = I[6,3,5] - E[income] = 1.531 - 0.564 = 0.967

Similarly:
Gain(credit history) = 0.266
Gain(debt) = 0.063
Gain(collateral) = 0.206
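As a check on these numbers, here is a minimal Python sketch of the information and gain calculations; the function name information is ours, and the expected-information value E[income] = 0.564 is taken from the slide rather than recomputed:

from math import log2

def information(counts):
    # I(c1, ..., cn) = -sum over classes of p_i * log2(p_i),
    # where p_i is the fraction of examples in class i.
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

i_all = information([6, 3, 5])     # 6 high-, 3 moderate-, 5 low-risk examples
print(round(i_all, 3))             # 1.531, matching the slide

e_income = 0.564                   # E[income], as quoted on the slide
print(round(i_all - e_income, 3))  # Gain(income) = 0.967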

Page 37: Machine learning

ID3 Decision Tree

Since income provides the greatest information gain, ID3 will select it as the root of the tree

Page 38: Machine learning

ID3 Decision Tree Pseudo Code
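The pseudocode figure itself is not reproduced in this transcript. As a stand-in, here is a short Python sketch of the top-down procedure described on the preceding slides (select the property with the greatest information gain, partition the examples, recurse until a partition is pure); the dict-based example format and the key name "class" are assumptions:

from collections import Counter
from math import log2

def information(examples):
    # Information content of the class distribution in a set of examples.
    counts = Counter(e["class"] for e in examples)
    total = len(examples)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def gain(examples, prop):
    # Information gain of testing `prop`: I(examples) minus the expected
    # information of the partitions induced by prop's values.
    remainder = 0.0
    for value in {e[prop] for e in examples}:
        subset = [e for e in examples if e[prop] == value]
        remainder += len(subset) / len(examples) * information(subset)
    return information(examples) - remainder

def id3(examples, properties):
    classes = {e["class"] for e in examples}
    if len(classes) == 1:            # all members in the same class: leaf node
        return classes.pop()
    if not properties:               # no properties left to test: majority class
        return Counter(e["class"] for e in examples).most_common(1)[0][0]
    best = max(properties, key=lambda p: gain(examples, p))
    tree = {best: {}}                # internal node: a test on `best`
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = id3(subset, [p for p in properties if p != best])
    return tree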

Page 39: Machine learning

Unsupervised Learning

The learning algorithms discussed so far implement forms of supervised learning

They assume the existence of a teacher, some fitness measure, or other external method of classifying training instances

Unsupervised Learning eliminates the teacher and requires that the learners form and evaluate concepts on their own

Page 40: Machine learning

Unsupervised Learning

Science is perhaps the best example of unsupervised learning in humans

Scientists do not have the benefit of a teacher. Instead, they propose hypotheses to explain observations.

Page 41: Machine learning

Unsupervised Learning

The result of this algorithm is a Binary Tree whose leaf nodes are instances and whose internal nodes are clusters of increasing size

We may also extend this algorithm to objects represented as sets of symbolic features.
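To make the idea concrete, here is a minimal Python sketch of a bottom-up (single-linkage) clustering procedure of this kind, producing a binary tree whose leaves are instances; the 1-D numeric instances and distance function in the example are illustrative assumptions:

def agglomerate(instances, distance):
    # Each cluster is a (tree, members) pair; start with one leaf per instance.
    clusters = [(x, [x]) for x in instances]
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest
        # (single-linkage).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(a, b)
                        for a in clusters[i][1] for b in clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]   # nested tuples: internal nodes of the binary tree

# Example with 1-D numeric instances and absolute difference as the metric:
print(agglomerate([1, 2, 8, 9], lambda a, b: abs(a - b)))   # ((1, 2), (8, 9))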

Page 42: Machine learning

Unsupervised Learning

Object1 = {small, red, rubber, ball}
Object2 = {small, blue, rubber, ball}
Object3 = {large, black, wooden, ball}

This metric would compute the similarity values:
Similarity(object1, object2) = 3/4
Similarity(object1, object3) = 1/4
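The slide does not spell the metric out, but a measure that reproduces these values is the number of shared features divided by the number of features per object; a minimal Python sketch, assuming all objects have the same number of features:

def similarity(a, b):
    # Shared features divided by the (common) number of features per object.
    return len(a & b) / len(a)

object1 = {"small", "red", "rubber", "ball"}
object2 = {"small", "blue", "rubber", "ball"}
object3 = {"large", "black", "wooden", "ball"}

print(similarity(object1, object2))   # 0.75  (3/4)
print(similarity(object1, object3))   # 0.25  (1/4)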

Page 43: Machine learning

Machine Learning

Up till now: how to search or reason using a model

Machine learning: how to select a model on the basis of data / experience
- Learning parameters (e.g. probabilities)
- Learning hidden concepts (e.g. clustering)

Page 44: Machine learning

Classification

In classification, we learn to predict labels (classes) for inputs

Examples:
- Spam detection (input: document, classes: spam / ham)
- OCR (input: images, classes: characters)
- Medical diagnosis (input: symptoms, classes: diseases)
- Automatic essay grader (input: document, classes: grades)
- Fraud detection (input: account activity, classes: fraud / no fraud)
- Customer service email routing
- … many more

Classification is an important commercial technology!

Page 45: Machine learning

Classification

Data: inputs x, class labels y
- We imagine that x is something that has a lot of structure, like an image or document
- In the basic case, y is a simple N-way choice

Basic setup (a minimal sketch follows below):
- Training data: D = a bunch of <x, y> pairs
- Feature extractors: functions fi which provide attributes of an example x
- Test data: more x's; we must predict y's
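As referenced above, a minimal Python sketch of this setup; the example documents, labels, and feature extractors f_i are all hypothetical:

from typing import Callable, List, Tuple

Example = str                      # x: e.g. a document
Label = str                        # y: e.g. "spam" or "ham"

# Training data D: a bunch of <x, y> pairs.
training_data: List[Tuple[Example, Label]] = [
    ("win money now", "spam"),
    ("meeting at noon", "ham"),
]

# Feature extractors: functions f_i mapping an example to an attribute.
feature_extractors: List[Callable[[Example], float]] = [
    lambda x: float("money" in x),        # f_1: contains the word "money"?
    lambda x: float(len(x.split())),      # f_2: document length in words
]

def features(x: Example) -> List[float]:
    return [f(x) for f in feature_extractors]

print(features("win money now"))   # [1.0, 3.0]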

Page 46: Machine learning

Bayes Nets for Classification

One method of classification:
- Features are values for observed variables
- Y is a query variable
- Use probabilistic inference to compute the most likely Y

Page 47: Machine learning

Simple Classification

Simple example: two binary features. This is a naïve Bayes model.

(Diagram: class variable M with feature variables S and F; the slide compares a direct estimate, a Bayes estimate with no independence assumptions, and the estimate under conditional independence.)

Page 48: Machine learning

General Naïve Bayes

A general naive Bayes model:

(Diagram: class variable C with evidence variables E1, E2, …, En. The prior over C has |C| parameters and the conditionals have n × |E| × |C| parameters, compared with |C| × |E|^n parameters for an unfactored joint distribution.)

Page 49: Machine learning

Inference for Naïve Bayes

Goal: compute the posterior over causes.
- Step 1: get the joint probability of causes and evidence
- Step 2: get the probability of the evidence
- Step 3: renormalize

Page 50: Machine learning

A Digit Recognizer

Input: pixel grids

Output: a digit 0-9

Page 51: Machine learning

Examples: CPTs

Prior over the digit class Y (uniform):
Y:           1     2     3     4     5     6     7     8     9     0
P(Y):        0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1   0.1

One pixel feature F1 being on, given Y:
Y:           1     2     3     4     5     6     7     8     9     0
P(F1=on|Y):  0.01  0.05  0.05  0.30  0.80  0.90  0.05  0.60  0.50  0.80

A second pixel feature F2 being on, given Y:
Y:           1     2     3     4     5     6     7     8     9     0
P(F2=on|Y):  0.05  0.01  0.90  0.80  0.90  0.90  0.25  0.85  0.60  0.80
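As a rough illustration, here is a minimal Python sketch that applies the three inference steps from the earlier "Inference for Naïve Bayes" slide to these tables; the labelling of the tables and the choice of two pixel features F1 and F2 observed to be on are reconstructions, so treat the numbers as illustrative:

# Two binary pixel features F1, F2 are observed to be "on".
digits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
prior = {d: 0.1 for d in digits}                          # P(Y)
p_f1_on = {1: 0.01, 2: 0.05, 3: 0.05, 4: 0.30, 5: 0.80,
           6: 0.90, 7: 0.05, 8: 0.60, 9: 0.50, 0: 0.80}   # P(F1=on | Y)
p_f2_on = {1: 0.05, 2: 0.01, 3: 0.90, 4: 0.80, 5: 0.90,
           6: 0.90, 7: 0.25, 8: 0.85, 9: 0.60, 0: 0.80}   # P(F2=on | Y)

# Step 1: joint probability of each class with the evidence.
joint = {d: prior[d] * p_f1_on[d] * p_f2_on[d] for d in digits}
# Step 2: probability of the evidence.
evidence = sum(joint.values())
# Step 3: renormalise to get the posterior over digits.
posterior = {d: joint[d] / evidence for d in digits}
print(max(posterior, key=posterior.get))   # most likely digit given F1, F2 on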

Page 52: Machine learning

Parameter Estimation

Estimating the distribution of a random variable X or X|Y:
- Empirically: use training data. For each value x, look at the empirical rate of that value. This estimate maximizes the likelihood of the data.
- Elicitation: ask a human! Usually needs domain experts and sophisticated ways of eliciting probabilities (e.g. betting games). Trouble calibrating.

(Example on the slide: three observed draws r, g, g.)
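A minimal Python sketch of the empirical (maximum-likelihood) estimate described above; the helper name empirical_distribution is ours, and the three draws r, g, g follow the example on the slide:

from collections import Counter

def empirical_distribution(samples):
    # MLE of P(X = x): the relative frequency of x in the training data.
    counts = Counter(samples)
    total = len(samples)
    return {value: count / total for value, count in counts.items()}

print(empirical_distribution(["r", "g", "g"]))   # {'r': 1/3, 'g': 2/3}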

Page 53: Machine learning

Handwritten character classification

Page 54: Machine learning

Gray level pictures: object classification

Page 55: Machine learning

Gray level pictures: human action classification

Page 56: Machine learning

Expectation Maximization EM

When to use:
- Data is only partially observable
- Unsupervised clustering: target value unobservable
- Supervised learning: some instance attributes unobservable

Applications:
- Training Bayesian belief networks
- Unsupervised clustering
- Learning hidden Markov models

Page 57: Machine learning

Generating Data from Mixture of Gaussians

Each instance x is generated by:
- choosing one of the k Gaussians at random
- generating the instance according to that Gaussian
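A minimal Python sketch of this generative process; the component means, the common spread, and the sample count are illustrative assumptions:

import random

def sample_from_mixture(means, sigma=1.0):
    mean = random.choice(means)          # choose one of the k Gaussians at random
    return random.gauss(mean, sigma)     # generate x according to that Gaussian

data = [sample_from_mixture([0.0, 5.0, 10.0]) for _ in range(300)]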

Page 58: Machine learning

EM for Estimating k Means

Given: instances from X generated by a mixture of k Gaussians
- The means <m1, …, mk> of the k Gaussians are unknown
- We don't know which instance xi was generated by which Gaussian

Determine: maximum likelihood estimates of <m1, …, mk> (a sketch of the update steps follows below)

Think of the full description of each instance as yi = <xi, zi1, zi2>:
- zij is 1 if xi was generated by the j-th Gaussian
- xi is observable
- zij is unobservable
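The update equations are not reproduced in this transcript, so the following Python sketch shows one standard form of the E and M steps for this setting (a known, shared variance and uniform mixing weights); sigma, the iteration count, and the random initialisation are assumptions:

import math
import random

def em_k_means(xs, k, sigma=1.0, iterations=50):
    means = random.sample(xs, k)                 # initial guesses for m1..mk
    for _ in range(iterations):
        # E-step: expected value of z_ij, the probability that x_i was
        # generated by the j-th Gaussian.
        resp = []
        for x in xs:
            weights = [math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) for m in means]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: re-estimate each mean as the responsibility-weighted average.
        means = [sum(r[j] * x for r, x in zip(resp, xs)) /
                 sum(r[j] for r in resp)
                 for j in range(k)]
    return means

# Example using data generated as on the previous slide:
xs = [random.gauss(random.choice([0.0, 5.0]), 1.0) for _ in range(200)]
print(em_k_means(xs, k=2))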

Page 59: Machine learning

EM Algorithm

EM converges to a local maximum likelihood solution and provides estimates of the hidden variables zij.

In fact, it converges to a local maximum of E[ln P(Y|h)]:
- Y is the complete data (observable plus unobservable variables)
- The expected value is taken over the possible values of the unobserved variables in Y

Page 60: Machine learning

General EM Problem

Given:
- Observed data X = {x1, …, xm}
- Unobserved data Z = {z1, …, zm}
- A parameterized probability distribution P(Y|h), where Y = {y1, …, ym} is the full data, yi = <xi, zi>, and h are the parameters

Determine: h that (locally) maximizes E[ln P(Y|h)]

Applications: training Bayesian belief networks, unsupervised clustering, hidden Markov models

Page 61: Machine learning

General EM Method

Define a likelihood function Q(h'|h) which calculates Y = X ∪ Z using the observed X and the current parameters h to estimate Z:

Q(h'|h) = E[ln P(Y|h') | h, X]

EM algorithm:
- Estimation (E) step: calculate Q(h'|h) using the current hypothesis h and the observed data X to estimate the probability distribution over Y:
  Q(h'|h) = E[ln P(Y|h') | h, X]
- Maximization (M) step: replace hypothesis h by the hypothesis h' that maximizes this Q function:
  h ← argmax over h' in H of Q(h'|h)

Page 62: Machine learning

Thank You