Introduction to Machine Learning

Integrated Knowledge Solutions
https://iksinc.wordpress.com/home/
[email protected]
Agenda
• What is machine learning?
• Why machine learning and why now?
• Machine learning terminology
• Overview of machine learning methods
• Machine learning to deep learning
• Summary and Q & A
What is machine learning?
What is Machine Learning?
• Machine learning deals with making computers learn to make predictions or decisions without being explicitly programmed. Instead, a large number of examples of the underlying task are shown to the system, which optimizes a performance criterion to achieve learning.
An Example of Machine Learning: Credit Default Prediction
We have historical data about businesses and their delinquency. The data consists of 100 businesses. Each business is characterized by two attributes: business age in months and number of days delinquent in payment. We also know whether each business defaulted or not. Using machine learning, we can build a model to predict the probability that a given business will default.
[Scatter plot of the 100 businesses by business age and days delinquent]
Logistic Regression
• The model used here is called the logistic regression model. Let's look at the following expression:

  p = e^(a0 + a1x1 + ... + akxk) / (1 + e^(a0 + a1x1 + ... + akxk))

  where x1, x2, ..., xk are the attributes.
• In our example, the attributes are business age and number of days of delinquency.
• The quantity p will always lie in the range 0-1 and thus can be interpreted as the probability of the outcome being default or no default.
Logistic Regression
• By simple rewriting, we get:

  log(p / (1 - p)) = a0 + a1x1 + a2x2 + ... + akxk

• This ratio is called the log odds.
• The parameters of the logistic model, a0, a1, ..., ak, are learned via an optimization procedure.
• The learned parameters can then be deployed in the field to make predictions.
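The optimization step can be made concrete with a short sketch. This is a minimal gradient-ascent fit of a one-attribute logistic model on hypothetical toy data (not the slide's business dataset); the learning rate and iteration count are illustrative choices, not part of the original material:

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: e^z / (1 + e^z)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical toy data: one attribute x, label 1 when x > 5
data = [(x, 1 if x > 5 else 0) for x in range(11)]

a0, a1 = 0.0, 0.0      # parameters to be learned
lr = 0.01              # learning rate (kept small so the ascent is stable)
for _ in range(5000):  # full-batch gradient ascent on the log-likelihood
    g0 = g1 = 0.0
    for x, y in data:
        err = y - sigmoid(a0 + a1 * x)  # residual drives the gradient
        g0 += err
        g1 += err * x
    a0 += lr * g0
    a1 += lr * g1

# After training, sigmoid(a0 + a1 * x) estimates P(label = 1) for a new x
```

Real toolkits (scikit-learn, R's glm) use more sophisticated optimizers, but the idea is the same: adjust the parameters to make the observed labels most likely.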
Only in rare cases do we get a 100% accurate model.
Model Details and Performance
Plot of predicted default probability
Using the Model
• What is the probability of a business defaulting, given that the business has been with the bank for 26 months and is delinquent for 58 days?
Plug the model parameters (BUSAGE: 0.008; DAYSDELQ: 0.102; Intercept: -5.706) into the formula to calculate p:

p = e^(0.008*26 + 0.102*58 - 5.706) / (1 + e^(0.008*26 + 0.102*58 - 5.706)) ≈ 0.603
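The slide's calculation can be checked directly in a few lines of Python:

```python
import math

# Model parameters as used in the slide's exponent
busage_coef = 0.008
daysdelq_coef = 0.102
intercept = -5.706

z = busage_coef * 26 + daysdelq_coef * 58 + intercept  # log odds
p = math.exp(z) / (1 + math.exp(z))                    # default probability
print(round(p, 3))  # 0.603
```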
Why Machine Learning and Why Now?
Why Machine Learning?
Buzz about Machine Learning
"Every company is now a data company, capable of using machine learning in the cloud to deploy intelligent apps at scale, thanks to three machine learning trends: data flywheels, the algorithm economy, and cloud-hosted intelligence."

Three factors are making machine learning hot: cheap data, the algorithm economy, and cloud-based solutions.
Data is getting cheaper
For example, Tesla has 780 million miles of driving data, and adds another million every 10 hours.
Algorithmic Economy
Algorithm Economy Players in ML
Cloud-Based Intelligence
Emerging machine intelligence platforms hosting pre-trained machine learning models-as-a-service are making it easy for companies to get started with ML, allowing them to rapidly take their applications from prototype to production.
Many open source machine learning and deep learning frameworks running in the cloud allow easy leveraging of pre-trained, hosted models to tag images, recommend products, and perform general natural language processing tasks.
An Example
Apps for Excel
Machine Learning Terminology
Feature Vectors in ML

• A machine learning system builds models using properties of the objects being modeled. These properties are called features or attributes, and the process of measuring/obtaining them is called feature extraction. It is common to represent the properties of objects as feature vectors.
[Iris flower with sepal width, sepal length, petal width, and petal length labeled]

x = [x1, x2, x3, x4]^T
Learning Styles

• Supervised Learning
  – Training data comes with answers, called labels
  – The goal is to produce labels for new data
Supervised Learning Models
• Classification models
  – Predict whether a customer is likely to be lost to a competitor
  – Tag objects in a given image
  – Determine whether an incoming email is spam or not
Supervised Learning Models
• Regression models
  – Predict credit card balances of customers
  – Predict the number of 'likes' for a posting
  – Predict peak load for a utility given weather information
Learning Styles
• Unsupervised Learning
  – Training data comes without labels
  – The goal is to group data into different categories based on similarities
Grouped Data
Unsupervised Learning Models
• Segment/cluster customers into different groups
• Organize a collection of documents based on their content
• Make recommendations for products
Learning Styles
• Reinforcement Learning
  – Training data comes without labels
  – The learning system receives feedback from its operating environment to know how well it is doing
  – The goal is to perform better over time
Overview of Machine Learning Methods
Walk Through An Example: Flower Classification
• Build a classification model to differentiate between two classes of flowers
How Do We Go About It?
• Collect a large number of both types of flowers with the help of an expert
• Measure some attributes that can help differentiate between the two types of flowers. Let those attributes be petal area and sepal area.
Scatter plot of 100 examples of flowers

We can separate the flower types using the linear boundary shown above. The parameters of the line represent the learned classification model.

Another possible boundary. This boundary cannot be expressed via an equation; however, a tree structure can be used to express it. Note that this boundary does a better job of predicting the collected data.

Yet another possible boundary. This boundary makes predictions without any error. Is it a better boundary?
Model Complexity
• There are tradeoffs between the complexity of models and their performance in the field. A good design (model choice) weighs these tradeoffs.
• A good design should avoid overfitting. How?
  – Divide the entire data into three sets:
    • Training set (about 70% of the total data). Use this set to build the model.
    • Test set (about 20% of the total data). Use this set to estimate the model's accuracy after deployment.
    • Validation set (remaining 10% of the total data). Use this set to determine appropriate settings for the free parameters of the model. May not be required in some cases.
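A minimal sketch of the 70/20/10 split in plain Python (the function name and fixed seed are illustrative, not from the slides):

```python
import random

def split_data(rows, seed=42):
    """Shuffle, then split into 70% training, 20% test, 10% validation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # fixed seed for reproducibility
    n_train = int(0.7 * len(rows))
    n_test = int(0.2 * len(rows))
    train = rows[:n_train]
    test = rows[n_train:n_train + n_test]
    valid = rows[n_train + n_test:]     # remaining ~10%
    return train, test, valid

train, test, valid = split_data(range(100))
print(len(train), len(test), len(valid))  # 70 20 10
```

Shuffling before splitting matters: if the data is ordered (e.g. by date or by class), an unshuffled split would give unrepresentative sets.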
Measuring Model Performance

• True Positive: correctly identified as relevant
• True Negative: correctly identified as not relevant
• False Positive: incorrectly labeled as relevant
• False Negative: incorrectly labeled as not relevant
[Cat vs. No Cat example images illustrating a true positive, a true negative, a false positive, and a false negative]
Precision, Recall, and Accuracy
• Precision
  – Percentage of positive labels that are correct
  – Precision = (# true positives) / (# true positives + # false positives)
• Recall
  – Percentage of positive examples that are correctly labeled
  – Recall = (# true positives) / (# true positives + # false negatives)
• Accuracy
  – Percentage of correct labels
  – Accuracy = (# true positives + # true negatives) / (# of samples)
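The three formulas translate directly into code; the counts in the example below are hypothetical:

```python
def precision(tp, fp):
    # Fraction of predicted positives that are actually positive
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of actual positives that were found
    return tp / (tp + fn)

def accuracy(tp, tn, n_samples):
    # Fraction of all samples labeled correctly
    return (tp + tn) / n_samples

# Hypothetical counts: 8 TP, 2 FP, 4 FN, 86 TN out of 100 samples
print(precision(8, 2))       # 0.8
print(accuracy(8, 86, 100))  # 0.94
```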
Sum-of-Squares Error for Regression Models

For a regression model, the error is measured by taking the square of the difference between the predicted output value and the target value for each training (test) example and summing this quantity over all examples:

  SSE = Σ_n (y_n - t_n)^2

where y_n is the predicted value and t_n is the target value for example n.
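The sum-of-squares error described above, as a small Python function:

```python
def sum_of_squares_error(predicted, target):
    """Sum of squared differences between predictions and targets."""
    return sum((y - t) ** 2 for y, t in zip(predicted, target))

print(sum_of_squares_error([1.0, 2.0, 3.0], [1.0, 1.0, 4.0]))  # 0 + 1 + 1 = 2.0
```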
Bias and Variance
• Bias: expected difference between the model's predictions and the truth
• Variance: how much the model differs across training sets
• Model scenarios
  – High bias: model makes inaccurate predictions on training data
  – High variance: model does not generalize to new datasets
  – Low bias: model makes accurate predictions on training data
  – Low variance: model generalizes to new datasets
The Guiding Principle for Model Selection: Occam's Razor
Model Building Algorithms
• Supervised learning algorithms
  – Linear methods
  – k-NN classifiers
  – Neural networks
  – Support vector machines
  – Decision trees
  – Ensemble methods
Illustration of k-NN Model
Predicted label of the test example with a 1-NN model: Versicolor
Predicted label of the test example with a 3-NN model: Virginica

Test example
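A k-NN prediction is just a majority vote among the k closest training points. The sketch below uses hypothetical 2-D points (not the slide's actual data), chosen so that, as on the slide, the 1-NN and 3-NN answers disagree:

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k nearest neighbors (Euclidean distance).
    train is a list of (feature_tuple, label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical points: one Versicolor close to the query, two Virginica farther away
train = [((1.0, 1.0), "Versicolor"),
         ((3.0, 3.0), "Virginica"),
         ((3.2, 3.0), "Virginica")]
query = (1.5, 1.5)
print(knn_predict(train, query, k=1))  # Versicolor (the single nearest point)
print(knn_predict(train, query, k=3))  # Virginica (2 of 3 votes)
```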
Illustration of Decision Tree Model
Petal width <= 0.8?
  Yes → Setosa
  No  → Petal length <= 4.75?
          Yes → Versicolor
          No  → Virginica

The decision tree is automatically generated by a machine learning algorithm.
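Once learned, the tree on this slide is equivalent to a pair of nested conditionals:

```python
def classify_iris(petal_width, petal_length):
    """The decision tree from the slide written as if/else rules."""
    if petal_width <= 0.8:
        return "Setosa"
    if petal_length <= 4.75:
        return "Versicolor"
    return "Virginica"

print(classify_iris(0.2, 1.4))  # Setosa
print(classify_iris(1.3, 4.0))  # Versicolor
print(classify_iris(2.0, 5.5))  # Virginica
```

The learning algorithm's job is to discover which attribute and threshold to test at each node; applying the tree afterwards is this cheap.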
Model Building Algorithms
• Unsupervised learning
  – k-means clustering
  – Agglomerative clustering
  – Self-organizing feature maps
  – Recommendation systems
K-means Clustering

K-means: "by far the most popular clustering tool used nowadays in scientific and industrial applications" - Berkhin 2006

Choose the number of clusters, k, and the initial cluster centers.
K-means Clustering

Assign data points to clusters based on their distance to the cluster centers.
K-means Clustering

Objective (the sum of squared distances from data points to their cluster centers):

  minimize Σ_{n=1}^{N} ||x_n - center_n||^2

Update the cluster centers and reassign the data points.
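The choose/assign/update loop can be sketched in plain Python. The initialization here is deliberately naive (the first k points) rather than random, and the two blobs of points are hypothetical:

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means: assign each point to its nearest center,
    then recompute each center as the mean of its members."""
    centers = [points[i] for i in range(k)]  # naive initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster goes empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, clusters

# Two hypothetical well-separated blobs of 2-D points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Production implementations add better initialization (e.g. k-means++) and a convergence test instead of a fixed iteration count.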
Illustration of Recommendation System
Steps Towards a Machine Learning Project
• Collect data
• Explore data via scatter plots and histograms; remove duplicates and data records with missing values
• Check for dimensionality reduction
• Build the model (an iterative process)
• Transport/integrate with an application
Machine Learning to Deep Learning
Machine Learning Limitation

• Machine learning methods operate on manually designed features.
• The design of such features for tasks involving computer vision, speech understanding, and natural language processing is extremely difficult. This puts a limit on the performance of the system.
Feature Extractor → Trainable Classifier
Processing Sensory Data is Hard
How do we bridge this gap between the pixels and meaning via machine learning?
Sensory Data Processing is Challenging
So why not build integrated learning systems that perform end-to-end learning, i.e., learn the representation as well as the classification from raw data without any engineered features?
Feature Learner → Trainable Classifier
An approach performing end-to-end learning, typically through a series of successive abstractions, is in a nutshell deep learning.
SegNet is a deep learning architecture for pixel-wise semantic segmentation from the University of Cambridge.

An Example of Deep Learning Capability
Summary
• We have just skimmed the surface of machine learning
• The web is full of reading resources (free books, lecture notes, blogs, videos) to dig deeper into machine learning
• Several open source software resources (R, RapidMiner, scikit-learn, etc.) support learning via experimentation
• Applications based on vision, speech, and natural language processing are excellent candidates for deep learning
[email protected] h7ps://iksinc.wordpress.com/home/