Introduction to Machine Learning

Integrated Knowledge Solutions
https://iksinc.wordpress.com/home/
[email protected]
Agenda
• What is machine learning?
• Why machine learning and why now?
• Machine learning terminology
• Overview of machine learning methods
• Machine learning to deep learning
• Summary and Q & A
What is machine learning?
What is Machine Learning?
• Machine learning deals with making computers learn to make predictions or decisions without being explicitly programmed. Instead, a large number of examples of the underlying task are shown to the system, which optimizes a performance criterion to achieve learning.
An Example of Machine Learning: Credit Default Prediction
We have historical data about businesses and their delinquency. The data consists of 100 businesses. Each business is characterized by two attributes: business age in months and number of days delinquent in payment. We also know whether each business defaulted or not. Using machine learning, we can build a model to predict the probability that a given business will default.
[Scatter plot of the 100 businesses by business age and days delinquent]
Logistic Regression
• The model used here is called the logistic regression model. Let's look at the following expression:

  p = e^(a0 + a1x1 + ... + akxk) / (1 + e^(a0 + a1x1 + ... + akxk))

  where x1, x2, ..., xk are the attributes.
• In our example, the attributes are business age and number of days of delinquency.
• The quantity p will always lie in the range 0-1 and thus can be interpreted as the probability of the outcome being default or no default.
Logistic Regression
• By simple rewriting, we get:

  log(p / (1 - p)) = a0 + a1x1 + a2x2 + ... + akxk

• This ratio is called the log odds.
• The parameters of the logistic model, a0, a1, ..., ak, are learned via an optimization procedure.
• The learned parameters can then be deployed in the field to make predictions.
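The optimization step can be made concrete with a short sketch. This is a minimal gradient-ascent fit of a one-attribute logistic model on hypothetical toy data (not the slide's business dataset); the learning rate and iteration count are illustrative choices, not part of the original material:

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: e^z / (1 + e^z)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical toy data: one attribute x, label 1 when x > 5
data = [(x, 1 if x > 5 else 0) for x in range(11)]

a0, a1 = 0.0, 0.0      # parameters to be learned
lr = 0.01              # learning rate (kept small so the ascent is stable)
for _ in range(5000):  # full-batch gradient ascent on the log-likelihood
    g0 = g1 = 0.0
    for x, y in data:
        err = y - sigmoid(a0 + a1 * x)  # residual drives the gradient
        g0 += err
        g1 += err * x
    a0 += lr * g0
    a1 += lr * g1

# After training, sigmoid(a0 + a1 * x) estimates P(label = 1) for a new x
```

Real toolkits (scikit-learn, R's glm) use more sophisticated optimizers, but the idea is the same: adjust the parameters to make the observed labels most likely.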
Only in rare cases do we get a 100% accurate model.
Model Details and Performance
Plot of predicted default probability
Using the Model
• What is the probability of a business defaulting, given that the business has been with the bank for 26 months and is delinquent for 58 days?
Plug the model parameters (BUSAGE: 0.008; DAYSDELQ: 0.102; Intercept: -5.706) into the formula to calculate p:

p = e^(0.008*26 + 0.102*58 - 5.706) / (1 + e^(0.008*26 + 0.102*58 - 5.706)) ≈ 0.603
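The slide's calculation can be checked directly in a few lines of Python:

```python
import math

# Model parameters as used in the slide's exponent
busage_coef = 0.008
daysdelq_coef = 0.102
intercept = -5.706

z = busage_coef * 26 + daysdelq_coef * 58 + intercept  # log odds
p = math.exp(z) / (1 + math.exp(z))                    # default probability
print(round(p, 3))  # 0.603
```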
Why Machine Learning and Why Now?
Why Machine Learning?
Buzz about Machine Learning
"Every company is now a data company, capable of using machine learning in the cloud to deploy intelligent apps at scale, thanks to three machine learning trends: data flywheels, the algorithm economy, and cloud-hosted intelligence."

Three factors are making machine learning hot: cheap data, the algorithm economy, and cloud-based solutions.
Data is getting cheaper
For example, Tesla has 780 million miles of driving data, and adds another million every 10 hours.
Algorithmic Economy
Algorithm Economy Players in ML
Cloud-Based Intelligence
Emerging machine intelligence platforms hosting pre-trained machine learning models-as-a-service are making it easy for companies to get started with ML, allowing them to rapidly take their applications from prototype to production.
Many open source machine learning and deep learning frameworks running in the cloud allow easy leveraging of pre-trained, hosted models to tag images, recommend products, and perform general natural language processing tasks.
An Example
Apps for Excel
Machine Learning Terminology
Feature Vectors in ML

• A machine learning system builds models using properties of the objects being modeled. These properties are called features or attributes, and the process of measuring/obtaining them is called feature extraction. It is common to represent the properties of objects as feature vectors.
[Iris flower with sepal width, sepal length, petal width, and petal length labeled]

x = [x1, x2, x3, x4]^T
Learning Styles

• Supervised Learning
  – Training data comes with answers, called labels
  – The goal is to produce labels for new data
Supervised Learning Models
• Classification models
  – Predict whether a customer is likely to be lost to a competitor
  – Tag objects in a given image
  – Determine whether an incoming email is spam or not
Supervised Learning Models
• Regression models
  – Predict credit card balances of customers
  – Predict the number of 'likes' for a posting
  – Predict peak load for a utility given weather information
Learning Styles
• Unsupervised Learning
  – Training data comes without labels
  – The goal is to group data into different categories based on similarities
Grouped Data
Unsupervised Learning Models
• Segment/cluster customers into different groups
• Organize a collection of documents based on their content
• Make recommendations for products
Learning Styles
• Reinforcement Learning
  – Training data comes without labels
  – The learning system receives feedback from its operating environment to know how well it is doing
  – The goal is to perform better over time
Overview of Machine Learning Methods
Walk Through An Example: Flower Classification
• Build a classification model to differentiate between two classes of flowers
How Do We Go About It?
• Collect a large number of both types of flowers with the help of an expert
• Measure some attributes that can help differentiate between the two types of flowers. Let those attributes be petal area and sepal area.
Scatter plot of 100 examples of flowers

We can separate the flower types using the linear boundary shown above. The parameters of the line represent the learned classification model.

Another possible boundary. This boundary cannot be expressed via an equation; however, a tree structure can be used to express it. Note that this boundary does a better job of predicting the collected data.

Yet another possible boundary. This boundary makes predictions without any error. Is it a better boundary?
Model Complexity
• There are tradeoffs between the complexity of models and their performance in the field. A good design (model choice) weighs these tradeoffs.
• A good design should avoid overfitting. How?
  – Divide the entire data into three sets:
    • Training set (about 70% of the total data). Use this set to build the model.
    • Test set (about 20% of the total data). Use this set to estimate the model's accuracy after deployment.
    • Validation set (remaining 10% of the total data). Use this set to determine appropriate settings for the free parameters of the model. May not be required in some cases.
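A minimal sketch of the 70/20/10 split in plain Python (the function name and fixed seed are illustrative, not from the slides):

```python
import random

def split_data(rows, seed=42):
    """Shuffle, then split into 70% training, 20% test, 10% validation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # fixed seed for reproducibility
    n_train = int(0.7 * len(rows))
    n_test = int(0.2 * len(rows))
    train = rows[:n_train]
    test = rows[n_train:n_train + n_test]
    valid = rows[n_train + n_test:]     # remaining ~10%
    return train, test, valid

train, test, valid = split_data(range(100))
print(len(train), len(test), len(valid))  # 70 20 10
```

Shuffling before splitting matters: if the data is ordered (e.g. by date or by class), an unshuffled split would give unrepresentative sets.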
Measuring Model Performance

• True Positive: correctly identified as relevant
• True Negative: correctly identified as not relevant
• False Positive: incorrectly labeled as relevant
• False Negative: incorrectly labeled as not relevant
[Cat vs. No Cat example images illustrating a true positive, a true negative, a false positive, and a false negative]
Precision, Recall, and Accuracy
• Precision
  – Percentage of positive labels that are correct
  – Precision = (# true positives) / (# true positives + # false positives)
• Recall
  – Percentage of positive examples that are correctly labeled
  – Recall = (# true positives) / (# true positives + # false negatives)
• Accuracy
  – Percentage of correct labels
  – Accuracy = (# true positives + # true negatives) / (# of samples)
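The three formulas translate directly into code; the counts in the example below are hypothetical:

```python
def precision(tp, fp):
    # Fraction of predicted positives that are actually positive
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of actual positives that were found
    return tp / (tp + fn)

def accuracy(tp, tn, n_samples):
    # Fraction of all samples labeled correctly
    return (tp + tn) / n_samples

# Hypothetical counts: 8 TP, 2 FP, 4 FN, 86 TN out of 100 samples
print(precision(8, 2))       # 0.8
print(accuracy(8, 86, 100))  # 0.94
```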
Sum-of-Squares Error for Regression Models

For a regression model, the error is measured by taking the square of the difference between the predicted output value and the target value for each training (test) example and summing this quantity over all examples:

  SSE = Σ_n (y_n - t_n)^2

where y_n is the predicted value and t_n is the target value for example n.
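The sum-of-squares error described above, as a small Python function:

```python
def sum_of_squares_error(predicted, target):
    """Sum of squared differences between predictions and targets."""
    return sum((y - t) ** 2 for y, t in zip(predicted, target))

print(sum_of_squares_error([1.0, 2.0, 3.0], [1.0, 1.0, 4.0]))  # 0 + 1 + 1 = 2.0
```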
Bias and Variance
• Bias: expected difference between the model's predictions and the truth
• Variance: how much the model differs across training sets
• Model scenarios
  – High bias: model makes inaccurate predictions on training data
  – High variance: model does not generalize to new datasets
  – Low bias: model makes accurate predictions on training data
  – Low variance: model generalizes to new datasets
The Guiding Principle for Model Selection: Occam's Razor
Model Building Algorithms
• Supervised learning algorithms
  – Linear methods
  – k-NN classifiers
  – Neural networks
  – Support vector machines
  – Decision trees
  – Ensemble methods
Illustration of k-NN Model
Predicted label of the test example with a 1-NN model: Versicolor
Predicted label of the test example with a 3-NN model: Virginica

Test example
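A k-NN prediction is just a majority vote among the k closest training points. The sketch below uses hypothetical 2-D points (not the slide's actual data), chosen so that, as on the slide, the 1-NN and 3-NN answers disagree:

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k nearest neighbors (Euclidean distance).
    train is a list of (feature_tuple, label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical points: one Versicolor close to the query, two Virginica farther away
train = [((1.0, 1.0), "Versicolor"),
         ((3.0, 3.0), "Virginica"),
         ((3.2, 3.0), "Virginica")]
query = (1.5, 1.5)
print(knn_predict(train, query, k=1))  # Versicolor (the single nearest point)
print(knn_predict(train, query, k=3))  # Virginica (2 of 3 votes)
```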
Illustration of Decision Tree Model
Petal width <= 0.8?
  Yes → Setosa
  No  → Petal length <= 4.75?
          Yes → Versicolor
          No  → Virginica

The decision tree is automatically generated by a machine learning algorithm.
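Once learned, the tree on this slide is equivalent to a pair of nested conditionals:

```python
def classify_iris(petal_width, petal_length):
    """The decision tree from the slide written as if/else rules."""
    if petal_width <= 0.8:
        return "Setosa"
    if petal_length <= 4.75:
        return "Versicolor"
    return "Virginica"

print(classify_iris(0.2, 1.4))  # Setosa
print(classify_iris(1.3, 4.0))  # Versicolor
print(classify_iris(2.0, 5.5))  # Virginica
```

The learning algorithm's job is to discover which attribute and threshold to test at each node; applying the tree afterwards is this cheap.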
Model Building Algorithms
• Unsupervised learning
  – k-means clustering
  – Agglomerative clustering
  – Self-organizing feature maps
  – Recommendation systems
K-means Clustering

K-means: "by far the most popular clustering tool used nowadays in scientific and industrial applications" - Berkhin 2006

Choose the number of clusters, k, and the initial cluster centers.
K-means Clustering

Assign data points to clusters based on their distance to the cluster centers.
K-means Clustering

Objective (the sum of squared distances from data points to their cluster centers):

  minimize Σ_{n=1}^{N} ||x_n - center_n||^2

Update the cluster centers and reassign the data points.
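The choose/assign/update loop can be sketched in plain Python. The initialization here is deliberately naive (the first k points) rather than random, and the two blobs of points are hypothetical:

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means: assign each point to its nearest center,
    then recompute each center as the mean of its members."""
    centers = [points[i] for i in range(k)]  # naive initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster goes empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, clusters

# Two hypothetical well-separated blobs of 2-D points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Production implementations add better initialization (e.g. k-means++) and a convergence test instead of a fixed iteration count.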
Illustration of Recommendation System
Steps Towards a Machine Learning Project
• Collect data
• Explore data via scatter plots and histograms; remove duplicates and data records with missing values
• Check for dimensionality reduction
• Build the model (an iterative process)
• Transport/integrate with an application
Machine Learning to Deep Learning
Machine Learning Limitation

• Machine learning methods operate on manually designed features.
• The design of such features for tasks involving computer vision, speech understanding, and natural language processing is extremely difficult. This puts a limit on the performance of the system.
Feature Extractor → Trainable Classifier
Processing Sensory Data is Hard
How do we bridge this gap between the pixels and meaning via machine learning?
Sensory Data Processing is Challenging
So why not build integrated learning systems that perform end-to-end learning, i.e., learn the representation as well as the classification from raw data without any engineered features?
Feature Learner → Trainable Classifier
An approach performing end-to-end learning, typically through a series of successive abstractions, is in a nutshell deep learning.
SegNet is a deep learning architecture for pixel-wise semantic segmentation from the University of Cambridge.

An Example of Deep Learning Capability
Summary
• We have just skimmed the surface of machine learning
• The web is full of reading resources (free books, lecture notes, blogs, videos) to dig deeper into machine learning
• Several open source software resources (R, RapidMiner, scikit-learn, etc.) support learning via experimentation
• Applications based on vision, speech, and natural language processing are excellent candidates for deep learning
[email protected] h7ps://iksinc.wordpress.com/home/