Machine Learning Introduction [email protected]
Machine learning introduction
Logistic regression, feature selection
Boosting, tree boosting
See more ML posts: http://dongguo.me/
Machine Learning Makes Life Better
WHAT IS MACHINE LEARNING?
Learning
• What is learning – Find rules from data/experience
• Why learning is possible – Assume rules exist in this world
• How to learn – Induction: generalize rules from observed examples
What is machine learning
• “Machine Learning is a field of study that gives computers the ability to learn without being explicitly programmed” – Arthur Samuel (1959)
• “Machine learning is the study of computer algorithms that improve automatically through experience” – Tom Mitchell (1998)
Overview of machine learning
Machine Learning
Unsupervised Learning
Semi-supervised Learning
Supervised Learning
Classification, Regression
Outline
• Supervised Learning
• Case Study
• Challenge
• Resource
Supervised learning
• Concepts
• Definition
• Models
• Metrics
• Open Questions
Concepts
[Figure: supervised learning workflow – Problem → Generate dataset → Train → Test → Predict, with model tuning and feature selection feeding back into the model; key terms: label, feature vector, dataset, sample/instance]
What is supervised learning
• Find a function (from some function space) that predicts labels for unseen instances, learned from the labeled training data – Function space: determined by the chosen model – Finding the function: minimize the error on the training data under some cost function
• Two types: classification and regression
Formal definition
• Given a training dataset {(x_i, y_i)}, i = 1, …, N
• And define a loss function L(y, ŷ), where ŷ = f(x)
• Target: f* = argmin_f G(f), s.t. G(f) = (1/N) Σ_{i=1}^{N} L(y_i, f(x_i))
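The objective above can be sketched in a few lines: the empirical risk G(f) is just the average loss of a candidate function over the training set. The toy data, squared loss, and candidate function below are illustrative assumptions, not from the slides.

```python
# Empirical risk G(f) = (1/N) * sum_i L(y_i, f(x_i)),
# sketched with squared loss on toy data (all values assumed).

def empirical_risk(f, xs, ys, loss):
    """Average loss of predictor f over the training set."""
    return sum(loss(y, f(x)) for x, y in zip(xs, ys)) / len(xs)

squared_loss = lambda y, y_hat: (y - y_hat) ** 2

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
f = lambda x: 2.0 * x          # a candidate function from the chosen space
print(empirical_risk(f, xs, ys, squared_loss))  # 0.0 for this perfect fit
```

Learning then amounts to searching the function space for the f that drives this average down.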
Models for supervised learning
• Classification and regression – For classification: LR (logistic regression), Naïve Bayes – For regression: linear regression – For both: trees, KNN, SVM, ANN
• Generative and discriminative – Generative: Naïve Bayes, GMM, HMM – Discriminative: KNN, LR, SVM, ANN, trees
• Parametric and nonparametric – Parametric: LR, Naïve Bayes, ANN – Nonparametric: trees, KNN, kernel methods
Decision Tree
• Would you like to date somebody?
[Figure: toy decision tree – root splits on Gender (male → Pass; others → …; female → “Good looking?” node: very good → Accept, else → Pass), with leaf answers No!, Yes!, umm..]
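A decision tree is just nested if/else rules learned from data. The sketch below encodes the dating example; the exact splits and labels are a best guess at the slide's figure, purely illustrative.

```python
# The slide's toy "date somebody?" tree as nested rules
# (splits and labels are assumptions read off the figure).

def date_decision(gender, good_looking):
    if gender == "male":
        return "Pass"
    # female branch: split on looks
    if good_looking == "very good":
        return "Accept"
    return "Pass"

print(date_decision("female", "very good"))  # Accept
print(date_decision("male", None))           # Pass
```

Tree learning algorithms (e.g., CART) choose such splits automatically by maximizing some purity gain at each node.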
K-Nearest Neighbor classifier
[Figure: k-NN decision boundaries on the same dataset for K = 15 and K = 1]
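k-NN has no training step: a query point gets the majority label among its k closest training points. A minimal sketch on assumed 1-D toy data:

```python
# Minimal k-NN classifier: majority vote among the k nearest
# training points (1-D toy data, all values assumed).
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (x, label); returns majority label of k nearest."""
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.0, "b"), (1.1, "b")]
print(knn_predict(train, 0.15, k=3))  # "a"
print(knn_predict(train, 0.95, k=1))  # "b"
```

Small k (as in the K = 1 panel) gives jagged, low-bias boundaries; larger k (K = 15) gives smoother, more stable ones.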
Naïve Bayes
• Bayes classifier
• Conditional independence assumption
• With this assumption
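With conditional independence, the posterior factorizes as P(y | x) ∝ P(y) · Π_j P(x_j | y), so classification is just multiplying per-feature likelihoods. The probability tables below are invented for illustration:

```python
# Naive Bayes scoring: P(y) * product of P(x_j | y), then argmax.
# Priors and likelihoods are hand-made toy values (assumptions).

priors = {"spam": 0.4, "ham": 0.6}
likelihood = {  # P(word present | class)
    "spam": {"offer": 0.8, "meeting": 0.1},
    "ham":  {"offer": 0.2, "meeting": 0.7},
}

def classify(words):
    scores = {}
    for c in priors:
        score = priors[c]
        for w in words:
            score *= likelihood[c].get(w, 1.0)
        scores[c] = score
    return max(scores, key=scores.get)

print(classify(["offer"]))    # 0.32 vs 0.12 -> "spam"
print(classify(["meeting"]))  # 0.04 vs 0.42 -> "ham"
```

In practice the per-class conditionals are estimated from training counts (usually with smoothing), but the decision rule is exactly this product-and-argmax.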
Logistic regression
• Logistic function
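The logistic (sigmoid) function σ(z) = 1 / (1 + e^(−z)) squashes a linear score w·x + b into a probability in (0, 1); the weights below are arbitrary example values:

```python
# Logistic function and a logistic-regression prediction
# (weights/bias are illustrative assumptions, not fitted).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """P(y = 1 | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

print(sigmoid(0.0))                                  # 0.5
print(predict_proba([1.0, -2.0], 0.5, [2.0, 1.0]))   # sigmoid(0.5) ~ 0.622
```

Training fits w and b by minimizing the log loss, typically via gradient descent.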
Artificial neural network
Support vector machine
Model Inference
• Typical inference methods – Gradient descent
– Expectation Maximization
– Sampling-based
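Gradient descent, the first method listed, repeatedly steps against the gradient of the objective. A minimal sketch on an assumed one-dimensional objective G(w) = (w − 3)²; learning rate and step count are arbitrary choices:

```python
# Gradient descent sketch: w <- w - lr * dG/dw, repeated.
# Objective, learning rate, and step count are assumptions.

def gradient_descent(grad, w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

grad = lambda w: 2.0 * (w - 3.0)   # derivative of (w - 3)^2
print(gradient_descent(grad, w0=0.0))  # converges to ~3.0, the minimizer
```

The same loop, with the gradient of the empirical risk from the formal definition, is how models like logistic regression and neural networks are fitted.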
Model ensemble
• Averaging or voting the outputs of multiple classifiers
• Bagging (bootstrap aggregating)
– Train multiple base models on bootstrap samples of the data
– Vote the base classifiers with equal weight
– Improves model stability and helps avoid overfitting
– Works well with unstable base classifiers
• AdaBoost (adaptive boosting)
– Base classifiers are trained sequentially
– Misclassified instances receive higher weight in the next base classifier
– Weighted voting
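The two combination rules differ only in the vote: bagging counts each base classifier equally, while AdaBoost weights each by its quality. A sketch with toy, fixed predictions (the classifiers and weights are assumptions):

```python
# Ensemble voting sketch: equal-weight majority vote (bagging-style)
# vs. weighted vote (boosting-style); predictions/weights are toy values.
from collections import Counter

def majority_vote(predictions):
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    scores = {}
    for p, w in zip(predictions, weights):
        scores[p] = scores.get(p, 0.0) + w
    return max(scores, key=scores.get)

preds = ["+1", "-1", "+1"]
print(majority_vote(preds))                    # "+1": two of three agree
print(weighted_vote(preds, [0.2, 0.9, 0.3]))   # "-1": one strong voter wins
```

In real AdaBoost the weights come from each base classifier's training error, so accurate classifiers dominate the vote.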
Evaluation metrics
• Common metrics for classification – Accuracy – Precision/Recall – AUC
• For regression – Mean absolute error (MAE) – Mean squared error (MSE), RMSE
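These metrics are simple to compute by hand; a sketch on assumed toy labels (binary classification with positive class 1, plus a small regression example):

```python
# Hand-computed evaluation metrics on toy labels (values assumed).

def accuracy(y, p):
    return sum(a == b for a, b in zip(y, p)) / len(y)

def precision_recall(y, p):
    tp = sum(a == b == 1 for a, b in zip(y, p))
    fp = sum(a == 0 and b == 1 for a, b in zip(y, p))
    fn = sum(a == 1 and b == 0 for a, b in zip(y, p))
    return tp / (tp + fp), tp / (tp + fn)

def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

y_true = [1, 0, 1, 1]
y_pred = [1, 1, 0, 1]
print(accuracy(y_true, y_pred))          # 0.5
print(precision_recall(y_true, y_pred))  # (2/3, 2/3)
print(mae([2.0, 4.0], [2.5, 3.0]))       # 0.75
```

AUC additionally requires ranking instances by predicted score rather than comparing hard labels, which is why it is robust to the choice of classification threshold.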
Question 1: How to choose a suitable model?
Characteristic                          NB   Trees  KNN  LR   ANN  SVM
Natural handling of "mixed"-type data   1    3      1    1    1    1
Robustness to outliers in input space   3    3      3    3    1    1
Computational scalability               3    3      1    3    1    1
Interpretability                        2    2      1    2    1    1
Predictive power                        1    1      3    2    3    3
(3 = good, 1 = poor; NB = Naïve Bayes, LR = logistic regression, ANN = neural networks)
Adapted from The Elements of Statistical Learning, 2nd ed., p. 351
Question 2: Can we find a 100% accurate model?
• Expected risk
• Empirical risk
• Choose a family of candidate prediction functions
• Error
Case study: Predictive Demographics
Problem
– ML problem? What kind? Labels? Evaluation metric? Possible features (show, ad vote, ad selection, search…)? Accessible?
Dataset generation
– Feature extraction ('show', 'ad vote', 'ad selection'); feature analysis (remove 'ad selection'); load login profiles
Choose a model
– 1. Familiar? (NB, ANN, LR, Tree, SVM) 2. Computational cost? Interpretability? Precision? 3. Data: amount? Noise ratio?
Train / Test
– Evaluation (AUC, precision/recall)
Tuning
– Try more features (add 'OS', 'browser', 'flash'); feature selection (remove 'flash' and non-anonymous features); try more models
Model ensemble
Predictor on product
– Scoring, online updates
Challenges
– Noise, differing joint distributions, evaluation
Challenges in Machine learning
• Data – Sparse data in high dimensions – Limited labels
• Computational cost – Speeding up advanced models – Parallelization
• Application – Structured prediction
Resource
• Conferences • Books • Lectures • Datasets
Top conferences
• ICML • NIPS • IJCAI/AAAI • KDD • Other related – WSDM, WWW, SIGIR, CIKM, ICDE, ICDM
Books
• Machine Learning [link] by Mitchell • Pattern Recognition and Machine Learning [link] by Bishop • The Elements of Statistical Learning [link] • Scaling Up Machine Learning [link]
Lectures
• Machine Learning open class – by Andrew Ng – Videos on YouTube
• Advanced Topics in Machine Learning – Cornell
• http://videolectures.net/
Other research resource
• Research organizations – Yahoo Research [link] – Google Research publications [link]
• Dataset – UCI machine learning Repository [link] – kaggle.com
THANKS