An introduction to machine learning. I gave a talk on this; the video can be found here: http://www.techgig.com/expert-speak/Introduction-to-Machine-Learning-616
1
Introduction to Machine Learning
Kiran Lonikar
2
What is learning?
Tom Mitchell: Learning is improving some performance measure P at some task T through experience E.
In plain English: Performing some task better with experience and training…
Key Elements:
• Remember or memorize the past experiences E
• Generalize from the experiences E
Observe how kids learn to read words: they make mistakes even when reading previously known words, then correct themselves. This happens especially with words that have silent letters and words ending in "-tion".
Warning: This is a highly mathematical subject!
3
What is Machine Learning?
How would you build a computer program that “learns” from experience?
It is generally a three-phase process:
• Express experience E mathematically: build a set of features related to the experiences (feature extraction from raw data)
• Memorize and generalize: build a mathematical model or set of rules from the experiences (training)
• Apply the mathematical model to the features of future tasks
4
Machine Learning in Action…
• OCR in web pages:
http://newscarousel.herokuapp.com/scribble-js/Scribble.html
• Word Lens mobile app
5
Types of ML Systems
• Supervised Learning
• Classification
• Logistic Regression, SVM, Naive Bayes (NB), Decision Trees, Artificial Neural Networks (ANN), etc.
• Regression
• Recommender Systems*
• User-user/item-item similarity, matrix factorization, etc.
• Unsupervised Learning
• Clustering
• K-means, Fuzzy K-means, model-based (LDA) clustering, etc.
• Dimensionality reduction
• Principal Component Analysis (PCA)
• Anomaly Detection
6
Classification
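Example: identify a speaker’s gender from the voice spectrum
• Training: build a model from the data $\{(a_1, f_1, g_1), (a_2, f_2, g_2), \ldots, (a_m, f_m, g_m)\}$, where $a_i$ is the amplitude, $f_i$ the frequency, and $g_i \in \{M, F\}$ the gender of the i-th example
• Logistic Regression (LR): $p(g = F \mid a, f; \theta) = \mathrm{sigmoid}(\theta_0 + \theta_1 a + \theta_2 f)$
• Decision boundary: if p < 0.5, predict g = M; otherwise predict g = F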
[Figure: training examples plotted by frequency (x-axis) and amplitude (y-axis)]
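Because the sigmoid is increasing and equals 0.5 exactly at 0, the decision rule above reduces to a linear test on the features. The decision boundary is therefore the straight line $\theta_0 + \theta_1 a + \theta_2 f = 0$ in the (amplitude, frequency) plane:

```latex
\frac{1}{1 + e^{-(\theta_0 + \theta_1 a + \theta_2 f)}} \ge \frac{1}{2}
\iff e^{-(\theta_0 + \theta_1 a + \theta_2 f)} \le 1
\iff \theta_0 + \theta_1 a + \theta_2 f \ge 0
```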
7
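Logistic Regression
• Let y = 1 when g = F and y = 0 when g = M, and define the vector $x = [1, a, f]$ (the leading 1 pairs with the intercept $\theta_0$).
• Define the function $h_\theta(x) = \mathrm{sigmoid}(\theta^T x)$, where $\mathrm{sigmoid}(z) = 1/(1 + e^{-z})$. It represents the probability $p(y = 1 \mid x; \theta)$.
• Cost: $J(\theta) = -\sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \lambda\,\theta^T\theta$ over all m training examples, for some regularization parameter $\lambda \ge 0$ (the intercept $\theta_0$ is usually left out of the penalty).
• Optimization algorithm (e.g. gradient descent): find the θ that minimizes J(θ).
• Evaluate the model θ on cross-validation data, varying λ to find the best fit.
• Test the model θ on test data: if $h_\theta(x) \ge 0.5$, predict gender = F; otherwise predict gender = M.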
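The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes batch gradient descent with a fixed learning rate `alpha`, a made-up toy dataset with features already scaled to [0, 1], and, as is common practice, an unpenalized intercept $\theta_0$. The names `train`, `cost`, and `predict` are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable form of 1 / (1 + e^(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def h(theta, x):
    """h_theta(x) = sigmoid(theta^T x), i.e. p(y = 1 | x; theta)."""
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def cost(theta, X, y, lam):
    """Regularized cross-entropy J(theta); theta[0] (intercept) is not penalized."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = h(theta, xi)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total + lam * sum(t * t for t in theta[1:])

def train(X, y, lam=0.1, alpha=0.1, iters=5000):
    """Batch gradient descent on J(theta) with a fixed learning rate alpha."""
    theta = [0.0] * len(X[0])
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            err = h(theta, xi) - yi  # dJ/dtheta_j = sum over examples of (h - y) * x_j
            for j, xij in enumerate(xi):
                grad[j] += err * xij
        for j in range(1, len(theta)):  # gradient of the penalty lam * theta_j^2
            grad[j] += 2 * lam * theta[j]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

def predict(theta, x):
    # Decision rule from the slide: h >= 0.5 means F, otherwise M.
    return "F" if h(theta, x) >= 0.5 else "M"

# Hypothetical toy data: x = [1, amplitude, frequency]; y = 1 means F.
X = [[1, 0.5, 0.1], [1, 0.6, 0.2], [1, 0.4, 0.8], [1, 0.5, 0.9]]
y = [0, 0, 1, 1]
theta = train(X, y, lam=0.1)
print([predict(theta, x) for x in X])
```

In a real pipeline, λ would be chosen by training with several values and comparing the cost on held-out cross-validation data, as the slide describes.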
8
Recommender Systems
• Training data: user j specifies a rating $y(i,j)$ for item i
• Goal: guess the ratings for the other items (the blanks)
• Collaborative Filtering: each item is described by k features:
• Feature vector for item i: $x_i = (x_{i1}, x_{i2}, \ldots, x_{ik})$
• Parameter vector for user j: $\theta_j = (\theta_{j1}, \theta_{j2}, \ldots, \theta_{jk})$
• User j’s estimated rating for item i: $\theta_j^T x_i$
[Figure: ratings matrix Y with items as rows and users as columns; known ratings range from 1 to 5, and unrated entries are blank]
9
Recommender Systems
• Learn $x_i$ and $\theta_j$:
• Given all $x_i$, minimize $\sum_{i} (\theta_j^T x_i - y(i,j))^2$ over all items i that user j has rated, to find the optimal $\theta_j$.
• Given all $\theta_j$, minimize $\sum_{j} (\theta_j^T x_i - y(i,j))^2$ over all users j who have rated item i, to find the optimal $x_i$.
• Simultaneously: minimize $\sum_{(i,j)} (\theta_j^T x_i - y(i,j))^2$ over all pairs (i, j) where user j has rated item i, to find the optimal $\theta_j$ and $x_i$ together.
• Equivalently, find factors X (row i is $x_i^T$) and Θ (row j is $\theta_j^T$) of the ratings matrix Y such that $Y \approx X \Theta^T$ on the rated entries
• Other algorithms: user-user similarity, item-item similarity
• Useful even when the users are not humans, e.g. wiki documents as users and links as items.
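The simultaneous approach can be sketched with plain batch gradient descent. This is a minimal sketch under stated assumptions: it adds a small L2 penalty `lam` that the slide does not include (it keeps the factors from growing without bound), uses `None` to mark an unrated entry, and the ratings, `k = 2`, the learning rate, and the function names are all hypothetical. The factor ½ in the objective only rescales the gradient and does not change the minimizer.

```python
import random

def predict(theta_j, x_i):
    """Estimated rating of item i by user j: theta_j^T x_i."""
    return sum(t * x for t, x in zip(theta_j, x_i))

def squared_error(Y, X, Theta):
    """Sum of (theta_j^T x_i - y(i,j))^2 over rated entries (None = not rated)."""
    return sum((predict(Theta[j], X[i]) - Y[i][j]) ** 2
               for i in range(len(Y)) for j in range(len(Y[i]))
               if Y[i][j] is not None)

def collaborative_filtering(Y, k=2, alpha=0.01, lam=0.01, iters=5000, seed=0):
    """Gradient descent on 1/2 * squared_error + lam/2 * (|X|^2 + |Theta|^2),
    updating item features X and user parameters Theta simultaneously."""
    rng = random.Random(seed)
    n_items, n_users = len(Y), len(Y[0])
    # Small random initial values break the symmetry between the k features.
    X = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n_items)]
    Theta = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n_users)]
    for _ in range(iters):
        # Start both gradients with the derivative of the L2 penalty.
        gX = [[lam * v for v in row] for row in X]
        gT = [[lam * v for v in row] for row in Theta]
        for i in range(n_items):
            for j in range(n_users):
                if Y[i][j] is None:
                    continue  # only rated entries contribute to the error
                err = predict(Theta[j], X[i]) - Y[i][j]
                for f in range(k):
                    gX[i][f] += err * Theta[j][f]
                    gT[j][f] += err * X[i][f]
        X = [[v - alpha * g for v, g in zip(row, grow)] for row, grow in zip(X, gX)]
        Theta = [[v - alpha * g for v, g in zip(row, grow)]
                 for row, grow in zip(Theta, gT)]
    return X, Theta

# Hypothetical ratings: rows are items, columns are users, None is a blank.
Y = [[5, 4, None],
     [4, None, 1],
     [1, 1, 5],
     [None, 1, 4]]
X, Theta = collaborative_filtering(Y)
print(round(predict(Theta[2], X[0]), 2))  # user 2's estimated rating for item 0
```

After training, each blank is filled by the same dot product $\theta_j^T x_i$; for example, `predict(Theta[2], X[0])` estimates user 2’s rating for item 0. Alternating between the first two bullets (fix X and solve for Θ, then fix Θ and solve for X) is another common way to minimize the same objective.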
10
Clustering
• Example: documents described by how often the top two occurring terms appear in each
• Training set: $\{x_1, x_2, x_3, \ldots, x_m\}$, where each $x_i$ is a vector
• No labels ($y_i$) are specified
[Figure: unlabeled documents plotted by #Term 1 (x-axis) and #Term 2 (y-axis)]
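The feature-extraction step for this example can be sketched in plain Python. It assumes naive lowercase whitespace tokenization (a real pipeline would also strip punctuation and stop words such as "the"); the helper names and sample documents are hypothetical.

```python
from collections import Counter

def tokenize(doc):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return doc.lower().split()

def top_terms(docs, n=2):
    """Return the n terms that occur most often across all documents."""
    totals = Counter()
    for doc in docs:
        totals.update(tokenize(doc))
    return [term for term, _ in totals.most_common(n)]

def term_count_vectors(docs, terms):
    """Map each document to the vector x_i of counts of the given terms."""
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[t] for t in terms])  # missing terms count as 0
    return vectors

# Hypothetical documents.
docs = ["Spark spark Hadoop", "spark hadoop hadoop hadoop", "Mahout"]
terms = top_terms(docs)
vectors = term_count_vectors(docs, terms)
print(terms, vectors)
```

Each resulting vector is one unlabeled training example $x_i$, ready to be fed to a clustering algorithm.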
11
Clustering: Applications
• Computer Science
• Document Clustering
• Google News: grouping similar stories from different sources
• News categorization
• Social network analysis
• Feature reduction: speeding up ML pipelines
• Cluster centroids as new features
• Image compression (reducing the number of colors): pre-processing for faster, memory-efficient computation
• Deep Learning: Alternate supervised and unsupervised learning
• Recommender Systems
• Physics:
• Astronomy
• Particle physics
• Market segmentation
• http://en.wikipedia.org/wiki/Cluster_analysis#Applications
12
K-Means Clustering
1. Randomly choose initial cluster centroids
2. Assign each training example to a cluster: pick the closest centroid
3. Move centroids: recompute each centroid as the average of the training points assigned to it
4. Repeat steps 2 and 3 until convergence or a maximum number of iterations
[Figure: training examples plotted by #Term 1 (x-axis) and #Term 2 (y-axis)]
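The four steps translate directly into a short pure-Python sketch. It assumes the initial centroids are drawn at random from the training points themselves (one common choice), and that an empty cluster simply keeps its previous centroid; the function and parameter names are hypothetical.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k distinct training points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [None] * len(points)
    for _ in range(max_iters):
        # Step 2: assign each point to the index of its closest centroid.
        new_assignment = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, assignment

# Hypothetical (#Term 1, #Term 2) counts for six documents.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, assignment = kmeans(points, k=2)
print(assignment, centroids)
```

Because the result depends on the initial centroids, K-means is often run several times with different seeds, keeping the run with the smallest total distance from points to their centroids.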
13
Popular Machine Learning Tools
• Apache Mahout:
• Various recommender system, clustering, and classification algorithms
• Java-based, with Hadoop MapReduce implementations of some algorithms. Spark implementations, with a new ML DSL, have recently been started.
• Stable, widely used in production, with community support.
• R:
• Popular in the statistics world; R is its own programming language
• GNU license
• Spark MLlib, MLbase (http://www.mlbase.org/)
• Scala-based. Runs on Spark (in-memory, distributed)
14
Popular Machine Learning Tools
• Weka:
• Java-based
• GNU License
• Vowpal Wabbit: http://hunch.net/~vw/,
https://github.com/JohnLangford/vowpal_wabbit
• Google Prediction API
• http://en.wikipedia.org/wiki/Machine_learning#Software
15
Machine Learning In Action
• Mobile:
• Speech Recognition: Google Now, Siri
• Languages/NLP: Google Translate
• Vision: face recognition in cameras and online photos, OCR
• Misc.: handwriting-driven MyScript Calculator and Stylus keyboard
• Applications
• OCR of printed documents and handwriting
• Automatic tagging of photos based on similar faces
• Biology and Medicine:
• DNA analysis to estimate the likelihood of diseases, personalized drugs, etc.
16
Resources
• Online Courses:
• Coursera: Machine Learning (Andrew Ng)
• Coursera: Neural Networks for Machine Learning (Geoffrey Hinton)
• Udacity: Intro to Artificial Intelligence (Peter Norvig, Sebastian Thrun)
• CMU: Introduction to Machine Learning (Alex Smola)
• Berkeley: Scalable Machine Learning (Alex Smola)
• Books:
• Pattern Recognition and Machine Learning: Christopher Bishop
• Machine Learning: Tom Mitchell
• Mahout in Action
• Artificial Intelligence: A Modern Approach (http://aima.cs.berkeley.edu/)
• Machine Learning in Action
17
Resources
• Quora:
• http://www.quora.com/How-do-you-explain-Machine-Learning-and-Data-Mining-to-non-Computer-Science-people
• http://www.quora.com/Machine-Learning
• Misc.:
• http://fastml.com/
• http://alex.smola.org/
• https://funnel.hasgeek.com/fifthel2014/1132-realizing-large-scale-distributed-deep-learning-ne
• http://spark-summit.org/2014/agenda
• Tutorial on HMM, Speech Recognition: Rabiner
• Tesseract OCR library