An introduction to machine learning. I gave a talk on this; the video can be found here: http://www.techgig.com/expert-speak/Introduction-to-Machine-Learning-616

Page 1: Introduction to machine_learning

Introduction to Machine Learning

Kiran Lonikar

Page 2: Introduction to machine_learning

What is learning?

Tom Mitchell: Learning is improving some performance measure P at some task T with experience E.

In plain English: performing some task better with experience and training.

Key Elements:

• Remember or memorize the past experiences E

• Generalize from the experiences E

Observe how kids learn to read words: they make mistakes even when reading previously known words, then correct themselves. This happens especially with words that contain silent letters and words ending in "-tion".

Warning: This is a highly mathematical subject!

Page 3: Introduction to machine_learning

What is Machine Learning?

How would you build a computer program which “learns” from experiences?

Generally a three-phase process:

• Express experience E mathematically: build a set of features related to the experiences (feature extraction from raw data)

• Memorize and generalize: build a mathematical model or set of rules from the experiences (training)

• Apply the mathematical model to the features of future tasks
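
As a rough illustration of the three phases above, here is a minimal Python sketch. The task (telling "spam" from "ham" messages), the two hand-picked features, and the nearest-mean model are all assumptions made up for this example; none of them come from the talk.

```python
def extract_features(text):
    """Phase 1: express a raw experience as a numeric feature vector."""
    words = text.split()
    exclamations = text.count("!")
    shouted = sum(1 for w in words if w.isupper() and len(w) > 1)
    return [exclamations, shouted]


def train(examples):
    """Phase 2: build a model, here one mean feature vector per label."""
    sums, counts = {}, {}
    for text, label in examples:
        feats = extract_features(text)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for k, value in enumerate(feats):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, text):
    """Phase 3: apply the model to the features of a new, unseen input."""
    feats = extract_features(text)

    def sq_dist(centroid):
        return sum((f - c) ** 2 for f, c in zip(feats, centroid))

    # Pick the label whose mean feature vector is closest.
    return min(model, key=lambda label: sq_dist(model[label]))


# Hypothetical training data (the "experiences" E).
training_data = [
    ("WIN a FREE prize NOW!!!", "spam"),
    ("CLAIM your reward today!!", "spam"),
    ("Lunch at noon tomorrow?", "ham"),
    ("Here are the meeting notes.", "ham"),
]

model = train(training_data)
print(predict(model, "FREE tickets, ACT now!!!"))
print(predict(model, "See you at the meeting."))
```

Real systems differ mainly in how rich the features are and how expressive the model is; the three-phase shape stays the same.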

Page 4: Introduction to machine_learning

Machine Learning in Action…

• OCR in web pages: http://newscarousel.herokuapp.com/scribble-js/Scribble.html

• Word Lens mobile app

Page 5: Introduction to machine_learning

Types of ML Systems

• Supervised Learning

  • Classification

    • Logistic Regression, SVM, NB, Decision Trees, ANN etc.

  • Regression

  • Recommender Systems*

    • User-user/item-item similarity, matrix factorization etc.

• Unsupervised Learning

  • Clustering

    • K-means, Fuzzy K-Means, Model based (LDA) clustering etc.

  • Dimensionality reduction

    • Principal Component Analysis (PCA)

  • Anomaly Detection

Page 6: Introduction to machine_learning

Classification

Identify speaker’s gender from the voice spectrum

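• Training: build a model using data $\{(a_1, f_1, g_1), (a_2, f_2, g_2), \dots, (a_m, f_m, g_m)\}$, where $a$ is amplitude, $f$ is frequency, and $g$ is gender

• Logistic Regression (LR): $p(g = F \mid a, f; \theta) = \mathrm{sigmoid}(\theta_0 + \theta_1 a + \theta_2 f)$

• Decision rule: if $p < 0.5$, predict $g = M$; otherwise predict $g = F$ (the decision boundary is $p = 0.5$)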

[Figure: plot of the training examples; axes: Amplitude, Frequency]
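
The decision rule above can be sketched in a few lines of Python. The parameter values in `theta` and the sample inputs are hypothetical, chosen only so that the boundary falls at a frequency of 165 (units assumed to be Hz); they are not learned from any data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def prob_female(a, f, theta):
    """p(g = F | a, f; theta) for amplitude a and frequency f."""
    theta0, theta1, theta2 = theta
    return sigmoid(theta0 + theta1 * a + theta2 * f)


def classify(a, f, theta):
    """Predict M when the probability of F is below 0.5, otherwise F."""
    return "M" if prob_female(a, f, theta) < 0.5 else "F"


# Hypothetical parameters: amplitude is ignored, boundary at f = 165.
theta = (-16.5, 0.0, 0.1)

print(classify(0.6, 120.0, theta))
print(classify(0.6, 210.0, theta))
```

With these made-up parameters the model depends on frequency alone; a trained model would normally give weight to both features.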

Page 7: Introduction to machine_learning

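Logistic Regression

• Let $y = 1$ when $g = F$ and $y = 0$ when $g = M$, and define the vector $x = [1, a, f]$ (the leading 1 pairs with the intercept $\theta_0$)

• Define the function $h_\theta(x) = \mathrm{sigmoid}(\theta^T x)$, where $\mathrm{sigmoid}(z) = 1 / (1 + e^{-z})$. It represents the probability $p(y = 1 \mid x; \theta)$.

• Cost: $J(\theta) = -\sum \left( y \log h_\theta(x) + (1 - y) \log(1 - h_\theta(x)) \right) + \lambda \theta^T \theta$, where the sum runs over all training examples and $\lambda \ge 0$ is the regularization weight.

• Optimization algorithm (gradient descent): obtain the $\theta$ which minimizes $J(\theta)$.

• Evaluate the fitted model $\theta$ on cross-validation data, and vary $\lambda$ to find the best fit.

• Test the model $\theta$ against test data: if $h_\theta(x) \ge 0.5$, predict gender = F; otherwise predict gender = M.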
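
A minimal sketch of the training procedure described above, in plain Python: the regularized cost J(θ), its gradient, and batch gradient descent. The toy data set, learning rate, step count, and λ are assumptions for illustration, and the features are assumed to be already scaled to a small range. Following the θᵀθ term on the slide, the intercept is regularized along with the other parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def h(theta, x):
    """h_theta(x) = sigmoid(theta^T x), the estimated p(y = 1 | x; theta)."""
    return sigmoid(sum(t * xk for t, xk in zip(theta, x)))


def cost(theta, X, y, lam):
    """Cross-entropy over all examples plus lam * theta^T theta."""
    eps = 1e-12  # guards against log(0)
    data_term = -sum(
        yi * math.log(h(theta, xi) + eps)
        + (1 - yi) * math.log(1 - h(theta, xi) + eps)
        for xi, yi in zip(X, y)
    )
    return data_term + lam * sum(t * t for t in theta)


def gradient(theta, X, y, lam):
    """dJ/dtheta = sum over examples of (h - y) * x, plus 2 * lam * theta."""
    grad = [2 * lam * t for t in theta]
    for xi, yi in zip(X, y):
        err = h(theta, xi) - yi
        for k, xk in enumerate(xi):
            grad[k] += err * xk
    return grad


def fit(X, y, lam=0.01, lr=0.1, steps=2000):
    """Batch gradient descent starting from theta = 0."""
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        g = gradient(theta, X, y, lam)
        theta = [t - lr * gk for t, gk in zip(theta, g)]
    return theta


def predict(theta, x):
    return "F" if h(theta, x) >= 0.5 else "M"


# Hypothetical, pre-scaled data: x = [1, a, f]; y = 1 for F, 0 for M.
X = [[1, 0.5, -1.0], [1, 0.7, -0.8], [1, 0.4, -0.6],
     [1, 0.6, 0.7], [1, 0.5, 0.9], [1, 0.3, 1.1]]
y = [0, 0, 0, 1, 1, 1]

theta = fit(X, y)
print(theta, cost(theta, X, y, 0.01))
print(predict(theta, [1, 0.5, -0.9]), predict(theta, [1, 0.5, 1.0]))
```

To choose λ as the slide suggests, one would call `fit` with several values of `lam` and keep the one whose θ gives the lowest unregularized cost on held-out cross-validation data.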

Page 8: Introduction to machine_learning

Recommender Systems

• User j specifies a rating for item i: $y(i,j)$ (the training data)

• Guess the ratings for the other items (the blanks)

• Collaborative filtering with k features for each item:

• Feature vector $x^i$ for item i: $\{x^i_1, x^i_2, \dots, x^i_k\}$

• Parameter vector $\theta^j$ for user j: $\{\theta^j_1, \theta^j_2, \dots, \theta^j_k\}$

• User j's estimated rating for item i: $(\theta^j)^T x^i$

[Figure: ratings matrix with axes Items and Users; cells hold ratings from 1 to 5, with blanks where a user has not rated an item]
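
To make the rating estimate concrete, the sketch below computes user j's estimated rating for item i as a plain dot product of the user's parameter vector and the item's feature vector. The item names, user names, and the k = 2 feature values are hypothetical.

```python
def predict_rating(theta_j, x_i):
    """Estimated rating: dot product of user parameters and item features."""
    return sum(t * x for t, x in zip(theta_j, x_i))


# Hypothetical item features (k = 2), e.g. "how much action", "how much romance".
items = {"item_a": [0.9, 0.1], "item_b": [0.1, 0.8]}

# Hypothetical user parameters: how strongly each user likes each feature.
users = {"user_1": [5.0, 0.5], "user_2": [0.5, 5.0]}

for user, theta_j in users.items():
    for item, x_i in items.items():
        print(user, item, round(predict_rating(theta_j, x_i), 2))
```

In collaborative filtering neither set of vectors is given up front; both are learned from the observed ratings.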

Page 9: Introduction to machine_learning

Recommender Systems

• Learn $x^i$ and $\theta^j$:

• Given $x^i$, minimize $\sum_i \left( (\theta^j)^T x^i - y(i,j) \right)^2$ over all i where user j has rated item i, to find the optimum $\theta^j$.

• Given $\theta^j$, minimize $\sum_j \left( (\theta^j)^T x^i - y(i,j) \right)^2$ over all j where user j has rated item i, to find the optimum $x^i$.

• Simultaneously: minimize $\sum_{(i,j)} \left( (\theta^j)^T x^i - y(i,j) \right)^2$ over all (i, j) where user j has rated item i, to find the optimum $\theta^j$ and $x^i$.

• Find factors $X$ and $\Theta$ of the ratings matrix $Y$ such that $Y \approx X \Theta^T$

• Other algorithms: user-user similarity, item-item similarity

• Useful even when the users are not humans, e.g. Wiki documents as users and links as items.
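
One way to carry out the simultaneous minimization above is stochastic gradient descent over the observed ratings. The following is a minimal sketch of that approach, not necessarily the method used in the talk; the ratings, k, learning rate, and epoch count are made-up values, and no regularization is applied because the objective on the slide has none.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sse(X, Theta, ratings):
    """Sum of squared errors over the observed ratings only."""
    return sum((dot(Theta[j], X[i]) - y) ** 2 for (i, j), y in ratings.items())


def factorize(ratings, n_items, n_users, k=2, lr=0.01, epochs=2000, seed=0):
    """Learn item features X and user parameters Theta so that
    dot(Theta[j], X[i]) approximates every observed rating y(i, j)."""
    rng = random.Random(seed)
    X = [[rng.uniform(0.0, 1.0) for _ in range(k)] for _ in range(n_items)]
    Theta = [[rng.uniform(0.0, 1.0) for _ in range(k)] for _ in range(n_users)]
    for _ in range(epochs):
        for (i, j), y in ratings.items():
            err = dot(Theta[j], X[i]) - y
            for f in range(k):
                x_if, t_jf = X[i][f], Theta[j][f]
                # Gradient step on both factors (constant 2 folded into lr).
                X[i][f] -= lr * err * t_jf
                Theta[j][f] -= lr * err * x_if
    return X, Theta


# Hypothetical sparse ratings: (item index, user index) -> rating.
ratings = {
    (0, 0): 5, (0, 1): 4,
    (1, 0): 4, (1, 2): 1,
    (2, 1): 1, (2, 2): 5,
    (3, 0): 1, (3, 2): 4,
}

X, Theta = factorize(ratings, n_items=4, n_users=3)
print("training error:", sse(X, Theta, ratings))
print("estimate for the blank (item 0, user 2):", dot(Theta[2], X[0]))
```

The rows of `X` and `Theta` are the rows of the factors X and ϴ, so every cell of X ϴᵀ, including the blanks, gets an estimate.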

Page 10: Introduction to machine_learning

Clustering

• Example: the two most frequently occurring terms in documents

• Training set: {x1, x2, x3, …, xm}, where each xi is a vector

• No labels (yi) specified

[Figure: scatter plot of documents; axes: #Term 1, #Term 2]

Page 11: Introduction to machine_learning

Clustering: Applications

• Computer Science

• Document Clustering

• Google news: Organizing similar news from different sources

• News Categorizing

• Social networks analysis

• Feature reduction: Speeding up ML pipelines

• Cluster Centroids as new features

• Image compression (Reduce number of colors): Pre-processing for faster, memory-efficient computations

• Deep Learning: Alternate supervised and unsupervised learning

• Recommender Systems

• Physics:

• Astronomy

• Particle physics

• Market segmentation

• http://en.wikipedia.org/wiki/Cluster_analysis#Applications

Page 12: Introduction to machine_learning

K-Means Clustering

1. Randomly choose initial cluster centroids

2. Assign each training example to a cluster: pick the closest centroid

3. Move centroids: re-compute each centroid as the average of the training points assigned to it

4. Repeat steps 2 and 3 until convergence or until a maximum iteration count is reached

[Figure: scatter plot of documents with cluster centroids; axes: #Term 1, #Term 2]
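
The four K-means steps above translate fairly directly into Python. This is a minimal sketch: the sample points, `k`, and the random seed are assumptions, and convergence is detected here as "the assignments stopped changing".

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k distinct training points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Step 2: assign every point to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each centroid to the average of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = mean(members)
    return centroids, assignments


# Hypothetical documents described by (#Term 1, #Term 2) counts.
points = [(1, 2), (2, 1), (1, 1), (2, 2), (8, 9), (9, 8), (8, 8), (9, 9)]

centroids, assignments = kmeans(points, k=2)
print(centroids)
print(assignments)
```

The result depends on the initial centroids, and an unlucky start can settle on a poor clustering, so in practice the algorithm is often run several times with different seeds and the run with the lowest total distance is kept.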

Page 13: Introduction to machine_learning

Popular Machine Learning Tools

• Apache Mahout:

• Various recommender system, clustering, and classification algorithms

• Java-based, with some algorithms having Hadoop MapReduce implementations. Spark implementations were started recently, with a new ML DSL.

• Stable, widely used in production, community support.

• R:

• Popular in the statistics world. Has its own language

• GNU GPL license

• Spark MLlib, MLbase (http://www.mlbase.org/)

• Scala-based. Runs on Spark (in-memory, distributed)

Page 14: Introduction to machine_learning

Popular Machine Learning Tools

• Weka:

• Java-based

• GNU GPL license

• Vowpal Wabbit: http://hunch.net/~vw/, https://github.com/JohnLangford/vowpal_wabbit

• Google Prediction API

• http://en.wikipedia.org/wiki/Machine_learning#Software

Page 15: Introduction to machine_learning

Machine Learning In Action

• Mobile:

• Speech Recognition: Google Now, Siri

• Languages/NLP: Google Translate

• Vision: face recognition in cameras and online photos, OCR

• Misc: Handwriting-driven MyScript Calculator and Stylus keyboard

• Applications

• OCR of printed documents and handwriting

• Automatic tagging of photos based on similar faces

• Biology and Medicine:

• DNA analysis for likelihood of diseases, personalized drugs etc.

Page 16: Introduction to machine_learning

Resources

• Online Courses:

• Coursera: Machine Learning (Andrew Ng)

• Coursera: Neural Networks for Machine Learning (Geoffrey Hinton)

• Udacity: Intro to Artificial Intelligence (Peter Norvig, Sebastian Thrun)

• CMU: Introduction to Machine Learning (Alex Smola)

• Berkeley: Scalable Machine Learning (Alex Smola)

• Books:

• Pattern Recognition and Machine Learning: Christopher Bishop

• Machine Learning: Tom Mitchell

• Mahout In Action

• Artificial Intelligence: A Modern Approach (http://aima.cs.berkeley.edu/)

• Machine Learning in Action

Page 17: Introduction to machine_learning

Resources

• Quora:

• http://www.quora.com/How-do-you-explain-Machine-Learning-and-Data-Mining-to-non-Computer-Science-people

• http://www.quora.com/Machine-Learning

• Misc.:

• http://fastml.com/

• http://alex.smola.org/

• https://funnel.hasgeek.com/fifthel2014/1132-realizing-large-scale-distributed-deep-learning-ne

• http://spark-summit.org/2014/agenda

• Tutorial on HMM, Speech Recognition: Rabiner

• Tesseract OCR library