Introduction to Data Science Frank Kienle Machine Learning

Machine Learning part 2 - Introduction to Data Science


Artificial intelligence: the term is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving".

Machine learning: an algorithm that can learn from data without relying on rules-based programming.

Statistical modeling: the formalization of relationships between variables in the form of mathematical equations.

Machine Learning vs. Statistical Modeling

01/08/2017 Frank Kienle, p. 35

A computer program is said to learn from experience (E) with respect to some class of tasks (T) and a performance measure (P) if its performance at tasks in T, as measured by P, improves with E.

Learning = improving with experience at some task

•  Improve over task T
•  With respect to performance measure P
•  Based on experience E

Example Spam Filtering: Spam is all email the user does not want to receive and has not asked to receive

•  T: identify spam emails
•  P: % of spam emails that were filtered minus % of ham (non-spam) emails that were incorrectly filtered out
•  E: a database of emails that were labelled by users
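The performance measure P for the spam example can be written down directly. The following is a minimal sketch; the function name and the toy labels are hypothetical and only illustrate how P is computed from labelled emails.

```python
def spam_filter_performance(is_spam, was_filtered):
    """Return P = (% of spam filtered) - (% of ham incorrectly filtered)."""
    spam_total = sum(is_spam)
    ham_total = len(is_spam) - spam_total
    spam_caught = sum(1 for s, f in zip(is_spam, was_filtered) if s and f)
    ham_lost = sum(1 for s, f in zip(is_spam, was_filtered) if not s and f)
    spam_rate = 100.0 * spam_caught / spam_total if spam_total else 0.0
    ham_rate = 100.0 * ham_lost / ham_total if ham_total else 0.0
    return spam_rate - ham_rate


# Hypothetical experience E: user labels (True = spam) and filter decisions.
labels = [True, True, True, True, False, False, False, False]
decisions = [True, True, True, False, True, False, False, False]

score = spam_filter_performance(labels, decisions)
print(score)
```

A filter that learns should push this score upwards as more labelled emails become available.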

Machine Learning


optical character recognition: •  categorize images of handwritten characters by the letters represented

face detection:

•  find faces in images (or indicate if a face is present)

customer segmentation:

•  predict, for instance, which customers will respond to a particular promotion

fraud detection:

•  identify credit card transactions (for instance) which may be fraudulent in nature

demand prediction:

•  predict demand for individual products

Examples of Machine Learning


Batch processing: most machine learning algorithms assume that we are mining a database, that is, all data is available whenever we want it.

Stream processing (e.g. for machinery sensors): data arrives in one or more streams, and if it is not processed immediately or stored, it is lost forever.

Both can be embedded in fault-tolerant architectures: see, for example, the Lambda architecture (http://lambda-architecture.net) or the Kappa architecture (kappa-architecture.com) for further discussion (covered in a separate lecture).
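The difference between the two processing styles can be illustrated with a simple statistic. This is a sketch under assumptions: the sensor readings are invented, and the class name `StreamingMean` is hypothetical. The batch version needs the full data set, while the streaming version updates its estimate one reading at a time and keeps no history.

```python
def batch_mean(values):
    """Batch processing: all data is available at once."""
    return sum(values) / len(values)


class StreamingMean:
    """Stream processing: update the estimate per reading, store no history."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental update of the running mean.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


readings = [20.0, 22.0, 21.0, 25.0]  # hypothetical sensor values

stream = StreamingMean()
for reading in readings:
    stream.update(reading)  # each reading is seen exactly once

print(batch_mean(readings), stream.mean)
```

Both approaches arrive at the same mean here; the streaming variant simply never needs to revisit old readings.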

Batch Processing vs Stream Processing


Machine Learning Overview


Machine Learning

Supervised

Regression Classification

Unsupervised

Clustering Dimension Reduction

What is the difference between supervised and unsupervised learning?

What is the difference between a regression problem and a classification problem?

Unsupervised
•  Clustering & Dimensionality Reduction
    •  SVD
    •  PCA
    •  K-means
•  Association Analysis
    •  Apriori
    •  FP-Growth
•  Hidden Markov Model

Supervised
•  Regression
    •  Linear
    •  Polynomial
•  Decision Trees
•  Random Forests
•  Classification
    •  KNN
    •  Trees
    •  Logistic Regression
    •  Naïve Bayes
    •  SVM

Machine Learning Algorithms (small excerpt)


Figure axis labels: Continuous, Categorical

It is all about the assumption of the underlying model

Machine Learning


input: x, output: y

What is the best relation (function) between x and y that can be used to map new examples of x to an inferred output y?

Input to output example


Model hypothesis

input: x, output: y

By making an initial hypothesis on the model structure h(x), we can infer the model parameters w.

The process of inferring the model parameters is referred to as learning in the following.


Model hypothesis

input: x, output: y

By applying the model to a new input we obtain a new estimate of y.

The process of inferring the model parameters is referred to as learning in the following. Applying the learned model to new input data yields an inferred result; this process is referred to as prediction. The terms inference and prediction are used synonymously in the following.
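The two steps, learning and prediction, can be sketched in a few lines. The example assumes a linear model hypothesis h(x) = w0 + w1 * x and invented training data; neither is taken from the slides, and the least-squares fit is only one possible way to infer w.

```python
import numpy as np

# Hypothetical training data, roughly following y = 2x + 1.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([1.1, 2.9, 5.2, 6.8, 9.0])


def h(x, w):
    """Assumed model hypothesis: a straight line with parameters w."""
    return w[0] + w[1] * x


# Learning: infer the model parameters w from the training data.
A = np.column_stack([np.ones_like(x_train), x_train])
w, *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Prediction: apply the learned model to a new input.
x_new = 5.0
y_hat = h(x_new, w)

print(w, y_hat)
```

A different hypothesis (for example a polynomial) would change the function `h` and the matrix `A`, but not the learn-then-predict pattern.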


Input to output example


Model hypothesis

Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples (pairs of an input x and its known output y).

How can we derive the 'best' model parameters? Choose the model parameters so that, for every training sample x, the result h(x) is close to y. The known output y supervises the learning process.

Supervised learning


Model hypothesis

The mean square error (MSE) is the average of the squares of the errors or deviations.

Supervised learning: cost function and MSE

Cost function

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

Finding the parameters w that minimize this cost function yields the estimator with the smallest possible MSE on the training data.
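As a small illustration of the cost function, the sketch below computes the MSE for a few candidate parameters and picks the one with the lowest cost. The data, the one-parameter model h(x) = w * x, and the candidate values are all assumptions made for this example.

```python
import numpy as np


def mean_squared_error(y, y_hat):
    """Average of the squared deviations between targets and estimates."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))


# Hypothetical data and a hypothetical one-parameter model h(x) = w * x.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# Evaluate the cost function for each candidate parameter.
candidates = [1.5, 2.0, 2.5]
costs = {w: mean_squared_error(y, w * x) for w in candidates}

# The best candidate is the one that minimizes the MSE.
best_w = min(costs, key=costs.get)
print(costs, best_w)
```

In practice the minimum is found by an optimizer or a closed-form solution rather than by trying a handful of values, but the selection criterion is the same.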

Typical regression scenario with more input variables


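x0    Rpm x1    Gas x2    Valve x3    Temp x4    Watt y
1     500       5.8       5           200        3
1     900       4.5       9           400        5
1     2500      13        15          400        5
1     3000      95        90          400        100

$$X = \begin{bmatrix} 1 & 500 & 5.8 & 5 & 200 \\ 1 & 900 & 4.5 & 9 & 400 \\ 1 & 2500 & 13 & 15 & 400 \\ 1 & 3000 & 95 & 90 & 400 \end{bmatrix}, \qquad y = \begin{bmatrix} 3 \\ 5 \\ 5 \\ 100 \end{bmatrix}$$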
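The regression table translates directly into a design matrix and a target vector. The sketch below uses the table values and fits a linear model with NumPy's least-squares solver; treating the model as linear is an assumption. Note that four samples with five parameters form an underdetermined system, so the fit reproduces the training targets exactly and says nothing about generalization.

```python
import numpy as np

# Design matrix: one row per training sample, columns x0 (constant 1),
# Rpm, Gas, Valve, Temp, as listed in the table.
X = np.array([
    [1, 500, 5.8, 5, 200],
    [1, 900, 4.5, 9, 400],
    [1, 2500, 13, 15, 400],
    [1, 3000, 95, 90, 400],
], dtype=float)

# Target vector: the Watt column.
y = np.array([3, 5, 5, 100], dtype=float)

# Assumed linear model h(X, w) = X w, fitted by least squares.
# With fewer samples than parameters, lstsq returns the minimum-norm solution.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

print(X.shape, y.shape)
print(X @ w)  # reproduces y up to rounding
```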

Typical classification scenario with more input variables


x0    Rpm x1    Gas x2    Valve x3    Temp x4    Watt x5    Status y
1     500       5.8       5           200        3          0
1     900       4.5       9           400        5          0
1     2500      13        15          400        5          0
1     3000      95        90          400        100        1

$$X = \begin{bmatrix} 1 & 500 & 5.8 & 5 & 200 & 3 \\ 1 & 900 & 4.5 & 9 & 400 & 5 \\ 1 & 2500 & 13 & 15 & 400 & 5 \\ 1 & 3000 & 95 & 90 & 400 & 100 \end{bmatrix}$$
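For the classification table, the target is the categorical Status column. As a minimal sketch, the example below classifies a new sample with a 1-nearest-neighbour rule (the simplest KNN variant). The new sample is hypothetical, and the features are left unscaled for brevity, which a real application would usually not do.

```python
import numpy as np

# Design matrix from the table: x0, Rpm, Gas, Valve, Temp, Watt.
X = np.array([
    [1, 500, 5.8, 5, 200, 3],
    [1, 900, 4.5, 9, 400, 5],
    [1, 2500, 13, 15, 400, 5],
    [1, 3000, 95, 90, 400, 100],
], dtype=float)

# Target vector: the Status column (categorical, 0 or 1).
y = np.array([0, 0, 0, 1])


def predict_nearest_neighbour(X_train, y_train, x_new):
    """Return the label of the training sample closest to x_new."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    return int(y_train[np.argmin(distances)])


# Hypothetical new measurement, close to the last training sample.
x_new = np.array([1, 2800, 90, 85, 400, 90], dtype=float)
status = predict_nearest_neighbour(X, y, x_new)
print(status)
```

Unlike the regression case, the prediction is one of a fixed set of classes rather than a continuous value.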

m: number of training samples (rows)
n: number of features (columns)
X: design matrix (feature matrix)
y: target vector (sometimes denoted t)

Supervised Learning: terminology


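$$\left\| y - h(X, w) \right\|^2 = \left\| \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} - \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m,1} & x_{m,2} & \cdots & x_{m,n} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} \right\|^2 = \sum_{i=1}^{m} \left| y_i - \sum_{j=1}^{n} x_{i,j} w_j \right|^2 .$$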
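The matrix form and the double-sum form of the squared error describe the same quantity. The following sketch checks this numerically for a linear model h(X, w) = X w; the random data and the sizes m = 5, n = 3 are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3  # m training samples (rows), n features (columns)

X = rng.normal(size=(m, n))  # design matrix
w = rng.normal(size=n)       # parameter vector
y = rng.normal(size=m)       # target vector

# Matrix form: squared Euclidean norm of the residual vector y - X w.
vector_form = np.linalg.norm(y - X @ w) ** 2

# Sum form: loop over samples i and features j explicitly.
sum_form = sum(
    (y[i] - sum(X[i, j] * w[j] for j in range(n))) ** 2
    for i in range(m)
)

print(vector_form, sum_form)
```

The vectorized form is what numerical libraries evaluate in practice; the explicit loops are only there to mirror the indices of the formula.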

A computer program is said to learn from experience (E) with respect to some class of tasks (T) and a performance measure (P) if its performance at tasks in T, as measured by P, improves with E.

Learning = improving with experience at some task

•  Improve over task T → model the target
•  With respect to performance measure P → define the cost function
•  Based on experience E → by using historic data

Machine Learning


Machine Learning (technical steps)


Training phase: data → pre-processing → learning → validation → (trained) model

•  Pre-processing: prepare cleaned, correct information and provide the correct data format
•  Learning: develop a new mathematical model or decide on an appropriate existing one
•  Validation: control the quality and correctness of the model

Prediction: new data → (trained) model → prediction
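The technical steps can be sketched end to end on synthetic data. Everything concrete here is an assumption for illustration: the generated measurements, the missing value, the 15/5 split into training and validation samples, and the choice of a linear model fitted by least squares.

```python
import numpy as np

rng = np.random.default_rng(42)

# Data: hypothetical raw measurements, one of them missing (NaN).
x_raw = np.linspace(0.0, 10.0, 21)
y_raw = 3.0 * x_raw + 2.0 + rng.normal(scale=0.5, size=x_raw.size)
y_raw[4] = np.nan

# Pre-processing: drop incomplete samples and build the design matrix.
mask = ~np.isnan(y_raw)
X = np.column_stack([np.ones(mask.sum()), x_raw[mask]])
y = y_raw[mask]

# Hold back part of the data for validation.
idx = rng.permutation(len(y))
train, val = idx[:15], idx[15:]

# Learning: fit the assumed linear model on the training samples only.
w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# Validation: check the quality of the model on unseen samples.
val_mse = float(np.mean((y[val] - X[val] @ w) ** 2))

# Prediction: apply the trained model to new data.
x_new = np.array([1.0, 12.0])  # constant term and a new input value
y_new = float(x_new @ w)

print(w, val_mse, y_new)
```

Keeping the validation samples out of the learning step is what makes the validation error a meaningful quality check.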

[Figures on slides 52 and 53; source: scikit-learn]