52
Course info Machine Learning Real life problems Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Lecture 1: Machine Learning Problem

Qinfeng (Javen) Shi

28 July 2014

Intro. to Stats. Machine LearningCOMP SCI 4401/7401

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 2: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Table of Contents I

1 Course info

2 Machine LearningWhat’s Machine Learning?Types of LearningOverfittingOccam’s Razor

3 Real life problemsTypical assumptionsLarge-scale dataStructured dataChanging environment

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 3: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Enrolment

Enrol yourself in Forum for messages, assignments and slidesLink: https://forums.cs.adelaide.edu.au/login/index.phpGo to Course “ISML-S2-2014”

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 4: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Assessment

The course includes the following assessment components:

Final written exam at 55% (open book).

Three assignments at 15% each (report and code).

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 5: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Required Skills

Ability to program in Matlab, C/C++ is required.

Knowing some basic statistics, probability, linear algebra andoptimisation would be helpful, but not essential. They will becovered when needed.

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 6: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Recommended books

1 Pattern Recognition and Machine Learning by Bishop,Christopher M.

2 Kernel Methods for Pattern Analysis by John Shawe-Taylor,Nello Cristianini

3 Convex Optimization by Stephen Boyd and LievenVandenberghe

Book 1 is for machine learning in general. Book 2 focuses onkernel methods with pseudo code and some theoretical analysis.Book 3 gives introduction to (Convex) Optimization.

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 7: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

External courses

Learning from the Data by Yaser Abu-Mostafa in Caltech.

Machine Learning by Andrew Ng in Stanford.

Machine Learning (or related courses) by Nando de Freitas inUBC (now Oxford).

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 8: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Machine Learning

Using data to uncover an underlying process.

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 9: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Formulation

Input: x ∈ X (feature)

Output: y ∈ Y (label)

Underlying process (unknown) f : X→ Y

Data: {(xi , yi )}Ni=1

⇓ Learn

Decision function g : X→ Y, such that g ≈ f .

For a new x′, predict y ′ = g(x′).

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 10: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Formulation

Input: x ∈ X (feature)

Output: y ∈ Y (label)

Underlying process (unknown) f : X→ Y

Data: {(xi , yi )}Ni=1

⇓ Learn

Decision function g : X→ Y, such that g ≈ f .

For a new x′, predict y ′ = g(x′).

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 11: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Formulation

Input: x ∈ X (feature)

Output: y ∈ Y (label)

Underlying process (unknown) f : X→ Y

Data: {(xi , yi )}Ni=1

⇓ Learn

Decision function g : X→ Y, such that g ≈ f .

For a new x′, predict y ′ = g(x′).

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 12: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Formulation

Input: x ∈ X (feature)

Output: y ∈ Y (label)

Underlying process (unknown) f : X→ Y

Data: {(xi , yi )}Ni=1

⇓ Learn

Decision function g : X→ Y, such that g ≈ f .

For a new x′, predict y ′ = g(x′).

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 13: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Formulation

Input: x ∈ X (feature)

Output: y ∈ Y (label)

Underlying process (unknown) f : X→ Y

Data: {(xi , yi )}Ni=1

⇓ Learn

Decision function g : X→ Y, such that g ≈ f .

For a new x′, predict y ′ = g(x′).

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 14: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Examples

X Y

(age, education, occupation, ...) → income > $50k p.a.?

→ {0, 1, ..., 9}

→ {John, Jenny , ...}

To learn decision function g : X→ Y. What’s g like?

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 15: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Examples

X Y

(age, education, occupation, ...) → income > $50k p.a.?

→ {0, 1, ..., 9}

→ {John, Jenny , ...}

To learn decision function g : X→ Y. What’s g like?

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 16: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Examples

X Y

(age, education, occupation, ...) → income > $50k p.a.?

→ {0, 1, ..., 9}

→ {John, Jenny , ...}

To learn decision function g : X→ Y. What’s g like?

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 17: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Examples

X Y

(age, education, occupation, ...) → income > $50k p.a.?

→ {0, 1, ..., 9}

→ {John, Jenny , ...}

To learn decision function g : X→ Y. What’s g like?

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 18: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Examples

X Y

(age, education, occupation, ...) → income > $50k p.a.?

→ {0, 1, ..., 9}

→ {John, Jenny , ...}

To learn decision function g : X→ Y. What’s g like?

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 19: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Decision functions

Inner product

For vectors x = [x1, x2, · · · , xd ]>,w = [w1,w2, · · · ,wd ]>, theinner product

〈x,w〉 =d∑

i=1

x iw i .

We write x,w ∈ Rd to say they are d-dimensional real numbervectors. We consider all vectors as column vectors by default.

Sign function

For any scalar a ∈ R,

sign(a) =

{1 if a > 0−1 otherwise

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 20: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Decision functions

Inner product

For vectors x = [x1, x2, · · · , xd ]>,w = [w1,w2, · · · ,wd ]>, theinner product

〈x,w〉 =d∑

i=1

x iw i .

We write x,w ∈ Rd to say they are d-dimensional real numbervectors. We consider all vectors as column vectors by default.

Sign function

For any scalar a ∈ R,

sign(a) =

{1 if a > 0−1 otherwise

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 21: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Decision functions

Typical decision functions for classification 1 :

Binary-class g(x; w) = sign(〈x,w〉).

Multi-class g(x; w) = argmaxy∈Y

(〈x,wy 〉).

where w,wy are the parameters, and x,w,wy ∈ Rd .

Parameterisation

To learn g is to learn w or wy .

1for b ∈ R, more general form 〈x,w〉+ b can be rewritten as 〈[x; 1], [w; b]〉Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 22: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Decision functions

Typical decision functions for classification 1 :

Binary-class g(x; w) = sign(〈x,w〉).

Multi-class g(x; w) = argmaxy∈Y

(〈x,wy 〉).

where w,wy are the parameters, and x,w,wy ∈ Rd .

Parameterisation

To learn g is to learn w or wy .

1for b ∈ R, more general form 〈x,w〉+ b can be rewritten as 〈[x; 1], [w; b]〉Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 23: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Types of Learning

1 Supervised Learning

2 Unsupervised Learning

3 Semi-supervised Learning

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 24: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Supervised Learning

Definition

Given input-output data pairs {(xi , yi )}ni=1 sampled from anunknown but fixed distribution p(x, y), the goal is to learng : X→ Y, g ∈ G s.t. p(g(x) 6= y) is small.

p(g(x) 6= y) (i.e. expected testing error) is generalisation error.

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 25: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Supervised Learning

Coin recognition (vending machines and parking meters).

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 26: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Supervised Learning

We have (input, correct output) in the training data.

Mass

Size

$2

$1

50c

20c10c

5c

Mass

Size

$2

$1

50c

20c10c

5c

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 27: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Unsupervised Learning

Instead of (input, correct output), we have (input, ?).

Mass

Size

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 28: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Semi-supervised Learning

We have some (input, correct output), and some (input, ?).

Mass

Size

$2

$1

50c

20c10c

5c

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 29: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Overfitting

Fitting the training data too well cause a problem.

Training

Testing

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 30: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Overfitting

Train on training data (testing data are hidden from us).

Training

Testing

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 31: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Overfitting

Two possible models. Which model fits the training data better?

Training

Testing

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 32: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Overfitting

Two possible models. Which model fits the testing data better?

Training

Testing

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 33: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Overfitting

Reveal the testing data.

Training

Testing

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 34: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Occam’s Razor

“The simplest model that fits the data is also the most plausible.”

Two questions:

1 What does it mean for a model to be simple?

2 Why simpler is better?

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 35: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Occam’s Razor

“The simplest model that fits the data is also the most plausible.”

Two questions:

1 What does it mean for a model to be simple?

2 Why simpler is better?

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 36: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Occam’s Razor

“The simplest model that fits the data is also the most plausible.”

Two questions:

1 What does it mean for a model to be simple?

2 Why simpler is better?

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 37: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Simpler means less complex

Model complexity – two types:1 complexity of the function g : order of a polynomial, MDL

a straight line (order 0 or 1) is simpler than a quadraticfunction (order 2).computer program: 100 bits simpler than 1000 bits

2 complexity of the space G: |G |, VC dimension, noise-fitting, ...

Often used in proofs.

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 38: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

What’s Machine Learning?Types of LearningOverfittingOccam’s Razor

Simpler is better

1 What do you mean by “better”?

smaller generalisation error (e.g. smaller expected testingerror).

2 Why simpler is better?

Practically implemented by regularisation techniques, whichwill be covered in Lecture 2.Theoretically answered by generalisation bounds, which will becovered in Learning Theory in Lecture 12.

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 39: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

Typical assumptions

1 Small-scale data

Model fits in the memoryData fit in the memory or at least the diskComputer is fast enough

2 {(xi , yi )}Ni=1 are independent and identically distributed (i.i.d.)samples from p(x, y)

3 Underlying process (f (x) or p(x, y)) unknown but fixed

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 40: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

In real life things are more complex

1 Small-scale data

Large-scale → Random Projection

2 {(xi , yi )}Ni=1 are independent and identically distributed (i.i.d.)samples from p(x, y)

Correlated → Structured Learning and Graphical Models

3 Underlying process unknown but fixed

Changing environment → Online Learning (with StructuredData)

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 41: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

In real life things are more complex

1 Small-scale data

Large-scale → Random Projection

2 {(xi , yi )}Ni=1 are independent and identically distributed (i.i.d.)samples from p(x, y)

Correlated → Structured Learning and Graphical Models

3 Underlying process unknown but fixed

Changing environment → Online Learning (with StructuredData)

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 42: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

In real life things are more complex

1 Small-scale data

Large-scale → Random Projection

2 {(xi , yi )}Ni=1 are independent and identically distributed (i.i.d.)samples from p(x, y)

Correlated → Structured Learning and Graphical Models

3 Underlying process unknown but fixed

Changing environment → Online Learning (with StructuredData)

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 43: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

Large-scale data

Assumption 1: Small-scale data.

Web topic classification: 4.4 million data, input vector 1.8million dimensions, and output 7k classes?

argmaxy∈Y(〈x,wy 〉)? No! “store all wy” ≈ 100G memory.

Our methods:

Loading data, training and testing on 804, 414 news articles topredict the topics in 25.16s!Training 4.4 million data in 0.5 hours (normally 2000 days).

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 44: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

Large-scale data

Assumption 1: Small-scale data.

Web topic classification: 4.4 million data, input vector 1.8million dimensions, and output 7k classes?

argmaxy∈Y(〈x,wy 〉)? No! “store all wy” ≈ 100G memory.

Our methods:

Loading data, training and testing on 804, 414 news articles topredict the topics in 25.16s!Training 4.4 million data in 0.5 hours (normally 2000 days).

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 45: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

Structured data

Assumption 2: {(xi , yi )}Ni=1 are independent.

Figure : Tennis action recognition

Most likely actions = argmaxy1,y2,y3,y4 P(y1, y2, y3, y4|Image).y = (y1, y2, y3, y4) is a structure of an array.

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 46: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

Structured data

Assumption 2: {(xi , yi )}Ni=1 are independent.

Figure : Tennis action recognition

Most likely actions = argmaxy1,y2,y3,y4 P(y1, y2, y3, y4|Image).

y = (y1, y2, y3, y4) is a structure of an array.

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 47: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

Structured data

Assumption 2: {(xi , yi )}Ni=1 are independent.

Figure : Tennis action recognition

Most likely actions = argmaxy1,y2,y3,y4 P(y1, y2, y3, y4|Image).y = (y1, y2, y3, y4) is a structure of an array.

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 48: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

Structured data

2

Structured output: a sequence, a tree, or a network, ...

2courtesy of B. TaskarQinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 49: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

Online Learning

Assumption 3 fails: Underlying process changes. +Assumption 1 fails too. i.e. We have Large-scale data.

⇓Online Learning (OL): predicting answers for a sequence ofquestions.

processing one datum at a time (theoretical guarantee)

no assumption on underlying process being fixedProblem 1: it does not scale for structured data.

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 50: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

Online Learning

Assumption 3 fails: Underlying process changes. +Assumption 1 fails too. i.e. We have Large-scale data.⇓Online Learning (OL): predicting answers for a sequence ofquestions.

processing one datum at a time (theoretical guarantee)

no assumption on underlying process being fixed

Problem 1: it does not scale for structured data.

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 51: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

Online Learning

Assumption 3 fails: Underlying process changes. +Assumption 1 fails too. i.e. We have Large-scale data.⇓Online Learning (OL): predicting answers for a sequence ofquestions.

processing one datum at a time (theoretical guarantee)

no assumption on underlying process being fixedProblem 1: it does not scale for structured data.

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem

Page 52: Lecture 1: Machine Learning Problem - University of Adelaidejaven/talk/L1 ML problem.pdf · 2014-08-04 · Lecture 1: Machine Learning Problem Qinfeng (Javen) Shi 28 July 2014 Intro

Course infoMachine LearningReal life problems

Typical assumptionsLarge-scale dataStructured dataChanging environment

That’s all

Thanks!

Qinfeng (Javen) Shi Lecture 1: Machine Learning Problem