48
1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

Embed Size (px)

Citation preview

Page 1: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

1

CSE 446Machine Learning

Daniel Weld

Xiao Ling

Congle Zhang

Page 2: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

2©2005-2009 Carlos Guestrin

What is Machine Learning ?

Page 3: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

3©2005-2009 Carlos Guestrin

Machine Learning

Study of algorithms that improve their performance at some task with experience

Data UnderstandingMachine Learning

Page 4: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

4

Why?

Is this topic important?

Page 5: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

5©2005-2009 Carlos Guestrin

Exponential Growth in Data

Data UnderstandingMachine Learning

Page 6: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

6©2005-2009 Carlos Guestrin

Supremacy of Machine Learning

Machine learning is preferred approach to Speech recognition, Natural language processing Web search – result ranking Computer vision Medical outcomes analysis Robot control Computational biology Sensor networks …

This trend is accelerating Improved machine learning algorithms Improved data capture, networking, faster computers Software too complex to write by hand New sensors / IO devices Demand for self-customization to user, environment

Page 7: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

7

Logistics

Page 8: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

8©2005-2009 Carlos Guestrin, D. Weld,

Syllabus

Covers a wide range of Machine Learning techniques – from basic to state-of-the-art

You will learn about the methods you heard about: Naïve Bayes, logistic regression, nearest-neighbor, decision trees, boosting, neural

nets, overfitting, regularization, dimensionality reduction, error bounds, loss function, VC dimension, SVMs, kernels, margin bounds, K-means, EM, mixture models, semi-supervised learning, HMMs, graphical models, active learning

Covers algorithms, theory and applications It’s going to be fun and hard work

Page 9: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

9©2005-2009 Carlos Guestrin

Prerequisites

Probabilities Distributions, densities, marginalization…

Basic statistics Moments, typical distributions, regression…

Algorithms Dynamic programming, basic data structures, complexity…

Programming Mostly your choice of language, but Python (NumPy) + Matlab will be very

useful We provide some background, but the class will be fast paced

Ability to deal with “abstract mathematical concepts”

Page 10: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

10

Staff

Two Great TAs: Fantastic resource for learning, interact with them!

Xiao Ling, CSE 610, xiaoling@cs Office hours: TBA

Congle Zhang, CSE 524, clzhang@cs Office hours: TBA

Administrative Assistant Alicen Smith, CSE 546, asmith@cs

Page 11: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

11

Text Books

Required Text: Pattern Recognition and Machine Learning; Chris Bishop

Optional: Machine Learning; Tom Mitchell The Elements of Statistical Learning: Data Mining, Inference,

and Prediction; Trevor Hastie, Robert Tibshirani, Jerome Friedman

Information Theory, Inference, and Learning Algorithms; David MacKay

Website: Andrew Ng’s AI class videos Website: Tom Mitchell’s AI class videos

Page 12: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

12

Grading

4 homeworks (55%) First one goes out Fri 1/6/12

Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early

Midterm (15%) Circa Feb 10 in class

Final (30%) TBD by registrar

Page 13: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

13©2005-2009 Carlos Guestrin

Homeworks

Homeworks are hard, start early Due at the beginning of class

Minus 33% credit for each day (or part of day) late All homeworks must be handed in, even for zero credit

Collaboration You may discuss the questions Each student writes their own answers Write on your homework anyone with whom you collaborate Each student must write their own code for the programming part Please don’t search for answers on the web, Google, previous years’

homeworks, etc. Ask us if you are not sure if you can use a particular reference

Page 14: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

14

Communication

Main discussion board https://catalyst.uw.edu/gopost/board/xling/25219/

Urgent announcements  cse446@cs  Subscribe: 

http://mailman.cs.washington.edu/mailman/listinfo/cse446 

To email instructors, always use: cse446_instructor@cs

Page 15: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

15

Space of ML Problems

What is B

eing Learned?

Type of Supervision (eg, Experience, Feedback)

LabeledExamples

Reward Nothing

Discrete Function

Classification Clustering

Continuous Function

Regression

Policy Apprenticeship Learning

ReinforcementLearning

Page 16: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

16©2009 Carlos Guestrin

Classification

from data to discrete classes

Page 17: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

17©2005-2009 Carlos Guestrin

Page 18: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

Spam filtering

18©2009 Carlos Guestrin

data prediction

Page 19: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

19©2009 Carlos Guestrin

Text classification

Company home page

vs

Personal home page

vs

Univeristy home page

vs

Page 20: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

20©2009 Carlos Guestrin

Object detection

Example training images for each orientation

(Prof. H. Schneiderman)

Page 21: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

21©2009 Carlos Guestrin

Reading a noun (vs verb)

[Rustandi et al., 2005]

Page 22: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

Weather prediction

22©2009 Carlos Guestrin

Page 23: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

The classification pipeline

23©2009 Carlos Guestrin

Training

Testing

Page 24: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

24©2009 Carlos Guestrin

Regression

predicting a numeric value

Page 25: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

Stock market

25©2009 Carlos Guestrin

Page 26: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

Weather prediction revisted

26©2009 Carlos Guestrin

Temperature

Page 27: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

27©2009 Carlos Guestrin

Modeling sensor data

Measure temperatures at some locations

Predict temperatures throughout the environment

SERVER

LAB

KITCHEN

COPYELEC

PHONEQUIET

STORAGE

CONFERENCE

OFFICEOFFICE50

51

52 53

54

46

48

49

47

43

45

44

42 41

3739

38 36

33

3

6

10

11

12

13 14

1516

17

19

2021

22

242526283032

31

2729

23

18

9

5

8

7

4

34

1

2

3540

[Guestrin et al. ’04]

Page 28: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

28©2009 Carlos Guestrin

Clustering

discovering structure in data

Page 29: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

Clustering Data: Group similar things

Page 30: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

Clustering images

30©2009 Carlos Guestrin [Goldberger et al.]

Set of Images

Page 31: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

Clustering web search results

31©2009 Carlos Guestrin

Page 32: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

32©2009 Carlos Guestrin

Reinforcement Learning

training by feedback

Page 33: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

33

Reinforcement Learning

©2005-2009 Carlos Guestrin

Page 34: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

35©2009 Carlos Guestrin

Learning to act

Reinforcement learning An agent

Makes sensor observations Must select action Receives rewards

positive for “good” states negative for “bad” states

[Ng et al. ’05]

Page 35: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

36

In Summary

What is B

eing Learned?

Type of Supervision (eg, Experience, Feedback)

LabeledExamples

Reward Nothing

Discrete Function

Classification Clustering

Continuous Function

Regression

Policy Apprenticeship Learning

ReinforcementLearning

Page 36: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

37

In Summary

What is B

eing Learned?

Type of Supervision (eg, Experience, Feedback)

LabeledExamples

Reward Nothing

Discrete Function

Classification Clustering

Continuous Function

Regression

Policy Apprenticeship Learning

ReinforcementLearning

Page 37: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

38

Key Concepts

Page 38: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

Classifier

0.0 1.0 2.0 3.0 4.0 5.0 6.0

0.0

1.0

2.0

3.0

Hypothesis:Function for labeling examples

++

+ +

++

+

+

- -

-

- -

--

-

-

- +

++

-

-

-

+

+ Label: -Label: +

?

?

?

?

Page 39: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

40

Generalization

Hypotheses must generalize to correctly classify instances not in the training data.

Simply memorizing training examples is a consistent hypothesis that does not generalize.

Page 40: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

ML = Function Approximation

41

c(x)

x

May not be any perfect fitClassification ~ discrete functions

h(x)

h(x) = contains(`nigeria’, x) contains(`wire-transfer’, x)

Page 41: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

© Daniel S. Weld 42

Why is Learning Possible?

Experience alone never justifies any conclusion about any unseen instance.

Learning occurs when

PREJUDICE meets DATA!

Learning a “Frobnitz”

Page 42: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

© Daniel S. Weld 43

Bias

The nice word for prejudice is “bias”.Different from “Bias” in statistics

What kind of hypotheses will you consider?What is allowable range of functions you use when

approximating?What kind of hypotheses do you prefer?

Page 43: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

© Daniel S. Weld 44

Some Typical Biases

Occam’s razor“It is needless to do more when less will suffice”

– William of Occam,

died 1349 of the Black plagueMDL – Minimum description lengthConcepts can be approximated by ... conjunctions of predicates

... by linear functions

... by short decision trees

Frobnitz?

Page 44: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

45

ML as Optimization

Specify Preference Bias aka “Loss Function”

Solve using optimization Combinatorial Convex Linear Nasty

©2005-2009 Carlos Guestrin

Page 45: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

Overfitting

Hypothesis H is overfit when H’ and H has smaller error on training examples, but H has bigger error on test examples

Page 46: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

Overfitting

Hypothesis H is overfit when H’ and H has smaller error on training examples, but H has bigger error on test examples

Causes of overfitting Training set is too small Large number of features

Big problem in machine learning One solution: Validation set

Page 47: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

© Daniel S. Weld 48

Overfitting

Accuracy

0.9

0.8

0.7

0.6

On training dataOn test data

Model complexity (e.g., number of nodes in decision tree)

Page 48: 1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

49

The Road Ahead