Robot Image Credit: Viktoriya Sukhanova © 123RF.com
The instructor gratefully acknowledges Eric Eaton (UPenn), who assembled the original slides, David Kauchak (Pomona), whose slides are also heavily used, and the many others who made their course materials freely available online.
CS 158
Machine Learning
Instructor: Jessica Wu
ML Basics
Learning Goals
Define Machine Learning
Know when ML is useful
Machine Learning is…
Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.
Slide credit: David Kauchak
Machine Learning is…
Machine learning is about predicting the future based on the past.
Hal Daumé III
Slide credit: David Kauchak
Machine learning is…
Machine Learning is the study of algorithms that
improve their performance P
at some task T
with experience E.
A well-defined learning task is given by <P, T, E>.
[Definition by Tom Mitchell (1998)]
Based on slide by Eric Eaton
Samuel’s Checkers Player
“Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.”
Arthur Samuel (1959)
Slide credit: Eric Eaton
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
Slide credit: Pedro Domingos
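To make the two diagrams concrete, here is a minimal Python sketch; the fruit task, data, and threshold rule are invented for illustration. Traditional programming hard-codes the program, while machine learning takes data plus desired outputs and produces the program (here, a threshold).

```python
# Traditional programming: a human writes the rule (the "program").
def handwritten_rule(weight_oz):
    return 1 if weight_oz >= 4.5 else 0   # threshold chosen by a person

# Machine learning: data + desired outputs go in, a "program" comes out.
def learn_threshold(weights, labels):
    def errors(t):
        return sum((1 if w >= t else 0) != y for w, y in zip(weights, labels))
    return min(sorted(weights), key=errors)  # threshold with fewest errors

weights = [3.0, 4.0, 4.0, 5.0, 5.5]   # data
labels  = [0, 0, 1, 1, 1]             # desired output (0 = apple, 1 = banana)
learned_rule = learn_threshold(weights, labels)
print(learned_rule)                   # the learned "program": a threshold
```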
Defining the Learning Task
Improve on task T, with respect to performance metric P, based on experience E
T: Playing checkers
T: Recognizing hand-written words
T: Driving on four-lane highways using vision sensors
T: Categorizing email messages as spam or legitimate
Slide credit: Ray Mooney
(indicates you need to fill in slide)
When Do We Use Machine Learning?
ML is used when:
Human expertise does not exist (navigating on Mars)
Humans cannot explain their expertise (speech recognition)
Models must be customized (personalized medicine)
Models are based on huge amounts of data (genomics)
Learning is not always useful:
There is no need to “learn” to calculate payroll
Slide credit: Eric Eaton
Based on slide by E. Alpaydin
A classic example of a task that requires machine learning:
It is very hard to say what makes a 2
Slide credit: Geoffrey Hinton
Some examples of tasks that are best solved by using a learning algorithm
Learning patterns
Predicting passenger survivability on the Titanic
Recognizing tweets as positive or negative
Clustering faces by identity
State-of-the-art applications
Autonomous cars
Automatic speech recognition
Anomalies in credit card transactions
Stock prediction
[Your favorite area here]
(This slide intentionally left blank.)
Course Basics
Learning Goals
Know course goals
Know course policies
Drew Conway’s Data Science Venn Diagram
[Venn diagram; one circle is Substantive Expertise]
How much Math?
A Fall 2015 course project analyzed syllabi text of all 5C courses and found that CS 158 is ...
46.4% Math (36.4% theory + 10.0% linear systems + applications)
13.6% CS (engineering, design, programming)
10.7% lab sciences (research, field, analysis)
10.0% research (thesis, project, writing)
7.9% social science
5.0% humanities
6.4% other
What can you do?
Evaluate your background: PS 1 has math refresher and resources
Let me know when math is going too fast!
What Have Students Said?
“The course ... is well balanced in terms of math and theory (coding) parts. The course helped me review many important math theory and techniques.”
“I liked the balance between implementing algorithms and using existing libraries.”
“I enjoyed the amount of coding for the homeworks / projects and think that it went well with the theory we learned in class.”
“I liked the real world applications of ML that we learned about.”
Types of Learning
Learning Goals
Define three main types of machine learning
List goals, inputs, and outputs
Types of Learning
Supervised (inductive) learning: Learn with a teacher
Given: labeled training examples
Goal: learn mapping that predicts label for test example
Unsupervised learning: Learn without a teacher
Given: unlabeled inputs
Goal: learn some intrinsic structure in inputs
Reinforcement learning: Learn by interacting
Given agent interacting in environment (having set of states)
Learn policy (state-to-action mapping) that maximizes agent’s reward
Slide credit: Eric Eaton
Based on slide by Pedro Domingos
Supervised Learning: Regression
Given (x(1), y(1)), (x(2), y(2)), ..., (x(n), y(n))
Learn a function f(x) to predict y given x
y is real-valued == regression
[Plot: September Arctic Sea Ice Extent (1,000,000 sq km) vs. Year, 1975–2015]
Slide credit: Eric Eaton
Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)
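A minimal regression sketch, assuming NumPy. The (year, extent) pairs below are made-up numbers shaped like the plot, not the actual data from Witt (2013).

```python
import numpy as np

# Hypothetical (year, extent) pairs; invented for illustration.
X = np.array([1980.0, 1990.0, 2000.0, 2010.0])
y = np.array([7.8, 6.2, 6.3, 4.9])   # extent in 1,000,000 sq km

# Fit f(x) = w1*x + w0 by least squares: a very simple hypothesis space.
w1, w0 = np.polyfit(X, y, deg=1)

# Predict a real-valued y for a new x == regression.
print(w1 * 2015 + w0)
```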
Supervised Learning: Classification
Given (x(1), y(1)), (x(2), y(2)), ..., (x(n), y(n))
Learn a function f(x) to predict y given x
y is categorical == classification
[Plot: Breast Cancer (Malignant / Benign): y = 1 (Malignant) or 0 (Benign) vs. Tumor Size; a threshold on Tumor Size separates Predict Benign from Predict Malignant]
Based on slide by Eric Eaton [example originally by Andrew Ng]
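A matching classification sketch, assuming scikit-learn; the tumor sizes and labels below are invented for illustration, not clinical data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical tumor sizes (cm); 0 = benign, 1 = malignant.
X = np.array([[1.0], [1.5], [2.0], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# Predict a categorical y for a new x == classification.
print(clf.predict([[2.8]]))         # predicted class (0 or 1)
print(clf.predict_proba([[2.8]]))   # probability of each class
```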
Supervised Learning
Tumor Size
Age
Clump Thickness
Uniformity of Cell Size
Uniformity of Cell Shape
…
x can be multi-dimensional
each dimension corresponds to an attribute
Based on slide by Eric Eaton [example originally by Andrew Ng]
Unsupervised Learning
Given x(1), x(2), ..., x(n) (without labels)
Output hidden structure behind the x’s
e.g., clustering
Slide credit: Eric Eaton
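A minimal clustering sketch, assuming scikit-learn; the unlabeled points are invented and form two loose blobs.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled inputs only: no y's anywhere.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])

# Discover intrinsic structure: partition into k = 2 clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment for each point
print(km.cluster_centers_)  # the discovered cluster centers
```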
Unsupervised Learning
Genomics application: group individuals by genetic similarity
[Figure: heatmap of genes × individuals]
Slide credit: Eric Eaton [Source: Daphne Koller]
Market segmentation
Social network analysis
Based on slide by Andrew Ng
Unsupervised Learning
Reinforcement Learning
Given sequence of states and actions with (delayed) rewards
Learn policy that maximizes agent’s reward
Examples:
Credit assignment problem
Game playing
Robot in maze
Balance a pole on your hand
Slide credit: Eric Eaton
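A tiny Q-learning sketch on a hypothetical 4-state corridor (the environment and constants are my own, not course material). The reward is delayed until the goal state is reached, and the output is a policy: a state-to-action mapping.

```python
import random

N_STATES, ACTIONS, GOAL = 4, (0, 1), 3   # actions: 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.3        # learning rate, discount, exploration

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL  # reward only at goal

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # value of each (state, action)
random.seed(0)
for _ in range(500):                       # episodes of interaction
    s, done = 0, False
    while not done:
        if random.random() < eps:          # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[s][act])
        s2, r, done = step(s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])  # Q update
        s = s2

policy = [max(ACTIONS, key=lambda act: Q[s][act]) for s in range(N_STATES)]
print(policy)  # should prefer action 1 (right) in non-terminal states
```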
Reinforcement Learning
… WIN!
… LOSE!
Backgammon
Given sequences of moves and whether or not the player won at the end, learn to make good moves
Slide credit: David Kauchak
Reinforcement Learning
https://www.youtube.com/watch?v=4cgWya wjgY
Slide credit: Eric Eaton
Brief Peek at Topics
Supervised Learning
Decision trees
K-nearest neighbors
Linear regression
Logistic regression
Perceptron
Support vector machines
Ensemble methods
Unsupervised Learning
Principal component analysis
Clustering
Expectation maximization
Reinforcement Learning
None: covered in depth in AI (CS 151)
Structured Learning
Hidden Markov Models
Collaborative filtering
Other Topics
Experimental evaluation (procedure + metrics)
Computational learning theory
Framing a Learning Problem
Learning Goals
List stages of ML pipeline
Name some key issues in ML
Representing Examples
[Diagram: a grid of examples (rows) by features (columns)]
How our algorithms actually “view” the data
Features are the questions we can ask about the examples
feat1, feat2, feat3, feat4, …
What is an example? How is it represented?
Based on slide by David Kauchak
Learning System
examples (features) → label
red, round, leaf, 3 oz, … → apple
green, round, no leaf, 4 oz, … → apple
yellow, curved, no leaf, 4 oz, … → banana
green, curved, no leaf, 5 oz, … → banana
examples → model/classifier
During learning/training/induction, learn a model of what distinguishes apples and bananas based on the features
Based on slide by David Kauchak
red, round, no leaf, 4 oz, … → model/classifier
The model can then classify a new example based on the features
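To make the pipeline concrete, a sketch assuming scikit-learn; the numeric feature encoding ([is_red, is_round, has_leaf, weight_oz]) is my own choice, not the course's.

```python
from sklearn.tree import DecisionTreeClassifier

# Examples as feature vectors: [is_red, is_round, has_leaf, weight_oz]
X = [[1, 1, 1, 3],   # red, round, leaf, 3 oz        -> apple
     [0, 1, 0, 4],   # green, round, no leaf, 4 oz   -> apple
     [0, 0, 0, 4],   # yellow, curved, no leaf, 4 oz -> banana
     [0, 0, 0, 5]]   # green, curved, no leaf, 5 oz  -> banana
y = ["apple", "apple", "banana", "banana"]

# Training: learn a model of what distinguishes apples from bananas.
model = DecisionTreeClassifier().fit(X, y)

# Classification: the new example red, round, no leaf, 4 oz.
print(model.predict([[1, 1, 0, 4]]))
```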
Learning System
Learning is about generalizing from training data
What does this assume about training and test set?
[Diagram: training data vs. test set]
Not always the case, but we’ll often assume it is!
Based on slide by David Kauchak
More Technically…
We are going to use the probabilistic model of learning
There is some (unknown) probability distribution over example/label pairs called the data-generating distribution
Both the training data and the test set are generated based on this distribution
we call this i.i.d. (independent and identically distributed)
Based on slide by David Kauchak
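A small sketch of this setup, assuming NumPy; the data-generating distribution p(x, y) below is made up (x uniform, y = sin(2πx) plus noise). The point is that training data and test set are drawn i.i.d. from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Draw n i.i.d. (x, y) pairs from the data-generating distribution."""
    x = rng.uniform(0, 1, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=n)
    return x, y

# Training data and test set: same (in practice, unknown) distribution.
x_train, y_train = sample(10)
x_test, y_test = sample(100)
```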
ML as Function Approximation
Problem Setting
Set of possible instances X
Set of possible labels Y
Unknown target function f : X → Y
Set of function hypotheses H = {h | h : X → Y}
Input: Training examples of unknown target function f
(superscript denotes example number)
Output: Hypothesis h ∈ H that best approximates f
Slide credit: Eric Eaton
Based on slide by Tom Mitchell
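A toy instance of this problem setting (all names hypothetical): X is the reals, Y = {0, 1}, and H is a small space of threshold functions, searched for the h that best approximates f on the training examples.

```python
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]   # (x(i), y(i)) pairs

def make_h(t):                    # one hypothesis h : X -> Y per threshold t
    return lambda x: int(x >= t)

H = [make_h(t) for t in (0.5, 1.5, 2.5, 3.5)]      # finite hypothesis space

def training_error(h):            # how badly h disagrees with the examples
    return sum(h(x) != y for x, y in train)

best = min(H, key=training_error) # output: the h in H that best fits f
print([best(x) for x in (1.0, 2.5, 3.0)])          # -> [0, 1, 1]
```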
Stages of (Batch) Machine Learning
Given: labeled training data X, Y
Assumes each (x(i), y(i)) ~ p(x, y) such that y(i) = f(x(i))
Train model:
model ← learner.train(X, Y)
Apply model to new data:
Given: new unlabeled instance x ~ p(x)
model.predict(x)
[Diagram: X, Y → learner → model; new x → model → prediction]
Based on slide by Eric Eaton
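A minimal sketch of this train/predict interface; the class and method names mirror the slide's pseudocode rather than any real library, and the constant-prediction model is just a placeholder learner.

```python
import numpy as np

class MeanModel:                       # the "model" box: consumes new x
    def __init__(self, y_hat):
        self.y_hat = y_hat
    def predict(self, x):
        return self.y_hat              # constant prediction: trivial baseline

class MeanLearner:                     # the "learner" box: consumes X, Y
    def train(self, X, Y):
        return MeanModel(np.mean(Y))   # fit to training data, return a model

X, Y = np.array([[1.0], [2.0], [3.0]]), np.array([2.0, 4.0, 6.0])
model = MeanLearner().train(X, Y)      # model <- learner.train(X, Y)
print(model.predict(np.array([4.0])))  # model.predict(x)
```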
Example Regression Problem
Consider a simple regression dataset
f : x → y
Question 1: How should we pick the hypothesis space H?
Question 2: How do we find the best h in this space?
Based on slide by David Sontag
Images from Bishop [PRML]
Dataset: 10 points generated from sin function with noise
Hypothesis Space: Degree M Polynomials
Infinitely many hypotheses
M < 9: not consistent with dataset
M ≥ 9: consistent
Which one is best?
Based on slide by David Sontag
Images from Bishop [PRML]
[Plots: degree-M polynomial fits to the dataset for several values of M]
Hypothesis Space: Degree M Polynomials
Based on slide by David Sontag
Images from Bishop [PRML]
We measure error using a loss function
For regression, a common choice is squared loss: $\ell(y, \hat{y}) = (y - \hat{y})^2$
Empirical loss of function $h$ applied to training data is then $\frac{1}{n} \sum_{i=1}^{n} \left(y^{(i)} - h(x^{(i)})\right)^2$
Learning Curve
[Plot: training and test error vs. M, a measure of model complexity; low training error but high test error at large M is an example of overfitting]
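A sketch of this experiment, assuming NumPy: fit degree-M polynomials to 10 noisy samples of the sin function (as on the slides) and compare empirical loss on training vs. test data.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 100)

for M in (0, 1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=M)  # minimize squared loss
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(M, train_mse, test_mse)
# Training error keeps shrinking as M grows; test error eventually rises.
# (NumPy may warn that the M = 9 fit is poorly conditioned: it interpolates
# all 10 points exactly, which is the overfitting shown on the slide.)
```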
Key Issues in Machine Learning
Representation: How do we choose a hypothesis space?
Often we use prior knowledge to guide this choice
The ability to answer the next two questions also affects choice
Optimization: How do we find the best hypothesis within this space?
This is an algorithmic question, at the intersection of computer science and optimization research.
Evaluation: How can we gauge the accuracy of a hypothesis on unseen testing data?
The previous example showed that choosing the hypothesis which simply minimizes training set error is not optimal
This question is the main topic of learning theory
Based on slides by Eric Eaton and by David Sontag
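A brief holdout-evaluation sketch, assuming scikit-learn and invented data; it shows why the hypothesis that minimizes training error can look better than it is.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Invented data: 200 points, 2 features, noisy binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 200) > 0).astype(int)

# Hold out data the learner never sees; measure accuracy there.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier().fit(X_tr, y_tr)
print(clf.score(X_tr, y_tr))  # training accuracy: optimistic (often 1.0 here)
print(clf.score(X_te, y_te))  # test accuracy: the honest estimate
```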
ML in Practice
Understand domain, prior knowledge, and goals
Data integration, selection, cleaning, pre-processing, etc.
Learn models
Interpret results
Consolidate and deploy discovered knowledge
Slide credit: Eric Eaton and Ray Mooney
Based on a slide by Pedro Domingos
[Diagram: these steps form a loop]
Lessons Learned about Learning
Learning can be viewed as using experience to approximate a chosen target function
Function approximation can be viewed as a search through a space of hypotheses (representations of functions) for one that best fits a set of training data
Different learning methods assume different hypothesis spaces and/or employ different search techniques