Robot Image Credit: Viktoriya Sukhanova © 123RF.com
The instructor gratefully acknowledges Eric Eaton (UPenn), who assembled the original slides, David Kauchak (Pomona), whose slides are also heavily used, and the many others who made their course materials freely available online.
CS 158
Machine Learning
Instructor: Jessica Wu
ML Basics
Learning Goals
Define Machine Learning
Know when ML is useful
Machine Learning is…
Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.
Slide credit: David Kauchak
Machine Learning is…
Machine learning is about predicting the future based on the past.
Hal Daumé III
Slide credit: David Kauchak
Machine learning is…
Machine Learning is the study of algorithms that
improve their performance P
at some task T
with experience E.
A well-defined learning task is given by <P, T, E>.
[Definition by Tom Mitchell (1998)]
Based on slide by Eric Eaton
Samuel’s Checkers Player
“Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.”
Arthur Samuel (1959)
Slide credit: Eric Eaton
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
Slide credit: Pedro Domingos
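To make the two diagrams concrete, here is a minimal Python sketch; the fruit task, data, and threshold rule are invented for illustration. Traditional programming hard-codes the program, while machine learning takes data plus desired outputs and produces the program (here, a threshold).

```python
# Traditional programming: a human writes the rule (the "program").
def handwritten_rule(weight_oz):
    return 1 if weight_oz >= 4.5 else 0   # threshold chosen by a person

# Machine learning: data + desired outputs go in, a "program" comes out.
def learn_threshold(weights, labels):
    def errors(t):
        return sum((1 if w >= t else 0) != y for w, y in zip(weights, labels))
    return min(sorted(weights), key=errors)  # threshold with fewest errors

weights = [3.0, 4.0, 4.0, 5.0, 5.5]   # data
labels  = [0, 0, 1, 1, 1]             # desired output (0 = apple, 1 = banana)
learned_rule = learn_threshold(weights, labels)
print(learned_rule)                   # the learned "program": a threshold
```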
Defining the Learning Task
Improve on task T, with respect to performance metric P, based on experience E
T: Playing checkers
T: Recognizing hand-written words
T: Driving on four-lane highways using vision sensors
T: Categorizing email messages as spam or legitimate
Slide credit: Ray Mooney
(indicates you need to fill in slide)
When Do We Use Machine Learning?
ML is used when:
Human expertise does not exist (navigating on Mars)
Humans cannot explain their expertise (speech recognition)
Models must be customized (personalized medicine)
Models are based on huge amounts of data (genomics)
Learning is not always useful:
There is no need to “learn” to calculate payroll
Slide credit: Eric Eaton
Based on slide by E. Alpaydin
A classic example of a task that requires machine learning:
It is very hard to say what makes a 2
Slide credit: Geoffrey Hinton
Some examples of tasks that are best solved by using a learning algorithm
Learning patterns
Predicting passenger survivability on the Titanic
Recognizing tweets as positive or negative
Clustering faces by identity
State-of-the-art applications
Autonomous cars
Automatic speech recognition
Anomalies in credit card transactions
Stock prediction
[Your favorite area here]
(This slide intentionally left blank.)
Course Basics
Learning Goals
Know course goals
Know course policies
Drew Conway’s Data Science Venn Diagram
[Venn diagram; one circle is Substantive Expertise]
How much Math?
A Fall 2015 course project analyzed syllabi text of all 5C courses and found that CS 158 is ...
46.4% Math (36.4% theory + 10.0% linear systems + applications)
13.6% CS (engineering, design, programming)
10.7% lab sciences (research, field, analysis)
10.0% research (thesis, project, writing)
7.9% social science
5.0% humanities
6.4% other
What can you do?
Evaluate your background: PS 1 has math refresher and resources
Let me know when math is going too fast!
What Have Students Said?
“The course ... is well balanced in terms of math and theory (coding) parts. The course helped me review many important math theory and techniques.”
“I liked the balance between implementing algorithms and using existing libraries.”
“I enjoyed the amount of coding for the homeworks / projects and think that it went well with the theory we learned in class.”
“I liked the real world applications of ML that we learned about.”
Types of Learning
Learning Goals
Define three main types of machine learning
List goals, inputs, and outputs
Types of Learning
Supervised (inductive) learning: Learn with a teacher
Given: labeled training examples
Goal: learn mapping that predicts label for test example
Unsupervised learning: Learn without a teacher
Given: unlabeled inputs
Goal: learn some intrinsic structure in inputs
Reinforcement learning: Learn by interacting
Given agent interacting in environment (having set of states)
Learn policy (state-to-action mapping) that maximizes agent’s reward
Slide credit: Eric Eaton
Based on slide by Pedro Domingos
Supervised Learning: Regression
Given (x(1), y(1)), (x(2), y(2)), ..., (x(n), y(n))
Learn a function f(x) to predict y given x
y is real-valued == regression
[Plot: September Arctic Sea Ice Extent (1,000,000 sq km) vs. Year, 1975–2015]
Slide credit: Eric Eaton
Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)
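A minimal regression sketch, assuming NumPy. The (year, extent) pairs below are made-up numbers shaped like the plot, not the actual data from Witt (2013).

```python
import numpy as np

# Hypothetical (year, extent) pairs; invented for illustration.
X = np.array([1980.0, 1990.0, 2000.0, 2010.0])
y = np.array([7.8, 6.2, 6.3, 4.9])   # extent in 1,000,000 sq km

# Fit f(x) = w1*x + w0 by least squares: a very simple hypothesis space.
w1, w0 = np.polyfit(X, y, deg=1)

# Predict a real-valued y for a new x == regression.
print(w1 * 2015 + w0)
```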
Supervised Learning: Classification
Given (x(1), y(1)), (x(2), y(2)), ..., (x(n), y(n))
Learn a function f(x) to predict y given x
y is categorical == classification
[Plot: Breast Cancer (Malignant / Benign): y = 1 (Malignant) or 0 (Benign) vs. Tumor Size; a threshold on Tumor Size separates Predict Benign from Predict Malignant]
Based on slide by Eric Eaton [example originally by Andrew Ng]
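A matching classification sketch, assuming scikit-learn; the tumor sizes and labels below are invented for illustration, not clinical data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical tumor sizes (cm); 0 = benign, 1 = malignant.
X = np.array([[1.0], [1.5], [2.0], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# Predict a categorical y for a new x == classification.
print(clf.predict([[2.8]]))         # predicted class (0 or 1)
print(clf.predict_proba([[2.8]]))   # probability of each class
```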
Supervised Learning
Tumor Size
Age
Clump Thickness
Uniformity of Cell Size
Uniformity of Cell Shape
…
x can be multi-dimensional
each dimension corresponds to an attribute
Based on slide by Eric Eaton [example originally by Andrew Ng]
Unsupervised Learning
Given x(1), x(2), ..., x(n) (without labels)
Output hidden structure behind the x’s
e.g., clustering
Slide credit: Eric Eaton
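A minimal clustering sketch, assuming scikit-learn; the unlabeled points are invented and form two loose blobs.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled inputs only: no y's anywhere.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])

# Discover intrinsic structure: partition into k = 2 clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment for each point
print(km.cluster_centers_)  # the discovered cluster centers
```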
Unsupervised Learning
Genomics application: group individuals by genetic similarity
[Figure: heatmap of genes × individuals]
Slide credit: Eric Eaton [Source: Daphne Koller]
Market segmentation
Social network analysis
Based on slide by Andrew Ng
Unsupervised Learning
Reinforcement Learning
Given sequence of states and actions with (delayed) rewards
Learn policy that maximizes agent’s reward
Examples:
Credit assignment problem
Game playing
Robot in maze
Balance a pole on your hand
Slide credit: Eric Eaton
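A tiny Q-learning sketch on a hypothetical 4-state corridor (the environment and constants are my own, not course material). The reward is delayed until the goal state is reached, and the output is a policy: a state-to-action mapping.

```python
import random

N_STATES, ACTIONS, GOAL = 4, (0, 1), 3   # actions: 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.3        # learning rate, discount, exploration

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL  # reward only at goal

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # value of each (state, action)
random.seed(0)
for _ in range(500):                       # episodes of interaction
    s, done = 0, False
    while not done:
        if random.random() < eps:          # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[s][act])
        s2, r, done = step(s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])  # Q update
        s = s2

policy = [max(ACTIONS, key=lambda act: Q[s][act]) for s in range(N_STATES)]
print(policy)  # should prefer action 1 (right) in non-terminal states
```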
Reinforcement Learning
… WIN!
… LOSE!
Backgammon
Given sequences of moves and whether or not the player won at the end, learn to make good moves
Slide credit: David Kauchak
Reinforcement Learning
https://www.youtube.com/watch?v=4cgWya wjgY
Slide credit: Eric Eaton
Brief Peek at Topics
Supervised Learning
Decision trees
K-nearest neighbors
Linear regression
Logistic regression
Perceptron
Support vector machines
Ensemble methods
Unsupervised Learning
Principal component analysis
Clustering
Expectation maximization
Reinforcement Learning
None: covered in depth in AI (CS 151)
Structured Learning
Hidden Markov Models
Collaborative filtering
Other Topics
Experimental evaluation (procedure + metrics)
Computational learning theory
Framing a Learning Problem
Learning Goals
List stages of ML pipeline
Name some key issues in ML
Representing Examples
[Diagram: a grid of examples (rows) by features (columns)]
How our algorithms actually “view” the data
Features are the questions we can ask about the examples
feat1, feat2, feat3, feat4, …
What is an example? How is it represented?
Based on slide by David Kauchak
Learning System
examples (features) → label
red, round, leaf, 3 oz, … → apple
green, round, no leaf, 4 oz, … → apple
yellow, curved, no leaf, 4 oz, … → banana
green, curved, no leaf, 5 oz, … → banana
examples → model/classifier
During learning/training/induction, learn a model of what distinguishes apples and bananas based on the features
Based on slide by David Kauchak
red, round, no leaf, 4 oz, … → model/classifier
The model can then classify a new example based on the features
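To make the pipeline concrete, a sketch assuming scikit-learn; the numeric feature encoding ([is_red, is_round, has_leaf, weight_oz]) is my own choice, not the course's.

```python
from sklearn.tree import DecisionTreeClassifier

# Examples as feature vectors: [is_red, is_round, has_leaf, weight_oz]
X = [[1, 1, 1, 3],   # red, round, leaf, 3 oz        -> apple
     [0, 1, 0, 4],   # green, round, no leaf, 4 oz   -> apple
     [0, 0, 0, 4],   # yellow, curved, no leaf, 4 oz -> banana
     [0, 0, 0, 5]]   # green, curved, no leaf, 5 oz  -> banana
y = ["apple", "apple", "banana", "banana"]

# Training: learn a model of what distinguishes apples from bananas.
model = DecisionTreeClassifier().fit(X, y)

# Classification: the new example red, round, no leaf, 4 oz.
print(model.predict([[1, 1, 0, 4]]))
```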
Learning System
Learning is about generalizing from training data
What does this assume about training and test set?
[Diagram: training data vs. test set]
Not always the case, but we’ll often assume it is!
Based on slide by David Kauchak
More Technically…
We are going to use the probabilistic model of learning
There is some (unknown) probability distribution over example/label pairs called the data-generating distribution
Both the training data and the test set are generated based on this distribution
we call this i.i.d. (independent and identically distributed)
Based on slide by David Kauchak
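A small sketch of this setup, assuming NumPy; the data-generating distribution p(x, y) below is made up (x uniform, y = sin(2πx) plus noise). The point is that training data and test set are drawn i.i.d. from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Draw n i.i.d. (x, y) pairs from the data-generating distribution."""
    x = rng.uniform(0, 1, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=n)
    return x, y

# Training data and test set: same (in practice, unknown) distribution.
x_train, y_train = sample(10)
x_test, y_test = sample(100)
```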
ML as Function Approximation
Problem Setting
Set of possible instances X
Set of possible labels Y
Unknown target function f : X → Y
Set of function hypotheses H = {h | h : X → Y}
Input: Training examples of unknown target function f
(superscript denotes example number)
Output: Hypothesis h ∈ H that best approximates f
Slide credit: Eric Eaton
Based on slide by Tom Mitchell
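A toy instance of this problem setting (all names hypothetical): X is the reals, Y = {0, 1}, and H is a small space of threshold functions, searched for the h that best approximates f on the training examples.

```python
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]   # (x(i), y(i)) pairs

def make_h(t):                    # one hypothesis h : X -> Y per threshold t
    return lambda x: int(x >= t)

H = [make_h(t) for t in (0.5, 1.5, 2.5, 3.5)]      # finite hypothesis space

def training_error(h):            # how badly h disagrees with the examples
    return sum(h(x) != y for x, y in train)

best = min(H, key=training_error) # output: the h in H that best fits f
print([best(x) for x in (1.0, 2.5, 3.0)])          # -> [0, 1, 1]
```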
Stages of (Batch) Machine Learning
Given: labeled training data X, Y
Assumes each (x(i), y(i)) ~ p(x, y) such that y(i) = f(x(i))
Train model:
model ← learner.train(X, Y)
Apply model to new data:
Given: new unlabeled instance x ~ p(x)
model.predict(x)
[Diagram: X, Y → learner → model; new x → model → prediction]
Based on slide by Eric Eaton
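A minimal sketch of this train/predict interface; the class and method names mirror the slide's pseudocode rather than any real library, and the constant-prediction model is just a placeholder learner.

```python
import numpy as np

class MeanModel:                       # the "model" box: consumes new x
    def __init__(self, y_hat):
        self.y_hat = y_hat
    def predict(self, x):
        return self.y_hat              # constant prediction: trivial baseline

class MeanLearner:                     # the "learner" box: consumes X, Y
    def train(self, X, Y):
        return MeanModel(np.mean(Y))   # fit to training data, return a model

X, Y = np.array([[1.0], [2.0], [3.0]]), np.array([2.0, 4.0, 6.0])
model = MeanLearner().train(X, Y)      # model <- learner.train(X, Y)
print(model.predict(np.array([4.0])))  # model.predict(x)
```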
Example Regression Problem
Consider a simple regression dataset
f : x → y
Question 1: How should we pick the hypothesis space H?
Question 2: How do we find the best h in this space?
Based on slide by David Sontag
Images from Bishop [PRML]
Dataset: 10 points generated from sin function with noise
Hypothesis Space: Degree M Polynomials
Infinitely many hypotheses
M < 9: not consistent with dataset
M ≥ 9: consistent
Which one is best?
Based on slide by David Sontag
Images from Bishop [PRML]
[Plots: degree-M polynomial fits to the dataset for several values of M]
Hypothesis Space: Degree M Polynomials
Based on slide by David Sontag
Images from Bishop [PRML]
We measure error using a loss function
For regression, a common choice is squared loss: $\ell(y, \hat{y}) = (y - \hat{y})^2$
Empirical loss of function $h$ applied to training data is then $\frac{1}{n} \sum_{i=1}^{n} \left(y^{(i)} - h(x^{(i)})\right)^2$
Learning Curve
[Plot: training and test error vs. M, a measure of model complexity; low training error but high test error at large M is an example of overfitting]
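A sketch of this experiment, assuming NumPy: fit degree-M polynomials to 10 noisy samples of the sin function (as on the slides) and compare empirical loss on training vs. test data.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 100)

for M in (0, 1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=M)  # minimize squared loss
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(M, train_mse, test_mse)
# Training error keeps shrinking as M grows; test error eventually rises.
# (NumPy may warn that the M = 9 fit is poorly conditioned: it interpolates
# all 10 points exactly, which is the overfitting shown on the slide.)
```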
Key Issues in Machine Learning
Representation: How do we choose a hypothesis space?
Often we use prior knowledge to guide this choice
The ability to answer the next two questions also affects choice
Optimization: How do we find the best hypothesis within this space?
This is an algorithmic question, at the intersection of computer science and optimization research.
Evaluation: How can we gauge the accuracy of a hypothesis on unseen testing data?
The previous example showed that choosing the hypothesis which simply minimizes training set error is not optimal
This question is the main topic of learning theory
Based on slides by Eric Eaton and by David Sontag
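A brief holdout-evaluation sketch, assuming scikit-learn and invented data; it shows why the hypothesis that minimizes training error can look better than it is.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Invented data: 200 points, 2 features, noisy binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 200) > 0).astype(int)

# Hold out data the learner never sees; measure accuracy there.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier().fit(X_tr, y_tr)
print(clf.score(X_tr, y_tr))  # training accuracy: optimistic (often 1.0 here)
print(clf.score(X_te, y_te))  # test accuracy: the honest estimate
```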
ML in Practice
Understand domain, prior knowledge, and goals
Data integration, selection, cleaning, pre-processing, etc.
Learn models
Interpret results
Consolidate and deploy discovered knowledge
Slide credit: Eric Eaton and Ray Mooney
Based on a slide by Pedro Domingos
[Diagram: these steps form a loop]
Lessons Learned about Learning
Learning can be viewed as using experience to approximate a chosen target function
Function approximation can be viewed as a search through a space of hypotheses (representations of functions) for one that best fits a set of training data
Different learning methods assume different hypothesis spaces and/or employ different search techniques