Applied Machine Learning
Lecture 1: Introduction
Richard Johansson
January 21, 2020
welcome to the course!

- machine learning is increasingly popular among students
  - our courses are full!
  - many thesis projects develop or apply ML models
- ... and in industry and the public sector
  - many companies come to us looking for students
  - joint research projects
- why the fuss, and why now?
success stories: image recognition

success stories: machine translation
[image by Chris Manning]

data
[source]

applications...
[source]
topics covered in the course
- the usual "zoo": a selection of machine learning models
  - what's the idea behind them?
  - how are they implemented? (at least on a high level)
  - what are the use cases?
  - how can we apply them practically?
- but hopefully also the "real-world context":
  - extended "messy" practical assignments requiring that you think about what you're doing
  - invited talks from industry and the healthcare sector
  - annotation of data, evaluation
  - ethical and legal issues, interpretability
overview
practical issues about the course
basic ideas in machine learning
machine learning libraries in Python
example of a learning algorithm: decision tree learning
underfitting and overfitting
course webpage
- the official course webpage is the Canvas page: https://chalmers.instructure.com/courses/8685/
structure of teaching
- lectures Tuesdays and Fridays
  - some theory and introduction to ML software
  - interactive coding
  - solving a few exercises when we have time
  - most lectures will be given by Selpi
  - ... except some guest lectures
- lab sessions Thursdays
  - our TAs help you work on your assignments
  - choose between the 13-15 and the 15-17 session
  - please let me know if it's too crowded
assignments
- seven compulsory assignments:
  - PA 1A: intro to the ML workflow, decision trees
  - PA 1B: random forests
  - PA 2A: text classification
  - PA 2B: linear classifiers
  - PA 3B: skin mark classification
  - WA 1: read a scientific paper in applied machine learning
  - WA 2: written essay on ethics in ML
- please refer to the course PM for details about grading
- we will use the Python programming language
programming assignment 1A
- warmup lab exercise: quick tour of the scikit-learn library
- introduction to decision trees
- for a high grade: implement decision tree regression
- lab session on Thursday
- submission deadline: January 29
noncompulsory work
- exercise sheets
- online quizzes
literature
- the main course book is A Course in Machine Learning by Hal Daumé III: http://ciml.info
- additional papers to read for some topics
- some notes to complement the lectures
- example code will be posted on the course page
exam, mid-March
- this is a take-home exam: a written assignment
- your solution must be submitted online
- we will find a date in the exam period that suits as many as possible
exam, details
- a first part about basic concepts: you need to answer most of these questions correctly to pass
- a second part that requires more insight: answer these questions for a higher grade
student representatives
- if you're interested in being a student representative, please send me an email!
- the workload is light and there will be a small reward...
overview
practical issues about the course
basic ideas in machine learning
machine learning libraries in Python
example of a learning algorithm: decision tree learning
underfitting and overfitting
basic ideas

- given some object, make a prediction
  - is this patient diabetic?
  - is the sentiment of this movie review positive?
  - does this image contain a cat?
  - what will be tomorrow's stock market value of this company?
  - what are the phonemes contained in this speech signal?
- the goal of machine learning is to build the prediction functions by observing data
- contrast: expert-defined vs. data-driven
[source]
why machine learning?
why would we want to "learn" the function from data instead of just implementing it?

- usually because we don't really know how to write down the function by hand
  - speech recognition
  - image classification
  - machine translation
  - ...
- ML might not be necessary for limited tasks where we know how to write the function down
- what is more expensive in your case? knowledge or data?
don’t forget your domain expertise!
ML makes some tasks automatic, but we still need our brains:
- defining the tasks, terminology, evaluation metrics
- annotating (hand-labeling) training and testing data
- designing features
- error analysis
example: is the patient diabetic?

- in order to predict, we make some measurements of properties we believe will be useful: these are called the features
features: different views
- many learning algorithms operate on numerical vectors:
  features = [ 1.5, -2, 3.8, 0, 9.12 ]
- more abstractly, we often represent the features as attributes with values (in Python, typically a dictionary):
  features = { "gender": "male", "age": 37, "blood_pressure": 130, ... }
- sometimes, it's easier just to see the features as a list of e.g. words (bag of words):
  features = [ "here", "are", "some", "words", "in", "a", "document" ]
more terminology: what is the output?
- classification: learning to output a category label
  - spam/non-spam; positive/negative; ...
- regression: learning to guess a number
  - value of a share; number of stars in a review; ...
basic terminology: supervised learning
- in supervised learning, the training set consists of input-output pairs
- our goal is to learn to produce the outputs
types of supervision: alternatives
- unsupervised learning: we are given "unorganized" data
  - our goal is to discover some structure

[figure: two scatter plots of the same unlabeled 2-D points]

- reinforcement learning: our problem is formalized as a game
  - an agent carries out actions and receives rewards
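as a concrete illustration of the unsupervised setting (my own sketch, not from the slides), a clustering algorithm such as k-means discovers group structure in unlabeled points:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# two "blobs" of unlabeled 2-dimensional points
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + [5, 5]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])        # discovered cluster assignments
print(kmeans.cluster_centers_)    # one center per discovered group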
example: Fisher’s iris data
[figure: scatter plot of petal_length (3.0-7.0) vs. petal_width (1.0-2.4) for versicolor and virginica examples]
approach 1: linear separator
if 0.85 · petal_length + 2.42 · petal_width ≥ 8.34:
    return virginica
else:
    return versicolor

[figure: the same petal_length/petal_width scatter plot with the linear separator drawn as a straight line]
approach 2: if/then/else tree
[figure: the same petal_length/petal_width scatter plot with the tree's axis-parallel decision boundaries]
basic machine learning workflow
basic ML methodology: evaluation
- select an evaluation procedure (a "metric"), such as
  - classification accuracy: proportion of correct classifications
  - mean squared error, often used in regression
  - or some domain-specific metric
- compare to one or more baselines
  - trivial solution
  - rule-based solution
  - existing solution
- apply your model to a held-out test set and evaluate
  - the test set must be different from the training set
  - also: don't optimize on the test set; use a development set or cross-validation!
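a minimal scikit-learn sketch of this methodology (my own illustration; the dataset and models are just placeholders): hold out a test set, compare against a trivial baseline, and tune via cross-validation on the training data only:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.dummy import DummyClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# trivial baseline: always predict the most frequent class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
clf = DecisionTreeClassifier().fit(X_train, y_train)

# tune and compare using cross-validation on the training data, not the test set
print(cross_val_score(clf, X_train, y_train, cv=5).mean())

# evaluate once on the held-out test set
print(accuracy_score(y_test, baseline.predict(X_test)))
print(accuracy_score(y_test, clf.predict(X_test)))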
managing your data for evaluation
[source]
overview
practical issues about the course
basic ideas in machine learning
machine learning libraries in Python
example of a learning algorithm: decision tree learning
underfitting and overfitting
use cases for machine learning
- standard use cases: standard solutions are available
- special cases: we may need to tailor our own solutions
the Python machine learning ecosystem (selection)
machine learning software: a small sample
- general-purpose software, large collections of algorithms:
  - scikit-learn: http://scikit-learn.org
    - Python library; will be used in this course
  - Weka: http://www.cs.waikato.ac.nz/ml/weka
    - Java library with a nice user interface
- special-purpose software, small collections of algorithms:
  - Keras, PyTorch, TensorFlow, CNTK for neural networks
  - LibSVM/LibLinear for support vector machines
  - XGBoost, LightGBM for tree ensembles
  - ...
- large-scale learning in distributed architectures:
  - Spark MLlib
  - H2O
scikit-learn toy example
see also https://scikit-learn.org/stable/getting_started.html
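the code shown on this slide is not preserved in this transcript; a minimal example in the same spirit (fit on training data, predict on new examples) might look like this:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)           # learn from input-output pairs
print(clf.predict(X_test[:5]))      # predict labels for new examples
print(clf.score(X_test, y_test))    # classification accuracy on the test set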
overview
practical issues about the course
basic ideas in machine learning
machine learning libraries in Python
example of a learning algorithm: decision tree learning
underfitting and overfitting
classifiers as rule systems
- assume that we're building the prediction function by hand
  - how would it look?
- probably, you would start writing rules like this:
  - IF the blood glucose level > 150, THEN
    - IF the age > 50, THEN return True
    - ELSE ...
  - ...
- a human would construct such a rule system by trial and error
- we'll see how it can be learned automatically
decision tree classifiers
- a decision tree is a tree where
  - the internal nodes represent a choice based on a feature
  - the leaves represent the return value of the classifier
- like the example we had previously:
  - IF the blood glucose level > 150, THEN
    - IF the age > 50, THEN return True
    - ELSE ...
  - ...
general idea for learning a tree
- it should make few errors on the training set
- and an Occam's razor intuition: we'd like a small tree
- however, finding a small and accurate tree is a complex computational problem
  - it is NP-hard
- instead, we'll look at an algorithm that works top-down by selecting the "most useful feature"
- some different variants:
  - basic approach: the ID3 algorithm
  - extended approaches: CART, C4.5, ...
  - see e.g. Daumé III's book or http://en.wikipedia.org/wiki/ID3_algorithm
greedy decision tree classifier learning (pseudocode)
def TrainDecisionTreeClassifier(X, Y):
    if all outputs in Y are identical:
        return a leaf with the class of the examples in Y
    if we have reached the maximally allowed depth:
        return a leaf with the majority class of Y
    F ← the "most useful feature" in X
    for each possible value f_i of F:
        X_i, Y_i ← the subset where F = f_i
        tree_i ← TrainDecisionTreeClassifier(X_i, Y_i)
    return a tree node that splits on F, where f_i is connected to the subtree tree_i
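a runnable Python version of this pseudocode (my own sketch, not the lecture's code; the feature-scoring function is left as a parameter, since the next slide discusses how to choose it):

from collections import Counter

def train_tree(X, Y, depth, max_depth, most_useful_feature):
    # X: list of feature dicts; Y: list of class labels
    if len(set(Y)) == 1:
        return Y[0]                                # leaf: all outputs identical
    if depth == max_depth:
        return Counter(Y).most_common(1)[0][0]     # leaf: majority class of Y
    F = most_useful_feature(X, Y)
    subtrees = {}
    for value in set(x[F] for x in X):             # each possible value of F
        Xi = [x for x in X if x[F] == value]
        Yi = [y for x, y in zip(X, Y) if x[F] == value]
        subtrees[value] = train_tree(Xi, Yi, depth + 1, max_depth,
                                     most_useful_feature)
    return (F, subtrees)                           # node that splits on F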
how to select the “most useful feature”?
- there are many rules of thumb to select the most useful feature
  - idea: a feature is good if the subsets T_i are homogeneous
- in Daumé III's book, he uses a simple score to rank the features:
  - for each subset T_i, compute the frequency of its majority class
  - sum the majority class frequencies
- however, the most well-known ranking measure is the information gain
  - this measures the reduction of entropy (statistical uncertainty) we get by considering the feature
- scikit-learn uses the Gini impurity by default
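minimal implementations of two of these scores (my own sketch), computed for the label subsets T_i produced by a candidate split:

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, subsets):
    # reduction of entropy when parent_labels is split into the subsets
    n = len(parent_labels)
    after = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent_labels) - after

def majority_sum(subsets):
    # Daumé III's simple score: sum the majority-class counts of the subsets
    return sum(Counter(s).most_common(1)[0][1] for s in subsets)

# a perfectly separating split removes all uncertainty: gain = 1.0
print(information_gain(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]]))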
example: selecting the feature for the top node
decision trees with numerical features
- when our features are numerical, we set a threshold and build subtrees for the upper and lower subsets
- so we need to find the threshold that gives us the nicest split
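a sketch of the threshold search (my own illustration): sort the examples by the feature value, try the midpoint between each pair of consecutive distinct values, and keep the best-scoring split, scored e.g. by the information gain defined earlier:

def best_threshold(values, labels, score_split):
    pairs = sorted(zip(values, labels))
    all_labels = [l for _, l in pairs]
    best_t, best_score = None, float("-inf")
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue                     # no split point between equal values
        t = (v1 + v2) / 2                # candidate threshold: the midpoint
        lower = [l for v, l in pairs if v <= t]
        upper = [l for v, l in pairs if v > t]
        score = score_split(all_labels, [lower, upper])
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score

for the iris example, best_threshold(petal_lengths, species, information_gain) would return the petal_length cutoff that best separates versicolor from virginica.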
example: finding the best threshold

[figure: the distribution of petal_length values (roughly 3.0-7.0) with the candidate thresholds between consecutive values]
implementing decision tree classifiers
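the code for this slide is not preserved in the transcript; in scikit-learn, the ready-made implementation is DecisionTreeClassifier, and export_text prints the learned if/then/else structure:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)
print(export_text(clf, feature_names=list(iris.feature_names)))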
overview
practical issues about the course
basic ideas in machine learning
machine learning libraries in Python
example of a learning algorithm: decision tree learning
underfitting and overfitting
what goes on when we “learn”?
- the learning algorithm observes the examples in the training set
- it tries to find common patterns that explain the data: it generalizes so that we can make predictions for new examples
- how this is done depends on what algorithm we are using
principles of induction: how do we select “good” models?
- hypothesis space: the set of all possible outputs of a learning algorithm
  - for decision tree learners: the set of possible trees
  - for linear separators: the set of all lines in the plane / hyperplanes in a vector space
- "learning" = searching the hypothesis space
- how do we know what hypothesis to look for?
a fundamental tradeoff in machine learning
- goodness of fit: the learned classifier should be able to capture the information in the training set
  - e.g. correctly classify the examples in the training data
- regularization: the classifier should be simple
  - use as few features as possible?
  - don't rely too much on any single feature?
  - small tree or neural network?
why would we prefer “simple” hypotheses?
“overfitting” and “underfitting”: the bias–variance tradeoff
[Source: Wikipedia]
example: training/test accuracy as a function of tree depth
[figure: training and test accuracy (y-axis 0.90-0.98) plotted against tree depth (x-axis 0-20); legend: train, test]
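a sketch of how such a curve can be produced (my own illustration; the lecture's dataset is not preserved here, so this uses a scikit-learn toy dataset): vary max_depth and record training and test accuracy.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in range(1, 21):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    print(depth,
          clf.score(X_train, y_train),   # training accuracy keeps improving
          clf.score(X_test, y_test))     # test accuracy eventually degrades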
up next
- Thursday: lab session for programming assignment 1A
- topic of Friday's lecture: ensembles and random forests
- please prepare for assignment 1A by reading my code and the extra reading on decision trees