22
Introductory Applied Machine Learning Charles Sutton and Victor Lavrenko School of Informatics Semester 1 1 / 22

Charles Sutton and Victor Lavrenko School of Informatics ... · Charles Sutton and Victor Lavrenko School of Informatics Semester 1 ... course is to provide the student ... and Eibe

Embed Size (px)

Citation preview

Introductory Applied Machine Learning

Charles Sutton and Victor LavrenkoSchool of Informatics

Semester 1

1 / 22

The primary aim of the course is to provide the student with aset of practical tools that can be applied to solve real-worldproblems in machine learning.

Machine learning is the study of computer algorithms thatimprove automatically through experience [Mitchell, 1997].

2 / 22

Overview

I Course structure / AssessmentI Relationships between ML coursesI Overview of Machine LearningI Provisional course outlineI Maths LevelI Reading: W & F chapter 1

Acknowledgements: We thank Amos Storkey, David Barber, and ChrisWilliams for permission to use course material from previous years.Additionally, inspiration has been obtained from Geoff Hinton’s slides for CSC2515 in Toronto

3 / 22

Administration

I Course text: Data Mining: Practical Machine LearningTools and Techniques (Second Edition, 2005) by Ian H.Witten and Eibe Frank

I Assessment:I Assignment (25% of mark)I Exam (75% of mark)

I 4 Tutorials and 4 LabsI Maths surgeriesI Course repI Plagiarismhttp://www.inf.ed.ac.uk/teaching/plagiarism.html

4 / 22

Machine Learning Courses

IAML Basic introductory course on supervised and unsupervisedlearning

MLPR More advanced course on machine learning, includingcoverage of Bayesian methods (Semester 2)

RL Reinforcement Learning.PMR Probabilistic modelling and reasoning. Focus on learning

and inference for probabilistic models, e.g. probabilisticexpert systems, latent variable models, Hidden Markovmodels

DME Data mining and Exploration. Using methods from PMR todeal with practical issues in learning from large datasets.(Semester 2)

I Basically, IAML: Users of ML; MLPR: Developers of newML techniques.

5 / 22

What is Machine Learning?

I It is very hard to write programs that solve problems likerecognizing a face

I We don’t know how the brain does it (and even if we did, itmight be horrendously complicated)

I Instead of writing a program by hand, we collect lots ofexamples that specify the correct output for a given input

I A machine learning algorithm then takes these examplesand produces a program that does the job (to some level ofperformance)

I Reading: W & F chapter 1

6 / 22

Example: Spam Classification

Web page Feature vector

13307...

learning

lectures

Paris Hilton

assignments

Classifier

SPAM

NONSPAM

7 / 22

learning robot inverse dynamics star/galaxy classification

8 / 22

Tasks:I Classification: Is there are dog in this image?I Localization: If there is a dog in this image, draw its

bounding boxI http://pascallin.ecs.soton.ac.uk/challenges/VOC/

9 / 22

Primate splice-junction gene sequences (DNA)

CCAGCTGCATCACAGGAGGCCAGCGAGCAGGTCTGTTCCAAGGGCCTTCGAGCCAGTCTG EIGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGAAGCGCAGTCGGGGGCACGGGGATG EITAAATTCTTCTGTTTGTTAACACCTTTCAGACTTATGTGTATGAAGGAGTAGAAGCCAAA IEAAACTAAAGAATTATTCTTTTACATTTCAGTTTTTCTTGATCATGAAAACGCCAACAAAA IEAAAGCAGATCAGCTGTATAAACAGAAAATTATTCGTGGTTTCTGTCACTTGTGTGATGGT NTTGCCCTCAGCATCACCATGAACGGAGAGGCCATCGCCTGCGCTGAGGGCTGCCAGGCCA N

I Task is to predict if there is an IE, EI or N (neither) junctionin the centre of the string

I Data from http://mlearn.ics.uci.edu/

10 / 22

Joint modelling of stock prices and news reports

[Victor Lavrenko]

11 / 22

Example: Collaborative Filtering

12 / 22

More examples

I Science (Astronomy, neuroscience, medical imaging,bio-informatics)

I Retail (Intelligent stock control, demographic storeplacement)

I Manufacturing (Intelligent control, automated monitoring,detection methods)

I Security (Intelligent smoke alarms, fraud detection)I Marketing (targetting promotions, ...)I Management (Scheduling, timetabling)I Finance (credit scoring, risk analysis...)I Web data (information retrieval, information extraction, ...)

13 / 22

Types of learning task

I Supervised learningI Predict an output y when given an input xI For categorical y : classification.I For real-valued y : regression.I Most of IAML treats classification

I Unsupervised learningI Create an internal representation of the input, e.g.

clustering, dimensionalityI This is important in machine learning as getting labels is

often difficult and expensiveI Reinforcement learning

I Learn policy to maximize payoff; often the payoff is delayedI We will not cover RL in this course

I Frontier areas: Semi-supervised learning

14 / 22

Machine Learning and Structured Models

I In IAML we will usually be interested in predicting a singleoutput y (real-valued or discrete) depending on a input x

I Many problems can be set up in this way, but one can alsobuild more structured models, e.g. for prediction onsequences, or more general structures (graphs)

I However, the techniques we will learn in IAML are buildingblocks for more complex, structured models

15 / 22

General structure of supervised learning algorithms

Hand, Mannila, Smyth (2001)

I Define the taskI Decide on the model structure (choice of inductive bias)I Decide on the score function (judge quality of fitted

model)I Decide on optimization/search method to optimize the

score function

16 / 22

Example: linear regression

f (x) = w0 + w1x1 + . . .+ wDxD

I x = (x1, . . . , xD)T

I Here the assumption B is that f (x) is a linear function in xI The specific setting of the parameters w0,w1, . . . ,wD is

done by minimizing a score functionI Usual score function is

∑ni=1(y

i − f (xi))2 where the sumruns over all training cases

I Linear regression is discussed in W & F §4.6, and we willcover it later in the course

17 / 22

Inductive bias

I Supervised learning is inductive, i.e. we makegeneralizations about the form of f (x) based on instancesD

I Let f (x;L,D) be the function learned by algorithm L withdata D

I The inductive bias of L is the minimal set of assumptions Bneeded so that the function f (x;L,D) is logically entailedby B ∧ D for all x

I We will consider a number of different supervised learningmethods in the IAML; these correspond to differentinductive biases

I W & F §1.5 call this “language bias”

18 / 22

The futility of bias-free learningI Supervised learning creates a predictor f (x) from training

data D. Let’s assume that f takes on the values {0,1}I If we have a new input x, half of the possible functions will

give the value 0, and half 1I A learner that makes no a priori assumptions regarding the

target concept has no rational basis for classifying anyunseen examples (Mitchell, 1997, p 42)

x=0 x=1

1

0

����

����

����

����

19 / 22

Machine Learning and Statistics

I A lot of work in machine learning can be seen as arediscovery of things that were known in statistics; butthere are also flows in the other direction

I The emphasis is rather different. One difference is a focuson prediction in machine learning vs interpretation of themodel in statistics

I Machine learning often refers to tasks associated withartificial intelligence (AI) such as recognition, diagnosis,planning, robot control, prediction, etc. These provide richand interesting tasks

I Goals can be autonomous machine performance, orenabling humans to learn from data (data mining)

20 / 22

Provisional Course OutlineI Introduction (CS)I Thinking about data (VL)I Basic probability (CS)I Naı̈ve Bayes classification (VL)I Linear regression (CS)I Generalization and Overfitting (CS)I Linear classification: logistic regression, perceptrons (CS)I Kernel classifiers: support vector machines (CS)I Decision trees (CS)I Dimensionality reduction (PCA etc) (VL)I Instance-based methods (VL)I Performance evaluation (VL)I Clustering (k -means, hierarchical) (VL)I Further topics as time permits ...

21 / 22

Maths Level

I Machine learning generally involves a significant number ofmathematical ideas and a significant amount ofmathematical manipulation

I IAML aims to keep the maths level to a minimum,explaining things more in terms of higher-level concepts,and developing understanding in a procedural way (e.g.how to program an algorithm)

I For those wanting to pursue research in any of the areascovered you will need courses like PMR, MLPR

22 / 22