Machine Learning 1 Lecture 1.2 - What is Machine Learning?

Erik Bekkers

(Bishop 1.0 and 1.1)

Image credit: Kirillm | Getty Images

Slide credits: Patrick Forré and Rianne van den Berg


What is machine learning?


“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”

- Tom M. Mitchell

Machine Learning, Tom Mitchell, McGraw Hill, 1997


E: Experience

MNIST dataset: handwritten digits of size 28×28


E: Experience

[Figure: heat map of DNA microarray data.]

FIGURE 1.3. DNA microarray data: expression matrix of 6830 genes (rows) and 64 samples (columns), for the human tumor data. Only a random sample of 100 rows is shown. The display is a heat map, ranging from bright green (negative, under-expressed) to bright red (positive, over-expressed). Missing values are gray. The rows and columns are displayed in a randomly chosen order.

Expression matrix of genes (rows) for 64 human tumor samples (columns). [source: ESL 1.3]

Each cell of the heat map shows the activity of one gene in one tumor.


E: Experience

Examples of spam emails. [source: Yesware]



T: Class of tasks

Classification:

[Figure: images of handwritten digits are mapped to their digit labels; emails are labeled spam or OK.]


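T: Class of tasks

Regression

[Figure: noisy targets t plotted against inputs x, with x from 0 to 1 and t from −1 to 1.]

target: $t = \sin(2\pi x) + \varepsilon$

noise: $\varepsilon \sim \mathcal{N}(0, \sigma^2)$

input: $x$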
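A dataset of this form could be generated as in the minimal sketch below. It assumes NumPy; the function name `make_sine_data`, the evenly spaced inputs, the number of points, the noise standard deviation of 0.3, and the seed are illustrative assumptions, not values given on the slide.

```python
import numpy as np

def make_sine_data(n_samples, noise_std, seed):
    """Return inputs x in [0, 1] and noisy targets t = sin(2*pi*x) + eps."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_samples)               # inputs (assumed evenly spaced)
    eps = rng.normal(0.0, noise_std, size=n_samples)   # zero-mean Gaussian noise
    t = np.sin(2.0 * np.pi * x) + eps                  # targets
    return x, t

# Illustrative values only (assumptions, not taken from the lecture).
x, t = make_sine_data(n_samples=10, noise_std=0.3, seed=0)
```

Fixing the seed makes the draw reproducible, which is convenient when comparing fits of different order on the same data.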


T: Class of tasks

Regression

[Figure: four panels (M = 0, M = 1, M = 3, M = 9), each plotting t against x, with x from 0 to 1 and t from −1 to 1.]

Polynomials of order M (red) fit to data constructed as t = sin(2πx) + ε (green)

$f(x) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M$
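One common way to fit such a polynomial is least squares; that is an assumption here, since the slide does not state how the red curves were obtained. The sketch below uses NumPy, and `design_matrix`, `fit_polynomial`, and `predict` are hypothetical helper names.

```python
import numpy as np

def design_matrix(x, M):
    """Matrix whose columns are x**0, x**1, ..., x**M."""
    return np.vander(x, N=M + 1, increasing=True)

def fit_polynomial(x, t, M):
    """Least-squares weights w_0, ..., w_M (assumed fitting criterion)."""
    w, *_ = np.linalg.lstsq(design_matrix(x, M), t, rcond=None)
    return w

def predict(x, w):
    """Evaluate f(x) = w_0 + w_1 x + ... + w_M x^M."""
    return design_matrix(x, len(w) - 1) @ w

# Noise-free check: points on the line t = 1 + 2x are recovered by an order-1 fit.
x = np.array([0.0, 0.5, 1.0])
t = 1.0 + 2.0 * x
w = fit_polynomial(x, t, M=1)
```

The order M is the only thing that changes between the four panels; the fitting procedure stays the same.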


T: Class of tasks

Clustering

[Figure: heat map of DNA microarray data.]

Expression matrix of genes (rows) for 64 human tumor samples (columns). [source: ESL 1.3]

Each tumor is a vector, i.e. a point in an N-dimensional space.


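P: Performance measure

Classification

[Figure: three handwritten digit images, labeled 2, 8 and 0.]

$\mathrm{accuracy}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=1}^{n_\text{samples}} \mathbb{1}[\hat{y}_i = y_i]$

where the indicator function is

$\mathbb{1}[\hat{y}_i = y_i] = \begin{cases} 1 & \text{if } y_i = \hat{y}_i \\ 0 & \text{otherwise} \end{cases}$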
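The accuracy is the fraction of samples for which the predicted label equals the true label. A minimal NumPy sketch, with made-up labels for illustration:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # The comparison yields the indicator (True/False) per sample; the mean averages it.
    return float(np.mean(y_true == y_pred))

# Made-up labels: the first three predictions match, the last one does not.
acc = accuracy([2, 8, 0, 8], [2, 8, 0, 3])
```

Here three of the four predictions are correct, so `acc` is 0.75.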


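P: Performance measure

Regression

[Figure: four panels (M = 0, M = 1, M = 3, M = 9), each plotting t against x, with x from 0 to 1 and t from −1 to 1; handwritten annotations mark fits with large MSE and small MSE.]

Polynomials of order M (red) fit to data constructed as t = sin(2πx) + ε (green)

Mean squared error:

$\mathrm{MSE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=1}^{n_\text{samples}} (y_i - \hat{y}_i)^2$

where $\hat{y}_i$ is the prediction of the fitted polynomial at input $x_i$.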
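The mean squared error averages the squared differences between targets and predictions. A minimal NumPy sketch, with made-up numbers for illustration:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up values: the differences are 0, 1 and -2, so the squares are 0, 1 and 4.
err = mse([1.0, 2.0, 3.0], [1.0, 1.0, 5.0])
```

The squares sum to 5 over 3 samples, so `err` equals 5/3. Because the differences are squared, one large miss contributes more than several small ones.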


P: Performance measure

Clustering

[Figure: heat map of DNA microarray data.]

Expression matrix of genes (rows) for 64 human tumor samples (columns). [source: ESL 1.3]

$\text{within-cluster sum of squares} = \sum_{i=1}^{n_\text{samples}} \min_{\mu_j \in C} \lVert \mu_j - x_i \rVert^2$
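The within-cluster sum of squares can be computed directly from its definition: each sample contributes its squared Euclidean distance to the nearest cluster center. The following is a minimal NumPy sketch; the function name and the toy points and centers are made up for illustration.

```python
import numpy as np

def within_cluster_sum_of_squares(X, centers):
    """Sum over samples of the squared distance to the nearest center."""
    X = np.asarray(X, dtype=float)               # shape (n_samples, n_features)
    centers = np.asarray(centers, dtype=float)   # shape (n_clusters, n_features)
    # sq_dists[i, j] = ||mu_j - x_i||^2 for every sample i and center j
    diffs = X[:, None, :] - centers[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # For each sample keep only the nearest center, then sum over samples.
    return float(np.sum(np.min(sq_dists, axis=1)))

# Toy data: two pairs of points, each pair centered on one of the two centers.
X = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centers = [[0.0, 1.0], [10.0, 1.0]]
wcss = within_cluster_sum_of_squares(X, centers)
```

Every toy point lies at distance 1 from its nearest center, so `wcss` is 4.0. A smaller value means the samples sit closer to their cluster centers.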


P: Performance measure

$\mathrm{MSE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=1}^{n_\text{samples}} (y_i - \hat{y}_i)^2$

[Figure: four panels (M = 0, M = 1, M = 3, M = 9), each plotting t against x, with x from 0 to 1 and t from −1 to 1.]

Q: On which datapoints should performance be measured?

Best performance on training set:

Best performance on new datapoints:

Generalisation: performance should be measured on new data (test data).
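To explore the question above, the following sketch compares the MSE on the training points with the MSE on freshly generated points for M = 0, 1, 3, 9. It assumes least-squares fitting, 10 evenly spaced training inputs, 100 test inputs, a noise standard deviation of 0.3, and a fixed seed; none of these values come from the slide, and the helper names are hypothetical.

```python
import numpy as np

def make_sine_data(n_samples, noise_std, rng):
    """Evenly spaced inputs and noisy targets t = sin(2*pi*x) + eps."""
    x = np.linspace(0.0, 1.0, n_samples)
    t = np.sin(2.0 * np.pi * x) + rng.normal(0.0, noise_std, size=n_samples)
    return x, t

def fit_and_score(x_train, t_train, x_test, t_test, M):
    """Fit an order-M polynomial by least squares; return (train MSE, test MSE)."""
    Phi_train = np.vander(x_train, N=M + 1, increasing=True)
    Phi_test = np.vander(x_test, N=M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi_train, t_train, rcond=None)
    train_mse = float(np.mean((Phi_train @ w - t_train) ** 2))
    test_mse = float(np.mean((Phi_test @ w - t_test) ** 2))
    return train_mse, test_mse

rng = np.random.default_rng(0)                      # fixed seed (assumption)
x_train, t_train = make_sine_data(10, 0.3, rng)     # data used for fitting
x_test, t_test = make_sine_data(100, 0.3, rng)      # new data, not used for fitting

results = {M: fit_and_score(x_train, t_train, x_test, t_test, M) for M in (0, 1, 3, 9)}
for M, (train_mse, test_mse) in results.items():
    print(f"M = {M}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

With least-squares fitting, the training MSE cannot increase as M grows, because every polynomial of order M is also a polynomial of order M + 1. The test MSE carries no such guarantee, which is why it is the quantity to look at when judging generalisation.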


What is machine learning?


“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”

- Tom M. Mitchell

Machine Learning, Tom Mitchell, McGraw Hill, 1997
