Chapter II - Overview of Supervised Learning

7/24/2019

Overview of Supervised Learning

By

Amitava Bandyopadhyay and Boby John

SQC & OR Division

Indian Statistical Institute



Contents

Supervised learning as a function approximation

Parametric and non-parametric methods of function approximation

Two extremes: linear models and nearest neighbour

Major classes of approach

Types of supervised learning problems: predictive and explanatory

Bayes' optimal classifier

Assessing model accuracy and quality of fit: training and holdout data, concepts of MSE, training and test errors, bias and variance, flexibility and interpretability, overfitting and its implications

Model selection: basic lesson

Problems of high dimensional data



Supervised Learning as Function Approximation

Supervised learning consists of estimating a target variable on the basis of a set of inputs. In general, therefore, the problem may be mathematically stated as

$Y = \hat{f}(X) + \varepsilon$, where X represents a vector of input variables $(x_1, x_2, \ldots, x_k)$ and Y may or may not be a vector

The term $\varepsilon$ represents random error, to be explained later

In the supervised learning setup we often assume that the target may be expressed as a function of the inputs. However, the true function, say $f(X)$, is generally unknown and $\hat{f}(X)$ is an estimate of the true function.

Note: In supervised learning problems we generally try to estimate the average (mean), median, rate or proportion of the target variable for given input values



Data for Model Fitting

Fitting the function to estimate values of the output (target) variable Y is called model fitting.

The models are fitted using data collected on both the output (Y) and the input (X) variables. In the usual setup, the data are represented as an (N x (p + 1)) matrix, where N is the number of observations and p gives the number of input variables.

In most cases Y is not a vector. However, there are occasions when there is more than one output variable.



Training,



Two Different Types of Models

Models may be fitted to estimate the value of Y or to classify the response into one of several classes. These two types may be referred to as the estimation and classification settings respectively

Estimation setting: In this case we estimate the average or median of Y for a given set of input variables. In this setup the error is usually measured as $\sum (y - \hat{y})^2 / n$, where $\hat{y}$ represents the estimated value of y for given values of x. In certain cases the average absolute deviation is also taken.

Classification setting: In this case we classify the response into one of several classes on the basis of the values of X. In this setup the error is measured as $\sum I(y_i \neq \hat{y}_i) / n$. The function $I(y_i \neq \hat{y}_i)$ is called the indicator function and it takes the value 1 if $y_i \neq \hat{y}_i$ and 0 otherwise. The number of cases for which the error is measured is given by n.
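
The error measures of the two settings can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the toy values are assumptions made for this example and do not come from the course material.

```python
def mean_squared_error(y, y_hat):
    """Estimation setting: sum((y - y_hat)^2) / n."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def mean_absolute_deviation(y, y_hat):
    """Estimation setting, alternative measure: sum(|y - y_hat|) / n."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def misclassification_rate(labels, predicted):
    """Classification setting: sum(I(y_i != yhat_i)) / n."""
    return sum(1 for a, b in zip(labels, predicted) if a != b) / len(labels)


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 8.0, 11.0]
mse = mean_squared_error(y, y_hat)       # (1 + 0 + 1 + 4) / 4
mad = mean_absolute_deviation(y, y_hat)  # (1 + 0 + 1 + 2) / 4

labels = ["a", "b", "a", "b", "a"]
predicted = ["a", "a", "a", "b", "b"]
error_rate = misclassification_rate(labels, predicted)  # 2 mismatches out of 5

print(mse, mad, error_rate)
```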



Note

Apart from the two settings of estimation and classification, we sometimes have a hypothesis testing setup

In this setup, certain statements made about some variables, usually response variables, are verified from data. In order to verify the statements it is often necessary to estimate some values. These activities and the corresponding methodologies have been covered in a separate section of this course



Different Types of



Bayes' Classifier

Bayes' classifier provides an optimality criterion for classification models

Let the response Y be a categorical variable with K different classes (labels)

Consider a classifier that classifies Y to class j such that $P(Y = j \mid X = x) \geq P(Y = k \mid X = x)$ for all $k \neq j$, i.e. the response is allocated to the class with maximum conditional probability. This classifier is called Bayes' classifier and it can be shown that the Bayes' classifier gives the lowest rate of classification error among all classifiers.

Overall Bayes Error Rate $= 1 - E\left(\max_j P(Y = j \mid X = x)\right)$; the expectation averages the probability over all possible values of X
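
As a rough illustration of the rule and of the overall error rate, the sketch below assumes that the joint distribution of X and Y is fully known, which is rarely the case in practice. The probability table, the class labels and the function names are hypothetical values invented for this example.

```python
# Hypothetical joint probabilities P(X = x, Y = j); the entries sum to 1.
joint = {
    "low":    {"A": 0.30, "B": 0.10},
    "medium": {"A": 0.20, "B": 0.10},
    "high":   {"A": 0.05, "B": 0.25},
}


def bayes_classifier(joint):
    """For each x, pick the class j with the largest P(Y = j | X = x)."""
    rule = {}
    for x, row in joint.items():
        p_x = sum(row.values())                       # marginal P(X = x)
        conditional = {j: p / p_x for j, p in row.items()}
        rule[x] = max(conditional, key=conditional.get)
    return rule


def bayes_error_rate(joint):
    """1 - E[max_j P(Y = j | X = x)], averaging over the distribution of X."""
    expected_max = 0.0
    for row in joint.values():
        p_x = sum(row.values())
        expected_max += p_x * max(p / p_x for p in row.values())
    return 1.0 - expected_max


rule = bayes_classifier(joint)
error = bayes_error_rate(joint)
print(rule, error)
```

In this toy table the rule allocates "low" and "medium" to class A and "high" to class B, so the error rate is the total probability of the entries that are not chosen.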



Parametric and Non-Parametric Methods

Parametric models assume a particular form of the function, say linear or polynomial; e.g. the analyst may assume linearity: $f(X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p$. In this case the analyst will only have to estimate a set of parameters to fit the model.
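
To make the idea of "estimating a set of parameters" concrete, here is a minimal sketch of a least squares fit for the simplest linear case with a single input. Least squares is an assumption of this example (the slide does not name an estimation method), and the function name and the data are hypothetical.

```python
def fit_simple_linear(x, y):
    """Least squares estimates of beta0 and beta1 in f(x) = beta0 + beta1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical noise-free data generated from y = 2 + 3x,
# so the fit should recover the two parameters.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0, 5.0, 8.0, 11.0, 14.0]
beta0, beta1 = fit_simple_linear(x, y)
print(beta0, beta1)
```

Whatever the number of observations, the fitted model is fully described by these two numbers.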

Non-parametric methods do not make explicit assumptions about the functional form of f. Instead they seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly. Thus non-parametric methods aim at fitting the data as accurately as possible but do not assume how the inputs may be related to the output (target).
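
A k nearest neighbour average is one common non-parametric estimate. The sketch below is a minimal version for a single input; the function name and the data are hypothetical and serve only to illustrate that no functional form is assumed.

```python
def knn_predict(x_train, y_train, x_new, k):
    """Estimate f(x_new) as the mean response of the k closest training points."""
    order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / k


# Hypothetical training data; the relationship happens to be y = x^2,
# but the method is never told this.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]

# The two points closest to 3.4 are x = 3 and x = 4, so the estimate
# is the average of their responses, 9 and 16.
prediction = knn_predict(x_train, y_train, 3.4, k=2)
print(prediction)
```

Note that the whole training set has to be kept in order to make a prediction, in contrast with a parametric fit.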



Comparison of Parametric and Non-Parametric Models

Advantages of non-parametric approach: As these approaches avoid the assumption of a particular functional form of f, they have the potential to accurately fit a wide range of possible shapes of f. In contrast, a parametric approach assumes a functional form and therefore suffers from a major risk of the assumed functional form being very different from the true shape

Advantages of parametric approach: These approaches reduce the problem to one of estimating a handful of parameters and consequently require a relatively smaller number of observations. In contrast, non-parametric methods depend on the observed values of Y and try to uncover underlying patterns. Consequently these methods require a much larger number of observations.

When a parametric model fits well, we may assume that a



Two



The Continuum of Models

We present the models from the perspective of flexibility / complexity vs. interpretability. The ordering is approximate

The models that appear in the beginning are more interpretable but less flexible

a. Linear models

b. Subset selections, stepwise regression and ridge regression

c. Generalized Linear Models (GLM)

d. Generalized Additive Models (GAM)

e. Tree based models

f. Bagging and Boosting

g. Regression splines and local regression models

h. Support



Concept of Overfitting

When a fitted model shows very small training error but high test error, the model is said to have overfitted the data

Overfitting refers to extracting nuances of the particular data rather than explaining the phenomenon.

These models have low bias for the training data set. However, they have high variance, since fitting with a different data set may lead to a large change in the model parameters

Overfitted models fit the training data very well but do not fit the validation / test data well.
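
The gap between training and test error can be seen in a small simulation. The sketch below is illustrative only: it assumes a made-up true function sin(x) with Gaussian noise, and uses a k nearest neighbour average, where k = 1 is the most flexible choice. All names and settings are hypothetical.

```python
import math
import random


def knn_predict(x_train, y_train, x_new, k):
    """Mean response of the k training points closest to x_new."""
    order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))
    return sum(y_train[i] for i in order[:k]) / k


def mse(xs, ys, x_train, y_train, k):
    """Mean squared error of the k-NN fit on the points (xs, ys)."""
    return sum((y - knn_predict(x_train, y_train, x, k)) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def simulate(n, rng):
    """Hypothetical data: y = sin(x) + noise, with x uniform on [0, 6]."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


rng = random.Random(0)
x_train, y_train = simulate(50, rng)
x_test, y_test = simulate(50, rng)

results = {}
for k in (1, 5, 25):
    train_error = mse(x_train, y_train, x_train, y_train, k)
    test_error = mse(x_test, y_test, x_train, y_train, k)
    results[k] = (train_error, test_error)
    print(f"k={k:2d}  training MSE={train_error:.3f}  test MSE={test_error:.3f}")
```

With k = 1 every training point is its own nearest neighbour, so the training error is exactly zero, while the error on fresh data is not. This is the pattern described above.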



Test and Training Error

[Figure: test error and training error curves. Vertical axis: error rate. Horizontal axis: flexibility / complexity. Marked regions: underfitting area and overfitting area.]



Concept of Flexibility and Complexity

A method is said to be more flexible if it allows a larger range of shapes to be fitted

More flexible models require a larger number of parameters to be estimated. For example, a k nearest neighbour approach with k = 10 and N = 10000 will require roughly N/k = 1000 (effective) parameters to be estimated. However, if there are 10 independent variables, a linear model will require only 11 parameters to be estimated.

Models with a larger number of parameters are said to be more complex. Thus more flexible models are expected to be more complex.
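
The two parameter counts in the example can be reproduced with a small sketch. It assumes the usual rule of thumb that k nearest neighbours has about N/k effective parameters, and that a linear model has one coefficient per input plus an intercept. The function names are hypothetical.

```python
def knn_effective_parameters(n_observations, k):
    """Rule of thumb: roughly N / k neighbourhoods, one fitted mean per neighbourhood."""
    return n_observations / k


def linear_model_parameters(n_inputs):
    """One coefficient per input variable plus the intercept."""
    return n_inputs + 1


knn_params = knn_effective_parameters(10000, 10)
linear_params = linear_model_parameters(10)
print(knn_params, linear_params)
```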



Types of Errors

In a model fitting exercise we come across three types of errors: the irreducible error, bias and variance

Irreducible error: As we may fail to consider all variables, or there may be uncontrollable variation even when all measurable variables have been considered, all fitted models have a certain quantum of error. This error is called the irreducible error and is often denoted by $\varepsilon$.

Bias: The amount by which the average of the estimate differs from the true mean. Lower bias, therefore, indicates lower departure from the true mean on average

Variance: The extent to which the estimated function ($\hat{f}$) varies around its mean.
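
The three types of error are commonly combined in the following decomposition of the expected squared prediction error at a fixed input $x_0$. It is stated here as a standard result under assumptions the slide does not spell out: $Y = f(x_0) + \varepsilon$ with $E(\varepsilon) = 0$, and $\varepsilon$ independent of the training data used to obtain $\hat{f}$.

```latex
\mathbb{E}\big[(Y - \hat{f}(x_0))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x_0) - \mathbb{E}[\hat{f}(x_0)])^2\big]}_{\text{variance}}
  + \underbrace{\operatorname{Var}(\varepsilon)}_{\text{irreducible error}}
```

The expectations are taken over repeated training sets and over $\varepsilon$. Only the first two terms can be reduced by the choice of model.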



Types of Supervised Learning Problems

Supervised learning problems may be divided into three broad classes, namely explanatory, predictive, and combination.

Explanatory Analytics: We are often interested in understanding the way the response Y is impacted by the input variables $X_1, \ldots, X_p$. In this situation we wish to estimate f, but our goal is not necessarily to make predictions for Y. We instead want to understand the relationship between X and Y, or more specifically, to understand how Y changes as a function of $X_1, \ldots, X_p$. Now $\hat{f}$ cannot be treated as a black box, because we need to know its exact form.

Predictive Analytics: In certain cases the analyst may be solely interested in prediction accuracy and may not be interested in increasing substantive understanding. In such cases it is important to use very flexible functions that estimate values of f accurately.

Combination: In certain cases we may be interested in both prediction and explanation of the phenomenon.



Examples of Explanatory Analytics

Which predictors are associated with the response? It is often the case that only a small fraction of the available predictors are substantially associated with Y. Identifying the few important predictors among a large set of possible variables can be extremely useful, depending on the application.

What is the relationship between the response and each predictor? Some predictors may have a positive relationship with Y, in the sense that increasing the predictor is associated with increasing values of Y. Other predictors may have the opposite relationship. Depending on the complexity of f, the relationship between the response and a given predictor may also depend on the values of the other predictors.

Can the relationship between Y and each predictor be adequately summarized using a linear equation, or is the relationship more complicated?



Examples of Predictive Analytics

Estimating stock price

Finding out whether a credit card transaction is fraudulent

Estimating how long a particular economic situation like a recession may last

Note

1. Whether a particular problem is explanatory or predictive depends on the specific condition. An investor may be interested in knowing the possible stock price or may wish to understand the variables that impact the price

2. In many cases the problem at hand may be a combination of predictive as well as explanatory. We may be interested in accurately predicting the drilling time of an oil rig or the chance of failure of an instrument and at the same time we may like to



Model Selection: Basic Lesson

Depending on whether our ultimate goal is prediction, inference, or a combination of the two, different methods for estimating f may be appropriate. For example, linear models allow for relatively simple and interpretable inference, but may not yield as accurate predictions as some other approaches. In contrast, some of the highly non-linear approaches that we discuss in the later chapters in this course can potentially provide quite accurate predictions for Y, but this comes at the expense of a less interpretable model for which inference is more challenging.



Summary

A large part of Business Analytics consists of developing understanding about responses or predicting their values or a combination of both. The techniques used for this purpose are called supervised learning techniques

The supervised learning techniques essentially boil down to estimating a function of the explanatory variables that approximates the value of the responses given some values of the explanatory variables

The supervised learning problems consist of problems of estimation and problems of classification. Sometimes we have the problems of hypothesis testing as well

Choosing and fitting the function is known as model fitting. Models with a larger number of parameters are more flexible / complex. However, such models are usually more difficult to interpret

Two broad approaches, parametric and non-parametric, are used to estimate the function. The parametric



Summary (Continued…)

Model fitting is carried out from two perspectives: explanatory and predictive. While flexible models are preferred for prediction, interpretability is most important for explanatory models

The data used to fit models is called training data. Generally the collected data need to be divided into three classes: training, test and validation
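
A three-way division of the collected data could be sketched as follows. The 60/20/20 proportions, the random shuffling and the function name are assumptions made for illustration; the slide does not prescribe any particular split.

```python
import random


def split_data(rows, train_frac=0.6, validation_frac=0.2, seed=0):
    """Shuffle the rows and cut them into training, validation and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # fixed seed so the split is reproducible
    n_train = round(len(rows) * train_frac)
    n_validation = round(len(rows) * validation_frac)
    train = rows[:n_train]
    validation = rows[n_train:n_train + n_validation]
    test = rows[n_train + n_validation:]  # whatever remains
    return train, validation, test


# Hypothetical data set of 100 observation indices.
train, validation, test = split_data(range(100))
print(len(train), len(validation), len(test))
```

Each observation ends up in exactly one of the three parts, so the test data play no role in fitting.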



Review Questions

What is explanatory analytics?

What is predictive analytics?

What is a non-parametric model? Give an example.

What are bias and variance?

What is meant by overfitting? Why is it risky?

Parametric models are generally more flexible but less interpretable. Do you agree?

Explain the concept of kNN (k nearest neighbours) briefly.

In a linear model you try to express $E(Y \mid X_1, X_2, \ldots, X_k)$ as a linear function of $X_1, X_2, \ldots, X_k$. Do you agree? Explain briefly.
