Chapter II - Overview of Supervised Learning


  • 7/24/2019 Chapter II - Overview of Supervised Learning


    Overview of Supervised Learning

    By

    Amitava Bandyopadhyay and Boby John

    SQC & OR Division

    Indian Statistical Institute


    Contents

    Supervised learning as function approximation

    Parametric and non-parametric methods of function approximation

    Two extremes: linear models and nearest neighbour

    Major classes of approach

    Types of supervised learning problems: predictive and explanatory

    Bayes' optimal classifier

    Assessing model accuracy and quality of fit: training and holdout data; concepts of MSE; training and test errors; bias and variance; flexibility and interpretability; overfitting and its implications

    Model selection: basic lesson

    Problems of high-dimensional data


    Supervised Learning as Function Approximation

    Supervised learning consists of estimating a target variable on the basis of a set of inputs. In general, therefore, the problem may be mathematically stated as

    $Y = f(X) + \varepsilon$, where $X = (X_1, X_2, \ldots, X_p)$ represents a vector of input variables and Y may or may not be a vector.

    The term $\varepsilon$ represents random error, to be explained later.

    In the supervised learning setup we often assume that the target may be expressed as a function of the inputs. However, the true function, say $f(X)$, is generally unknown, and $\hat{f}(X)$ is an estimate of the true function.

    Note: In supervised learning problems we generally try to estimate the average (mean), median, rate or proportion of the target variable for given input values.
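    A minimal sketch of the note above, assuming a single discrete input X and made-up data: for each input value x, the estimate of the target is simply a summary (mean or median) of the Y values observed at that x, and for a 0/1 target the mean is the proportion. The function name `conditional_summaries` is hypothetical, chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def conditional_summaries(xs, ys):
    """Summarise Y separately for each observed value of a single discrete input X."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    # For each input value x, estimate the target by summarising the Y values
    # observed at that x: here the mean and the median.
    return {x: {"mean": mean(v), "median": median(v)} for x, v in groups.items()}


# Hypothetical data: X is a machine identifier, Y a measured output.
xs = ["A", "A", "A", "B", "B", "B", "B"]
ys = [10.0, 12.0, 14.0, 20.0, 22.0, 21.0, 100.0]
summary = conditional_summaries(xs, ys)
print(summary["A"])  # {'mean': 12.0, 'median': 12.0}
print(summary["B"])  # {'mean': 40.75, 'median': 21.5}

# With a 0/1 target (e.g. 1 = defective), the mean is the proportion for each x.
defective = [0, 1, 0, 0, 1, 1, 1]
rates = conditional_summaries(xs, defective)
print(rates["B"]["mean"])  # 0.75
```

    Note how the single outlier (100.0) pulls the mean for machine B far from the median; this is one reason the median is sometimes the quantity estimated instead.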


    Data for Model Fitting

    Fitting the function to estimate values of the output (target) variable Y is called model fitting.

    The models are fitted using data collected on both the output (Y) and the input (X) variables. In the usual setup, the data are represented as an $N \times (p + 1)$ matrix, where N is the number of observations and p gives the number of input variables.

    In most cases Y is not a vector. However, there are occasions when there is more than one output variable.
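    The sketch below, assuming NumPy and a hypothetical layout in which the first column of the $N \times (p + 1)$ matrix holds Y, shows how such a data matrix splits into the response and the inputs. The least-squares linear fit at the end is just one possible $\hat{f}$, used for illustration; the data were made up to satisfy $Y = 1 + 2X_1 + X_2$ exactly.

```python
import numpy as np

# Hypothetical N x (p + 1) data matrix: N = 5 observations, p = 2 inputs.
# Assumed layout: column 0 holds Y, columns 1..p hold X1, ..., Xp.
data = np.array([
    [1.0, 0.0, 0.0],
    [3.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
    [4.0, 1.0, 1.0],
    [6.0, 2.0, 1.0],
])

N, n_cols = data.shape
p = n_cols - 1
y = data[:, 0]     # output (target) variable, length N
X = data[:, 1:]    # input variables, shape (N, p)

# One possible fhat: a linear function fitted by least squares.
design = np.column_stack([np.ones(N), X])  # prepend an intercept column
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(N, p)               # 5 2
print(np.round(beta, 6))  # intercept and slopes, close to [1, 2, 1]
```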


    Training


    Two Different Types of Models

    Models may be fitted to estimate the value of Y or to classify the response into one of several classes. These two types may be referred to as the estimation and classification settings, respectively.

    Estimation setting: In this case we estimate the average or median of Y for a given set of input variables. In this setup the error is usually measured as $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / n$, where $\hat{y}_i$ represents the estimated value of $y_i$ for the given values of X. In certain cases the average absolute deviation, $\sum_{i=1}^{n} |y_i - \hat{y}_i| / n$, is used instead.

    Classification setting: In this case we classify the response into one of several classes on the basis of the values of X. In this setup the error is measured as $\sum_{i=1}^{n} I(y_i \neq \hat{y}_i) / n$. The function $I(y_i \neq \hat{y}_i)$ is called the indicator function; it takes the value 1 if $y_i \neq \hat{y}_i$ and 0 otherwise. The number of cases for which the error is measured is given by n.
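    The two error measures of the estimation setting and the misclassification rate of the classification setting translate directly into code. A minimal Python sketch with made-up values:

```python
def mean_squared_error(y, yhat):
    """Estimation setting: sum of (y_i - yhat_i)^2 divided by n."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def mean_absolute_deviation(y, yhat):
    """Estimation setting alternative: sum of |y_i - yhat_i| divided by n."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def misclassification_rate(y, yhat):
    """Classification setting: sum of I(y_i != yhat_i) divided by n."""
    # The indicator contributes 1 for each wrong label and 0 otherwise.
    return sum(1 for a, b in zip(y, yhat) if a != b) / len(y)


# Hypothetical observed and estimated values.
y = [3.0, 5.0, 2.0, 7.0]
yhat = [2.5, 5.0, 3.0, 6.0]
print(mean_squared_error(y, yhat))       # 0.5625
print(mean_absolute_deviation(y, yhat))  # 0.625

# Hypothetical class labels and predictions.
labels = ["spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "spam", "spam", "spam"]
print(misclassification_rate(labels, predicted))  # 0.4
```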


    Note

    Apart from the two settings of estimation and classification, we sometimes have a hypothesis testing setup.

    In this setup, certain statements made about some variables, usually response variables, are verified from data. In order to verify the statements it is often necessary to estimate some values. These activities and the corresponding methodologies are covered in a separate section of this course.


    Different Types of


    Bayes' Classifier

    Bayes' classifier provides an optimality criterion for classification models.

    Let the response Y be a categorical variable with K different classes (labels).

    Consider a classifier that assigns Y to class j such that $P(Y = j \mid X = x) > P(Y = k \mid X = x)$ for all $k \neq j$, i.e., the response is allocated to the class with the maximum conditional probability. This classifier is called Bayes' classifier, and it can be shown that Bayes' classifier gives the lowest rate of classification error among all classifiers.

    Overall Bayes error rate $= 1 - E\left(\max_j P(Y = j \mid X)\right)$, where the expectation averages the probability over all possible values of X.
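    Since the true conditional probabilities are rarely known, the Bayes classifier serves as a benchmark rather than a practical method. The sketch below assumes a small, made-up discrete distribution so that the expectation can be computed exactly; classes are indexed from 0, and `always_class_0` is a hypothetical competitor used only for comparison.

```python
# Hypothetical joint distribution: X takes three values and Y has K = 3
# classes (indexed 0, 1, 2 here). In practice these probabilities are unknown.
p_x = {0: 0.5, 1: 0.3, 2: 0.2}
p_y_given_x = {
    0: [0.7, 0.2, 0.1],
    1: [0.3, 0.4, 0.3],
    2: [0.1, 0.1, 0.8],
}


def bayes_classifier(x):
    """Allocate x to the class j with the largest P(Y = j | X = x)."""
    probs = p_y_given_x[x]
    return max(range(len(probs)), key=lambda j: probs[j])


def error_rate(classifier):
    """Expected misclassification rate: 1 - E[P(Y = classifier(X) | X)]."""
    return 1.0 - sum(p_x[x] * p_y_given_x[x][classifier(x)] for x in p_x)


def always_class_0(x):
    """A naive rule that ignores X."""
    return 0


print([bayes_classifier(x) for x in p_x])      # [0, 1, 2]
print(round(error_rate(bayes_classifier), 4))  # 0.37, the Bayes error rate
print(round(error_rate(always_class_0), 4))    # 0.54
```

    With these numbers the Bayes error rate is $1 - (0.5 \times 0.7 + 0.3 \times 0.4 + 0.2 \times 0.8) = 0.37$, and no other rule that maps each x to a class can do better.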


    Parametric and Non-Parametric Methods

    Parametric models assume a particular form of the function, say linear or polynomial. For example, the analyst may assume linearity: $f(X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$. In this case the analyst will only have to estimate a set of parameters ($\beta_0, \ldots, \beta_p$) to fit the model.

    Non-parametric methods do not make explicit assumptions about the functional form of f. Instead they seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly. Thus non-parametric methods aim at fitting the data as accurately as possible but do not assume how the inputs may be related to the output (target).
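    To make the contrast concrete, here is a hedged sketch (NumPy assumed; data simulated from a sine curve chosen only for illustration; k = 10 is an arbitrary choice) of one parametric method, a linear model fitted by least squares, and one non-parametric method, k-nearest-neighbour averaging. The linear fit estimates just p + 1 coefficients; the k-NN fit keeps all the training data and averages nearby responses.

```python
import numpy as np


def fit_linear(X, y):
    """Parametric: assume f(X) = b0 + b1*X1 + ... + bp*Xp and estimate p + 1 numbers."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta


def fit_knn(X, y, k):
    """Non-parametric: fhat(x) is the average Y of the k training points nearest x."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def predict(X_new):
        X_new = np.asarray(X_new, dtype=float)
        # Squared Euclidean distance from every new point to every training point.
        d2 = ((X_new[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d2, axis=1)[:, :k]
        return y[nearest].mean(axis=1)

    return predict


# Hypothetical data from a non-linear true f (a sine curve) plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

X_new = np.array([[np.pi / 2], [3 * np.pi / 2]])  # true f values: 1 and -1
linear_pred = fit_linear(X, y)(X_new)
knn_pred = fit_knn(X, y, k=10)(X_new)
print(linear_pred)  # a straight line cannot follow the sine shape
print(knn_pred)     # much closer to [1, -1]
```

    On this non-linear f the straight line misses badly at the peak and trough, while k-NN tracks the curve; if the true f were linear, the parametric fit would typically do as well with far fewer observations.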


    Comparison of Parametric and Non-Parametric Models

    Advantages of non-parametric approach: As these approaches avoid the assumption of a particular functional form of f, they have the potential to accurately fit a wide range of possible shapes of f. In contrast, a parametric approach assumes a functional form and therefore carries a major risk that the assumed functional form is very different from the true shape.

    Advantages of parametric approach: These approaches reduce the problem to one of estimating a handful of parameters and consequently require a relatively small number of observations. In contrast, non-parametric methods depend on the observed values of Y to uncover underlying patterns. Consequently these methods require a much larger number of observations.

    When a parametric model fits well, we may assume that a


    Two


    The Continuum of Models

    We present the models from the perspective of flexibility/complexity vs. interpretability. The ordering is approximate.

    The models that appear at the beginning are more interpretable but less flexible.

    a. Linear models

    b. Subset selection, stepwise regression and ridge regression

    c. Generalized Linear Models (GLM)

    d. Generalized Additive Models (GAM)

    e. Tree-based models

    f. Bagging and Boosting

    g. Regression splines and local regression models

    h. Support


    Concept of Overfitting

    When a fitted model shows very small training error but high test error, the model is said to have overfitted the data.

    Overfitting refers to extracting nuances of the particular data rather than explaining the phenomenon.

    These models have low bias on the training data set. However, they have high variance, since fitting with a different data set may lead to a large change in the model parameters.

    Overfitted models fit the training data very well but do not fit the validation or test data well.
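    A small simulation can illustrate the pattern. The sketch below (NumPy assumed; the true function, noise level and polynomial degrees are arbitrary choices for illustration) fits polynomials of increasing degree to 20 training points and compares training error with error on fresh test data.

```python
import numpy as np

rng = np.random.default_rng(1)


def make_data(n):
    """Hypothetical data: true f(x) = sin(pi x) plus noise with sd 0.3."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(np.pi * x) + rng.normal(0, 0.3, n)
    return x, y


x_train, y_train = make_data(20)   # small training set
x_test, y_test = make_data(1000)   # fresh data the model has not seen

results = {}
for degree in (1, 3, 10):
    coefs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: training MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

    Training MSE can only go down as the degree increases, because each polynomial family contains the smaller ones; test MSE typically falls at first and then rises again as the high-degree fit starts chasing noise.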


    Test and Training Error

    [Figure: error rate plotted against flexibility/complexity, with curves for test error and training error; an underfitting area and an overfitting area are marked.]


    Concept of Flexibility and Complexity

    A method is said to be more flexible if it allows a larger range of shapes to be fitted.

    More flexible models require a larger number of parameters to be estimated. For example, a k-nearest neighbour approach with k = 10 and N = 10000 will require N/k = 1000 parameters to be estimated. However, if there are 10 independent variables, a linear model will require only 11 parameters (an intercept plus 10 coefficients) to be estimated.

    Models with a larger number of parameters are said to be more complex. Thus more flexible models are expected to be more complex.
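    The parameter counts in the example above follow from simple arithmetic. The N/k figure for k-nearest neighbours is a common heuristic (roughly N/k neighbourhoods, each summarised by one mean), not an exact count; the function names are hypothetical.

```python
def knn_effective_parameters(n_obs, k):
    """Rough count for k-NN: about N / k neighbourhoods, each with its own mean."""
    return n_obs / k


def linear_model_parameters(p):
    """Linear model: one intercept plus one coefficient per input variable."""
    return p + 1


print(knn_effective_parameters(10_000, 10))  # 1000.0
print(linear_model_parameters(10))           # 11
```

    Note that making k smaller makes k-NN more flexible and, by this count, more complex: with k = 1 every one of the N observations acts as its own parameter.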


    Types of Errors

    In a model fitting exercise we come across three types of errors: the irreducible error, bias and variance.

    Irreducible error: As we may fail to consider all variables, or there may be uncontrollable variation even when all measurable variables have been considered, all fitted models have a certain quantum of error. This error is called the irreducible error and is often denoted by $\varepsilon$.

    Bias: The amount by which the average of the estimate differs from the true mean. Lower bias, therefore, indicates a smaller departure from the true mean on average.

    Variance: The extent to which the estimated function ($\hat{f}$) varies around its mean.
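    For squared-error loss these components combine in the well-known decomposition $E[(Y - \hat{f}(x_0))^2] = \mathrm{Var}(\hat{f}(x_0)) + [\mathrm{Bias}(\hat{f}(x_0))]^2 + \mathrm{Var}(\varepsilon)$ at a fixed input $x_0$. The Monte Carlo sketch below (NumPy assumed; the true f, noise level and sample sizes are illustrative choices) refits a straight line to many simulated training sets and measures the bias and variance of $\hat{f}(x_0)$ directly.

```python
import numpy as np

rng = np.random.default_rng(2)


def true_f(x):
    """Hypothetical true function f."""
    return np.sin(np.pi * x)


x0 = 0.5       # input value at which the estimate is studied
sigma = 0.3    # standard deviation of the irreducible error epsilon
n, reps = 30, 2000

estimates = np.empty(reps)
for r in range(reps):
    # Draw a fresh training set, fit a straight line, record fhat(x0).
    x = rng.uniform(-1, 1, n)
    y = true_f(x) + rng.normal(0, sigma, n)
    coefs = np.polyfit(x, y, 1)
    estimates[r] = np.polyval(coefs, x0)

bias = estimates.mean() - true_f(x0)  # average estimate minus true mean
variance = estimates.var()            # spread of fhat(x0) around its own mean
mse_of_fhat = np.mean((estimates - true_f(x0)) ** 2)
print(f"bias {bias:.3f}, variance {variance:.4f}, irreducible {sigma ** 2:.2f}")
print(f"bias^2 + variance = {bias ** 2 + variance:.4f}, mean squared error of fhat = {mse_of_fhat:.4f}")
```

    The straight line is too rigid for a sine curve, so its bias at $x_0 = 0.5$ is large while its variance is small; a more flexible fit would usually trade some of that bias for extra variance.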


    Types of Supervised Learning Problems

    Supervised learning problems may be divided into three broad classes, namely explanatory, predictive, and combination.

    We are often interested in understanding the way the response Y is impacted by the input variables $X_1, \ldots, X_p$. In this situation we wish to estimate f, but our goal is not necessarily to make predictions for Y. We instead want to understand the relationship between X and Y or, more specifically, how Y changes as a function of $X_1, \ldots, X_p$. Now $\hat{f}$ cannot be treated as a black box, because we need to know its exact form. These setups are often called Explanatory Analytics.

    Predictive Analytics: In certain cases the analyst may be solely interested in prediction accuracy and not in increasing substantive understanding. In such cases it is important to use very flexible functions that estimate the values of f accurately.

    Combination: In certain cases we may be interested in both prediction and explanation of the phenomenon.


    Examples of Explanatory Analytics

    Which predictors are associated with the response? It is often the case that only a small fraction of the available predictors are substantially associated with Y. Identifying the few important predictors among a large set of possible variables can be extremely useful, depending on the application.

    What is the relationship between the response and each predictor? Some predictors may have a positive relationship with Y, in the sense that increasing the predictor is associated with increasing values of Y. Other predictors may have the opposite relationship. Depending on the complexity of f, the relationship between the response and a given predictor may also depend on the values of the other predictors.

    Can the relationship between Y and each predictor be adequately summarized using a linear equation, or is the relationship more complicated?
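    A linear model is one simple way to start answering these three questions. The sketch below uses simulated data (NumPy assumed) in which, by construction, X1 raises Y, X2 lowers Y and X3 has no effect; the fitted coefficients recover these directions. It assumes a linear f is adequate, which is exactly what the third question asks you to check.

```python
import numpy as np

# Hypothetical data in which X1 raises Y, X2 lowers Y and X3 has no effect.
rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 3))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 1.0, n)

design = np.column_stack([np.ones(n), X])  # intercept column plus X1, X2, X3
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

# Each slope estimate says how Y changes, on average, with a one-unit
# increase in that predictor when the other predictors are held fixed.
for name, b in zip(["intercept", "X1", "X2", "X3"], beta):
    print(f"{name:9s} {b:+.2f}")
```

    A coefficient near zero (here for X3) suggests a predictor that is not associated with Y; in practice one would also look at standard errors before drawing that conclusion.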


    Examples of Predictive Analytics

    Estimating stock price

    Finding out whether a credit card transaction is fraudulent

    Estimating how long a particular economic situation, like a recession, may last

    Note:

    1. Whether a particular problem is explanatory or predictive depends on the specific context. An investor may be interested in knowing the possible stock price or may wish to understand the variables that impact the price.

    2. In many cases the problem at hand may be a combination of predictive and explanatory. We may be interested in accurately predicting the drilling time of an oil rig or the chance of failure of an instrument, and at the same time we may like to


    Model Selection: Basic Lesson

    Depending on whether our ultimate goal is prediction, inference, or a combination of the two, different methods for estimating f may be appropriate. For example, linear models allow for relatively simple and interpretable inference, but may not yield predictions as accurate as some other approaches. In contrast, some of the highly non-linear approaches that we discuss in later chapters of this course can potentially provide quite accurate predictions for Y, but this comes at the expense of a less interpretable model for which inference is more challenging.


    Summary

    A large part of Business Analytics consists of developing understanding about responses, predicting their values, or a combination of both. The techniques used for this purpose are called supervised learning techniques.

    Supervised learning techniques essentially boil down to estimating a function of the explanatory variables that approximates the value of the response given some values of the explanatory variables.

    Supervised learning problems consist of problems of estimation and problems of classification. Sometimes we have problems of hypothesis testing as well.

    Choosing and fitting the function is known as model fitting. Models with a larger number of parameters are more flexible (complex). However, such models are usually more difficult to interpret.

    Two broad approaches, parametric and non-parametric, are used to estimate the function. The parametric


    Summary (Continued)

    Model fitting is carried out from two perspectives: explanatory and predictive. While flexible models are preferred for prediction, interpretability is most important for explanatory models.

    The data used to fit models is called training data. Generally the collected data need to be divided into three parts: training, test and validation.
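    One common way to carry out the three-way division, sketched below with NumPy: shuffle the rows, fit candidate models on the training part, pick the one with the smallest validation error, and report its error on the test part. The 60/20/20 proportions, the polynomial candidates, the simulated data and the helper name `three_way_split` are assumptions for illustration.

```python
import numpy as np


def three_way_split(n_obs, train=0.6, validation=0.2, seed=0):
    """Shuffle row indices and cut them into training, validation and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_obs)
    n_train = int(train * n_obs)
    n_val = int(validation * n_obs)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Hypothetical data: true f(x) = sin(pi x) plus noise.
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 300)
y = np.sin(np.pi * x) + rng.normal(0, 0.3, 300)
train_idx, val_idx, test_idx = three_way_split(len(x))


def mse(coefs, idx):
    """Mean squared error of a fitted polynomial on the rows in idx."""
    return np.mean((np.polyval(coefs, x[idx]) - y[idx]) ** 2)


# Fit candidates on the training set and choose using the validation set ...
fits = {d: np.polyfit(x[train_idx], y[train_idx], d) for d in range(1, 11)}
best_degree = min(fits, key=lambda d: mse(fits[d], val_idx))
# ... then report the chosen model's error on the untouched test set.
test_error = mse(fits[best_degree], test_idx)
print(best_degree, round(test_error, 3))
```

    Keeping the test set out of both fitting and model choice is what makes its error an honest estimate of performance on new data.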


    Review Questions

    What is explanatory analytics?

    What is predictive analytics?

    What is a non-parametric model? Give an example.

    What are bias and variance?

    What is meant by overfitting? Why is it risky?

    Parametric models are generally more flexible but less interpretable. Do you agree?

    Explain the concept of NN (nearest neighbour) briefly.

    In a linear model you try to express $E(Y \mid X_1, X_2, \ldots, X_k)$ as a linear function of $X_1, X_2, \ldots, X_k$. Do you agree? Explain briefly.