CSC 3301-Lecture06 Introduction to Machine Learning


  • 8/16/2019 CSC 3301-Lecture06 Introduction to Machine Learning


    INTRODUCTION TO

    MACHINE LEARNING


    A Few Quotes

    • “A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates, Chairman, Microsoft)
    • “Machine learning is the next Internet” (Tony Tether, Director, DARPA)
    • “Machine learning is the hot new thing” (John Hennessy, President, Stanford)
    • “Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, Dir. Research, Yahoo)
    • “Machine learning is going to result in a real revolution” (Greg Papadopoulos, CTO, Sun)
    • “Machine learning is today’s discontinuity” (Jerry Yang, CEO, Yahoo)


    Definitions

    • Machine learning investigates the mechanisms by which knowledge is acquired through experience
    • Machine Learning is the field that concentrates on induction algorithms and on other algorithms that can be said to “learn.”
    • Learning through the data sets


    Model

    • A model of learning is fundamental in any machine learning application:
      • who is learning (a computer program)
      • what is learned (a domain)
      • from what the learner is learning (the information source)


    Traditional Programming

    • Data + Program → Computer → Output

    Machine Learning

    • Data + Output → Computer → Program


    Magic? 

    No, more like gardening

    • Seeds = Algorithms
    • Nutrients = Data
    • Gardener = You
    • Plants = Programs


    Sample Applications

    • Web search
    • Computational biology
    • Finance
    • E-commerce
    • Space exploration
    • Robotics
    • Information extraction
    • Social networks
    • Debugging


    ML in a Nutshell

    • Tens of thousands of machine learning algorithms
    • Hundreds new every year
    • Every machine learning algorithm has three components:
      • Representation
      • Evaluation
      • Optimization


    Representation

    • Decision trees
    • Sets of rules / Logic programs
    • Instances
    • Graphical models (Bayes/Markov nets)
    • Neural networks
    • Support vector machines
    • Model ensembles
    • Etc.


    Evaluation

    • Accuracy
    • Precision and recall
    • Squared error
    • Likelihood
    • Posterior probability
    • Cost / Utility
    • Margin
    • Entropy
    • K-L divergence
    • Etc.
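Several of the evaluation measures listed above can be computed directly from predicted and true labels. The following is a minimal sketch for a binary task; the function names and the toy label lists are illustrative assumptions, not part of the lecture.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    # Fraction of all predictions that are correct.
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    # Fraction of predicted positives that are truly positive.
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    # Fraction of true positives that were found.
    return tp / (tp + fn) if tp + fn else 0.0


def mean_squared_error(y_true, y_pred):
    # Average squared difference between targets and predictions.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels, chosen only for illustration.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(accuracy(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))
```

Which measure is appropriate depends on the task: accuracy can look good on skewed classes even when recall on the rare class is poor.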


    Optimization

    • Combinatorial optimization
      • E.g.: Greedy search
    • Convex optimization
      • E.g.: Gradient descent
    • Constrained optimization
      • E.g.: Linear programming
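As one concrete instance of the convex optimization entry above, here is a minimal gradient descent sketch. The objective f(x) = (x - 3)^2, the learning rate, and the step count are assumptions chosen for illustration only.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


def grad_f(x):
    # Gradient of the convex function f(x) = (x - 3)^2.
    return 2.0 * (x - 3.0)


x_min = gradient_descent(grad_f, x0=0.0)
print(x_min)  # close to the minimizer x = 3
```

For a convex objective and a small enough learning rate, each step shrinks the distance to the minimizer; too large a learning rate would make the iterates diverge.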


    Data Preparation


    Data Preprocessing

    • Why preprocess the data?
    • Data cleaning
    • Data integration and transformation
    • Data reduction
    • Discretization and concept hierarchy generation
    • Summary


    Why Data Preprocessing?

    • Data in the real world is dirty
      • incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
      • noisy: containing errors or outliers
      • inconsistent: containing discrepancies in codes or names
    • No quality data, no quality mining results!
      • Quality decisions must be based on quality data
      • Data warehouse needs consistent integration of quality data


    Major Tasks in Data Preprocessing

    • Data cleaning
      • Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
    • Data integration
      • Integration of multiple databases, data cubes, or files
    • Data transformation
      • Normalization and aggregation
    • Data reduction
      • Obtains reduced representation in volume but produces the same or similar analytical results
    • Data discretization
      • Part of data reduction but with particular importance, especially for numerical data


    Forms of data preprocessing



    Data Cleaning

    • Data cleaning tasks
      • Fill in missing values
      • Identify outliers and smooth out noisy data
      • Correct inconsistent data


    Missing Data

    • Data is not always available
      • E.g., many tuples have no recorded value for several attributes, such as customer income in sales data
    • Missing data may be due to
      • equipment malfunction
      • data that was inconsistent with other recorded data and thus deleted
      • data not entered due to misunderstanding
      • certain data not being considered important at the time of entry
      • history or changes of the data not being registered
    • Missing data may need to be inferred.


    How to Handle Missing Data?

    • Ignore the tuple: usually done when the class label is missing (assuming the task is classification); not effective when the percentage of missing values per attribute varies considerably
    • Fill in the missing value manually: tedious + infeasible
    • Use a global constant to fill in the missing value: e.g., “unknown”, a new class?!
    • Use the attribute mean to fill in the missing value
    • Use the most probable value to fill in the missing value: inference-based, such as Bayesian formula or decision tree
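Three of the strategies above (ignoring the tuple, a global constant, and the attribute mean) can be sketched in a few lines. The sketch assumes missing values are represented as `None`; the helper names and the income figures are hypothetical.

```python
def drop_unlabeled(rows, label_key="label"):
    """Ignore tuples whose class label is missing."""
    return [row for row in rows if row.get(label_key) is not None]


def fill_with_constant(values, constant="unknown"):
    """Replace each missing value with one global constant."""
    return [constant if v is None else v for v in values]


def fill_with_mean(values):
    """Replace each missing value with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


# Hypothetical customer incomes (in thousands) with two gaps.
incomes = [30.0, None, 50.0, None, 40.0]
print(fill_with_mean(incomes))      # gaps become 40.0
print(fill_with_constant(incomes))  # gaps become "unknown"
```

Mean imputation keeps the attribute mean unchanged but shrinks its variance, which is one reason inference-based filling is sometimes preferred.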


    Noisy Data

    • Noise: random error or variance in a measured variable
    • Incorrect attribute values may be due to
      • faulty data collection instruments
      • data entry problems
      • data transmission problems
      • technology limitation
      • inconsistency in naming convention
    • Other data problems which require data cleaning
      • duplicate records
      • incomplete data
      • inconsistent data


    How to Handle Noisy Data?

    • Binning method:
      • first sort data and partition into (equi-depth) bins
      • then smooth by bin means, smooth by bin median, smooth by bin boundaries, etc.
    • Clustering
      • detect and remove outliers
    • Combined computer and human inspection
      • detect suspicious values and check by human
    • Regression
      • smooth by fitting the data into regression functions
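The binning method above can be sketched as follows: sort, cut into equi-depth bins, then replace each value by its bin mean or by the nearer bin boundary. The price list and the bin depth of 4 are assumptions picked for illustration, and the function names are hypothetical.

```python
def equi_depth_bins(values, depth):
    """Sort the values and cut them into bins holding `depth` values each."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value by the closer of the bin's min and max."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


# Hypothetical sorted prices.
prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = equi_depth_bins(prices, depth=4)
print(smooth_by_means(bins))
print(smooth_by_boundaries(bins))
```

Equi-depth bins hold the same number of values, so dense regions get narrow bins; equal-width bins would instead be sensitive to outliers.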



    Data Integration

    • Data integration:
      • combines data from multiple sources into a coherent store
    • Schema integration
      • integrate metadata from different sources
      • Entity identification problem: identify real world entities from multiple data sources, e.g., A.cust-id ≡ B.cust-
    • Detecting and resolving data value conflicts
      • for the same real world entity, attribute values from different sources are different
      • possible reasons: different representations, different scales, e.g., metric vs. British units


    Handling Redundant Data

    • Redundant data occur often when integrating multiple databases
      • The same attribute may have different names in different databases
    • Careful integration of the data from multiple sources may help reduce/avoid redundancies and inconsistencies and improve mining speed and quality


    Data Transformation

    • Smoothing: remove noise from data
    • Aggregation: summarization, data cube construction
    • Generalization: concept hierarchy climbing
    • Normalization: scaled to fall within a small, specified range
      • min-max normalization
      • z-score normalization
      • normalization by decimal scaling


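    Data Transformation: Normalization

    • min-max normalization: $v' = \frac{v - \min_A}{\max_A - \min_A}\,(\mathrm{new\_max}_A - \mathrm{new\_min}_A) + \mathrm{new\_min}_A$
    • z-score normalization: $v' = \frac{v - \mathrm{mean}_A}{\mathrm{stand\_dev}_A}$
    • normalization by decimal scaling: $v' = \frac{v}{10^j}$, where $j$ is the smallest integer such that $\max(|v'|) < 1$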
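The three normalization schemes named above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the population standard deviation is used for the z-score, the inputs are not all identical (otherwise the divisions would fail), and the sample values are made up.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map [min, max] of the data onto [new_min, new_max]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) * (new_max - new_min) + new_min
            for v in values]


def z_score(values):
    """Centre on the mean and scale by the (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10^j, with j the smallest integer giving max(|v'|) < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


print(min_max([10, 20, 30]))
print(z_score([10, 20, 30]))
print(decimal_scaling([-986, 217, 917]))
```

Min-max preserves the relative spacing of values but is sensitive to outliers; z-score is the usual choice when the minimum and maximum are unknown or unstable.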



    Data Reduction Strategies

    • Warehouse may store terabytes of data: complex data analysis/mining may take a very long time to run on the complete data set
    • Data reduction
      • Obtains a reduced representation of the data set that is much smaller in volume but yet produces the same (or almost the same) analytical results
    • Data reduction strategies
      • Data cube aggregation
      • Dimensionality reduction
      • Numerosity reduction
      • Discretization and concept hierarchy generation


    Data Cube Aggregation

    • The lowest level of a data cube
      • the aggregated data for an individual entity of interest
      • e.g., a customer in a phone calling data warehouse.
    • Multiple levels of aggregation in data cubes
      • Further reduce the size of data to deal with
    • Reference appropriate levels
      • Use the smallest representation which is enough to solve the task


    Dimensionality Reduction

    • Feature selection (i.e., attribute subset selection):
      • Select a minimum set of features such that the probability distribution of different classes given the values for those features is as close as possible to the original distribution given the values of all features
      • reduces the number of attributes appearing in the discovered patterns, making them easier to understand


    Sampling

    • Allow a mining algorithm to run in complexity that is potentially sub-linear to the size of the data
    • Choose a representative subset of the data
      • Simple random sampling may have very poor performance in the presence of skew
    • Develop adaptive sampling methods
      • Stratified sampling:
        • Approximate the percentage of each class (or subpopulation of interest) in the overall database
        • Used in conjunction with skewed data
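Stratified sampling as described above can be sketched by grouping records by class and sampling the same fraction from each group. The record layout, the `label_of` callback, and the 90/10 class split are assumptions made for this illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, label_of, fraction, seed=0):
    """Sample roughly `fraction` of each class so class shares are preserved."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[label_of(record)].append(record)
    sample = []
    for members in strata.values():
        # Keep at least one record per class so rare classes are not lost.
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical skewed data set: 90 benign and 10 malignant records.
records = [(i, "benign") for i in range(90)] + \
          [(i, "malignant") for i in range(90, 100)]
sample = stratified_sample(records, label_of=lambda r: r[1], fraction=0.2)
print(len(sample))
```

A simple random sample of 20 records could easily contain no malignant cases at all; the stratified version keeps the 9:1 ratio by construction.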


    Sampling

    [Figure: raw data sampled by SRSWOR (simple random sample without replacement) and by SRSWR (simple random sample with replacement)]
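The two schemes in the figure differ only in whether a drawn record can be drawn again. A minimal sketch using Python's `random` module follows; the raw data, sample size, and seed are arbitrary assumptions.

```python
import random

rng = random.Random(42)          # fixed seed so the run is repeatable
raw_data = list(range(1, 21))    # hypothetical raw data: record ids 1..20
n = 8

# SRSWOR: each record can be chosen at most once.
srswor = rng.sample(raw_data, n)

# SRSWR: each draw is independent, so records may repeat.
srswr = rng.choices(raw_data, k=n)

print(srswor)
print(srswr)
```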



    Discretization

    • Three types of attributes:
      • Nominal: values from an unordered set
      • Ordinal: values from an ordered set
      • Continuous: real numbers
    • Discretization:
      • divide the range of a continuous attribute into intervals
      • Some classification algorithms only accept categorical attributes.
      • Reduce data size by discretization
      • Prepare for further analysis


    Discretization and Concept Hierarchy

    • Discretization
      • reduce the number of values for a given continuous attribute by dividing the range of the attribute into intervals. Interval labels can then be used to replace actual data values.
    • Concept hierarchies
      • reduce the data by collecting and replacing low level concepts (such as numeric values for the attribute age) by higher level concepts (such as young, middle-aged, or senior).
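Both ideas above can be sketched briefly: equal-width discretization replaces a number by the index of its interval, and a concept hierarchy replaces it by a named higher-level concept. The age cut-offs (30 and 60) and the sample ages are assumptions for illustration; the lecture does not specify them.

```python
def equal_width_index(values, n_intervals):
    """Map each value to the index of its equal-width interval."""
    low, high = min(values), max(values)
    width = (high - low) / n_intervals
    # The maximum value would land in interval n_intervals, so clamp it.
    return [min(int((v - low) / width), n_intervals - 1) for v in values]


def age_concept(age):
    """Climb one level of a hypothetical concept hierarchy for age."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"


ages = [22, 35, 47, 61, 78]
print(equal_width_index(ages, n_intervals=2))
print([age_concept(a) for a in ages])
```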



    Summary

    • Data preparation is a big issue for both warehousing and mining
    • Data preparation includes
      • Data cleaning and data integration
      • Data reduction and feature selection
      • Discretization
    • A lot of methods have been developed, but this is still an active area of research



    Types of Learning

    • Supervised (inductive) learning
      • Training data includes desired outputs
    • Unsupervised learning
      • Training data does not include desired outputs
    • Semi-supervised learning
      • Training data includes a few desired outputs
    • Reinforcement learning
      • Rewards from sequence of actions


    Designing a Learning System: An Example

    1. Problem Description: Classifying cancer
    2. Choosing the Training Experience: Data Collection
    3. Choosing the Target Function / target output: Identify appropriate function for the data
    4. Choosing a Function Algorithm
    6. Design


    Data Collection

    • http://archive.ics.uci.edu/ml/
    • Breast Cancer Wisconsin (Original) Data Set
    • Attributes:

        #    Attribute                     Domain
        --   ----------------------------  -------------------------------
        1.   Sample code number            id number
        2.   Clump Thickness               1 - 10
        3.   Uniformity of Cell Size       1 - 10
        4.   Uniformity of Cell Shape      1 - 10
        5.   Marginal Adhesion             1 - 10
        6.   Single Epithelial Cell Size   1 - 10
        7.   Bare Nuclei                   1 - 10
        8.   Bland Chromatin               1 - 10
        9.   Normal Nucleoli               1 - 10
        10.  Mitoses                       1 - 10
        11.  Class                         (2 for benign, 4 for malignant)

    • Sample records:

        1000025,5,1,1,1,2,1,3,1,1,2
        1002945,5,4,4,5,7,10,3,2,1,2
        1015425,3,1,1,1,2,2,3,1,1,2
        1016277,6,8,8,1,3,4,3,7,1,2
        1017023,4,1,1,3,2,1,3,1,1,2
        1017122,8,10,10,8,7,10,9,7,1,4
        1018099,1,1,1,1,2,10,3,1,1,2
        1018561,2,1,2,1,2,1,3,1,1,2
        1033078,2,1,1,1,2,1,1,1,5,2
        1033078,4,2,1,1,2,1,2,1,1,2
        1035283,1,1,1,1,1,1,3,1,1,2
        1036172,2,1,1,1,2,1,2,1,1,2
        1041801,5,3,3,3,2,3,4,4,1,4
        1043999,1,1,1,1,2,3,3,1,1,2
        1044572,8,7,5,10,7,9,5,5,4,4
        1047630,7,4,6,4,6,1,4,3,1,4
        1048672,4,1,1,1,2,1,2,1,1,2
        1049815,4,1,1,1,2,1,3,1,1,2
        1050670,10,7,7,6,4,10,4,1,2,4
        1050718,6,1,1,1,2,1,3,1,1,2

    • Class distribution: Benign: 458 (65.5%), Malignant: 241 (34.5%)
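A record in this comma-separated layout can be turned into a labelled example with a few lines of Python. The sketch below is illustrative: the helper name `parse_record` is hypothetical, three rows are embedded directly rather than downloaded, and treating "?" as the missing-value marker is an assumption about the data file.

```python
ATTRIBUTES = [
    "Sample code number", "Clump Thickness", "Uniformity of Cell Size",
    "Uniformity of Cell Shape", "Marginal Adhesion",
    "Single Epithelial Cell Size", "Bare Nuclei", "Bland Chromatin",
    "Normal Nucleoli", "Mitoses", "Class",
]
CLASS_NAMES = {2: "benign", 4: "malignant"}


def parse_record(line):
    """Split one comma-separated row into an attribute -> value mapping."""
    fields = line.strip().split(",")
    # Assumption: a "?" field marks a missing value.
    values = [None if f == "?" else int(f) for f in fields]
    return dict(zip(ATTRIBUTES, values))


rows = [
    "1000025,5,1,1,1,2,1,3,1,1,2",
    "1002945,5,4,4,5,7,10,3,2,1,2",
    "1017122,8,10,10,8,7,10,9,7,1,4",
]
records = [parse_record(row) for row in rows]
labels = [CLASS_NAMES[r["Class"]] for r in records]
print(labels)

# Class shares implied by the counts on the slide.
benign, malignant = 458, 241
total = benign + malignant
shares = (round(100 * benign / total, 1), round(100 * malignant / total, 1))
print(shares)
```

The sample code number is an identifier, not a predictor, so it would normally be dropped before training.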


    Choosing a Function Algorithm

    • Applying to Neural Network
      • Input Data
      • Number of layers
      • Output Data
    • Design
      • Training Data
      • Testing Data

    Steps


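    Steps

    [Figure: Training: training images are turned into image features, which together with the training labels are used for training, producing a learned model. Testing: a test image is turned into image features and passed to the learned model, which outputs a prediction.]

    Slide credit: D. Hoiem and L.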


    Cross validation

    • Solution: k-fold cross validation maximizes the use of the data.
      • Divide data randomly into k folds (subsets) of equal size.
      • Train the model on k-1 folds, use one fold for testing.
      • Repeat this process k times so that all folds are used for testing.
      • Compute the average performance on the k test sets.
    • This effectively uses all the data for both training and testing.
    • Typically k = 10 is used.
    • Sometimes stratified k-fold cross validation is used.
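The procedure above can be sketched without any library support. In this minimal sketch the "model" is a majority-class predictor standing in for a real learner, and the label list, k = 5, and the seed are assumptions chosen so the example stays small.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(labels, k, seed=0):
    """Average test accuracy of a majority-class predictor over k folds."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on the other k-1 folds.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        majority = Counter(labels[j] for j in train_idx).most_common(1)[0][0]
        # Test on the held-out fold.
        correct = sum(1 for j in test_idx if labels[j] == majority)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical labels: 14 benign and 6 malignant examples.
labels = ["benign"] * 14 + ["malignant"] * 6
average_accuracy = cross_validate(labels, k=5)
print(average_accuracy)
```

Every example is tested exactly once, so the averaged score uses all of the data; a stratified variant would additionally keep the class ratio equal across folds.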


    Cross validation

    • Identify n “folds” of the available data.
      • Train on n-1 folds
      • Test on the remaining fold.
    • In the extreme (n = N) this is known as “leave-one-out” cross validation


    • In k-fold cross-validation, the original sample is randomly partitioned into k subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k-1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds then can be averaged (or otherwise combined) to produce a single estimation. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used.


    2-fold cross-validation

    This is the simplest variation of k-fold cross-validation. For each fold, we randomly assign data points to two sets d0 and d1, so that both sets are equal size (this is usually implemented as shuffling the data array and then splitting in two). We then train on d0 and test on d1, followed by training on d1 and testing on d0.

    This has the advantage that our training and test sets are both large, and each data point is used for both training and validation on each fold.


    Leave-one-out cross-validation

    • Leave-one-out cross-validation is simply k-fold cross-validation with k set to n, the number of instances in the data set.
    • This means that the test set only consists of a single instance, which will be classified either correctly or incorrectly.
    • Advantages: maximal use of training data, i.e., training on n-1 instances. The procedure is deterministic, no sampling involved.
    • Disadvantages: unfeasible for large data sets: large number of training runs required, high computational cost. Cannot be stratified (only one class in the test set).
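Leave-one-out cross-validation needs no shuffling, which makes it easy to sketch. The example below uses a 1-nearest-neighbour rule on made-up one-dimensional points as the classifier; both the classifier choice and the data are assumptions for illustration.

```python
def nearest_neighbour_label(train, x):
    """Predict the label of the training point closest to x."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def leave_one_out(data):
    """Hold out each instance once; return (correct predictions, runs)."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]   # the other n-1 instances
        if nearest_neighbour_label(train, x) == label:
            correct += 1
    return correct, len(data)


# Hypothetical (feature, class) pairs; 3.2 sits between the two groups.
data = [(1.0, "a"), (1.2, "a"), (0.8, "a"),
        (5.0, "b"), (5.3, "b"), (4.9, "b"), (3.2, "a")]
correct, runs = leave_one_out(data)
print(correct, runs)
```

The number of training runs equals the number of instances, which is what makes the procedure costly on large data sets.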


    Cross-validation visualized

    [Figure sequence, Fold 1 to Fold 6: the available labeled data is split into n partitions, shown in each fold as Train, Train, Train, Train, Dev, Test.]

    Calculate Average Performance