Machine learning functionalities

  • 8/15/2019 Machine learning functionalities

    Machine Learning

    Introduction

    Machine Learning Functionalities

    Outline

    • Introduction
    • Mining Association Rules
    • Classification
    • Numeric Prediction
    • Cluster Analysis
    • Interesting Patterns
    • References

    Introduction

    Introduction

    • Machine Learning tasks
      – Descriptive Machine Learning
        + characterize the general properties of the data in the database.
      – Predictive Machine Learning
        + perform inference on the current data in order to make predictions.

    Introduction

    • Different views lead to different classifications of Machine Learning functionalities:
      – Data view: kinds of data to be mined
        + e.g., numeric, categorical, mixed, ...
      – Knowledge view: kinds of knowledge to be discovered
        + e.g., decision trees, classification rules, ...
      – Method view: kinds of techniques utilized
        + e.g., neural networks, SVM, ...
      – Application view: kinds of applications adapted
        + e.g., marketing, medicine, railway, ...

    Introduction

    • Knowledge view
      – Machine Learning functionalities are used to specify the kind of patterns to be found in data mining tasks.
    • Main functionalities
      – Mining Association Rules
      – Classification
      – Numeric Prediction
      – Cluster Analysis

    Mining Association Rules

    Frequent Patterns

    • Frequent patterns are patterns that occur frequently in data.
    • The kinds of frequent patterns
      – Frequent itemset patterns: a set of items that frequently appear together in a transactional dataset, such as milk and bread.
      – Frequent sequential patterns: for example, the pattern that customers tend to purchase first a PC, followed by a digital camera, and then a memory card, is a (frequent) sequential pattern.
    • Mining frequent patterns leads to the discovery of interesting associations and correlations within data.
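As a minimal sketch of what "frequently appear together" means in practice, the following counts every pair of items across a handful of transactions and keeps the pairs seen at least a given number of times. The transactions, the function name `frequent_itemsets`, and the threshold are hypothetical and chosen only for illustration; this is brute-force counting, not one of the named mining algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; the item names are illustrative only.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs"},
]

def frequent_itemsets(transactions, size, min_count):
    """Count every itemset of the given size; keep those seen at least min_count times."""
    counts = Counter()
    for basket in transactions:
        # Sorting makes ("bread", "milk") and ("milk", "bread") the same key.
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}

frequent_pairs = frequent_itemsets(transactions, size=2, min_count=2)
print(frequent_pairs)
```

With this toy data, milk and bread appear together in three transactions and bread and butter in two, so only those two pairs survive the threshold.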

    Association Rules

    • Suppose, as a marketing manager of AllElectronics, you would like to determine which items are frequently purchased together within the same transactions.
    • An example of an association rule from the AllElectronics transactional database is:
      – buys(X, "computer") => buys(X, "software") [support = 1%, confidence = 50%]
      – where X is a variable representing a customer.

    Association Rules

    • A confidence, or certainty, of 50% means that if a customer buys a computer, there is a 50% chance that she will buy software as well.
    • A 1% rule support means that 1% of all of the transactions under analysis showed that computer and software were purchased together.
    • This association rule involves a single attribute or predicate (i.e., buys) that repeats.
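The two numbers can be computed directly from transaction counts. The sketch below assumes support is the fraction of all transactions containing both sides of the rule, and confidence is that count divided by the number of transactions containing the left-hand side. The 100 transactions are made up so that the result matches the 1% and 50% figures; `support_and_confidence` is a hypothetical helper name.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical data: 100 transactions, 2 contain a computer,
# and 1 of those 2 also contains software.
transactions = (
    [{"computer", "software"}]
    + [{"computer"}]
    + [{"printer"} for _ in range(98)]
)

support, confidence = support_and_confidence(transactions, {"computer"}, {"software"})
print(f"support = {support:.0%}, confidence = {confidence:.0%}")
```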

    Association Rules

    • We may find association rules like:
      – age(X, "20...29") AND income(X, "20,000...29,000") => buys(X, "CD player") [support = 2%, confidence = 60%]
      – The rule indicates that of the AllElectronics customers under study, 2% are 20 to 29 years of age with an income of 20,000 to 29,000 and have purchased a CD player at AllElectronics.
      – There is a 60% probability that a customer in this age and income group will purchase a CD player.
    • This is an association between more than one attribute (i.e., age, income, and buys).
    • This is a multidimensional association rule.

    Support and confidence of a rule

    • Example: 4 cool days with normal humidity
      – Support >= 2 and confidence >= 95% for the weather data
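The "4 cool days with normal humidity" example can be checked by counting. The table below is the commonly reproduced 14-instance nominal weather data; it is not included in the slides, so treat it as an assumption to verify against the referenced book. Here support is counted as a number of instances rather than as a fraction of all instances.

```python
# Assumed table: (outlook, temperature, humidity, windy, play).
WEATHER = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

# Rule: if temperature = cool then humidity = normal.
cool = [row for row in WEATHER if row[1] == "cool"]
cool_and_normal = [row for row in cool if row[2] == "normal"]

support = len(cool_and_normal)                 # instances predicted correctly
confidence = len(cool_and_normal) / len(cool)  # correct / instances the rule applies to
print(support, confidence)
```

Under that assumed table, all four cool days have normal humidity, so the rule passes a threshold such as support >= 2 and confidence >= 95%.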

    Interpreting association rules

    • Interpretation is not obvious:
    • is not the same as
    • It means that the following also holds:

    Association Rules

    • Large number of possible associations
      – Output needs to be restricted to show only the most predictive associations
    • Association rules are interesting if they satisfy both:
      – A minimum support threshold: number of instances predicted correctly
      – A minimum confidence threshold: number of correct predictions, as a proportion of all instances that the rule applies to
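A rough sketch of restricting the output with both thresholds: enumerate every one-item => one-item rule in a small, made-up transaction set and keep only the rules that meet a minimum support (as an instance count) and a minimum confidence. The data and the name `interesting_rules` are hypothetical, and exhaustive enumeration like this is only practical for toy inputs.

```python
from itertools import permutations

# Hypothetical transactions.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk"},
]

def interesting_rules(transactions, min_support, min_confidence):
    """Enumerate one-item => one-item rules and keep those passing both thresholds."""
    items = sorted(set().union(*transactions))
    kept = []
    for a, b in permutations(items, 2):
        n_a = sum(1 for t in transactions if a in t)               # rule applies
        n_ab = sum(1 for t in transactions if a in t and b in t)   # rule is correct
        if n_a == 0:
            continue
        confidence = n_ab / n_a
        if n_ab >= min_support and confidence >= min_confidence:
            kept.append((a, b, n_ab, confidence))
    return kept

rules = interesting_rules(transactions, min_support=2, min_confidence=0.75)
for a, b, n, c in rules:
    print(f"{a} => {b}  (support = {n}, confidence = {c:.0%})")
```

Of the six candidate rules, only butter => bread survives: it is correct in 2 instances and holds in every instance it applies to.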

    Association Rules

    • Association learning is unsupervised
    • Association rules usually involve only nonnumeric attributes
    • Additional analysis can be performed to uncover interesting statistical correlations between associated attribute-value pairs.

    Association Rules

    • Examples of association rule algorithms:
      – Apriori algorithm
      – FP-growth algorithm
      – Tree-Projection algorithm
      – ECLAT (Equivalence CLASS Transformation) algorithm

    Classification

    Classification

    • Classification
      – Construct models (functions) that describe and distinguish classes or concepts to predict the class of objects whose class label is unknown
    • Example:
      – In the weather problem, the play or don't play judgment
      – In the contact lenses problem, the lens recommendation
    • Classification learning is supervised
      – Process is provided with actual outcome

    Training vs. Test Data Set

    • Training dataset
      – The derived model is based on the analysis of a set of training data (i.e., data objects whose class label is known).
    • Test dataset
      – An independent set of test data for which class labels are known but not made available to the machine.
      – This data set is used to evaluate the success of classification learning
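The split can be illustrated with a deliberately simple learner: it memorizes the most frequent class for each attribute value seen in the training set, and is then scored on test examples whose labels are held back until evaluation. The (outlook, play) pairs and the function names are hypothetical, and the learner is a stand-in rather than any method named in these slides.

```python
from collections import Counter, defaultdict

def train(examples):
    """Learn the most frequent class for each attribute value (a deliberately simple model)."""
    by_value = defaultdict(Counter)
    for value, label in examples:
        by_value[value][label] += 1
    # Fallback class for attribute values never seen during training.
    default = Counter(label for _, label in examples).most_common(1)[0][0]
    model = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
    return model, default

def accuracy(model, default, examples):
    """Fraction of examples whose predicted class equals the known, held-back label."""
    correct = sum(1 for value, label in examples if model.get(value, default) == label)
    return correct / len(examples)

# Hypothetical (outlook, play) pairs, split by hand into training and test sets.
training_set = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"),
                ("rainy", "yes"), ("rainy", "yes"), ("rainy", "no")]
test_set = [("sunny", "no"), ("overcast", "yes"), ("rainy", "no"), ("sunny", "yes")]

model, default = train(training_set)
test_accuracy = accuracy(model, default, test_set)
print(model, test_accuracy)
```

Scoring on the training set instead would overstate success, because the model has already seen those labels.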

    Classification

    • Examples of classification output:
      – Decision tree
      – Classification rules
      – Neural networks

    Decision trees

    • "Divide-and-conquer" approach produces decision tree
    • Nodes involve testing a particular attribute
    • Usually, attribute value is compared to constant
    • Other possibilities:
      – Comparing values of two attributes
      – Using a function of one or more attributes

    Decision trees

    • Leaf nodes
      – give a classification that applies to all instances that reach the leaf
      – set of classifications
      – probability distribution over all possible classifications
    • To classify an unknown instance,
      – it is routed down the tree according to the values of the attributes tested in successive nodes, and
      – when a leaf is reached the instance is classified according to the class assigned to the leaf.
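Routing an instance down a tree can be sketched in a few lines. The tree below is hand-written for illustration (it is not learned from data and is not the figure from the slides): internal nodes are `(attribute, branches)` tuples and leaves are plain class labels.

```python
# Hypothetical hand-written tree: internal nodes are (attribute, branches),
# leaves are class labels.
TREE = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {True: "no", False: "yes"}),
})

def classify(node, instance):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[instance[attribute]]
    return node

label = classify(TREE, {"outlook": "sunny", "humidity": "normal", "windy": False})
print(label)
```

Only the attributes on the path are consulted: for an overcast day the leaf is reached after a single test.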

    Decision tree for the labor data

    Classification rules

    • Popular alternative to decision trees
    • Rules include two parts:
      – Antecedent or precondition:
        + a series of tests just like the tests at the nodes of a decision tree
        + Tests are usually logically ANDed together
        + All the tests must succeed if the rule is to fire
      – Consequent or conclusion:
        + The class or set of classes or probability distribution assigned by rule
    • Example: A rule from the contact lens problem

    From trees to rules

    • Easy: converting a tree into a set of rules
      – One rule for each leaf:
        + Antecedent contains a condition for every node on the path from the root to the leaf
        + Consequent is class assigned by the leaf
    • Produces rules that are very clear
      – Doesn't matter in which order they are executed
    • But: resulting rules are unnecessarily complex
      – Redundant tests/rules need to be removed
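The tree-to-rules conversion is a short recursive walk: collect one condition per node on the way down and emit a rule at each leaf. The tree is a hypothetical hand-written one in the same `(attribute, branches)` form used for illustration only; no pruning of redundant tests is attempted here.

```python
# Hypothetical hand-written tree: internal nodes are (attribute, branches),
# leaves are class labels.
TREE = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {True: "no", False: "yes"}),
})

def tree_to_rules(node, path=()):
    """Return one (conditions, class) rule per leaf, one condition per node on the path."""
    if not isinstance(node, tuple):
        return [(path, node)]
    attribute, branches = node
    rules = []
    for value, child in branches.items():
        rules.extend(tree_to_rules(child, path + ((attribute, value),)))
    return rules

rules = tree_to_rules(TREE)
for conditions, label in rules:
    tests = " and ".join(f"{a} = {v}" for a, v in conditions)
    print(f"If {tests} then play = {label}")
```

Because every rule carries the full path from the root, the rules are mutually exclusive and can be applied in any order.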

    From rules to trees

    • More difficult: transforming a rule set into a tree
      – Tree cannot easily express disjunction between rules
    • Example: rules which test different attributes

    From rules to trees

    • The exclusive-or problem

    A tree with a replicated subtree

    • If it is possible to have a "default" rule that covers cases not specified by the other rules, rules are much more compact than trees
    • There are four attributes, x, y, z, and w, each of which can be 1, 2, or 3

    A tree with a replicated subtree

    • Replicated subtree problem: tree contains identical subtrees

    Executing a rule set

    • Two ways of executing a rule set:
      – Ordered set of rules ("decision list")
        + Order is important for interpretation
      – Unordered set of rules
        + Rules may overlap and lead to different conclusions for the same instance
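The difference between the two execution modes shows up as soon as two rules fire on the same instance. In this sketch the rules and the instance are hypothetical: the decision list returns the conclusion of the first rule that fires, while the unordered version collects every conclusion and so exposes the conflict.

```python
# Hypothetical rules as (conditions, class); a rule fires when all its conditions match.
RULES = [
    ({"outlook": "sunny"}, "no"),
    ({"humidity": "normal"}, "yes"),
]

def fires(conditions, instance):
    """True when every test in the antecedent succeeds for this instance."""
    return all(instance.get(a) == v for a, v in conditions.items())

def decision_list(rules, instance, default=None):
    """Ordered execution: the first rule that fires decides."""
    for conditions, label in rules:
        if fires(conditions, instance):
            return label
    return default

def unordered(rules, instance):
    """Unordered execution: collect the conclusions of every rule that fires."""
    return {label for conditions, label in rules if fires(conditions, instance)}

instance = {"outlook": "sunny", "humidity": "normal"}
first = decision_list(RULES, instance)
conflict = unordered(RULES, instance)
print(first, conflict)
```

Reversing the order of `RULES` changes the decision-list answer, which is why a rule in a decision list cannot be read in isolation from the rules before it.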

    Neural Network

    • A neural network, when used for classification, is typically a collection of neuron-like processing units with weighted connections between the units.
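A small sketch of "units with weighted connections": each unit computes a weighted sum of its inputs plus a bias and passes it through a sigmoid. The weights below are hand-picked, not learned, and are an assumption chosen so that the two-layer network reproduces exclusive-or on binary inputs; the names `layer` and `predict` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Each unit outputs sigmoid(weighted sum of its inputs + bias)."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

# Hand-picked (not learned) weights: 2 inputs -> 2 hidden units -> 1 output unit.
# Hidden unit 1 behaves like OR, hidden unit 2 like NAND, the output unit like AND.
HIDDEN_W, HIDDEN_B = [[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]
OUTPUT_W, OUTPUT_B = [[20.0, 20.0]], [-30.0]

def predict(inputs):
    hidden = layer(inputs, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(predict(x)))
```

In a real classifier the weights would be fitted to training data rather than set by hand.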

    Classification Techniques

    • Examples of classification techniques:
      – Decision Trees
      – Classification Rules
      – Neural Network
      – Naïve Bayesian classification
      – Support vector machines
      – k-nearest neighbor classification

    Classification vs. Association Analysis

    • How association analysis differs from classification learning:
      – Can predict any attribute's value, not just the class
      – Can predict more than one attribute's value at a time
      – There are more association rules than classification rules

    Numeric Prediction

    Numeric prediction

    • Numeric prediction:
      – predicting a numeric quantity
    • Numeric prediction is a variant of classification learning in which the outcome is a numeric value rather than a category.
    • Learning is supervised
      – Process is being provided with target value
      – Measure success on test data

    Numeric prediction

    • A version of the weather data in which what is to be predicted is the time (in minutes) to play

    Numeric prediction

    • Finding the important attributes and how they relate to the numeric outcome is more important than predicting the value for new instances.

    Numeric prediction

    • Representing numeric prediction:
      – Linear regression equation: an equation to predict a numeric quantity
      – Regression tree: a decision tree where each leaf predicts a numeric quantity
        + Predicted value is the average value of training instances that reach the leaf
      – Model tree: a regression tree with linear regression models at the leaf nodes
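The first two representations can be contrasted on a toy dataset. The sketch below fits a one-variable least-squares line and a one-split regression tree whose leaves predict the average target of the training instances that reach them. The data points, the fixed split threshold, and the names `fit_line` and `fit_stump` are assumptions made for illustration; a real regression tree learner would choose its own splits.

```python
from statistics import mean

# Hypothetical (x, y) training pairs.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (10.0, 30.0), (11.0, 34.0)]

def fit_line(points):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in points)
         / sum((x - x_bar) ** 2 for x, _ in points))
    return y_bar - b * x_bar, b

def fit_stump(points, threshold):
    """One-split regression tree: each leaf predicts the average y of the instances reaching it."""
    left = mean(y for x, y in points if x < threshold)
    right = mean(y for x, y in points if x >= threshold)
    return lambda x: left if x < threshold else right

a, b = fit_line(data)
tree = fit_stump(data, threshold=5.0)
print(a + b * 2.0, tree(2.0), tree(10.5))
```

A model tree would combine the two ideas by storing the result of `fit_line` for each leaf's instances instead of a single average.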

    Linear regression equation

    • Linear regression equation for the CPU data

    Regression tree

    • Regression tree for the CPU data

    Regression tree

    • We calculate the average of the absolute values of the errors between the predicted and the actual CPU performance measures.
    • It turns out to be significantly less for the tree than for the regression equation.
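The comparison measure described here is the mean absolute error. The following sketch computes it for two sets of predictions; the numbers are invented for illustration and are not the CPU performance data.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all instances."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical values, not the CPU data.
actual = [10.0, 20.0, 30.0]
tree_predictions = [12.0, 18.0, 30.0]
equation_predictions = [16.0, 14.0, 36.0]

tree_mae = mean_absolute_error(tree_predictions, actual)
equation_mae = mean_absolute_error(equation_predictions, actual)
print(tree_mae, equation_mae)
```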

    Model tree

    • Model tree for the CPU data

    Cluster Analysis

    Cluster Analysis

    • Clustering
      – grouping similar instances into clusters
    • Clustering is unsupervised
      – The class of an example is not known
    • Example:
      – a version of the iris data in which the type of iris is omitted
      – Then it is likely that the 150 instances fall into natural clusters corresponding to the three iris types.
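To make "grouping similar instances" concrete, here is a minimal k-means sketch. The slides do not name a clustering algorithm, so k-means is a swapped-in choice; the 2-D points and the initial centers are hypothetical, and no class labels are used anywhere.

```python
def k_means(points, centers, iterations=10):
    """Plain k-means: assign each point to its nearest center, then move each center to its cluster mean."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical unlabeled 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, clusters = k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
print(clusters)
```

The number of clusters is fixed in advance by the number of initial centers, which is one reason the result should be inspected rather than trusted blindly.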

    Iris data as a clustering problem

    Cluster Analysis

    • A 2-D plot of customer data with respect to customer locations in a city, showing three data clusters. Each cluster "center" is marked with a "+".

    Clustering

    • Clustering may be followed by a second step of classification learning in which rules are learned that give an intelligible description of how new instances should be placed into the clusters.

    Representing clusters

    • The output takes the form of a diagram that shows how the instances fall into clusters.
    • Different cases:
      – Simple 2D representation: involves associating a cluster number with each instance
      – Venn diagram: allows one instance to belong to more than one cluster
      – Probabilistic assignment: associates instances with clusters probabilistically
      – Dendrogram: produces a hierarchical structure of clusters (dendron is the Greek word for tree)

    Interesting Patterns

    Interesting Patterns

    • Objective vs. subjective interestingness measures
      – Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
      – Subjective: based on the user's belief in the data, e.g., unexpectedness, novelty, actionability, etc.

    References

    References

    • Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Elsevier Inc., 2005. (Chapters 2 and 3)
    • J. Han, M. Kamber, Data Mining: Concepts and Techniques, Elsevier Inc., 2006. (Chapter 1)

    The end