rashi-agarwal
8/15/2019 Machine learning functionalities
Machine Learning
Introduction
Machine Learning Functionalities
Outline
• Introduction
• Mining Association Rules
• Classification
• Numeric Prediction
• Cluster Analysis
• Interesting Patterns
• References
Introduction
Introduction
• Machine Learning tasks
  – Descriptive Machine Learning
    + characterize the general properties of the data in the database.
  – Predictive Machine Learning
    + perform inference on the current data in order to make predictions.
Introduction
• Different views lead to different classifications of Machine Learning functionalities:
  – Data view: kinds of data to be mined
    + e.g., numeric, categorical, mixed, ...
  – Knowledge view: kinds of knowledge to be discovered
    + e.g., decision tree, classification rules, ...
  – Method view: kinds of techniques utilized
    + e.g., neural networks, SVM, ...
  – Application view: kinds of applications adapted
    + e.g., marketing, medicine, railway, ...
Introduction
• Knowledge view
  – Machine Learning functionalities are used to specify the kind of patterns to be found in data mining tasks.
• Main functionalities
  – Mining Association Rules
  – Classification
  – Numeric Prediction
  – Cluster Analysis
Mining Association Rules
Frequent Patterns
• Frequent patterns are patterns that occur frequently in data.
• The kinds of frequent patterns:
  – Frequent itemset patterns: a frequent itemset is a set of items that frequently appear together in a transactional dataset, such as milk and bread.
  – Frequent sequential patterns: for example, the pattern that customers tend to purchase first a PC, followed by a digital camera, and then a memory card, is a (frequent) sequential pattern.
• Mining frequent patterns leads to the discovery of interesting associations and correlations within data.
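A minimal sketch of counting frequent itemsets, assuming transactions are stored as Python sets. The transactions, the function name `frequent_itemsets`, and the `min_count` threshold are all illustrative assumptions, not taken from the slides, and this brute-force count is not one of the named mining algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; the item names are illustrative only.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
]

def frequent_itemsets(transactions, size, min_count):
    """Count every itemset of the given size and keep the frequent ones."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each itemset one canonical tuple representation
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}

frequent_pairs = frequent_itemsets(transactions, size=2, min_count=3)
print(frequent_pairs)
```

Enumerating every combination is only practical for tiny datasets; it is shown here because it makes the definition of "frequently appear together" explicit.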
Association Rules
• Suppose, as a marketing manager of AllElectronics, you would like to determine which items are frequently purchased together within the same transactions.
• An example of an association rule from the AllElectronics transactional database is:
  buys(X, "computer") ⇒ buys(X, "software") [support = 1%, confidence = 50%]
  – where X is a variable representing a customer.
Association Rules
• A confidence, or certainty, of 50% means that if a customer buys a computer, there is a 50% chance that she will buy software as well.
• A 1% rule support means that 1% of all of the transactions under analysis showed that computer and software were purchased together.
• This association rule involves a single attribute or predicate (i.e., buys) that repeats.
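As a rough illustration of the two measures, the sketch below computes support and confidence as fractions over a list of transactions. The four baskets and the helper names `support` and `confidence` are assumptions made up for this example; they do not reproduce the AllElectronics figures.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical data: 4 transactions, 2 with a computer, 1 with both items.
baskets = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"software", "printer"},
]

rule_support = support(baskets, {"computer", "software"})
rule_confidence = confidence(baskets, {"computer"}, {"software"})
print(rule_support, rule_confidence)
```

Here one basket in four holds both items (support 0.25), and one of the two computer baskets also holds software (confidence 0.5).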
Association Rules
• We may find association rules like:
  age(X, "20...29") ∧ income(X, "20,000...29,000") ⇒ buys(X, "CD player") [support = 2%, confidence = 60%]
  – The rule indicates that of the AllElectronics customers under study, 2% are 20 to 29 years of age with an income of 20,000 to 29,000 and have purchased a CD player at AllElectronics.
  – There is a 60% probability that a customer in this age and income group will purchase a CD player.
• This is an association between more than one attribute (i.e., age, income, and buys).
• This is a multidimensional association rule.
Support and confidence of a rule
• Example: 4 cool days with normal humidity
  – Support >= 2 and confidence >= 95% for weather data
Interpreting association rules
• Interpretation is not obvious:
• is not the same as
• It means that the following also holds:
Association Rules
• Large number of possible associations
  – Output needs to be restricted to show only the most predictive associations
• Association rules are interesting if they satisfy both:
  – A minimum support threshold: number of instances predicted correctly
  – A minimum confidence threshold: number of correct predictions, as proportion of all instances that rule applies to
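The two thresholds can be sketched as a filter over candidate rules, using the count-based definitions given in the bullets. The instances, the candidate rules, and the threshold values are hypothetical; how candidates are generated in the first place is left out.

```python
# Hypothetical nominal instances (attribute -> value), in the spirit of the weather data.
instances = [
    {"temperature": "cool", "humidity": "normal", "windy": "false"},
    {"temperature": "cool", "humidity": "normal", "windy": "true"},
    {"temperature": "mild", "humidity": "high", "windy": "false"},
    {"temperature": "hot", "humidity": "high", "windy": "false"},
    {"temperature": "mild", "humidity": "normal", "windy": "true"},
]

def matches(instance, conditions):
    """True if the instance has the required value for every listed attribute."""
    return all(instance.get(attr) == value for attr, value in conditions.items())

def evaluate(rule, instances):
    """Return (support, confidence): correct count, and correct / instances the rule applies to."""
    antecedent, consequent = rule
    applies = [inst for inst in instances if matches(inst, antecedent)]
    correct = sum(matches(inst, consequent) for inst in applies)
    return correct, (correct / len(applies) if applies else 0.0)

candidate_rules = [
    ({"temperature": "cool"}, {"humidity": "normal"}),
    ({"windy": "false"}, {"humidity": "high"}),
    ({"temperature": "hot"}, {"humidity": "high"}),
]

MIN_SUPPORT, MIN_CONFIDENCE = 2, 0.95  # assumed thresholds

interesting = []
for rule in candidate_rules:
    rule_support, rule_confidence = evaluate(rule, instances)
    if rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE:
        interesting.append(rule)
print(interesting)
```

The second rule fails on confidence (2 of 3) and the third on support (only 1 instance), so only the first survives.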
Association Rules
• Association learning is unsupervised
• Association rules usually involve only nonnumeric attributes
• Additional analysis can be performed to uncover interesting statistical correlations between associated attribute-value pairs.
Association Rules
• Examples of association rule algorithms:
  – Apriori algorithm
  – FP-growth algorithm
  – Tree-Projection algorithm
  – ECLAT (Equivalence CLASS Transformation) algorithm
Classification
Classification
• Classification
  – Construct models (functions) that describe and distinguish classes or concepts to predict the class of objects whose class label is unknown
• Example:
  – In the weather problem, the play or don't play judgment
  – In the contact lenses problem, the lens recommendation
• Classification learning is supervised
  – Process is provided with actual outcome
Training vs. Test Data Set
• Training dataset
  – The derived model is based on the analysis of a set of training data (i.e., data objects whose class label is known).
• Test dataset
  – An independent set of test data for which class labels are known but not made available to the machine.
  – This data set is used to evaluate the success of classification learning
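A small sketch of the training/test separation, assuming a deliberately simple model that memorizes the majority class per attribute value. The data, the `fit` and `accuracy` helpers, and the choice of model are illustrative assumptions only.

```python
from collections import Counter, defaultdict

# Hypothetical labelled data: (outlook, play). Not the data from the slides.
train = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"),
         ("rainy", "yes"), ("rainy", "yes"), ("sunny", "yes")]
test = [("sunny", "no"), ("overcast", "yes"), ("rainy", "no"), ("rainy", "yes")]

def fit(examples):
    """Learn the majority class for each attribute value (a very simple model)."""
    by_value = defaultdict(Counter)
    for value, label in examples:
        by_value[value][label] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}

def accuracy(model, examples):
    """Fraction of examples whose known label matches the model's prediction."""
    hits = sum(model.get(value) == label for value, label in examples)
    return hits / len(examples)

model = fit(train)                     # the model only ever sees the training set
test_accuracy = accuracy(model, test)  # test labels are used only for scoring
print(model, test_accuracy)
```

The point of the split is that `test_accuracy` is measured on instances the model never saw while it was being built.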
Classification
• Examples of classification output:
  – Decision tree
  – Classification rules
  – Neural networks
Decision trees
• "Divide and conquer" approach produces decision tree
• Nodes involve testing a particular attribute
• Usually, attribute value is compared to constant
• Other possibilities:
  – Comparing values of two attributes
  – Using a function of one or more attributes
Decision trees
• Leaf nodes
  – give a classification that applies to all instances that reach the leaf,
  – or a set of classifications,
  – or a probability distribution over all possible classifications
• To classify an unknown instance,
  – it is routed down the tree according to the values of the attributes tested in successive nodes, and
  – when a leaf is reached the instance is classified according to the class assigned to the leaf.
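The routing step can be sketched with a tree stored as nested tuples and dictionaries. The tree below is a hypothetical one loosely modelled on the weather problem; its shape and the `classify` helper are assumptions for illustration.

```python
# Internal nodes are (attribute, {value: subtree}); leaves are plain class labels.
tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {"true": "no", "false": "yes"}),
})

def classify(node, instance):
    """Route the instance down the tree until a leaf (a plain label) is reached."""
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[instance[attribute]]  # follow the branch for this attribute's value
    return node

label = classify(tree, {"outlook": "sunny", "humidity": "normal", "windy": "false"})
print(label)
```

Each loop iteration corresponds to one attribute test at a node; the loop ends as soon as a leaf label is reached.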
Decision tree for the labor data
Classification rules
• Popular alternative to decision trees
• Rules include two parts:
  – Antecedent or precondition:
    + a series of tests just like the tests at the nodes of a decision tree
    + Tests are usually logically ANDed together
    + All the tests must succeed if the rule is to fire
  – Consequent or conclusion:
    + The class or set of classes or probability distribution assigned by the rule
• Example: A rule from the contact lens problem
From trees to rules
• Easy: converting a tree into a set of rules
  – One rule for each leaf:
    + Antecedent contains a condition for every node on the path from the root to the leaf
    + Consequent is the class assigned by the leaf
• Produces rules that are very clear
  – Doesn't matter in which order they are executed
• But: resulting rules are unnecessarily complex
  – Redundant tests/rules need to be removed
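The "one rule for each leaf" conversion can be sketched as a recursive walk that collects the conditions along each root-to-leaf path. The tree encoding and the `tree_to_rules` helper are assumptions for this example, and no pruning of redundant tests is attempted.

```python
# Internal nodes are (attribute, {value: subtree}); leaves are plain class labels.
tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
})

def tree_to_rules(node, path=()):
    """Return one (antecedent, consequent) pair per leaf."""
    if not isinstance(node, tuple):  # leaf: emit the conditions gathered so far
        return [(list(path), node)]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, path + ((attribute, value),)))
    return rules

rules = tree_to_rules(tree)
for antecedent, consequent in rules:
    tests = " and ".join(f"{a} = {v}" for a, v in antecedent)
    print(f"if {tests} then {consequent}")
```

Every rule repeats the tests near the root, which is why the resulting rule set tends to be more complex than necessary.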
From rules to trees
• More difficult: transforming a rule set into a tree
  – Tree cannot easily express disjunction between rules
• Example: rules which test different attributes
From rules to trees
The exclusive-or problem
A tree with a replicated subtree
• If it is possible to have a "default" rule that covers cases not specified by the other rules, rules are much more compact than trees
• There are four attributes, x, y, z, and w; each can be 1, 2, or 3
A tree with a replicated subtree
• Replicated subtree problem: tree contains identical subtrees
Executing a rule set
• Two ways of executing a rule set:
  – Ordered set of rules ("decision list")
    + Order is important for interpretation
  – Unordered set of rules
    + Rules may overlap and lead to different conclusions for the same instance
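The difference between the two execution styles can be sketched as follows. The two overlapping rules, the `DEFAULT` class, and the helper names are hypothetical and chosen only to make the conflict visible.

```python
# Hypothetical rules: (conditions, class). The two rules overlap on purpose.
rules = [
    ({"outlook": "sunny"}, "no"),
    ({"humidity": "normal"}, "yes"),
]
DEFAULT = "yes"  # assumed class when no rule fires

def fires(conditions, instance):
    """A rule fires only if all of its tests succeed."""
    return all(instance.get(a) == v for a, v in conditions.items())

def decision_list(rules, instance):
    """Ordered execution: the first rule that fires decides."""
    for conditions, label in rules:
        if fires(conditions, instance):
            return label
    return DEFAULT

def unordered(rules, instance):
    """Unordered execution: collect the conclusion of every rule that fires."""
    return {label for conditions, label in rules if fires(conditions, instance)}

instance = {"outlook": "sunny", "humidity": "normal"}
first = decision_list(rules, instance)
conflict = unordered(rules, instance)
print(first, conflict)
```

For this instance the decision list answers "no" because of rule order, while the unordered set returns both classes and leaves the conflict to be resolved some other way.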
Neural Network
• A neural network, when used for classification, is typically a collection of neuron-like processing units with weighted connections between the units.
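A rough sketch of such a collection of units, assuming a single hidden layer and a sigmoid activation. Neither of those choices comes from the slides, and the weights are made up rather than learned.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def unit(inputs, weights, bias):
    """One neuron-like unit: weighted sum of its inputs passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def forward(inputs, hidden_layer, output_layer):
    """Feed the inputs through one hidden layer and then the output layer."""
    hidden = [unit(inputs, w, b) for w, b in hidden_layer]
    return [unit(hidden, w, b) for w, b in output_layer]

# Made-up weights: two inputs, two hidden units, one output unit.
hidden_layer = [([1.0, -1.0], 0.0), ([0.5, 0.5], -0.5)]
output_layer = [([1.0, 1.0], -1.0)]

score = forward([1.0, 1.0], hidden_layer, output_layer)[0]
predicted_class = "positive" if score >= 0.5 else "negative"
print(score, predicted_class)
```

In practice the weights would be fitted to training data; only the forward computation through the weighted connections is shown here.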
Classification Techniques
• Examples of classification techniques:
  – Decision Trees
  – Classification Rules
  – Neural Network
  – Naïve Bayesian classification
  – Support vector machines
  – k-nearest neighbor classification
Classification vs. Association Analysis
• How association analysis differs from classification learning:
  – It can predict any attribute's value, not just the class
  – It can predict more than one attribute's value at a time
  – There are more association rules than classification rules
Numeric Prediction
Numeric prediction
• Numeric prediction:
  – predicting a numeric quantity
• Numeric prediction is a variant of classification learning in which the outcome is a numeric value rather than a category.
• Learning is supervised
  – Process is being provided with target value
• Measure success on test data
Numeric prediction
• A version of the weather data in which what is to be predicted is the time (in minutes) to play
Numeric prediction
• Finding the important attributes and how they relate to the numeric outcome is more important than predicting the value for new instances.
Numeric prediction
• Representing numeric prediction:
  – Linear regression equation: an equation to predict a numeric quantity
  – Regression tree: a decision tree where each leaf predicts a numeric quantity
    + Predicted value is the average value of training instances that reach the leaf
  – Model tree: a regression tree with linear regression models at the leaf nodes
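To make the regression-tree leaf prediction concrete, here is a sketch of a one-split tree whose leaves store the average target of the training instances that reach them. The data, the fixed `THRESHOLD`, and the helper names are assumptions; how a real learner chooses the split is not shown.

```python
from statistics import mean

# Hypothetical training data: (attribute value, numeric target).
training = [(1.0, 10.0), (2.0, 12.0), (6.0, 30.0), (7.0, 34.0), (8.0, 32.0)]
THRESHOLD = 5.0  # assumed split point of a one-node regression tree

def build_leaves(examples, threshold):
    """Each leaf stores the average target of the training instances that reach it."""
    left = [y for x, y in examples if x <= threshold]
    right = [y for x, y in examples if x > threshold]
    return {"left": mean(left), "right": mean(right)}

def predict(leaves, x, threshold):
    """Route x to a leaf and return the stored average."""
    return leaves["left"] if x <= threshold else leaves["right"]

leaves = build_leaves(training, THRESHOLD)
print(leaves, predict(leaves, 3.0, THRESHOLD))
```

A model tree would replace each stored average with a linear regression model fitted to the instances at that leaf.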
Linear regression equation
• Linear regression equation for the CPU data
Regression tree
• Regression tree for the CPU data
Regression tree
• We calculate the average of the absolute values of the errors between the predicted and the actual CPU performance measures.
• It turns out to be significantly less for the tree than for the regression equation.
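The error measure described in these bullets is the mean absolute error. A minimal sketch follows; the numbers are made up and are not the CPU results referred to in the slides.

```python
def mean_absolute_error(actual, predicted):
    """Average of |predicted - actual| over all test instances."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

# Made-up numbers, not the CPU results from the slides.
actual = [100.0, 200.0, 300.0]
tree_predictions = [110.0, 190.0, 300.0]
equation_predictions = [130.0, 170.0, 330.0]

tree_mae = mean_absolute_error(actual, tree_predictions)
equation_mae = mean_absolute_error(actual, equation_predictions)
print(tree_mae, equation_mae)
```

Comparing the two values on the same test instances is what allows one representation to be called better than the other.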
Model tree
• Model tree for the CPU data
Cluster Analysis
Cluster Analysis
• Clustering
  – grouping similar instances into clusters
• Clustering is unsupervised
  – The class of an example is not known
• Example:
  – a version of the iris data in which the type of iris is omitted
  – Then it is likely that the 150 instances fall into natural clusters corresponding to the three iris types.
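One common way to group similar instances is k-means, which the slides do not name; it is used here purely as an illustrative stand-in. The points, the starting centers, and the helper names are assumptions, and no class labels are used anywhere.

```python
def closest(point, centers):
    """Index of the center nearest to the point (squared Euclidean distance)."""
    distances = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    return distances.index(min(distances))

def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to centers and moving the centers."""
    centers = list(centers)
    for _ in range(iterations):
        labels = [closest(p, centers) for p in points]
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old center if a cluster became empty
                centers[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return [closest(p, centers) for p in points], centers

# Made-up 2-D measurements with no class labels attached.
points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (9.0, 1.0), (9.2, 1.2)]
labels, centers = kmeans(points, centers=[points[0], points[2], points[4]])
print(labels, centers)
```

The resulting cluster numbers carry no meaning by themselves; whether they line up with real categories, as hoped for the iris types, has to be checked separately.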
Iris data as a clustering problem
Cluster Analysis
• A 2-D plot of customer data with respect to customer locations in a city, showing three data clusters. Each cluster "center" is marked with a "+".
Clustering
• Clustering may be followed by a second step of classification learning in which rules are learned that give an intelligible description of how new instances should be placed into the clusters.
Representing clusters
• The output takes the form of a diagram that shows how the instances fall into clusters.
• Different cases:
  – Simple 2D representation: involves associating a cluster number with each instance
  – Venn diagram: allows one instance to belong to more than one cluster
  – Probabilistic assignment: associates instances with clusters probabilistically
  – Dendrogram: produces a hierarchical structure of clusters (dendron is the Greek word for tree)
Representing clusters
Representing clusters
Interesting Patterns
Interesting Patterns
• Objective vs. subjective interestingness measures
  – Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  – Subjective: based on the user's belief in the data, e.g., unexpectedness, novelty, actionability, etc.
References
References
• Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Elsevier Inc., 2005. (Chapter 2 & 3)
• J. Han, M. Kamber, Data Mining: Concepts and Techniques, Elsevier Inc., 2006. (Chapter 1)
The end