Social networks analysis seminar, introductory lecture #2. Danny Hendler and Yehonatan Cohen. Advanced Topics in On-line Social Networks Analysis.
Seminar schedule
- 5/3/14: Introductory lecture #1
- 10/3/14: Papers list published; students send their 3 preferences
- 12/3/14: Introductory lecture #2
- 14/3/14: All students' preferences must be received
- 19/3/14: No seminar (Purim!)
- 26/3/14: Student talks start
- 11 weeks of student talks, until the semester ends

Talk outline
- Nodes centrality
  - Degree
  - Closeness
  - Betweenness
- Machine learning

Nodes centrality
Name the most central/significant node:
[Figure: example graph with nodes 1-13]

Nodes centrality
What makes a node central?
- The number of its connections
- It disconnects the graph if removed
- A high number of paths pass through it
- Its proximity to all other nodes
- Its neighbors are themselves central
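The first three criteria in the list above correspond to the degree, closeness and betweenness measures discussed next. The slides do not spell out formulas, so the sketch below assumes common textbook definitions: degree is the number of neighbors, closeness is (n - 1) divided by the sum of shortest-path distances (assuming a connected graph), and betweenness sums, over all pairs of other nodes, the fraction of shortest paths passing through the node, computed here with Brandes' algorithm. The five-node graph `GRAPH` is hypothetical, not the graph from the slides, and the slides' "Reach" values may use a different normalization.

```python
from collections import deque

# Hypothetical undirected example graph as adjacency sets (NOT the graph from the slides).
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}


def degree_centrality(graph):
    """Number of connections of each node."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def bfs_distances(graph, source):
    """Hop distances from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(graph):
    """(n - 1) / sum of distances to all other nodes; assumes a connected graph."""
    n = len(graph)
    result = {}
    for v in graph:
        total = sum(bfs_distances(graph, v).values())
        result[v] = (n - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


if __name__ == "__main__":
    print(degree_centrality(GRAPH))
    print(closeness_centrality(GRAPH))
    print(betweenness_centrality(GRAPH))
```

In this toy graph, node d has the same degree as b and c but the highest betweenness, because every shortest path to e passes through it; this is the kind of disagreement between measures that the "Name the most central node" slides invite.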
Nodes centrality: Applications
- Detection of the most popular actor in a network
- Spamming / advertising
- Network vulnerability
- Health care / epidemics
- Clustering similar structural positions
- Recommendation systems

Nodes centrality: Degree
Name the most central/significant node:
[Figure: example graph with nodes 1-9]

[Figure: example graph with nodes 1-13]
Node  Degree
4     4
6     3
7     3
8     3
9     3
10    3
11    2
12    2

Nodes centrality: Closeness (Reach)
[Figure: example graph with nodes 1-13]
Node  Degree  Reach
4     4       5.84
6     3       5.93
7     3       6.12
8     3       5.75
9     3       5.25
10    3       5.18
11    2
12    2

Nodes centrality: Betweenness
[Figure: example graph with nodes 1-13]
Node  Reach  Betweenness
4     5.84   60
6     5.93   78
7     6.12   72
8     5.75   43
9     5.25   15
10    5.18   41
11
12

Talk outline
- Nodes centrality
- Machine learning
  - The learning process
  - Classification
  - Evaluation

Machine Learning
Herbert Alexander Simon: "Learning is any process by which a system improves performance from experience."
Machine learning is concerned with computer programs that automatically improve their performance through experience.
Herbert Simon: Turing Award 1975, Nobel Prize in Economics 1978
Learning = improving with experience at some task:
- Improve over task T,
- with respect to performance measure P,
- based on experience E.
Machine Learning
Example: Spam filtering
- T: Identify spam emails.
- P: The percentage of spam emails that were filtered, and the percentage of ham (non-spam) emails that were incorrectly filtered out.
- E: A database of emails labelled by users, i.e., user feedback on emails: "Move to Spam", "Move to Inbox".
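The performance measure P in this example combines two percentages. A minimal sketch, assuming each email is recorded as a hypothetical `(filtered, is_spam)` pair, where `filtered` is the filter's decision and `is_spam` is the label derived from the user's feedback ("Move to Spam" meaning spam, "Move to Inbox" meaning ham); the function name and record format are illustrative, not from the slides.

```python
def spam_filter_performance(records):
    """Compute the two percentages that make up the performance measure P.

    records: iterable of (filtered, is_spam) pairs (hypothetical format), where
    `filtered` is True if the filter moved the email to spam and `is_spam` is
    the ground-truth label taken from user feedback.
    Returns (% of spam emails that were filtered, % of ham emails wrongly filtered out).
    """
    spam_total = spam_caught = ham_total = ham_filtered = 0
    for filtered, is_spam in records:
        if is_spam:
            spam_total += 1
            spam_caught += filtered  # True counts as 1
        else:
            ham_total += 1
            ham_filtered += filtered
    spam_pct = 100.0 * spam_caught / spam_total if spam_total else 0.0
    ham_pct = 100.0 * ham_filtered / ham_total if ham_total else 0.0
    return spam_pct, ham_pct


if __name__ == "__main__":
    # Hypothetical feedback: 4 spam emails (3 caught) and 5 ham emails (1 wrongly filtered).
    sample = [(True, True), (True, True), (True, True), (False, True),
              (False, False), (False, False), (True, False), (False, False), (False, False)]
    print(spam_filter_performance(sample))
```

Reporting the two numbers separately matters because a filter can push the first percentage up simply by filtering more aggressively, at the cost of the second.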
Machine Learning
Applications?

Machine Learning: The learning process
[Figure: model learning and model testing]
[Figure: email server; features: content of the email, number of recipients, size of message, number of attachments, number of "re's" in the subject line; model learning and model testing]

From e-mails to feature vectors
Textual-based content features:
- The email is tokenized.
- Each token is a feature.
Meta-features:
- Number of recipients
- Size of message

Machine Learning: The learning process
Binary attributes: each vocabulary word is an input attribute, each email is an instance, and Email Type is the target attribute.

Email Type  Free  . . .  Lottery  Viagra
Ham         0     . . .  1        0
Ham         1     . . .  0        1
Spam        0     . . .  0        0
Spam        1     . . .  1        1
Ham         0     . . .  0        0
Ham         1     . . .  1        0
Spam        0     . . .  0        1

Machine Learning: The learning process
Attributes of mixed types: Customer Type is ordinal, Country (IP) is nominal, and Email Length (K) and Number of New Recipients are numeric; Email Type is the target attribute, and each row is an instance.

Email Type  Customer Type  Country (IP)  Email Length (K)  Number of New Recipients
Ham         Gold           Germany       2                 0
Ham         Silver         Germany       4                 1
Spam        Bronze         Nigeria       2                 5
Spam        Bronze         Russia        4                 2
Ham         Bronze         Germany       4                 3
Ham         Silver         USA           1                 0
Spam        Silver         USA           2                 4
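The path from an e-mail to a feature vector can be sketched as follows. This is a minimal illustration under stated assumptions: tokenization is a simple lower-case regular-expression split, the vocabulary is fixed to the three words in the binary table (a real vocabulary would be built from the training emails), the email is a hypothetical dict with `subject`, `body` and `recipients` keys, and "size of message" is taken to be the character count of subject plus body.

```python
import re

# Vocabulary words used as binary features (taken from the example table; hypothetical subset).
VOCABULARY = ["free", "lottery", "viagra"]


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def to_feature_vector(email):
    """Map a hypothetical email dict to a feature vector.

    The first len(VOCABULARY) entries are binary token features (1 if the word occurs);
    the last two are meta-features: number of recipients and message size in characters.
    """
    tokens = set(tokenize(email["subject"] + " " + email["body"]))
    textual = [1 if word in tokens else 0 for word in VOCABULARY]
    size = len(email["subject"]) + len(email["body"])
    meta = [len(email["recipients"]), size]
    return textual + meta


if __name__ == "__main__":
    example = {"subject": "Free entry", "body": "Win the LOTTERY now",
               "recipients": ["a@example.org", "b@example.org"]}
    print(to_feature_vector(example))
```

Each email thus becomes one instance (a row of the tables above), with the user's Ham/Spam label supplying the target attribute.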
Machine Learning: Model learning
[Figure: a learner produces a classifier]

Machine Learning: Model testing
[Figure: database, training set, learner]
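Model learning and model testing can be illustrated with a hold-out split: the learner sees only the training set, and the resulting classifier is scored on instances it never saw. A minimal sketch, assuming a hypothetical 70/30 split (the slides give no ratio) and a trivial majority-class learner as a stand-in; neither is a method from the slides.

```python
import random
from collections import Counter


def train_test_split(instances, test_fraction=0.3, seed=0):
    """Shuffle a copy of the labelled instances and hold out a fraction for testing."""
    data = list(instances)
    random.Random(seed).shuffle(data)
    n_test = int(round(len(data) * test_fraction))
    return data[n_test:], data[:n_test]  # (training set, test set)


def majority_learner(training_set):
    """Stand-in learner: returns a classifier that always predicts the most common label."""
    majority = Counter(label for _, label in training_set).most_common(1)[0][0]
    return lambda features: majority


def accuracy(classifier, test_set):
    """Fraction of test instances whose label the classifier predicts correctly."""
    correct = sum(1 for features, label in test_set if classifier(features) == label)
    return correct / len(test_set)


if __name__ == "__main__":
    # Hypothetical labelled instances: (feature tuple, label).
    data = [((i,), "Ham" if i % 3 else "Spam") for i in range(10)]
    train, test = train_test_split(data)
    print(accuracy(majority_learner(train), test))
```

Any real learner (for example, the decision tree that follows) can replace `majority_learner` as long as it returns a function from features to a label.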
Machine Learning: Decision trees
Training data: Refund (categorical), Marital Status (categorical), Taxable Income (continuous), Cheat (class).
Model: a decision tree, grown one splitting attribute at a time:

Refund
  Yes: NO
  No: MarSt
    Married: NO
    Single, Divorced: TaxInc
      < 80K: NO
      > 80K: YES

Machine Learning: Classification
Binary classification
- (Instances, class labels): $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where $y_i \in \{1, -1\}$
- Classifier: provides a class prediction $\hat{y}$ for an instance
- Outcomes for a prediction:
                      True class
                      1                      -1
Predicted class   1   True positive (TP)     False positive (FP)
                 -1   False negative (FN)    True negative (TN)

Machine Learning: Classification
- $P(\hat{Y} = Y)$: accuracy
- $P(\hat{Y} = 1 \mid Y = 1)$: true positive rate
- $P(\hat{Y} = 1 \mid Y = -1)$: false positive rate
- $P(Y = 1 \mid \hat{Y} = 1)$: precision
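The four outcomes and the probabilities above can be estimated by counting over a labelled test set. A minimal sketch, assuming labels in {1, -1} as on the slide; the function names are hypothetical, and the ratios assume that both classes occur and that at least one instance is predicted positive (otherwise a denominator is zero).

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, FN, TN) for labels in {1, -1}."""
    tp = fp = fn = tn = 0
    for y, y_hat in zip(y_true, y_pred):
        if y_hat == 1 and y == 1:
            tp += 1
        elif y_hat == 1 and y == -1:
            fp += 1
        elif y_hat == -1 and y == 1:
            fn += 1
        else:  # y_hat == -1 and y == -1
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Empirical estimates of the probabilities defined on the slide."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,          # P(Y_hat = Y)
        "true_positive_rate": tp / (tp + fn),   # P(Y_hat = 1 | Y = 1)
        "false_positive_rate": fp / (fp + tn),  # P(Y_hat = 1 | Y = -1)
        "precision": tp / (tp + fp),            # P(Y = 1 | Y_hat = 1)
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, -1, -1, -1, -1, 1]
    y_pred = [1, -1, 1, 1, -1, -1, -1, 1]
    print(confusion_counts(y_true, y_pred))
    print(classification_metrics(y_true, y_pred))
```

Note that the true and false positive rates condition on the true class (columns of the matrix), while precision conditions on the predicted class (a row).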
Machine Learning: Classification
Consider a diagnostic test for a disease. The test has 2 possible outcomes:
- positive: suggesting presence of the disease
- negative: suggesting absence of the disease
An individual can test either positive or negative for the disease.

[Figure: distributions of the test result for individuals with the disease and individuals without the disease; patients on one side of a cut-off are called negative and those on the other side positive, and successive slides highlight the true positives, false positives, true negatives and false negatives]

Machine Learning: Cross-Validation
What if we don't have enough data to set aside a test dataset?
Cross-validation: each data point is used both as training and as test data.
Basic idea:
- Fit the model on 90% of the data; test on the other 10%.
- Now do this on a different 90/10 split.
- Cycle through all 10 cases.
10 folds is a common rule of thumb.
Machine Learning: Cross-Validation
- Divide the data into 10 equal pieces $P_1, \ldots, P_{10}$.
- Fit 10 models, each on 90% of the data.
- Each data point is treated as an out-of-sample data point by exactly one of the models.
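The 10-fold procedure can be sketched in a few lines. This is an illustration under assumptions: the caller supplies hypothetical `fit(training_set)` and `score(model, test_set)` functions, the data are shuffled with a fixed seed, and when the number of instances is not divisible by k the folds differ in size by at most one.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds whose sizes differ by at most one."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(instances, fit, score, k=10, seed=0):
    """Return the k out-of-sample scores.

    fit(training_set) -> model and score(model, test_set) -> float are supplied by the
    caller (hypothetical interface). Each instance lands in exactly one test fold.
    """
    folds = k_fold_indices(len(instances), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        test_set = [instances[j] for j in test_idx]
        # Train on the other k - 1 folds (about 90% of the data when k = 10).
        train_set = [instances[j] for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit(train_set)
        scores.append(score(model, test_set))
    return scores
```

The mean of the returned scores is the usual cross-validated estimate; every model is evaluated only on the piece it was not fitted on.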
Training data for the decision-tree example:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
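The decision tree built in the decision-tree slides can be written as nested conditions and checked against this table. A minimal sketch: incomes are stored in thousands, and because the slides only show the branches "< 80K" and "> 80K", sending an income of exactly 80K to the NO branch is an assumption of this sketch.

```python
# Training records from the table: (Tid, Refund, Marital Status, Taxable Income in K, Cheat).
TRAINING_DATA = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]


def classify(refund, marital_status, taxable_income_k):
    """Follow the tree: split on Refund, then MarSt, then TaxInc."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income at 80K.
    # Assumption: exactly 80K goes to the "< 80K" (NO) branch.
    return "Yes" if taxable_income_k > 80 else "No"


if __name__ == "__main__":
    for tid, refund, status, income, cheat in TRAINING_DATA:
        print(tid, classify(refund, status, income), cheat)
```

Running it shows the predicted class next to the Cheat label for each Tid; the tree reproduces all ten labels, i.e. it has zero error on its own training data, which is exactly why a separate test set or cross-validation is needed to estimate how it performs on new records.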