CS6604 Digital Libraries Social Communities Knowledge Management: Social Interactome Final Term Project Presentation Presenter Prashant Chandrasekar {peecee}@vt.edu Instructor Dr. Edward A. Fox Virginia Polytechnic Institute and State University Blacksburg, VA, 24061 May 2, 2017



Slide 2. Acknowledgements

• Dr. Edward A. Fox
• Global Events team
• Social Interactome team
  • The Social Interactome of Recovery: Social Media as Therapy Development (NIH Grant 1R01DA039456-01)
• Xuan Zhang and Yufeng Ma
• Mostafa Mohammed

Slide 3. Outline

• Background
  • Social network community; Social Interactome
  • Data
• Challenges
• Goal
• Approaches
  • Network Classification
  • Learning via Markov Logic Networks
• Future Work

Slide 4. Background: Social Interactome (SI)

• Social Interactome
  • NIH-funded project conducted by a team of researchers
  • Studies the community of people who are recovering from addiction
  • Studies their interactions in an online social network built to support and manage their recovery
• The project is broken down into a set of “test vs. control” experiments with the following variables defined:
  • Duration of study
  • Number of participants required
  • Avenue of recruitment
  • Null and alternative hypotheses

Slide 5. Background: SI Setup

• The project is broken down into a set of clinical trials. For each clinical trial:
  • The team decides on a set of null and alternative hypotheses and the duration of the trial
  • The team recruits participants for the trial
  • The team organizes the participants into one of two (or more) 128-node social networks
  • Participants interact with the website and their assigned friends
• Two 16-week clinical trials have been completed, along with a set of smaller-scale trials executed via Amazon MTurk

Slide 6. Background: SI Participant Info

Participants (diagram of the information collected for each participant):

• Demographic
• Family's history with addiction
• Past social network experience
• Primary Addiction
• Secondary Addiction
• Addiction Severity Index
• Social Connected Scale
• Adult Social Network Index
• DSM-V
• Big 5 Personalities
• Recovery Capital Scale
• Relapse
• Assessment Recovery Capital
• Minute Discounting
• Religious Commitment Inventory
• Recovery Participation Scale
• Family Info
• ...

- Collected from 19,070 questions
- ~10 psychology-based measures
- 16 surveys

Slide 7. Background: SI Website Use Data

Participants (diagram of the website activity recorded for each participant):

• TES Modules / TES scores
• News Stories
• Success Stories
• Unpaid Assessments
• Video Meetings
• Private Messages
• Response From Admin
• Posts
• Posts / Likes / Shares / Comments
• Pictures / Links Uploads

Slide 8. Overall Challenges

• How do you organize the data?
• How do you validate/clean the data?
• What do you analyze first, and in what order do you go about it?
• How do you make sense of the data?
• How do you interpret psychology-related measures?
• Big goal: How do you streamline the entire process, from data collection to analyses to presentation, so that it is reproducible and extensible?

Slide 9. Goal

• Goal: Investigate/explore ways to model the data and recommend an approach.
• Approaches to understand the data
  • Frequency distributions / histograms
  • Time series
  • Checking for correlations
  • Comparing means and standard deviations
    • t-tests
  • Statistical modeling

Slide 10. Approaches

• Statistical Modeling
  • What do we model?
    • Substance relapse
    • Engagement / change in engagement
    • Change in psychology-related measures
    • Change in behavior
    • Homophily
    • Friendship or trust
  • Factors
    • Classification: What would be the predictor variables? Response variables?
    • PGMs (probabilistic graphical models): Directed or undirected? What would be the factors?

Slide 11. Approaches

• Classification
  • Network classification using NetKit-SRL (Statistical Relational Learning) [1] [focus of the presentation]
  • Learning using Markov Logic Networks [2]

[1] Sofus A. Macskassy, Foster Provost. "Classification in Networked Data: A Toolkit and a Univariate Case Study," Journal of Machine Learning Research, 8(May):935-983, 2007.
[2] Domingos, Pedro and Richardson, Matthew (2007). Markov Logic: A Unifying Framework for Statistical Relational Learning. In L. Getoor and B. Taskar (eds.), Introduction to Statistical Relational Learning (pp. 339-371). Cambridge, MA: MIT Press.

Slide 12. Network Classification

• Idea: Take advantage of relational information, in addition to attribute information, for entity classification. Example: networked data.
• Focuses on within-network classification
  • Networks of web pages, research papers, social networks, etc.
• NetKit-SRL: Toolkit developed to employ statistical relational learning and inference

Slide 13. Network Classification

• NetKit-SRL
  • Network learning toolkit for classification and inference
  • Developed by Dr. Macskassy & Dr. Provost
  • Has 3 components
    • Non-relational model
    • Relational model
    • Collective inference
• Specific outcomes:
  • Maximize P(x | G^K), where x are the labels to be estimated and G^K is everything known in the network
  • Estimating the joint distribution over the labels
• Input:
  • Graph with edges describing relationships and attributes of nodes

Slide 14. Network Classification

• NetKit-SRL Components

Component | Purpose | Approaches
Local (Non-relational) Classifier | Returns a model which uses only attributes of a node to estimate its class label. | 1) Uniform prior; 2) Class-prior
Relational Classifier | Returns a model which uses not only the local attributes of a node but also attributes of related nodes, including their (estimated) class membership. | 1) Weighted-vote relational neighbor; 2) Class-distributional relational neighbor; 3) Network-only multinomial Bayes classifier with Markov Random Field estimation
Collective Inference | This module applies collective inference in order to (approximately) maximize the joint probability of the labels of all nodes in the graph whose labels were initially unknown. | 1) Relaxation labeling; 2) Iterative classification; 3) Gibbs sampling
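The two local-classifier approaches named in the components table can be sketched in a few lines of Python. This is only an illustration, not NetKit-SRL code: the function names `uniform_prior` and `class_prior`, the participant ids, and the class names are all hypothetical.

```python
from collections import Counter


def uniform_prior(classes):
    """Give every class the same probability, ignoring how often it occurs."""
    classes = sorted(set(classes))
    return {c: 1.0 / len(classes) for c in classes}


def class_prior(known_labels):
    """Estimate P(c) as the relative frequency of c among the known labels."""
    counts = Counter(known_labels.values())
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


# Hypothetical toy labels, not taken from the study data.
known = {"p1": "alcohol", "p2": "alcohol", "p3": "opioids", "p4": "alcohol"}
prior = class_prior(known)
uniform = uniform_prior(known.values())
```

Either dictionary can serve as the starting estimate for every node whose label is unknown, before the relational classifier and collective inference refine it.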

Slide 15. Network Classification

• Possible instantiations

Author | Non-relational Classifier | Relational Classifier | Collective Inference
Chakrabarti et al. (1998) [1] | Naïve Bayes classifier | Naïve Bayes Markov Random Field | Relaxation labeling
Lu & Getoor (2003) [2] | Logistic regression | Logistic regression | Iterative classification
Macskassy & Provost (2003) [3] | Class priors | Majority vote of neighboring classes | Relaxation labeling

[1] Chakrabarti, S., Dom, B., & Indyk, P. (1998). Enhanced Hypertext Categorization Using Hyperlinks. Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 307–318).
[2] Lu, Q., & Getoor, L. (2003). Link-Based Classification. International Conference on Machine Learning, ICML-2003 (pp. 496–503).
[3] Macskassy, S. A., & Provost, F. (2003). A Simple Relational Classifier. Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM-2003) at KDD-2003 (pp. 64–76).

Slide 16. Network Classification

• Weighted-vote relational neighbor classifier (wvRN)
  • Authors: Macskassy, S. A., & Provost, F. (2003)
  • Estimates class membership by assuming the existence of homophily
  • Weighted mean of the class-membership probabilities of the entities in D_e (where D_e is the set of neighbors of entity/node e)
  • $P(c \mid e) = \frac{1}{Z} \sum_{e_j \in D_e} w(e, e_j) \, P(c \mid e_j)$, where $Z$ is the normalizer
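The weighted-vote estimate can be written out as a short Python sketch. It assumes the graph is stored as a dictionary of neighbor weights and that each neighbor already has a class-probability estimate; the names `wvrn`, `neighbors`, and `probs`, and the toy values, are illustrative and are not part of NetKit-SRL.

```python
def wvrn(node, neighbors, probs):
    """Weighted-vote relational neighbor estimate of P(c | node).

    neighbors: dict mapping node -> {neighbor: edge weight w(e, e_j)}
    probs:     dict mapping node -> {class: current estimate P(c | e_j)}
    """
    scores = {}
    for nbr, weight in neighbors.get(node, {}).items():
        for cls, p in probs.get(nbr, {}).items():
            scores[cls] = scores.get(cls, 0.0) + weight * p
    z = sum(scores.values())  # normalizer Z, so the estimates sum to 1
    if z == 0:
        return {}  # no neighbor information available
    return {cls: s / z for cls, s in scores.items()}


# Hypothetical toy graph: node "e" has two labeled neighbors.
neighbors = {"e": {"a": 3.0, "b": 1.0}}
probs = {
    "a": {"alcohol": 1.0, "opioids": 0.0},
    "b": {"alcohol": 0.0, "opioids": 1.0},
}
estimate = wvrn("e", neighbors, probs)
```

In the toy graph the heavier edge to "a" pulls the estimate for "e" toward the class of "a", which is the homophily assumption at work.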

Slide 17. Network Classification

• Collective Inference using Relaxation Labeling
  • Definition of collective inference:
  • Similar to, but different from, Gibbs sampling in that it:
    • Keeps track of class probability estimates for X^U (the nodes whose labels are unknown)
    • Instead of updating the graph one node at a time, updates the class probabilities of all vertices at iteration t+1, based on the estimates from iteration t.
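The synchronous update described above can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a weighted-vote relational classifier as the per-node estimator, runs a fixed number of iterations, and omits any damping or annealing schedule that a real toolkit may apply. All names and the toy chain graph are hypothetical.

```python
def wvrn(node, neighbors, probs):
    """Weighted vote over the neighbors' current class estimates."""
    scores = {}
    for nbr, w in neighbors.get(node, {}).items():
        for cls, p in probs.get(nbr, {}).items():
            scores[cls] = scores.get(cls, 0.0) + w * p
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()} if z > 0 else None


def relaxation_labeling(neighbors, known, prior, iterations=10):
    """Synchronously re-estimate class probabilities for unlabeled nodes."""
    nodes = set(neighbors) | {n for nbrs in neighbors.values() for n in nbrs}
    # Known nodes are clamped to their label; unknown nodes start at the prior.
    probs = {n: ({known[n]: 1.0} if n in known else dict(prior)) for n in nodes}
    unknown = [n for n in nodes if n not in known]
    for _ in range(iterations):
        # Every update for step t+1 reads only the estimates from step t.
        updates = {n: wvrn(n, neighbors, probs) for n in unknown}
        for n, est in updates.items():
            if est is not None:
                probs[n] = est
    return probs


# Hypothetical chain a - u1 - u2 - b with unit edge weights.
neighbors = {
    "a": {"u1": 1.0},
    "u1": {"a": 1.0, "u2": 1.0},
    "u2": {"u1": 1.0, "b": 1.0},
    "b": {"u2": 1.0},
}
known = {"a": "alcohol", "b": "opioids"}
prior = {"alcohol": 0.5, "opioids": 0.5}
result = relaxation_labeling(neighbors, known, prior)
```

Because all updates are computed before any is applied, the result does not depend on the order in which nodes are visited, unlike a one-node-at-a-time scheme.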

Slide 18. Network Classification: Experiment

• Experiment
  • Rationale: Participants who are “homophilous” (who share a common background) have common interests.
  • Hypothesis: Given a set of common interests between pairs of participants, one can predict the homophily measures with good accuracy.
• Input graph
  • Nodes: Participants
  • Attributes: Addiction, Education, Income
  • Edges: The edge weight is the number of news stories + success stories + educational modules that both nodes (connected via the edge) have viewed.
  • Predicted attribute: Addiction
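The edge-weight definition above can be illustrated with a small Python sketch. It assumes each participant's viewed items are available as a set of ids, and that a pair with no shared items gets no edge; the function name `co_view_edges`, the id format, and the toy data are hypothetical, and the real extraction pipeline may differ.

```python
from itertools import combinations


def co_view_edges(views):
    """Build weighted edges from co-viewed content.

    views: dict mapping participant -> set of viewed item ids
           (news stories, success stories, educational modules).
    Returns a dict mapping (u, v) -> number of items both have viewed.
    """
    edges = {}
    for u, v in combinations(sorted(views), 2):
        shared = len(views[u] & views[v])
        if shared > 0:  # assumption: no shared items means no edge
            edges[(u, v)] = shared
    return edges


# Hypothetical toy view logs, not taken from the study data.
views = {
    "p1": {"news:1", "news:2", "module:7"},
    "p2": {"news:2", "module:7", "success:3"},
    "p3": {"success:9"},
}
edges = co_view_edges(views)
```

Here "p1" and "p2" share two items, so they are joined by an edge of weight 2, while "p3" stays isolated.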

Slide 19. Network Classification: Experiment Config

• Possible experiment configurations
  • Non-relational classifier: None
  • Relational classifier: (Options)
    • Weighted-Vote Relational Neighbor
    • Class-Distributional Relational Neighbor
  • Collective inference: (Options)
    • Relaxation labeling
    • Gibbs sampling
    • Iterative classification
• Data: Nodes and edges extracted from experiment 1 replicate 2 (E1R2) participant interactions.
• Goal: Predict 1) Primary Addiction (given graph); 2) Education (given graph); 3) Income bracket (given graph)

Slide 20. Network Classification: Experiment

• E1R2 data statistics
  • # of nodes: 256; # of edges: 436

[Figure: bar chart "Primary substance breakdown among 256 participants"; x-axis: Primary Substance; y-axis: Frequency (ticks from 20 to 160). Legible bar labels include 30, 139, and 41; the remaining labels and the category names did not survive extraction.]

Slide 21. Network Classification: Experiment

• Experiment E1R2 data statistics
  • Edge weight breakdown

[Figure: bar chart "Edge weight breakdown"; x-axis: Edge Strength; y-axis: Frequency (0 to 350).]

Edge strength | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 11 | 12
Frequency | 317 | 66 | 24 | 11 | 7 | 5 | 1 | 3 | 2 | 1

(The labels 24 and 11 appear fused as "2411" in the extracted text.)

Slide 22. Network Classification: Experiment Results

• Network classification framework results (experiment configurations given as row/column names; metric: accuracy)
• Goal: Predict “Primary Addiction” of participants

Relational Classifier \ Collective Inference | Relaxation Labeling | Gibbs Sampling | Iterative Classification
Weighted-Vote Relational Neighbor (wvRN) | 0.36601 | 0.37908 | 0.39216
Class-Distributional Relational Neighbor | 0.15686 | 0.22222 | 0.18954

Slide 23. Network Classification: Experiment Results

• Predicted response/class = Primary Addiction
• Configuration: wvRN with relaxation labeling
• Confusion matrix
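For readers unfamiliar with the metric, the relationship between a confusion matrix and accuracy can be sketched as follows. The labels below are hypothetical placeholders, not the study's results, and the function names are illustrative.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true class, predicted class) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy(matrix):
    """Fraction of nodes on the diagonal, i.e. predicted class equals true class."""
    correct = sum(n for (t, p), n in matrix.items() if t == p)
    total = sum(matrix.values())
    return correct / total


# Hypothetical toy labels, not taken from the study data.
true_labels = ["alcohol", "alcohol", "opioids", "cocaine"]
predicted = ["alcohol", "opioids", "opioids", "alcohol"]
matrix = confusion_matrix(true_labels, predicted)
score = accuracy(matrix)
```

The off-diagonal counts show which classes are confused with which, detail that a single accuracy figure hides.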

Slide 24. Network Classification: Experiment Results

• Predicted response/class = Education
• Configuration: wvRN with relaxation labeling
• Confusion matrix

Slide 25. Network Classification: Experiment Results

• Predicted response/class = Income
• Configuration: wvRN with relaxation labeling
• Confusion matrix

Slide 26. Network Classification: Experiment Conclusion

• Conclusion
  • The highest accuracy across all experiment configurations for predicting primary addiction, as shown in slide 22, is 0.392.
  • The confusion matrices for predicting primary addiction, education, and income show more detail on the accuracy of predicting each class.
  • The accuracy is low.
    • This is probably because our experiment configuration does NOT include a non-relational component.
    • Furthermore, our graph edges and attributes have only 1-3 fields. The graph needs to be denser, with much more information, to be useful for network-based inference.

Slide 27. Network Classification: Next Steps

• Possible extensions of the work:
  • Build the graph with different representations of edges
  • Construct more node attributes for the non-relational (local) classifier step
  • Try experiments with priors learned from various traditional classification models.
• Problem/Challenge
  • Extension or further work is open-ended.
  • Part of doctoral work: Build a logical flowchart of inquiries/hypotheses.
  • The logical flowchart of inquiries can be used and called upon based on the user’s line of inquiry.

Slide 28. Learning via Markov Logic Networks

• A Markov Logic Network (MLN) is a set of pairs (F, w) where
  • F is a formula in first-order logic
  • w is a real number
• Together with a set of constants, it defines a Markov network with
  • One node for each grounding of each predicate in the MLN
  • One feature for each grounding of each formula F in the MLN, with the corresponding weight w

*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt
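The "one node for each grounding of each predicate" rule can be made concrete with a short Python sketch. It uses the Smokes, Cancer, and Friends predicates and the two constants A and B from the deck's running example; the function name `ground_atoms` and the arity dictionary are illustrative, and this is not how Alchemy represents groundings internally.

```python
from itertools import product


def ground_atoms(predicates, constants):
    """Enumerate every grounding of every predicate over the constants.

    predicates: dict mapping predicate name -> arity
    constants:  list of constant symbols
    """
    atoms = []
    for name, arity in predicates.items():
        for args in product(constants, repeat=arity):
            atoms.append(f"{name}({','.join(args)})")
    return atoms


predicates = {"Smokes": 1, "Cancer": 1, "Friends": 2}
atoms = ground_atoms(predicates, ["A", "B"])
```

With two constants the unary predicates contribute two nodes each and the binary predicate contributes four, giving the eight nodes of the ground network.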

Slide 29. Learning via Markov Logic Networks

Two constants: Anna (A) and Bob (B)

[Figure: ground Markov network with nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B)]

*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt

Slide 30. Learning via Markov Logic Networks

[Figure: ground Markov network with nodes Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)]

*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt

Slide 31. Learning via Markov Logic Networks

[Figure: ground Markov network with nodes Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)]

*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt

Slide 32. Learning via Markov Logic Networks

[Figure: ground Markov network with nodes Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)]

Probability of a world x:

$$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i \, n_i(x) \right)$$

where $w_i$ is the weight of formula $i$ and $n_i(x)$ is the number of true groundings of formula $i$ in $x$.

*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt
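The probability of a world can be computed by brute force for a network this small. The sketch below is an illustration under assumptions: the deck's extracted text does not state the formulas or weights, so the two formulas (Smokes(x) implies Cancer(x), and Friends(x, y) implies that x and y have the same smoking status) and their weights 1.5 and 1.1 are assumed here purely for the example. All names are hypothetical.

```python
import math
from itertools import product

CONSTANTS = ["A", "B"]

# Assumed formulas and weights, for illustration only.
W_SMOKE_CANCER = 1.5  # Smokes(x) => Cancer(x)
W_FRIENDS = 1.1       # Friends(x, y) => (Smokes(x) <=> Smokes(y))

ATOMS = (
    [("Smokes", c) for c in CONSTANTS]
    + [("Cancer", c) for c in CONSTANTS]
    + [("Friends", a, b) for a in CONSTANTS for b in CONSTANTS]
)


def weighted_count(world):
    """Return sum_i w_i * n_i(x) for a world mapping each ground atom to a bool."""
    n1 = sum((not world[("Smokes", c)]) or world[("Cancer", c)] for c in CONSTANTS)
    n2 = sum(
        (not world[("Friends", a, b)]) or (world[("Smokes", a)] == world[("Smokes", b)])
        for a in CONSTANTS
        for b in CONSTANTS
    )
    return W_SMOKE_CANCER * n1 + W_FRIENDS * n2


def all_worlds():
    """Yield every truth assignment to the 8 ground atoms (2**8 worlds)."""
    for values in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


# Partition function Z: sum of the unnormalized weights of all worlds.
Z = sum(math.exp(weighted_count(w)) for w in all_worlds())


def probability(world):
    """P(x) = exp(sum_i w_i * n_i(x)) / Z."""
    return math.exp(weighted_count(world)) / Z


all_false = {atom: False for atom in ATOMS}
smoker_without_cancer = dict(all_false)
smoker_without_cancer[("Smokes", "A")] = True
```

A world that violates a grounding of a formula is less probable, not impossible: `smoker_without_cancer` breaks one grounding of the first formula, so its probability is lower than that of `all_false`. Exhaustive enumeration only works for toy networks; practical systems rely on approximate inference.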

Slide 33. Learning via Markov Logic Networks

Tasks/Applications

• Basics
• Logistic regression
• Hypertext classification
• Information retrieval
• Entity resolution
• Hidden Markov models
• Information extraction
• Statistical parsing
• Semantic processing
• Bayesian networks
• Relational models
• Robot mapping
• Planning and MDPs
• Practical tips

*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt

Slide 34. Future work

• Next steps
  • Extract more attributes for each participant
  • Compile different ways to represent edge weight
  • Build a local classifier and test its results with NetKit-SRL
  • Use Alchemy to represent the data using Markov Logic Networks.

Slide 35. Questions?

Slide 36. Network Classification

• Other works
  • Inductive logic programming
  • Markov random fields
  • Conditional random fields
  • Probabilistic relational models
  • Relational Bayesian networks
  • Relational dependency networks
  • Relational Markov networks