Frontiers of Computational Journalism, Columbia Journalism School. Week 9: Social Network Analysis. November 20, 2015

Social Network Analysis. Computational Journalism week 10


Jonathan Stray, Columbia University, Fall 2015. Syllabus at http://www.compjournalism.com/?p=133


Network

A set of people and a set of connections between pairs of them.


Types of connections

Social network analysis: only one type of connection between individuals (e.g. "friend")

Link analysis: multiple types of connections: friend, brother, employer, went to university with, sold a car to, owns 51% of

Link analysis is much more relevant to journalism, because it allows representation of much more detail and context.


People Act in Groups

Family and friendships: I am most closely connected to a small set of people, who are usually closely connected to each other.

Business: I am much more likely to do business with people I already know.

Influence: I listen to people I know more than I listen to strangers.

Norms: what is right depends on what the people around me think.

People tend to marry, do business with, spend time with, etc. people from similar backgrounds... and people who have social ties tend to be similar.


Homophily

Homophily is the principle that contact between similar people occurs at a higher rate than among dissimilar people. The pervasive fact of homophily means that cultural, behavioral, genetic, or material information that flows through networks will tend to be localized. Homophily implies that distance in terms of social characteristics translates into network distance, the number of relationships through which a piece of information must travel to connect two individuals.

- McPherson, Smith-Lovin, Cook, Birds of a feather: homophily in social networks


Structure Relates to Behavior

In a 1951 experiment, researchers had five people work together, only allowed to communicate according to one of the patterns above. They were each given a card with several symbols on it. The task was to determine which symbol was in common between all of the cards. It was repeated many times.

How did the groups organize themselves? Which patterns were fastest?

From H. Leavitt, Some effects of certain communication patterns on group performance, Journal of Abnormal Psychology 46(1)


Correlation of different types of info

Suppose you have a record of phone numbers called, a database of political campaign donations, and a list of government appointees. Put them together, and you have this story:

WASHINGTON - Time and again, Texas Gov. Rick Perry picked up his office phone in the months before he would announce his bid for the presidency. He dialed wealthy friends who were his big fundraisers and state officials who owed him for their jobs.

Perry also met with a Texas executive who would later co-found an independent political committee that has promised to raise millions to support Perry but is prohibited from coordinating its activities with the governor.

- Jack Gillum, Perry called top donors from work phones, AP, 6 Dec 2011


Social Network Analysis in Journalism

• Identify people or communities
• Track money and criminal networks
• Understand spread of information and behavior
• Illustrate complex stories

Useful in all areas where CS intersects journalism! (Reporting, communication, filtering, effect tracking)


Two major analysis methods

…after you have the network data, which may be a very manual process.

• Look at a visualization
• Apply algorithm

In both cases, the results are not interpretable without context!


Force-Directed Layout

Each edge is a "spring" with a fixed preferred length, plus a global repulsive force that pushes all nodes apart.
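The spring-and-repulsion idea can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any particular tool: the inverse-square repulsion, the parameter values, the circular starting positions and the names `force_directed_layout` and `distance` are all assumptions made for the example.

```python
import math

def force_directed_layout(nodes, edges, spring_length=1.0, spring_k=1.0,
                          repulsion=0.1, steps=500, step_size=0.1):
    """Return {node: [x, y]} after a simple spring-and-repulsion simulation."""
    n = len(nodes)
    # Start evenly spaced on a unit circle so the run is deterministic.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    for _ in range(steps):
        force = {v: [0.0, 0.0] for v in nodes}
        # Global repulsion: every pair of nodes pushes apart (inverse-square).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Springs: each edge moves its endpoints toward the preferred length.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = spring_k * (dist - spring_length)
            force[a][0] -= pull * dx / dist
            force[a][1] -= pull * dy / dist
            force[b][0] += pull * dx / dist
            force[b][1] += pull * dy / dist
        # Move every node a small step along its net force.
        for v in nodes:
            pos[v][0] += step_size * force[v][0]
            pos[v][1] += step_size * force[v][1]
    return pos

def distance(pos, a, b):
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])

# Hypothetical three-node chain: a - b - c.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
layout = force_directed_layout(nodes, edges)
```

In this toy chain the connected pairs settle near the preferred spring length while the unconnected pair (a, c) is pushed further apart, which is why linked nodes end up drawn close together.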


From The Effect of Graph Layout on Inference from Social Network Data, Blythe et al.


From The Effect of Graph Layout on Inference from Social Network Data, Blythe et al.

We asked respondents three questions about the same five focal nodes in each sociogram:

1) how many subgroups were in the sociogram
2) how “prominent” was each player in the sociogram
3) how important a “bridging” role did each player occupy in the sociogram


Centrality

Often identified with "influence" or "power." Often important in journalism.

We can visualize the graph and use our eyes, or we can compute centrality values algorithmically.


Degree centrality: number of edges

Models: cases where the number of connections is important. Example: which celebrity can reach the most people at once?
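As a rough sketch, degree centrality is just a count of the edges touching each node. The edge list and node names below are hypothetical, and the function name `degree_centrality` is chosen for this example only.

```python
def degree_centrality(edges):
    """Count how many edges touch each node in an undirected edge list."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree

# Hypothetical toy network: "hub" is connected to everyone else.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
degrees = degree_centrality(edges)
most_connected = max(degrees, key=degrees.get)
```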


Closeness centrality: average distance to all other nodes

Models: cases where time taken to reach a node is important. Example: who finds out about gossip first?
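A minimal sketch of the "average distance" definition, using breadth-first search to measure distance in hops. It assumes a connected, undirected network; the adjacency dictionary and the name `average_distance` are made up for illustration. With this definition a lower value means a more central node.

```python
from collections import deque

def average_distance(adj, source):
    """Average number of hops from source to every other reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:          # first visit gives the shortest distance
                dist[w] = dist[v] + 1
                queue.append(w)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others)

# Hypothetical chain of five people: a - b - c - d - e.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
       "d": ["c", "e"], "e": ["d"]}
avg = {v: average_distance(adj, v) for v in adj}
closest = min(avg, key=avg.get)
```

In the chain, the middle person has the smallest average distance and so would tend to hear gossip soonest.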


Betweenness centrality: number of shortest paths that pass through node

Models: cases where control over transmission is important. Example: who has the most power to make introductions?
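One way to sketch this is by brute force: run a breadth-first search from every node, then check for each pair of endpoints which other nodes lie on a shortest path between them. When a pair has several equally short paths, this sketch credits a node with the fraction of them that pass through it, which is the usual convention rather than something stated on the slide. The graph and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Return (distance, number of shortest paths) from source to each node."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                paths[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v is a predecessor of w
                paths[w] += paths[v]
    return dist, paths

def betweenness(adj):
    info = {v: bfs_counts(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):    # each unordered pair of endpoints
        dist_s, paths_s = info[s]
        dist_t, paths_t = info[t]
        if t not in dist_s:
            continue                     # s and t are not connected
        for v in adj:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v is on a shortest s-t path if the two legs add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += paths_s[v] * paths_t[v] / paths_s[t]
    return score

# Hypothetical network: two triangles joined by the single edge c - d.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
       "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"]}
score = betweenness(adj)
```

Here c and d sit on every path between the two triangles, so they are the only nodes with a nonzero score: anyone wanting an introduction across the groups has to go through them.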


Eigenvector centrality: how likely you are to end up at a node on a random walk

(same idea as PageRank)

Models: cases where importance of neighbors is important. Example: the private adviser to the president
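A hedged sketch of the adviser example, using the standard textbook definition of eigenvector centrality: each node's score is proportional to the sum of its neighbors' scores, found here by power iteration on the adjacency matrix. Adding a node's own score at each step is an assumption made only to keep the iteration from oscillating; it does not change the resulting ranking. The network and names are invented.

```python
def eigenvector_centrality(adj, iterations=200):
    """Power iteration: a node is important if its neighbors are important."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # New score = own score + neighbors' scores (adjacency matrix plus identity).
        new = {v: score[v] + sum(score[w] for w in adj[v]) for v in adj}
        total = sum(new.values())
        score = {v: x / total for v, x in new.items()}   # normalize to sum 1
    return score

# Hypothetical network: the adviser and the intern each know exactly one
# person, but the adviser's one contact is the best-connected node.
adj = {
    "president": ["a", "b", "c", "adviser"],
    "a": ["president"],
    "b": ["president"],
    "c": ["president", "intern"],
    "adviser": ["president"],
    "intern": ["c"],
}
score = eigenvector_centrality(adj)
```

The adviser and the intern have the same degree, yet the adviser scores higher because of who the single connection is, which is the situation degree centrality cannot distinguish.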


Journalism centrality: how important is this person to this story?


Who is "important"? What type of person do you want to identify in the network?

Often assumed we're after "influential." But sociology says "power" is a complicated thing and difficult to define and measure.

Network analysis has mostly ignored this problem. I know of no successful use of centrality metrics in journalism - maybe you'll be the first.


Finding Communities

No one definition of "community." Could mean a town, or a club, or an industry network.

But for our purposes, a community is "a group of people with pre-existing patterns of association."

In social network analysis, that translates into clusters in the graph.


Friends/followers


Co-consumption - Network of political book sales, Orgnet.com


Communications network - Exploring Enron, Jeffrey Heer


Web link structure - Map of Iranian Blogosphere, Berkman Center


Individual time/location trails - CitySense, Sense Networks


Warning: no network is ever "complete." Otherwise there would be 7 billion people in it.


Mathematical definitions of "cluster"

You've already seen several! If you can compute distance between any two items, you can cluster.

But in social networks, not everyone is connected to everyone else...


Modularity

Are there more intra-group edges than we would expect randomly?


Modularity

$n$ = number of vertices
$k_i$ = degree of vertex $i$
$A_{ij}$ = 1 if edge between $i$, $j$, 0 otherwise
$g_{ij}$ = 1 if $i$, $j$ in same group, 0 otherwise

There are $m = \frac{1}{2} \sum_i k_i$ total edges in the graph. If they go between random vertices, then the number of edges between $i$, $j$ is $k_i k_j / 2m$.


Modularity

$n$ = number of vertices
$k_i$ = degree of vertex $i$
$A_{ij}$ = 1 if edge between $i$, $j$, 0 otherwise
$g_{ij}$ = 1 if $i$, $j$ in same group, 0 otherwise

Modularity:

$$Q = \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) g_{ij}$$

If $Q > 0$ then there are "excess" edges inside the groups (and fewer edges between them).
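To make the definition concrete, here is a small sketch that computes Q as the sum, over all pairs i, j in the same group, of the actual edge indicator minus the expected value k_i k_j / 2m. The toy graph, the group labels and the function name `modularity` are assumptions for illustration. Many references also divide the sum by 2m so that Q is normalized; that changes the scale but not the sign.

```python
def modularity(adjacency, group):
    """Q = sum over all i, j of (A_ij - k_i * k_j / 2m) * g_ij."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]    # k_i
    m = sum(degree) / 2                         # total number of edges
    q = 0.0
    for i in range(n):
        for j in range(n):
            if group[i] == group[j]:            # g_ij = 1
                expected = degree[i] * degree[j] / (2 * m)
                q += adjacency[i][j] - expected
    return q

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

q_split = modularity(A, [0, 0, 0, 1, 1, 1])     # one group per triangle
q_whole = modularity(A, [0, 0, 0, 0, 0, 0])     # everything in one group
```

Grouping by triangle gives a positive Q (about 5 with this unnormalized form), while putting every node in one group gives Q of zero, since the expected and actual edge counts then cancel exactly.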


Modularity algorithm

• Look for a division of nodes into two groups that maximizes Q
• Can find this through eigenvector technique
• Possible that no division has Q>0, in which case the graph is a single community
• If a division with Q>0 found, split
• Recursively split sub-graphs
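A rough sketch of a single split step, under several assumptions: it builds the matrix of terms A_ij - k_i k_j / 2m, finds its leading eigenvector by shifted power iteration, and divides the nodes by the sign of their entries. This is a heuristic for a high-Q division rather than a guaranteed maximum, it omits the recursive splitting of sub-graphs, and the function name, starting vector and tolerance are illustrative choices.

```python
def leading_eigenvector_split(adjacency, iterations=500):
    """Try to split a graph in two using the sign of the leading eigenvector."""
    n = len(adjacency)
    k = [sum(row) for row in adjacency]
    two_m = sum(k)
    # B_ij = A_ij - k_i * k_j / 2m, the per-pair term inside Q.
    B = [[adjacency[i][j] - k[i] * k[j] / two_m for j in range(n)]
         for i in range(n)]
    # Shifting by the largest absolute row sum makes all eigenvalues
    # non-negative, so power iteration finds B's most positive eigenvalue.
    shift = max(sum(abs(x) for x in row) for row in B)
    v = [float(i + 1) for i in range(n)]         # arbitrary deterministic start
    for _ in range(iterations):
        w = [shift * v[i] + sum(B[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = max(abs(x) for x in w)
        v = [x / norm for x in w]
    groups = [1 if x > 0 else 0 for x in v]
    q = sum(B[i][j] for i in range(n) for j in range(n)
            if groups[i] == groups[j])
    if q > 1e-9:
        return groups, q
    return [0] * n, 0.0                          # no split: a single community

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

groups, q = leading_eigenvector_split(A)
```

On this toy graph the split separates the two triangles. A full implementation would call the same step again on each resulting sub-graph until no division with Q > 0 is found.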


The Hairball problem

Real social networks are big, with complex, overlapping communities in the central component. Modularity and other community detection algorithms give poor results.


K-core Decomposition

Find the nodes at the "center" of a network.

    for k = 1 to maximum node degree
        repeat
            remove all nodes with degree < k
        until all remaining nodes have degree >= k
        set "core number" of remaining nodes to k
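The peeling procedure above can be written as runnable Python roughly as follows. The adjacency dictionary is a hypothetical example and `core_numbers` is a name chosen here; the sketch favors readability over speed.

```python
def core_numbers(adj):
    """Assign each node the largest k for which it survives the k-core peel."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    core = {v: 0 for v in adj}
    k = 1
    while remaining:
        # Repeat: remove all nodes whose degree among remaining nodes is < k.
        while True:
            low = [v for v, nbrs in remaining.items() if len(nbrs) < k]
            if not low:
                break
            for v in low:
                for w in remaining.pop(v):
                    if w in remaining:           # w may already be removed
                        remaining[w].discard(v)
        # Everything still present has degree >= k, so its core number is k.
        for v in remaining:
            core[v] = k
        k += 1
    return core

# Hypothetical network: a triangle a-b-c with one extra person d tied to c.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
core = core_numbers(adj)
```

The triangle members end with core number 2 and the hanger-on d with core number 1, even though c has the highest degree.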


K-core Decomposition


Carmi et al., A model of Internet topology using k-shell decomposition


Protest Dynamics on Twitter

González-Bailon et al., The Dynamics of Protest Recruitment through an Online Network


k-core number vs. maximum cascade size. Color = sent at least one tweet which reached this fraction of users (orange = reached all users)


Key insight: triangles not edges

Simmel's theory of sociology (early 20th C.) says a relationship between two people cannot be understood without context.


Idea: count shared triangles

1. For each node A and each of A's friends B, count the number of triangles involving A and B (= number of shared friends of A and B).
2. Rank A's friends (each B) by number of shared friends (number of C's for A, B) to create a "top friends" list for A.
3. Keep the edge between nodes A, D only if there is some threshold percentage overlap in their top friends lists.
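The three steps can be sketched as below. Several details are assumptions made for the example, since the slide leaves them open: the list length `size`, measuring overlap as a fraction of that length, the threshold `min_overlap`, and breaking ties alphabetically. The function names and the toy network are hypothetical.

```python
def shared_friend_counts(adj):
    """Step 1: for each node A and friend B, count the friends they share."""
    return {a: {b: len(adj[a] & adj[b]) for b in adj[a]} for a in adj}

def top_friends(adj, size):
    """Step 2: rank each node's friends by shared-friend count."""
    counts = shared_friend_counts(adj)
    return {a: set(sorted(counts[a], key=lambda b: (-counts[a][b], b))[:size])
            for a in adj}

def backbone(adj, size=2, min_overlap=0.5):
    """Step 3: keep an edge only if the two top-friends lists overlap enough."""
    tops = top_friends(adj, size)
    kept = set()
    for a in adj:
        for b in adj[a]:
            if a < b:                            # visit each edge once
                overlap = len(tops[a] & tops[b]) / size
                if overlap >= min_overlap:
                    kept.add((a, b))
    return kept

# Hypothetical network: two triangles joined by the single edge c - d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
kept = backbone(adj)
```

With these settings the six triangle edges survive and the bridge between c and d is dropped, because c and d share no friends: the filter keeps ties that are embedded in a group and discards isolated links.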


Simmelian Backbones


SNA in journalism

• ICIJ Offshore Tax Haven leak
• ICIJ human tissue investigation
• Organized Crime and Corruption Reporting Project
• WSJ Galleon's Web insider trading story
• SCMP's Who Runs Hong Kong
• Muckety.com


The other challenge was the data itself. How to separate the extraordinary from the routine and find the public interest inside a maze of more than 37,000 offshore company holders? A first step was to build as many lists as possible of public figures: Politburo members, military commanders, mayors of large cities, billionaires listed in Forbes and Hurun’s rankings of the mega-wealthy and so-called princelings (relatives of the current leadership or former Communist Party elders).

Through painstaking database work, a reporter in Spain cross-referenced the lists of notable Chinese against the names of offshore clients listed within ICIJ’s Offshore Leaks data. The added difficulty was that in most cases, names in the offshore files were registered in Romanized form, not Chinese characters. This made making exact matches extremely hard, because Romanized spellings from Chinese characters tend to vary widely: Wang might be spelled Wong, Zhang could be Cheung, and Ye might be spelled Yeh. Addresses and ID numbers helped confirm many identities, but many other names were dropped because the reporting team could not be 100 percent sure that the person was a correct match.

A picture slowly began to emerge: China’s elites were aggressively using offshore havens to hold assets, list companies in the world’s stock exchanges, buy and sell real estate and conduct their business away from Beijing’s red tape and capital controls.

How We Did Offshore Leaks China, ICIJ


Analyzing the Data behind Skin and Bone, ICIJ


Who Runs HK? The Fight over Stanley Ho's Fortune, South China Morning Post, 2010


SNA that could be used in Journalism

• The Network of Global Corporate Control paper
• Network of campaign finance contributions (Super PACs)
• International financial system / HFT
• "Revolving door" / regulatory capture
• Political elite in any country
• Find audience for story, akin to targeted marketing
• ...


Vitali, Glattfelder, Battiston, The Network of Global Corporate Control


SNA in journalism

• Visualization widely used
• Link analysis successful in investigative reporting
• Most of the work required to do these types of stories is traditional research, not algorithmically guided.
• I am not aware of successful application of centrality metrics or community detection algorithms.
• This may change as the graphs journalism examines get bigger...
• Would it be possible to use community detection to find the "right" audience for a story?