Persons Through Groups: 2-mode networks

Overview

Breiger: Duality of Persons and Groups
• Argument
• Method

• Sociology Examples
• Moody: Coauthorship

• Methods:
  • Finish ego-networks
  • Working with 2-mode data
  • Constructing a PTG network
  • Constructing a GTP network
  • (Bipartite graphs)

Breiger: 1974 - Duality of Persons and Groups

Argument:

Metaphor: people intersect through their associations, which defines (in part) their individuality.

Duality implies that relations among groups imply relations among individuals.


An Example:

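Interpersonal Network (4.3), persons A-F:

      A B C D E F
  A   0 0 0 1 0 0
  B   0 0 1 0 0 0
  C   0 1 0 1 0 0
  D   1 0 1 0 1 2
  E   0 0 0 1 0 1
  F   0 0 0 2 1 0

[Figure: graph of the interpersonal network, nodes A-F]

Intergroup Network (4.4), groups 1-5:

      1 2 3 4 5
  1   0 1 0 0 0
  2   1 0 1 1 1
  3   0 1 0 2 1
  4   0 1 2 0 1
  5   0 1 1 1 0

[Figure: graph of the intergroup network, nodes 1-5]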

Problem: These two representations, though clearly related, are not easily compared.


To compare them, construct a person-to-group adjacency matrix:

A =
        1 2 3 4 5
    A   0 0 0 0 1
    B   1 0 0 0 0
    C   1 1 0 0 0
    D   0 1 1 1 1
    E   0 0 1 0 0
    F   0 0 1 1 0

Each column is a group, each row a person, and the cell = 1 if the person in that row belongs to that group.

You can tell how many groups two people both belong to by comparing the rows: Identify every place that both rows = 1, sum them, and you have the overlap.


Compare persons A and F:

         1 2 3 4 5
    A    0 0 0 0 1   = 1
    F    0 0 1 1 0   = 2
    AF   0 0 0 0 0   = 0

Person A is in 1 group, Person F is in 2 groups, and they are in no groups together.

Or persons D and F:

         1 2 3 4 5
    D    0 1 1 1 1   = 4
    F    0 0 1 1 0   = 2
    DF   0 0 1 1 0   = 2

Person D is in 4 groups, Person F is in 2 groups, and they are in 2 groups together.
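The row comparison can be sketched in a few lines of Python (the slides themselves use SAS; the `overlap` helper is a hypothetical name introduced only for illustration). It multiplies the two membership rows cell by cell and sums the result, exactly as in the A/F and D/F comparisons.

```python
# Membership matrix from the slides: rows are persons A-F, columns are groups 1-5.
A = {
    "A": [0, 0, 0, 0, 1],
    "B": [1, 0, 0, 0, 0],
    "C": [1, 1, 0, 0, 0],
    "D": [0, 1, 1, 1, 1],
    "E": [0, 0, 1, 0, 0],
    "F": [0, 0, 1, 1, 0],
}


def overlap(person_i, person_j):
    """Count the groups both persons belong to (hypothetical helper)."""
    # A cell contributes 1 only when both rows have a 1 in that column.
    return sum(a * b for a, b in zip(A[person_i], A[person_j]))


print(overlap("A", "F"))  # 0
print(overlap("D", "F"))  # 2
```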


Similarly for Groups:

          1   2   1•2
    A     0   0    0
    B     1   0    0
    C     1   1    1
    D     0   1    0
    E     0   0    0
    F     0   0    0
    Sum   2   2    1

Group 1 has 2 members, group 2 has 2 members, and they overlap by 1 member (C).


In general, you can get the overlap for any pair of groups / persons by summing the multiplied elements of the corresponding rows/columns of the persons-to-groups adjacency matrix. That is:

Persons-to-Persons:   $P_{ij} = \sum_{k=1}^{g} A_{ik} A_{jk}$

Groups-to-Groups:   $G_{ij} = \sum_{k=1}^{p} A_{ki} A_{kj}$
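As a rough Python sketch of the two sums (the function names `P` and `G` mirror the symbols in the formulas; indices are 0-based here, so person D is row 3 and group 1 is column 0):

```python
# Persons-to-groups matrix from the example: 6 persons (rows) by 5 groups (columns).
A = [
    [0, 0, 0, 0, 1],  # A
    [1, 0, 0, 0, 0],  # B
    [1, 1, 0, 0, 0],  # C
    [0, 1, 1, 1, 1],  # D
    [0, 0, 1, 0, 0],  # E
    [0, 0, 1, 1, 0],  # F
]
p = len(A)     # number of persons
g = len(A[0])  # number of groups


def P(i, j):
    """Groups shared by persons i and j: sum over groups k of A[i][k] * A[j][k]."""
    return sum(A[i][k] * A[j][k] for k in range(g))


def G(i, j):
    """Persons shared by groups i and j: sum over persons k of A[k][i] * A[k][j]."""
    return sum(A[k][i] * A[k][j] for k in range(p))


print(P(3, 5))  # persons D and F
print(G(0, 1))  # groups 1 and 2
```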


One can get these easily with a little matrix multiplication. First define A^T as the transpose of A (simply swap the rows and columns). If A is of size P x G, then A^T will be of size G x P.

$A^{T}_{ij} = A_{ji}$

A =
        1 2 3 4 5
    A   0 0 0 0 1
    B   1 0 0 0 0
    C   1 1 0 0 0
    D   0 1 1 1 1
    E   0 0 1 0 0
    F   0 0 1 1 0

A^T =
        A B C D E F
    1   0 1 1 0 0 0
    2   0 0 1 1 0 0
    3   0 0 0 1 1 1
    4   0 0 0 1 0 1
    5   1 0 0 1 0 0
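In Python, one way to sketch the transpose without any libraries is `zip(*A)`, which regroups the rows of A into its columns (the variable name `A_T` is just an illustrative choice):

```python
# Persons-to-groups matrix: 6 persons (rows) by 5 groups (columns).
A = [
    [0, 0, 0, 0, 1],  # A
    [1, 0, 0, 0, 0],  # B
    [1, 1, 0, 0, 0],  # C
    [0, 1, 1, 1, 1],  # D
    [0, 0, 1, 0, 0],  # E
    [0, 0, 1, 1, 0],  # F
]

# zip(*A) yields one tuple per column of A, so each column becomes a row.
A_T = [list(column) for column in zip(*A)]

for row in A_T:
    print(row)
```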

P = A(A^T)
G = A^T(A)

A is (6x5) and A^T is (5x6), so:

A * A^T = P     (6x5)(5x6) -> (6x6)

A^T * A = G     (5x6)(6x5) -> (5x5)

See: Breiger_ex.sas for an IML example.

P =
        A B C D E F
    A   1 0 0 1 0 0
    B   0 1 1 0 0 0
    C   0 1 2 1 0 0
    D   1 0 1 4 1 2
    E   0 0 0 1 1 1
    F   0 0 0 2 1 2

G =
        1 2 3 4 5
    1   2 1 0 0 0
    2   1 2 1 1 1
    3   0 1 3 2 1
    4   0 1 2 2 1
    5   0 1 1 1 2
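The referenced Breiger_ex.sas file is not reproduced here. As a stand-in, this minimal pure-Python sketch carries out both products; `transpose` and `matmul` are hypothetical helper names, and a matrix library would do the same job in one call.

```python
# Persons-to-groups matrix: 6 persons (rows) by 5 groups (columns).
A = [
    [0, 0, 0, 0, 1],  # A
    [1, 0, 0, 0, 0],  # B
    [1, 1, 0, 0, 0],  # C
    [0, 1, 1, 1, 1],  # D
    [0, 0, 1, 0, 0],  # E
    [0, 0, 1, 1, 0],  # F
]


def transpose(m):
    """Swap rows and columns."""
    return [list(column) for column in zip(*m)]


def matmul(x, y):
    """Plain matrix product: cell (i, j) is row i of x times column j of y."""
    y_columns = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_columns] for row in x]


P = matmul(A, transpose(A))  # (6x5)(5x6) -> 6x6 persons-to-persons
G = matmul(transpose(A), A)  # (5x6)(6x5) -> 5x5 groups-to-groups

for row in P:
    print(row)
```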


Theoretically, these two equations define what Breiger means by duality:

“With respect to the membership network,…, persons who are actors in one picture (the P matrix) are with equal legitimacy viewed as connections in the dual picture (the G matrix), and conversely for groups.” (p.87)

The resulting network:
1) Is always symmetric
2) The diagonal tells you how many groups (persons) a person (group) belongs to (has)
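Both properties can be checked on the running example. This is only a sketch: `project` and `is_symmetric` are hypothetical helpers, and `project` computes the overlap of every pair of rows, which is the same thing as multiplying a matrix by its transpose.

```python
# Persons-to-groups matrix: 6 persons (rows) by 5 groups (columns).
A = [
    [0, 0, 0, 0, 1],  # A
    [1, 0, 0, 0, 0],  # B
    [1, 1, 0, 0, 0],  # C
    [0, 1, 1, 1, 1],  # D
    [0, 0, 1, 0, 0],  # E
    [0, 0, 1, 1, 0],  # F
]


def project(rows):
    """Overlap of every pair of rows (equivalent to rows times rows-transposed)."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def is_symmetric(m):
    return all(m[i][j] == m[j][i] for i in range(len(m)) for j in range(len(m)))


P = project(A)                                # persons-to-persons
G = project([list(col) for col in zip(*A)])   # groups-to-groups

groups_per_person = [P[i][i] for i in range(len(P))]  # diagonal of P
members_per_group = [G[j][j] for j in range(len(G))]  # diagonal of G

print(is_symmetric(P), is_symmetric(G))
print(groups_per_person, members_per_group)
```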


In practice, most network software (UCINET, PAJEK) will do all of these operations. It is also simple to do the matrix multiplication in programs like SAS or SPSS

Persons Through Groups: DuPRI Example

A =

Name                          Health  Fam  Devlp  Ineq
Alessandro Tarozzi               0     0     1     1
Alexander Pfaff-Talikoff         1     0     0     0
Amar Hamoudi                     1     1     0     0
Anatoli Yashin                   1     0     0     1
Angela M ORand                   1     0     0     0
Anna Gassman-Pines               0     1     1     1
Asia Maselko                     1     0     0     0
Avshalom Caspi                   1     0     1     0
Charlie Cloffelter               0     0     0     1
Christina M. Gibson-Davis        0     1     1     1
Duncan Thomas                    1     1     1     1
Elizabeth Frankenberg            1     1     0     1
Elizabeth Oltmans Ananat         0     1     0     1
Frank A. Sloan                   1     0     0     0
Jacob L. Vigdor                  0     0     1     1
James Moody                      0     0     1     1
James S Clark                    1     0     0     0
James W. Vaupel                  1     0     0     0
Jennan Read                      1     0     0     1
Jerry Reiter                     1     0     0     0
Kim Blankenship                  1     0     0     0
Kathleen Sikkema                 1     0     1     0
Keith E Whitfield                1     0     0     0
Kenneth A Dodge                  1     1     1     1
Kenneth C Land                   1     0     1     1
Linda K George                   1     0     0     0
Linda M Burton                   1     1     1     1
Lisa A Keister                   0     0     0     1
M. Giovanna Merli                1     0     0     0
Manoj Mohanan                    1     0     0     0
Marie Lynn Miranda               1     0     1     0
Marjorie B McElroy               0     1     0     0
P. J. Eric Stallard              1     0     0     0
Patrick Bayer                    0     0     0     1
Peter Arcidiacono                0     1     0     1
Phil Morgan                      0     1     0     0
Philip J. Cook                   0     0     0     1
Philip R Costanzo                1     0     1     0
Sabrendu Pattanayak              1     0     0     0
Seth Gary Sanders                1     1     1     1
Sherman James                    1     0     0     1
Terrie E Moffitt                 0     0     1     0
V. Joseph Hotz                   0     1     0     1
William "Sandy" Darity           0     0     0     1
Zeng Yi                          1     1     0     0

G = (A^T)A =

            Health  Fam  HDev  Ineq
    Health    29     7     9     9
    Fam        7    14     6    10
    HDev       9     6    15    10
    Ineq       9    10    10    23

Area Overlap Among DuPRI Faculty: P = A(A^T)

[Figure: network of DuPRI faculty linked by shared areas; area labels: Health, Family, Human Dev, Inequality]


Persons Through Groups: Sociology Example

Or consider ties formed by sharing membership on a student committee (MA, exams, etc.).

(all committee memberships, line thickness proportional to number of joint appearances)


Duke English Department


Persons Through Groups: Sociology Coauthorship

Sociology Coauthorship Networks

[Figure: coauthorship network shown as a 2-mode network and as its 1-mode projection]


3-degrees of Lynn Smith-Lovin

LSL reaches 533 people in 3 steps.
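A k-step reach count like this one can be sketched with a breadth-first search over the 1-mode (person-to-person) projection. The coauthorship data are not available here, so the example below assumes the small six-person network from the Breiger example instead; `reach` is a hypothetical helper name.

```python
from collections import deque

# Person-to-person ties from the small Breiger example (nonzero off-diagonal cells of P).
adjacency = {
    "A": ["D"],
    "B": ["C"],
    "C": ["B", "D"],
    "D": ["A", "C", "E", "F"],
    "E": ["D", "F"],
    "F": ["D", "E"],
}


def reach(adj, start, max_steps):
    """Return the set of nodes within max_steps ties of start (excluding start)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not expand beyond the step limit
        for neighbor in adj[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {node for node in distance if node != start}


print(len(reach(adjacency, "A", 1)))  # 1
print(len(reach(adjacency, "A", 3)))  # 5
```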


The likelihood of coauthorship varies by type of work

[Figure: Largest Bicomponent, n = 29,462; color scale from 0.04 to 0.96]

Persons Through Groups: Director Interlocks

Val Burris – Interlocks & Political Cohesion

[Figure: Effect size of indirect ties, by dependent variable. x-axis: Direct, 2-step, 3-step, 4-step, 5-step, 6-step; y-axis: 0 to 0.16; series: Party Contribution, Presidential Match, Presidential Correlation]

Persons Through Groups: Ecology Co-authorship

Persons Through Groups: Physician Networks

Construct networks of physicians who share patients. Note that we sampled patients from 5 states; shown here are the resulting physicians for all the PA patients.

Table 1. Network Sample Construction

    Year    Patient Visits   Unique Patients   Unique Physicians
    2008      12,263,448         922,189            138,375
    2009      12,977,008         924,387            134,863
    2010      12,167,013         871,993            135,212
    Total     37,407,477         963,899            190,785

Persons Through Groups: Constructing large 2-mode nets

• The direct matrix multiplication approach is (highly) inefficient for large 2-mode networks.
  - We couldn't even hold the physician indicator matrix in memory.

The solution is to construct the bipartite list, then construct edges as a summary over that list. For example:

    Obs   rid    auid
      1     1   60242
      2     2    1961
      3     2   16006
      4     2   47741
      5     3   50009
      6     3   51417
      7     4   30417
      8     4   49612
      9     5    8396
     10     5   11500

In SAS, I then transpose the list by the mode I want to link by. So here, if I want an author-to-author network, I transpose by papers (rid), so that each paper becomes one row with its authors in columns.


Then write a loop to construct the edge-parts:

    data edges;
      set auplev;
      array aus(82) col1-col82;          /* author slots for one paper */
      do i = 1 to 81;
        if aus(i) ^= . then do;
          snd = aus(i);
        end;
        else do;
          i = 82;                        /* no more authors: stop the outer loop */
        end;
        do j = i+1 to 82;
          if aus(j) ^= . then do;
            snd = min(aus(i), aus(j));   /* lower id is always the sender */
            rcv = max(aus(i), aus(j));
            val = 1;
            output;
          end;
          else do;
            j = 82;                      /* no more authors: stop the inner loop */
          end;
        end;
      end;
      keep snd rcv rid val;
    run;

This produces all the edge parts; then sum by dyad to get the valued network:

    proc means data=edges noprint;
      class snd rcv;
      var val;
      output out=edgesum (where=(_type_=3)) sum=;
    run;
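For readers without SAS, the same idea can be sketched in Python. This is not a translation of the program above, only an illustration of the approach under the assumption that the bipartite list is a sequence of (rid, auid) pairs: group authors by paper, emit every author pair within a paper, and sum by dyad. The variable names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Bipartite list from the example: (paper id, author id).
bipartite = [
    (1, 60242),
    (2, 1961), (2, 16006), (2, 47741),
    (3, 50009), (3, 51417),
    (4, 30417), (4, 49612),
    (5, 8396), (5, 11500),
]

# Step 1: collect the authors of each paper (the "transpose by rid" step).
authors_by_paper = defaultdict(set)
for rid, auid in bipartite:
    authors_by_paper[rid].add(auid)

# Step 2: every pair of authors on a paper is one edge part; sum by dyad.
edge_weights = Counter()
for authors in authors_by_paper.values():
    # sorted() puts the lower id first, like the min/max trick in the SAS loop.
    for snd, rcv in combinations(sorted(authors), 2):
        edge_weights[(snd, rcv)] += 1

for (snd, rcv), val in sorted(edge_weights.items()):
    print(snd, rcv, val)
```

Memory use grows with the number of co-occurring pairs rather than with the full persons-by-groups matrix, which is the point of working from the list.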


In PAJEK, you can define an input graph as bipartite as:

    *Vertices 8 3
    1 "Actor 1"
    2 "Actor 2"
    3 "Actor 3"
    4 "Event 1"
    5 "Event 2"
    6 "Event 3"
    7 "Event 4"
    8 "Event 5"
    *Edges
    1 4
    1 5
    2 4
    2 5
    2 6
    2 8
    3 4
    3 7
    3 8

So the first line has two numbers: the total number of vertices (8) and the number in the first (“row”) mode (3). The edges then all run from mode 1 to mode 2.
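A file in this layout can be generated from a membership list with a short script. The sketch below follows only the layout shown on the slide; `to_pajek_two_mode` is a hypothetical helper, and any other options of the PAJEK format are outside its scope.

```python
def to_pajek_two_mode(actors, events, memberships):
    """Build the text of a 2-mode PAJEK file.

    memberships holds (actor number, event number) pairs, each counted from 1
    within its own mode.
    """
    lines = [f"*Vertices {len(actors) + len(events)} {len(actors)}"]
    # Vertices are numbered consecutively: first-mode nodes, then second-mode nodes.
    for number, label in enumerate(actors + events, start=1):
        lines.append(f'{number} "{label}"')
    lines.append("*Edges")
    for actor_number, event_number in memberships:
        # Event numbers are shifted past the actors to get the vertex number.
        lines.append(f"{actor_number} {len(actors) + event_number}")
    return "\n".join(lines)


actors = ["Actor 1", "Actor 2", "Actor 3"]
events = ["Event 1", "Event 2", "Event 3", "Event 4", "Event 5"]
memberships = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (2, 5), (3, 1), (3, 4), (3, 5)]

pajek_text = to_pajek_two_mode(actors, events, memberships)
print(pajek_text)
```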

It is possible to construct a network that links people and their groups directly in a single network. In this case, the nodes are of 2 types: person and groups. Consider the classic example of the Southern Women’s data:

Persons Through Groups: Bipartite “Two-Mode” graphs

The classic treatment of this network would create a person to person or a group to group network:


Instead, you could analyze the network as a joint network, with two types of nodes:


                    1 2 3 4 5 6 7 8
                    ---------------
    Actor 1   1.    0 0 0 1 1 0 0 0
    Actor 2   2.    0 0 0 1 1 1 0 1
    Actor 3   3.    0 0 0 1 0 0 1 1
    Event 1   4.    1 1 1 0 0 0 0 0
    Event 2   5.    1 1 0 0 0 0 0 0
    Event 3   6.    0 1 0 0 0 0 0 0
    Event 4   7.    0 0 1 0 0 0 0 0
    Event 5   8.    0 1 1 0 0 0 0 0

It is always possible to arrange a 2-mode network so that the adjacency matrix has all zeros in the block-diagonal cells.
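The arrangement can be sketched by placing the actor-by-event matrix A in the upper-right block and its transpose in the lower-left block, leaving the two diagonal blocks at zero. The variable names below are illustrative only.

```python
# Actor-by-event membership matrix for the 3 actors and 5 events above.
A = [
    [1, 1, 0, 0, 0],  # Actor 1
    [1, 1, 1, 0, 1],  # Actor 2
    [1, 0, 0, 1, 1],  # Actor 3
]
n_actors = len(A)
n_events = len(A[0])
size = n_actors + n_events

# Start from all zeros; the actor-actor and event-event blocks stay that way.
B = [[0] * size for _ in range(size)]
for i, row in enumerate(A):
    for j, value in enumerate(row):
        B[i][n_actors + j] = value  # upper-right block: A
        B[n_actors + j][i] = value  # lower-left block: A transposed

for row in B:
    print(row)
```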


Galois Lattices

A new way to think about bipartite networks is as a collection of ordered sets, and then use some of the tools from discrete mathematics to map the collection of sets. For example, consider the set of all possible combinations of {1,2,3}. This can be represented in a network as:

This is known as a Galois Lattice
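The {1,2,3} example can be generated directly. The sketch below lists every subset and links each subset to the subsets that contain it plus exactly one more element, which gives the lines one would draw in the lattice diagram. It covers only this all-subsets case; a lattice built from real actor-by-event data would keep only the sets that actually occur.

```python
from itertools import combinations

items = [1, 2, 3]

# All subsets of {1, 2, 3}, from the empty set up to the full set.
subsets = [
    frozenset(combo)
    for size in range(len(items) + 1)
    for combo in combinations(items, size)
]

# One edge for each pair where the upper set adds exactly one element to the lower set.
edges = [
    (lower, upper)
    for lower in subsets
    for upper in subsets
    if lower < upper and len(upper) == len(lower) + 1
]

print(len(subsets), "nodes,", len(edges), "edges")
for lower, upper in edges:
    print(sorted(lower), "->", sorted(upper))
```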


Imagine you had the following data on actors and events:


The Davis data in Lattice form:


Topic / Text Models

To uncover topics, we apply a similar process across papers and words. Basically, a corpus is nothing more than a big two-mode network of papers containing words:

                Paper 1   Paper 2   Paper 3   Paper 4
    Obedient       5        10         0         0
    Loyal          6         5         1         0
    Friendly       8         9         0         0
    Aloof          0         1         9        15
    Proud          0         0         5         4
    Dog            2         1         0         0
    Cat            0         0         1         1

Comparing across columns tells us whether the two papers are recognized by others as similar.

Similarity matrix:

                Paper 1   Paper 2   Paper 3   Paper 4
    Paper 1       --        Hi        Low       Low
    Paper 2       Hi        --        Low       Low
    Paper 3       Low       Low       --        Hi
    Paper 4       Low       Low       Hi        --

Key differences are:

a) We typically need to parse the text first to remove unimportant words and to pick out parts of speech or other particular features we care about.

b) Weight words differently based on their importance in the corpus.
   - Most common is the tf-idf formulation, which gives higher weight to rare words.

c) Then define a similarity score rather than a simple count/volume of overlap.
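Steps (b) and (c) can be sketched on the small term-by-paper table from the earlier slide. The weighting below is one common tf-idf variant (raw count times log(N/df)) and the similarity is cosine similarity; both are assumptions made for illustration, not necessarily what the SAS tools compute. Parsing (step a) is skipped because the counts are already given.

```python
import math

# Term counts per paper (columns: Paper 1 to Paper 4), copied from the example table.
counts = {
    "Obedient": [5, 10, 0, 0],
    "Loyal":    [6, 5, 1, 0],
    "Friendly": [8, 9, 0, 0],
    "Aloof":    [0, 1, 9, 15],
    "Proud":    [0, 0, 5, 4],
    "Dog":      [2, 1, 0, 0],
    "Cat":      [0, 0, 1, 1],
}
n_papers = 4


def tfidf(term_counts):
    """Weight each count by log(N / number of papers containing the term)."""
    weighted = {}
    for term, row in term_counts.items():
        document_frequency = sum(1 for c in row if c > 0)
        idf = math.log(n_papers / document_frequency)
        weighted[term] = [c * idf for c in row]
    return weighted


def cosine(u, v):
    """Cosine similarity between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


weighted = tfidf(counts)
# One vector per paper: that paper's weight on every term.
vectors = [[weighted[term][paper] for term in counts] for paper in range(n_papers)]
similarity = [[cosine(u, v) for v in vectors] for u in vectors]

for row in similarity:
    print([round(value, 2) for value in row])
```

Papers 1 and 2 and papers 3 and 4 come out as highly similar pairs, with low similarity across the pairs, in line with the Hi/Low pattern of the similarity matrix.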

[Screenshot: term “key” result]

[Screenshot: Tgparse output (linked)]

Weighting is applied by tmutil. These are all “under the hood” in the SAS “TextMiner” application.

Background: Mining Science Products: Topic structure

To uncover topics, we apply a similar process across papers:

Example: One-step neighborhood of “More information, better jobs?”

Borrett, Stuart R., James Moody & Achim Edelmann. 2014. “The Rise of Network Ecology: Maps of the topic diversity and scientific collaboration.” Ecological Modelling (DOI: 10.1016/j.ecolmodel.2014.02.019)

Network Ecology Topic Map

Man Made Pathogen Debate

Community of Science Foundations: Topic Structures

The collaboration space is based on published papers and we’re curious how the papers are topically clustered.

Here we used the Latent Dirichlet allocation (LDA) topic modeling routine on the full corpus of papers. LDA does not assign each paper to exactly one topic; rather, it gives each paper a degree of association (a loading) with every topic, based on the paper’s distribution of terms.

We settled on an eight-topic solution:

[Figure: Paper similarity matrix, sorted by topic loadings]

Titles of the papers with the top five topic loadings on each topic:

Title: Evolutionary Genetics (loading: top2)
    0.99457  Identifying Signatures of Selection in Genetic Time Series
    0.99441  A spatially explicit model of sex ratio evolution in response to sex-biased dispersal
    0.99431  The magnitude of local adaptation under genotype-dependent dispersal
    0.99427  The advantages of segregation and the evolution of sex.
    0.99414  DISENTANGLING THE EFFECTS OF EVOLUTIONARY, DEMOGRAPHIC, AND ENVIRONMENTAL FACTORS INFLUENCING GENETIC STRUCTURE OF NATURAL POPULATIONS: ATLANTIC HERRING AS A CASE STUDY

Title: Virology (emphasis on Influenza) (loading: top1)
    0.99363  Growth of H5N1 influenza A viruses in the upper respiratory tracts of mice
    0.99274  Transmission of Influenza Virus in a Mammalian Host Is Increased by PB2 Amino Acids 627K or 627E/701N
    0.99236  The M Segment of the 2009 New Pandemic H1N1 Influenza Virus Is Critical for Its High Transmission Efficiency in the Guinea Pig Model
    0.99137  Insertion of a multibasic cleavage site in the haemagglutinin of human influenza H3N2 virus does not increase pathogenicity in ferrets
    0.99127  Reverse genetics demonstrates that proteolytic processing of the Ebola virus glycoprotein is not essential for replication in cell culture.

Title: Genetic Sequencing (loading: top3)
    0.99445  Sequence and organization of coelacanth neurohypophysial hormone genes: evolutionary history of the vertebrate neurohypophysial hormone gene locus
    0.99357  Characterization of the neurohypophysial hormone gene loci in elephant shark and the Japanese lamprey: origin of the vertebrate neurohypophysial hormone genes
    0.99308  Sequence Data from New Plastid and Nuclear COSII Regions Resolves Early Diverging Lineages in Coffea (Rubiaceae)
    0.99267  Sequence characterization and comparative analysis of three Plasmids isolated from environmental Vibrio spp.
    0.99244  Large Linear Plasmids of Borrelia Species That Cause Relapsing Fever

Title: Immunology (loading: top4)
    0.99368  Cholinergic agonists regulate JAK2/STAT3 signaling to suppress endothelial cell activation
    0.99295  CD4 expression on activated NK cells: Ligation of CD4 induces cytokine expression and cell migration
    0.99281  Reduced DEAF1 function during type 1 diabetes inhibits translation in lymph node stromal cells by suppressing Eif4g3
    0.99259  Persistent expression of Pax3 in the neural crest causes cleft palate and defective osteogenesis in mice
    0.99252  Critical Role of the Tumor Suppressor Tuberous Sclerosis Complex 1 in Dendritic Cell Activation of CD4 T Cells by Promoting MHC Class II Expression via IRF4 and CIITA

Title: Public Health (emphasis on HIV) (loading: top5)
    0.99639  Opportunities for health promotion education in child care.
    0.99515  To Fund or Not to Fund Development of a Decision-Making Framework for the Coverage of New Health Technologies
    0.99445  Community-based research in AIDS-service organizations: what helps and what doesn't?
    0.99418  Sustaining chronic disease management in primary care: Lessons from a demonstration project
    0.99418  Strengthening biostatistics resources in sub-Saharan Africa: Research collaborations through U.S. partnerships

Title: Biochemistry (cellular) (loading: top6)
    0.99315  Functional and structural roles of the N-terminal extension in Methanosarcina acetivorans protoglobin
    0.99288  The effects of an ideal beta-turn on beta-2 microglobulin fold stability
    0.99267  The Juxtamembrane Linker of Full-length Synaptotagmin 1 Controls Oligomerization and Calcium-dependent Membrane Binding.
    0.99236  Structure, conformational stability, and enzymatic properties of acylphosphatase from the hyperthermophile Sulfolobus solfataricus.
    0.99220  The Escherichia coli Lpt Transenvelope Protein Complex for Lipopolysaccharide Export Is Assembled via Conserved Structurally Homologous Domains

Title: HIV Vaccines & Drugs (loading: top7)
    0.99418  Efficacy of zidovudine compared to stavudine, both in combination with lamivudine and indinavir, in human immunodeficiency virus-infected nucleoside-experienced patients with no prior exposure to lamivudine, stavudine, or protease inhibitors (novavir trial).
    0.99302  Stavudine, nevirapine and ritonavir in stable antiretroviral therapy-experienced children with human immunodeficiency virus infection.
    0.98871  Effect of HIV Infection Status and Anti-Retroviral Treatment on Quantitative and Qualitative Antibody Responses to Pneumococcal Conjugate Vaccine in Infants
    0.98816  Prior meningococcal A/C polysaccharide vaccine does not reduce immune responses to conjugate vaccine in young adults.
    0.98403  Long-Term Efficacy and Safety of Raltegravir Combined with Optimized Background Therapy in Treatment-Experienced Patients with Drug-Resistant HIV Infection: Week 96 Results of the BENCHMRK 1 and 2 Phase III Trials

Title: Social Aspects of Health Care (loading: top8)
    0.99572  Impact of admission hyperglycemia on hospital mortality in various intensive care unit populations.
    0.99553  High prevalence of chronic kidney disease in population-based patients diagnosed with type 2 diabetes in downtown Shanghai
    0.99488  Does socioeconomic status affect mortality subsequent to hospital admission for community acquired pneumonia among older persons?
    0.99461  F-18-FDG PET/CT Identifies Patients at Risk for Future Vascular Events in an Otherwise Asymptomatic Cohort with Neoplastic Disease
    0.99457  Preinjury warfarin use among elderly patients with closed head injuries in a trauma center.

[Figure: topic map. Red-blue scale is size, circle is proportional to distribution in 2d space]

Assign each node to the area they write in most:

Community of Science Foundations: Topics predict Debate Side

[Figure: debate side by topic area; areas: Virology/Influenza, Evolutionary Genetics, Genetic Sequencing, Immunology, Public Health, Cellular BioChem, HIV/Drugs, Social Aspects of Health]

Extending Text beyond “bag of words”

A key issue with text models is that they “chop up” language, so subtle differences get lost: the “country music problem”.

Solutions:
• Link words (k-word phrases). This adds in a little localized context.
• Sentiment models: add a content-specific weight to each term, based on prior knowledge.
• Implication models. The goal here is to link terms/concepts to each other by the narrative implication implied in the sentence/corpus.
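The first solution, k-word phrases, is easy to sketch: slide a window of k words across the text and treat each window as a term. `k_word_phrases` is a hypothetical helper, and splitting on whitespace is a simplifying assumption (real parsing would also handle punctuation and stop words).

```python
def k_word_phrases(text, k):
    """Return every run of k consecutive words in the text, in order."""
    words = text.lower().split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


first = "country music"
second = "music country"

# As bags of single words the two texts are identical...
print(sorted(first.split()) == sorted(second.split()))  # True

# ...but their 2-word phrases differ, so some word order is preserved.
print(k_word_phrases(first, 2))   # ['country music']
print(k_word_phrases(second, 2))  # ['music country']
```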


Blocking the Future

Bearman, Peter S., Robert Farris, and James Moody. “Blocking the Future: New Solutions for Old Problems in Historical Social Science.” Social Science History 23: 501-535.

[Figure: One villager's life story]

[Figure: Combined narratives from multiple interviews]

Methods: Review Ego-Networks.

1) Go over network drawing programs
2) Go over ego-network creation programs
3) Go over ego-network measures programs
4) Go over persons-through-groups creation programs
