Social Network Data HEFAN 1301110886

Table of Contents

Introduction: What's different about social network data?

Nodes

Relations

Scales of measurement

A note on statistics and social network data

1 INTRODUCTION:

WHAT'S DIFFERENT ABOUT SOCIAL NETWORK DATA?

"Conventional" data: an example

Attributes

Observations

An example of network data

Holistically

Relation

The total number of 0s and 1s is the same

When Carol chooses Bob, will Bob choose Carol?
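
The Carol and Bob question can be checked directly on a choice matrix. A minimal sketch, assuming a hypothetical four-actor matrix (the names and ties below are illustrative, not the data on the slide):

```python
# Hypothetical "who chooses whom" data; the ties are illustrative only.
actors = ["Bob", "Carol", "Ted", "Alice"]
# choices[i][j] == 1 means actor i chooses actor j (rows choose columns).
choices = [
    [0, 1, 1, 0],  # Bob
    [1, 0, 1, 0],  # Carol
    [0, 1, 0, 1],  # Ted
    [1, 0, 1, 0],  # Alice
]

def reciprocated_pairs(matrix, names):
    """Return the pairs (a, b) where a chooses b and b chooses a."""
    n = len(names)
    return [
        (names[i], names[j])
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] == 1 and matrix[j][i] == 1
    ]

def one_way_pairs(matrix, names):
    """Return the ordered pairs (a, b) where a chooses b but b does not choose a."""
    n = len(names)
    return [
        (names[i], names[j])
        for i in range(n)
        for j in range(n)
        if i != j and matrix[i][j] == 1 and matrix[j][i] == 0
    ]

mutual = reciprocated_pairs(choices, actors)
one_way = one_way_pairs(choices, actors)
print("reciprocated:", mutual)
print("one-way:", one_way)
```

Each row still looks like a case in a conventional data set, but the question asked here compares cell (i, j) with cell (j, i), which only makes sense for relational data.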

"Network" data

The first major emphasis:

seeing how actors are located or "embedded" in the overall network

The second major emphasis:

seeing how the whole pattern of individual choices gives rise to more holistic patterns

Similarities and differences

the rows as simply a listing of cases, and the columns as attributes of each actor

conventional data focus on actors and attributes; network data focus on actors and relations

Same methods (e.g. correlations), different purposes

Questions for discussion

What are the differences between attribute analyses and relational analyses?

Which contains more information?

Which is the cause?

When are individuals structurally equivalent, and when are they equivalent in some attribute?

2 NODES

POPULATIONS, SAMPLES, AND BOUNDARIES

MODALITY AND LEVELS OF ANALYSIS

Node and the sampling process

Seemingly no "samples" at all

Rather, all of the individuals in some population or populations

the boundaries of each population to be studied

individual units of observation are to be selected within that population

several levels of analysis

Boundaries

Two main types:

imposed or created by the actors themselves (e.g. all the members of a classroom)

imposed by the investigator (e.g. having gross family incomes over $1,000,000 per year)

Modality and levels of analysis

Multi-modal: most networkers think of individual persons as being embedded in networks that are embedded in networks that are embedded in networks.

Advantage for dealing with multiple levels of analysis

how the individual is embedded within a structure and how the structure emerges from the micro-relations between individual parts

Rare practice


How do we decide the boundary of the population? Are networks always relative?

Why is multi-modal analysis so rarely practiced despite its big advantage?

3 RELATIONS

SAMPLING TIES

MULTIPLE RELATIONS

Relations

All nodes, selected relations

(Cost and generalizability)

Full network methods

Information about each actor's ties with all other actors

Necessary to properly define and measure many of the structural concepts of network analysis

Strong ties are relatively few

An example

Take all the people in our classroom as an example: how many relations exist among us? How do we select our target relations?
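
To put a number on the classroom question, the count of possible ties can be worked out from the number of actors. A minimal sketch, assuming a single relation and no self-ties (the class sizes are hypothetical):

```python
def possible_ties(n_actors, directed=True):
    """Number of possible ties among n actors, excluding self-ties."""
    ordered_pairs = n_actors * (n_actors - 1)
    # An undirected tie counts each pair once instead of twice.
    return ordered_pairs if directed else ordered_pairs // 2

# Hypothetical class sizes, to show how quickly the count grows.
for n in (10, 30, 100):
    print(n, possible_ties(n), possible_ties(n, directed=False))
```

The count grows roughly with the square of the group size, which is one way to see the cost side of full network methods.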

Snowball methods

When to stop: the limitations on the numbers of strong ties that most actors have, and the tendency for ties to be reciprocated, often make it fairly easy to find the boundaries

Particularly helpful for tracking down "special" populations (small sub-sets in large numbers of others)

Two limitations: isolates and starting points
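
The snowball procedure and its two limitations can be sketched as a wave-by-wave search. This is only an illustration under stated assumptions: the nominations, the seed "A", and the helper name `snowball` are all hypothetical.

```python
from collections import deque

def snowball(ties, seeds, max_waves):
    """Collect actors reached from `seeds` within `max_waves` waves of naming.

    ties: dict mapping each actor to the list of actors they name.
    Returns a dict {actor: wave at which the actor entered the sample}.
    """
    wave_of = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        actor = queue.popleft()
        if wave_of[actor] == max_waves:
            continue  # do not ask actors in the last wave for new names
        for alter in ties.get(actor, []):
            if alter not in wave_of:
                wave_of[alter] = wave_of[actor] + 1
                queue.append(alter)
    return wave_of

# Hypothetical nominations: "F" is an isolate, "G" and "H" name only each other.
ties = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
    "F": [],
    "G": ["H"],
    "H": ["G"],
}
sample = snowball(ties, seeds=["A"], max_waves=3)
missed = sorted(set(ties) - set(sample))
print(sample)
print("never reached:", missed)
```

Starting from "A", the isolate "F" and the separate pair "G" and "H" are never reached, which illustrates both limitations: isolates cannot be found, and the result depends on where the snowball starts.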

Ego-centric networks (with alter connections)

Combined with attribute-based approaches

Understand the opportunities and constraints that ego has as a result of the way they are embedded in their networks (overall network density and the prevalence of reciprocal ties, cliques, and the like can be estimated rather directly)

Less information (Distance, centrality, and various kinds of positional equivalence)
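
An ego-centric network with alter connections can be cut out of a tie list as follows. A minimal sketch, assuming undirected ties; the tie list and the helper names `ego_network` and `ego_density` are hypothetical.

```python
def ego_network(ties, ego):
    """Return (alters, ties among those alters) for an undirected tie list.

    ties: iterable of 2-tuples of actor names, treated as undirected.
    """
    edges = {frozenset(t) for t in ties}
    # Alters are everyone who shares a tie with ego.
    alters = sorted({a for e in edges if ego in e for a in e if a != ego})
    # Keep only ties whose two endpoints are both alters of ego.
    alter_ties = sorted(
        tuple(sorted(e))
        for e in edges
        if ego not in e and all(a in alters for a in e)
    )
    return alters, alter_ties

def ego_density(alters, alter_ties):
    """Share of possible alter-alter ties that are actually present."""
    k = len(alters)
    possible = k * (k - 1) // 2
    return len(alter_ties) / possible if possible else 0.0

# Hypothetical ties among five actors; E is tied to D but not to ego A.
ties = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
alters, alter_ties = ego_network(ties, "A")
print(alters, alter_ties, ego_density(alters, alter_ties))
```

Local quantities such as this density can be estimated directly, but E never enters the data, so distance and centrality in the whole network cannot be computed from it.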

Comparison between the above two networks

[Figure: example network with nodes A, B, C, D and E]

Snowball: all of the starting points (A, B, C, E) and the other nodes they subsequently name are included in our data

Ego-centric: A is always the center, and we only study A, B, C, D and their relations, without E

Ego-centric networks (ego only)

only information on ego's connections to alters -- but not information on the connections among those alters

Don’t know the nature of the macro-structure or the whole network

If we think about alters in terms of their social roles, rather than as individual occupants of social roles, ego-centered networks can tell us a good bit about local social structures

Multiple relations

do not sample, but rather select, relations

material and informational

"conserved" vs. non-conserved

tie vs. a common attribute


Ego-centric networks: individually or holistically? Attributes and subjectivity

When sampling ties, are the nodes also being sampled?

4 SCALES OF MEASUREMENT

Binary measures of relations

Not unusual to see data that are measured at a "higher" level transformed into binary scores

The additional power and simplicity of analysis of binary data is "worth" the cost in information lost

Multiple-category nominal measures of relations

relationship is coded by its type, rather than its strength (dummy coding)

each node was allowed to have a tie in at most one of the resulting networks (low densities and an inherent negative correlation among the matrices)

simply code whether a tie exists or not, and the types of ties can then be scaled into a single grouped ordinal measure of tie strength
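
Dummy coding of a nominal relation can be sketched as splitting one matrix into one binary matrix per tie type. The tie types and data below are hypothetical, and the helper names are illustrative.

```python
def dummy_code(nominal, categories):
    """Split a nominal tie matrix into one binary matrix per tie type."""
    return {
        cat: [[1 if cell == cat else 0 for cell in row] for row in nominal]
        for cat in categories
    }

def any_tie(nominal):
    """Collapse to a single binary matrix: 1 if any type of tie exists."""
    return [[0 if cell is None else 1 for cell in row] for row in nominal]

# Hypothetical data: None means no tie; each cell holds at most one tie type.
nominal = [
    [None, "friend", "kin"],
    ["friend", None, None],
    ["kin", "coworker", None],
]
layers = dummy_code(nominal, ["friend", "kin", "coworker"])
collapsed = any_tie(nominal)
for name, matrix in layers.items():
    print(name, matrix)
print("any tie", collapsed)
```

Because a cell can be 1 in at most one layer, every layer is sparse and the layers are negatively correlated by construction, which is the point made above.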

An example

When considering types of classmate relations, the classroom may be from primary school, high school, or university

What if a person belongs to more than one type simultaneously?

Grouped ordinal measures of relations

Ordinal scales of measurement contain more information than nominal. That is, the scores reflect finer gradations of tie strength than the simple binary presence or absence.

binarized by choosing some cut-point and re-scoring (choices of cut-points can be consequential)

treated as interval (the intervals separating points on an ordinal scale may be very heterogeneous)

Full-rank ordinal measures of relations:

to score the strength of all of the relations of an actor in a rank order from strongest to weakest

treated as interval (e.g. when we have a number of full rank-order scales, such as rankings of people's liking of others in the group, that we wish to combine by summing them into an index)

group the rank order scores into groups (i.e. produce a grouped ordinal scale) or dichotomize the data

Interval measures of relations

Objective: Rather than asking whether two people communicate, one could count the number of email, phone, and inter-office mail deliveries between them

Whenever possible, connections should be measured at the interval level -- as we can always move to a less refined approach later; if data are collected at the nominal level, it is much more difficult to move to a more refined level

The most powerful insights of network analysis, and many of the mathematical and graphical tools used by network analysts were developed for simple graphs (i.e. binary, undirected)

Interval measures of relations

About the cut-point: theory and the purposes of the analysis should guide the choice, and examining the data can help (maybe the distribution of tie strengths really is discretely bi-modal, and displays a clear cut point; maybe the distribution is highly skewed and the main feature is a distinction between no tie and any tie).

When a cut point is chosen, it is wise to also consider alternative values that are somewhat higher and lower, and repeat the analyses with different cut-points to see if the substance of the results is affected
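
Repeating an analysis at several cut-points can be sketched as below. This is a minimal illustration, assuming hypothetical tie strengths on a 0-5 scale and using network density as the statistic being checked.

```python
def binarize(strengths, cut_point):
    """Score a tie as 1 when its strength is at or above the cut-point."""
    return [[1 if s >= cut_point else 0 for s in row] for row in strengths]

def density(binary):
    """Share of off-diagonal cells that are 1 (directed network)."""
    n = len(binary)
    ties = sum(binary[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))

# Hypothetical tie strengths on a 0-5 grouped ordinal scale.
strengths = [
    [0, 5, 3, 1],
    [4, 0, 2, 0],
    [3, 2, 0, 1],
    [1, 0, 1, 0],
]
# Re-run the same statistic at a lower, middle, and higher cut-point.
for cut in (2, 3, 4):
    print(cut, round(density(binarize(strengths, cut)), 3))
```

If a substantive conclusion changes as the cut-point moves by one step, the conclusion depends on the coding decision rather than on the data alone.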

Questions for discussion cont.

Multiple relations, binary scores: rich data, poor analyses?

Grouped ordinal = binary or interval (loss of information) vs. CB confounding/identical

The most powerful insights of network analysis, and many of the mathematical and graphical tools used by network analysts were developed for simple graphs (i.e. binary, undirected)

Many characterizations of the embeddedness of actors in their networks, and of the networks themselves are most commonly thought of in discrete terms in the research literature

5 A NOTE ON STATISTICS AND SOCIAL NETWORK DATA

"Mathematical" sociology vs. "statistical or quantitative" analysis

Mathematical approaches treat the data as "deterministic." That is, they tend to regard the measured relationships and relationship strengths as accurately reflecting the "real" or "final" or "equilibrium" status of the network.

They assume that the observations are not a "sample" of some larger population of possible observations; rather, the observations are usually regarded as the population of interest

"Mathematical" sociology vs. "statistical or quantitative" analysis

Statistical analysts regard the particular scores on relationship strengths as stochastic or probabilistic realizations of an underlying true tendency or probability distribution of relationship strengths.

They tend to think of a particular set of network data as a "sample" of a larger class or population of such networks or network elements, and are concerned with whether the results of the current study would be reproduced in the "next" study of similar samples

little difference with conventional statistical approaches

Characteristics of individual observations (e.g. the median tie strength of actor X with all other actors in the network) and the network as a whole (e.g. the mean of all tie strengths among all actors in the network).

Assessing the degree of similarity among actors, and finding patterns in network data (e.g. factor analysis, cluster analysis, multi-dimensional scaling). Even the tools of predictive modeling are commonly applied to network data (e.g. correlation and regression)
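
The two descriptive examples above (median tie strength of one actor, mean tie strength of the whole network) can be computed with ordinary statistical tools. A minimal sketch on a hypothetical valued matrix, using Python's standard `statistics` module:

```python
from statistics import mean, median

def actor_median_strength(strengths, actor):
    """Median strength of one actor's ties to all other actors (row `actor`)."""
    row = [s for j, s in enumerate(strengths[actor]) if j != actor]
    return median(row)

def network_mean_strength(strengths):
    """Mean of all tie strengths among all actors, ignoring the diagonal."""
    n = len(strengths)
    cells = [strengths[i][j] for i in range(n) for j in range(n) if i != j]
    return mean(cells)

# Hypothetical valued ties; row i holds actor i's ties to the others.
strengths = [
    [0, 5, 3, 1],
    [4, 0, 2, 0],
    [3, 2, 0, 1],
    [1, 0, 1, 0],
]
print(actor_median_strength(strengths, 0))
print(round(network_mean_strength(strengths), 3))
```

The arithmetic is the same as for conventional data; what differs is that the cases are ties rather than independent individuals.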

Issues with Inferential statistics

the stability, reproducibility, or generalizability of results

generalizing, but our sample was not drawn by probability methods (network analysis often relies on artifacts, direct observation, laboratory experiments, and documents as data sources)

Issues with Inferential statistics

Hypothesis testing: the key link in the inferential chain of hypothesis testing is the estimation of the standard errors of statistics

The usual approximations of these standard errors, however, hold when the observations are drawn by independent random sampling

Observations in network data are almost always non-independent, by definition

Hypothesis testing(simulation)

20 symmetric ties among ten actors, and it’s observed that there is one clique containing four actors

Simulations of networks with 10 nodes and 20 ties

If it turns out that such cliques (or more numerous or more inclusive ones) are very unlikely under the assumption that ties are purely random, then it is very plausible to reach the conclusion that there is a social structure present

Hypothesis testing(simulation)

The simulation is simple. Just repeat the experiment several thousand times and add up what proportion of the "trials" result in "successes."

the logic of testing hypotheses is the same
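
The simulation described above can be sketched with the standard library alone. This is an illustration under stated assumptions: ties are drawn uniformly at random without replacement, a "success" is a graph containing at least one fully connected set of four actors (which also covers larger cliques), and the function names, trial count, and seed are hypothetical choices.

```python
import itertools
import random

def random_graph(n_nodes, n_ties, rng):
    """Draw `n_ties` distinct undirected ties uniformly at random."""
    all_pairs = list(itertools.combinations(range(n_nodes), 2))
    return set(rng.sample(all_pairs, n_ties))

def has_clique(ties, n_nodes, size):
    """True if some set of `size` nodes is fully connected."""
    for group in itertools.combinations(range(n_nodes), size):
        # Pairs come out as (low, high), matching how ties are stored.
        if all(pair in ties for pair in itertools.combinations(group, 2)):
            return True
    return False

def clique_probability(n_nodes=10, n_ties=20, size=4, trials=2000, seed=0):
    """Share of random graphs containing at least one clique of `size` nodes."""
    rng = random.Random(seed)
    hits = sum(
        has_clique(random_graph(n_nodes, n_ties, rng), n_nodes, size)
        for _ in range(trials)
    )
    return hits / trials

print(clique_probability())
```

If the printed proportion is small, an observed four-actor clique would be unlikely under purely random ties, which is the same logic as a conventional hypothesis test with the sampling distribution built by simulation.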

More a branch of mathematical sociology than of statistical or quantitative analysis (simulation)

Relational data coded as 0 or 1 are viewed in the mathematical-sociology way, but we still need to sample network data, for example when sampling from large graphs, and regard the sample as representative of the whole network.

Questions for discussion cont.

Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A., & Christakis, N. (2008). Tastes, ties, and time: A new social network dataset using Facebook.com. Social Networks, 30(4), 330-342.

Recommended paper

The first dataset of its kind to be made publicly available

Past research has tended to draw upon only a very small portion of the wealth of data available on Facebook: some (e.g. Lampe et al., 2006; Ellison et al., 2007) avoid the site altogether and rely exclusively on survey methods; most (e.g. Lampe et al., 2007; Gross and Acquisti, 2005) focus only on profile data, ignoring the network ties between users; and no study has begun to make use of data on user tastes to the degree we have seen elsewhere

First, our data are collected in a naturally occurring, as opposed to contrived, fashion (taste responses)

Second, they are socio-centric and indicate the interrelatedness of an entire population of interest, not egocentric (not representative of some larger population)

First, Facebook is a standardized research instrument

Few network data have been gathered on college students despite the role of higher education in shaping a number of important life outcomes

Five defining features of the data

Third, they are multiplex

Our dataset affords at least three measures of relationship. While agnostic about the subjective meaning of these ties, we comment briefly on the extent to which they might correspond to “real life” social relationships

Fourth, they are longitudinal (four waves of longitudinal data corresponding to the 4 years our population is in college)

First, relationships, once established, remain in place until or unless they are actively terminated

Second, 220 users changed their profiles from “public” to “private” between waves 1 and 2 (housing and taste data)

Fifth, they include demographic, relational, and cultural information on respondents (favorite music, movies, and books; tastes as cause or consequence of social interaction)


Some marketing related examples for discussion

Product diffusion [Figure: an innovator and imitators]

Product selling data in a store (products as nodes, and being sold together as relations)

Contact information between two customers in a customer network

Information about posting, commenting and replying behavior in a Weibo network, used to find the opinion leaders


Stores selling competitive or complementary products in a district
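
The store example (products as nodes, being sold together as relations) can be sketched as counting co-purchases. The baskets and the helper name `co_purchase_ties` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_purchase_ties(baskets):
    """Count how often each pair of products appears in the same basket."""
    ties = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair has one canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            ties[pair] += 1
    return ties

# Hypothetical sales records, one list of products per transaction.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["eggs", "tea"],
]
ties = co_purchase_ties(baskets)
for pair, count in sorted(ties.items()):
    print(pair, count)
```

The counts are an interval-level measure of tie strength, so they can later be binarized with a cut-point if a simple graph is needed.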

Thank you
