Social Network Data
HEFAN 1301110886
Table of Contents
Introduction: What's different about social network data?
Nodes
Relations
Scales of measurement
A note on statistics and social network data
1 INTRODUCTION:
WHAT'S DIFFERENT ABOUT SOCIAL NETWORK DATA?
"Conventional" data: an example
Attributes
Observations
An example of network data
Holistically
Relation
The total number of 0s and 1s is the same
When Carol chooses Bob, will Bob choose Carol?
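The question of whether Carol's choice of Bob is returned can be checked directly on a choice matrix. The following is a minimal sketch; the matrix values and the helper names (`reciprocated_pairs`, `out_degree`, `in_degree`) are hypothetical and chosen only for illustration, not taken from the slides.

```python
# Hypothetical choice matrix: rows choose columns (1 = "row names column").
actors = ["Bob", "Carol", "Ted", "Alice"]
choices = [
    [0, 1, 1, 0],  # Bob chooses Carol and Ted
    [1, 0, 1, 0],  # Carol chooses Bob and Ted
    [0, 1, 0, 1],  # Ted chooses Carol and Alice
    [0, 0, 1, 0],  # Alice chooses Ted
]


def reciprocated_pairs(matrix, names):
    """Return the unordered pairs whose choice is returned in both directions."""
    pairs = []
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] == 1 and matrix[j][i] == 1:
                pairs.append((names[i], names[j]))
    return pairs


def out_degree(matrix):
    """Number of choices each actor makes (row sums)."""
    return [sum(row) for row in matrix]


def in_degree(matrix):
    """Number of choices each actor receives (column sums)."""
    return [sum(col) for col in zip(*matrix)]


mutual = reciprocated_pairs(choices, actors)
sent = out_degree(choices)
received = in_degree(choices)
```

Reading rows answers "whom does this actor choose?", reading columns answers "who chooses this actor?", and comparing cell (i, j) with cell (j, i) answers the reciprocity question.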
"Network" data
The first major emphasis:
seeing how actors are located or "embedded" in the overall network
The second major emphasis:
seeing how the whole pattern of individual choices gives rise to more holistic patterns
Similarities and differences
Similarity: the rows can be treated as simply a listing of cases, and the columns as attributes of each actor
Difference: conventional data focus on actors and attributes; network data focus on actors and relations
Same methods (e.g. correlations), different purpose
Questions for discussion
What is the difference between attribute analyses and relation analyses?
Which contains more information?
Which is the cause?
When are individuals structurally equivalent, and when are they equivalent in some attribute?
2 NODES
POPULATIONS, SAMPLES, AND BOUNDARIES
MODALITY AND LEVELS OF ANALYSIS
Nodes and the sampling process
Seemingly no "samples" at all
Rather, all of the individuals in some population or populations
The analyst must decide the boundaries of each population to be studied
and how individual units of observation are to be selected within that population
Often several levels of analysis
Boundaries
Two main types:
imposed or created by the actors themselves (e.g. all the members of a classroom)
imposed by the investigator (e.g. having gross family incomes over $1,000,000 per year)
Modality and levels of analysis
Multi-modal: most networkers think of individual persons as being embedded in networks that are embedded in networks that are embedded in networks.
Advantage for dealing with multiple levels of analysis
how the individual is embedded within a structure and how the structure emerges from the micro-relations between individual parts
Rare practice
Questions for discussion
How should the boundary of the population be decided? Are networks always relative?
Why is multi-modal analysis so rarely practiced despite its big advantage?
3 RELATIONS
SAMPLING TIES
MULTIPLE RELATIONS
Relations
All nodes, selected relations
(Cost and generalizability)
Full network methods
Information about each actor's ties with all other actors
Necessary to properly define and measure many of the structural concepts of network analysis
Strong ties are relatively few
An example
Take all the people in our classroom as an example: how many relations exist among us? How should we select our target relations?
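To get a feel for the size of the question, the number of possible ties for a single relation can be counted directly. This is a small sketch; the function name `possible_ties` and the class size of 30 are assumptions made for illustration.

```python
def possible_ties(n_actors, directed=True):
    """Count the possible ties among n_actors for one relation.

    Directed relations allow n * (n - 1) ordered pairs; undirected
    (symmetric) relations allow half as many unordered pairs.
    """
    ordered_pairs = n_actors * (n_actors - 1)
    return ordered_pairs if directed else ordered_pairs // 2


# Hypothetical classroom of 30 people.
directed_count = possible_ties(30)                    # 870 ordered pairs
undirected_count = possible_ties(30, directed=False)  # 435 unordered pairs
```

The count applies to each relation separately, so every additional relation that is measured multiplies the data collection burden for a full network design.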
Snowball methods
When to stop: the limited number of strong ties that most actors have, and the tendency for ties to be reciprocated, often make it fairly easy to find the boundaries
Particularly helpful for tracking down "special" populations (small sub-sets of people mixed in with large numbers of others)
Two limitations: isolates and starting points
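The snowball procedure can be sketched as a wave-by-wave search from the starting actors. This is an illustrative sketch only: the tie list, the actor labels, and the function name `snowball_sample` are hypothetical, and real designs add rules for non-response and for how many alters each actor may name.

```python
def snowball_sample(ties, seeds, max_waves):
    """Follow named ties outward from the seed actors, wave by wave.

    ties: dict mapping each actor to the set of alters that actor names.
    Returns a dict mapping every reached actor to the wave in which
    the actor was first reached (seeds are wave 0).
    """
    reached = {actor: 0 for actor in seeds}
    frontier = list(seeds)
    for wave in range(1, max_waves + 1):
        next_frontier = []
        for actor in frontier:
            for alter in sorted(ties.get(actor, ())):
                if alter not in reached:
                    reached[alter] = wave
                    next_frontier.append(alter)
        if not next_frontier:  # no new actors were named: boundary found
            break
        frontier = next_frontier
    return reached


# Hypothetical population; F is an isolate that nobody names.
example_ties = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
    "F": set(),
}
waves = snowball_sample(example_ties, seeds=["A"], max_waves=3)
```

In this example the isolate F is never reached, and a different seed would change the order in which actors are found, which illustrates the two limitations named above.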
Ego-centric networks (with alter connections)
Combined with attribute-based approaches
Understand the opportunities and constraints that ego has as a result of the way they are embedded in their networks (overall network density and the prevalence of reciprocal ties, cliques, and the like can be estimated rather directly)
Less information (distance, centrality, and various kinds of positional equivalence cannot be assessed)
Comparison between the above two networks
[Diagram: nodes A, B, C, D, and E]
Snowball: the starting points A, B, C, and E, and all the other nodes they lead to, are included in our data
Ego-centric: A is always the center, and we study only A, B, C, D and their relations, without E
Ego-centric networks (ego only)
only information on ego's connections to alters -- but not information on the connections among those alters
Don’t know the nature of the macro-structure or the whole network
If we think about alters in terms of their social roles, rather than as individual occupants of social roles, ego-centered networks can tell us a good bit about local social structures
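The difference between the two ego-centric designs can be shown by extracting both versions from the same set of ties. This is a minimal sketch; the edge list and the function name `ego_network` are hypothetical, and ties are treated as undirected.

```python
def ego_network(edges, ego, include_alter_ties=True):
    """Extract ego's network from a list of undirected ties.

    Returns (alters, kept_ties). With include_alter_ties=False only the
    ego-alter ties are kept (the "ego only" design).
    """
    alters = set()
    for a, b in edges:
        if a == ego:
            alters.add(b)
        elif b == ego:
            alters.add(a)
    kept = set()
    for a, b in edges:
        if ego in (a, b):
            kept.add((a, b))
        elif include_alter_ties and a in alters and b in alters:
            kept.add((a, b))
    return alters, kept


# Hypothetical ties: E is connected to D but not to ego A.
example_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
alters, with_alter_ties = ego_network(example_edges, "A")
_, ego_only = ego_network(example_edges, "A", include_alter_ties=False)
```

Here E drops out of both versions because E is not tied to ego, and the tie between B and C survives only when alter connections are collected.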
Multiple relations
We do not sample, but rather select, relations
Material and informational
"Conserved" vs. non-conserved
A tie vs. a common attribute
Questions for discussion
Ego-centric networks: individually or holistically? Attributes and subjectivity
When sampling ties, are the nodes also being sampled?
4 SCALES OF MEASUREMENT
Binary measures of relations
Not unusual to see data that are measured at a "higher" level transformed into binary scores
The additional power and simplicity of analysis of binary data is "worth" the cost in information lost
Multiple-category nominal measures of relations
The relationship is coded by its type, rather than its strength (dummy coding)
If each node is allowed to have a tie in at most one of the resulting networks, expect low densities and an inherent negative correlation among the matrices
Alternatively, simply code whether a tie exists or not; the types of ties can then be scaled into a single grouped ordinal measure of tie strength
An example
When considering types of classmate relations, the shared classroom may be from primary school, high school, or university
What if a person belongs to more than one type simultaneously?
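Dummy coding a nominal relation can be sketched as splitting one matrix of type labels into one binary matrix per type. The labels, the matrix, and the function names `dummy_code` and `any_tie` are hypothetical and used only for illustration.

```python
def dummy_code(nominal, categories):
    """Split a matrix of tie-type labels into one 0/1 matrix per category.

    nominal[i][j] holds a category label, or None when there is no tie.
    """
    return {
        category: [[1 if cell == category else 0 for cell in row] for row in nominal]
        for category in categories
    }


def any_tie(nominal):
    """Collapse the nominal matrix to a single binary 'tie exists' matrix."""
    return [[0 if cell is None else 1 for cell in row] for row in nominal]


# Hypothetical classmate relation among three people.
categories = ["primary", "high", "university"]
classmates = [
    [None, "primary", "university"],
    ["primary", None, None],
    ["university", None, None],
]
coded = dummy_code(classmates, categories)
exists = any_tie(classmates)
```

Because each cell holds a single label, a pair can appear in at most one of the resulting matrices, which is the source of the low densities and negative correlations noted above. A pair that shared more than one type of classroom would need a different coding, for example one valued matrix per type.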
Grouped ordinal measures of relations
Ordinal scales of measurement contain more information than nominal. That is, the scores reflect finer gradations of tie strength than the simple binary presence or absence.
Can be binarized by choosing some cut-point and re-scoring (choices of cut-points can be consequential)
Can be treated as interval (but the intervals separating points on an ordinal scale may be very heterogeneous)
Full-rank ordinal measures of relations:
to score the strength of all of the relations of an actor in a rank order from strongest to weakest
Can be treated as interval (when a number of full rank order scales, e.g. rankings of people's liking of others in the group, are summed into an index, the index is plausibly treated as interval)
group the rank order scores into groups (i.e. produce a grouped ordinal scale) or dichotomize the data
Interval measures of relations
Objective: rather than asking whether two people communicate, one could count the number of emails, phone calls, and inter-office mail deliveries between them
Whenever possible, connections should be measured at the interval level -- as we can always move to a less refined approach later; if data are collected at the nominal level, it is much more difficult to move to a more refined level
The most powerful insights of network analysis, and many of the mathematical and graphical tools used by network analysts were developed for simple graphs (i.e. binary, undirected)
Interval measures of relations
About the cut-point: theory and the purposes of the analysis provide the best guidance; examining the data can also help (maybe the distribution of tie strengths really is discretely bi-modal, and displays a clear cut point; maybe the distribution is highly skewed and the main feature is a distinction between no tie and any tie).
When a cut point is chosen, it is wise to also consider alternative values that are somewhat higher and lower, and repeat the analyses with different cut-points to see if the substance of the results is affected
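Repeating an analysis at several cut-points can be sketched as follows. The valued matrix (hypothetical message counts), the candidate cut-points, and the function names `binarize` and `density` are assumptions for illustration; density stands in for whatever statistic the analysis actually uses.

```python
def binarize(valued, cut):
    """Re-score a valued matrix: 1 if tie strength >= cut, else 0."""
    return [[1 if value >= cut else 0 for value in row] for row in valued]


def density(binary):
    """Share of possible directed ties (off-diagonal cells) that are present."""
    n = len(binary)
    present = sum(binary[i][j] for i in range(n) for j in range(n) if i != j)
    return present / (n * (n - 1))


# Hypothetical counts of messages sent from row actor to column actor.
messages = [
    [0, 5, 1, 0],
    [4, 0, 0, 2],
    [1, 0, 0, 7],
    [0, 3, 6, 0],
]

# Repeat the same statistic at a lower, middle, and higher cut-point.
density_by_cut = {cut: density(binarize(messages, cut)) for cut in (1, 3, 5)}
```

If the substantive conclusion changes sharply between neighboring cut-points, the result depends on the dichotomization rather than on the network.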
Questions for discussion cont.
Multiple relations, binary scores: rich data, poor analyses?
Grouped ordinal = binary or interval (loss of information) vs. CB confounding/identical
The most powerful insights of network analysis, and many of the mathematical and graphical tools used by network analysts were developed for simple graphs (i.e. binary, undirected)
Many characterizations of the embeddedness of actors in their networks, and of the networks themselves are most commonly thought of in discrete terms in the research literature
5 A NOTE ON STATISTICS AND SOCIAL NETWORK DATA
"Mathematical" sociology vs. "statistical or quantitative analysis"
Mathematical approaches treat the data as "deterministic." That is, they tend to regard the measured relationships and relationship strengths as accurately reflecting the "real" or "final" or "equilibrium" status of the network.
They also assume that the observations are not a "sample" of some larger population of possible observations; rather, the observations are usually regarded as the population of interest
"Mathematical" sociology vs. "statistical or quantitative analysis"
Statistical analysts regard the particular scores on relationship strengths as stochastic or probabilistic realizations of an underlying true tendency or probability distribution of relationship strengths.
They tend to think of a particular set of network data as a "sample" of a larger class or population of such networks or network elements, and are concerned with whether the results of the current study would be reproduced in the "next" study of similar samples
Little difference from conventional statistical approaches
Describing characteristics of individual observations (e.g. the median tie strength of actor X with all other actors in the network) and of the network as a whole (e.g. the mean of all tie strengths among all actors in the network).
Assessing the degree of similarity among actors, and finding patterns in network data (e.g. factor analysis, cluster analysis, multi-dimensional scaling). Even the tools of predictive modeling are commonly applied to network data (e.g. correlation and regression)
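The two descriptive examples above (median tie strength of one actor, mean tie strength of the whole network) can be computed with ordinary descriptive tools. This is a sketch with a hypothetical valued matrix; the function names are assumptions, and self-ties on the diagonal are excluded.

```python
from statistics import mean, median


def actor_median_strength(valued, actor_index):
    """Median strength of one actor's ties to all other actors."""
    row = valued[actor_index]
    return median([value for j, value in enumerate(row) if j != actor_index])


def network_mean_strength(valued):
    """Mean strength over all ordered pairs of distinct actors."""
    n = len(valued)
    return mean([valued[i][j] for i in range(n) for j in range(n) if i != j])


# Hypothetical tie strengths from row actor to column actor.
strengths = [
    [0, 3, 1],
    [2, 0, 5],
    [4, 1, 0],
]
median_for_first_actor = actor_median_strength(strengths, 0)
overall_mean = network_mean_strength(strengths)
```

The arithmetic is the same as for conventional data; what differs is that the cases are pairs of actors rather than independent individuals.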
Issues with Inferential statistics
the stability, reproducibility, or generalizability of results
We are interested in generalizing, but our sample was not drawn by probability methods (network analysis often relies on artifacts, direct observation, laboratory experiments, and documents as data sources)
Issues with Inferential statistics
Hypothesis testing: the key link in the inferential chain of hypothesis testing is the estimation of the standard errors of statistics
The usual approximations, however, hold only when the observations are drawn by independent random sampling
Observations in network data are almost always non-independent, by definition
Hypothesis testing (simulation)
Suppose there are 20 symmetric ties among ten actors, and we observe one clique containing four actors
Simulate random networks with 10 nodes and 20 ties
If it turns out that such cliques (or more numerous or more inclusive ones) are very unlikely under the assumption that ties are purely random, then it is plausible to conclude that there is a social structure present
Hypothesis testing (simulation)
The simulation is simple: repeat the experiment several thousand times and count what proportion of the "trials" result in "successes."
The logic of testing hypotheses is the same
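The simulation described above can be sketched in a few lines. The sketch assumes a null model in which the 20 symmetric ties are placed uniformly at random among the 45 possible pairs of ten actors; the function names, the number of trials, and the seed are hypothetical choices, and a "success" is a trial containing at least one clique of four or more actors.

```python
import itertools
import random


def random_graph(n_nodes, n_ties, rng):
    """Place n_ties symmetric ties uniformly at random among all node pairs."""
    all_pairs = list(itertools.combinations(range(n_nodes), 2))
    return set(rng.sample(all_pairs, n_ties))


def has_clique(edges, n_nodes, size):
    """True if some group of `size` nodes is fully connected.

    Any larger clique contains a clique of this size, so this also
    covers the "more inclusive" cases.
    """
    for group in itertools.combinations(range(n_nodes), size):
        if all(pair in edges for pair in itertools.combinations(group, 2)):
            return True
    return False


def clique_probability(n_nodes=10, n_ties=20, size=4, trials=2000, seed=0):
    """Estimate the share of random networks that contain such a clique."""
    rng = random.Random(seed)
    successes = sum(
        has_clique(random_graph(n_nodes, n_ties, rng), n_nodes, size)
        for _ in range(trials)
    )
    return successes / trials


estimated_p = clique_probability()
```

If `estimated_p` is small, a clique of four would rarely arise from purely random ties, which supports the conclusion that some social structure is present. The answer depends on the null model, so a different random-tie assumption could give a different probability.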
More a branch of mathematical sociology than of statistical or quantitative analysis (simulation)
Relationship data such as 0 or 1 are viewed in a mathematical-sociology way, but we still need to sample network data, for example by sampling from large graphs, and regard the samples as representative of the whole network.
Questions for discussion cont.
Recommended paper
Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A., & Christakis, N. (2008). Tastes, ties, and time: A new social network dataset using Facebook.com. Social Networks, 30(4), 330-342.
The first dataset of its kind to be made publicly available
Past research has tended to draw upon only a very small portion of the wealth of data available on Facebook: some (e.g. Lampe et al., 2006; Ellison et al., 2007) avoid the site altogether and rely exclusively on survey methods; most (e.g. Lampe et al., 2007; Gross and Acquisti, 2005) focus only on profile data, ignoring the network ties between users; and no study has begun to make use of data on user tastes to the degree we have seen elsewhere
First, our data are collected in a naturally occurring, as opposed to contrived, fashion (taste responses)
Second, they are socio-centric and indicate the interrelatedness of an entire population of interest, not egocentric (not representative of some larger population)
First, Facebook is a standardized research instrument
Few network data have been gathered on college students despite the role of higher education in shaping a number of important life outcomes
Five defining features of the data
Third, they are multiplex. Our dataset affords at least three measures of relationship. While agnostic about the subjective meaning of these ties, we comment briefly on the extent to which they might correspond to “real life” social relationships
Fourth, they are longitudinal (four waves of longitudinal data corresponding to the 4 years our population is in college)
First, relationships, once established, remain in place until or unless they are actively terminated
Second, 220 users changed their profiles from “public” to “private” between waves 1 and 2 (housing and taste data)
Fifth, they include demographic, relational, and cultural information on respondents (favorite music, movies, and books; tastes as cause or consequence of social interaction)
Five defining features of the data
Some marketing related examples for discussion
Product diffusion [diagram labels: Innovator, Imitator, Imitator, Imitator]
Product selling data in a store(products as nodes and being sold together as relations)
Contact information between two customers in a customer network
Information about posting, commenting, and replying behavior in a Weibo network, used to find the opinion leaders
Stores selling competitive or complementary products in a district
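The store example (products as nodes, being sold together as the relation) can be sketched by counting co-purchases across baskets. The basket contents and the function name `co_purchase_ties` are hypothetical; the counts give an interval-level tie strength that could later be dichotomized.

```python
from collections import Counter
from itertools import combinations


def co_purchase_ties(baskets):
    """Count how often each pair of products appears in the same basket.

    Products are the nodes; the count is the strength of the undirected tie.
    """
    ties = Counter()
    for basket in baskets:
        # De-duplicate and sort so each pair is counted once per basket
        # and always stored in the same order.
        for first, second in combinations(sorted(set(basket)), 2):
            ties[(first, second)] += 1
    return ties


# Hypothetical sales records, one list per transaction.
example_baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "tea"],
]
product_ties = co_purchase_ties(example_baskets)
```

In this example bread and butter are tied with strength 2, and every other co-purchased pair has strength 1.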
Some marketing related examples for discussion
Thank you