It’s a Network, Not an Encyclopedia

Kane, G. (2009). It’s a Network, Not an Encyclopedia: A Social Network Perspective on Wikipedia Collaboration. Academy of Management Annual Meeting Proceedings.

IT’S A NETWORK, NOT AN ENCYCLOPEDIA: A SOCIAL NETWORK PERSPECTIVE ON WIKIPEDIA COLLABORATION

Gerald C. Kane (and Sam Ransbotham)

Assistant Professor of Information Systems

Carroll School of Management

Boston College

Paper presented at the Academy of Management Annual Meeting 2009 – Chicago

Presentation adapted from http://www.profkane.com/uploads/7/9/1/3/79137/aom_2009.pptx

WIKIPEDIA AS KNOWLEDGE NETWORK

Wikipedia is a good environment for studying collaborative processes (Kane and Fichman 2009).

Research on collaboration on Wikipedia (Butler et al. 2008, Kittur and Kraut 2008, Wilkinson and Huberman 2008) views articles as independent. But contributors collaborate on multiple articles and can transfer content, process, and reputational knowledge between them.

Assumption: Wikipedia’s articles are not independent but interconnected.

RQ: How do the connections between WP articles influence article quality?

SOCIAL NETWORK ANALYSIS (SNA)

SNA has been used to study collaborative environments (Borgatti and Foster 2003, Constant et al. 1996). The structure of the network and a node’s position within that structure describe how knowledge flows through the network.

The two-mode network is a classic but underused network conceptualization: one type of node (contributors) is viewed as the tie connecting the other type of node (articles). The article is the unit of analysis.

Centrality is among the most widely used measures; it refers to whether an individual node is situated in the core or the periphery of the network (Wasserman and Faust 1994, Scott 2000). Central nodes typically perform better because they have better access to the knowledge contained in the network.

TWO-MODE NETWORKS (EXAMPLE)

“Mode” = “a distinct set of entities on which the structural variables are measured. […] Structural variables measured on a single set of actors give rise to one-mode network” (Wasserman & Faust, 1994, p. 29)

Two-mode network = a network dataset containing two sets of actors. It measures which actors from one set have ties to actors in the other set.

Affiliation network: a special type of two-mode network, with one set of actors and one set of events. Relation: which actors participate in which events?
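An affiliation network is usually stored as an actor-by-event incidence matrix. The minimal Python sketch below builds one from (actor, event) participation records; the helper name `incidence_matrix` is hypothetical, and the records simply restate the film-festival example on the next slide.

```python
from typing import Dict, List, Tuple


# Hypothetical helper for illustration; not code from the paper.
def incidence_matrix(
    records: List[Tuple[str, str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (actors, events, matrix) with matrix[i][j] == 1 if actor i attended event j."""
    actors: List[str] = []
    events: List[str] = []
    for actor, event in records:
        # Keep first-seen order so rows and columns are stable.
        if actor not in actors:
            actors.append(actor)
        if event not in events:
            events.append(event)
    row: Dict[str, int] = {a: i for i, a in enumerate(actors)}
    col: Dict[str, int] = {e: j for j, e in enumerate(events)}
    matrix = [[0] * len(events) for _ in actors]
    for actor, event in records:
        # Binary affiliation: a repeated record for the same pair still yields 1.
        matrix[row[actor]][col[event]] = 1
    return actors, events, matrix


records = [
    ("Julia", "Cannes"), ("Julia", "Berlin"), ("Julia", "Venice"), ("Julia", "Moscow"),
    ("Sean", "Cannes"), ("Sean", "Berlin"), ("Sean", "Venice"), ("Sean", "Shanghai"),
    ("Audrey", "Berlin"), ("Audrey", "Venice"), ("Audrey", "Moscow"),
    ("Tom", "Venice"), ("Tom", "Shanghai"),
]
actors, events, M = incidence_matrix(records)
print(events)
for name, r in zip(actors, M):
    print(name, r)
```

Storing ties as 0/1 matches the binary incidence matrix shown in the example; a valued variant could count repeated participation instead.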

TWO-MODE NETWORKS (EXAMPLE)

People who participate in social events (film festivals), incidence matrix (actors by events):

            Cannes  Berlin  Venice  Moscow  Shanghai
Julia          1       1       1       1       0
Sean           1       1       1       0       1
Audrey         0       1       1       1       0
Tom            0       0       1       0       1

Adjacency matrix (events by events; diagonal omitted):

            Cannes  Berlin  Venice  Moscow  Shanghai
Cannes         -       2       2       1       1
Berlin         2       -       3       2       1
Venice         2       3       -       2       2
Moscow         1       2       2       -       0
Shanghai       1       1       2       0       -
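The event-by-event matrix is the incidence matrix multiplied by its own transpose (AᵀA): each off-diagonal cell counts the actors two events share. The plain-Python sketch below reproduces it; following the slide, the diagonal (each event's own attendance) is left at 0. The function name `event_projection` is hypothetical.

```python
from typing import List


def event_projection(incidence: List[List[int]]) -> List[List[int]]:
    """Event-by-event matrix: cell (j, k) counts actors who attended both events j and k."""
    n_events = len(incidence[0])
    proj = [[0] * n_events for _ in range(n_events)]
    for j in range(n_events):
        for k in range(n_events):
            if j == k:
                continue  # diagonal left at 0, as on the slide
            # (A^T A)[j][k] = sum over actors i of A[i][j] * A[i][k]
            proj[j][k] = sum(row[j] * row[k] for row in incidence)
    return proj


# Rows: Julia, Sean, Audrey, Tom; columns: Cannes, Berlin, Venice, Moscow, Shanghai
A = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1],
]
for row in event_projection(A):
    print(row)
```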

TWO-MODE NETWORKS (EXAMPLE)

People who participate in social events (film festivals), incidence matrix (actors by events):

            Cannes  Berlin  Venice  Moscow  Shanghai
Julia          1       1       1       1       0
Sean           1       1       1       0       1
Audrey         0       1       1       1       0
Tom            0       0       1       0       1

Adjacency matrix (actor by actor; diagonal omitted):

            Julia   Sean   Audrey   Tom
Julia         -       3      3       1
Sean          3       -      2       2
Audrey        3       2      -       1
Tom           1       2      1       -
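The actor-by-actor matrix is the other projection (AAᵀ). An equivalent computation, sketched below under the assumption that each actor's events are available as a set, intersects the sets for every pair of actors; `co_attendance` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

attended: Dict[str, Set[str]] = {
    "Julia": {"Cannes", "Berlin", "Venice", "Moscow"},
    "Sean": {"Cannes", "Berlin", "Venice", "Shanghai"},
    "Audrey": {"Berlin", "Venice", "Moscow"},
    "Tom": {"Venice", "Shanghai"},
}


def co_attendance(sets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Tie weight for each actor pair = number of events both attended (off-diagonal of A A^T)."""
    return {(a, b): len(sets[a] & sets[b]) for a, b in combinations(sets, 2)}


ties = co_attendance(attended)
for pair, weight in ties.items():
    print(pair, weight)
```

The set version avoids building the full matrix, which can help when most actors attend few events.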

TWO TYPES OF CENTRALITY

Different measures of centrality assess a node’s position at different levels of the network (Faust 1997).

Degree centrality = local network. The number of nodes directly connected to a node and the strength of those relationships.

H1: The degree centrality of an article in the two-mode matrix of articles and contributors will be positively related to article quality.
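Because the slide defines degree centrality as both the number of direct neighbors and the strength of those ties, the sketch below reports the two separately for the actor-by-actor matrix from the film-festival example. Keeping them as two numbers is an illustrative choice; we do not assume this is exactly how UCINet reports valued degree, and `degree_centrality` is a hypothetical name.

```python
from typing import Dict, List


def degree_centrality(names: List[str], weights: List[List[int]]) -> Dict[str, Dict[str, int]]:
    """Per node: number of direct neighbors and total strength of those ties."""
    scores: Dict[str, Dict[str, int]] = {}
    for i, name in enumerate(names):
        others = [w for j, w in enumerate(weights[i]) if j != i]  # ignore the diagonal
        scores[name] = {
            "neighbors": sum(1 for w in others if w > 0),
            "strength": sum(others),
        }
    return scores


# Actor-by-actor matrix from the film-festival example, diagonal set to 0.
names = ["Julia", "Sean", "Audrey", "Tom"]
W = [
    [0, 3, 3, 1],
    [3, 0, 2, 2],
    [3, 2, 0, 1],
    [1, 2, 1, 0],
]
scores = degree_centrality(names, W)
for name in names:
    print(name, scores[name])
```

Every actor here has three neighbors, so in this small example only tie strength distinguishes them (Julia and Sean 7, Audrey 6, Tom 4).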

Eigenvector centrality = global network. How important (central) the nodes are to which the focal node is directly connected.

H2: The eigenvector centrality of an article in the two-mode matrix of articles and contributors will be positively related to article quality.
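Eigenvector centrality is commonly computed by power iteration: repeatedly give each node the sum of its neighbors' scores, weighted by tie strength, and rescale. The sketch below applies it to the actor-by-actor matrix from the film-festival example. The unit-length normalization and the convergence tolerance are our assumptions; UCINet's own normalization may differ.

```python
import math
from typing import List


def eigenvector_centrality(
    W: List[List[float]], max_iter: int = 1000, tol: float = 1e-12
) -> List[float]:
    """Power iteration on a symmetric, non-negative, connected weight matrix."""
    n = len(W)
    x = [1.0] * n
    for _ in range(max_iter):
        # Each node collects its neighbors' current scores, weighted by tie strength.
        y = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]  # rescale to unit Euclidean length
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


names = ["Julia", "Sean", "Audrey", "Tom"]
W = [
    [0, 3, 3, 1],
    [3, 0, 2, 2],
    [3, 2, 0, 1],
    [1, 2, 1, 0],
]
scores = eigenvector_centrality(W)
for name, s in zip(names, scores):
    print(f"{name}: {s:.3f}")
```

Julia and Sean have the same total tie strength (7), yet Julia scores slightly higher: more of Sean's tie strength runs to Tom, the least connected actor. This is the local versus global distinction the slide draws.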

EIGENVECTOR CENTRALITY - EXAMPLE

[Figure: example network with nodes labeled A, B1, B2 and B3]

SETTING AND METHOD

Sampled 300 (out of 15,000) medical articles on Wikipedia in WikiProject Medicine (WP:MED).

Qualitative analysis confirmed that WP:MED was a largely independent sub-community on Wikipedia: 75% of the most active contributors worked mostly or exclusively on medical articles. Roughly 1/3 were medical professionals, 1/3 patients, and 1/3 Wikipedians.

WP:MED assesses the quality of all articles:
Featured Article (FA): best
A-quality (A)
Good Article (GA)
B-quality (B)
Start-quality (Start): worst

Collected all FA, A, and GA articles (~100).
Random sample of B and Start articles (~200).
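The sampling design (every FA, A and GA article, plus a random draw of B and Start articles) can be sketched as below. We assume B and Start articles were drawn from one pooled list with a fixed seed; the slides do not say whether the two classes were sampled separately. Article titles and the helper name `stratified_sample` are hypothetical.

```python
import random
from typing import Dict, List

TOP_CLASSES = {"FA", "A", "GA"}   # collected in full
SAMPLED_CLASSES = {"B", "Start"}  # randomly sampled (pooled: an assumption)


def stratified_sample(ratings: Dict[str, str], n_sampled: int, seed: int = 0) -> List[str]:
    """Keep every top-class article and add a random sample of B/Start articles."""
    top = [title for title, cls in ratings.items() if cls in TOP_CLASSES]
    rest = [title for title, cls in ratings.items() if cls in SAMPLED_CLASSES]
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return top + rng.sample(rest, min(n_sampled, len(rest)))


# Hypothetical ratings; a real run would use the WP:MED assessment table.
ratings = {
    "Article 1": "FA", "Article 2": "GA", "Article 3": "B",
    "Article 4": "Start", "Article 5": "Start", "Article 6": "A",
}
sample = stratified_sample(ratings, n_sampled=2)
print(sample)
```

Oversampling the rare top classes keeps enough high-quality articles in the data for the ordinal model to estimate their thresholds.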

TWO-MODE NETWORK OF ARTICLES AND CONTRIBUTORS

[Figure: squares = contributors; circles = articles. Red = Featured Articles, orange = A-quality articles, yellow = Good Articles, light blue = B-quality articles, dark blue = Start-quality articles.]

VARIABLES

Dependent variable: quality

Independent variables: the 300 (articles) × 1,800 (contributors) two-mode network was transformed into a 300 × 300 article-by-article matrix. Nodes are articles; ties are the number of editors who contributed to both articles in each pair. Degree centrality and eigenvector centrality were computed in UCINet.
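The slides report doing this transformation in UCINet. As a rough Python equivalent (not UCINet's routine), the sketch below collapses a hypothetical edit log of (article, contributor) rows into article-article ties weighted by the number of shared editors, then sums tie weights per article. Article titles, user IDs and function names are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def article_projection(edits: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """Collapse (article, contributor) records into article pairs weighted by shared editors."""
    editors: Dict[str, Set[str]] = defaultdict(set)
    for article, contributor in edits:
        editors[article].add(contributor)  # a set: many edits by one person count once
    ties: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(sorted(editors), 2):
        shared = len(editors[a] & editors[b])
        if shared:
            ties[(a, b)] = shared
    return ties


def weighted_degree(ties: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Sum of tie weights incident to each article (isolated articles are absent)."""
    degree: Dict[str, int] = defaultdict(int)
    for (a, b), w in ties.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)


# Hypothetical edit log: one row per edit.
edits = [
    ("Asthma", "u1"), ("Asthma", "u1"), ("Asthma", "u2"),
    ("Diabetes", "u1"), ("Diabetes", "u2"), ("Diabetes", "u3"),
    ("Influenza", "u3"), ("Migraine", "u4"),
]
ties = article_projection(edits)
print(ties)
print(weighted_degree(ties))
```

With 300 articles the pairwise loop visits 44,850 pairs, which is trivial in plain Python.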

VARIABLES

Control variables:

Topic importance (1-4, assigned by WP:MED). The most important articles are likely to receive more attention, attract divergent opinions, and have a greater base of knowledge.

Popularity. Past research suggests that editors may be more likely to contribute to high-traffic articles, and patients tend to click on the top search results.
Traffic: average number of page views per month, 1Q2008 (Alexa.com).
Google Rank: is the article the top Google result? (60% were.)

Direct collaboration (article and discussion pages): number of unique contributors, average edits per contributor, number of anonymous contributors.
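The three direct-collaboration controls can be derived from a page's revision history. The sketch below is a hedged illustration: it assumes anonymous contributors are identified by IP-address usernames (Wikipedia attributes logged-out edits to an IP address) and counts distinct anonymous IPs rather than anonymous edits. The slides do not state either choice, and the usernames and function names are hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Dict, List


def is_anonymous(user: str) -> bool:
    """Assumption: a username that parses as an IPv4/IPv6 address is a logged-out editor."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def collaboration_controls(editors: List[str]) -> Dict[str, float]:
    """editors: one entry per revision of a single page (article or discussion page)."""
    counts = Counter(editors)
    n_unique = len(counts)
    return {
        "unique_contributors": n_unique,
        "edits_per_contributor": len(editors) / n_unique if n_unique else 0.0,
        "anonymous_contributors": sum(1 for user in counts if is_anonymous(user)),
    }


# Hypothetical revision history of one article page.
history = ["DrSmith", "192.0.2.7", "DrSmith", "PatientVoice", "2001:db8::1", "DrSmith"]
result = collaboration_controls(history)
print(result)
```

Running the same function separately on the article page and its discussion page yields the paired (A) and (D) variables.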

DATA ANALYSIS

Ordinal regression in SPSS 16.

Used when a categorical dependent variable has a progressive (ordered) relationship among its categories, but the magnitude of the differences between categories is unclear.

The negative log-log link function was chosen because of the high proportion of lower-quality articles (~200 of 300).
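To show what the link does, the sketch below turns a linear predictor into category probabilities under a negative log-log link. It assumes the cumulative parameterization −log(−log γⱼ) = θⱼ − η, with γⱼ = P(Y ≤ j), which is our reading of the SPSS ordinal-regression convention; the thresholds are hypothetical, not estimates from this paper.

```python
import math
from typing import List


def neg_log_log_probs(thresholds: List[float], eta: float) -> List[float]:
    """Category probabilities for an ordinal model with a negative log-log link.

    Assumed parameterization: -log(-log(gamma_j)) = theta_j - eta,
    where gamma_j = P(Y <= j), theta_j are increasing thresholds and
    eta is the linear predictor (sum of coefficient * predictor).
    """
    # Inverting the link gives each cumulative probability P(Y <= j).
    cumulative = [math.exp(-math.exp(-(theta - eta))) for theta in thresholds]
    cumulative.append(1.0)  # the top category closes the distribution
    # Individual category probabilities are successive differences.
    return [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]


# Hypothetical thresholds for five ordered quality classes (not estimates from the paper).
thresholds = [-1.0, 0.0, 0.5, 1.5]
for eta in (0.0, 1.0):
    probs = neg_log_log_probs(thresholds, eta)
    print(eta, [round(p, 3) for p in probs])
```

Under this parameterization a larger η moves probability toward the higher-coded categories; the link's asymmetry is what distinguishes it from the symmetric logit.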

HYPOTHESES

H1: The degree centrality of an article in the 2-mode incidence matrix of articles and editors will be positively related to article quality.

H2: The eigenvector centrality of an article in the 2-mode incidence matrix of articles and editors will be positively related to article quality.

RESULTS

                         MODEL 1                           MODEL 2
                         Est.      SE    Wald   Sig.       Est.      SE    Wald   Sig.
Traffic                  0.00**    0.00   7.76  0.005      0.00      0.00   2.26  0.13
Google Rank              0.43**    0.15   7.97  0.005      0.43**    0.16   7.51  0.01
Importance              -0.82***   0.12  49.08  0.000     -0.91***   0.12  53.72  0.00
Contributors (A)         0.00      0.00   1.17  0.279      0.00      0.00   0.15  0.69
Contributors (D)         0.00      0.00   0.08  0.776      0.00      0.00   0.68  0.41
Edits/Contributor (A)    0.63***   0.12  28.59  0.000      0.61**    0.12  24.35  0.00
Edits/Contributor (D)   -0.06      0.05   1.67  0.196     -0.08      0.05   2.16  0.14
Anon (Article)           4.15***   1.04  16.00  0.000      3.47***   1.06  10.72  0.00
Anon (Discussion)       -2.06**    0.75   7.54  0.006     -1.46*     0.76   3.71  0.05

Network variables        Est.      SE    Wald   Sig.
Deg. Cent.               0.00**    0.00   7.29  0.01
Eig. Cent.               0.33***   0.09  12.60  0.00

Pseudo R-Square          MODEL 1   MODEL 2
Cox and Snell            0.379     0.474
Nagelkerke               0.407     0.510

(A) = article page; (D) = discussion page.
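The Sig. column follows from the Wald column: assuming each row is a single coefficient (1 degree of freedom), the p-value is the upper tail of a chi-square distribution with 1 df, which equals erfc(√(W/2)). The sketch below recomputes Model 1's values from the results table as a consistency check; `wald_p_value` is a hypothetical name.

```python
import math


def wald_p_value(wald: float) -> float:
    """Upper-tail p-value of a Wald statistic, assuming a chi-square with 1 df.

    For 1 df, P(X > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    """
    return math.erfc(math.sqrt(wald / 2.0))


# (Wald, reported Sig.) pairs taken from the Model 1 column of the results table.
model_1 = {
    "Traffic": (7.76, 0.005),
    "Google Rank": (7.97, 0.005),
    "Importance": (49.08, 0.000),
    "Contributors (A)": (1.17, 0.279),
    "Contributors (D)": (0.08, 0.776),
    "Edits/Contributor (A)": (28.59, 0.000),
    "Edits/Contributor (D)": (1.67, 0.196),
    "Anon (Article)": (16.00, 0.000),
    "Anon (Discussion)": (7.54, 0.006),
}
for name, (wald, reported) in model_1.items():
    print(f"{name:<22} Wald={wald:6.2f}  p={wald_p_value(wald):.3f}  reported={reported:.3f}")
```

The recomputed values match the reported ones to within rounding; for example, Contributors (D) gives about 0.777 against a reported 0.776, because the published Wald statistic is itself rounded.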

RESULTS

Both hypotheses are supported.

Degree and eigenvector centrality are both significant (p ≤ .01) in the expected direction.

Only one of the measures of direct collaboration, edits per contributor on the article page, is significant, underscoring the importance of the network variables.

Topic importance is the most highly significant variable in the model, but it is negatively related to quality. This has implications for the value of medical knowledge on WP.

Anonymity is positively related to quality on the article page but negatively related on the talk page. This differential effect has been noted elsewhere (Sia et al., 2002).

IMPLICATIONS FOR THEORY AND PRACTICE

Theoretical implications: Collaborative features are associated with quality in community-based knowledge creation settings. Network variables are critically important for predicting quality; articles should not be examined as independent collaborative environments. Validation in other settings is needed.

Managerial implications: Companies are beginning to employ similar communities for strategic purposes (Tapscott 2006). They should approach these communities as an integrated collaborative network, not simply as independent efforts.

FUTURE DIRECTIONS

Connect more directly with existing theories of information quality (e.g., Constant et al. 1996):
Volume of information = number of authors
Diversity of information = degree centrality
Quality of information = eigenvector centrality

Recruiting a team of fourth-year medical students to validate the WP quality assessments. Pilot (60 articles): 90% accuracy and 85% inter-rater reliability (IRR).

Collecting the entire WP:MED: the full-text history of 2,026,992 revisions to all 16,354 Wikipedia medical articles. This eliminates the question of biased sampling and reduces multicollinearity. Additional controls: age, length, complexity, sections, references.