How temporal network analysis can help us to explore existing interrelationships in online production systems Dr. Claudia Müller-Birn Institute for Computer Science, Group Networked Information Systems January 20, 2011 Invited Talk, GESIS, Bonn


How temporal network analysis can help us to explore existing interrelationships in online production systems

Dr. Claudia Müller-Birn

Institute for Computer Science, Group Networked Information Systems

January 20, 2011

Invited Talk, GESIS, Bonn


When you think of the Social Web...


Social participation creates digital products


Can Distributed Volunteers Accomplish Massive Data Analysis Tasks?

(Kanefsky et al., 2001)

STEM (Spatio-Temporal Exploratory Model) map of Dickcissel (http://ebird.org)

Number of articles on English-language Wikipedia from its creation in 2001 through June 2010 (Riedl, 2011)

Graph of source lines of code added [millions] (Deshpande & Riehle, 2008); dataset based on www.ohloh.net


• Geographically distributed communities

• Very large number of granular, individual contributions

• Openness of boundaries, technical standards, communication and information sources

• Peering as a new form of horizontal organization

• Sharing of intellectual property

(Benkler, 2006), (O'Mahony, 2007), (Tapscott, 2007)


Outline

• Dimensions in online production systems and existing research issues

• Success in online production systems

• Mirroring hypothesis in online production systems (research in progress)

• Recent and future research challenges


Dimensions in online production systems


pooled product

structured product

integral product


Selected research issues in online production systems


MODELING

• How do we model the dimensions of online production systems?

• Which network descriptions are especially useful?

• What are appropriate data sources?

EVOLUTION

• How do the social and the technical dimension co-evolve?

• What techniques can be used for measuring and describing evolution?

QUALITY/SUCCESS

• How do we measure quality or success?

• How do online production systems strive for quality?

INFLUENCE

• How do we measure the influence of the technical dimension on the social dimension and vice versa?

• Are specific structures of networks more influential than others?


Success in online production systems: A longitudinal analysis of the socio-technical duality of development projects*

Müller-Birn, C., Cataldo, M., Wagstrom, P., Herbsleb, J.D.: Success in Online Production Systems: A Longitudinal Analysis of the Socio-Technical Duality of Development Projects. Technical Report CMU-ISR-10-129, 2010.


What might be success factors for OPSs?

Success of virtual community sites (Preece, 2000):

Usability: human-technology interactions (e.g., information design, navigation, and access)

Sociability: human-human interactions, supported by developing policies and practices that are socially acceptable and practicable

Success drivers are the number of participants who communicate, the number of exchanged messages, interactivity, and reciprocity (Preece, 2001)

In product development, conceptualizations such as the market performance of the product, project cycle time, efficiency of the development process, and product quality are used (Clark & Fujimoto, 1990), (Eisenhardt & Tabrizi, 1995), (Sethi, 2000)

In Wikipedia, the success of an article can be seen as its quality (Kittur & Kraut, 2008): an article must meet certain requirements to be assigned a level in a six-level quality system, ranging from “stub” (almost no content) to “featured article” (best quality)

In open source projects, success is typically quantified by volume: the number of contributors or participants, or the number of accesses to the project’s product or outcome (Crowston et al., 2006), (Iriberri & Leroy, 2009)


Open source software (OSS) project GNOME

• Graphical user interface and a development framework for desktop applications

• GNOME is a large collection of libraries and applications rather than a monolithic application (German, 2003)

• Data covered a period of about 8 years of activity from November 1997 until July 2005

Description Value

Mail repository
Number of emails 467,639
Number of senders 34,662
Date of first email 01-01-1997
Date of last email 02-10-2007

Code repository
Number of committers 1,312
Number of commits 479,678
Number of files 286,314
Number of commits (files) 2,456,302
Date of first commit 12-22-1996
Date of last commit 08-01-2005

Bug repository
Number of users 2,706
Number of bugs 201,068
Date of first bug 01-01-1999
Date of last bug 11-18-2005


Used data set


• Community hosted over 700 different projects

• Projects differ significantly in their development activity, size, and participation rate

• Projects were included if they satisfied all of the following criteria

- Continuity of development activity (at least one year)

- Amount of development activity (at least 100 commits)

- Attractiveness of project for developers (at least 10 committers)

- User interest to participate (at least one community-hosted mailing list)

- Data collected from different repositories should overlap during the analyzed period

• The data set used in the remainder consists of 27 projects
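The inclusion criteria above can be read as a simple filter over per-project summary data. The following is a minimal sketch under stated assumptions: the `Project` record, its field names, the reading of "one year" as 365 days, and the toy projects are all hypothetical and are not taken from the GNOME data set.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical summary record; field names are illustrative only.
    name: str
    active_days: int            # span between first and last commit
    commits: int
    committers: int
    mailing_lists: int          # community-hosted lists for this project
    repositories_overlap: bool  # mail/code/bug data cover a common period


def is_eligible(p: Project) -> bool:
    """Apply all five inclusion criteria; a project must satisfy every one."""
    return (
        p.active_days >= 365        # assumption: "one year" read as 365 days
        and p.commits >= 100
        and p.committers >= 10
        and p.mailing_lists >= 1
        and p.repositories_overlap
    )


# Invented toy projects, not real GNOME projects.
projects = [
    Project("alpha", 900, 2500, 40, 2, True),
    Project("beta", 200, 5000, 25, 1, True),    # active for less than a year
    Project("gamma", 800, 90, 12, 1, True),     # too few commits
    Project("delta", 1200, 300, 10, 1, True),
    Project("epsilon", 700, 400, 15, 0, True),  # no mailing list
]
selected = [p.name for p in projects if is_eligible(p)]
```

With these toy records only "alpha" and "delta" pass all criteria.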


Social Dimension


• Coordination needs network

- Computation of coordination needs networks for each project by computing (Task Assignment ∗ Task Dependency) ∗ Transpose(Task Assignment) (Cataldo et al., 2008)

- Task assignment: which individuals are working on which tasks

- Task dependency: relationships or dependencies among tasks

• Communication network

- Construction of a collection of communication networks for each project from the project’s mailing list

- Construction of communication networks of the whole OPS by aggregating the project-level communication networks into one
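The coordination needs product named above can be illustrated on a toy example. This is a sketch only: the matrices are invented, tasks are assumed to be source files, and plain lists of rows stand in for whatever matrix representation the original study used.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def coordination_needs(task_assignment, task_dependency):
    """(Task Assignment * Task Dependency) * Transpose(Task Assignment).

    Entry [i][j] > 0 means that person i works on tasks that depend on
    tasks person j works on, so the two may need to coordinate.
    """
    return matmul(matmul(task_assignment, task_dependency), transpose(task_assignment))


# Hypothetical toy data: 3 developers (rows) x 4 files (columns).
task_assignment = [
    [1, 1, 0, 0],  # developer 0 works on files 0 and 1
    [0, 0, 1, 0],  # developer 1 works on file 2
    [0, 0, 0, 1],  # developer 2 works on file 3
]
# File-by-file dependencies; files 0 and 2 depend on each other.
# Assumption: the diagonal is set so that each file relates to itself.
task_dependency = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
cn = coordination_needs(task_assignment, task_dependency)
```

Here `cn[0][1]` is positive because developer 0 touches file 0, which depends on developer 1's file 2, while developer 2 has no coordination need with the others.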


Technical Dimension

• Syntactic Dependency Network

- Examination of source code and extracting data-related dependency (e.g., a particular data structure modified by a function and used in another function) and functional dependency (e.g., method A calls method B) relationships between source code files during the period of time between two releases of the GNOME distribution

• Logical Dependency Network

- Construction of the logical dependencies network by extracting the set of source code files that were modified as part of development tasks performed during the period of time between two releases of the GNOME distribution
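A logical dependency network of this kind can be sketched by counting which files change together. The example below assumes that one commit corresponds to one development task, which is a simplification, and the file names are invented.

```python
from collections import Counter
from itertools import combinations


def logical_dependencies(commits):
    """Count how often each pair of files was modified in the same commit.

    `commits` is an iterable of file-name lists, one list per commit.
    The result maps an alphabetically ordered file pair to its co-change count.
    """
    pair_counts = Counter()
    for files in commits:
        # sorted(set(...)) removes duplicates and gives every pair a stable order
        for a, b in combinations(sorted(set(files)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical commits within one release period.
commits = [
    ["ui/window.c", "ui/toolbar.c"],
    ["ui/window.c", "ui/toolbar.c", "core/session.c"],
    ["core/session.c"],
]
edges = logical_dependencies(commits)
```

The counts can serve as edge weights; a commit touching a single file contributes no edge.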


• Successful projects benefit from interaction patterns that are able to disseminate information to most of the project participants while minimizing redundant interconnections

• Successful projects exhibit a continuously active core group that is able to integrate all members of the project or the developed software

• Project success depends on its members occupying different structural positions within the network as a mechanism to balance the benefits and limitations of belonging solely to the core or the periphery

• When task dependencies are partitioned among separate clusters of highly interdependent sets of individuals, projects are more likely to succeed

• Modular technical structures (those with independent clusters of highly interdependent parts) are an important success driver for online production systems


Results


Mirroring hypothesis in online production systems using temporal network analysis (research in progress)


Co-evolution of social and technical architectures

• Social architecture should reflect the technical architecture of a system and vice versa in order to improve the degree of innovation or to reduce the coordination needs (Conway, 1968), (Baldwin & Clark, 2000), (Cataldo et al., 2008)

• Open collaborative communities are geographically distributed; therefore, their technical architecture should be modular (e.g., (Moon & Sproull, 2000))

• In the context of OSS, a modular technical architecture increases incentives to join and decreases free riding (Baldwin & Clark, 2006), (West & O'Mahony, 2008)

• BUT recent empirical work has shown that this hypothesis can only be partly supported in open collaborative settings (Colfer & Baldwin, 2010)


Requirements for model description


• Networks are used to describe communities; therefore, the relations between people (density of links) should be used as a measure

• Networks evolve over time; therefore, a temporal model is required

• Open collaborative communities have a large membership base; therefore, the algorithm should be able to deal with large networks

• Complete knowledge about the networks is often not available; therefore, the algorithm should detect local communities

• People are often actively involved in different communities; therefore, the algorithm should allow overlapping communities
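The density of links mentioned in the first requirement can be made concrete. A minimal sketch for an undirected network without self-loops, with invented names:

```python
def density(nodes, edges):
    """Share of possible undirected links that are present: 2m / (n(n-1))."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Ignore self-loops and count each undirected pair only once.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    return 2 * len(unique) / (n * (n - 1))


# Hypothetical communication network with four people.
people = ["ann", "bob", "cem", "dia"]
links = [("ann", "bob"), ("bob", "ann"), ("bob", "cem"), ("cem", "dia")]
d = density(people, links)
```

Three distinct links out of six possible pairs give a density of 0.5; the reversed duplicate of the ann-bob link is counted once.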


• Discrete approach to consider time in graphs (Moody, 2005)

- Cross-sectional analysis of graphs where the main focus lies on the changes of network stages (e.g., (Cortes, 2003), (Sun, 2007))

- Approaches to discretize the interactions: (a) the cumulative approach and (b) the time window approach

• Continuous approach to consider time in graphs (Moody, 2005)

- Each single interaction with a start and end date is considered (e.g., (Kumar, 2003), (Priebe, 2005))

• Describing evolution in networks on a group level

- Network quality function (Mucha et al., 2010)

- Dynamic tensor analysis (Sun et al., 2006)

- Evolutionary spectral clustering (Chi et al., 2007)

- Clique percolation method (Palla et al., 2005)
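The two discretization approaches named above, cumulative and time window, differ only in whether earlier interactions are carried forward. The sketch below assumes interactions are (date, sender, receiver) triples dated on or after a chosen start date and uses calendar months to form windows; the names and dates are invented.

```python
from datetime import date


def window_index(day, start, months_per_window=3):
    """Index of the window containing `day`, counted in calendar months from `start`."""
    months = (day.year - start.year) * 12 + (day.month - start.month)
    return months // months_per_window


def snapshots(interactions, start, months_per_window=3, cumulative=False):
    """Return one undirected edge set per time window.

    cumulative=False: each snapshot holds only the edges observed in its window.
    cumulative=True:  each snapshot also keeps all edges from earlier windows.
    Assumption: every interaction is dated on or after `start`.
    """
    if not interactions:
        return []
    count = max(window_index(d, start, months_per_window) for d, _, _ in interactions) + 1
    windows = [set() for _ in range(count)]
    for day, sender, receiver in interactions:
        idx = window_index(day, start, months_per_window)
        windows[idx].add(frozenset((sender, receiver)))
    if cumulative:
        for i in range(1, count):
            windows[i] |= windows[i - 1]
    return windows


# Hypothetical mailing-list replies.
interactions = [
    (date(2004, 1, 15), "ann", "bob"),
    (date(2004, 2, 3), "bob", "cem"),
    (date(2004, 5, 20), "ann", "cem"),
    (date(2004, 8, 1), "ann", "bob"),
]
start = date(2004, 1, 1)
windowed = snapshots(interactions, start)
accumulated = snapshots(interactions, start, cumulative=True)
```

In the windowed series the ann-bob link disappears in the second snapshot and reappears in the third, whereas the cumulative series can only grow.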


Brief overview of existing approaches


Experimental setup

• Data set: OSS project Epiphany (web browser)

• Communication network based on mailing list repository

• One time frame considers three months of activity

• Steps of the clique percolation method (CPM)

- Locate all complete subgraphs, i.e., cliques, that are not part of a larger complete subgraph

- Identify communities based on the clique-clique overlap matrix

- Specify the “optimal” percolation structure
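The first two CPM steps can be sketched in plain Python. This is an illustrative reimplementation, not the tool used in the study: it finds maximal cliques with a basic Bron-Kerbosch search, then merges cliques of size at least k that share at least k-1 nodes. Choosing the "optimal" k is left to the caller, and the toy network is invented.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting; `adj` maps each node to its neighbour set."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def k_clique_communities(edges, k):
    """Merge maximal cliques of size >= k that overlap in at least k-1 nodes."""
    adj = {}
    for a, b in edges:
        if a != b:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]

    # Union-find over clique indices stands in for the clique-clique overlap matrix.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return list(communities.values())


# Hypothetical network: two triangles sharing an edge, a third triangle
# attached through node "d" only, and a pendant node "g".
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("b", "d"), ("c", "d"),
    ("d", "e"), ("d", "f"), ("e", "f"),
    ("f", "g"),
]
found = sorted(sorted(c) for c in k_clique_communities(edges, k=3))
```

For k=3 this yields two communities that overlap in node "d", while "g" belongs to no community, which is the kind of non-included node a larger k produces more of.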

Description Value

# months 44

# senders 688

# mails 8,352

# threads 1,294

# committers 208

# commits 5,898

# files 21,223

# added LOC 957,091

# removed LOC 748,956

mails per person 12.00

persons per thread 6.45

commits per person 28.36


Selected community and network characteristics

[Figure, five panels over snapshots 1 to 10: (1) number of nodes and edges, logarithmic axis from 1 to 10,000; (2) a measure ranging from 0 to 0.003, axis label not recoverable; (3) a measure ranging from 0 to 9, axis label not recoverable; (4) percentage of non-included nodes for k=3, k=4, and k=5; (5) a measure ranging from 0 to 90 for k=3, k=4, and k=5, axis label not recoverable]


[Figure: community size (0 to 90) per snapshot (1 to 10), with the series new (leaving), new, old (leaving), and old]

Community development based on social interactions


Recent and future research challenges


• Considering time when describing the two dimensions helps to reveal dependencies between development patterns and the specific life cycle stage of an OPS

• Success of an online production system is related to the social AND technical dimension; thus, describing both dimensions is a requirement to understand and to improve existing production processes

• Other research has shown that organizational and technical structures are related; hence, existing interdependencies in OPSs need to be explored


Conclusions


Thank you.

Acknowledgements

Co-authors: Marcelo Cataldo, James D. Herbsleb


• C.Y. Baldwin and K.B. Clark: Design Rules: The Power of Modularity Volume 1. MIT Press, Cambridge, MA, USA, 1999.

• C.Y. Baldwin and K.B. Clark. The Architecture of Participation: Does Code Architecture Mitigate Free Riding in the Open Source Development Model? Management Science. 52:7. 2006.

• Y. Benkler and H. Nissenbaum. Commons-based Peer Production and Virtue. Journal of Political Philosophy, 14(4): 394-419. 2006.

• M. Cataldo, J.D. Herbsleb, K.M. Carley. Socio-technical congruence: a framework for assessing the impact of technical and work dependencies on software development productivity, Proceedings of the Second ACM-IEEE international symposium on Empirical software engineering and measurement: 2-11. Kaiserslautern, Germany: ACM. 2008.

• Y. Chi, S. Zhu, X. Song, J. Tatemura and B.L. Tseng. Structural and temporal analysis of the blogosphere through community factorization. KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM. San Jose, California, USA, 163-172. 2007.

• K. Clark and T. Fujimoto. Product Development Performance. Harvard Business School Press, 1991.

• M. E. Conway. How do Committees Invent? Datamation. 14:4. 28-31. 1968.

• C. Cortes, D. Pregibon and C. Volinsky: Computational Methods for Dynamic Graphs. Journal of Computational and Graphical Statistics. 12:4. 950-970. 2003.

• K. Crowston, J. Howison, H. Annabi. Information systems success in free and open source software development: theory and measures. Software Process: Improvement and Practice, 11(2): 123-148. 2006.

• A. Deshpande and D. Riehle: The Total Growth of Open Source. Proceedings of the Fourth Conference on Open Source Systems (OSS 2008). Springer Verlag. 197-209. 2008.

• K. Eisenhardt and B. Tabrizi. Accelerating adaptive processes: Product innovation in the global computer industry. Administrative Science Quarterly, 40(1):84–110, 1995.

• A. Iriberri and G. Leroy. A life-cycle perspective on online community success. ACM Comput. Surv., 41(2):1–29, 2009.

• A. Kittur and R. E. Kraut. Harnessing the wisdom of crowds in wikipedia: quality through coordination. In Proc. of CSCW, pages 37–46, 2008.

• B. Kanefsky, N.G. Barlow, V.C. Gulick. Can Distributed Volunteers Accomplish Massive Data Analysis Tasks?. 32nd Annual Lunar and Planetary Science Conference. 2001.

• R. Kumar, J. Novak, P. Raghavan, and A. Tomkins. On the bursty evolution of blogspace. WWW '03: Proceedings of the 12th international conference on World Wide Web. ACM, New York, NY, USA. 568-576. 2003.


References


References (cont.)

• J. Moody, D. McFarland and S. Bender-deMoll. Dynamic Network Visualization. American Journal of Sociology. 110:4. 1206-1241. 2005.

• J.Y. Moon and L. Sproull. Essence of Distributed Work: The Case of the Linux Kernel. First Monday. 5:11. 2000.

• L. Sproull and S. Kiesler. Connections - new ways of working in the networked organization. MIT Press. Cambridge, Mass. 1995.

• P.J. Mucha, T. Richardson, K. Macon, M.A. Porter, J-P. Onnela: Community Structure in Time-Dependent, Multiscale, and Multiplex Networks. Science. 328: 5980. 876-878. 2010.

• C. Müller-Birn, M. Cataldo, P. Wagstrom, J.D. Herbsleb: Success in Online Production Systems: A Longitudinal Analysis of the Socio-Technical Duality of Development Projects. Technical Report CMU-ISR-10-129, 2010.

• S. O'Mahony and F. Ferraro. The emergence of governance in an open source community. Academy of Management Journal, 50(5): 1079-1106. 2007.

• G. Palla, I. Derényi, I. Farkas, T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature. 435: 7043. 814-818. 2005.

• J. Preece. Online Communities: Designing Usability, Supporting Sociability. John Wiley & Sons, 2000.

• J. Preece. Sociability and usability in online communities: determining and measuring success. Behav. & Inform. Techn., 20(5):347–356, 2001.

• C.E. Priebe, J.M. Conroy, D.J. Marchette, and Y. Park. Scan Statistics on Enron Graphs. Computational and Mathematical Organization Theory Journal. 11:3. 229-247. 2005.

• J. Riedl. The Promise and Peril of Social Computing. Computer. 44:1. 93-95. 2011.

• R. Sethi. New product quality and product development teams. Journal of Marketing, 64:1–14, 2000.

• J. Sun, D. Tao and C. Faloutsos. Beyond streams and graphs: dynamic tensor analysis. KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM. New York, NY, USA. 374-383. 2006.

• J. Sun. Incremental pattern discovery on streams, graphs and tensors (PhD thesis). CMU. Pittsburgh, PA, USA. 2007.

• D. Tapscott, A. Williams. Wikinomics: How mass collaboration changes everything. Portfolio Trade. 2008.

• J. West and S. O'Mahony. The Role of Participation Architecture in Growing Sponsored Open Source Communities. Industry and Innovation. 15:2. 145-168. 2008.
