Exploring Core-Periphery Structures

© Alan MacCormack, John Rusnak, Carliss Baldwin 2009

Exploring Core-Periphery Structures in Complex Software Products

Carliss Baldwin (HBS)

Alan MacCormack (MIT), John Rusnak (HBS)

Drexel University

Philadelphia, May 2009

Architecture and Intellectual Property

Design for capturing/defending value, not for collaboration!

Platform Component of a Java Server

LaMantia et al. (WICSA 2008); slide © Carliss Y. Baldwin 2008

• Used licensed-in code

• The license was about to expire

• Creating a classic holdup problem

They created a “thin crossing point” between their code and the licensed code

[Figure: Before Modularization vs. After Modularization. Labels: Licensed Code; No Dependencies]
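
One way to make the “thin crossing point” idea concrete is to count the dependencies that cross the boundary between in-house code and licensed code. The sketch below does that for a list of (source, target) dependency pairs. The file names, the `licensed/` prefix, the adapter file and the helper name `crossing_dependencies` are all illustrative assumptions and are not taken from the LaMantia et al. study.

```python
# Hypothetical example: file names, the "licensed/" prefix and the adapter
# file are assumptions made for illustration, not data from the study.

def is_licensed(path):
    """Treat anything under licensed/ as licensed-in code (assumed layout)."""
    return path.startswith("licensed/")


def crossing_dependencies(deps, licensed_test):
    """Return the dependencies whose two endpoints lie on opposite sides
    of the own-code / licensed-code boundary."""
    return [(src, dst) for src, dst in deps
            if licensed_test(src) != licensed_test(dst)]


# Before modularization: several own files call the licensed code directly.
before = [
    ("app/a.java", "licensed/x.java"),
    ("app/b.java", "licensed/x.java"),
    ("app/b.java", "licensed/y.java"),
    ("app/c.java", "app/a.java"),
]

# After modularization: one adapter is the only file touching licensed code.
after = [
    ("app/a.java", "app/adapter.java"),
    ("app/b.java", "app/adapter.java"),
    ("app/c.java", "app/a.java"),
    ("app/adapter.java", "licensed/x.java"),
]

print(len(crossing_dependencies(before, is_licensed)))  # 3
print(len(crossing_dependencies(after, is_licensed)))   # 1
```

The fewer dependencies cross the boundary, the thinner the crossing point, and the less code has to change if the licensed component is replaced.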

Thin crossing points <=> Low transaction costs

The presence/absence or propensity to have thin crossing points <=> modular structure of the design

Conceptual Background

• Much academic work suggests that complex technical systems possess a “Core-Periphery” structure
  – Core = tightly-coupled components, central to system operation
  – Periphery = loosely-coupled components, optional/non-critical

• Little empirical work explores the extent to which such structures are observed in practice; or those factors which influence the size, nature and evolution of these structures

• Our Aim: Develop a system to reveal the Core-Periphery structure of real software systems; analyze a large sample

The Intuition: Core-Periphery Structures

Distribution of Coupling Metrics

Measures of Visibility have a Bi-Modal (or Multi-Modal) Distribution

Number of Direct Dependencies has an Exponential Distribution
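
The difference between the two metrics can be illustrated on a toy system. The sketch below assumes that “visibility” means reachability through chains of dependencies (the transitive closure of the direct-dependency matrix); the slides do not spell this out, and the five-file system and function names are invented for illustration.

```python
# Sketch only: "visibility" is ASSUMED to mean reachability through chains
# of dependencies (transitive closure). The 5-file system is made up.

def transitive_closure(direct):
    """Warshall's algorithm on a boolean adjacency matrix.
    direct[i][j] is True when file i depends directly on file j."""
    n = len(direct)
    reach = [row[:] for row in direct]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def fan_out(matrix, i):
    """Number of other files that file i depends on (row sum, self excluded)."""
    return sum(1 for j, dep in enumerate(matrix[i]) if dep and j != i)


def fan_in(matrix, j):
    """Number of other files that depend on file j (column sum, self excluded)."""
    return sum(1 for i, row in enumerate(matrix) if row[j] and i != j)


# A chain 0 -> 1 -> 2 -> 3, plus file 4, which depends only on file 3.
T, F = True, False
direct = [
    [F, T, F, F, F],
    [F, F, T, F, F],
    [F, F, F, T, F],
    [F, F, F, F, F],
    [F, F, F, T, F],
]
visible = transitive_closure(direct)

# Columns: file, direct fan-out, visibility fan-out, visibility fan-in
for f in range(len(direct)):
    print(f, fan_out(direct, f), fan_out(visible, f), fan_in(visible, f))
```

File 0 has a single direct dependency but can reach three files indirectly, and file 3 is reachable from every other file, so the direct counts alone understate how far a change could propagate.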

Defining the Core: The “Spectrum Plot”

A DSM in Core-Periphery (CP) View

Key Questions

• Do all systems have a Core-Periphery structure; can we predict those that do?

• How large is the Core; what factors predict whether the Core is large/small?

• Are Core Components located in close proximity or distributed about the system?

• What happens to the size of the Core over time; does it remain stable or grow?

Empirical Approach: Analysis of over 1,000 Software Systems using DSAS

• Darwin

• “MyBooks” (Disguised)

• Abiword

• Apache

• BDB

• Chrome

• Calc (Open Office)

• Ghostscript

• Gnucash

• Gnumeric

• Linux

• Mozilla

• MySQL

• Open AFS

• Open Office (All)

• Open Solaris

• PostGres

• Write (Open Office)

• XNU

Key Findings

• About 2/3rds of these systems have a Core-Periphery structure
  – Remainder may have “No Core” or have “Multiple-Cores”

• Cannot always tell if a system is Core-Periphery from the DSM
  – Direct dependencies are insufficient; the pattern of dependencies is key

• “Core” sizes vary significantly, from zero to thousands
  – Large variations, even for systems that “do the same thing”
  – Aligned with Open and Closed organizational choices (see Conway)

• Core Components are NOT collocated; they tend to be distributed
  – Designers may be unaware which components are Core

2/3rds of Systems are Core-Periphery

Put two clear examples of Core-Periphery systems here

Notes: Biased sample; systems do change over time (e.g., Linux)

2/3rds of Systems are Core-Periphery

One small core, one larger (Linux 2.1.88 vs Mozilla; both 1500 files)

Some Systems are “Multi-Core”…

Open Office Spectrum Plots: note matching V-fan-ins and V-fan-outs

Open Office v1.0

Database

Write word processor

Calc spreadsheet

Graphics system

Presentations, charts, drawing

…with Modules that are Core-Periphery

Open Office Calc Subsystem: Architect’s View

…with Modules that are Core-Periphery

Open Office Calc Subsystem: Core-Periphery View

Open Office v1.0—Core-Periphery View

Core-Periphery Analysis iterated on the Modules Calc and Write

Some Systems have no Core: Gnucash

This is NOT apparent from the DSM

Implications for Code Architects!

Core Components are Distributed: Can be Difficult to Identify

My Books—Core files distributed throughout the system
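
A rough way to check whether Core files are co-located or scattered is to group them by directory. The sketch below is only an illustration of that idea: the paths, the Core list and the `largest_directory_share` measure are hypothetical and are not the measure used in the study.

```python
# Hypothetical paths and Core list, for illustration only.
from collections import Counter
from pathlib import PurePosixPath


def core_files_per_directory(core_files):
    """Count how many Core files sit in each directory."""
    return Counter(str(PurePosixPath(p).parent) for p in core_files)


def largest_directory_share(core_files):
    """Fraction of Core files held by the most Core-heavy directory.
    Values near 1.0 suggest a co-located Core; low values suggest it is
    spread across the system. Expects a non-empty list."""
    per_dir = core_files_per_directory(core_files)
    return max(per_dir.values()) / len(core_files)


core = [
    "ledger/account.c",
    "ui/main_window.c",
    "reports/summary.c",
    "import/parser.c",
    "ledger/journal.c",
]

print(core_files_per_directory(core))
print(largest_directory_share(core))  # 0.4
```

In this made-up case no directory holds more than 40% of the Core files, which is the kind of spread that would make the Core hard to spot from the directory layout alone.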

Systems of Similar Size can have very different Core sizes

Release            Size   V-FanIn-Only   V-FanOut-Only   V-Both       V-Other
linux_1_1_23        301     7   2%        196  65%        39  13%      59  20%
db_4_1_24           305    19   6%        124  41%       107  35%      55  18%
gnumeric_1_1_18     355    35  10%        102  29%       159  45%      59  17%
postgresql_6_2_1    370    29   8%         85  23%       209  56%      47  13%
linux_1_1_92        400    11   3%        250  63%        38  10%     101  25%
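
The four columns of the table can be read as a classification of each file by its visibility fan-in and fan-out, with V-Both apparently treated as the Core elsewhere in the deck. The sketch below shows one way such a classification could work. The slides do not state how the cut-offs are chosen, so `vfi_cut` and `vfo_cut` are hypothetical parameters, and the file names and values are invented.

```python
# Sketch only. The slides do not say how the cut-offs are chosen, so the
# thresholds below are HYPOTHETICAL parameters, as are the sample values.

def classify(vfi, vfo, vfi_cut, vfo_cut):
    """Assign a file to one of the four table columns from its visibility
    fan-in (vfi) and visibility fan-out (vfo)."""
    high_in = vfi >= vfi_cut
    high_out = vfo >= vfo_cut
    if high_in and high_out:
        return "V-Both"
    if high_in:
        return "V-FanIn-Only"
    if high_out:
        return "V-FanOut-Only"
    return "V-Other"


def summarize(files, vfi_cut, vfo_cut):
    """Count files per category; files maps name -> (vfi, vfo)."""
    counts = {"V-FanIn-Only": 0, "V-FanOut-Only": 0, "V-Both": 0, "V-Other": 0}
    for vfi, vfo in files.values():
        counts[classify(vfi, vfo, vfi_cut, vfo_cut)] += 1
    return counts


files = {
    "util.c": (40, 2),
    "sched.c": (35, 38),
    "mm.c": (36, 37),
    "driver_a.c": (1, 39),
    "driver_b.c": (0, 41),
    "small_tool.c": (1, 3),
}
counts = summarize(files, vfi_cut=30, vfo_cut=30)
total = len(files)

# Print one row per category: name, file count, share of the system.
for name, n in counts.items():
    print(f"{name:14s} {n:3d} {round(100 * n / total)}%")
```

Each row of the table then corresponds to one call to `summarize` on a release, with the percentages taken relative to the release's file count.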

Systems of Similar Size and Function often have very different Core sizes

[Figure panels: Linux; Open Solaris. Label: The “Core”]

Spectrum Comparisons: Linux vs. Open Solaris

Note: Very Different Sizes!

Systems of Similar Size and Function often have very different Core sizes

Gnucash (no Core) vs. My Books (70% Core)

Core Sizes Evolve Differently: Sometimes they are Stable in Size

Core Sizes Evolve Differently: Sometimes they Grow at a Linear Rate

[Chart: Linux Core Size. X-axis: system size in source files (0 to 10,000). Y-axes: number of V_Both files (0 to 600) and system percent of V_Both files (0% to 30%).]

The Challenge of an Increasing Core…

Release      Size    V-FanIn-Only   V-FanOut-Only   V-Both       V-Other
on_src_b36   12105   458   4%       6629  55%       2892  24%    2126  18%
on_src_b37   12149   455   4%       6650  55%       2917  24%    2127  18%
on_src_b38   12330   458   4%       6813  55%       2921  24%    2138  17%
on_src_b39   12336   460   4%       6815  55%       2924  24%    2137  17%
on_src_b40   12343   462   4%       6821  55%       2925  24%    2135  17%
on_src_b41   12404   485   4%       6803  55%       3001  24%    2115  17%
on_src_b42   12407   485   4%       6811  55%       3002  24%    2109  17%
on_src_b45   12568   495   4%       6877  55%       3081  25%    2115  17%
on_src_b46   12567   495   4%       6876  55%       3081  25%    2115  17%
on_src_b47   12550   493   4%       6865  55%       3075  25%    2117  17%
on_src_b48   12573   492   4%       6878  55%       3079  24%    2124  17%
on_src_b49   12644   491   4%       6932  55%       3099  25%    2122  17%
on_src_b50   12722   493   4%       6945  55%       3162  25%    2122  17%
on_src_b51   12732   491   4%       6993  55%       3121  25%    2127  17%
on_src_b52   12732   491   4%       6990  55%       3124  25%    2127  17%
on_src_b53   12771   492   4%       7003  55%       3147  25%    2129  17%
on_src_b54   12794   493   4%       7015  55%       3160  25%    2126  17%
on_src_b55   12794   493   4%       7015  55%       3160  25%    2126  17%
on_src_b56   12832   495   4%       7044  55%       3167  25%    2126  17%
on_src_b57   12838   499   4%       7049  55%       3168  25%    2122  17%
on_src_b58   12859   499   4%       7065  55%       3170  25%    2125  17%
on_src_b59   12860   499   4%       7067  55%       3169  25%    2125  17%
on_src_b60   12861   499   4%       7071  55%       3168  25%    2123  17%
on_src_b61   12897   500   4%       7104  55%       3171  25%    2122  16%
on_src_b62   12927   500   4%       7119  55%       3181  25%    2127  16%
on_src_b63   12935   498   4%       7127  55%       3183  25%    2127  16%
on_src_b65   12949   497   4%       7142  55%       3184  25%    2126  16%

The Core of Solaris

Core Sizes can Exhibit Discontinuities: E.g., The Evolution of Linux

Release       Size   V-FanIn-Only   V-FanOut-Only   V-Both      V-Other
linux_1_0      282    10   4%        186  66%        31  11%      55  20%
linux_1_1_0    282    10   4%        185  66%        31  11%      56  20%
linux_1_2_0    400    11   3%        249  62%        41  10%      99  25%
linux_1_3_0    431    12   3%        256  59%        42  10%     121  28%
linux_2_0      779    16   2%        520  67%        73   9%     170  22%
linux_2_1_0    785    19   2%        526  67%        70   9%     170  22%
linux_2_2_0   1891    25   1%       1368  72%        79   4%     419  22%
linux_2_3_0   1946    32   2%       1292  66%        75   4%     547  28%
linux_2_4_0   3243    46   1%       2513  77%       211   7%     473  15%
linux_2_5_0   4047    98   2%       2807  69%       259   6%     883  22%
linux_2_6_0   6194   156   3%       4395  71%       405   7%    1238  20%

Q: Did IBM’s Major Code Contributions start in Linux 2.4.0?
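
The discontinuity can be located directly from the V-Both column of the Linux table above. The sketch below computes release-to-release growth ratios and picks the largest one; using the largest relative jump as the criterion is an assumption made for illustration, and the calculation says nothing about what caused the jump.

```python
# V-Both counts copied from the Linux release table above. Picking the
# "largest relative jump" is an assumption chosen for illustration.

v_both = [
    ("linux_1_0", 31), ("linux_1_1_0", 31), ("linux_1_2_0", 41),
    ("linux_1_3_0", 42), ("linux_2_0", 73), ("linux_2_1_0", 70),
    ("linux_2_2_0", 79), ("linux_2_3_0", 75), ("linux_2_4_0", 211),
    ("linux_2_5_0", 259), ("linux_2_6_0", 405),
]


def growth_ratios(series):
    """Ratio of each release's count to the previous release's count."""
    return [(name, count / prev_count)
            for (_, prev_count), (name, count) in zip(series, series[1:])]


ratios = growth_ratios(v_both)
biggest_release, biggest_ratio = max(ratios, key=lambda item: item[1])
print(biggest_release, round(biggest_ratio, 2))  # linux_2_4_0 2.81
```

On these figures the V-Both count nearly triples between linux_2_3_0 and linux_2_4_0 (75 to 211 files), which is the release the question above singles out.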

Core Sizes can be Influenced—Mozilla before and after Redesign

Core sizes as a percent of system size

1275 total systems

248 have VBoth=0

Larger Open Source Systems Have Smaller Relative Core Sizes

Open Solaris

Conclusions

• Developed a method to extract Core-Periphery structures from Software and analyzed over 1,000 Software Systems
  – 2/3rds of these systems have a single Core
  – Some have no Core and others have Multi-Cores

• Difficult to tell from DSM if a system has zero, one or more Cores; difficult to tell which components are in the Core

• Core sizes:
  – Cross-section: Vary significantly; organization structure matters
  – Longitudinal: Evolve differently; can be influenced by redesigns

Thank you all!