Recovering Commit Dependencies for Selective Code Integration in Software Product Lines



Tejinder Dhaliwal, Foutse Khomh, Ying Zou, Ahmed E. Hassan


Software Product Lines

Multiple products are produced from shared components.


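Main Line Branching Model for Software Product Lines

• Branches: a main branch and one branch per product (Product-1 Branch, Product-2 Branch, ..., Product-n Branch).
• Developers add new features to the main branch.
• Integrators integrate selected features into the product branches.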


Feature to Code Change Mapping

Developers add code changes; integrators integrate features.

Feature | Code Changes
FA | CA1
FB | CB1

The mapping facilitates selective integration.
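As a minimal sketch, assuming the mapping is kept as a plain dictionary, selective integration amounts to looking up the commits of the selected features. The names `FEATURE_TO_COMMITS` and `commits_for` are hypothetical and only mirror the FA/FB example.

```python
# Hypothetical feature-to-commit mapping mirroring the FA/FB example above.
FEATURE_TO_COMMITS: dict[str, list[str]] = {
    "FA": ["CA1"],
    "FB": ["CB1"],
}


def commits_for(features: list[str]) -> list[str]:
    """Return the commits to integrate for the selected features, without duplicates."""
    selected: list[str] = []
    for feature in features:
        for commit in FEATURE_TO_COMMITS.get(feature, []):
            if commit not in selected:
                selected.append(commit)
    return selected


print(commits_for(["FA"]))  # ['CA1']
```

An integrator who selects only FA would integrate CA1 and leave CB1 on the main branch.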


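Cost of Integration Failure

• If change CA1 implements FA and change CB1 implements FB, the mapping is:

Feature | Code Changes
FA | CA1
FB | CB1

• If a change CA2 is then added to modify FA and CA2 depends on CB1, that dependency (CA2 → CB1) is missing from the mapping. Integrating FA therefore requires CA1, CA2, and CB1, whereas integrating FB requires only CB1.

(Figure: integrating FA and FB into the Product-1 and Product-2 branches.)

Integration failures caused by such missing dependencies cost 530% more time.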
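The failure above can be illustrated with a small sketch. It assumes the dependency of CA2 on CB1 is known and stored in a hypothetical `DEPENDS_ON` table; in practice this is exactly the information the mapping lacks. Selecting FA from the mapping alone yields CA1 and CA2, and following dependencies transitively reveals CB1 as missing.

```python
from collections import deque

# Hypothetical data for the example above: CA2 was added to modify FA
# and depends on CB1, which implements FB.
FEATURE_TO_COMMITS = {"FA": ["CA1", "CA2"], "FB": ["CB1"]}
DEPENDS_ON = {"CA2": ["CB1"]}  # commit -> commits it needs


def mapped_commits(feature: str) -> set[str]:
    """Commits selected from the feature mapping alone."""
    return set(FEATURE_TO_COMMITS[feature])


def missing_dependencies(feature: str) -> set[str]:
    """Commits required (transitively) by the mapped commits but absent from the mapping."""
    selected = mapped_commits(feature)
    required: set[str] = set()
    queue = deque(selected)
    while queue:  # breadth-first walk over the dependency table
        commit = queue.popleft()
        for dep in DEPENDS_ON.get(commit, []):
            if dep not in required and dep not in selected:
                required.add(dep)
                queue.append(dep)
    return required


print(sorted(missing_dependencies("FA")))  # ['CB1']
print(sorted(missing_dependencies("FB")))  # []
```

Integrating FA into a product branch without CB1 is the failure scenario described on this slide.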


Our Solution

Group dependent commits (e.g., CA1, CA2, and CB1 form one group) and propose integrating each group as a whole.
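To make "integrate a group as a whole" concrete, here is a sketch that groups commits as connected components of known pairwise links using union-find. This is a swapped-in illustration, not the paper's method: the approach that follows recovers the links with dissimilarity metrics precisely because they are not recorded. The links used here (CA1 with CA2 through feature FA, CA2 with CB1 through the missing dependency) are assumptions taken from the running example.

```python
def group_dependent_commits(
    commits: list[str], links: list[tuple[str, str]]
) -> list[list[str]]:
    """Group commits into connected components of the undirected link graph (union-find)."""
    parent = {commit: commit for commit in commits}

    def find(commit: str) -> str:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[commit] != commit:
            parent[commit] = parent[parent[commit]]
            commit = parent[commit]
        return commit

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups: dict[str, list[str]] = {}
    for commit in commits:
        groups.setdefault(find(commit), []).append(commit)
    return list(groups.values())


# Assumed links from the running example: CA1-CA2 (both modify FA), CA2-CB1 (dependency).
print(group_dependent_commits(["CA1", "CB1", "CA2"], [("CA1", "CA2"), ("CA2", "CB1")]))
# [['CA1', 'CB1', 'CA2']]
```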


Our Approach to Group Dependent Commits

• Define dissimilarity metrics
• Calibrate the metrics on prior versions
• Commit assignment algorithm
• Automated grouping (during integration) and developer-guided grouping (during development)


Dissimilarity Metrics

Given two commits, each characterized by its files, developers, and change requests (CRs):

Metric | Description
File Dependency Distance (FD) | Captures source code dependencies between the files involved in the two commits
File Association Distance (FA) | Captures logical dependencies between the files involved in the two commits
Developer Dissimilarity Distance (DD) | Captures the working relation between the two developers submitting the commits
CR Dependency Distance (CRD) | Captures the dissimilarity between the CRs implemented by the two commits
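The slides do not give formulas for the four metrics. As a hypothetical illustration only, a commit can be modeled as sets of files, developers, and CRs, and a CRD-like distance can be sketched as the Jaccard distance between two commits' CR sets. `Commit`, `jaccard_distance`, and `cr_distance` are illustrative names, not the authors' definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    """A commit characterized by its files, developers, and change requests (CRs)."""

    commit_id: str
    files: frozenset[str] = field(default_factory=frozenset)
    developers: frozenset[str] = field(default_factory=frozenset)
    crs: frozenset[str] = field(default_factory=frozenset)


def jaccard_distance(x: frozenset[str], y: frozenset[str]) -> float:
    """1 minus (intersection size / union size): 0.0 for identical sets, 1.0 for disjoint ones."""
    union = x | y
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(x & y) / len(union)


def cr_distance(c1: Commit, c2: Commit) -> float:
    """Hypothetical CRD stand-in: Jaccard distance between the two commits' CR sets."""
    return jaccard_distance(c1.crs, c2.crs)


a = Commit("CA1", files=frozenset({"ui.c"}), crs=frozenset({"CR-1"}))
b = Commit("CA2", files=frozenset({"net.c"}), crs=frozenset({"CR-1", "CR-2"}))
print(cr_distance(a, b))  # 0.5
```

The same `jaccard_distance` helper could be applied to `files` or `developers`, but the paper's FD, FA, and DD metrics rely on source code dependencies, co-change history, and working relations, which this sketch does not model.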


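Calibrate Metrics on Prior Versions

For each of the four metrics, let a be a commit's distance to its own group and b_min the smallest of its distances (b1, b2, b3, ...) to the other groups:
• Min_Threshold = Avg(a)
• Max_Threshold = Avg(b_min)
• Silhouette = Avg{(b_min - a) / max(b_min, a)}

A higher silhouette value is better: it ranges from -1 to 1 and approaches 1 when commits are much closer to their own group than to any other group.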
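A sketch of the calibration step, assuming the standard silhouette definition in which a is a commit's average distance to the other members of its group and b_min is its smallest average distance to another group. `calibrate` is a hypothetical helper. It requires at least two non-empty groups and at least one group with two or more commits, and it skips commits that are alone in their group, since a is undefined for them.

```python
from statistics import mean
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def calibrate(
    groups: Sequence[Sequence[T]], dist: Callable[[T, T], float]
) -> dict[str, float]:
    """Compute Min_Threshold = Avg(a), Max_Threshold = Avg(b_min) and the mean silhouette."""
    a_values: list[float] = []
    b_values: list[float] = []
    s_values: list[float] = []
    for gi, group in enumerate(groups):
        for ii, item in enumerate(group):
            others = [o for jj, o in enumerate(group) if jj != ii]
            if not others:
                continue  # a is undefined for a commit alone in its group
            # a: average distance to the other members of the item's own group
            a = mean(dist(item, o) for o in others)
            # b_min: smallest average distance to the members of any other group
            b_min = min(
                mean(dist(item, o) for o in other)
                for gj, other in enumerate(groups)
                if gj != gi and other
            )
            a_values.append(a)
            b_values.append(b_min)
            denom = max(b_min, a)
            s_values.append((b_min - a) / denom if denom > 0 else 0.0)
    return {
        "min_threshold": mean(a_values),
        "max_threshold": mean(b_values),
        "silhouette": mean(s_values),
    }


# Toy 1-D "commits" with absolute difference as the distance.
result = calibrate([[0.0, 1.0], [10.0, 11.0]], lambda x, y: abs(x - y))
print(result["min_threshold"], result["max_threshold"])  # 1.0 10.0
```

On prior versions, the groups would come from the known integration history, and each metric would be plugged in as `dist` to obtain its thresholds and silhouette.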


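Commit Assignment Algorithm

(Figure: items grouped by color before shape, i.e., Color > Shape.)

• Apply the dissimilarity metrics in order of their precedence (in the figure, color takes precedence over shape).
• If no suitable group is found for a commit, assign the commit to a new group.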
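The assignment rule can be sketched as follows, under stated assumptions: each metric comes with a calibrated threshold, and a commit joins a group when its average distance to the group's members is within that threshold. The slides do not specify how thresholds are applied, so `assign_commits` and this rule are hypothetical. The toy `color` and `shape` metrics reproduce the figure: a red square joins the red group even though its shape matches the blue square, because color takes precedence.

```python
from statistics import mean
from typing import Callable, Optional, Sequence

Distance = Callable[[str, str], float]


def assign_commits(
    commits: Sequence[str], metrics: Sequence[tuple[Distance, float]]
) -> list[list[str]]:
    """Assign commits to groups, trying the (distance, threshold) pairs in precedence order."""
    groups: list[list[str]] = []
    for commit in commits:
        target: Optional[int] = None
        for dist, threshold in metrics:
            # Average distance from the commit to each existing group under this metric.
            scored = [(mean(dist(commit, m) for m in g), i) for i, g in enumerate(groups)]
            candidates = [(d, i) for d, i in scored if d <= threshold]
            if candidates:
                target = min(candidates)[1]  # closest qualifying group wins
                break  # a higher-precedence metric decided; skip the rest
        if target is None:
            groups.append([commit])  # no suitable group: start a new one
        else:
            groups[target].append(commit)
    return groups


def color(a: str, b: str) -> float:
    return 0.0 if a.split("-")[0] == b.split("-")[0] else 1.0


def shape(a: str, b: str) -> float:
    return 0.0 if a.split("-")[1] == b.split("-")[1] else 1.0


items = ["red-circle", "blue-square", "red-square", "green-circle"]
print(assign_commits(items, [(color, 0.0), (shape, 0.0)]))
# [['red-circle', 'red-square'], ['blue-square'], ['green-circle']]
```

For commits, the metric list would follow the precedence CRD > FA > DD > FD reported in the results.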


Commit Grouping Approaches

• Developer-guided grouping: groups commits incrementally and uses developers' feedback to improve the grouping during development.
• Automated grouping: groups commits during integration.

Both approaches follow the k-means clustering method, which assigns each item to the cluster with the nearest mean.
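As a hedged sketch of incremental nearest-mean assignment, assume each commit is encoded as a numeric vector (an assumption; the slides do not say how). `IncrementalGrouper` is hypothetical. It is a sequential variant of k-means that updates a cluster's mean as items arrive, opens a new cluster when no mean is within a cut-off (mirroring the "assign to a new group" rule), and lets a developer override the proposed cluster.

```python
import math
from typing import Optional, Sequence


class IncrementalGrouper:
    """Sequential nearest-mean grouping with an optional developer override."""

    def __init__(self, new_group_distance: float) -> None:
        # Hypothetical cut-off: items farther than this from every mean open a new group.
        self.new_group_distance = new_group_distance
        self.means: list[list[float]] = []
        self.sizes: list[int] = []

    def propose(self, x: Sequence[float]) -> Optional[int]:
        """Index of the group with the nearest mean, or None if no mean is close enough."""
        best, best_d = None, math.inf
        for i, m in enumerate(self.means):
            d = math.dist(x, m)
            if d < best_d:
                best, best_d = i, d
        return best if best_d <= self.new_group_distance else None

    def add(self, x: Sequence[float], override: Optional[int] = None) -> int:
        """Add x to the proposed group (or the developer's `override`) and update its mean."""
        i = override if override is not None else self.propose(x)
        if i is None:
            self.means.append(list(x))
            self.sizes.append(1)
            return len(self.means) - 1
        self.sizes[i] += 1
        n = self.sizes[i]
        # Incremental mean update: m <- m + (x - m) / n
        self.means[i] = [m + (xi - m) / n for m, xi in zip(self.means[i], x)]
        return i


g = IncrementalGrouper(new_group_distance=2.0)
print([g.add(p) for p in [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (9.0, 0.0)]])  # [0, 0, 1, 1]
print(g.add((0.5, 0.0), override=1))  # 1 (developer feedback overrides the proposal)
```

Classic k-means fixes the number of clusters in advance; the cut-off here is a design choice that lets the number of groups grow as commits arrive.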


Evaluation

We analyzed three major versions of a family of mobile applications


Evaluation Criteria

• Validate the dissimilarity metrics: can the proposed metrics be used to identify commit dependencies?
• Validate the grouping approaches: how efficient are the proposed grouping approaches?
• Value for developers: can the proposed approaches identify commit dependencies missed by developers?


Results

The four dissimilarity metrics show a good ability to group commits (i.e., high silhouette values).

(Figure: silhouette value of each metric, CRD, FA, DD, and FD, for Versions 1, 2, and 3.)

Ranking by silhouette value: CRD > FA > DD > FD


Results

• Efficiency of the grouping approaches
– The automated grouping recovered 82% of commit dependencies with a precision of 95%.
– The accuracy of the developer-guided grouping approach is 98%.
– We observed that precision and recall improve with longer history data.
• Value for developers
– The automated and developer-guided grouping approaches reduced integration failures by 76% and 94%, respectively.
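One common way to measure how many dependencies a grouping recovers is pairwise precision and recall: every pair of commits placed in the same group counts as a predicted dependency. The slides do not detail the evaluation protocol, so the hypothetical `pairwise_precision_recall` below is an assumption, shown on toy data rather than the study's data.

```python
from itertools import combinations


def pairwise_precision_recall(
    groups: list[list[str]], true_dependencies: list[tuple[str, str]]
) -> tuple[float, float]:
    """Precision and recall of dependencies implied by co-grouping, over unordered pairs."""
    predicted = {frozenset(pair) for g in groups for pair in combinations(g, 2)}
    actual = {frozenset(pair) for pair in true_dependencies}
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Toy data: CC1 -> CA1 is a true dependency the grouping misses.
groups = [["CA1", "CA2", "CB1"], ["CC1"]]
truth = [("CA1", "CA2"), ("CA2", "CB1"), ("CA1", "CB1"), ("CC1", "CA1")]
print(pairwise_precision_recall(groups, truth))  # (1.0, 0.75)
```

Under this reading, "82% of commit dependencies were recovered" corresponds to recall, and "precision of 95%" to the share of co-grouped pairs that are true dependencies.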


Summary
