

© RISCOSS Consortium

Project Acronym: RISCOSS

Project Title: Managing Risk and Costs in Open Source Software Adoption

Project Number: 318249

Instrument: Collaborative Project - STREP

Thematic Priority: FP7-ICT-2011-8 1.2 Cloud Computing, Internet of Services and Advanced Software Engineering

D2.1 State of the art and initial requirements

WP2: OSS Risk Management

Task 2.1: Preliminary analysis of techniques

Due Date: 30/04/2013

Submission Date: 30/04/2013

Start Date of Project: 01/11/2012

Duration of Project: 36 months

Deliverable Responsible: FBK

Version: 1.0

Status: final

Author(s): Nili Bergida KPA

Yehuda Blumenfeld KPA

Ron Kenett KPA

Mirko Morandini FBK

Alberto Siena FBK

Angelo Susi FBK

Reviewer(s): Maria Carmela Annosi TEI

Claudia Ayala UPC


Dissemination level: PU

PU: Public (this deliverable)
PP: Restricted to other programme participants (including the Commission)
RE: Restricted to a group specified by the consortium (including the Commission)
CO: Confidential, only for members of the consortium (including the Commission)


Version history:

Version | Date | Description | Author, Reviewer
0.1 | 20/03/2013 | Document structure agreed | A. Susi, M. Morandini (FBK), R. Kenett (KPA)
0.11 | 28/03/2013 | Added risk management via statistical techniques | Y. Blumenfeld, N. Bergida (KPA)
0.12 | 10/04/2013 | Added Systematic Literature Review on risk management in OSS adoption practices | A. Susi, M. Morandini
0.2 | 18/04/2013 | Added risk assessment and mitigation via search based and formal techniques | A. Susi, M. Morandini, A. Siena (FBK)
0.3 | 21/04/2013 | Improved SLR on risk management, related work on risk modelling and risk analysis | A. Susi, M. Morandini, A. Siena (FBK)
0.4 | 23/04/2013 | Abstract, Introduction and Discussion | A. Susi, M. Morandini, A. Siena (FBK)
0.41 | 23/04/2013 | Improved analysis example for Appendix A | Y. Blumenfeld (KPA)
0.5 | 24/04/2013 | Document assembly and formatting | M. Morandini (FBK)
0.6 | 25/04/2013 | Completion of all parts including Introduction, discussion and conclusions | A. Susi, M. Morandini, A. Siena (FBK)
0.7 | 26/04/2013 | Review of the whole document, corrections, Introduction 3.1 | A. Susi, M. Morandini, A. Siena (FBK)
0.8 | 27/04/2013 | Review of the whole document, corrections | A. Susi, M. Morandini, A. Siena (FBK)
0.81 | 28/04/2013 | Small fixes | A. Susi, M. Morandini, A. Siena (FBK)
0.9 | 29/04/2013 | Merged the reviews by Claudia Ayala (UPC) and Maria Carmela Annosi (TEI) | A. Susi, M. Morandini, A. Siena (FBK)
1.0 | 30/04/2013 | Version submitted | A. Susi (FBK)


Table of Contents

List of Figures .......................................................................................................................... 5

List of Tables ........................................................................................................................... 6

Abstract .................................................................................................................................... 7

1 Introduction ....................................................................................................................... 8

1.1 Motivation .................................................................................................................. 8

1.2 Glossary of terms ...................................................................................................... 8

1.3 Intended audience ..................................................................................................... 9

1.4 Relation to other deliverables .................................................................................... 9

1.5 Scope ........................................................................................................................ 9

1.6 Document structure ................................................................................................... 9

2 State of the art in OSS adoption risk management ........................................................ 10

2.1 Risks and Risk management ................................................................................... 10

2.1.1 Operational Risk Management ......................................................................... 10

2.1.2 Definitions of Operational Risk Management ................................................... 10

2.1.3 Operational Risk Management Techniques ..................................................... 11

2.1.4 Risk Mitigation .................................................................................................. 12

2.2 Systematic literature review: Risks in OSS adoption .............................................. 12

2.2.1 Systematic literature review: protocol .............................................................. 13

2.2.2 Purpose of the SLR .......................................................................................... 13

2.2.3 Literature search: publication channels ............................................................ 14

2.2.4 Search terms and libraries ............................................................................... 14

2.2.5 Selection/exclusion criteria ............................................................................... 15

2.2.6 Data Analysis and Extraction ........................................................................... 16

2.3 Systematic Literature Review: Execution and Paper Analysis ................................ 16

2.3.1 Paper Analysis ................................................................................................. 17

2.3.2 Differences between COTS and OSS adoption identified in literature ............. 18

2.3.3 “Risk” in OSS adoption – a taxonomy from literature ....................................... 19

2.3.4 Measures used in literature .............................................................................. 21

2.3.5 Risk mitigation .................................................................................................. 22

2.3.6 Risk, events, metrics, mitigation: a conceptual map ........................................ 23

2.3.7 Data retrieval and empirical evaluation for OSS risks and mitigation activities 24

2.4 References: SLR Selected Papers .......................................................................... 24

2.5 Goal-Oriented risk identification and modelling techniques .................................... 28


2.5.1 Goal-Risk framework (i*-based risk-analysis technique) .................................. 28

2.5.2 Tropos variability and failure modelling ............................................................ 28

2.5.3 Nòmos .............................................................................................................. 29

2.5.4 KAOS Obstacle analysis .................................................................................. 30

2.5.5 EKD Methodology ............................................................................................ 31

2.6 Risk analysis and evaluation techniques ................................................................. 35

2.6.1 CORAS ............................................................................................................ 35

2.6.2 Risk analysis and evaluation in past EU projects ............................................. 37

2.7 Discussion about limits of the current approaches in OSS adoption risk management and about research opportunities ................................................................. 37

3 Initial requirements for OSS adoption risk representation and analysis ......................... 39

3.1 Representation of ecosystems and risks for analysis purposes .............................. 39

3.1.1 Representation in Goal-oriented methodologies for analysis purposes ........... 39

3.1.2 Business processes ......................................................................................... 40

3.1.3 Ontologies for representation of and reasoning on risks .................................. 41

3.2 Data processing and analysis: statistical approaches ............................................. 41

3.2.1 Operational risk measurement techniques ....................................................... 42

3.2.2 Statistical approaches for risk evaluation ......................................................... 45

3.3 Data processing and analysis techniques: formal and search-based techniques ... 48

3.3.1 Formal approaches .......................................................................................... 48

3.3.2 Search-based and Machine Learning based optimization techniques ............. 50

Final Discussion and Conclusion ........................................................................................... 55

References ............................................................................................................................ 56

Annex A ................................................................................................................................. 60

RISCOSS Analytics: An Example ................................................................................... 60


List of Figures

Figure 1: Map of the concepts related to risk and risk analysis in OSS adoption .................. 23
Figure 2: An example actor-role diagram with “authorization” dependencies ........................ 32
Figure 3: Role-Activity diagrams and activity interaction between Role 1 and Role 2 ........... 32
Figure 4: The meta-model of the CORAS framework ............................................................ 36
Figure 5: Schema for a Monte Carlo simulation execution ..................................................... 42
Figure 6: Bugs and releases .................................................................................................. 60
Figure 7: Versions affected by unresolved bugs .................................................................... 61
Figure 8: Bugs by resolution over time .................................................................................. 62
Figure 9: Frequencies of Keywords Across Chat Sessions ................................................... 63
Figure 10: Chat Keyword Frequency ..................................................................................... 64


List of Tables

Table 1: Quality criteria for pattern usage .............................................................................. 33
Table 2: Analytic features, analysis goals and corresponding R packages ........................... 45
Table 3: Risk identification and RISCOSS use cases data sources ...................................... 46
Table 4: Risks in adoption and deployment of open source software ................................... 46
Table 5: Key Risk Indicators in adoption and deployment of open source software ............. 47
Table 6: Association results for the term “Bug” ..................................................................... 64
Table 7: Association results for the term “Issue” ................................................................... 65


Abstract

The deliverable presents a description of the state of the art in risk management techniques, including risk identification, analysis, assessment and mitigation, in general and in the field of Open Source Software (OSS) in particular, as the result of a Systematic Literature Review activity. It reports on the concepts characterizing the activities of risk identification, analysis, assessment and mitigation in the OSS adoption phase, and on the techniques currently exploited in this phase. We also discuss limitations of current approaches.

Moving from this state of the art, we also state initial requirements for new analysis techniques to be applied, and report a preliminary screening of these techniques, which are mainly statistical, such as Bayesian Networks, search-based, such as multi-objective genetic algorithms, and formal, such as logics and SAT/SMT solvers. A final discussion points out some domain requirements and limitations on the use of the techniques.

This deliverable will be updated after the delivery month when more refined requirements for the techniques will be identified.


1 Introduction

This deliverable is a first step towards the definition of a new process of risk management for the adoption of Open Source Software (OSS) components in software products.

The deliverable is divided into two main parts: in the first part we present a literature review of current work related to the various aspects of risk management, including risk identification, analysis, assessment and mitigation, in the field of OSS adoption; in the second we propose requirements for new techniques to be used in the RISCOSS approach.

In the first part of the review we present a description of the state of the art in risk management in general and in the field of OSS, as the result of a Systematic Literature Review activity. The review allows us to examine and collect the concepts characterizing the activities of risk identification, analysis, assessment and mitigation in the OSS adoption phase. This is a first step towards the definition of a comprehensive ontology of risk in OSS, to be integrated with other conceptualizations related to OSS ecosystems and business models.

We also analyse representation and reasoning techniques currently used for risk analysis. These techniques are mainly model-based, supported by probabilistic and statistical reasoning techniques. Interest in and use of these techniques in software engineering practice has grown steadily in recent years, but they are often applied without exploiting domain-based knowledge, and are therefore of limited effectiveness when applied to domain-specific problems like the one we are addressing in our project. We will study them with respect to the domain of OSS adoption.

From the state of the art we move towards the initial requirements for risk management techniques to be applied in our project, focussing in particular on risk analysis and prioritisation. A preliminary screening of these techniques spans from statistical approaches, such as Bayesian Networks, to search-based techniques mainly based on meta-heuristics, such as multi-objective genetic algorithms, to formal approaches, including model checking algorithms and SAT solvers. A final discussion points out some domain requirements and limitations in the use of these techniques.

1.1 Motivation

The motivation for this deliverable is mainly related to the need to analyse the literature, in order to obtain a detailed view of the state of the art of risk management in OSS, and to identify promising modelling and reasoning techniques to apply in a risk identification and management process in the RISCOSS project.

1.2 Glossary of terms

OSS: Open Source Software

OSS Ecosystem: the community, users, and companies involved and having interest in an OSS project.

Risk: product of the probability and the impact (or cost) of a hazardous event.
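The glossary definition of risk as probability times impact can be sketched numerically; the hazardous events, probabilities and costs below are purely illustrative assumptions, not figures from the project.

```python
# Risk exposure as probability x impact, following the glossary definition.
# Events and figures are illustrative, not taken from the deliverable.

def risk_exposure(probability: float, impact: float) -> float:
    """Expected loss of a hazardous event: probability times impact (cost)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return probability * impact

# Hypothetical OSS-adoption hazards: (event, yearly probability, cost if it occurs)
hazards = [
    ("abandoned upstream project", 0.05, 80_000),
    ("license incompatibility found late", 0.02, 150_000),
    ("critical unresolved bug at release", 0.20, 10_000),
]

for event, p, cost in hazards:
    print(f"{event}: exposure = {risk_exposure(p, cost):,.0f}")
```

Ranking hazards by this exposure value is the simplest quantitative reading of the definition; the later sections on statistical techniques refine how the probability and impact terms are estimated.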


Component adoption: Integration of software components for the realisation of closed and open source products.

SLR: Systematic literature review

1.3 Intended audience

This deliverable is intended for the internal use of the project. Parts of this document, in particular the systematic literature review and the set of terms collected during the review activity, can be exploited for the preparation of scientific and industrial papers and documents intended for the scientific and industrial communities interested in Open Source Software; they can also serve as a means to disseminate RISCOSS objectives and solutions to a larger non-technical audience.

1.4 Relation to other deliverables

There is a backward relation to deliverable D4.1, which discusses the state of the art in the OSS domain as described in previous EU projects, and to D1.1, which describes the initial ontology for OSS ecosystems.

1.5 Scope

The scope of this document is the RISCOSS project itself.

1.6 Document structure

Section 2 introduces the concept of operational risk and presents the state of the art, discussing various risk modelling and analysis techniques and reporting a systematic literature review on risks in OSS adoption. Section 3 presents initial requirements for the RISCOSS project, including methods and techniques for organisational modelling and for data processing and analysis with statistical approaches and optimisation techniques. The document concludes with a discussion and conclusions. Annex A contains a first example of data processing with statistical techniques.


2 State of the art in OSS adoption risk management

In this section we define the concept of risk as used in this document and work package, present an overview of relevant literature in the field of risk identification and management in open source software component adoption, and give an overview of promising approaches in the fields of risk modelling and analysis. To cover the topic of risks in OSS adoption in a comprehensive and structured way, we first describe risk and risk-management related concepts (Section 2.1); we then present a systematic literature review (SLR), whose process and a synthesis of whose outcomes are detailed in Sections 2.2 to 2.4. Please note that in this document we do not discuss the individual proposals and outcomes of each reviewed paper.

Moreover, we review approaches that consider the concept of risk in generic software engineering methodologies and in goal-oriented approaches (in particular i*, KAOS, and EKD). Furthermore, Sections 2.5 and 2.6 cover modelling, analysis and optimization techniques whose use appears promising for the risk management approach envisioned in the RISCOSS project.

2.1 Risks and Risk management

2.1.1 Operational Risk Management

In this work, we consider risks specifically as operational risks. While financial risks were recognized long ago, operational risks are in fact part of everyday life and not just a business issue. Operational risks and their management have frequently been misdiagnosed as human error, machine malfunction, accidents and so on. Often these risks were treated as disconnected episodes of random events, and thus were not managed. With the advancement of computerized systems came the recognition that operational mishaps and accidents have an effect, sometimes a very considerable one, and that they must be brought under control. Today, Operational Risk Management is gaining importance within businesses for a variety of reasons. One of them is the regulatory demand to do so in important sectors of the economy such as banking, insurance and the pharmaceutical industry. Another is the recognition that since operations are something the business can control completely or almost completely, it ought also to manage the risk associated with these operations, so that the controls are more satisfactory to the various stakeholders in the business.

2.1.2 Definitions of Operational Risk Management

"Operational risk is defined as the risk of loss resulting from inadequate or failed internal processes, people and systems or from external events. This definition includes legal risk, but excludes strategic and reputational risk…" (Basel II, 2006). In layman's terms, operational risk covers unwanted results brought about by people not following standard operational procedures, by systems, including computer-based systems, or by external events.

Operational risks abound in every sector of the economy and in every human endeavour. They are found in the health sector, in the energy industry, in banking, in education and, indeed, in all activities. Some sectors, because of enhanced sensitivity to risks or because of government regulations, have implemented advanced processes for identifying the risks specific to their activities. However, operational risks exist whenever any activity occurs, whether we manage it or not.

In summary, operational risks include most of what can cause an organization harm that is foreseeable and, to a very large extent, avoidable: if not the events themselves, then at least their impact on the organization. It is important to understand that a risk, once identified, is no longer just a risk: it is a management issue.

2.1.3 Operational Risk Management Techniques

2.1.3.1 Risk Identification

In order to effectively manage and control risk, management needs a clear and detailed picture of the risk and control environment in which it operates. Without this knowledge, appropriate action cannot be taken to deal with emerging problems. For this purpose, risks must be identified. This includes the risk sources, the risk events and the risk consequences. For this and other risk-related definitions, see also ISO Guide 73:2009.

All risks specific to an enterprise must be identified using a methodology designed to discover possible risks. There are a number of ways of identifying risks, including:

• Using event logs to sift the risks included in them
• Collecting expert opinions as to what may go wrong in the enterprise
• Simulating business processes and creating a list of undesirable results
• Systematically going through every business process used in the enterprise and finding out what may go wrong
• Using databanks of risk events that materialized in similar businesses, in order to learn from their experience
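As a minimal illustration of the first technique in the list (sifting event logs for risks), the sketch below counts logged events by type; the log format, event names and components are hypothetical assumptions, not data from the project.

```python
# Sifting an event log to see where risk events concentrate.
# Log entries and categories are illustrative, not from the deliverable.
from collections import Counter

event_log = [
    {"type": "build_failure", "component": "libfoo"},
    {"type": "license_warning", "component": "libbar"},
    {"type": "build_failure", "component": "libfoo"},
    {"type": "security_advisory", "component": "libbar"},
]

# Frequency of each event type hints at which risks recur most often.
by_type = Counter(entry["type"] for entry in event_log)
print(by_type.most_common())
```

In practice the same counting step would run over real operational logs (build servers, issue trackers, mailing lists), and recurring event types would seed the list of candidate risks.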

Once risks have been identified, controls must be put in place to mitigate those risks. Controls can be defined as processes, equipment, or other methods, including knowledge/skills and organization design, that have the specific purpose of mitigating risk. Controls should be identified and updated on a regular basis. They should be:

• Directly related to a risk or a class of risks (not a generic statement of good practice)
• Tangible and normally capable of being evidenced
• Precise and clear in terms of what specific action is required to implement the control

A control assurance process aims to provide assurance throughout the business that controls are being operated, and reports the actual status of a control’s performance. A shortlist of controls to be included in the control assurance process should consider the following:

• The impact and likelihood of the risk mitigated by the control
• The effectiveness and importance of the control
• The frequency of the control operation
• The regulatory relevance of the control
• The cost/performance ratio of developing and implementing the control

Risk event capture is the process of collecting and analysing risk event data. An operational risk event could, for example, result in an actual financial loss or profit of a defined amount; in a situation where no money was actually lost, but could have been were it not for the operation of a control; or in a situation where damage is caused to equipment and to people.


2.1.3.2 Risk and Control Assessments

Risk and Control Assessment (RCA) is a core component of the risk management framework, used to identify the key risks to the business, to establish areas where control coverage is inadequate, and to drive improvement actions for those risks which are assessed as outside agreed threshold limits.

One of the goals of this activity is to be able to predict the risks facing the organization, so that the priorities for handling them can be properly decided.

2.1.3.3 Key Risk Indicators

Key Risk Indicators, or KRIs, are metrics monitored in order to enable an immediate response by the risk managers to evolving risks. For more on KRIs see [Kenett and Raanan, 2010], [Kenett and Baker, 2010] and [Ograjenšek and Kenett, 2008].
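A KRI is, in essence, a monitored metric with thresholds that trigger a response. The sketch below illustrates this idea; the indicator name and threshold values are assumptions made for the example, not taken from the cited works.

```python
# Minimal sketch of KRI monitoring: a metric with escalation thresholds.
# Indicator name and limits are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class KRI:
    name: str
    warning: float  # level at which risk managers should investigate
    alert: float    # level requiring an immediate response

    def status(self, value: float) -> str:
        """Classify the current metric value against the thresholds."""
        if value >= self.alert:
            return "ALERT"
        if value >= self.warning:
            return "WARNING"
        return "OK"

# Hypothetical KRI for an OSS component under evaluation.
open_bugs = KRI("unresolved bugs per release", warning=25, alert=50)
print(open_bugs.status(10))  # -> OK
print(open_bugs.status(60))  # -> ALERT
```

A real monitoring setup would evaluate such indicators periodically against live data sources and route WARNING/ALERT states to the risk managers.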

2.1.3.4 Issues and Action Management

The management of issues and their associated actions is fundamental, providing a standardized mechanism for identifying, prioritizing, classifying, escalating and reporting issues throughout the company. The collection of issues and actions information allows the business to adopt a proactive approach to risk management and allows for quick reactions to changes in the business environment.

2.1.4 Risk Mitigation

Risk mitigation is an action to counteract, in advance, the effects on the business of risk events materializing. Strategies could be to avoid the risk, to accept the risk, to transfer the risk to others, or to reduce the risk.

Avoiding the risk means not taking the action that may generate it. With operational risk, that means not performing that operation. Accepting the risk means that the organization, while well aware of the risk, decides to go ahead and perform the operation that may end in the risk event occurring, and to suffer the consequences of that occurrence. Transferring the risk may be accomplished by a number of methods. The most familiar one is to insure the business against the occurrence of that risk event. This way, we transfer the risk to the insurance company and substitute a probabilistic loss event (the risk actually occurring and causing damage) with a deterministic, known loss: the insurance premium. Another way of transferring the risk is to subcontract the work that entails the risk, thereby causing some other business to assume the risk. Finally, reducing the risk means taking steps to lower either the probability of the risk event happening or the amount of damage that will be caused if it does occur. It is possible to act on these two distributions simultaneously, thereby achieving a lower overall risk.
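The strategies above can be compared by their expected yearly cost; the sketch below does so for accept, transfer and reduce (avoidance is omitted, since it forgoes the operation altogether). All probabilities, impacts and costs are illustrative assumptions, not figures from the deliverable.

```python
# Comparing mitigation strategies by expected yearly cost.
# Every number below is an illustrative assumption.

def expected_loss(probability: float, impact: float) -> float:
    """Expected loss of a risk event: probability times impact."""
    return probability * impact

p, impact = 0.10, 50_000                   # hypothetical risk event
premium = 6_000                            # transfer: deterministic insurance cost
reduced_p, reduced_impact = 0.04, 30_000   # reduce: act on both distributions
reduction_cost = 2_000                     # cost of the reduction measures

options = {
    "accept":   expected_loss(p, impact),
    "transfer": premium,
    "reduce":   expected_loss(reduced_p, reduced_impact) + reduction_cost,
}
best = min(options, key=options.get)
for name, cost in options.items():
    print(f"{name}: {cost:,.0f}")
print("cheapest strategy:", best)
```

With these assumed figures, reduction is cheapest; changing the premium or the reduction cost can easily flip the ordering, which is exactly the trade-off the text describes.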

2.2 Systematic literature review: Risks in OSS adoption

We conducted a systematic literature review (SLR) on risks in OSS adoption, with several objectives: firstly, to describe the state of the art in the field, to understand the main research problems, the difficulties, the limitations of current approaches and the open issues; secondly, to study the terminology used in the field, in order to structure, categorize and detail the problem domain, i.e. risks, risk measures, mitigation strategies, and the peculiar role of the OSS development and business model in OSS component adoption; and lastly, to contribute to the discovery of the initial requirements for a new risk analysis process, with a preliminary screening of the techniques used for data gathering, for giving evidence to claims and for the evaluation of the proposed approaches, as an indicator of their reliability and maturity, as a potential source of data, and as a guide for the evaluation of our own approaches.

The SLR focuses on the identification of technical, legal and organisational risks, and on risk analysis techniques, in the domain of open source software adoption, where we understand adoption mainly as component integration. To obtain a more complete view of the state of the art in risk analysis in this domain, the review also includes literature that mainly concerns off-the-shelf (OTS) components in general. It excludes literature on business-related risks (analysed in Work Package 3) and on security and safety risks. Moreover, it excludes OSS adoption by the final user (i.e. deployment); e.g. we do not analyse the challenges and risks in the adoption of Linux, Firefox or LibreOffice by an organisation, nor the use of server applications such as the LAMP stack.

In the following, we detail the SLR process, summarize the results, present the obtained terminologies, and discuss the achievements and shortcomings of current approaches.

2.2.1 Systematic literature review: protocol

For the SLR we defined the following protocol, adopting the guidelines by [Kitchenham and Charters, 2007]. Each step is documented and the retrieved data are stored in a repository accessible by the project partners.

1. Definition of the purpose and intended goals of the review.

2. Definition of the details of the literature search. The search strategy includes:

o definition of the publication channels to be searched (primary conferences, journals, other traceable sources of knowledge);

o definition of search terms and selection of the libraries to be searched.

3. Selection criteria, which are used to determine which studies are included in, or excluded from, a systematic review, and the justification that comprehensiveness is still assured with the adopted exclusion criteria.

4. Relevance and quality assessment procedures (screening for exclusion), describing the selection criteria used to remove papers of insufficient quality or out of the domain, e.g. with checklists. The assessment is made, in this order, on title, abstract, introduction and full text.

5. The data extraction strategy that defines how the information required will be obtained systematically.

6. The quantitative or qualitative analysis of the extracted data, carried out in a way that can be independently reproduced, and the writing of the detailed results of the review.

2.2.2 Purpose of the SLR

The purpose of this SLR is to define the state of the art, the terminology and the techniques used in the field of risk analysis for OSS component adoption in the development of commercial and open-source software, focussing on technical and organisational risks. It excludes business-related risks (analysed in Work Package 3) and security and safety risks. Moreover, it excludes the adoption of OSS by end-users.


2.2.3 Literature search: publication channels

We defined a priori a set of source publication channels where we expected to find a comprehensive set of scientific publications on the topic. This set is coordinated with the sources used for the literature review on open source ecosystems and ontologies, carried out concurrently in WP1, and with the involved project partners. During the manual screening for identifying promising publications, other relevant channels were added.

2.2.3.1 Journals and Magazines

ACM Transactions on Software Engineering and Methodology (TOSEM), Communications of the ACM, Empirical Software Engineering (EMSE), IEE Review, IEE Proceedings – Software, IEEE Computer, IEEE Software, IEEE Transactions on Software Engineering (TSE), Information and Software Technology (IST), Journal of Systems and Software (JSS), Software Practice and Experience, Software Process Improvement and Practice, Requirements Engineering Journal (REJ), Automated Software Engineering Journal (ASE Journal), Data and Knowledge Engineering (DKE), Software and System Modeling (SoSyM), Information Systems Journal (ISJ).

2.2.3.2 Conferences and Workshops

International Conference on Software Engineering (ICSE), IEEE International Symposium on Software Metrics (METRICS), International Symposium on Empirical Software Engineering and Measurement (ESEM), ACM International Symposium on the Foundations of Software Engineering (FSE), IEEE International Conference on Automated Software Engineering (ASE), IEEE International Conference on Requirements Engineering (RE), IEEE Signature Conference on Computers, Software, and Applications (COMPSAC), Hawaii International Conference on System Sciences (HICSS), ACM Symposium of Applied Computing (SAC).

2.2.3.3 Newly added sources recognized to be important in the field

International Conference on Open Source Systems (OSS), International Conference on Digital Ecosystems and Technologies (DEST), European Conference on Software Maintenance and Reengineering (CSMR), Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development (FLOSS), Working Conference on Reverse Engineering (WCRE), Reliability and Maintainability Symposium (RAMS), International Conference on Composition-Based Software Systems (ICCBSS).

2.2.4 Search terms and libraries

To find relevant articles, we searched five popular, complete and comprehensive meta-libraries for publications in computer science. Moreover, whenever one of the sources listed above was not indexed in any of these meta-libraries, we added the corresponding library to the search.

The search in the meta-libraries was carried out with a conjunction of the following search strings. For most of the search engines used, also including plurals or longer acronyms such as "FLOSS" did not yield a larger set of results. As a result of some tests, the terms "Free software" and "Libre software" were not considered as synonyms of OSS, since they are mainly used in the field of end-user software. We list a general form of the search string; the concrete search string used depends on the syntax available in each meta-library search facility. In some cases, a stepwise search was also necessary.


2.2.4.1 Search terms (used in conjunction)

The keywords used to find relevant papers were the following:

• To find papers related to Open Source Software: OSS OR "Open Source"

• To find papers related to Risks: Risk OR Obstacle OR Problem

• To find papers related to Adoption: Adoption

We would like to note that we excluded synonyms of risk such as pitfall, complication, trouble, danger, or peril, because they relate more to business or safety issues, or are not commonly used in this context. We included the term Problem because it yields some relevant results, even if many of the papers retrieved are off-topic.
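As an illustration of the general form, the three keyword groups above can be combined mechanically; the sketch below is ours (not part of the protocol) and produces only the generic search string, which must then be adapted to each meta-library's syntax:

```python
# Sketch: assembling the general search string from the three keyword
# groups used in conjunction. Concrete syntax varies per meta-library.

GROUPS = [
    ["OSS", '"Open Source"'],          # Open Source Software
    ["Risk", "Obstacle", "Problem"],   # Risks
    ["Adoption"],                      # Adoption
]

def build_query(groups):
    """Join each group with OR, then all groups with AND."""
    clauses = ["(" + " OR ".join(g) + ")" for g in groups]
    return " AND ".join(clauses)

print(build_query(GROUPS))
# (OSS OR "Open Source") AND (Risk OR Obstacle OR Problem) AND (Adoption)
```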

2.2.5 Selection/exclusion criteria

Literature collected by searching the previously defined repositories is first scanned for duplicates. Then, the criteria for relevance and quality assessment (i.e. screening for exclusion) are defined.

2.2.5.1 Step 1: direct exclusion criteria

The criteria for direct exclusion of papers were the following:

• Papers not written in English.

• Introductions, indexes, book reviews, PhD symposium papers, editorials.

• Short papers or demo papers shorter than 3 pages.

• Papers that are not published in one of the venues (journals/magazines/conferences/workshops) defined above undergo a manual screening for relevance, evaluating, in this order, venue, title, keywords and abstract. In this step, the list of venues could be extended. Additional workshop papers are included only if they have particularly high relevance and quality.

While performing the review, additional exclusion criteria emerged; these are documented in each phase.

2.2.5.2 Step 2: Exclusion based on title and keyword relevance

Papers have also been excluded by means of a screening based on venue and topic relevance (removing papers that deal only with security risks, social risks, software use in education, electronic devices, mathematical problems, medical risks, or disaster management).

2.2.5.3 Step 3: Exclusion based on reading of abstract and introduction

Lastly, papers have been excluded on the basis of their content, with the objective of finding the most relevant papers for an in-depth reading, aimed at contributing to the main purpose of the SLR: to identify technical and organizational risks and risk analysis techniques in the domain of open source software adoption. On the basis of a screening by reading the abstracts and, if necessary, the main parts of the paper, we excluded papers that do not contribute to this purpose; in particular, papers prevalently about:

• Business-related and legal risks (analysed in WP3)

• General security and safety risks, and risks which are not related to software engineering (e.g. papers on disaster management)


• Mathematical problems, optimisation problems, etc.

• Papers focussing only on the development of open source software

• Papers on the end-user adoption or deployment of open source programs such as Linux, Firefox or LibreOffice, or of server applications (e.g. Tomcat, PHP)

• "OSS" used as a different abbreviation (Operational Support System, Operating Systems)

The selected papers will thus contain information on:

• Technical risks in OSS adoption and use.

• Risk analysis techniques in the domain of open source software (adoption).

• Limitations of current approaches, capabilities needed for new approaches.

• Guidelines for risk mitigation with COTS/OSS use.

2.2.6 Data Analysis and Extraction

Each of the papers remaining from the previous steps has been read in depth in a collaborative way, with focus on the following aspects: the use and definition of concepts related to OSS, risks, risk measures and mitigation activities, with the aim of creating a terminology of the concepts used; the community or OSS ecosystem structure and the underlying business strategy of the involved companies and individuals; the modelling approach used, if any; the documented problems encountered, the risks that were identified (explicitly or implicitly), the mitigation strategies, and their validation. The papers are then clustered by research group (to find correlated works) and by the domain of the paper (OSS, adoption, COTS, risk analysis, ..., SLR). With this information, the following form is filled in.

1. Main message of the paper
2. Paper type (exploratory, theoretical, experimental, ...)
3. Important concepts defined
4. Risks or issues described
5. Risk measures described, data analysis techniques described
6. Decision and mitigation techniques described
7. Paper type (exploratory, theoretical, experimental)
8. Maturity (intuitive, tested on example, qualitative (on questionnaires), quantitative (on collected data), semantic (NLP on text), integrated (ETL, various data sources in sync.))
9. Evaluation (none, informal, qualitative or quantitative experiment, formal) and data obtained
10. Subjective importance for the RISCOSS project
11. Evaluation results and reader's subjective comments
12. Identified differences between COTS and OSS adoption
13. Lessons learned (in the paper and by the reader)
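Such a form lends itself to a structured per-paper record that can be filled and analysed programmatically; a minimal sketch follows (the field names are ours, as the deliverable defines the form only in prose):

```python
# Sketch: the data extraction form as a structured record.
# Field names are illustrative, not taken from the deliverable.
from dataclasses import dataclass, field

@dataclass
class ExtractionForm:
    main_message: str = ""
    paper_type: str = ""            # exploratory, theoretical, experimental, ...
    concepts: list = field(default_factory=list)
    risks: list = field(default_factory=list)
    measures: list = field(default_factory=list)
    mitigation: list = field(default_factory=list)
    maturity: str = ""              # intuitive, tested on example, qualitative, ...
    evaluation: str = ""            # none, informal, experiment, formal
    importance: str = ""            # subjective importance for RISCOSS
    comments: str = ""
    cots_oss_differences: list = field(default_factory=list)
    lessons_learned: list = field(default_factory=list)
```

Using `default_factory` gives each record its own independent lists, so entries filled in for one paper do not leak into another.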

2.3 Systematic Literature Review: Execution and Paper Analysis

The search is done on the following five meta-libraries, using the web-search option available in the open-source application JabRef. A search for duplicates is made while adding the results from each library. Moreover, a first screening on title and keywords is performed, to ensure the efficiency of the search terms and to remove clearly unrelated papers.


Note that the search engines index different data to be used in the searches; e.g., some include the abstracts, others do not.

• DBLP: Search executed with several variants and conjunctions of the search terms: 125 results. 44 papers added.

• IEEE Library: Search executed with several variants of the search terms, including plurals: 11 searches, 236 results without duplicates. 207 added.

• ACM DL (search is limited to the 20 most relevant papers for every search key): 6 searches, 86 papers retrieved, 10 duplicates, 76 added.

• Additional manual search:

o SpringerLink: not exportable. Manual check of the first 20 publications for relevant articles for the search string 'risk AND "open source"'. 2 added.

o ACM DL: manual search on "risk AND OSS", results ordered by recent date, first 100 results (2011-2013) analysed. 3 relevant journal papers identified and added.

Total papers in the review: 332.

Relevant paper extraction STEP 1

Manual deletion of publications that are not articles (e.g. full proceedings (18), collections (1), books (2)). Remaining papers: 311.

Manual screening for short papers, duplicates, invalid entries, to-be-published and language (English). Remaining papers: 282.

Relevant paper extraction STEP 2

Screening for venue and, additionally, for topic relevance, based on title and keywords. Remaining: 93, plus 13 interesting but not directly relevant papers.

Relevant paper extraction STEP 3

Screening based on reading of abstract and introduction. The result of this step, performed as described above, is the set of 47 relevant papers for the review, reported in Section 2.4.
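The extraction steps above can be viewed as a pipeline of predicates applied to the collected paper list, recording the remaining count after each step (in the review: 332 → 311 → 282 → 93 → 47). A minimal sketch with invented placeholder entries and predicates:

```python
# Sketch: multi-step screening as a filter pipeline with per-step counts.
# The papers and predicates below are illustrative placeholders only.
def screen(papers, steps):
    """Apply each (name, keep) predicate in turn; track remaining counts."""
    counts = [len(papers)]
    for name, keep in steps:
        papers = [p for p in papers if keep(p)]
        counts.append(len(papers))
    return papers, counts

papers = [
    {"title": "Risk in OSS adoption", "pages": 10, "is_article": True, "relevant": True},
    {"title": "Proceedings index", "pages": 200, "is_article": False, "relevant": False},
    {"title": "OSS demo", "pages": 2, "is_article": True, "relevant": False},
]
steps = [
    ("articles only", lambda p: p["is_article"]),
    ("length >= 3 pages", lambda p: p["pages"] >= 3),
    ("topic relevance", lambda p: p["relevant"]),
]
kept, counts = screen(papers, steps)
print(counts)  # [3, 2, 1, 1]
```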

2.3.1 Paper Analysis

A detailed review of the selected 47 papers reported in Section 2.4 (note that these references are indicated as [SLR1] to [SLR47] in this section and are deliberately not merged with the rest of the references) gave a comprehensive overview of the actual state of the art and of the community in the field of risk analysis and management in the domain of open source software adoption. Furthermore, we extracted several relevant aspects in a structured manner, in order to build glossaries of the terms used and to discover relevant measures and techniques. In the following, we categorise the papers, highlight the differences between OSS and COTS adoption as identified in these papers, extract and structure the issues and risks (which can also be seen, in a positive way, as opportunities) that were evidenced or analysed in the papers, and explain the measures and mitigation activities proposed, used or validated.


2.3.1.1 Categorisation

For the purpose of a better overview, we divided the resulting articles into seven main areas, based on the main focus of their content. The categories were defined based on a partitioning of the articles, guided by the relevant concepts in the domain of this SLR, which are OSS, risk, and component adoption. These categories are not mutually exclusive; rather, they define different levels of coverage of the concepts of interest.

• OSS component adoption: 12 papers, of which one addresses low-level (line-wise) adoption

• OSS in general (use, participation, OSS practices): 17 papers

• Off the shelf (OTS) components and development (commercial and OSS): 10 papers

• Software risk analysis (in general): 1 paper

• Software quality assurance (relevant for risk management): 2 papers

• Software in general: 2 papers

• Systematic literature reviews (SLR): 3 (1 on OSS adoption, 2 on OSS development in general)

The reader can note that only one quarter of the papers explicitly deals with OSS component adoption. However, most of the other papers include parts treating this topic, among other, more general aspects.

An analysis of research groups collaborating on a single topic revealed only one notable cluster of work, namely the nine papers by the first authors Jingyue Li [SLR18][SLR19][SLR20][SLR47], O. Hauge [SLR12][SLR13], R. Conradi [SLR7] and C. Ayala [SLR2][SLR3].

2.3.2 Differences between COTS and OSS adoption identified in literature

The RISCOSS project has a concrete focus on the adoption of open-source software. Processes for commercial component selection and risk evaluation are adopted, in a more or less formal way, by many medium and large companies [Li, 2006] [Carvallo et al., 2007] [Ayala et al., 2011] [Comella-Dorda et al., 2002]. However, these processes are not suitable, or only partially suitable, for the evaluation of OSS components, because they rely on direct costs, quality metrics, responsibilities and certifications such as CMMI (Capability Maturity Model Integration), which are unavailable or difficult to obtain without the possibility of evaluating the company behind a component [SLR29]. Also, quality metrics are difficult to obtain, because traditional project monitoring focuses on human reporting in a tightly coupled organization [SLR44]. Moreover, OSS demands decision criteria in which indirect costs, an evaluation of the community, etc. receive particular attention. Here we give a structured view of the main differences between COTS and OSS adoption, as identified in the literature, as a starting point for focussing on the aspects relevant to an OSS adoption risk analysis process.

2.3.2.1 Access to data

By the definition of the Open Source Initiative (OSI), open source software must comply, among others, with the following criteria: the source code must be available, and it must be allowed to copy and freely redistribute it, to improve the program, and to release these improvements.


This definition, and its consequence of possibly having a community working on the software, brings about the most evident differences from commercial software (certainly, exceptions prove the rule): the availability of the source code and of a bug tracker [SLR15][SLR34][SLR19][SLR17], which makes white-box testing possible [SLR34], and access to data by and about the developer and user community, which makes various community measures possible [SLR17][SLR34].

2.3.2.2 Differences emerging from the organizational structure

Having no purchasing costs, but typically various hidden costs, implies the need for a different focus in component evaluation processes [SLR40][SLR4][SLR39][SLR12]. Another critical difference, vendor (in)dependency [SLR40], was already highlighted in the previous sections. Evaluation thus needs to be project-oriented, not company-oriented [SLR29], and commercial contacts for support [SLR19][SLR12] can bring different opportunities (e.g. competition) and challenges (e.g. missing competence), along with the need to create expertise in the adopting company [SLR12].

The community-oriented organisational structure leads to novel license, liability, responsibility, and intellectual property issues [SLR9][SLR12][SLR40], highlighted in most literature on OSS risks. Uncertainty about product future [SLR40][SLR4], unknown development schedules [SLR17] and uncertain long-term support [SLR20][SLR27] provide further challenges.

2.3.2.3 Differences emerging from the OSS development process

Due to the development processes followed in the OSS community and the reduced influence of companies' business strategies, various qualities are attributed to OSS and emphasized as differences to COTS, such as a missing long-term roadmap [SLR34][SLR46], lower-quality documentation [SLR40][SLR4], a different testing process [SLR27], and missing unambiguous specifications [SLR1] with few architectural choices or design for usability [SLR32]. Moreover, rapid code and API change [SLR1][SLR4] and lacks in quality evaluation [SLR40], but also better adherence to open standards, are attributed to OSS projects.

However, various works also underline the similarities between OSS and COTS. In practice, code changes are seldom made and external support can be ensured, so update difficulty, liability, licenses, and testing do not differ significantly [SLR3][SLR18].

2.3.3 “Risk” in OSS adoption – a taxonomy from literature

In the following we give a structured list of terms for the various risks that were identified, discussed, or emerged from surveys or other analyses in the reviewed papers. Inspired by the various categorisations given in some of the works, we define six main types of risks (which can, however, also be seen as opportunities) in our domain of interest: risks inherent to the component selection and integration process; knowledge-related risks (e.g., depending on experience or on the availability of documentation and training); risks linked to legal issues (property rights) and business models; risks in maintenance and support; risks emerging from the community and its organisation; and risks related to code quality.


2.3.3.1 Selection and Integration Risks

• component selection risks
  o requirements satisfaction [SLR14][SLR38], requirements not negotiable [SLR16][SLR47]
  o lack of products [SLR12]
  o deciding which fork should be chosen [SLR40]
  o identification of product quality, uncertainty about quality [SLR40]
• adoption (integration) process risks [SLR19]
  o uncontrolled adoption [SLR12], less caution in the selection of OSS [SLR19]
  o no care for quality in adoption [SLR1]
  o lack of an OTS-driven requirements engineering process [SLR16]
• integration risks [SLR40][SLR6]
  o integration effort ill-estimated [SLR16][SLR19][SLR47][SLR40]
  o deployment risks [SLR47], compatibility [SLR28]
  o complicated multi-OTS component arrangement [SLR16]
• final product project risks (budget, delivery time, required quality) [SLR34]
  o impact on product reliability [SLR17][SLR47][SLR40][SLR41][SLR42]
  o impact on product security [SLR1][SLR28][SLR47]
  o impact on product performance [SLR47]

2.3.3.2 Knowledge-Related Risks

• company human capital risks [SLR21]
  o availability of skilled personnel [SLR38]
  o lack of expertise [SLR12][SLR28]
  o product training availability [SLR2][SLR34]
• lacks in documentation
  o documentation availability [SLR17][SLR34]
  o insufficient component documents [SLR16]
  o tutorial and book availability [SLR34]

2.3.3.3 Legal and Business Risks

• legal issues
  o legal, intellectual property [SLR2][SLR9][SLR22][SLR40][SLR46]
  o licensing [SLR12][SLR14][SLR22][SLR24][SLR34][SLR40]
• liability [SLR9]
  o no ownership [SLR28]
  o liability for component quality [SLR24]
  o lack of providers [SLR12]
  o project-specific, not company-specific [SLR29]
  o lack of information on provider [SLR16]
  o (no) vendor lock-in [SLR28]
• monetary risks
  o cost, monetary risk [SLR25][SLR28], availability of financial resources [SLR38]
  o hidden costs [SLR12]
  o market success [SLR26]
• non-IT organisational risks (business agreements, ideology, management support) [SLR47]
• OSS component business model [SLR40]
  o commercial versions [SLR34]
  o availability of additional non-OSS functionalities, licensing costs [SLR38][SLR2]


2.3.3.4 Maintenance Risks

• versioning, update risks [SLR2][SLR28][SLR37][SLR40][SLR46]
  o update frequency [SLR34]
  o innovation (as benefit) [SLR28]
  o explicit and hidden semantic changes [SLR15]
  o API changes [SLR4]
  o quality changes [SLR15]
  o impact on self-implemented code [SLR15]
  o update backwards compatibility [SLR40]
  o upgrade feasibility [SLR16][SLR47]
• lack of support [SLR9][SLR12][SLR16][SLR47][SLR40]
  o uncertainty in service and support [SLR21]
  o bug correction/fix time [SLR5][SLR15][SLR34]
  o unknown development schedules [SLR17]

2.3.3.5 Community-Related Risks

• project community activeness [SLR34]
  o presence of heroes [SLR32]
  o absence of key committers [SLR44]
  o de-motivation of the project community [SLR44]
  o failure to create a community [SLR8]
  o lacks in marketing [SLR28]
• possibility of contribution [SLR34][SLR35][SLR40]
  o patch rejection, change acceptance [SLR35][SLR40]
  o cost of contribution [SLR35][SLR42]
  o not adaptable to requirements changes [SLR16][SLR47]
  o reduced control of future evolution [SLR16]
• uncertainty in future and roadmap [SLR34][SLR46]
  o standards implementation [SLR34]
  o maintenance planning unfeasible [SLR16][SLR47]
  o impossibility of ensuring future response time [SLR34]

2.3.3.6 OSS Component Code Quality Risks

• lack of information on quality [SLR24]
  o lack of a testing process [SLR27]
  o testing shortfalls [SLR34]
  o overlooked NFRs [SLR9]
  o maturity [SLR38]
• usability, user friendliness [SLR28][SLR31]
• component dependability [SLR1]
• bad code quality [SLR15][SLR24][SLR30]
  o difficult defect localisation [SLR19][SLR47]
  o bug risk [SLR30], error proneness [SLR36]
  o component dependency (code level) [SLR34]
• flexibility of use [SLR28]

2.3.4 Measures used in literature

Various measures used in the literature exploit the particular data provided by OSS projects to quantify various qualities of the source code, data on bugs and bug fixing, on the community and its future roadmap, and on the adopting organisation. Also, subjective expert opinions can be a valuable source of measures. In the following subsections, we define five categories of measures and list them as they were identified or used in the single papers, without going into detail on single values.

2.3.4.1 Code metrics

API usage metrics, tool, informal results analysis [SLR4]; analyse dependencies, detect 3rd-party signature changes, detect 3rd-party code internal changes; code quality: QBench benchmarks [SLR15]; module usage (coupling), fault count, complexity (cyclomatic number) [SLR17][SLR36]; software quality metrics, documentation and interaction support, integration and adaptability metrics [SLR1]; Sotograph code metrics [SLR31]; regression to find metric-fault associations [SLR36]; lines of code [SLR37].

2.3.4.2 Bugtracker

Bugginess measures [SLR37]; correction time [SLR5]; invalid method parameter faults, wrong method usage faults, faulty method implementation, environment faults [SLR6]; analyse the bug tracker [SLR15]; bug reports [SLR17]; keyword search in logs for "bug", "problem" [SLR30]; repository measures (number of committers, number of commits by committer, etc.) [SLR33]; bug reports [SLR37]; mean time between software failures [SLR42].
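Two of the bug-tracker measures named above, correction time and mean time between failures, can be computed directly from issue timestamps; a minimal sketch with invented data (here MTBF is approximated as the mean gap between successive bug reports):

```python
# Sketch: bug correction time and MTBF from (opened, closed) timestamps.
# All timestamps below are invented for illustration.
from datetime import datetime

bugs = [  # (opened, closed)
    (datetime(2013, 1, 3), datetime(2013, 1, 10)),
    (datetime(2013, 1, 20), datetime(2013, 2, 2)),
    (datetime(2013, 2, 5), datetime(2013, 2, 8)),
]

# Mean bug correction time: average of (closed - opened) per bug.
correction_days = [(closed - opened).days for opened, closed in bugs]
mean_correction = sum(correction_days) / len(correction_days)

# MTBF approximation: mean gap between successive bug-report dates.
opened = sorted(o for o, _ in bugs)
gaps = [(b - a).days for a, b in zip(opened, opened[1:])]
mtbf = sum(gaps) / len(gaps)

print(f"mean correction time: {mean_correction:.1f} days, MTBF: {mtbf:.1f} days")
# mean correction time: 7.7 days, MTBF: 16.5 days
```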

2.3.4.3 Community and Roadmap

Various developer community and user community measures; project activity and release delivery frequency [SLR8]; development environments, community group activities, and ongoing efforts [SLR17]; mails per month, developer interactions, active power users, postings vs. downloads [SLR44].

2.3.4.4 Expert opinions

Involved companies, support significance (expert opinion) [SLR17]; interviews with developers and managers [SLR18]; support reputation [SLR19]; perception [SLR28][SLR29]; questionnaires on all aspects of adoption risk; measuring the level of usability-awareness: preliminary, recognized, defined, streamlined, institutionalized [SLR32]; understanding where the managers see the risk [SLR43].

2.3.4.5 Metrics for the adopting company, business value metrics

Organization size, magnitude of outsourced IT, prior experience in the adoption of OSS and proportion of IT employees within an organization [SLR21]; Real Options Theory: analyse, in a monetary fashion, the economic value; Petri nets, RiskSimulator tool [SLR25]; testing process certification, TMM (Testing Maturity Model) [SLR27]; developer experience [SLR37].

2.3.5 Risk mitigation

Few publications in this field address risk mitigation. Mostly, the mitigation of the various risks encountered in OSS adoption is only mentioned informally, in the form of general hints, such as to train the people, i.e. to develop the existing stock of human capital [SLR21], to follow general COTS adoption decision processes, to evaluate the community [SLR24], to evaluate similarity to previous projects [SLR44], to evaluate the OSS project's roadmap and possible future directions [SLR46], or to make managers aware of risks and opportunities [SLR37]. Empirical experiments were used to identify risks and to identify effective mitigation activities. However, none of the works showed evidence of causal relationships between these risks, concrete measures and the effectiveness of the mitigation activities. Only for more concrete risks, e.g. for lowering the risk of introducing errors when upgrading to new versions, were concrete mitigation activities proposed, such as automatically checking API compatibility [SLR4] or exploring code executability with test cases [SLR14] to ensure correct functionality.

2.3.6 Risk, events, metrics, mitigation: a conceptual map

The following figure shows an excerpt of a map of the main concepts related to risk and risk analysis in OSS adoption, putting terms in context on the basis of [Kenett and Raanan, 2010] and the papers resulting from the SLR. The resulting terminology, which needs to be discussed with the RISCOSS partners, refined and completed, can be seen as a first step towards a formal ontology for risks in OSS adoption.

Figure 1: Map of the concepts related to risk and risk analysis in OSS adoption


2.3.7 Data retrieval and empirical evaluation for OSS risks and mitigation activities

In this section we report the forms in which data was acquired and proposals were evaluated in the identified literature.

Out of the 47 analysed papers, three were literature reviews, of which two can be defined as systematic. Twelve papers are theoretical, evaluated only informally by a proof-of-concept application to a single example. Thirteen papers report, analyse and draw conclusions from surveys, questionnaires and interviews with developers and managers in software firms, based mostly on qualitative data such as opinions. Another ten papers based their analyses on quantitative data: data from log files for retrieving the software used by servers, from bug tracking systems for activity, bug reporting and resolution data, from code (various code metrics, available test cases, interface analysis), and from code repository metadata, e.g. for changes over time and changes per developer (see Section 2.3.4).

Notice that a large number of publications report and interpret results from qualitative and quantitative studies to identify possible risks. It stands out that only [SLR40] calculates threshold values for defining bug risks and evaluates their performance, while no paper identifies risks based on quantitative data on project failure or on the losses and revenues created, or correlates project failures and losses with quantitative data such as the number of bug reports and bug fixes.

Few papers consider quantitative measures of community qualities (number of contributors, activity, presence of heroes [SLR39], etc.), while no work empirically evaluates the existence of causal relationships between the metrics applied, the risks identified and the actual faults occurring. Moreover, an empirical evaluation of the effectiveness of mitigation activities, through their influence on the retrieved metrics, is missing even in the works that propose specific mitigation activities. Even very complete works, such as those by the group behind [SLR12][SLR19][SLR20][SLR47], whose surveys retrieve data on both risks and mitigation activities, do not show any link between the two: the typical mitigation activities adopted in software companies are very general (see Section 2.3.5) and would reduce various risks.

2.4 References: SLR Selected Papers

[SLR1] Ardagna, C. A., Banzi, M., Damiani, E., Frati, F. and El Ioini, N. "An assurance model for OSS adoption in next-generation telco environments", Proc. 3rd IEEE Int. Conf. Digital Ecosystems and Technologies DEST '09, 2009, pp. 619--624.

[SLR2] Ayala, C. P., Cruzes, D., Hauge, Ø. and Conradi, R. "Five Facts on the Adoption of Open Source Software," IEEE Software (28:2), 2011, pp. 95-99.

[SLR3] Ayala, C. P., Sorensen, C.-F., Conradi, R., Franch, X. and Li, J. "Open Source Collaboration for Fostering Off-The-Shelf Components Selection" OSS, 2007, pp. 17-30.

[SLR4] Bauer, V. and Heinemann, L. "Understanding API Usage to Support Informed Decision Making in Software Maintenance" In Proc. 16th European Conf. on Software Maintenance and Reengineering (CSMR), 2012, pp. 435--440.


[SLR5] Canfora, G., Ceccarelli, M., Cerulo, L. and Di Penta, M. "How Long Does a Bug Survive? An Empirical Study" In Proc. of the 18th Working Conf. on Reverse Engineering (WCRE), 2011, pp. 191--200.

[SLR6] Chang, H., Mariani, L. and Pezze, M. "Self-healing strategies for component integration faults", In Proc. of the 23rd IEEE/ACM Int. Conf. on Automated Software Engineering - ASE Workshops 2008, 2008, pp. 25--32.

[SLR7] Conradi, R., Li, J., Slyngstad, O. P. N., Kampenes, V. B., Bunse, C., Morisio, M. and Torchiano, M. "Reflections on conducting an international survey of software engineering", In Proc. of the Int. Empirical Software Engineering Symposium, 2005.

[SLR8] Duenas, J. C., Parada G., H. A., Cuadrado, F., Santillan, M. and Ruiz, J. L. "Apache and Eclipse: Comparing Open Source Project Incubators," IEEE Software (24:6), 2007, pp. 90--98.

[SLR9] Ebert, C. "Open Source Drives Innovation," IEEE Software (24:3), 2007, pp. 105--109.

[SLR10] Groen, F. J. and Smith, C. "Concept for the NASA risk and reliability data collection and analysis environment" In Proc. of the Annual Reliability and Maintainability Symposium RAMS 2009, 2009, pp. 134--139.

[SLR11] Höst, M. and Orucevic-Alagic, A. "A systematic review of research on open source software in commercial software product development," Information & Software Technology (53:6), 2011, pp. 616-624.

[SLR12] Hauge, Ø., Cruzes, D., Conradi, R., Velle, K. S. and Skarpenes, T. A. "Risks and Risk Mitigation in Open Source Software Adoption: Bridging the Gap between Literature and Practice", OSS, 2010, pp. 105-118.

[SLR13] Hauge, Ø., Sorensen, C.-F. and Conradi, R. "Adoption of Open Source in the Software Industry", OSS, 2008, pp. 211-221.

[SLR14] Janjic, W., Stoll, D., Bostan, P. and Atkinson, C. "Lowering the barrier to reuse through test-driven search" In Proceedings of the 2009 ICSE Workshop on Search-Driven Development-Users, Infrastructure, Tools and Evaluation, IEEE Computer Society, Washington, DC, USA, 2009, pp. 21--24.

[SLR15] Klatt, B., Durdik, Z., Koziolek, H., Krogmann, K., Stammel, J. and Weiss, R. "Identify Impacts of Evolving Third Party Components on Long-Living Software Systems" In Proc. of the 16th European Conference on Software Maintenance and Reengineering (CSMR), 2012, pp. 461--464.

[SLR16] Kusumo, D. S., Staples, M., Zhu, L. and Jeffery, R. "Analyzing differences in risk perceptions between developers and acquirers in OTS-based custom software projects using stakeholder analysis" In Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement, ACM, New York, NY, USA, 2012, pp. 69--78.

[SLR17] Lee, W., Lee, J. K. and Baik, J. "Software Reliability Prediction for Open Source Software Adoption Systems Based on Early Lifecycle Measurements", COMPSAC, 2011, pp. 366-371.

[SLR18] Li, J., Conradi, R., Bunse, C., Torchiano, M., Slyngstad, O. and Morisio, M. "Development with Off-the-Shelf Components: 10 Facts," IEEE Software (26:2), 2009, pp. 80--87.

[SLR19] Li, J., Conradi, R., Slyngstad, O. P. N., Bunse, C., Khan, U., Morisio, M. and Torchiano, M. "Barriers to disseminating off-the-shelf based development theories to IT industry" In Proceedings of the second international workshop on Models and processes for the evaluation of off-the-shelf components, ACM, New York, NY, USA, 2005, pp. 1--4.

[SLR20] Li, J., Conradi, R., Slyngstad, O. P. N., Bunse, C., Torchiano, M. and Morisio, M. "An empirical study on decision making in off-the-shelf component-based development" In Proceedings of the 28th international conference on Software engineering, ACM, New York, NY, USA, 2006, pp. 897--900.

[SLR21] Li, Y., Tan, C. H. and Teo, H. H. "Firm-specificity and organizational learning-related scale on investment in internal human capital for open source software adoption" In Proceedings of the 2008 ACM SIGMIS CPR conference on Computer personnel doctoral consortium and research, ACM, New York, NY, USA, 2008, pp. 22--29.

[SLR22] Link, C. "Patterns for the commercial use of open source: legal and licensing aspects", In Proceedings of the 15th European Conference on Pattern Languages of Programs, ACM, New York, NY, USA, 2010, pp. 7:1--7:10.

[SLR23] Luoma, E., Helander, N. and Frank, L. "Adoption of Open Source Software and Software-as-a-Service Models in the Telecommunication Industry", ICSOB, 2011, pp. 70-84.

[SLR24] Maki-Asiala, P. and Matinlassi, M. "Quality Assurance of Open Source Components: Integrator Point of View" In Proc. of the 30th Annual Int. Computer Software and Applications Conf. COMPSAC '06, 2006, pp. 189--194.

[SLR25] Mavridis, A. and Stamelos, I. "Real options as tool enhancing rationale of OSS components selection", In Proc. 3rd IEEE Int. Conf. Digital Ecosystems and Technologies DEST '09, 2009, pp. 613--618.

[SLR26] Messerschmitt, D. G. "Marketplace issues in software planning and design," IEEE Software (21:3), 2004, pp. 62--70.

[SLR27] Morasca, S., Taibi, D. and Tosi, D. "Towards certifying the testing process of Open-Source Software: New challenges or old methodologies?" In Proceedings of the 2009 ICSE Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development, IEEE Computer Society, Washington, DC, USA, 2009, pp. 25--30.

[SLR28] Morgan, L. and Finnegan, P. "Open innovation in secondary software firms: an exploration of managers' perceptions of open source software," SIGMIS Database (41:1), 2010, pp. 76--95.

[SLR29] Petrinja, E., Sillitti, A. and Succi, G. "Adoption of OSS Development Practices by the Software Industry: A Survey", OSS, 2011, pp. 233-243.

[SLR30] Phadke, A. A. and Allen, E. B. "Predicting risky modules in open-source software for high-performance computing" In Proceedings of the second international workshop on Software engineering for high performance computing system applications, ACM, New York, NY, USA, 2005, pp. 60--64.

[SLR31] Ramler, R., Wolfmaier, K. and Natschlager, T. "Observing Distributions in Size Metrics: Experience from Analyzing Large Software Systems" In Proc. 31st Annual Int. Computer Software and Applications Conf. COMPSAC 2007, 2007, pp. 299--304.

[SLR32] Raza, A., Capretz, L. F. and Ahmed, F. "An open source usability maturity model (OS-UMM)," Computers in Human Behavior (28:4), 2012, pp. 1109-1121.

[SLR33] Ricca, F. and Marchetto, A. "Heroes in FLOSS Projects: An Explorative Study" In Proc. 17th Working Conf. Reverse Engineering (WCRE), 2010, pp. 155--159.

[SLR34] Rudzki, J., Kiviluoma, K., Poikonen, T. and Hammouda, I. "Evaluating Quality of Open Source Components for Reuse-Intensive Commercial Solutions" In Proc. 35th Euromicro Conf. Software Engineering and Advanced Applications SEAA '09, 2009, pp. 11--19.

[SLR35] Sethanandha, B. D., Massey, B. and Jones, W. "Managing open source contributions for software project sustainability" In Proc. PICMET '10: Technology Management for Global Economic Growth (PICMET), 2010, pp. 1--9.

[SLR36] Shatnawi, R. "A Quantitative Investigation of the Acceptable Risk Levels of Object-Oriented Metrics in Open-Source Systems," IEEE Transactions on Software Engineering (36:2), 2010, pp. 216-225.

[SLR37] Shihab, E., Hassan, A. E., Adams, B. and Jiang, Z. M. "An industrial study on the risk of software changes" In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, ACM, New York, NY, USA, 2012, pp. 62:1--62:11.

[SLR38] Smith, II, J. D. "An Alternative to Technology Readiness Levels for Non-Developmental Item (NDI) Software" In Proc. of the 38th Annual Hawaii Int. Conf. System Sciences HICSS '05, 2005.

[SLR39] Spinellis, D. and Giannikas, V. "Organizational adoption of open source software," Journal of Systems and Software (85:3), 2012, pp. 666-682.

[SLR40] Stol, K.-J. and Ali Babar, M. "Challenges in using open source software in product development: a review of the literature" In Proceedings of the 3rd International Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development, ACM, New York, NY, USA, 2010, pp. 17--22.

[SLR41] Tamura, Y. and Yamada, S. "Component-oriented Reliability Analysis and Optimal Version-upgrade Problems for Open Source Software," JSW (3:6), 2008, pp. 1-8.

[SLR42] Tamura, Y. and Yamada, S. "Software reliability assessment and optimal version-upgrade problem for Open Source Software", SMC, 2007, pp. 1333-1338.

[SLR43] Tiwana, A. and Keil, M. "Functionality Risk in Information Systems Development: An Empirical Investigation," IEEE Transactions on Engineering Management (53:3), 2006, pp. 412--425.

[SLR44] Wahyudin, D. and Tjoa, A. M. "Event-Based Monitoring of Open Source Software Projects" In Proc. of the Second Int. Conf. Availability, Reliability and Security (ARES 2007), 2007, pp. 1108--1115.

[SLR45] von Wangenheim, C. G., Hauck, J. and von Wangenheim, A. "Enhancing Open Source Software in Alignment with CMMI-DEV," IEEE Software (26:2), 2009, pp. 59--67.

[SLR46] Yamakami, T. "A stage model of evolution of open source software: Implications for the next stage of open source software development" In Proc. of the 6th Int Digital Content, Multimedia Technology and its Applications (IDC) Conf, 2010, pp. 203--207.

[SLR47] Li, J., Conradi, R., Slyngstad, O. P. N., Torchiano, M., Morisio, M. and Bunse, C. "A State-of-the-Practice Survey of Risk Management in Development with Off-the-Shelf Software Components," IEEE Trans. Software Eng. (34:2), 2008, pp. 271-286.


2.5 Goal-Oriented risk identification and modelling techniques

In this section we present various goal-oriented approaches and techniques for risk analysis and assessment that were proposed for the identification, modelling and analysis of organisational risks, based on i*/Tropos, KAOS and EKD.

2.5.1 Goal-Risk framework (i*-based risk-analysis technique)

[Asnar et al., 2011] propose the Goal-Risk framework to capture, analyse and assess risk at an early stage of the requirements engineering process, using goal models. A Goal-Risk model is a triplet {G, R, I}, where G is a set of nodes, R a set of relations and I a set of a special type of relation: impact relations. The nodes are characterised by two properties, Sat and Den, indicating the evidence that the node is satisfied/fulfilled/present or denied/failed/absent, respectively.

A goal-risk model comprises three layers: an asset layer, an event layer and a treatment layer. The Asset layer is used to model the assets. Following the definition given by the ISO 13335 standard, an asset is ‘‘anything that has value to an organization’’. The nodes contained in the asset layer can thus be (1) resources, (2) tasks (executed to generate value), or (3) goals (whose fulfilment generates value). Assets are decomposed through And- and Or-decomposition relations, and linked to each other through contribution relations. The Event layer is used to model phenomena that could happen without the possibility of controlling them. Events are characterised by their likelihood to happen. Event likelihood is modelled through the Sat/Den property of the Event node. Events also have a severity, which indicates their capability to prevent or promote the achievement of goals. Event severity is modelled by linking events through Impact relations to the goals whose achievability they prevent or promote. Impact relations can be negative (–) or strongly negative (– –), in which case the events represent risks; they can be positive (+) or strongly positive (++), in which case they represent an opportunity. The Treatment layer contains the countermeasures set up to mitigate the risks, by reducing their likelihood or attenuating their severity. Countermeasures are modelled as tasks. To reduce risk likelihood, negative contribution relations (–, – –) are used to relate tasks to risky events; this means that, when the task is performed, there is evidence that the event is less likely to happen. To attenuate risk severity, Alleviation relations are used to relate countermeasure tasks to impact relations: when the countermeasure task is performed, there is evidence that the strength of the impact relation decreases.
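The layered structure described above can be illustrated with a minimal sketch. All class and relation names below are our own illustrative encoding, not the authors' API; the Sat/Den values and the mitigation effect are invented for the example.

```python
# Illustrative sketch of a Goal-Risk model: nodes carry Sat/Den evidence,
# events impact goals, and treatment tasks lower event likelihood.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    layer: str          # "asset", "event" or "treatment"
    sat: float = 0.0    # evidence the node is satisfied/present
    den: float = 0.0    # evidence the node is denied/absent

@dataclass
class Impact:
    event: Node
    goal: Node
    sign: str           # "--", "-", "+", "++"
    alleviations: list = field(default_factory=list)  # treatment tasks

def is_risk(impact: Impact) -> bool:
    """Negative impacts of an event on a goal represent risks;
    positive ones represent opportunities."""
    return impact.sign in ("-", "--")

def mitigate(event: Node, reduction: float) -> None:
    """A treatment task with a negative contribution to an event lowers
    the evidence (likelihood) that the event occurs."""
    event.sat = max(0.0, event.sat - reduction)

# usage: a hypothetical event threatening an asset-layer goal
outage = Node("server outage", "event", sat=0.7)
uptime = Node("keep service available", "asset")
imp = Impact(outage, uptime, "--")
mitigate(outage, 0.4)   # e.g. the effect of a "deploy redundancy" task
```

The sketch only covers likelihood reduction; attenuating severity would correspond to weakening the `sign` of the impact relation via an alleviation.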

2.5.2 Tropos variability and failure modelling

Risk assessment techniques have the objective of reducing known risks to an acceptable combination of likelihood and severity. However, some factors play against this approach. For example, as stated by bounded rationality theory, the knowledge about possible risks may be neither completely available nor completely tractable. Moreover, the cost of assessing all the risks may simply be too high to be acceptable. In this case, a different approach consists in letting the system be aware of the possibility of failure due to uncontrolled risks, and designing it to recover from failure by adapting itself.

Awareness Requirements modelling [Souza et al., 2011] consists in providing an explicit representation of the states that requirements can assume during their execution at runtime: initially they are undecided, and they later become succeeded, failed or cancelled. Awareness requirements specify the awareness about the success, failure or cancellation of other requirements by specifying thresholds, to be evaluated at run-time; such thresholds define the transition from one state to another. Intuitively, awareness requirements rely on a requirements monitoring framework to support them.
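The state transitions driven by run-time thresholds can be sketched as follows. The monitored metric (a success rate), the threshold values and the minimum-sample rule are assumptions for illustration, not part of the original framework.

```python
# Minimal sketch of an awareness requirement: a monitored requirement
# starts Undecided and transitions once a run-time metric crosses a
# threshold. Cancellation would come from an explicit external decision,
# so it is represented in the state set but not derived from the metric.
from enum import Enum

class ReqState(Enum):
    UNDECIDED = "undecided"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"

def evaluate(success_rate: float, samples: int,
             threshold: float = 0.95, min_samples: int = 10) -> ReqState:
    """Stay Undecided until enough observations are available; then
    compare the monitored rate against the awareness threshold."""
    if samples < min_samples:
        return ReqState.UNDECIDED
    return ReqState.SUCCEEDED if success_rate >= threshold else ReqState.FAILED

# usage: too few observations keep the requirement undecided
state = evaluate(0.99, samples=3)
```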

Another candidate technique for goal-oriented modelling and analysis in the context of risks is Tropos4AS [Morandini et al., 2008]. Originally proposed for the description of self-aware adaptive systems, Tropos4AS combines Tropos goal models with the modelling and analysis of failures, their causes and the possible mitigation activities.

The Tropos4AS methodology extends Tropos to provide a process and a modelling language that capture at design time the knowledge necessary for a system to deliberate on its goals in a dynamic environment, thus enabling a basic feature of self-adaptation. It integrates the goals of the system with the environment, and provides a development process for the engineering of such systems that takes into account the modelling of the environment and an explicit modelling of failures.

To capture a system’s awareness of its environment, Tropos4AS aids the software engineer in capturing and detailing at design time the environment perceived by a software system, together with the possible failures that can be identified, the conditional events that lead to these failures, and the recovery activities to be enacted to prevent or mitigate them. This knowledge guides the decision making necessary for self-adaptation at run-time. Moreover, the high-level requirements are brought to run-time in the form of goal models implemented as software agents in a goal-directed agent platform [Morandini et al., 2009], enabling the system to monitor their satisfaction, to react upon them and to guide its behaviour accordingly.

The concepts of failure, error and recovery activity, originally used mainly to elicit missing functionalities and to separate the exceptional from the nominal behaviour of the system, can also be seen from the viewpoint of risk management. They support the identification of critical events that may lead to the failure of a goal of the system or of a stakeholder, thus representing possible risks, and the elicitation of various alternative mitigation activities. These mitigation activities are defined in the context of the system, modelling positive and negative contributions to the system’s goals. Proper mitigation activities for a specific context can be chosen at run-time by applying one of the available goal analysis techniques, taking into account the importance of the modelled quality requirements at a given time.
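The run-time choice among alternative mitigation activities, weighted by the current importance of quality goals, can be sketched as a simple scoring step. The activities, contribution values and goal weights below are invented for illustration; Tropos4AS itself uses goal analysis over the full model rather than this flat scoring.

```python
# Illustrative run-time selection of a mitigation activity: each activity
# contributes positively or negatively to quality goals; pick the one
# whose contributions best match the goals' current importance.
def select_mitigation(activities: dict, goal_weights: dict) -> str:
    def score(contribs: dict) -> float:
        return sum(goal_weights.get(g, 0.0) * c for g, c in contribs.items())
    return max(activities, key=lambda a: score(activities[a]))

# hypothetical alternatives for recovering from a component failure
activities = {
    "restart component": {"availability": +0.8, "performance": -0.2},
    "switch to backup":  {"availability": +0.6, "performance": +0.4},
}
weights = {"availability": 1.0, "performance": 0.5}
best = select_mitigation(activities, weights)
```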

2.5.3 Nòmos

Nòmos [Siena et al., 2012] is a modelling framework for the design and representation of legal knowledge in requirements engineering. It aims at supporting engineers in understanding and modelling the compliance of information systems at design time. In particular, it focuses on the problem of the variability and complexity of law, due to elements such as conditions, exceptions and derogations, which let different norms hold under different conditions. These elements define a variability space, intended as alternative ways to comply with the set of norms within the legal text. This trait is captured in Nòmos by differentiating the applicability of norms from their satisfiability, and by defining compliance on the basis of the two.

Specifically, a norm is defined as a 5-tuple (type, holder, counter-part, antecedent, consequent): the type of the norm determines its nature, i.e., whether it is a duty or a right; the holder is the actor in charge of complying with a duty, or entitled to exercise a right; the counter-part is the role whose interests are helped if the norm is satisfied; the antecedent is the condition to satisfy to make the norm applicable; and the consequent is the condition to satisfy in order to comply with the norm.

Consequents and antecedents are modelled in terms of situations. A situation is a partial state of the world – or state-of-affairs – expressed through a proposition. A situation can be true, false, or have an unknown truth value. We use the abbreviations ST, SF, SU for the truth values Satisfied True/False/Unknown, and AT, AF, AU for the truth values Applicable True/False/Unknown. If the situations make the antecedent true, the norm applies; if the situations make the consequent true, the norm is satisfied.

Situations are related to norms and to other situations by means of four basic relations. The activate relation, from a situation to a norm, means that if the situation is satisfied the norm is applicable; vice versa, the block relation makes the norm not applicable. The satisfy relation, from a situation to a norm or to another situation, means that if the situation is satisfied the norm or the target situation is satisfied; vice versa, the break relation makes it not satisfied. Additionally, three shortcut relations have been defined between norms, to model the cases where one norm derogates, endorses or implies another one. Depending on its applicability and satisfiability values, a norm may be complied with, violated, tolerated or inconclusive. Reasoning algorithms allow exploring the status of norms to search for global compliance.
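The combination of applicability and satisfiability into a norm status can be sketched as below. The status names follow the text, but the exact mapping (in particular how unknown values and non-applicable norms are classified) is our assumption, not a reproduction of the Nòmos semantics.

```python
# Hedged sketch: derive a norm's status from its applicability and
# satisfiability. Truth values are encoded as True, False or None
# (unknown), mirroring the AT/AF/AU and ST/SF/SU values in the text.
def norm_status(applicable, satisfied) -> str:
    """Assumed mapping: unknown applicability (or unknown satisfaction of
    an applicable norm) is inconclusive; a non-applicable norm is
    tolerated; an applicable norm is complied with or violated."""
    if applicable is None or (applicable and satisfied is None):
        return "inconclusive"
    if not applicable:
        return "tolerated"
    return "complied with" if satisfied else "violated"
```

A global-compliance check would then require, for instance, that no norm in the model evaluates to "violated".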

2.5.4 KAOS Obstacle analysis

KAOS [Lamsweerde and Letier, 2000] is a framework for goal-oriented requirements modelling and analysis. The idea behind KAOS is that models of requirements, represented through the concept of goal, can be formally checked for interesting properties, to derive a consistent requirements specification. Modelling proceeds top down, by identifying the top-level goals and then refining them through And-/Or-decompositions into subgoals. Decomposition continues until completeness is reached, meaning that each leaf goal is either an expectation, a domain property or a requirement.

For risk modelling and management, KAOS complements goal analysis with obstacle analysis. Obstacle analysis consists in analysing the adverse conditions that may prevent a goal from being achieved. Possible obstacles are searched for every leaf goal and attached to it through a negative contribution relation. The root obstacle (which is basically the negation of the leaf goal) is then And- or Or-decomposed into more fine-grained obstacles. Once the adverse conditions are identified, it is possible to correct them by revising the model.

[Cailliau and van Lamsweerde, 2012] present an extension of KAOS to assess requirements risk using goal-based obstacle analysis techniques. The framework aims at achieving three objectives: (i) have a formal representation of the whole goal model; (ii) rely on strictly measurable variables; (iii) take into account partial or probabilistic values for goal satisfaction. Methodologically, the framework requires precisely modelling the goals the system has to achieve. Subsequently, as many potentially adverse conditions as possible are identified and modelled; then risk assessment is performed to reduce the likelihood or the severity of each obstacle; finally, likely and critical obstacles are resolved by integrating appropriate countermeasures into the goal model (i.e., a revised goal model is produced).


Risk assessment using KAOS strongly relies on formal, quantitative analysis of goal satisfiability. Adverse conditions are evaluated with respect to their probability of happening and to their potential severity. Both values are measures associated with obstacle instances. A formal, probabilistic model describes how likelihood and severity are propagated through the goal/obstacle tree up to its root. In this scenario, And-/Or-decompositions play an important role in defining the capability of the goal tree structure to mitigate a risk before the root is compromised.
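The role of And-/Or-decompositions in propagating failure likelihood can be sketched with elementary probability, under an independence assumption. This is our own simplified illustration, not the propagation model of [Cailliau and van Lamsweerde, 2012]; the probabilities are invented.

```python
# Sketch of likelihood propagation through a goal tree: an And-decomposed
# goal fails as soon as any child fails; an Or-decomposed goal fails only
# if all alternatives fail. Child failures are assumed independent.
def and_failure(child_failure_probs) -> float:
    """P(failure of an And-decomposed goal) = 1 - P(all children succeed)."""
    p_all_ok = 1.0
    for p in child_failure_probs:
        p_all_ok *= (1.0 - p)
    return 1.0 - p_all_ok

def or_failure(child_failure_probs) -> float:
    """P(failure of an Or-decomposed goal) = P(every alternative fails)."""
    p_all_fail = 1.0
    for p in child_failure_probs:
        p_all_fail *= p
    return p_all_fail

# usage: a root goal And-decomposed into a leaf (failure prob. 0.1)
# and an Or-decomposed subgoal with two alternatives (0.2 and 0.5)
root = and_failure([0.1, or_failure([0.2, 0.5])])
```

This makes the mitigation effect of Or-decompositions visible: adding an alternative multiplies the subgoal's failure probability down, lowering the risk reaching the root.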

2.5.5 EKD Methodology

Enterprise Knowledge Development (EKD) is a methodology for mapping organizational processes that can serve as a basis for identifying risk management tasks [Rolland et al., 1999]. EKD was developed in the FP4 ELEKTRA project supported by the European Commission as a Business Process Management methodology for large organizations (http://crinfo.univ-paris1.fr/PROJETS/elektra.html). It provides a controlled way of analysing and documenting an enterprise, its objectives and its support systems. The approach represents an integrated set of techniques and associated tools for mapping business processes and re-engineering information systems. At the centre of the approach is the development of enterprise knowledge models pertaining to the situations being examined. Such models are defined using appropriate modelling techniques and tools. During the process of developing these models, the participating parties engage in tasks that involve deliberation and reasoning. The objective is to provide a clear, unambiguous picture of how enterprise processes function currently, in an "as is" model, or in a modified "should be" model.

Change is driven by the implementation of five key activities:

• Introduce – what are the new actions that need to be added

• Improve – what are the actions that already exist and need to be improved

• Extend – what are the actions that already exist and need to be extended

• Cease – what are the actions that currently exist and need to be ceased

• Maintain – what are the actions that exist and need to be maintained

The EKD approach considers the concept of an ‘enterprise process’ as a composite of four key enterprise components: (a) the roles that are played by enterprise actors in order to meet the process goals; (b) the activities involved in each role; (c) the objects that are involved, together with their evolution from creation to extinction (within the context of the enterprise process); and (d) the rules that determine the process components.

2.5.5.1 EKD Diagrams

Through enterprise actor-role modelling, EKD encourages the identification of the key operational components, which can be measured (activity duration, actor skills, resource costing, etc.). Such measurable components can then be subjected to ‘what-if’ scenarios in order to evaluate alternative designs for the operation of an enterprise. A high-level view of the association between actors and their different roles is supported through the Actor-Role Diagram (ARD). The actor-role diagram depicts the actors of the enterprise and the roles that they play. For each role involved in the model, information is given about the responsibilities that are assigned to the role, the dependencies that exist between roles, the goals that the role must satisfy, and the actors which play a particular role. Dependencies can be defined for goals, for authorisations, for coordination (expressing the need for one role to wait for the completion of another role’s responsibilities), and for resources (illustrating the dependency of a role on the outputs of another role).

Dependency relationships are represented by arrows accompanied by the initial of the dependency type: A for authorization dependency, R for resource dependency and C for activity coordination dependency (the “ARC” dependencies). The dependency arrow is directed from the provider role towards the requester role. The notation for actor-role diagrams is shown on an example in Figure 2.
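An actor-role diagram of this kind can be encoded in a few lines. The structure below is our own illustrative encoding, with the role names and goal taken from the example in Figure 2; the class and field names are invented.

```python
# Illustrative encoding of an EKD Actor-Role Diagram: roles, the actors
# playing them, and A/R/C dependencies directed from provider to requester.
from dataclasses import dataclass, field

DEP_TYPES = {"A": "authorization", "R": "resource", "C": "coordination"}

@dataclass
class Role:
    name: str
    goals: list = field(default_factory=list)
    played_by: list = field(default_factory=list)   # actors playing the role

@dataclass
class Dependency:
    provider: Role      # the arrow starts at the provider role
    requester: Role     # and points towards the requester role
    dep_type: str       # "A", "R" or "C"

# roles and actors from the Figure 2 example
support = Role("Supervise Local Support",
               goals=["supervise the functions of the local support"],
               played_by=["Support Mgr."])
service = Role("Service Customer",
               goals=["provide interaction point between customer and ABC"],
               played_by=["Field Engr."])
# the field engineer needs the supervisor's authorization to change the application
dep = Dependency(provider=support, requester=service, dep_type="A")
```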

Figure 2: An example actor-role diagram with “authorization” dependencies

In the Role-Activity Diagram, the concept of a role is used to group together the details of the activities being carried out within the enterprise, including information on their order of execution and their interdependencies. The activities that a role performs are represented as sequences of nodes (drawn as black squares) or as parallel executions of more than one activity. In Figure 3, in the left part, activities A2 and A3 are executed in parallel after activity A1; in the right part, one of the two activities is selected according to the answer to the question posed.

Dependencies identified at the actor-role level are represented as interactions at the role-activity level; indeed, interdependencies between roles are translated into specific activities at this level, which constitute the interface between the roles.

Figure 3: Role-Activity diagrams and activity interaction between Role 1 and Role 2


2.5.5.2 Organizational Patterns

The software development community is a useful source of examples of pattern use, in particular among those advocating and practicing object-oriented approaches and re-use. What these efforts have in common is the attempt to exploit knowledge about best practice in some domain. Best practice knowledge is captured in ‘patterns’ that are subsequently used as the starting point in programming, design, or analysis endeavours. Patterns, therefore, are not invented; rather, they are discovered within a particular domain with the purpose of being useful in many similar situations. A pattern is useful if it can be used in different contexts.

A pattern should be made explicit and precise so that it can be used time and time again. A pattern is explicit and precise if:

• It defines the problem (e.g. ‘we want to improve yield in a manufacturing process’) together with the forces that influence the problem and that must be resolved (e.g. ‘managers have no sense for data variability’, ‘collaboration of production personnel must be achieved’, etc.). Forces refer to any goals and constraints (synergistic or conflicting) that characterize the problem.

• It defines a concrete solution (e.g. ‘how basic problem investigations should be done’). The solution represents a resolution of all the forces characterizing the problem.

• It defines its context (e.g. ‘the pattern makes sense in a situation that involves the initial interaction between the statistical consultant and his customer’). A context refers to a recurring set of situations in which the pattern applies.
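A pattern made explicit and precise along these three dimensions can be captured in a simple template structure. The sketch below uses the manufacturing-yield example from the bullets; the field and class names are our own, not a normative EKD artefact.

```python
# Illustrative template for an explicit, precise pattern: problem,
# forces, solution and context, plus links to related patterns.
from dataclasses import dataclass, field

@dataclass
class Pattern:
    problem: str            # what the pattern addresses
    forces: list            # goals/constraints that must be resolved
    solution: str           # a resolution of all the forces
    context: str            # recurring situations where the pattern applies
    related: list = field(default_factory=list)  # links to other patterns

yield_pattern = Pattern(
    problem="improve yield in a manufacturing process",
    forces=["managers have no sense for data variability",
            "collaboration of production personnel must be achieved"],
    solution="how basic problem investigations should be done",
    context="initial interaction between the statistical consultant "
            "and his customer",
)
```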

Following a pattern template makes it possible to visualize and identify a pattern, so that it can be interpreted equally well by all who might share it. The structure of the pattern includes information describing the pattern, examples of use, and information describing the relationships that the pattern has with other patterns. Table 1 lists various quality criteria for the usage of patterns.

Anti-patterns describe either how to go from a problem to a bad solution, how to go from a bad solution to a good solution, or something that looks like a good idea but backfires badly when applied. Recognizing "bad" business practice may provide knowledge or impetus for identifying and describing the relevant good practice.

In order to make EKD knowledge easy to organize and access for the benefit of the organization, one needs to systematize and structure the knowledge and experience gained in different parts of the company. In using patterns, we advocate an approach to describing repeatable solutions to recognizable problems. The pattern usage framework must therefore make the distinction between product (or artefact) patterns and process patterns, and include an indexing schema for accessing them. The patterns typology aims to distinguish between the way to solve a problem and the elements that will be used for the solution. The template presented above represents the usage perspective, while the EKD modelling presents the knowledge perspective.
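The template elements described above (problem, forces, solution, context, relationships to other patterns) can be sketched as a simple data structure. The field names and example values below are illustrative assumptions, not prescribed by EKD:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Pattern:
    """A pattern made explicit and precise: problem, forces, solution and context."""
    name: str
    problem: str        # e.g. "we want to improve yield in a manufacturing process"
    forces: List[str]   # goals and constraints (synergistic or conflicting) to resolve
    solution: str       # a concrete resolution of all the forces
    context: str        # the recurring set of situations in which the pattern applies
    related: List[str] = field(default_factory=list)  # links to other patterns

# Hypothetical instance following the examples in the text:
p = Pattern(
    name="basic-problem-investigation",
    problem="we want to improve yield in a manufacturing process",
    forces=["managers have no sense for data variability",
            "collaboration of production personnel must be achieved"],
    solution="how basic problem investigations should be done",
    context="initial interaction between the statistical consultant and his customer",
)
```

Indexing the patterns by name or by problem keywords, as suggested by the indexing schema above, then becomes a straightforward dictionary lookup.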

Table 1: Quality criteria for pattern usage

Usefulness: Degree of triviality (the degree to which the pattern addresses a problem which is of little importance because the problem or solution is obvious)
• High: The pattern is concerned with issues that are, or most likely will be, of concern to other parts of the company.
• Medium: While the pattern deals with a pertinent problem, the solution is already well known.
• Low: The pattern is concerned with a problem which does not warrant the creation of a pattern, since it is so trivial that the proposed solution is obvious to domain experts.

Usefulness: Grade of implementability (extent to which the pattern is thought to be practical and implementable; is the change compatible with business strategy, and have trade-offs been taken into account)
• High: The pattern is useful in that it prescribes practical, easy to understand and implement solutions.
• Medium: The pattern may be of some use despite some practical problems in implementation and some difficulty in understanding the solution.
• Low: The pattern is not usable. The solution is impractical and difficult to understand. The pattern only proposes "paper-based" change rather than real change.

Usefulness: Degree of confidentiality
• High: The pattern does not disclose any confidential business information.
• Medium: Some information may be able to be used by other projects.
• Low: The pattern discloses sensitive project information.

Quality: Degree of complexity (the number of factors and their relationships)
• High: The pattern addresses only a few manageable main concepts and ideas.
• Medium: The pattern is complex, but may still be useful in that the complexity is needed.
• Low: The large number of factors that affect the implementation of the solution minimises the chances that the solution can be implemented.

Quality: Addition of value (the local and global benefits accruing to the business with the implementation)
• High: The consequences of a successful implementation are of great value to the project directly affected as well as to other projects.
• Medium: The local and global benefits are unclear, difficult to determine, or marginal.
• Low: There are no local or global benefits, or there is a conflict between these so that in total no value is added.

Quality: Level of genericity (abstraction level of the problem that the pattern addresses)
• High: The pattern addresses a problem that is general enough for all the company.
• Medium: The pattern addresses a problem that applies only to part of the company.
• Low: The pattern addresses a problem that is only relevant to the project in which it was discovered.

Quality: Grade of understandability (visualisable and identifiable)
• High: The pattern is easy for decision-makers, domain experts and those to be affected to comprehend.
• Medium: The pattern is only partially understandable to decision-makers, domain experts and those to be affected.
• Low: The pattern is incomprehensible to stakeholders.

Quality: External compatibility (the extent to which the pattern could be used by other companies)
• High: The pattern has taken into account differences in national and organisational cultures and ways of working amongst identified future external users of the pattern.
• Medium: The pattern has partially taken into account differences in national and organisational cultures and ways of working amongst identified future external users of the pattern.
• Low: The pattern does not take into account differences in national and organisational cultures and ways of working amongst identified future external users of the pattern.

Cost: Level of experience in their use
• High: The pattern has been implemented within the company.
• Medium: The pattern has been partially or sporadically used.
• Low: The pattern has never been implemented.

Cost: Economic feasibility of the proposed solutions
• High: The proposed solution is relatively easy to implement. Organisational support exists in terms of sufficient resources as well as managerial support. The solution is politically and socially acceptable.
• Medium: The proposed solution is difficult but feasible to implement. Organisational support is lukewarm. Resources are available but may not be sufficient. There may exist political and social difficulties in making the pattern feasible.
• Low: The proposed solution is not feasible. Organisational support will be difficult to obtain. The resources will not be made available. The existing difficult social and/or political climate would make an implementation impossible.


2.6 Risk analysis and evaluation techniques

2.6.1 CORAS

CORAS [Lund et al, 2011] is a model-based approach for performing security risk analysis and assessment. CORAS builds on traditional security analysis techniques such as the structured brainstorming technique HazOp, fault tree analysis (FTA) and failure mode and effects analysis (FMEA).

CORAS comprises three components:

• A risk modelling language, which includes both the graphical syntax of the CORAS diagrams and a textual syntax and semantics.

• A method, a step-by-step description of the security analysis process, with a guideline for constructing the CORAS diagrams.

• A tool for documenting, maintaining and reporting risk analysis results.

The CORAS modelling language represents the key concepts of the methodology and is based on the UML standard modelling language. Figure 4 depicts the elements of the CORAS language. There are two main types of concepts: risks and treatments. Intuitively, risk concepts concern the nature of the identified risks, as well as the assets they impact, while treatment concepts concern the countermeasures to treat risks. Overall, the main concepts are:

Target of evaluation: The information system or organisation that is the subject of the security risk analysis. Often referred to as the target.

Party: An entity related to the target of evaluation, on whose behalf the security analysis is carried out.

Asset: Something to which a party of the target directly assigns value and, hence, for which this party requires protection.

Unwanted incident: An event that may harm or reduce the value of assets.

Risk: The chance of an unwanted incident occurring.

Likelihood: The frequency or probability for an unwanted incident to occur.

Consequence: The impact of an unwanted incident on an asset in terms of reduction or loss of asset value.

Risk value: The level or value of a risk as derived from the likelihood and the consequence of an unwanted incident.

Treatment: The selection and implementation of appropriate options for dealing with risk. A means that is directed towards one or more risks with the objective of reducing risk value.
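As a sketch of how a risk value can be derived from the likelihood and consequence of an unwanted incident, the following uses hypothetical ordinal scales and thresholds; CORAS leaves the concrete scales and risk function to the analysis, so none of these values come from the methodology itself:

```python
# Hypothetical ordinal scales; CORAS does not prescribe these particular values.
LIKELIHOOD = ["rare", "unlikely", "possible", "likely", "certain"]
CONSEQUENCE = ["insignificant", "minor", "moderate", "major", "catastrophic"]

def risk_value(likelihood: str, consequence: str) -> str:
    """Derive a risk value by combining likelihood and consequence scores."""
    score = (LIKELIHOOD.index(likelihood) + 1) * (CONSEQUENCE.index(consequence) + 1)
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"
```

For example, an unwanted incident judged "likely" with "major" consequence would be classed as a high risk under these assumed thresholds.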

2.6.1.1 CORAS Diagrams

CORAS diagrams depict risks and treatments from various perspectives. There are five types of diagrams:

• The Assets Overview Diagram depicts: (i) the relevant parties; (ii) the relevant assets; (iii) the asset chains, i.e., how harm to one asset may affect others; and (iv) which asset is valued by which party.


• The Threat Diagram depicts the threats to the assets, i.e., the sequences of events that affect assets if a threat materialises. A threat can be of human or non-human origin; a human threat can be deliberate (such as a malicious attack) or accidental. Cause relations can have a likelihood. Through cause relations, threats are linked to vulnerabilities, which in turn may lead to threat scenarios or unwanted incidents. These are linked through impact relations to the assets on which they have consequences.

• The Risk Overview Diagram summarises Threat Diagrams, reporting only the threats, the risk scenarios and the impacted assets.

• The Treatment Diagram depicts the countermeasures intended to mitigate the risky scenarios. This is achieved using the treatment element, which is linked through cure relations to threat scenarios or unwanted incidents.

• The Treatment Overview Diagram summarises the proposed treatments.

Figure 4: The meta-model of the CORAS framework

2.6.1.2 CORAS guidelines

The CORAS guidelines encompass five phases:

• In the Context Establishment phase, through meetings and interviews with clients, the overall organisational setting is scoped, as well as the threat scenarios and unwanted incidents as perceived by the clients.

• In the Risk Identification phase, through meetings and interviews, as many new threat scenarios and unwanted incidents as possible are identified.

• In the Risk Estimation phase, the consequences and likelihood values for each of the identified unwanted incidents are estimated.

• In the Risk Evaluation phase, possible countermeasures (treatments) are identified with the help of the clients.

• Finally, in the Risk Treatment phase, the previously identified treatments are refined and structured with respect to the overall scenario.


2.6.2 Risk analysis and evaluation in past EU projects

Several past EU projects present outcomes relevant from the viewpoint of risk analysis in OSS adoption. OSS quality metrics are provided by the QualOSS Quality Model, which provides about 150 quality metrics assessing the robustness and evolvability of OSS, and by the SQO-OSS Quality Model, which provides various metrics and an automated software evaluation system to evaluate both the product and the community.

The Navica Open Source Maturity Model (OSMM) and the QualiPSo Open Source Maturity Model (OMM; see also [SLR29] in the SLR) are processes to evaluate OSS maturity and are, as such, clearly relevant, analysing a specific source of OSS risks. The approach proposed in Qualification and Selection of Open Source software (QSOS) selects and compares open source software, evaluating functional coverage, risks for the user and risks for the service provider, while the Open Business Readiness Rating (OpenBRR) is a method for rating open source software through the community. Several of the measures and techniques used therein could potentially be reused or adapted in the RISCOSS project.

For more details and references please refer to the RISCOSS deliverable D4.1.

2.7 Discussion about limits of the current approaches in OSS adoption risk management and about research opportunities

Our focus in this overview is adoption, intended as the integration of OSS components by developers and companies for the realisation of closed and open source software products. The works we have analysed give a broad coverage of the state of the art in this area. However, we also recognized several limits, which open up research opportunities.

Various publications analyse, on the one side, the security risks of OSS and the challenges and failures in OSS deployment and, on the other side, risks and processes for COTS adoption, while only a few works address our specific topic of interest.

The majority of recent papers on risks in OSS and COTS adoption analyse the as-is situation, applying literature studies and empirical studies mainly based on surveys and interviews, and sometimes (for OSS) on a quantitative analysis of source code, repository metadata and bug trackers. For example, the complete and detailed joint work by Conradi et al. [SLR7][SLR12][SLR18][SLR47] classifies risks and mitigation activities on the basis of literature and various surveys, but relies, like other works, mostly on subjective perceptions, performing no evaluation of the effectiveness of the mentioned mitigation activities. Experiments showing causal relations between failures, risks, measures and mitigation activities with statistical evidence have been performed only for security and business aspects.

Some of the analysed approaches propose quantitative measures and analyse their effectiveness (e.g. [SLR4][SLR6][SLR15] in the SLR for API metrics and code changes, [SLR14] for code executability, [SLR25] for business values). [SLR17] proposes a reliability model combining qualitative and quantitative metrics, but does not consider OSS-specific community and repository measures. In general, few works that propose metrics to support OSS selection take into account the organisation that adopts the component, including the company's size and structure, its internal processes and the quality requirements for its products.


Our aim is to define a process that improves the selection, management and maintenance of OSS components. We will review existing COTS quality assessment processes, including the ones enacted in practice by the project partner companies. With a particular focus on the differences between OSS and closed source components identified in the SLR, we try to understand the causal relationships between risks and failures, risk metrics, and risk mitigation. To propose such measures we investigate two directions: (i) evaluating the OSS project community ecosystem and the adopting company, by organisational modelling and analysis, and (ii) evaluating the data available in OSS project repositories, such as code repositories, bug trackers, and mailing lists, applying measurements based on statistical analysis. Techniques and tools for this analysis, prioritisation and multi-criteria decision making will be presented in the next section.


3 Initial requirements for OSS adoption risk representation and analysis

The description of the state of the art has shown relevant approaches and techniques that promise to contribute to the activity of risk analysis. To support risk and cost management in open source software adoption, new techniques, or suitable combinations of existing techniques, are needed.

Here we review the literature on approaches to decision making, organisational representation and reasoning, and risk representation and analysis via statistical approaches, search-based approaches and more formal approaches, such as SAT/SMT techniques, in order to propose a set of requirements for the project.

We review approaches that consider the concepts representing or related to risk in generic software engineering methodologies and in goal-oriented approaches (in particular in i*/Tropos [Asnar et al., 2011], in KAOS [Lamsweerde and Letier, 2000] and in EKD [Rolland et al, 1999]). These concepts have also been specified with a view to performing an analysis of the models that exploit them. We then review literature on statistical risk analysis, and on search-based and formal approaches to risk/cost optimisation and preference analysis.

The objective is to present a portfolio of techniques that are suited to the particular domain of risks in OSS adoption, with the intent of promoting an integration of these techniques and an advancement in some of them to deal with the challenges posed in the field of the RISCOSS project.

3.1 Representation of ecosystems and risks for analysis purposes

We present the requirements for the identification and modelling of the domain, the organisations, the intentional forces, and the value chains and business processes involved, as a prerequisite for risk analysis and the development of proper mitigation techniques.

3.1.1 Representation in Goal-oriented methodologies for analysis purposes

Goal orientation is a modelling paradigm which allows us to focus on the desired objectives, analysing the possible strategies to achieve them and exploring alternative tactics. Organisational goal modelling approaches capture the objectives and strategic dependencies within an organisation. With respect to risk analysis, goal models give the possibility to highlight the possible points of failure at the strategic level. This enables alternative solutions to be explored without the need to modify detailed architectural designs.

3.1.1.1 i* / Tropos Methodology

i* [Yu, 1995] is a goal-oriented modelling framework tailored to model an organizational domain in terms of the relevant actors, their goals, and the strategic dependencies among them.

The i* language relies on the concept of Actor, which is an active element of the model and typically represents domain stakeholders. Actors can be of type Role, describing an abstract stakeholder type; Agent, describing a concrete instance that plays one or more roles; or Position, used to describe an aggregate set of roles. Actors have goals, which represent states of affairs desired by the actors; when goals are achieved they are said to be satisfied. Soft-goals are goals that are not clear-cut, whose satisfaction cannot be crisply stated. Tasks operationalize goals by representing the way they can be satisfied. Resources are elements of various nature whose presence is needed for the satisfaction of goals. The set of goals, soft-goals, tasks and resources belonging to an actor forms the actor's rationale. An actor has the ability to achieve a goal if (i) the actor has in its rationale another element or set of elements whose purpose is the achievement of the goal; or (ii) the actor can delegate the achievement of the goal to another actor, the dependee. When an actor delegates the satisfaction of a goal to another actor, a dependency relation is used to link the two actors.

In i*, the domain is modelled along two perspectives: the strategic rationale of the actors, i.e., a description of the intentional behaviour of domain stakeholders in terms of their goals, tasks, preferences and quality aspects (represented as soft-goals); and the strategic dependencies among actors, i.e., the system-wide strategic model based on the relationship between the depender, the actor who, in a given organisational setting, "wants" something, and the dependee, the actor who has the ability to do something that contributes to the achievement of the depender's original goals.

Tropos [Bresciani et al., 2004] is an agent-oriented, goal-driven software engineering methodology that relies on the i* modelling framework, defining a process to design distributed socio-technical systems. Tropos adopts a requirements-driven approach to software development, recognizing a pivotal role for the modelling of domain stakeholders and for the analysis of their goals before generating a design for the system-to-be. System design then results in specifying software agents who have their own goals and capabilities that are intended to support the fulfilment of stakeholder goals.

Tropos provides guidelines to cover the whole software development process and is structured in five main phases, namely: the Early Requirements Analysis phase, which focuses on understanding the existing organizational setting where the system-to-be will be introduced; the Late Requirements Analysis phase, which deals with the analysis of the system-to-be; the Architectural Design phase, which defines the system's global architecture in terms of subsystems; the Detailed Design phase, which specifies the system agents at the micro level; and the Implementation phase, which concerns object- or agent-oriented code generation according to the detailed design specifications.

3.1.2 Business processes

OSS adoption is not simply a technical solution; it implies a deep integration of the open source model with the enterprise environment and its business processes. Potential threats to the efficiency and effectiveness of business processes originate from the open source model and have to be taken into consideration.

Failure modes and effects analysis (FMEA) is a standardised technique [FMEA, 2006] to analyse potential failures and their effects, with the purpose of demonstrating that no single failure will cause an undesired event. FMEA has its roots in the analysis of electronic circuits for military systems. The FMEA technique is carried out by a team of analysts who work together in an iterative process. When potential points of failure are identified, corrective measures can be identified to strengthen the design of the system, by removing the point of failure, by mitigating its severity, or by providing alternative execution paths. The FMEA approach also documents the rationale for a particular manufacturing process. FMEA provides an organized, critical analysis of potential failure modes of the system being defined and identifies the associated causes. It uses occurrence and detection probabilities in conjunction with a severity criterion to develop a risk priority number (RPN) for ranking corrective action considerations.
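The RPN computation can be sketched as follows. The failure modes and the 1-10 ratings below are hypothetical examples chosen for illustration, not data from any actual FMEA session:

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: severity x occurrence x detection, each rated 1-10."""
    return severity * occurrence * detection

# Hypothetical failure modes for an OSS component:
# (name, severity, occurrence probability, detection difficulty)
failure_modes = [
    ("unpatched security vulnerability", 9, 4, 6),
    ("upstream project abandoned", 7, 3, 2),
    ("breaking API change on update", 5, 6, 3),
]

# Rank corrective-action candidates by descending RPN.
ranked = sorted(failure_modes, key=lambda fm: rpn(*fm[1:]), reverse=True)
```

Under these assumed ratings the security vulnerability (RPN 216) would be addressed first, ahead of the API change (90) and the abandonment risk (42).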

Particularly interesting is the possibility of integrating workflows with organizational strategic models. [Bhuiyan et al., 2007] present an approach to manage business process risks by performing an integrated analysis of the organizational strategic model. This approach relies on i* strategic models to identify vulnerable actors, where vulnerability is expressed in terms of the number of dependencies on other actors. The vulnerabilities are then treated in the business process associated with each actor, by providing alternative execution paths to follow in the case of a dependency failure.
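A minimal sketch of the dependency-count vulnerability measure just described, over a hypothetical i* dependency list; the actor and goal names are invented for illustration and the threshold of two dependencies is an assumption:

```python
from collections import Counter

# Hypothetical i* dependencies: (depender, dependee, goal the depender relies on)
dependencies = [
    ("Integrator", "OSS Community", "timely bug fixes"),
    ("Integrator", "OSS Community", "security patches"),
    ("Integrator", "Service Provider", "support contract"),
    ("Service Provider", "OSS Community", "upstream releases"),
]

# An actor depending on many others is vulnerable: each outgoing dependency
# is a potential point of failure if the dependee does not deliver.
outgoing = Counter(depender for depender, _, _ in dependencies)
vulnerable = sorted(actor for actor, n in outgoing.items() if n >= 2)
```

Each actor flagged this way would then get alternative execution paths in its associated business process, as the approach suggests.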

3.1.3 Ontologies for representation of and reasoning on risks

An ontology is traditionally defined as a formal specification of a conceptualization [Gruber, 1993]. This definition refers back to the formal study of the nature of beings. In some sense, an ontology may be conceived as a view on the world (or part of it). We use ontologies to look at the world of risk, OSS, and organizations. In particular, we benefit from the adoption of a lower (i.e., domain-specific) ontology able to capture the relevant relations among concepts. Once the concepts necessary to describe the domain under analysis (such as organizational settings, risks and countermeasures) have been formally specified, the resulting ontology allows reasoning about risk at the class level. This opens the door to a different type of knowledge analysis, which allows us to infer new knowledge from available data.

Refer to Deliverable D1.1 for a detailed literature study on ontologies in the RISCOSS domain.

3.2 Data processing and analysis: statistical approaches

Operational risks are characterised by two statistical measures related to risk events: their frequency (or probability) and their severity (or impact) [Cruz, 2002]. A common approach to model frequency and severity is to apply parametric probability distribution functions. For severity, normal and lognormal distributions are often applied. Other distributions used to model severity are the Inverse Normal, Exponential, Weibull, Gamma, and Beta distributions; for details on these distributions see [Kenett and Zacks, 1998]. To model the frequency of specific operational risk events, on the other hand, two main classes are used: ordinary (Poisson, Geometric, Binomial) and zero-truncated distributions. The most common goodness-of-fit test for determining whether a certain distribution is appropriate for modelling the frequency of events in a specific data set is the Chi-square test. The formal test for the choice of a severity distribution is instead the Kolmogorov-Smirnov test and related measures of interest (see [Kenett and Zacks, 1998]). Having estimated the severity and frequency distributions separately, in operational risk measurement we need to combine them into one aggregated loss distribution that allows us to predict operational losses with an appropriate degree of confidence. It is usually assumed that the random variables that describe severity and frequency are stochastically independent. In most cases, the distribution function of the aggregated losses cannot be expressed in an analytically explicit form. One popular practical solution is to apply a Monte Carlo simulation (see Figure 5).


Figure 5: schema for a Monte Carlo simulation execution
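The Monte Carlo convolution can be illustrated as follows, assuming (hypothetically) a Poisson frequency and a lognormal severity; the parameter values and the stdlib-only Poisson sampler are choices made for this sketch, not part of the methodology:

```python
import math
import random

def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication-of-uniforms Poisson sampler (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p < threshold:
            return k
        k += 1

def aggregate_loss_var(lam, mu, sigma, n_years=50_000, q=0.999, seed=1):
    """Simulate the aggregated annual loss distribution and return its q-quantile (VaR)."""
    rng = random.Random(seed)
    annual_losses = []
    for _ in range(n_years):
        n_events = poisson_sample(lam, rng)  # frequency draw for one simulated year
        # severity draw for each event; yearly loss is their sum
        annual_losses.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n_events)))
    annual_losses.sort()
    return annual_losses[min(int(q * n_years), n_years - 1)]
```

Because frequency and severity are sampled independently, the simulation realises the stochastic-independence assumption stated above; the 99.9th percentile of the simulated annual losses is then read off as the VaR.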

On the basis of the convolution obtained through Monte Carlo simulation, operational risk measurements can be obtained as summary measures, such as the 99.9th percentile of the annual loss distribution, also called the Value at Risk (VaR). In operational risk, the distribution of a financial loss is obtained by compounding the frequency distribution with the severity distribution. These considerations motivate using the geometric mean of risk measures when aggregating risks over different units. The use of the geometric mean is a necessary condition for preserving stochastic dominance when aggregating distribution functions.
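A sketch of geometric-mean aggregation of per-unit risk measures; the unit VaR figures are invented for illustration:

```python
import math

def geometric_mean(values):
    """Geometric mean of positive risk measures, computed in log space for stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical VaR figures (in kEUR) for three organisational units:
unit_vars = [120.0, 80.0, 200.0]
overall = geometric_mean(unit_vars)
```

By the AM-GM inequality the aggregate is never larger than the arithmetic mean, so this aggregation is conservative towards the lower side while, as noted above, preserving stochastic dominance.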

Cause-and-effect models have also been used extensively in operational risk modelling. Specifically, Bayesian methods, including Bayesian Networks, have been proposed for modelling the linkage between events and their probabilities. For detailed discussions on these methods see [Giudici and Billota, 2004] [Cornalba and Giudici, 2004] [Bonafede and Giudici, 2007] [Fenton and Neil, 2007] [Ben Gal, 2007] [Dalla Valle et al, 2008] [Figini et al. 2010] [Kenett and Raanan, 2012]. The following section presents a short overview of traditional operational risk measurement techniques.

3.2.1 Operational risk measurement techniques

In order to be able to assess and manage risk, it must be measured: it is impossible to manage anything that is not measured, and risk is a prime example. In this section we introduce three operational risk measurement techniques: the Loss Distribution Approach (LDA), scenario analysis and Balanced Scorecards.

3.2.1.1 The Loss Distribution Approach

The Loss Distribution Approach (LDA) is a measurement technique that is particularly suitable for banks and other financial institutions. It aims at calculating the Value at Risk (VaR), a monetary value that these institutions need in order to assign adequate capital, as far as their regulators are concerned, against operational risk. This expected value may be of lesser interest for businesses that have a different approach to risk, for example if they view small losses, below a periodically changeable limit, as either negligible or part of the cost of doing business. On the other hand, these businesses insure themselves against losses that surpass another, dynamically changed amount, and consequently implement mitigation strategies to handle only losses that fall between these two bounds. This optional mitigation strategy is not available to banks and many other financial institutions, for they function, in effect, as their own insurers and therefore must have a more precise knowledge of the risks, not just some bounds and frequencies. As an example of this type of risk management behaviour one may look at supermarkets and large food sellers in general, which have become accustomed, albeit unwillingly, to losses stemming from employees' theft, a definite operational risk. Many consider this theft-produced loss a part of doing business as long as it does not rise above a certain level, determined individually by each chain or food store, and take out a specific policy with an insurance company against larger thefts.

LDA, which is used extensively in calculating the capital requirements a financial institution has to meet to cover credit risks, is a statistically based method that estimates the two distributions involved with risk: the occurrence frequency and the loss amount (severity). From these two distributions, the distribution of the VaR may be computed. For financial institutions, the VaR has to be calculated for each business line (Basel, 2006), and then a total VaR is calculated by summing the individual business-line VaRs multiplied by their weight in the bank's outstanding credits. While this measurement method is complex to implement, requires extensive databases (some of them external to the bank) and is computationally intensive, there are a number of approaches for financial institutions to calculate it. The effort and investments involved may be worthwhile only for large banks, since it can lead to a significantly smaller capital allocation for operational risk, thus freeing a highly valuable resource for the bank.

For operational risk in other types of businesses, such a very fine breakdown of events and their consequences may not be required, for a number of reasons. One, operational risk is seldom, if ever, related to a single business line. Two, operational risk events are frequently the result of more than one causing factor being in the wrong range, and thus attributing the risk to one of them, or distributing it among them, will be highly imprecise, to say the least. Three, the costs of implementing such a measurement system may prove prohibitive for a business that is capable of getting insurance against these losses for a small fraction of that cost.
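The two-bound strategy described earlier (negligible losses below one limit, insured losses above another, internal mitigation in between) can be sketched with hypothetical thresholds; the EUR figures are invented:

```python
def loss_band(loss: float, negligible_limit: float, insurance_threshold: float) -> str:
    """Classify a loss under the two-bound operational risk strategy."""
    if loss <= negligible_limit:
        return "cost of doing business"
    if loss < insurance_threshold:
        return "mitigate internally"
    return "covered by insurance"

# Hypothetical, periodically revised bounds (in EUR):
NEGLIGIBLE, INSURED_ABOVE = 1_000.0, 100_000.0
```

Only losses in the middle band trigger internal mitigation, which is precisely the strategy unavailable to banks acting as their own insurers.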

3.2.1.2 Scenarios

Scenarios are used in many areas where the prospects of having accurate predictions are slim or where there are no analytical tools available to produce such predictions at all. They are frequently used for strategic planning in order to discover, as realistically as feasible, what would be a suitable reaction by the business to a wide range of possible developments of the many variables that affect the business, in various combinations. Scenarios range from an extension of current reality into the foreseeable future to extreme changes in the business's environment, status, capabilities and associations. Scenarios are used in operational risk measurement in a number of cases. The first case involves an organization that wishes to engage in Operational Risk Management, but lacks the requisite risk event repository from which to calculate – or even simply summarize – the results of the various risks. That is the most usual case, and it is frequently used because it takes a long time from the initiation of a risk management activity until the organization has a workable repository with enough materialized risk events. Thus, organizations use the scenario technique in order to shorten the time to the implementation of a risk management approach with the proper mitigation strategies. The second case involves a significant change in the environment that the business operates in. Usually it is a change in the external environment: new regulatory demands, a radically changed economic environment, new technologies being brought rapidly to bear on the economic segment the business operates in, and so on. Occasionally, it may be a drastic re-organization of the business, such as a merger of different units into a single one, a merger with another business, or an acquisition of a business and the attempt to assimilate it successfully.

The scenarios technique involves a team, familiar with the business processes being studied, devising possible business scenarios – and trying to see what the reaction of the business might be, and what might go wrong. Doing this systematically, step by step, and covering all possible areas (technology, people, processes, etc.) that may be affected by the scenario, results in a list of potential risk events that are latent within the business process under study. This method is then applied to every business process used in the business until a complete list of latent risk events is compiled. This list is then analysed, categorised and stored as a virtual risk event repository. Then, a measure may be computed for variables that are of interest, including the Value at Risk (VaR) involved with each risk event. If some data is available that describes the frequency of executing a particular business process, estimates of expected losses can be computed. Mitigation strategies are then devised for each risk event, and the implementation of Operational Risk Management continues from this point onward. The benefits of this technique are:

1. It is not dependent on an existing repository of risk events.

2. Even if a risk event repository exists in the business, this technique may prepare the business for risk events that have not yet been registered in the repository – for the simple reason that they have not occurred, or that they occurred before the repository was established – but are nevertheless worth considering and preparing mitigation strategies for.

3. It may be done in a relatively short period of time, eliminating the need to wait for a significant accumulation of risk events in the risk repository.

4. It may be used in addition to using the risk repository.

The drawbacks of this technique are:

5. It is based on a complete mapping of all business processes in the business. Leaving out even a few business processes may render the whole effort useless, since significant portions of the business activity may be left uncovered.

6. It usually requires a large team. The team usually includes people from the risk management office, from the industrial engineering unit and from the operation of the business itself. The core people, like the risk managers and the industrial engineers, may form the central, fixed part of the team, but the people familiar with the various business processes will have to change with each area of activity covered.

7. Lacking any significant history of risk events, it takes a very determined management to undertake such an extensive and expensive activity.

All things considered, it is a good technique, though the lack of a complete mapping of all business processes usually prevents it from being very effective. On the other hand, this mapping – a prerequisite for using this technique – may be a very substantial side benefit of the operation and, indeed, may be a sufficient benefit by itself to justify the whole process.
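The expected-loss estimate mentioned in the scenario technique above, combining process execution frequency with a per-execution event probability and a loss amount, can be sketched as follows. The processes, probabilities and loss figures are purely illustrative.

```python
def expected_annual_loss(runs_per_year, p_event, loss_given_event):
    # Frequency-times-severity estimate for one latent risk event
    return runs_per_year * p_event * loss_given_event

# Hypothetical virtual risk event repository built from scenario analysis:
# (business process, latent risk event, executions/year, P(event), loss/event)
repository = [
    ("invoice processing", "data entry error",         50_000, 0.002, 40.0),
    ("nightly backup",     "backup failure unnoticed",    365, 0.010, 2_500.0),
]
for process, event, runs, p, loss in repository:
    print(f"{process} / {event}: {expected_annual_loss(runs, p, loss):,.0f} per year")
```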

3.2.1.3 Balanced Scorecards

Balanced Scorecards (BSC) were made famous in the business world by Norton and Kaplan in the early 1990s ([Kaplan and Norton, 1993, 1996], [Ograjenšek and Kenett, 2008]). Since that time, the notion has caught on, and today the Balanced Scorecard is widely used in businesses in all disciplines. For an application to organizations developing systems and software see [Kenett and Baker, 2010]. In short, the basic concept of the scorecard is, as the name implies, to compute a score for the measured phenomena and to act upon its changing values. The concept of an operational risk scorecard is the same as that of the general scorecard, except that it is much more specialised and concerns only operational risks in the business. Whereas in the classic BSC the scores represent performance in the financial, customer, internal processes, and learning and growth facets of the business (although many variations exist), in the operational risk scorecard the measured aspects may be technology, human factors and external factors affecting the business operations. This division is by no means unique, and many other divisions may be used. For example, a bank trying to comply fully with the Basel II recommendations may concentrate more heavily on the ICT part of the operations when handling operational risk, and subdivide this score into finer categories – hardware, software, communications, security and interface. Similar subdivisions may be tried out in other areas of operational risk.

Once the classification and categorisation of all operational risks are complete, weights are assigned to the elements within each category, and a risk score may then be computed for each category by providing the values of the individual risks of the elements. The resulting score must be updated periodically to be of value to the organization.
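As a minimal sketch of such a scorecard computation, assume each category holds elements with weights summing to one and a current risk value on a common 0–100 scale. The categories, weights and values below are invented for illustration.

```python
# Hypothetical operational risk scorecard: each category maps element names
# to (weight, current risk value on a 0-100 scale); weights sum to 1.
scorecard = {
    "technology":    {"hardware": (0.3, 20), "software": (0.5, 55), "security": (0.2, 70)},
    "human factors": {"training": (0.6, 40), "turnover": (0.4, 25)},
}

def category_score(elements):
    # Weighted average of element risk values within one category
    return sum(weight * value for weight, value in elements.values())

scores = {category: category_score(elements) for category, elements in scorecard.items()}
print(scores)
```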

As a final note, it is worthwhile to consider a combined risk indicator, composed of the individual risk categories managed by the organization, which is added to its overall scorecard, thus providing management not only with the performance indicators of the classic BSC, but also with an indication of the risk level at which the organization is operating while achieving the business-related indicators.

3.2.2 Statistical approaches for risk evaluation

The statistical approaches listed in this section present models for conducting risk analysis with the open source R software, using potentially available data from the RISCOSS use cases (an example can be found in Annex A). R is an open source programming language with more than 4300 application packages available at the Comprehensive R Archive Network (http://cran.r-project.org/). The approach is a proof of concept designed to demonstrate what can be done.

A comprehensive set of tools available in R, together with related risk issues in the context of open source software, is presented in four separate tables. The use of some of these tools is then demonstrated with XWiki data.

3.2.2.1 RISCOSS Analytics

In this subsection we summarize various potential analytical features for the RISCOSS project. Table 2 presents analysis goals and corresponding R packages, Table 3 provides risk identification techniques and RISCOSS use case data sources, and Table 4 and Table 5 relate to OSS risks.

Table 2: Analytic features, analysis goals and corresponding R packages

Software Analytical Features | Analysis Goal | R Packages
Risk scoring | Derive Key Risk Indicators from structured and unstructured data | QRM, CreditMetrics, highriskzone, rriskBayes
Probability of risk events | Determine what affects risk events | Mangrove, bnlearn
Association rules | Identify patterns in risk root causes | arules
Statistical correlation | Provides a measure of association that can be used in mitigation processes | WGCNA, biwt, CCM
Cluster analysis | Reduce dimensionality and identify patterns | mclust, fpc
Text and semantic analysis | Evaluating semantic data | tm, corpora, zipfR, topicmodels, RWeka

Table 3: Risk identification and RISCOSS use cases data sources

Risk Identification | Data sources in RISCOSS UC
Event logs | XWiki
Expert opinion | All
Scenario analysis and list of undesirable results | TEI
Using databanks of risk events that materialized in similar businesses | TEI (PSIRT)
Centralized library of risks | Generic risks that exist within the organization, then associated with actual business activity (i* analysis)

Table 4: Risks in adoption and deployment of open source software

Risks in open source software (derived from "A Systematic Approach for Managing Free and Open Source Products" by María Carmela Annosi):

Context risks | Problems associated with using similar components in different application contexts
Process | Inappropriate development process
Quality risks | Perceived reliability of the software
Evaluation risks | Extended development and management of component-based applications in all the post-release activities
Software dependencies
Severity of bugs reported | Impact on the application
Visibility of bugs
Type of bugs (security or not)
Relevance of functional updates
Open Source Maturity Model (OpenBRR)

Table 5: Key Risk Indicators in adoption and deployment of open source software

Key Risk Indicators | Questions
Number of open bugs | Adoption and version release: What are the risks with this component?
Changes in the number of bugs
Evaluation of community responsiveness: monitoring the speed of bug fixes

3.2.2.2 Statistical models

In this section we summarize various statistical models with potential applicability in RISCOSS.

• Bayesian modelling: The approach is used to match risk-related mitigation patterns to specific risk conditions.

o We consider a particular organization facing an OSS-related decision such as adoption, deployment, or integration of OSS components (CV)
o This organization has a specific context (IP)
o The RISCOSS use cases were used to construct a database of risk-related patterns (JD)
o The objective is to match {(CV), (IP)} with (JD) using Bayesian inference

• Multivariate scoring methods: These methods include decision trees, cluster analysis, discriminant analysis and logistic regression. They provide a way to derive risk scores under various contextualized conditions.

• Association rules: This approach permits the analysis of semantic data by identifying associations between the various semantic terms used in taxonomies and ontologies.

• Case-Based Reasoning: This practical problem-solving approach relies on similar past cases to find solutions to problems. It can be used to match relevant patterns to specific situations.

• Bayesian Networks: Bayesian networks are graphical structures used to represent knowledge about cause and effect in a specific domain. Each node in the graph represents a random variable, while the edges between the nodes represent probabilistic dependencies among the corresponding random variables.

o Topology and network specification:
- Network nodes are built using a set of discrete/continuous causal variables
- A set of oriented arcs links pairs of nodes
- Each node Xi has a conditional probability distribution quantifying its parents' effects on the node
- The graph has no oriented cycles (DAG)


• FMEA: Failure Mode Effects Analysis (FMEA) is an approach for identifying all possible failures in a design, a manufacturing or assembly process, or a product or service. FMEA is a structured methodology that allows for:

- Identification, analysis and sorting of potential failure modes based on their risk
- Definition of their impact on the system, product or service
- Definition of the potential causes
- Action items for execution

Definitions: Risk(f) = P(f) x C(f), where f: software component; Risk(f): risk exposure; P(f): failure probability; and C(f): cost of the failure.
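Using the definition above, a hypothetical set of software components can be ranked by risk exposure Risk(f) = P(f) x C(f). The component names, failure probabilities and costs are illustrative only.

```python
# FMEA-style ranking of hypothetical software components:
# component -> (failure probability P(f), cost of failure C(f))
components = {
    "authentication module": (0.02, 50_000),
    "report generator":      (0.10, 2_000),
    "payment gateway":       (0.01, 120_000),
}

# Risk exposure per component: Risk(f) = P(f) * C(f)
risk_exposure = {f: p * c for f, (p, c) in components.items()}
# Sort failure modes by risk exposure, highest first, to prioritize actions
ranked = sorted(risk_exposure.items(), key=lambda kv: kv[1], reverse=True)
for name, risk in ranked:
    print(f"{name}: {risk:.0f}")
```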

3.2.2.3 A Roadmap for Risk Assessment in Support of OSS adoption and deployment, exploiting statistical techniques

A preliminary process for a risk based evaluation of OSS components is the following:

1. Identify and classify the risk events.

2. Evaluate and measure the failure probability of software components, software performance and reliability.

3. Evaluate and measure the consequences of failure of the software components.

4. Assess, audit and control changes in the software.

5. Software security testing: impact assessment of vulnerabilities and exploits, vulnerabilities discovered by the wider community, vulnerability notification mechanisms, time to fix vulnerabilities.

Example: An example of statistical analytics on unstructured and semi-structured data from project repositories is presented in Annex A, based on data provided by the RISCOSS partner XWiki from the bug tracking software Jira and from mailing list records.

3.3 Data processing and analysis techniques: formal and search-based techniques

3.3.1 Formal approaches

In this work we also intend to use formal analysis techniques in multi-criteria decision-making problems, and in the (mainly quantitative) analysis of the OSS ecosystem organisational models produced in work packages 1 and 5. In particular, we envisage the usefulness of SAT/SMT-based and logic-based techniques such as Disjunctive Logic Programming and Datalog. These techniques are particularly suitable for the qualitative analysis of organisational models.

In the following we give a brief description of the candidate techniques, highlighting the aspects that can be considered in our project.

3.3.1.1 Disjunctive Datalog and DLV

Datalog [Leone et al., 2006] is a first-order logic language for querying deductive databases; it belongs to the wider class of Disjunctive Logic Programming [Minker, 1994]. A Datalog program is a set of rules of the form r :- l1 ∧ … ∧ ln, where r (called the head of the rule) is a positive literal and l1, ..., ln are literals (called the body of the rule). Intuitively, the rule states that if l1, ..., ln are true then r must be true.
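A naive bottom-up (fixpoint) evaluation of such rules can be sketched for the propositional case: rules whose bodies are satisfied fire repeatedly until no new fact is derived. The risk-related atoms below are invented for illustration; DLV itself of course handles full first-order rules, disjunctive heads and weak constraints.

```python
# Propositional Datalog rules of the form head :- body1, ..., bodyn.
# Atom names are illustrative risk-analysis predicates.
rules = [
    ("license_conflict", ["uses_gpl_component", "ships_proprietary"]),
    ("legal_risk",       ["license_conflict"]),
    ("high_risk",        ["legal_risk", "no_legal_review"]),
]
facts = {"uses_gpl_component", "ships_proprietary", "no_legal_review"}

# Fixpoint iteration: fire every rule whose body holds until nothing changes
changed = True
while changed:
    changed = False
    for head, body in rules:
        if head not in facts and all(atom in facts for atom in body):
            facts.add(head)
            changed = True

print(sorted(facts))
```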


In our work we aim at exploiting the DLV reasoning engine, an implementation of Datalog. It is an Answer Set system that extends Datalog in different ways. It adds disjunctions in the rule heads, thus giving the possibility to generate multiple alternative models. It also supports weak constraints – i.e., constraints that can be violated at a cost; this way, solutions can be ranked according to how many violations occur. The search techniques and heuristics used by DLV are backward search – similar to SAT algorithms – and advanced pruning operators for model generation, together with innovative techniques for answer-set checking. As output, DLV generates either the complete set of models produced by the set of predicates and variable assignments, or a pruned set of models depending on input preferences.

In our context, the possibility given by DLV to explore alternative solutions, for example corresponding to different possible strategies for risk mitigation and cost minimization, is a key decision-making support. In particular, the idea is to exploit DLV in the analysis of models of the organisation and of the OSS ecosystems. This is in line with works that exploited DLV to analyse models, such as [Ingolfo et al., 2013], where goal-oriented models of laws were analysed in order to explore possible alternative ways of being compliant with all, or subsets of, law paragraphs.

3.3.1.2 SAT/SMT-based approaches for goal analysis

SAT/SMT-based goal reasoning techniques [Sebastiani et al., 2004] [Giorgini et al., 2003] make it possible to verify properties of models and can support the analyst in the conflict resolution process. In particular, we assume goal models are represented as a tuple (G, R), where G is a set of goals and R is a set of relations over G. In practice, a goal model can be seen as a set of and/or trees, where root goals are the roots of the and/or trees, whilst leaf goals are either leaves or nodes that are not part of the trees.

For each goal of a goal model, three values are considered, which represent the current evidence of satisfiability and deniability of the goal: full, partial, or none. Conflicting situations may exist, in which evidence holds for both the satisfaction and the denial of a goal. For instance, for a given goal G we may have full evidence for its satisfaction and at the same time partial evidence for its denial. Such evidence is either known a priori or is the desired one. In both cases, conflicts emerge by reasoning on the models with the techniques explained below. On goal models, it is possible to execute both forward reasoning and backward reasoning analyses.

Forward Reasoning. Given an initial assignment of values to some goals, called input goals from now on (typically leaf goals), forward reasoning focuses on the forward propagation of these initial values to all other goals of the model. Initial values represent the available evidence about the satisfaction and the denial of a specific goal, namely evidence about the state of the goal. After the forward propagation of the initial values, the user can look at the final values of the goals of interest, called target goals from now on (typically root goals), and reveal possible conflicts. In other words, the user observes the effects of the initial values on the goals of interest.
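A minimal sketch of this forward propagation over an and/or goal tree uses qualitative evidence values ordered none < partial < full, taking the minimum over AND-decompositions and the maximum over OR ones. The goal names and decompositions are illustrative, and the actual SAT/SMT encodings additionally track separate satisfaction and denial evidence.

```python
# Qualitative satisfaction evidence, ordered none < partial < full
NONE, PARTIAL, FULL = 0, 1, 2

# Illustrative goal model: goal -> (decomposition type, subgoals)
model = {
    "deliver_product": ("and", ["develop_software", "ensure_quality"]),
    "ensure_quality":  ("or",  ["manual_testing", "automated_testing"]),
}
# Initial evidence assigned to the input (leaf) goals
leaf_values = {"develop_software": FULL,
               "manual_testing": PARTIAL,
               "automated_testing": NONE}

def propagate(goal):
    # Recursively propagate leaf evidence up to the target goals
    if goal in leaf_values:
        return leaf_values[goal]
    op, children = model[goal]
    values = [propagate(child) for child in children]
    return min(values) if op == "and" else max(values)

print(propagate("deliver_product"))
```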

Backward Reasoning. Backward reasoning focuses on the backward search of possible input values leading to some desired final value, under specific constraints. The desired final values of the target goals are chosen, and an exhaustive search is done to find possible initial assignments to the input goals which would cause the desired final values of the target goals by forward propagation. We may also add some desired constraints, and decide to avoid strong, medium, or weak conflicts.

We expect to use these techniques to analyse goal models in terms of the possibility of satisfying a subset of goals given a satisfaction configuration. The intent here is to identify how sensitive a given organisation is to the impossibility of satisfying some of its goals, and thus the possible risk for that organisation. This is in line with the work of [Asnar et al., 2011], which adopted this technique for the analysis of the effects of risk events on the goal model. Candidate tools for supporting the goal-reasoning techniques we intend to use in RISCOSS are: the NuSMV/MathSAT and Yices model checkers, the PRISM probabilistic model checker, and the Jacop constraint solver2.

3.3.2 Search-based and Machine Learning based optimization techniques

Search-based and machine learning approaches are increasingly influencing the field of software engineering, especially in the areas of software project planning, requirements management and software testing, where several kinds of decisions have to be supported that call for the identification of trade-offs between different criteria. In particular, in the area of software testing, several papers describe the application of search techniques, such as genetic algorithms, to the automatic generation of test cases (see for example [Jia and Harman, 2010]), considering different properties of the software being tested. In the area of decision making in project planning and requirements engineering, several works focused on the problems of next release planning and requirements prioritization as mono- or multi-criteria decision-making problems, proposing agendas and literature reviews as in [Herrmann and Daneva, 2008].

These approaches represent a baseline for managing trade-offs between different criteria related to the adoption of OSS components in a project and, more generally, for supporting the choice of OSS adoption strategies at an organisational level. Candidate techniques for decision making in OSS adoption and organisational cost/risk analysis are evolutionary metaheuristics, such as mono- and multi-criteria genetic algorithms, and machine learning algorithms.

Differently from the formal approaches, which are suited to a qualitative analysis of organisational models, search-based and machine learning techniques will allow us to perform quantitative reasoning on the specific data and models available in the OSS ecosystem (refer e.g. to Section 2.3.2.1), comprising OSS communities, related organisations, and companies. This analysis complements the qualitative analysis to support several levels of risk and cost decision-making.

Similarly to statistical techniques, search-based and machine learning techniques require, and can handle, a huge amount of data from the domain. The current decision-making techniques in the RISCOSS use cases can provide a first set of data to be considered. For example, in the cases of TEI and XWiki, several (usually conflicting) risk criteria and related measures are considered in order to choose OSS components or, at a higher level, to follow a specific OSS business model that has to be harmonized with the whole business model of the organisation.

2 see http://nusmv.fbk.eu/, http://yices.csl.sri.com, http://www.prismmodelchecker.org, http://jacop.osolpro.com, respectively.


In the following we give an overview of the search-based and machine learning algorithms we consider suitable for our purposes. We also present a brief literature review on the application of these techniques to decision-making problems in project planning, software requirements and software design. Finally, we sketch a series of requirements and discuss the benefits of their use in our context.

3.3.2.1 Search-based evolutionary algorithms

Among the approaches to solving optimization problems (and in particular multi-objective optimization problems), evolutionary algorithms are promising. These algorithms, in fact, have proven to be general, robust and powerful search mechanisms [Marler and Arora, 2004]. They possess several characteristics that are desirable for complex problems involving i) multiple conflicting objectives, and ii) large and highly complex search spaces. In the following we give a general description of a class of evolutionary algorithms, genetic algorithms, and of their application to solving multi-objective problems.

Genetic Algorithms

Genetic algorithms (GAs) [Holland, 1975] belong to the larger class of evolutionary algorithms, which generate solutions to optimization problems using techniques inspired by biological evolution [Mitchell, 1998]. Usually, an initial population of randomly generated candidate solutions represents the first generation. Processes based on natural selection, crossover and mutation operators are repeatedly applied to the population to produce potential solutions. A fitness function is applied to the candidate solutions and any subsequent offspring. Over time, the number of above-average individuals increases, and highly-fit building blocks from fit individuals are combined to discover good solutions to the problem at hand.

In GAs, the populations of strings that encode the candidate solutions are known as chromosomes, or the genotype of the genome. Candidate solutions are also known as individuals, creatures, or phenotypes. A typical genetic algorithm requires (i) a genetic representation of the solution domain and (ii) a fitness function to evaluate the solution domain, explored using different genetic operators. The operators commonly used in genetic algorithms are briefly presented here:

Selection: Selection is a genetic operator that chooses part of the current generation's population for inclusion in the next generation's population. Individual solutions are selected through a fitness-based process, where fitter solutions (as evaluated by a fitness function) are more likely to be selected.

Crossover: In genetic algorithms, crossover is analogous to reproduction and biological crossover. Crossover operators aim to interchange information and genes between candidates, taking two or more parent solutions and producing child solutions from them. A child may thus combine many of the good features that exist in its parents. The crossover operator is applied to mating pairs with a given probability; otherwise, the resulting offspring are simply copies of the parents.

Mutation: Mutation alters one or more gene values in a chromosome. With these new gene values, the individual may represent a better solution than its predecessor, analogously to mutation in biology. Mutation is an important part of genetic search algorithms, as it helps to prevent the population from stagnating and to reach a higher fitness by escaping from local optima. Mutation occurs during evolution according to a user-definable probability.


In general, GAs require the tuning and setting of genetic parameters such as population size, crossover probability, and the number of generations to be calculated. The population size defines the number of individuals which participate in the evolution process, and affects the performance and efficiency of the algorithm: a smaller population covers a smaller search space and may result in poorer performance, while a larger population covers a bigger space and prevents premature convergence to local solutions, but needs more evaluations per generation, which may slow down convergence. The crossover probability defines the rate at which the crossover operator is applied; a higher crossover rate introduces new strings more quickly into the population, but if it is too high, high-performance strings can be eliminated faster than necessary, reducing improvement. The mutation probability is the probability of a random change in the genes of individuals: a low mutation rate leads to hill-climbing-like convergence, whereas a high mutation rate results essentially in random search. The number of iterations defines the maximum number of runs for arriving at the final solution set; a higher number of iterations might produce a better result, but takes longer.
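The operators and parameters discussed above can be illustrated with a minimal GA that maximizes the number of 1-genes in a bitstring (the classic OneMax toy problem). All parameter values are arbitrary choices made for this sketch.

```python
import random

rng = random.Random(0)
# Genetic parameters (arbitrary for this sketch): genome length, population
# size, number of generations, crossover probability, per-gene mutation rate
GENES, POP, GENERATIONS, P_CROSS, P_MUT = 20, 30, 60, 0.9, 0.02

def fitness(individual):
    return sum(individual)  # OneMax: count the 1-genes

def tournament(population):
    # Fitness-based selection: the fitter of two random individuals wins
    a, b = rng.choice(population), rng.choice(population)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    if rng.random() >= P_CROSS:
        return p1[:], p2[:]  # no crossover: offspring are copies of parents
    cut = rng.randrange(1, GENES)  # single-point crossover
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(individual):
    # Flip each gene independently with probability P_MUT
    return [(1 - g) if rng.random() < P_MUT else g for g in individual]

population = [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    offspring = []
    while len(offspring) < POP:
        c1, c2 = crossover(tournament(population), tournament(population))
        offspring += [mutate(c1), mutate(c2)]
    population = offspring[:POP]

best = max(population, key=fitness)
print(fitness(best))
```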

Genetic algorithms are stochastic methods that can be used to solve a broad class of optimization problems, including allocation and prioritization problems. Different GAs have been widely used to solve multi-objective optimization problems because of their robustness.

Multi-objective optimization and NSGA-II

Multi-objective optimization (also known as multi-criteria or multi-attribute optimization) is the process of simultaneously optimizing two or more conflicting objectives subject to certain constraints. Many real-world search and optimization problems are naturally posed as problems having multiple objectives which are, without loss of generality, all to be maximized and all equally important. For multi-objective problems, it is difficult to identify a single solution that simultaneously optimizes each objective. Usually there is a set of solutions in which no solution can be eliminated from consideration by replacing it with another solution that improves one objective without worsening another. This set of solutions is known as the set of non-dominated, Pareto-optimal or Pareto-efficient solutions.

Among the evolutionary algorithms suitable for the solution of these problems, we are interested in the Non-dominated Sorting Genetic Algorithm II (NSGA-II) [Deb et al., 2002]. NSGA-II implements elitism for multi-objective search, i.e., at each generation it privileges the solutions that best fit a fitness function, enhancing the convergence properties towards the true Pareto-optimal set. Moreover, it uses a crowding distance, which measures the density of solutions in the objective space, and a crowded comparison operator, which guides the selection process towards a uniformly spread Pareto frontier, in order to obtain a more complete description of the entire solution space and allow for a wider choice between alternative solutions.
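The non-dominance notion at the heart of NSGA-II can be sketched for a two-objective minimization problem, say the risk and cost of candidate OSS adoption strategies. The candidate names and objective values are invented for illustration.

```python
# Candidate strategies -> (risk, cost); both objectives are to be minimized.
candidates = {
    "A": (0.2, 80), "B": (0.5, 30), "C": (0.4, 60), "D": (0.6, 70),
}

def dominates(x, y):
    # x dominates y if x is no worse in every objective and strictly
    # better in at least one (minimization convention)
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))

# The Pareto front: candidates dominated by no other candidate
pareto = [name for name, vals in candidates.items()
          if not any(dominates(other, vals)
                     for m, other in candidates.items() if m != name)]
print(sorted(pareto))
```

Here D is dominated (B and C are better on both objectives), while A, B and C represent different, incomparable risk/cost trade-offs.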

3.3.2.2 Applications of search-based techniques in project planning and requirements engineering decision making

Several search-based approaches have been exploited for the solution of different kinds of project planning problems in software engineering and in requirements engineering. In [Sivzittian and Nuseibeh, 2001] the Next Release Problem (NRP) is studied as an optimization problem. This work explores three different heuristics, namely simulated annealing, genetic algorithms, and ant colony optimization, to solve the NRP in some case studies. The results of the paper show that the solutions discovered by the techniques contain a high percentage of requirements judged by the analysts and the users as important for a particular release of the system.

Among the methods exploiting genetic algorithms for software project planning and requirements management, the EVOLVE method [Greer and Ruhe, 2004] supports continuous planning for incremental software development. In particular, this method allows for an optimal allocation of requirements while assessing and optimizing the conflicts among technical constraints and stakeholder priorities. The result is a balance between required and available resources for each incremental version of the software. The approach is based on an iterative optimization method making use of a genetic algorithm.

Zhang et al. [Zhang et al. 2007] focused on the Multi-Objective Next Release Problem (MONRP) applied in requirements engineering and presented the results of an empirical study on the suitability of weighted and Pareto-optimal genetic algorithms, together with the non-dominated sorting genetic algorithm NSGA-II [Deb et al., 2002], providing evidence to support the claim that NSGA-II is well suited to the MONRP. In [Zhang et al. 2009] Zhang et al. propose an archive-based multi-objective evolutionary algorithm, based on NSGA-II, to solve the problem of Requirements Interaction Management, where dependencies between requirements influence the automatic selection of requirements during the phase of release planning. They demonstrate that their framework is able to find an optimal balance in the solution space for release planning under different contexts (such as different decision makers and different value, cost and risk relationships).

In [Finkelstein et al. 2009] a multi-objective optimization approach is presented to investigate the trade-offs between various aspects of fairness towards multiple customers, as well as technical and organisational risks, when a decision on the set of requirements to be implemented has to be taken. The approach is validated using two real-world data sets and data sets created specifically to stress-test the approach. Moreover, experiments are reported to determine the most suitable algorithm for this problem, comparing the results of the NSGA-II algorithm and the Two-Archive evolutionary algorithm.

Several prioritization approaches share a common schema: the selection of one or more prioritization criteria (including risks, business goals and technical features), the acquisition of priorities for the alternatives (e.g., requirements, components) on the basis of the criteria elicited from the stakeholders, and the integration of the acquired orderings into a final one.

[Tonella et al., 2013] proposes a prioritization approach that aims at minimizing the disagreement between a total order of prioritized requirements and the various constraints (representing, e.g., cost, risk and expected-benefit constraints) that are either encoded within the requirements or expressed iteratively by the user during the prioritization process. The approach uses an interactive genetic algorithm to achieve this minimization, taking advantage of interactive input from the user whenever the fitness function cannot be computed precisely from the information available. Specifically, each individual in the population being evolved represents an alternative prioritization of the requirements. When individuals with a high fitness (i.e., a low disagreement with the constraints) cannot be distinguished, because their fitness function evaluates to a plateau, user input is requested interactively to make the fitness function landscape better suited for further minimization. The prioritization process terminates when a low disagreement is reached, a time-out expires, or the allocated elicitation budget has been consumed.
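The disagreement that such an approach minimizes can be illustrated with a small sketch (not code from [Tonella et al., 2013]; the requirement names and constraints are invented for the example). A candidate prioritization is scored by counting the precedence constraints it violates:

```python
def disagreement(prioritization, constraints):
    """Count constraints (a, b), meaning 'a should precede b', that the order violates."""
    position = {req: i for i, req in enumerate(prioritization)}
    return sum(1 for a, b in constraints if position[a] > position[b])

# Hypothetical requirements and precedence constraints.
constraints = [("login", "payment"), ("search", "export"), ("payment", "export")]
order = ["search", "payment", "login", "export"]
print(disagreement(order, constraints))  # "login" placed after "payment" violates one constraint
```

A genetic algorithm evolving orderings would use a function of this kind as (part of) its fitness, asking the user for input only when several orderings tie on this score.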

In our context these techniques will be exploited for the identification and optimization of risk, cost and benefit trade-offs, especially in the case of conflicting constraints and partial knowledge in the decision-making phase.

3.3.2.3 The CBRank Machine Learning Algorithm

The CBRank method [Avesani et al. 2003] [Perini et al., 2013] supports decision-making for ordering a set of alternatives considering a set of potentially conflicting constraints defined as ordering criteria over the set of alternatives. The framework provides an iterative prioritization process that can handle single and multiple human decision makers (stakeholders) and different ordering criteria. A peculiarity of this framework is the use of machine learning techniques, in particular the Rank Boost algorithm, to reduce the information elicitation effort required from decision makers.

The elicitation of preference values in CBRank is performed via contrast analysis: a preference is formulated as a relative preference between two alternatives. Given two alternatives A and B, the output will be, for example, "A is more important than B". Proceeding this way requires an exhaustive analysis of all pairs of candidate alternatives; given the complete preference structure over the set of all possible pairs, a total rank is trivially computed.

To decrease this effort, the CBRank strategy decomposes the problem into two parts: explicit and implicit preference elicitation. Explicit preference elicitation directly involves the end user, while implicit preference elicitation is automated, accomplished through an approximation process that learns the preference structure of the decision makers. Explicit and implicit preferences are combined to obtain a global rank of the alternatives, also considering the rankings of the alternatives induced by other criteria.

The high-level structure of the algorithm can be summarized in three steps:

1. Pair sampling: an automated procedure selects a pair of alternatives and submits it to the decision maker to acquire the relative priority relationship.

2. Priority elicitation: this step involves the decision maker, who, given a pair of alternatives, chooses which one is to be preferred. The priority is formulated as a boolean choice on pairs of alternatives.

3. Ranking learning: given a partial elicitation of the user priorities, a learning algorithm (Rank Boost) produces an approximation of the unknown priority relationships, exploiting the other ranking criteria available for the alternatives, and a ranking of the whole set of alternatives is derived. This step is iterated until a stable ranking configuration is reached.

The result of this process is a rank of the alternatives that combines the explicit preferences extracted from the decision makers with existing alternative rankings in the domain. The process is particularly useful when the alternatives are already partially ranked by criteria such as cost and value, and the decision algorithm has to combine these partial rankings into a final total or partial order.
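The combination of explicit pairwise preferences with an existing partial ranking can be sketched as follows. This is a much-simplified stand-in for the Rank Boost combination, for intuition only: the alternatives, the win-counting score and the weight are all invented for the example.

```python
def combined_rank(alternatives, preferences, partial_rank, weight=0.5):
    """preferences: (winner, loser) pairs elicited from the decision maker;
    partial_rank: dict alternative -> score from an existing criterion (e.g. value)."""
    wins = {a: 0 for a in alternatives}
    for winner, _loser in preferences:
        wins[winner] += 1  # naive score from explicit preferences
    # Blend the explicit-preference score with the existing criterion's score.
    score = {a: weight * wins[a] + (1 - weight) * partial_rank.get(a, 0)
             for a in alternatives}
    return sorted(alternatives, key=lambda a: score[a], reverse=True)

alts = ["A", "B", "C"]
prefs = [("A", "B"), ("A", "C"), ("C", "B")]  # elicited: A > B, A > C, C > B
print(combined_rank(alts, prefs, partial_rank={"A": 2, "B": 0, "C": 1}))  # ['A', 'C', 'B']
```

Rank Boost itself learns a weighted combination of such ranking features iteratively, rather than using a fixed blend; the sketch only conveys the idea of merging explicit and criterion-induced orderings into one total order.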

Among the pairwise comparison techniques, CBRank [Perini et al., 2013] has been adopted in a requirements preference elicitation process that combines sets of preferences elicited from human decision makers with sets of constraints that are automatically computed through the Rank Boost machine learning technique. It also exploits knowledge about (partial) rankings of the alternatives (describing, for example, the risks and costs intrinsic to each alternative) that may be encoded in the descriptions of the alternatives themselves as properties (e.g., priorities, preferences, values, risks).

Final Discussion and Conclusion

In this document we have presented a review of the current literature on techniques to be used for OSS risk management in the phase of OSS adoption in different kinds of organisations. We also presented some initial proposals for candidate techniques for the analysis and mitigation of risk in this context.

The main property of these approaches is that they make it possible to address risk-related problems considering not only the technical dimension of the software to be adopted but also the strategic and business dimension, ranging from the representation of the business ecosystem on which the analysis has to be performed to the specific risk analysis methods for identification, analysis and mitigation.

An interesting property of the envisaged approaches is that they support both qualitative and quantitative analysis of the OSS domain, making them more robust to the problem of missing data than a purely quantitative evaluation. We expect to use qualitative approaches (such as modelling and model analysis) to identify relevant dimensions to be considered during the quantitative (e.g., statistical) analysis and, the other way around, to extract relevant properties from the data through statistics to identify relevant properties to be considered in risk management in a specific case. A first application of statistical techniques, described in Annex A, envisaged the use of several sources of data, some of them related to the OSS communities and to the OSS component, others internal to the specific organisation that aims at adopting OSS. This preliminary study showed that these kinds of analyses are feasible and useful for the project. Clearly, for statistical and search-based techniques, and more generally for quantitative techniques, an adequate quantity of data is needed to obtain statistically significant outcomes. This point is a major requirement from WP2 to the other work packages, in particular for WP5, where the project use cases are analysed.


References

[Asnar et al., 2011] Yudistira Asnar, Paolo Giorgini, John Mylopoulos: Goal-driven risk assessment in requirements engineering. Requir. Eng. 16(2): 101-116 (2011)

[Avesani et al. 2003] Paolo Avesani, Sara Ferrari, Angelo Susi: Case-Based Ranking for Decision Support Systems. ICCBR 2003: 35-49

[Ayala et al., 2011] Claudia P. Ayala, Øyvind Hauge, Reidar Conradi, Xavier Franch, Jingyue Li: Selection of third party software in Off-The-Shelf-based software development - An interview study with industrial practitioners. Journal of Systems and Software 84(4): 620-637 (2011)

[Ben Gal, 2007] Ben Gal, I. (2007) Bayesian Networks, In F. Ruggeri, R.S. Kenett and F. Faltin (eds), Encyclopaedia of Statistics in Quality and Reliability. Chichester: John Wiley & Sons, Ltd.

[Bhuiyan et al., 2007] Bhuiyan, M., Islam, M. M., Koliadis, G., Krishna, A., and Ghose, A. K. (2007). Managing business process risk using rich organizational models. In 31st Annual IEEE International Computer Software and Applications Conference COMPSAC.

[Bresciani et al., 2004] Paolo Bresciani, Anna Perini, Paolo Giorgini, Fausto Giunchiglia, John Mylopoulos: Tropos: An Agent-Oriented Software Development Methodology. Autonomous Agents and Multi-Agent Systems 8(3): 203-236 (2004)

[Bonafede and Giudici, 2007] Bonafede, E.C. and Giudici, P. (2007) Bayesian networks for enterprise risk assessment, Physica A, 382, 1, pp. 22-28.

[Cailliau and van Lamsweerde, 2012] Antoine Cailliau, Axel van Lamsweerde: A probabilistic framework for goal-oriented risk analysis. RE 2012: 201-210

[Carvallo et al., 2007] Juan Pablo Carvallo, Xavier Franch, Carme Quer: Determining Criteria for Selecting Software Components: Lessons Learned. IEEE Software 24(3): 84-94 (2007).

[Comella-Dorda et al., 2002] Santiago Comella-Dorda, John C. Dean, Edwin J. Morris, Patricia A. Oberndorf: A Process for COTS Software Product Evaluation. ICCBSS 2002: 86-96

[Cornalba and Giudici, 2004] Cornalba, C. and Giudici, P. (2004) Statistical models for operational risk management, Physica A, 338, pp. 166-172.

[Cruz, 2002] Cruz M. (2002), Modeling, Measuring and Hedging Operational Risk, John Wiley and Sons, Chichester.

[Dalla Valle et al, 2008] Dalla Valle, L., Fantazzini, D. and Giudici, P. (2008) Copulae and operational risk, International Journal of Risk Assessment and Management, 9, 3, pp. 238-257.

[Deb et al., 2002] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182-197.

[Fenton and Neil, 2007] Fenton, N. and Neil, M., Managing Risk in the Modern World: Applications of Bayesian Networks, London Mathematical Society (2007).


[Figini et al. 2010] Figini, S., Kenett, R.S. and Salini, S., Integrating Operational and Financial Risk Assessments, Quality and Reliability Engineering International (2010) http://services.bepress.com/unimi/statistics/art48

[Finkelstein et al. 2009] A. Finkelstein, M. Harman, S. Mansouri, J. Ren, Y. Zhang, A search based approach to fairness analysis in requirement assignments to aid negotiation, mediation and decision making, Requirements Engineering 14 (2009) 231-245.

[FMEA, 2006] IEC 60812 Standard, "Analysis Techniques for System Reliability - Procedure for Failure Modes and Effects Analysis," IEC, Geneva, 2006.

[Giorgini et al., 2003] Paolo Giorgini, John Mylopoulos, Eleonora Nicchiarelli, Roberto Sebastiani: Formal Reasoning Techniques for Goal Models. J. Data Semantics 1: 1-20 (2003)

[Giudici and Billota, 2004] Giudici, P. and Bilotta, A. (2004) Modelling Operational Losses: A Bayesian Approach. Qual. Reliab. Engng. Int., 20: 407-417. doi: 10.1002/qre.655

[Greer and Ruhe, 2004] D. Greer, G. Ruhe, Software release planning: an evolutionary and iterative approach, Information and Software Technology 46 (2004) 243–253.

[Gruber, 1993] Gruber, T. R., A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5(2):199-220, 1993.

[Herrmann and Daneva, 2008] A. Herrmann, M. Daneva, Requirements prioritization based on benefit and cost prediction: An agenda for future research, in: RE, IEEE Computer Society, 2008, pp. 125-134

[Holland, 1975] J.H. Holland, 1975. Adaptation in natural and artificial systems.

[Ingolfo et al., 2013] Silvia Ingolfo, Alberto Siena, Ivan Jureta, Angelo Susi, Anna Perini, John Mylopoulos: Choosing Compliance Solutions through Stakeholder Preferences. REFSQ 2013: 206-220

[Jia and Harman, 2010] Y. Jia, M. Harman, An analysis and survey of the development of mutation testing, IEEE Transactions on Software Engineering (2010)

[Kaplan and Norton, 1996] Kaplan R.S. and Norton D.P., (1996), The Balanced Scorecard: Translating Strategy into Action, Harvard Business School Press, Boston, Massachusetts.

[Kaplan and Norton, 1993] Kaplan R.S. and Norton D.P., Putting the Balanced Scorecard to Work, Harvard Business Review, 71, 5, pp. 134-142 (1993).

[Kaplan and Norton, 1992] Kaplan, R.S. and Norton, D.P., The Balanced Scorecard - Measures that Drive Performance, Harvard Business Review, 70, 1, pp. 71-79 (1992).

[Kenett and Baker, 2010] Kenett, R. and Baker E., Process Improvement and CMMI for Systems and Software: Planning, Implementation, and Management, Taylor and Francis, Auerbach Publications, 2010

[Kenett and Raanan, 2010] Kenett, Ron, and Yossi Raanan. Operational Risk Management: a Practical Approach to Intelligent Data Analysis. Chichester: John Wiley & Sons, 2010.

[Kenett and Zacks, 1998] Kenett, R.S. and Zacks, S. (1998), Modern Industrial Statistics: Design and Control of Quality and Reliability, Duxbury Press, San Francisco, Spanish edi-tion, 2000, 2nd edition 2003, Chinese edition, 2004.


[Kitchenham and Charters, 2007] Kitchenham, B., & Charters, S. (2007). Guidelines for performing Systematic Literature Reviews in Software Engineering (Technical Report No. EBSE-2007-01). Evidence-Based Software Engineering (p. 65). Keele, UK: Keele University. Retrieved from http://www.dur.ac.uk/ebse/guidelines.php.

[Lamsweerde and Letier, 2000] Axel van Lamsweerde, Emmanuel Letier: Handling Obstacles in Goal-Oriented Requirements Engineering. IEEE Trans. Software Eng. 26(10): 978-1005 (2000)

[Leone et al., 2006] N. Leone, G. Pfeifer, W. Faber, T. Eiter, G. Gottlob, S. Perri, and F. Scarcello. The dlv system for knowledge representation and reasoning. ACM Trans. Comput. Log., 7(3):499-562, 2006.

[Li, 2006] Jingyue Li, Process Improvement and Risk Management in Off-The-Shelf Component-Based Development, Doctoral Thesis, Norwegian University of Science and Technology, June 2006.

[Lund et al, 2011] Mass Soldal Lund, Bjørnar Solhaug, Ketil Stølen: Model-Driven Risk Analysis - The CORAS Approach. Springer 2011, ISBN 978-3-642-12322-1, pp. I-XVI, 1-460

[Marler and Arora, 2004] R. T. Marler and J. S. Arora. Survey of multi-objective optimization methods for engineering. Structural and Multidisciplinary Optimization, 26(6):369–395, April 2004.

[Minker, 1994] J. Minker. Overview of disjunctive logic programming. Ann. Math. Artif. Intell., 12(1-2):1–24, 1994.

[Mitchell, 1998] Melanie Mitchell, 1998. An introduction to genetic algorithms. MIT Press.

[Morandini et al., 2008] Morandini, M., Penserini, L., and Perini, A. 2008. Towards goal-oriented development of self-adaptive systems. In SEAMS '08: Workshop on Software Engineering for Adaptive and Self-Managing Systems. ACM, 9-16.

[Morandini et al., 2009] Mirko Morandini, Loris Penserini, Anna Perini: Operational semantics of goal models in adaptive agents. AAMAS (1) 2009: 129-136

[Ograjenšek and Kenett, 2008] Ograjenšek, I. and Kenett, R. S., Management Statistics, in Statistical Practice in Business and Industry, Coleman, S., Greenfield, T., Stewardson, D. and Montgomery, D. (editors), John Wiley and Sons, Chichester (2008).

[Perini et al., 2013] Anna Perini, Angelo Susi, Paolo Avesani: A Machine Learning Approach to Software Requirements Prioritization. IEEE Trans. Software Eng. 39(4): 445-461 (2013)

[Rolland et al, 1999] Rolland, C., Nurcan, S. and Grosz G., 1999. "Enterprise Knowledge Development: The Process View", Information and Management Journal, Elsevier 36(3), p.165-184.

[Sebastiani et al. 2004] Roberto Sebastiani, Paolo Giorgini, John Mylopoulos: Simple and Minimum-Cost Satisfiability for Goal Models. CAiSE 2004: 20-35

[Siena et al., 2012] A. Siena, I. Jureta, S. Ingolfo, A. Susi, A. Perini, and J. Mylopoulos. Capturing variability of law with Nòmos 2. ER’12, 2012.

[Sivzittian and Nuseibeh, 2001] S. Sivzittian, B. Nuseibeh, Linking the Selection of Requirements to Market Value: A Portfolio-Based Approach, in: REFSQ 2001.


[SLR1] to [SLR47] Please refer to Section 2.4 for the list of references identified in the sys-tematic literature review.

[deSouza et al, 2011] J. T. de Souza, C. L. B. Maia, T. do Nascimento Ferreira, R. A. F. do Carmo, M. M. A. Brasil, An ant colony optimization approach to the software release planning with dependent requirements, in: M. B. Cohen, M. Ó Cinnéide (Eds.), SSBSE, volume 6956 of Lecture Notes in Computer Science, Springer, 2011, pp. 142-157.

[Souza et al., 2011] Vítor Estêvão Silva Souza, Alexei Lapouchnian, William N. Robinson, John Mylopoulos: Awareness requirements for adaptive systems. SEAMS 2011: 60-69

[Tonella et al., 2013] Paolo Tonella, Angelo Susi, Francis Palma: Interactive requirements prioritization using a genetic algorithm. Information & Software Technology 55(1): 173-187 (2013)

[Yu, 1995] Yu, E. 1995. Modelling strategic relationships for process reengineering. Ph.D. thesis, University of Toronto, Department of Computer Science.

[Zhang et al. 2007] Y. Zhang, M. Harman, S. A. Mansouri, The multi-objective next release problem, in: GECCO ’07, ACM, 2007, pp. 1129–1137.

[Zhang et al. 2009] Y. Zhang, M. Harman, Search Based Optimization of Requirements Interaction Management, in: 2nd International Symposium on Search Based Software Engineering, IEEE, 2009, pp. 47-56.


Annex A

RISCOSS Analytics: An Example

This example is based on data provided by the RISCOSS partner XWiki regarding its community-based wiki project XWiki.org, covering bugs and releases as well as various chat sessions that took place.

The analysis of bugs and releases for open source software would take place during the selection and deployment of open source components. This process should provide up-to-date warning signals to the relevant actors.

Bugs and Releases

Data Source: http://Jira.XWiki.org

i) Issue Type

R package used for the analysis: qualityTools

A Pareto chart was used to analyse the issue-type frequencies contained in the XWiki Jira repository. The horizontal axis represents the attributes of interest to the analysis, ordered from the highest to the lowest frequency for XWiki issues as recorded in Jira. This type of analysis helps to indicate the few issue types that cover the majority of cases; the connected line represents the cumulative percentage over the attributes (issue types), so the added contribution of each issue type can be evaluated.
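The computation behind a Pareto chart can be reproduced in a few lines. The deliverable uses the R package qualityTools; the following Python sketch (with invented issue data) shows the same frequency ordering and cumulative-percentage line:

```python
from collections import Counter

def pareto_table(issue_types):
    """Sort issue-type frequencies in descending order and add cumulative percentages."""
    counts = Counter(issue_types).most_common()  # highest frequency first
    total = sum(c for _, c in counts)
    cumulative = 0
    rows = []
    for issue, count in counts:
        cumulative += count
        rows.append((issue, count, round(100 * cumulative / total, 1)))
    return rows

issues = ["Bug"] * 5 + ["Improvement"] * 3 + ["Task"] * 2
print(pareto_table(issues))
# [('Bug', 5, 50.0), ('Improvement', 3, 80.0), ('Task', 2, 100.0)]
```

The third column is the cumulative percentage drawn as the connected line in the chart, from which statements such as "91% of the issues fall into the first three types" can be read off directly.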

Figure 6: Bugs and releases


91% of the issues are related to Bugs, Improvements and Tasks; the most frequent issue type recorded in the XWiki Jira is Bugs, representing 57% of the total.

ii) Current Bug Priority and Resolution

R package: qualityTools

Tabulated statistics: Priority, Resolution

Priority    Cannot Reproduce    Duplicate    Fixed    Invalid    Unresolved    Won't Fix
Blocker            3                7          74        0           0             0
Critical           0                6          18        0          25             0
Major              7               10         199        2         107            12
Minor              2                3          52        1          28             3
Trivial            1                3           6        0           2             1

The cross-tabulation table reports the frequency of each bug resolution provided by XWiki, by the priority level of the bug. It is used to analyse the number of bugs still present and unresolved in XWiki: there are currently 107 Major, 25 Critical, 28 Minor and 2 Trivial bugs that are Unresolved.
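The cross-tabulation itself is a simple counting operation. The deliverable's table was produced in R; the following stdlib-only Python sketch (with invented bug records) shows the same idea, including the extraction of unresolved bugs per priority:

```python
from collections import Counter

def crosstab(records):
    """records: (priority, resolution) pairs -> Counter keyed by the pair."""
    return Counter(records)

bugs = [("Major", "Unresolved"), ("Major", "Fixed"),
        ("Critical", "Unresolved"), ("Minor", "Fixed"),
        ("Major", "Unresolved")]
table = crosstab(bugs)
print(table[("Major", "Unresolved")])   # 2
# Slice out the "Unresolved" column, as done in the deliverable's analysis.
unresolved = {p: c for (p, r), c in table.items() if r == "Unresolved"}
print(unresolved)
```

Applied to the real Jira export, the same slice yields the 107 Major, 25 Critical, 28 Minor and 2 Trivial unresolved bugs reported above.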

The number of bugs currently present and unresolved in the XWiki platform will provide valuable information for new users who would like to establish the current stability of the system.

iii) Unresolved Bugs by Version

R package: qualityTools

Figure 7: Versions affected by unresolved bugs


The Pareto chart was used to analyse the versions affected by unresolved bugs in XWiki. The horizontal axis represents the attributes of interest to the analysis, ordered from the highest to the lowest frequency of XWiki versions that currently contain unresolved bugs. This type of analysis helps to present the versions that cover the majority of cases; the connected line represents the cumulative percentage over the attributes.

53% of the unresolved bugs are located in versions 4.2, 4.3, 4.4, 4.4.1 and 4.5. The version most affected by unresolved bugs is 4.2, which contains 16% of them.

iv) Bugs by Resolution Over Time

R packages: ggplot2, reshape2

The time series plot of bug resolution by XWiki over time provides a graphical representation of the ability of the XWiki core team and community to solve and fix open bugs. The optimal situation is one where fixed bugs always outnumber unresolved bugs over time. Bugs will inevitably appear in the software, and the ability of the team and community to quickly release patches will impact the use of the OSS.

Figure 8: Bugs by resolution over time

Chat Archives

The analysis of chats can be applied to the communities of open source projects. Chat sessions can provide early warning signs of issues related to the code, raised by users or contributors.

Source: http://dev.XWiki.org/XWiki/bin/view/IRC/WebHome

i) Frequencies of Keywords Across Chat Sessions

R packages: tm, Snowball, ggplot2, ggthemes, RWeka, reshape


The XWiki chat sessions were analysed to locate frequently used keywords that could provide an indication of bugs or issues within the XWiki platform. The frequency graph highlights cells based on the frequency of a term within a specific chat session. The X axis contains the chat sessions involved in the analysis, the Y axis contains the terms used with the highest frequency across the eight chat sessions, and the legend indicates the colour coding for the frequency levels of the terms.

In addition to analysing frequency levels, it is also important to spot certain keywords used in the chat sessions, such as "bug" or "issue".

Within the analysis of the eight XWiki chat sessions, the keyword "issue" appears in chat sessions XWikiArchive20130111, XWikiArchive20130116 and XWikiArchive20130117, the keyword "blocker" was used during chat session XWikiArchive20130110, and the term "XWikibot" is used with a relatively high frequency across the analysed chat sessions.

Figure 9: Frequencies of Keywords Across Chat Sessions

ii) Chat Keyword Frequency

R packages: tm, Snowball, ggplot2


Figure 10: Chat Keyword Frequency

Findings: the chat keyword frequency graph presents the terms used with the highest frequency across all eight chat sessions.

Potential keywords of interest used at a high frequency level across all eight chat sessions include "XWikibot", "issue", "version", "release", "error" and "internal".

iii) Association Rules

R packages: tm, arules, MASS, klaR

Table 6: Association results for the term "Bug"

Associated Term     Association Score
yeah                0.97
checking            0.93
display             0.93
etc                 0.93
stuck               0.93
weird               0.93
reported            0.89
idea                0.88
whats               0.87
busy                0.84
care                0.84
changing            0.84
completely          0.84
edit                0.84
expert              0.84
fail                0.84
fine                0.84
implementation      0.84
modules             0.84
proposed            0.84
quickly             0.84
related             0.84
reply               0.84
upgrade             0.84

The association rule approach makes it possible to analyse semantic data by identifying associations between the various terms used within the chat sessions. Here it was used to find terms highly associated with the term "bug". Interesting associations emerging from the analysis include "reported", "implementation" and "upgrade".

Table 7: Association results for the term "Issue"

Associated Term     Association Score
depends             0.95
wrong               0.94
ago                 0.90
broken              0.90
difference          0.90
happens             0.90
locally             0.90
sorry               0.90
unusable            0.90
visible             0.90
level               0.89
anyway              0.88
checking            0.88
cool                0.88
email               0.88
etc                 0.88
reproduce           0.88
stuck               0.88
according           0.87
class               0.87
defined             0.87
hard                0.87
hours               0.87
tests               0.87

The association rule approach makes it possible to analyse semantic data by identifying associations between the various terms used within the chat sessions. Here it was used to find terms highly associated with the term "issue". Interesting associations emerging from the analysis include "locally" and "reproduce".
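An association score of this kind can be computed as the correlation between the occurrence vectors of two terms across the chat sessions; this sketch assumes the R tm package's findAssocs-style correlation as the underlying measure, and the session data is invented for the example:

```python
def term_association(term_a, term_b, doc_term_counts):
    """doc_term_counts: one dict per chat session mapping term -> frequency.
    Returns the Pearson correlation of the two terms' frequency vectors."""
    xs = [d.get(term_a, 0) for d in doc_term_counts]
    ys = [d.get(term_b, 0) for d in doc_term_counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a term that never varies carries no association signal
    return cov / (var_x * var_y) ** 0.5

sessions = [{"bug": 4, "reported": 3}, {"bug": 1, "reported": 1}, {"bug": 0, "reported": 0}]
print(round(term_association("bug", "reported", sessions), 2))  # close to 1.0
```

Terms that rise and fall together across sessions, as "bug" and "reported" do here, receive a score near 1, which matches the intuition behind the high-scoring pairs in Tables 6 and 7.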

Conclusions

Two different forms of analysis were conducted on the data from the XWiki project to establish the risk in implementing the XWiki platform. The first analysis focused on bugs and the ability of the XWiki core team to resolve them across different priority levels, while the second focused on chat sessions between community members and the core team.

The analysis of bugs provides an indication of the stability of the open source platform across its various versions and of the level of commitment to fixing those bugs, especially critical ones. The chat session analysis provides an early warning signal of bugs that may potentially appear in the system.

Both studies relied on readily accessible data from the XWiki team; comparable data should therefore also be accessible for other open source projects.