
Quality Improvement with Focus on Performance in Software Platform Development

Enrico Johansson

Department of Communication Systems


ISSN 1101-3931
ISRN LUTEDX/TETS–1074–SE+139P
© Enrico Johansson

Printed in Sweden
E-kop
Lund 2005


To my family and friends


This thesis is submitted to Research Board FIME - Physics, Informatics, Mathematics and Electrical Engineering - at Lund Institute of Technology, Lund University in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Engineering.

Contact information:

Enrico Johansson
Department of Communication Systems
Lund University
Box 118
SE-221 00 LUND
Sweden

Phone: +46 46 222 04 89
e-mail: [email protected]


Abstract

Platform development provides software organisations with means to quickly respond to changing consumer needs. Product reuse and improved development efficiency can be achieved if platform development is introduced.

A major challenge when using software platforms to produce a variety of products is to keep a high quality of the platform throughout the development of the products. It is therefore essential to monitor, control and explore quality attributes when designing and managing the platform.

The thesis presents a number of approaches to support quality improvements in software platform development. Empirical methods, i.e. case studies and surveys in real industrial settings together with a controlled experiment, are used to investigate the introduced approaches. Approaches directed to both quality improvement in general and improvement with focus on software performance are introduced.

The thesis introduces an approach to find process improvements by benchmarking the platform management process used in the organisation. The benchmarking approach is evaluated in a case study involving two different companies, providing gains for both, according to their own evaluation. A measure for tracking degradation in software product lines is introduced in the thesis. The measure is validated with data from different versions of a commercial software platform.

A qualitative methodology is introduced to survey and tailor processes to improve management of software performance. The methodology was used in a company and provided valuable data for process improvement. In a controlled experiment, we validate different methods using subjective estimations of software performance. The result shows that a method relying on data from prior platforms improves the estimation of software performance. Case studies are carried out with software performance measurements from a commercial software platform. Principles for storing and using the measurements related to software performance estimations are presented and evaluated. Also, it is shown how a simple performance model, parameterized with trace files, provides useful support for estimating software performance in software platform development.


Acknowledgments

I am very grateful to Dr. Martin Höst and Dr. Fredrik Wartenberg for their mentoring and friendship during the work with the thesis. Also a sincere thanks to Professor Per Runeson and Dr. Christian Nyberg for valuable discussions and contributions.

Many thanks to all co-authors and everyone who has contributed to the research presented in this thesis. A special thanks to Dr. Fredrik Dahlgren, Dr. Johan Eker, Dr. Joakim Persson and Dr. Anders Wesslén for their support during the time spent in the research group at Ericsson AB. I also extend a special thanks to Dr. Lars Bratthall and Professor Claes Wohlin for good collaboration in the early part of the thesis.

A warm thanks to all the current and former colleagues at the Department of Communication Systems for providing a stimulating and fun environment during the work with the thesis, and for just putting up with me in general. The doors in the department have always been exceptionally wide open, thus creating a good ground for exciting and illuminating discussions whenever I needed encouragement and new impulses. In particular I would like to thank Professor Ulf Körner, the head of the department, for making this possible. Finally, I would like to thank my family and friends for their unconditional support and encouragement during this time.

Enrico Johansson
Lund, April 2005


Table of Contents

Abstract

Acknowledgements

List of papers

Related publications

Introduction
1 Software platforms
2 Software quality
3 Software performance
4 Research methodology
5 Research result
References

PAPER I: Benchmarking of Processes for Managing Product Platforms - a Case Study
1 Introduction
2 Benchmarking methodology
3 Case study
4 Conclusions
References


PAPER II: Tracking Degradation in Software Product Lines through Measurement of Design Rule Violations
1 Introduction
2 Design rules in platform development
3 A measure of degradation
4 Usage implications of the measure
5 The case study
6 Conclusion
7 Acknowledgements
References

PAPER III: A Qualitative Approach to Tailor Software Performance Activities
1 Introduction
2 Method
3 Case study
4 Conclusions
References

PAPER IV: Performance Prediction Based on Knowledge of Prior Product Versions
1 Introduction
2 Research design
3 Analysis of results
4 Conclusions
5 Acknowledgment
References

PAPER V: Proposal and Evaluation for Organising and Using Available Data for Software Performance Estimations in Embedded Platform Development
1 Introduction
2 Research Methodology
3 Results from the archive analysis and interviews
4 Proposal for organising data - a software performance database
5 Usages for EMP
6 Restrictions of the proposal
7 User Experiences
8 Conclusion and Future work
9 Acknowledgment
References

PAPER VI: Modelling Choices When Using Trace File Data to Parameterize a Software Performance Model
1 Introduction
2 Platform development
3 Performance model
4 Research Method
5 Results
6 Conclusions
7 Acknowledgment
References


List of papers

The papers included in this thesis have had small changes and minor improvements, compared to the published versions. For example, text formatting has been changed to provide a common layout of all material. The following papers are included in this thesis:

I Benchmarking of Processes for Managing Product Platforms - a Case Study
Martin Höst, Enrico Johansson, Adam Norén and Lars Bratthall
IEE Proceedings-Software, Volume 149, Number 5, Pages 137–142, The Institution of Electrical Engineers, 2002

II Tracking Degradation in Software Product Lines through Measurement of Design Rule Violations
Enrico Johansson and Martin Höst
Proceedings of 14th International Conference on Software Engineering and Knowledge Engineering, Pages 249–254, ACM Press, 2002

III A Qualitative Methodology for Tailoring SPE Activities in Embedded Platform Development
Enrico Johansson, Josef Nedstam, Fredrik Wartenberg and Martin Höst
To appear in Proceedings of 6th International Conference on Product Focused Software Process Improvement, Springer-Verlag, 2005

IV Performance Prediction Based on Knowledge of Prior Product Versions
Martin Höst and Enrico Johansson
Proceedings of 9th European Conference on Software Maintenance and Reengineering, Pages 12–20, IEEE Computer Society, 2005

V Proposal and Evaluation for Organising and Using Available Data for Software Performance Estimations in Embedded Platform Development
Enrico Johansson and Fredrik Wartenberg
Proceedings of 10th IEEE Real-Time and Embedded Technology and Applications Symposium, Pages 156–163, IEEE Computer Society, 2004

VI Modelling Choices When Using Trace File Data to Parameterize a Software Performance Model
Enrico Johansson, Fredrik Wartenberg, Martin Höst and Christian Nyberg
Submitted to Information and Software Technology, Elsevier Journals, April 2005


Related publications

The following papers are related but not included in the thesis:

VII The Importance of Quality Requirements in Software Platform Development - A Survey
Enrico Johansson, Martin Höst, Anders Wesslén and Lars Bratthall
Proceedings of 34th International Conference on System Sciences, Volume 9, Pages 9057–9067, IEEE Computer Society, 2001

VIII Is a Design Rationale Vital when Predicting Change Impact?
Lars Bratthall, Enrico Johansson and Björn Regnell
Proceedings of 2nd International Conference on Product Focused Software Process Improvement, Volume 1840, Pages 126–139, Springer-Verlag, 2000

IX A Systematic Performance Model Generation of a Location-based Service System Using a Layered Queuing Network
Martin Waldén and Enrico Johansson
Proceedings of 3rd Conference on Software Engineering Research and Practise in Sweden, Pages 137–146, 2003


Introduction

During the last decades, software has become an indispensable part of many commercial products. Products of all kinds, from digital cameras and cellular phones to transaction systems, automobiles and aeroplanes, contain more and more software every year. This has meant that software has become a competitive advantage in many industries. Strategic thinking about the functionality, quality and lead-time of the entire product is largely dependent on the methods and tools used in the development and management of the software (Cusumano, 2004).

Building product platforms is one of the methods that has become a crucial element of today's business planning. Product platforms are a set of common components around which a stream of derivative products can be built. The use of platforms as an engineering concept is not new; it consists of building a line of products around a set of shared components. For example, in the case of the Black & Decker power tools (Meyer & Lehnerd, 1997), the shared components are the motor platform and the battery pack platform. Product platforms have also been used with success by the aircraft industry (Meyer & Lehnerd, 1997), car manufacturers (Womack et al., 1991) and manufacturers of consumer electronics (Sanderson & Uzumeri, 1995).

Components that are common for many products and consist of software form a software platform. One of the first documented success stories of using software platforms, that of CelsiusTech Systems AB, is presented in Clements & Northrop (2001). To meet a compressed project schedule, the engineers built a product line around a software platform. Another area where the use of software platforms has gained popularity is in embedded platforms (Sangiovanni-Vincentelli et al., 2004). An embedded platform is a specific type of product platform where a computer is built into the product platform and is not seen by the user as being a computer. Consumer products, such as mobile phones and handheld multimedia devices, are typical examples built on embedded platforms. For embedded platforms, performance is a crucial software quality attribute (Burns & Wellings, 2001). It is crucial due to a variety of reasons, such as the existence of real-time critical tasks, advanced multimedia functionality, as well as cost and energy constraints, to name some. The term software platform can also refer to an application on which other applications can be executed, for example an operating system. This definition is, however, neither used nor further discussed in this thesis. There are risks which must be managed and assessed when using software platforms. A number of product versions are developed based on the same software platform. The platform must therefore be managed and updated according to new requirements if it should be reusable in a series of releases. This means that the platform is constantly changed during its life-cycle.

This thesis focuses on identifying and evaluating methods to improve the quality of a software platform and the products based on the platform. A special focus is put on software performance, which is motivated by its importance in embedded platforms. The intention is that the findings should be general enough to be applied in organisations with similar software platforms. The research objective is to find improvement methods aimed both at product quality itself and at the supporting work processes. Also, the quality improvement methods should take into consideration the needs and possibilities inherent in the development of software platforms. The result of the thesis indicates that, although the problem of maintaining and managing the quality in a software platform is a complex task, some basic approaches are usable.

The thesis is organised into two main parts. The first part contains an introduction, which is divided into sections as follows: In Section 1, software platforms are further introduced. Section 2 presents the concept of software quality, and software performance is discussed in Section 3. The research methodology is summarised in Section 4, and in Section 5 the research results are presented together with suggestions for future work. The second part of the thesis contains six research papers that constitute the main research contribution of the thesis.


Figure 1: An illustration of two product families Fk and Fk+1 that are based on two consecutive releases of a software platform (i.e. Release k and Release k+1).

1 Software platforms

Many products with high requirements on time to market are developed in a series of different versions, which are part of the same product family. That is, one product family can contain versions of the product which differ in functionality and complexity. The advantage of this is that a large portion of the product can be reused in one product family (Parnas, 1976; Weiss & Lai, 1995). To achieve a high degree of reuse and to shorten the time between releases of consecutive product families, different product families could share a common software platform, see Figure 1. The intention is that reuse, within product families and between the different releases of product families, should decrease the development time and the cost of new products. The intention is also that it improves the reliability, since already tested and evaluated parts can be used.

1.1 Processes

However, reuse is difficult to achieve if it is not managed and planned for in earlier products. Software developing organisations need processes that support reuse between different products and between different versions of the same product. This type of process is often included in a product line architecture (Bosch, 2000; Jazayeri et al., 2000; Clements & Northrop, 2001), which is the basis for a number of versions of a product.

A product line architecture is a common architecture for related products or systems developed by an organisation. With this type of process, development of a series of products can be summarised as follows. First a software platform is developed. The architecture of the platform should be general enough to be useful in the releases where it should be used. When the software platform has been developed, a number of product projects can be launched. Every product project results in a product version that may be sold on the market. Normally, some product projects are run in parallel, in order to obtain an appropriate pace of new product releases. Each of the product projects uses the software platform as a basis and adds a set of functions that is specific for the release of that version.

The software platform is by Clements & Northrop (2001) defined as the “assets that form the basis of the software product line”. The development of software platforms is, together with product development and management, identified as an essential activity in the successful scoping of a software product line. The product line development depends on three outputs described as the product line scope, the production plan and the software platform.

1.2 Stakeholders

The development of a platform and the products based on the platform is often organised and performed in a number of separate teams. One team manages the development of the software platform, and the product projects are run by a number of project teams. The objective of the project teams is to develop a new release with a given release date to a given cost. The objective of the platform team is to manage the software platform and to support the different product projects with functionality packaged in the platform. This means that each of the product projects has a number of requirements on the platform, and the objective of the platform team is to provide a platform that meets these requirements.

Typically, there are substantial investments made in the development of a software platform. Therefore, there is a large incentive to use the current platform for as many product versions and product families as possible. Ideally, the requirements of all product projects on the software platform coincide and the platform team can develop a platform that can be used directly by all development teams. However, for a number of reasons the platform must be enhanced and changed during its lifetime. New requirements are constantly identified, and functions implemented in one product project should often be part of future products. This means that the software platform should be constantly enhanced and changed based on the requirements from the product teams (i.e. platform evolution).


To manage constant changes a number of design rules and design constraints may be formulated. The goal is to limit the loss in quality caused by the changes. Such loss of quality would make the reuse of the product line between two product families and within the product family very difficult to achieve.

1.3 Software architecture

During the early development phases of a software platform, possible architecture solutions are explored and a preferred solution is chosen (i.e. system design). This is done by relying on data from documents that are present in early phases of the platform development (e.g. use cases from the requirement phase or design proposals from the design phase). To succeed in this, the qualitative and quantitative knowledge present during the system design phase of platform development must be characterised. Such knowledge may consist of parameters describing relevant attributes and constraints of the platform. The system design of the platform software must consider two issues. One issue is how to map functional blocks to components in the architecture (i.e. functional partitioning). Another issue is how to design the software architecture itself (i.e. architecture evaluation).

The software platform can be modeled by a set of architecture elements, each of which behaves according to specific characteristics and dependencies towards the others. Four typical architecture elements suitable for system design are components, connectors, interfaces and design rules. Each of the elements is defined in Table 1.

Table 1: Different software architecture elements.

Architecture Element Description

Components Elements that contain a selection of functions. A software platform can contain one or several components.

Interfaces Elements that enable inter-component communication.

Connectors The mechanisms used for the communication between the components.

Rules Descriptions of the constraints on the usage of components, interfaces and connectors.
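As an illustration only (not part of the thesis), the element types in Table 1 can be captured in a small data model on which early system-design analyses could operate; all class and field names below are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """Architecture element containing a selection of functions."""
    name: str
    functions: list = field(default_factory=list)

@dataclass
class Interface:
    """Element that enables inter-component communication."""
    name: str
    provider: str                 # component that exposes the interface

@dataclass
class Connector:
    """Mechanism used for communication between two components."""
    kind: str                     # e.g. "message queue" or "function call"
    source: str                   # requesting component
    target_interface: str         # interface of the providing component

@dataclass
class DesignRule:
    """Constraint on the usage of components, interfaces and connectors."""
    description: str

@dataclass
class PlatformModel:
    """A software platform described by the four element types of Table 1."""
    components: list = field(default_factory=list)
    interfaces: list = field(default_factory=list)
    connectors: list = field(default_factory=list)
    rules: list = field(default_factory=list)
```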

The problem of functional partitioning in the early phases can be defined as follows: given different functionality and a pre-defined system architecture, identify which of the different components of the system architecture should be used to implement the functionality. Another important activity in the early development phases is the evaluation of different architecture alternatives (Clements et al., 2002). The evaluation is based on identifying the desired combination of architecture elements for the system. Both functional partitioning and architecture evaluations are applicable when developing different versions of the platform as well as when upgrading the platform itself. It is normally not economically possible to explore all possible design solutions by implementing all the possible design combinations (the design space) of the system. Instead, design decisions can be based on estimating the quality attributes of different design alternatives that provide the desired functionality. The estimations can be made by characterising the functionality, the software architecture and the hardware architecture. The characterisation consists of finding the relevant parameters and appropriate values for different design alternatives.

1.4 Platform evolution

New product requirements make an evolution of the platform necessary. The evolution of the platform can be seen as updating and upgrading the functionality in the platform. This evolution continuously puts new requirements on the platform and indirectly on the software architecture. In order to address the new requirements, the design of the architectural mechanisms might be changed. These changes have an impact on several quality attributes in the system. Predictions of how quality attributes are affected are therefore an important input when deciding on what and how to change the architecture. The desired design objectives in the early phases can be supported by modelling and analysing the system architecture.

Since the platform is large and complex, the overall software structure, or software system, becomes a central design problem. The system architecture should therefore make it possible to analyse important quality properties. At the same time, it should provide a model that suppresses implementation details, allowing the system designer and system architect to concentrate on the analysis and decisions that are most crucial in structuring the system to meet its functional and quality requirements.

A model containing the elements presented in Table 1 can be used to highlight the correspondence between elements of the platform and system-level properties (e.g. scalability, capacity, throughput, maintainability, etc.).


Table 2: Quality factors (derived from McCall et al., 1977).

Product Operation: Usability, Integrity, Efficiency, Correctness, Reliability
Product Revision: Maintainability, Testability, Flexibility
Product Transition: Reusability, Portability, Interoperability

These properties can be used for system-level analysis, for example the prediction of performance and other quality attributes. Different notations can be used in a system design activity; UML (Rumbaugh et al., 1998) is for example widely used both in industry and academia. Use case maps (UCM) have also been proposed to describe the behaviour of software in system design (Buhr & Casselman, 1996; de Bruin & van Vliet, 2003).

2 Software quality

Product quality must be in place and continuously improved if successful product manufacturing is to be achieved. However, the intangible facet of quality makes the improvement of it a challenging task. To deal with this, different quality models and quality improvement techniques have been developed (Bergman & Klefsjö, 2002). When these models and techniques are used in the software domain, it must be considered that software development is mostly a development skill and not a manufacturing skill. General quality improvement techniques and models used in the manufacturing industry cannot be used directly in software development.

The discipline of software quality engineering has set focus on both producing new models and modifying traditional models for the software industry. An established software quality model is for example the McCall quality model (McCall et al., 1977). The model identifies a number of quality factors from three different viewpoints (operation, revision, transition) of using a software product. Table 2 illustrates the model.

Another established model which provides a framework for evaluating software quality is ISO/IEC 9126-1 (2000). It defines a quality model with six main quality characteristics and suggestions of quality sub-characteristics for each of the six characteristics (Table 3).


Table 3: Quality characteristics and sub-characteristics defined in ISO/IEC 9126-1 (2000).

Characteristics Sub-characteristics

Functionality Suitability, Accuracy, Interoperability, Security, Functionality compliance

Reliability Maturity, Fault tolerance, Recoverability, Reliability compliance

Usability Understandability, Learnability, Operability, Attractiveness, Usability compliance

Efficiency Time behaviour, Resource utilisation, Efficiency compliance

Maintainability Analysability, Changeability, Stability, Testability, Maintainability compliance

Portability Adaptability, Installability, Co-existence, Replaceability, Portability compliance

2.1 Quality prioritization

Software platforms are used when striving towards maintaining a number of quality factors in a series of products. Some of these quality factors are concerned with the ability of the software to function as a software platform. Other factors are concerned with quality requirements from the domain of the products that are based on the platform (e.g. real-time, embedded, safety critical, enterprise, etc.). Therefore, a software platform has several qualities, such as its maintainability, its efficiency, etc.

For a quality improvement initiative to be successful, the development organisation as a whole must agree upon which quality attributes are important to focus on. In an organisation which uses a software platform approach, there are several stakeholders involved in determining the qualities needed in the software platform and the products based on the platform.

Three typical stakeholders in the early development phases are software developers, system developers and product managers. There is a risk that these different stakeholders have different views on what the appropriate quality requirements are. This risk can be explained by the fact that the three groups normally work at different architectural aggregation levels, where product managers work at the highest level and software developers at the lowest level.

Different stakeholders might prioritize various quality requirements with impact on the software architecture differently, despite the organisation's goals being the same for all stakeholders (Johansson et al., 2001). This risk holds true both for the development of a software platform and for the use of a software platform. If the stakeholders have different views on what qualities are expensive to create in a software platform, and which have a positive impact on the creation of the products based on the platform, there is a risk that the software platform is sub-optimised.

The difference in prioritizing may lead to erroneous balancing of qualities when developing a software platform. It is therefore important that all the stakeholders understand the prioritization of the quality attributes made for the software platform. Measures must be taken in order to create a software development environment with a good foundation for achieving a mutual understanding of the challenges the different stakeholders are faced with.

An example of a measure that can be taken is to ensure that a software platform development project is not part of a more market-oriented project, as such a project can have very hard lead-time requirements. If a platform is developed as part of a market-oriented project, it is likely that either the platform does not become reliable, or other aspects of the product may suffer (Paper VII).

Also, adequate metrics for the impact of qualities in a software platform should be introduced in the organisations. A software measurement process can be initiated to ensure that the impact of different qualities is made clear to the stakeholders in software platform development in advance. In addition, a feedback loop from the use of the software platform to the development of the next such product should be deployed.

2.2 Software quality improvement

There exist several established techniques for assuring and improving the quality of software development. A list of such techniques, without pretending to be complete, can be summarised as follows:

• Testing

• Inspections and Reviews

• Software Process Assessment

• Measurements and Estimations


• Benchmarking

However, other techniques exist that are customised for the development of software platforms and products based on software platforms. For example, in Bosch (2000) it is suggested to use architecture transformations as a means of improving the quality of a software product line. Four different types of transformations are introduced (imposing an architecture style, imposing an architectural pattern, applying a design pattern and converting quality requirements to functionality). Normally it is necessary to use a combination of these four different types to obtain the required quality improvement. The transformations can be further specialised to improve the quality of software product lines. The specialisation is done in order to achieve three crucial aspects of a product line (variability, optionality and conflict resolution). The variability aspect is needed when the architectural solution does not satisfy all the requirements of the product line. An optionality aspect is needed to enable the inclusion of optional functionality in the products. Finally, the third aspect concerns the conflicts that can arise in a software product line.

Variability in product lines is brought up as an essential quality enabler for software platforms (Svahnberg, 2003). Mechanisms such as a service component framework and a requirement definition hierarchy are proposed as a means of achieving the variability. Product line management deals with the resource planning, co-ordination and supervision of activities. The management must be committed to both the technical and organisational levels of a software product line. Techniques for collecting metrics and tracking data are also discussed as means to manage the product line process. Clements & Northrop (2001) present a part of the extensive work done by the Product Line Initiative at the Software Engineering Institute.

In Bosch (2000), different examples of techniques for improving the run-time quality attributes (i.e. performance and reliability) of software product lines are presented. The exemplified techniques are concerned with caches, memory management, indirect calls and wrappers.

3 Software performance

Performance is an important quality attribute for the software in an embedded system, as for example a software platform for handheld communication devices. Focus should, therefore, be put on optimising the performance by using the hardware resources as efficiently as possible. A way of setting focus on this issue in the early development phases is to take performance into consideration when designing the architecture of the system.


Table 4: Different performance characteristics of a software system.

Characteristic Description

Response-time (Latency) The time interval during which the response to an event must be processed.

Throughput The number of event responses that have been completed over a given interval (Lazowska et al., 1984). In some cases it is however not sufficient to just specify a processing rate. The observation intervals should also be specified, or the complete distribution of the processing rate over time.

Capacity Capacity is a measure of the amount of work a system can process, and is normally defined in terms of throughput. The maximum number of events per unit time that can be achieved for a pre-defined set of events is one definition of capacity. For networks this is called bandwidth and is often expressed in bits per second. Another definition is the maximum achievable throughput without violating required latency requirements, and is defined by Jain (1991) as usable capacity.

Utilisation The average percentage of time a hardware or a software resource is busy during usage.

Also, when designing the architecture to primarily deal with quality attributes other than performance, for example maintainability and reusability, there is a strong cost and real-time incentive to avoid performance penalties in the architecture. There is, therefore, a need to predict the performance of a chosen architecture, or at least a need to compare available architectures for the best performance characteristics.

Software performance is, as any other software quality attribute, an abstraction of an (intangible) property of the software product. In order to measure and estimate a quality attribute, it must therefore be mapped to a quantifiable measure. There exist a number of established definitions of performance. A general definition is given in IEEE 610.12 (1990), which states that performance “is the degree to which a system or component accomplishes its designated functions within given constraints, such as speed, accuracy, or memory usage”. In McCall et al. (1977) and ISO/IEC 9126-1 (2000), efficiency is used to refer to the properties comprised in the definition of performance.

The performance of a system can be characterised by the attributes listed in Table 4.
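As a hedged illustration (not taken from the thesis), the characteristics in Table 4 can be derived from a simple measurement log of request arrival and completion times; the function and variable names below are hypothetical.

```python
def performance_characteristics(events, busy_time, interval):
    """Derive the Table 4 attributes from measured data.

    events    -- list of (arrival_time, completion_time) pairs in seconds
    busy_time -- time the observed resource was busy during the interval
    interval  -- length of the observation interval in seconds
    """
    latencies = [done - arrival for arrival, done in events]
    response_time = sum(latencies) / len(latencies)  # mean latency
    throughput = len(events) / interval              # completed responses per second
    utilisation = busy_time / interval               # fraction of time the resource was busy
    return response_time, throughput, utilisation

# Example: three requests observed during a 2-second interval
events = [(0.0, 0.2), (0.5, 0.9), (1.0, 1.1)]
print(performance_characteristics(events, busy_time=0.7, interval=2.0))
```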


3.1 Software performance models

In order to build a performance model, there is a need to take into consideration how the software is used, how the software itself is designed and how the hardware on which the software is executed is designed. These three prerequisites are by Smith (1990) further categorised into performance objectives, workload specifications, software plans, execution environment, resource requirements, and processing overhead. Smith (1990) also defines performance scenarios, which use use cases (Jacobson, 1992), workload specifications and the software plans as input.

Performance objectives express the quantitative measurements for the system to be modelled. Which measurements are used and how they are defined is decided by the type of application to be modelled and measured. Measurements of any of the attributes in Table 4 could be applicable. For enterprise systems, throughput might be chosen and defined as the number of transactions processed. For real-time systems, response times might be defined by the number of seconds to respond to a request or event. The measurement chosen could also be a resource constraint, defined by the limits on the amount of services offered by hardware. A resource constraint can for example be set on how much of the CPU capacity is used.

The workload intensity specifies the rate at which the software application is used. From a user perspective, the intensity may be expressed either as the arrival rate of requests in usage scenarios, or as the amount of time between the requests. From a system architecture perspective, the workload can be described as the arrival rate of the requests to different components and layers in the system.

The software plans are used to define the execution path(s) of the software. They are by Smith (1990) defined as:

“The software plans should specify the software components that execute, the order in which they execute, and any repetition as well as conditional and/or parallel execution of components for the corresponding workload.”

Although the concept of execution graphs has been proposed by Smith (1990) as a notation for software plans, they are mostly used by performance experts and not widely used in software development. Instead, software plans can be derived from UML (OMG, 2003), SDL (ITU, 1992) or any other notation that models the dynamic aspects of software, even if they do not provide as much support for software performance as execution graphs do.


Performance scenarios define the particular use cases that should be analysed with the performance model. Typically, those use cases that have an impact on performance or have performance requirements are chosen. Use case diagrams are used to describe interactions between the system and its environment or between objects within the system. The use cases are quantified using measures defined in the performance objectives. The software plans are used to connect the use cases to the internal behaviour of the software.

The execution environment describes the platform on which the software is executed. This platform consists of hardware, as for example a CPU, hardware accelerators, databuses, etc. Included in the execution environment is also the system software (the operating system and other utility programs that are used by the application software without being a part of it). The resource requirements specify the amount of service required from devices in the execution environment where the software application is executed.

Processing overhead is a translation of the software resource services onto the execution environment. An overhead specification for a particular software resource type should consider both which devices are used and the amount of service required from each device. For example, a software resource might require service from two different CPUs as well as access to system software in the execution environment.
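To make the combination of workload intensity, resource requirements and processing overhead concrete, the following sketch applies the standard utilisation law (utilisation = arrival rate x demand; see Lazowska et al., 1984) to one performance scenario. It is an illustration under simplifying assumptions, not a method taken from the thesis, and all names are hypothetical.

```python
def device_utilisation(arrival_rate, resource_requirements, overhead):
    """Estimate per-device utilisation for one performance scenario.

    arrival_rate          -- scenario executions per second (workload intensity)
    resource_requirements -- list of (device, service_seconds) pairs taken from the
                             software plan of the scenario
    overhead              -- dict mapping device -> extra service seconds per execution
                             (processing overhead, e.g. operating system services)
    """
    demand = {}
    for device, seconds in resource_requirements:
        demand[device] = demand.get(device, 0.0) + seconds
    for device, seconds in overhead.items():
        demand[device] = demand.get(device, 0.0) + seconds
    # Utilisation law: U = arrival rate * total demand placed on the device
    return {device: arrival_rate * total for device, total in demand.items()}

# Hypothetical scenario executed 5 times per second, touching a CPU and a bus
requirements = [("cpu", 0.010), ("bus", 0.002), ("cpu", 0.005)]
overhead = {"cpu": 0.001}
print(device_utilisation(5.0, requirements, overhead))  # {'cpu': 0.08, 'bus': 0.01}
```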

The definition of processing overhead by Smith (1990) should not be confused with the two more common definitions used in computer science. Processing overhead can refer to the processing time required by system software when executing a specific application. In addition, processing overhead can be used to describe the amount of processing time an additional application will add to the amount already required by the software previously installed.

3.2 Queuing theory

One of the more widely spread techniques to build performance models is based on queuing theory. Queuing systems are defined by Kleinrock (1975) as “any system in which arrivals place demands upon a finite capacity resource”.

Queuing theory can be used to describe real world queues or more abstract queues such as computer systems. Other applications of queuing theory are concurrent processing, process scheduling, client-server systems, and telecommunication systems. The use of queuing networks to model computer time-sharing systems was one of the first steps in making these techniques available for software performance analysis. As software products have become increasingly more complex, other approaches have evolved. One of the most used queuing techniques nowadays to model software systems is the layered queuing network (LQN). A layered queuing network is an extension of the queuing network models (King, 1990; Rolia & Sevcik, 1995; Smith & Williams, 2002). LQN defines a system in terms of requests sent and services given by different hardware and software entities. The entities can be divided into three different categories: client tasks that only request service, client-server tasks that can both receive and send requests, and server tasks that only receive requests. These entities are placed in different layers, where an entity at a higher level is allowed to request service from a lower layer, but not vice versa. Modelling a system as an LQN provides analytical means to estimate the system performance with different software and hardware configurations and different workloads.
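As a simplified illustration of the analytical estimates such queuing models provide, the sketch below gives the metrics of a single M/M/1 queue (Poisson arrivals, exponential service times, one server). This is a textbook building block rather than the layered queuing networks used in the thesis, and the function name is hypothetical.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Analytical metrics for an M/M/1 queue.

    arrival_rate -- mean request arrivals per second (lambda)
    service_rate -- mean requests the server can complete per second (mu)
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    utilisation = arrival_rate / service_rate              # rho = lambda / mu
    number_in_system = utilisation / (1.0 - utilisation)   # L = rho / (1 - rho)
    response_time = 1.0 / (service_rate - arrival_rate)    # R = 1 / (mu - lambda)
    return utilisation, number_in_system, response_time

# Example: 80 requests/s offered to a server that can handle 100 requests/s
print(mm1_metrics(80.0, 100.0))  # (0.8, 4.0, 0.05)
```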

4 Research methodology

Research can be defined as the method of investigation that, if results are obtained and if correctly undertaken, builds knowledge. The investigation done represents a systematic investigation of facts about a certain subject. The research can be performed as either applied research or basic research. Whereas basic research attempts to expand the limits of knowledge, applied research attempts to find the solution to a specific problem (Robson, 2002). Research can also be classified in many other ways. For example, a distinction between quantitative and qualitative research (Seaman, 1999) can be made.

The research presented in the thesis, which can be classified as applied research, has been performed to improve the development of software platforms. The focus in this section is on what methods were used to explore and test the research questions. At the end of the section, the validity and industrial application of the thesis are discussed.

The research questions pursued in the thesis are the following:

Q1 How can the development and management processes be improved to enhance the quality of a software platform?

Q2 How can the development and management processes be improved to enhance the quality of products built on a software platform?

Q3 How can the development and management processes be improved to enhance the performance of a software platform?


Q4 How can the development and management processes be improved to enhance the performance of products built on a software platform?

4.1 Empirical research

Experiments, surveys and case studies have been used to validate the results of the research. The methods are by Robson (2002) summarised as:

Experiment: An experiment is performed to measure the effects of manipulating one variable on another variable. A typical feature of an experiment is that samples of individuals from known populations are studied. The samples are then allocated to different experimental conditions. The conditions are then altered by making changes to one or more variables. Normally, measurements can be performed on only a small number of variables, while at the same time exercising control over a larger number of other variables. The analysis of an experiment usually involves hypothesis testing.

Survey: A survey is performed by collecting information in a standardised form from groups of people. A typical feature is that samples of individuals from known populations are studied. A relatively small amount of data is collected in a standardised way from each individual. A survey usually employs questionnaires or structured interviews to collect the data.

Case study: A case study is performed by studying and collecting detailed knowledge about a single ‘case’ or a small number of related cases. A typical feature is to investigate a single case (or a small number of related cases) of a situation. The case can concern an individual or a group. The study of the case is done in its context. A range of data collection techniques, including observations, interviews and documentary analysis, can be used.

Simulation: Simulation is the activity of designing a model of a real or fictive system and carrying out experiments on that model. Simulations are often referred to as being of a continuous or discrete-event type (Law & Kelton, 1997; Banks et al., 2004). In continuous simulation models, the system state is changed continuously over time, whereas in discrete-event simulations the state changes at discrete points in time and well-defined events are executed. The discrete-event type is normally used in simulation models of computer systems.


Discrete-event models can be developed using two different designs (Law & Kelton, 1997; Banks et al., 2004): either a process-oriented or an event-oriented design can be used. In a process-oriented simulation model, each simulation activity is modeled by a process. A process consists of a set of related events that are grouped together. This process concept is similar to the one used in an operating system. The simulation model is designed as a set of interacting processes. The event-oriented design is based on direct scheduling and canceling of future events. The system is simulated as a sequence of independent events, which are normally stored in a list in chronological order. After completion of an event, the next event in the list is processed and the completed event is removed from the list.
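The event-oriented design can be illustrated by a minimal sketch (hypothetical, not the simulation kernel used in the thesis): future events are kept in chronological order, the earliest event is removed and executed, and executing an event may schedule new events.

```python
import heapq

def simulate(initial_events, until):
    """Minimal event-oriented discrete-event loop.

    initial_events -- list of (time, handler) pairs; a handler receives the current
                      simulated time and returns new (time, handler) pairs to schedule
    until          -- simulation stop time
    """
    counter = 0                       # tie-breaker so heapq never compares handlers
    event_list = []
    for time, handler in initial_events:
        heapq.heappush(event_list, (time, counter, handler))
        counter += 1
    clock = 0.0
    while event_list:
        time, _, handler = heapq.heappop(event_list)     # next event in chronological order
        if time > until:
            break
        clock = time
        for new_time, new_handler in handler(clock):     # an event may create future events
            heapq.heappush(event_list, (new_time, counter, new_handler))
            counter += 1
    return clock

# Example: a job arrives every second and completes 0.3 seconds later
def arrival(now):
    print(f"{now:.1f}: arrival")
    return [(now + 0.3, completion), (now + 1.0, arrival)]

def completion(now):
    print(f"{now:.1f}: completion")
    return []

simulate([(0.0, arrival)], until=3.0)
```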

Each of the methods used has its advantages and disadvantages. Deciding what method to use is an important step in research design. The choice made will govern the validity expected of the result. It will also govern the conclusions that can be drawn from the result. Therefore, the method chosen must be based on the research questions that are explored. The choices made for each research question are summarised in Section 5.

4.2 Validity threats

To every research study, there are a number of threats to the validity. In this section the validity is discussed based on an often-used list of threats to validity (Campbell & Stanley, 1966). The validity of empirical studies is often evaluated based on four aspects: conclusion validity, internal validity, construct validity, and external validity. These aspects are described in Table 5. In the following subsections, each validity concern is discussed along with a number of considerations that can be seen as threats to the validity of the result presented in the thesis.

Conclusion validity

In this thesis, the low number of people that have been involved in the studies is the most serious threat to the conclusion validity. In addition, a related threat concerns the uncertainty and the dispersion of the participants. It will generally be hard to draw valid conclusions from the study if the participants answer differently and the uncertainty is large. If there are few people, dispersion in the answers is hard to interpret. When designing the research, the goal has been to minimise these threats by using as many participants as possible.


Table 5: Validity threats to the result of empirical studies.

Validity threats Description

Conclusion The conclusion validity concerns whether it is possible to draw statistically significant conclusions from the study.

Internal The internal validity concerns the possibility that the effect (outcome) has been caused by factors unknown to the researchers.

Construct The construct validity concerns whether the measurements and interviews represent the constructs of interest in the study.

External The external validity of a study concerns the ability to generalise the findings.

Internal validity

None of the results in this thesis has been validated by performing a replication in the same environment and with the same instrumentation. Therefore, it cannot be ruled out that the outcome would have been different if the studies had been performed at another point in time or by other researchers. However, great care has been put on the instrumentation to minimise this threat. Care has also been taken when selecting participants for the studies. For example, when groups were compared, attention was paid to ensuring that the members had an equal level of expertise.

Another aspect when selecting participants is that they have always been recruited on a voluntary basis. Knowing or controlling all the factors that affect the research is a truly challenging task. It must however be done to maximise the internal validity of the result. The precautions taken are believed to have had a positive effect in achieving better internal validity.

Construct validity

A threat to the construct validity in the performed studies is that the persons interviewed may have been restricted in their answers, which would mean that the result actually concerns different constructs than the researchers intended to investigate. It is however believed that the participating persons have been speaking openly in the studies performed. They have been guaranteed anonymity and the questions have not been considered of a sensitive nature (at least to the researchers’ knowledge). This, in combination with the fact that the atmosphere at the companies has been open, means that this kind of threat to the construct validity is not believed to be large in the thesis.

A special consideration about the construct validity must be taken when performing simulation studies. The simulation kernel used in this thesis has been thoroughly verified against analytical calculations.

External validity

External validity deals with the reliability of the results with respect to the environments in other companies besides those that participated in the research. It is of course not obvious that the factors affecting the result would be the same in all applicable companies. However, there probably exist companies that are different from the ones studied, as well as companies that are similar to those studied. To gain better external validity, further studies and replications in the area could be carried out.

4.3 Empirical research in industry

To be able to have an impact on an industrial software organisation with the gained results, the research must have high industrial validity. Robson (2002) classifies this validity from a real world perspective as:

Level A: The traditional science approach. The research is of a theoretical nature. Although the focus can be on solving a practical problem, the application of the research is not seen as important and is often left to others to study.

Level B: Building bridges between researcher and user. The researcher believes that the eventual outcome of the research has practical implications, and wants to influence the client with the outcome. The work may be conducted in collaboration with the client and may include giving the client status reports of the ongoing work.

Level C: Research-client equality. The researcher and client discuss the problem areas to investigate in collaboration. The client, the researcher, or the client and researcher in collaboration may identify the research problem. The work must be performed together with the client, where the client can exercise some control over the work done.

Level D: Client-professional exploration. A client requests help from a researcher. The collection of data is minimal and the advice or recommendation is based on the researcher's expertise in the area.

Level E: Client-dominated quest. The client requests help from a specialist or colleague with relevant expertise. The advice or recommendation is given based on current practices or knowledge.

The research questions in this thesis have been investigated with levels of application equal to B and C. The application level for each research question is summarised in Section 5.

4.4 Research interaction between academia and industry

When conducting academic empirical research in interaction with organisations in industry, it is important to understand the underlying needs and possibilities of the parties (i.e. academia and industry).

The interaction between industry and academia has been going on since the very start of the software engineering discipline. There are several examples of industry involvement in academic research projects that have been fruitful for both parties. A win-win situation for both parties is normally the goal for the collaboration between industry and academia. This situation does not arise by itself; on the contrary, in order to make it work there are a number of issues that must be taken into consideration. There are a variety of ways to organise the interaction between industry and academia. Different forms of interaction are, for example:

• Workshops and seminars through university-industry research centers.

• Networks to focus on specific areas and needs of the local industry.

• Industry-sponsored academics (Professors, Ph.D. students, etc.)

• Research projects with industrial collaborators.

In the following sections, incentives for collaboration are presented and practical considerations are discussed in the context of conducting research in industry.

Incentives for a collaboration

In order for any of these collaborations to be perceived as a win-win situation, both parties must have realistic incentives when entering them. Incentives are the motives for collaboration, based on beliefs about the benefits to be gained. The incentives induce people and organisations to behave in a certain way. Incentives for the collaboration between industry and academia can be classified into different categories (Lee, 2000) (see Table 6).


Table 6: Incentives for the collaboration between industry and academia (derived from Lee, 2000).

Incentive   Description

Infrastructure   Incentives such as technology transfer and professional contacts.

Economic   Incentives such as, for example, the trend in recent years that the Swedish government matches the financial commitment from industry with an equal amount.

Honorific   Incentives such as official awards and unofficial public recognition are important driving forces.

Knowledge   Incentives such as training, participation in seminars and workshops, and participation in projects.

One interesting issue for collaboration is to understand whether the incentives for collaboration were realistic. Realistic incentives are defined as incentives that are based on expectations that came true during the collaboration. From the listed categories, three incentives are normally mentioned when discussing the reasons for academia to collaborate with industry. There is the incentive for academia, and in particular for faculty members, to gain practical knowledge to improve their own pedagogical function (knowledge incentive). Another incentive is to support and advance their personal research agenda (honorific and economic incentives). Finally, there is the incentive of entrepreneurship, which means capitalising on one's own research and intellectual property (infrastructure incentive). Lee (2000) presents a detailed list of incentives for all parties.

Practical considerations

In this section, practical considerations of time, management, feedback and intellectual property in research collaborations are discussed. The discussion focuses on performing research in the role of an industrial Ph.D. student. None of the considerations is specifically related to any single research project or case study performed during this work. Instead, it is a summary of guidelines with general applicability.

Time: In a software company with projects with tight deadlines, time is always a scarce resource. There might be numerous meetings and other project duties that must be completed during a day at work. An environment like that is far from optimal from a collaborative point of view. Unfortunately, the described conditions are common among software companies. There are, however, some practical considerations that can be used to make research collaboration smoother to perform. One practical consideration could be to plan the research session to be performed during a period of the project that is relatively calm. A different aspect of time is the duration of the research session: the longer the research session is intended to be, the more in advance it should be planned. Besides working on research projects alone, industrial Ph.D. students have the opportunity to work on short-term industrial projects to gain additional experience. This allows the student to keep in touch with industry practices. Although the intention is for the best, there are difficulties along the way. The main difficulties derive from the different project perspectives: academia stands for the long-term project perspective and industry for the short-term perspective.

Management: When starting a collaboration that involves employees in a company, the consent and support of project managers are required. The consent is required even if the company already funds the research. In addition, in discussions with management it is important to highlight the objective of the research in terms of the specific problems it solves and the value it provides for the company as a whole. Another potential management issue that must be clarified prior to the collaboration is how the researchers' time is managed and prioritized.

Feedback: The companies' commitments to a research project are often made by the management. It is, on the other hand, not the managers but the technical staff that participate in the collaboration. Furthermore, the technical staff are the ones that devote part of their time to the research collaboration. Therefore, it is important to realise that the technical staff must feel that participating in the research gives something of value to them. When the research collaboration solves a specific problem, the answer to the problem is the self-evident feedback. The research collaboration can, on the other hand, have a more general character and look at more long-term aspects. Even if this is the case, there are still a number of ways to give feedback to the participants. For example, training sessions can be included in the research; the training can be done before, during or after the research. A seminar and an explanation of the results is the least that should be offered to the participants in the research. If the research is conducted over a long period, it may be wise to give the feedback as intermediate seminars or training sessions.

Intellectual Property: Academia and industry share the common goal of producing intellectual property. There are, however, differences in why this goal is pursued and in how the intellectual property is used. The objective in academia is to use the intellectual property as a tool to advance and disseminate knowledge. The objective in industry is to capitalise on the intellectual property through patents and products. That is, industry has a protective relation to its intellectual property, while academia has the completely opposite relation: in academia, publication of research findings is both encouraged and mandatory.

Even when consideration has been given to these practical problems, the research collaboration might not pay off. The reason could be that the wrong participants have been chosen for the research collaboration. Another reason could be that the company's research strategy does not endorse collaboration with academia.

There is a possibility to gain mutual benefits in an academia-industry cooperation if and only if the threats of having the wrong scope and incentives in mind when starting the collaboration are taken seriously and dealt with appropriately.

5 Research result

The results of the thesis indicate the possibility of quality improvement in several areas of software platform development. The main findings can be summarised as a number of contributions that are presented in the papers (see Page 1) included in the thesis. Two contributions (Paper I and Paper II) are directed to quality improvement in general and four contributions (Paper III, ..., Paper VI) are directed to improvement with focus on software performance.

For the papers where the author of the thesis is not the main author, the contribution of the author can be demarcated as follows. In Paper I, the main contribution lies in the compilation of the benchmarking instrument, writing the description for one of the companies, carrying out the interviews in both companies and, together with the co-authors, contributing to the methodological parts. In Paper IV, the main contribution lies in collecting timing information from the real product and, together with the co-author, contributing to the methodological parts.

5.1 Paper I

An approach to find process improvements by benchmarking the platform process has been proposed and evaluated.

An approach to benchmark two software platform organisations is proposed and proven feasible in a case study. The benchmarking approach consists of two parts. One part is a questionnaire with eight questions customised to elicit improvements. The other part consists of letting the organisations review descriptions of each other's answers to the questionnaire. The majority of the participants regarded the questionnaire as positive. The result indicates that it is possible to use the questionnaire for collecting information, followed by reviews, as a benchmarking approach.

5.2 Paper II

A new measure to track the quality of a software platform and of products built on a software platform has been proposed and evaluated.

A measure for product line degradation is proposed and it is shown that the measure is based on sound theoretical properties. In a case study, the measure is evaluated and there are indications that the measure can be used to track product line degradation based on measurements on the software platform.

5.3 Paper III

A qualitative methodology to survey and tailor processes to improve management of software performance is proposed and evaluated.

A qualitative methodology to survey and tailor process activities related to software performance is presented and evaluated in a case study. The company considered the results both valuable and relevant, showing that the presented methodology gives highly valuable input for tailoring an SPE (software performance engineering) process. In addition, a number of observations and suggestions related to an established and general SPE process have been brought forward.


5.4 Paper IV

It is shown that a method relying on data from prior platforms improves the estimation of software performance compared to not using prior information.

Two methods for subjective predictions of performance are investigated. With one of the methods, experts estimate the relative resource usage of software tasks without using any knowledge of earlier versions of the product; with the other method, experts use their experience and knowledge of earlier versions of the system. The result shows that available data on the current product is valuable input for the subjective estimations.

5.5 Paper V

Principles for storing and using measurements related to software performance estimations are proposed and evaluated.

A method to use and organise performance data is introduced. A database tool is proposed and implemented in order to simplify access and availability of software performance data for an entire development organisation. The tool allows organising, presenting, and to some extent evaluating the existing, heterogeneous data for software performance estimations.
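As an illustration of the underlying principle (the schema, names and values below are hypothetical and are not taken from the tool described in Paper V), heterogeneous measurements can be stored in one table, tagged with the platform version, the software task and the resource they refer to, so that estimations can query them uniformly:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurement (
        platform_version TEXT,   -- e.g. '2.0', '2.1'
        task             TEXT,   -- the software task or use case measured
        resource         TEXT,   -- e.g. 'cpu_time', 'memory', 'response_time'
        value            REAL,
        unit             TEXT,
        source           TEXT    -- e.g. 'trace file', 'benchmark', 'expert estimate'
    )
""")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2.0", "send_message", "cpu_time", 12.4, "ms", "trace file"),
        ("2.1", "send_message", "cpu_time", 14.1, "ms", "trace file"),
        ("2.1", "send_message", "cpu_time", 15.0, "ms", "expert estimate"),
    ],
)
# Example query: measured CPU time per platform version for one task,
# which can serve as input when estimating the next version.
for row in conn.execute(
    "SELECT platform_version, AVG(value) FROM measurement "
    "WHERE task = 'send_message' AND resource = 'cpu_time' "
    "AND source = 'trace file' GROUP BY platform_version"
):
    print(row)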

5.6 Paper VI

It is shown which choices should be made regarding the distributions of interarrival times and execution times when a simple performance model, parameterized with trace file data, is used.

The model that gives the lowest estimation error is the one where the execution times are modelled with samples drawn from trace files. This holds irrespective of whether the interarrival times are modelled with an exponential distribution or with samples from trace files. An implication of this result is that it is more important to model the execution times with real values than the interarrival times.
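A minimal sketch of such a parameterisation is given below. The data and distributions are invented for illustration (a lognormal stand-in for trace-file execution times) and the sketch is not the model or trace data used in Paper VI; it only shows how the same single-server simulation can be fed either with execution times resampled from a trace or with an exponential fit to the same trace.

import random
import statistics

def simulate(interarrivals, executions):
    """FIFO single-server queue; returns the mean response time."""
    arrival, server_free_at, total = 0.0, 0.0, 0.0
    for ia, ex in zip(interarrivals, executions):
        arrival += ia
        start = max(arrival, server_free_at)
        server_free_at = start + ex
        total += server_free_at - arrival
    return total / len(interarrivals)

rng = random.Random(0)
# Stand-in for execution times read from a trace file (heavy-tailed on purpose).
trace_exec = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
n = 50000
rate = 1.0 / (2.0 * statistics.mean(trace_exec))   # keep the server about 50% utilised
interarrival = [rng.expovariate(rate) for _ in range(n)]

resp_trace = simulate(interarrival, [rng.choice(trace_exec) for _ in range(n)])
resp_expo = simulate(interarrival,
                     [rng.expovariate(1.0 / statistics.mean(trace_exec)) for _ in range(n)])
print("execution times resampled from the trace:", round(resp_trace, 2))
print("execution times from an exponential fit: ", round(resp_expo, 2))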

5.7 Summary

The empirical investigation methods used and the industrial applicability of each research question are summarised in Table 7.


Table 7: Summary of the research questions and the empirical research methods used for each research question.

Research question   Paper        Empirical method                  Industrial applicability

Q1                  I            Case study, Survey                Level B, C
Q2                  II           Case study                        Level B
Q3                  III, V       Case study, Survey                Level B, C
Q4                  IV, V, VI    Case study, Survey, Experiment    Level B, C

5.8 Future work

The research contributions provided in this thesis can be further elaborated and provide a starting point for further work. Some possible areas for further work are described in the following paragraphs.

Further research can be done in the area of tracking product line degradation (Paper II). A significant challenge for such tracking is to depict the degradation quantitatively and to relate the quantitative values to actions performed in the platform development process. The approach can also be investigated as a means to track the degradation of specific quality attributes (e.g. degradation of maintainability, degradation of performance, degradation of usability, etc.).

Further research can be done in benchmarking the platform process (Paper I). For example, the real improvement effect of the benchmarking can be studied and the use of the benchmarking instrument may be extended. It would be interesting to investigate whether the benchmarking instrument can be used for self-assessment or whether a third party could perform the assessment. The real improvement effect of the qualitative approach (Paper III) could also be a subject for further study.

Further research can also be done to investigate methods for estimating the uncertainty of different individual subjective estimations (Paper IV). It would be an advantage to receive an indication of the uncertainty of an estimation already when the estimation is made.

Further research can be done on methods for combining subjective estimations (Paper IV) with performance data from current products (Paper V). The estimations and available data would then be used as input to simulations (Paper VI) to estimate the system performance of software platforms and of products based on software platforms.

Further research can be done on the financial benefits gained when using a database approach for collecting performance data (Paper V). Also, investigations of how to add support for more complex software models and hardware models can be made.

References

Banks, J., Carson, J., Nelson, B. L., & Nicol, D. (2004). Discrete-event system simulation (4 ed.). Prentice Hall.

Bergman, B., & Klefsjö, B. (2002). Quality from customer needs to customer satisfaction (2 ed.). Studentlitteratur AB.

Bosch, J. (2000). Design and use of software architectures: Adopting and evolving a product-line approach. ACM Press/Addison-Wesley.

Buhr, R., & Casselman, R. (1996). Use case maps for object-oriented systems. Prentice Hall.

Burns, A., & Wellings, A. (2001). Real-time systems and programming languages (3 ed.). Addison-Wesley.

Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Houghton Mifflin Company College Division.

Clements, P. C., Kazman, R., & Klein, M. (2002). Evaluating software architectures: Methods and case studies. Addison-Wesley.

Clements, P. C., & Northrop, L. (2001). Software product lines: Practices and patterns. Addison-Wesley.

Cusumano, M. A. (2004). The business of software: What every manager, programmer, and entrepreneur must know to thrive and survive in good times and bad. Free Press.

de Bruin, H., & van Vliet, H. (2003). Quality-driven software architecture composition. Journal of Systems and Software, 66(3), 269–284.

IEEE 610.12. (1990). IEEE standard glossary of software engineering terminology. Institute of Electrical and Electronics Engineers.

ISO/IEC 9126-1. (2000). Software engineering – Product quality – Part 1: Quality model. ISO/IEC.

Jacobson, I. (1992). Object-oriented software engineering: A use case driven approach. Addison-Wesley.

Jain, R. (1991). The art of computer systems performance analysis. Wiley.

Jazayeri, M., Ran, A., & van der Linden, F. (2000). Software architecture for product families: Principles and practice. Addison-Wesley.

Johansson, E., Höst, M., Wesslén, A., & Bratthall, L. (2001). The importance of quality requirements in software platform development - a survey. In Proceedings of the 34th International Conference on System Sciences (Vol. 9, pp. 9057–9067).

King, P. J. (1990). Computer and communication system performance modelling. Prentice-Hall.

Kleinrock, L. (1975). Queueing systems, volume I: Theory. Wiley Interscience.

Law, A. M., & Kelton, W. D. (1997). Simulation modeling and analysis. McGraw-Hill.

Lazowska, E. D., Zahorjan, J., Graham, G. S., & Sevcik, K. (1984). Quantitative system performance. Prentice-Hall.

Lee, Y. S. (2000). The sustainability of university-industry research collaboration: An empirical assessment. The Journal of Technology Transfer, 25(2), 111–133.

McCall, J. A., Richards, P. K., & Walters, G. F. (1977, November). Factors in software quality (three volumes) (Tech. Rep. No. RADC-TR-77-369). Rome Air Development Center, NY: General Electric Company.

Meyer, M. H., & Lehnerd, A. P. (1997). The power of product platforms: Building value and cost leadership. Free Press.

Object Management Group. (2003). Unified modeling language, version 1.5. (http://www.uml.org/)

Parnas, D. (1976). On the design and development of program families. IEEE Transactions on Software Engineering, 2(2), 1–9.

Robson, C. (2002). Real world research: A resource for social scientists and practitioner-researchers (2 ed.). Blackwell Publishers.

Rolia, J. A., & Sevcik, K. C. (1995). The method of layers. IEEE Transactions on Software Engineering, 21(8), 689–700.

Rumbaugh, J., Jacobson, I., & Booch, G. (1998). Unified modeling language reference manual. Addison-Wesley.

Sanderson, S., & Uzumeri, M. (1995). Managing product families: The case of the Sony Walkman. Research Policy, 24, 761–782.

Sangiovanni-Vincentelli, A., Carloni, L., Bernardinis, F. D., & Sgroi, M. (2004). Benefits and challenges for platform-based design. In Proceedings of the 41st Annual Conference on Design Automation (pp. 409–414).

Seaman, C. (1999). Qualitative methods in empirical studies of software engineering. IEEE Transactions on Software Engineering, 25(4), 557–572.

Smith, C. U. (1990). Performance engineering of software systems. Addison-Wesley.

Smith, C. U., & Williams, L. G. (2002). Performance solutions: A practical guide to creating responsive, scalable software. Addison-Wesley.

Specification and description language (SDL). (1992). ITU-T Standard Z.100, International Telecommunication Union.

Svahnberg, M. (2003). Supporting software architecture evolution - architecture selection and variability. PhD dissertation, Blekinge Institute of Technology, Ronneby, Sweden.

Weiss, D., & Lai, C. (1995). Software product-line engineering: A family-based software development process. Addison-Wesley.

Womack, J., Jones, D., & Roos, D. (1991). The machine that changed the world. HarperCollins.


PAPER I

Benchmarking of Processes for Managing Product Platforms - a Case Study

Martin Höst, Enrico Johansson, Adam Norén and Lars Bratthall

IEE Proceedings - Software, Volume 149, Number 5, Pages 137–142, The Institution of Electrical Engineers, 2002

Abstract

A case study is presented in which two organisations have participated in a benchmarking initiative to discover improvement suggestions for their processes for managing product platforms. The initiative is based on an instrument which consists of a list of questions; the instrument has been developed as part of this study and contains eight major categories of questions that guide the participating organisations to describe their processes. The descriptions were then reviewed by the organisations cross-wise in order to identify areas for improvement. The major objective of the case study is to evaluate the benchmarking procedure and instrument in practice. The result is that the benchmarking procedure with the benchmarking instrument has been well received in the study. It was therefore concluded that the approach is probably applicable to other similar organisations as well.


1 Introduction

Effective management is important both for the process of developing products that are based on platforms and for the process of developing the platforms themselves. A platform can be seen as the core asset of a product line architecture, a type of architecture which is used in products that come in many versions or families. These families are based on the same platform to reduce cost and shorten lead-times (Bosch, 2000). Software companies should not have to recreate the infrastructure for each new project; instead, the development and the management activities should be based on well-defined processes and a solid product architecture. It is also critical that managers, platform developers and product developers all share the same vision for a software platform.

“Technically excellent product line architectures do fail, often because they are not effectively used. Some are developed but never used or, if used, they are used in an incorrect way.” (Cohen et al., 2000). There are a large number of users of a platform (e.g. software product developers), and there can be several different concurrent product projects using the platform. Management processes are needed to support dependent development efforts in different phases of product projects, platform projects and across the separate development teams involved.

Processes for managing product platforms (product platform management processes, PPMPs) have recently been given much attention, see for example Tersine & Hummingbird (1995), Wheelwright & Clark (1995) and Narahari et al. (1999). In practice, there are, of course, many different processes in place, which means that companies constantly have to evaluate and improve their processes in this area. The objectives for an organisation carrying out a study such as the one presented here are to identify improvement proposals for their PPMPs and, in the long run, to introduce improvements in the process. That is, this kind of study can be part of a software process improvement program.

In this paper, a procedure for carrying out a benchmarking initiative to improve PPMPs is presented and evaluated in a case study.


2 Benchmarking methodology

2.1 Introduction

Software process improvement is important for all software development and management processes. Often, the steps taken in software process improvement are to first carry out an assessment and then, based on the assessment, identify improvement proposals, and finally to introduce, evaluate and tune the changes (Humphrey, 1989). The benchmarking technique presented here is one technique that can be used for assessment and identification of improvement proposals.

Benchmarking (Camp, 1989; Bergman & Klefsjö, 2002) has been used as a general improvement approach in a variety of business areas. The basic idea behind the benchmarking concept is that, for each company, there are a number of other companies that have been working with the same issues and problems. A company that wants to improve in a certain area should therefore identify appropriate companies to compare themselves with and learn from. It is, of course, important that competitors are not chosen. Instead, companies which are working in the same way with similar processes, but with products not competing, should be chosen. In the study that is presented in this paper, the companies have in common that they are developing new products based on a common platform, and that they have a PPMP. Camp presents a benchmarking process using four major steps (Camp, 1989): the first step is ‘planning’, where it is decided what business function or type of product the benchmarking initiative should focus on, what companies should be involved in the initiative and details on how the initiative should be carried out (for example, it is decided how data should be collected during the work); the second step is the ‘analysis’ step, where the actual comparison of the companies is carried out; the third step is an ‘integration’ step, where it is decided and planned how findings from the initiative can be integrated in the current process (it is, for example, important to obtain management and operational acceptance and to communicate the changes to all levels of the organisation); the fourth step is the ‘action’ step, where the changes are implemented and the results are monitored.

The benchmarking approach that is presented and evaluated in this paper concerns the first two steps (planning and analysis) of the process presented above. It is also tailored to be used for comparing PPMPs, i.e. it gives guidance on how to compare two processes for product platform management. After the comparison, it is important to identify improvement actions from the result of the comparison and to actually implement the changes. It would, however, be too much to cover implementation issues in this paper.

There are a number of issues from the philosophy of Total Quality Management (TQM) that are relevant in the area of benchmarking. Benchmarking initiatives can be designed around the key issues of, for example, the Malcolm Baldrige approach (National Institute of Standards and Technology, 2001) or the Capability Maturity Model from the Software Engineering Institute (Paulk et al., 1993). Part of the approach that is presented here is influenced by the method of performing quality evaluations according to, for example, the Malcolm Baldrige approach. However, in this approach the involved organisations are not compared to any general model. Instead, they are compared to each other in areas that are of interest to them.

Organisations can be analysed with respect to a number of dimensions (National Institute of Standards and Technology, 2001; Swedish Institute for Quality, 2000), e.g. with respect to the approach (i.e. the process) that they describe, the deployment of the approach, i.e. to which extent the work according to the approach is performed in the organisation, and the results that the approach provides.

The organisations that participate in this kind of initiative have a number of objectives for participating. First of all, one objective is to receive information on how another company has organised its work in the area; that is, it is a way of receiving improvement proposals from the work of another organisation. Another objective is to receive feedback on their processes from experts in the area. The idea of the approach is that the participating organisations should achieve their objectives by participating in the work, i.e. if two organisations participate, both of them should reach their goals by mutually analysing and commenting on the processes of each other.

2.2 Benchmarking process

In this Section, a benchmarking approach that can be used for analysing PPMPs is presented. The benchmarking approach has been evaluated in a case study, which will be presented in Section 3. The benchmarking process is based on a benchmarking instrument (BI), which is a list of questions that the participating companies use to develop descriptions of their PPMPs. The following benchmarking process can be used in a benchmarking activity with two organisations (A and B):

Step 1: Identify important issues to work with and agree on a final BI. This may, for example, be done by letting the participating organisations review an initial proposal for a BI. In this way, they will be able to add issues that they are interested in and remove issues that they are not interested in. The final BI that was used in the case study is presented in Section “Benchmarking instrument:”.

Step 2: Each organisation develops a presentation of their PPMP according to the BI. This results in one document from each company (PA, PB).

Step 3: Each organisation reviews the presentation written by the other company. The reviews are carried out according to a review template (RT) which is presented in Section “Review of presentation:”. Each review results in a number of comments on the presentation (CP). That is, company A reviews PB, which results in CPB, and company B reviews PA, which results in CPA. The comments on the presentations are sent to the benchmarking coordinator.

Step 4: CPA is distributed to company A and CPB is distributed to company B.

Step 5: A meeting is held with representatives from each company. At this meeting, it is, for example, possible to discuss common improvement proposals.

The procedure may be organised by a coordinator who is responsible for initiating the different tasks, delivering the needed documentation, etc. This is illustrated in Figure 1. The role of the coordinator is refined in Section 3, where the case study is presented. Step 2 of the benchmarking approach is further described in Section “Development of presentation:”, and step 3 is further described in Section “Review of presentation:”.

The idea of carrying out a benchmarking initiative by letting organisations review descriptions of each other is not unique to this approach. The benchmarking instrument that was used was, however, developed especially for this study.

Benchmarking instrument:

In this Section, the benchmarking instrument that has been used is presented. The benchmarking instrument consists of a number of questions that are presented in Table 1. There are questions at two levels, one high level and one more detailed level. For every high-level question there are a number of detailed questions. The benchmarking instrument guides the participating organisations to describe their PPMPs with respect to the following issues:


Figure 1: The benchmarking process (steps 1-4, showing the exchange of BI, PA, PB, RT, CPA and CPB between the coordinator and companies A and B).

Introduction/context: In order to better interpret the answers, this part contains a number of questions that characterise the development organisation in general. The participants are asked to give an overview of the architecture of their product, as well as of their development process. In this way, it is possible to interpret answers in the context of, for example, evolutionary development or more waterfall-model development. These questions are not further analysed in this paper and they are not included in Table 1.

Organisational context for architecture work: Each participating organisation is known to have a group that is responsible for their product architecture. Questions Q1 and Q2 concern the responsible groups' empowerment and responsibilities.

Initial architecture design: This section contains detailed questions regarding architecture development. The questions in this section are intentionally formulated as being open-ended, as it is believed that this will help elicit a broad spectrum of techniques. This is mainly covered by questions Q3, Q4 and Q5.

Architecture use and evolution: At some point, the architecture is no longer only the property of an architecture group, as many people must use it. This section focuses on the deployment and follow-up of the architecture. This is mainly covered by questions Q6, Q7 and Q8.

The benchmarking instrument that was used in the case study guides the participating companies to present their approach in the area.


Table 1: Questions in the benchmarking instrument.

Id Question

Q1 How is the responsibility and empowerment for the group responsible for the platform defined?
Q1a How and by whom is the group responsible for the platform constituted?
Q1b Who decides when the group responsible for the platform should be involved in the decision-making?
Q1c What and who gives the group responsible for the platform the ability to perform?
Q2 Describe the stakeholder interaction.
Q2a To which stakeholders does the group responsible for the platform communicate?
Q2b What commitments are made to different stakeholders?
Q3 Which procedures are used to identify architectural requirements?
Q3a What different types of architectural requirement are identified?
Q3b How are the different types of requirements handled?
Q3c How are the architectural requirements prioritized?
Q4 What design process is used?
Q4a What methods are used when identifying the requirements' impact on the architecture?
Q4b What languages are used in order to design the architecture?
Q4c What granularity does the architecture design cover?
Q4d What technical solutions are used in order to ensure the wanted quality of the architecture?
Q5 Describe how the architecture is validated.
Q5a How is it validated that the architecture meets the identified requirements?
Q6 Describe how the architecture is distributed.
Q6a How is the architecture presented and stored?
Q6b What architecture views are used? Why are they used?
Q6c How are the architecture views presented and stored?
Q7 Describe how it is ensured that software architecture has its self-evident place in the software development process.
Q7a How do you ensure that the architecture rules and visions are used in the rest of the organization?
Q7b How is the feedback on the architecture work collected?
Q7c How is it ensured that the product-line architecture is conceived as positive among the users of the architecture?
Q8 How is the product-line architecture controlled?
Q8a How is it ensured that the architecture has a positive impact on the organization?
Q8b How is it ensured that the correct architecture vision is applied by the architecture group?
Q8c How is the wanted architectural quality ensured when the architecture is evolved?
Q8d How are the causes of product-line architecture erosion identified?
Q8e How is it ensured that the product-line architecture is used in a correct way?


It would be possible to include questions that guide the companies to present other dimensions of their PPMPs, as described in Section 2.1. For example, the results of this approach could be presented. However, in this case there was not enough data available. The results are instead captured in the review questions, as described in Section “Review of presentation:”.

Development of presentation:

The presentation should preferably be developed by persons that are familiar with the organisation, but it can be developed in different ways. It may, for example, be developed by one person that already has knowledge of most of the work in the area and can therefore describe the work directly. It may also be developed by one person that gathers information about the work in a structured way.

In the first case, a presentation can be developed initially and then be internally reviewed by appropriate persons in the organisation. In the latter case, the person that is responsible for developing the presentation may first interview people in the organisation and then develop a presentation that can be internally reviewed by appropriate persons in the organisation.

In the case study, presented in Section 3, it was decided that the presentation should be written in natural language, because this was considered the most natural form for the questions that were defined.

Review of presentation:

There are two major objectives of the reviews: the first is that the organisations that have developed the presentations should receive feedback on their work, and the second is that people in the reviewing organisations should get information from another organisation. Both of these objectives benefit from letting many people review the presentations. However, for practical reasons, such as limited available effort, there will be a limited number of people that are able to review the presentations. It is important to note issues of confidentiality, which may also limit the number of people that review the presentations.

A likely procedure is that a number of people review each presentation and that one person is responsible for summarising the findings and producing a report with feedback. After that, the feedback report is presented to the organisation that has produced the presentation in step 4 of the process (see Figure 1).


Table 2: Quantitative review questions.

RQ1 Evaluate the maturity of the presented approach and grade it (0-10).
Description of grades: 0: No approach presented. 1: An example of how it was once done is presented. 5: According to the presentation, most work is carried out in a methodological way. 10: A systematic procedure is used in every situation without exception. Systematic improvements are integrated in the approach.

RQ2 Evaluate and grade (0-10) the results that you believe that the presented approach gives to the presenting organisation.
Description of grades: 0: There is no (or a negative) result of using the presented approach. 5: Most results are good. The approach seems to be a good way of working. 10: The best possible result is obtained. No other approach would give a better result.

To guide the reviewers during the review and to obtain interesting feedback for the other organisation, a review template with five review questions has been used. In the first two review questions, the reviewer should assess the maturity of the presented approach and the result that it produces with a grade (0-10). The template includes a description of some of the grades; the intermediate grades are, of course, also allowed. The two quantitative review questions are presented in Table 2.

RQ1 and RQ2 guide the user to review the description with respect to the approach, the deployment of the approach and the result of the approach, as discussed in Section 2.1. For the three other review questions, the reviewer should state the answers qualitatively. The three qualitative review questions are presented in Table 3. All review questions (RQ1-RQ5) should be answered for every area in the description (according to the major questions in the benchmarking instrument, Q1-Q8).

The intention is that the most important information transferred to the other company should be provided by the reviewers in RQ3 and RQ4. The objective of the first two review questions is to obtain data that allows comparison of the different reviewers and to let the reviewers reflect on the issues in the questions. This is why the order of the questions has been chosen in this way: if you are first asked to grade an approach, you will probably evaluate it carefully and compare it to the other approaches.


Table 3: Qualitative review questions.

Id Formulation

RQ3 Explain your reasons for the grades in questions RQ1 and RQ2.

RQ4 Present suggestions for improvements for the presenting organisation. It is important for the presenting company that you answer this question.

RQ5 Additional comments.

3 Case study

The benchmarking approach has been evaluated in a case study (Wohlin et al., 2000; Robson, 2002). The research objectives of the study are to evaluate the importance of the areas covered by the questions in the benchmarking instrument, and to gain experience from carrying out a benchmarking initiative according to the process presented in Section 2. The case study is carried out as a cooperation between two organisations:

ABB Automation Technology Products AB: The organisation develops real-time systems that operate in industrial environments. The systems have a life cycle of about 10 to 30 years. Goals of the architecture are hardware and operating system independence, easy extension, openness, high reliability, small size and high performance. Many products are built from the same system platform. The developed products are part of a larger system. The configuration of the used systems may differ greatly, from single components to large plants or geographically spread-out configurations.

Ericsson Mobile Communications AB: The company develops consumer products within the telecommunications area, where competition has increased drastically during recent years. Consumer products for mobile Internet will be cheaper, smaller and more complex. To be able to compete in this cutting-edge market, Ericsson Mobile Communications focuses on developing a software architecture that can be easily extended, maintained and reused. The purpose of the software architecture is to provide a common platform that can be used as a basis when developing consumer products with low cost and small effort.


Lund University and both of the industrial organisations are part of the same research network (LUCAS, the Center for Applied Software Research; further information on LUCAS can be found at http://www.lucas.lth.se). It was therefore natural to involve the two industrial organisations as participants in the case study.

In the benchmarking approach, a number of additional actions were taken in order to be able to draw conclusions. To receive feedback on the questions in the benchmarking instrument, a number of feedback questions (FQ) were formulated and distributed together with the review template to the two organisations. Everyone who reviewed the descriptions also replied to the feedback questions. This means that feedback information (FI) was produced by all participants in the study. The research actions and the usage of the research instrumentation are described in Figure 2.

Figure 2: The benchmarking process with research instrumentation (as in Figure 1, with the feedback questions FQ distributed in step 3 and the feedback information FIA1,...,FIAn and FIB1,...,FIBm collected in step 4).

Two feedback questions were formulated. Everyone that reviewed material from the other company answered the feedback questions for every general question in the benchmarking instrument. The answers to the feedback questions were given quantitatively with a grade from 0 to 10. The two feedback questions are presented in Table 4.

3.1 Analysis and results

In total, 11 persons answered the feedback questions: 6 persons from one of the organisations, organisation A, and 5 persons from the other organisation, organisation B.


Table 4: Feedback questions.

FQ1 Grade (0-10) the importance for your own organisation of the issue that is covered by the question.
Description of grades: 0: Not important at all. 10: Our most important question.

FQ2 Grade (0-10) the use of the answer to the question.
Description of grades: 0: The answers to the question cannot help us at all when we compare to our own organisation. 5: The presented approach can help us to find changes to the procedures at our company, or parts of the presented approaches may be directly introduced in our organisation. 10: The procedures that are presented in the answer are directly applicable in our organisation and they would without doubt help us to improve with respect to quality, cost and lead time in our projects.

For confidentiality reasons, it is not published which of the involved organisations is denoted company A and which is denoted company B. Out of these, there was one person from each organisation that only answered FQ1. The results of the analysis with respect to the feedback questions FQ1 and FQ2 are displayed in Figure 3.

The results are displayed as mean values and error bars representing one standard deviation above the mean and one standard deviation below the mean. A small error bar shows that the reviewers' answers to the question were in close agreement.

The benchmarking instrument is analysed with respect to the major categories of questions, i.e. Q1, Q2, ..., Q8 in Table 1. This approach was chosen because it was deemed to be too hard for the reviewers to distinguish between their opinions about the different subquestions, Q1a, Q1b, etc. Most reviewers answered with respect to the major questions, but a few reviewers answered with respect to both the major questions and the subquestions. For those persons, the score for a major question was estimated as the mean value of the answers to the major question and its subquestions. For example, for the first major question the general answer would be calculated as (Q1+Q1a+Q1b+Q1c)/4.
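A small sketch of this aggregation and of the summary statistics behind the error bars is given below; the scores are invented for illustration and are not the data collected in the case study.

import statistics

def major_score(answers):
    """Average a reviewer's answers to a major question and its subquestions."""
    return sum(answers) / len(answers)

# One reviewer's grades for Q1, Q1a, Q1b and Q1c on feedback question FQ1.
print(major_score([8, 7, 9, 8]))        # (8 + 7 + 9 + 8) / 4 = 8.0

# Aggregated Q1/FQ1 scores from all reviewers of one organisation
# (hypothetical values), summarised as mean +/- one standard deviation.
scores = [8.0, 7.0, 9.0, 6.5, 8.0]
print(statistics.mean(scores), statistics.stdev(scores))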


Figure 3: Analysis results for the feedback questions.

The results in Figure 3 are shown for each major question (Q1-Q8), organisation and feedback question (FQ1, FQ2). FQ1 is denoted ‘importance’ and FQ2 is denoted ‘use’ in the figure. It can be seen that all questions are important for both organisations. No question seems to be less important than any other question. It can also be seen that the use of the descriptions of the other organisation is given a lower grade than the importance of the question by both organisations. For some questions organisation A gives a higher grade to FQ2 than organisation B, and in some cases it is the other way around. This may be interpreted as meaning that the organisations learn different things from each other in different areas. Not only are the areas considered important; it also indicates that, in this case, it was possible to use descriptions from the other organisation to identify improvement proposals. However, it should be noted that the values with respect to FQ2 are lower than the values with respect to FQ1, i.e. both organisations grade the importance of the questions higher than the value of the descriptions of the other organisation. This may be because, even if an area is considered important, it may not be the case that the other organisation has a solution that is relevant for their own organisation.

To summarise, it can be stated that the questions were appropriate for the organisations involved in this benchmarking initiative. However, it should be noted that there are few data points and the results are uncertain. For example, there were some persons who gave the same figure for all categories of questions (Q1-Q8), especially for FQ2. This means that they have described one level of usage for all the areas, and it is not certain that the result for every individual area (Q1-Q8) can be interpreted in detail.

4 Conclusions

It can be concluded that the presented benchmarking approach was feasible in the case study. All major areas of the benchmarking instrument were regarded as important by both organisations. No areas in the benchmarking instrument were considered significantly less interesting than the other areas. This means that no major changes must be made to the instrument before it is further used and evaluated. Both organisations were interested in acquiring knowledge from the descriptions that were developed by the other organisation. It should, however, be noted that both organisations were involved in developing the instrument.

It is important to evaluate the validity of the results. In this case, the authors believe that the most important issue to evaluate is the possibility to generalise the results. It is hard to estimate the possibilities of generalisation of the study, and further studies are needed in the area to be able to draw general conclusions. The small number of participants should also be noted. Another threat is that the organisations have been active in developing the instrument that later on has been evaluated in the case study. This also makes it hard to draw general conclusions, although it is not believed that it affects the other results to any large degree.

There are a number of areas that could be investigated in further work. Some time after the case study it would be possible to investigate what changes have actually been introduced in the organisations as a result of the initiative. However, when changes have been made in an organisation after a benchmarking initiative like this, it will probably not be obvious which changes are a direct result of the benchmarking. Changes are decided based on a number of sources of information, where the benchmarking approach may be one source. Possible further research issues, of course, also include further case studies involving other companies.

As has been stated in this paper, both organisations grade the importance of the questions in the benchmarking instrument higher than the value of the descriptions of the other organisation. This means that the questions in the instrument are considered important, and a possible area of further research is to investigate the possibility of carrying out a self-assessment based on the questions in the benchmarking instrument.

Acknowledgements

This work was partly funded by The Swedish Agency for Innovation Systems (VINNOVA), under a grant for the Center for Applied Software Research at Lund University (LUCAS). The authors would also like to thank all the participants in the study.

References

Bergman, B., & Klefsjö, B. (2002). Quality from customer needs to customersatisfaction (2 ed.). Studentlitteratur AB.

Bosch, J. (2000). Design and use of software architectures: Adopting and evolvinga product-line approach. ACM Press/Addison-Wesley.

Camp, R. (1989). Benchmarking - the search for industry best practices that leadto superior performance. ASQC Quality Press.

Cohen, S., Gallagher, B., Fisher, M., L.Jones, Krut, R., Northorp, L., et al.(2000). Third DoD product line practice workshop report (Tech. Rep. No.CMU/SEI-2000-TR-024). Software Engineering Institute.

Humphrey, W. (1989). Managing the software process. Addison-Wesley.

Narahari, Y., Viswandham, N., & Kumar, K. K. (1999). Lead time modelingand acceleration of product design and development. IEEE Transactions onRobotics and Automation, 15(5), 882–896.

National Institute of Standards and Technology. (2001). Criteria for per-formance excellence baldrige national quality program. National Institute ofStandards and Technology. (Available via http://www.quality.nist.gov)

Paulk, M., Curtis, B., Chrissis, M., & Weber, C. (1993). Capability matu-rity model for software, version 1.1 (Tech. Rep. No. CMU/SEI-93-TR-24).Software Engineering Institute.

Robson, C. (2002). Real world research a resource for social scientists andpractitioner-researchers (2 ed.). Blackwell Publishers.


Swedish Institute for Quality. (2000). The SIQ model for performance excellence. (Available via http://www.siq.se)

Tersine, R., & Hummingbird, E. (1995). Lead-time reduction: The search for competitive advantage. International Journal of Operations and Production Management, 15(2), 8–18.

Wheelwright, S., & Clark, K. (1995). Leading product development: The senior manager’s guide to creating and shaping the enterprise. The Free Press.

Wohlin, C., Runeson, P., Höst, M., Ohlsson, M., Regnell, B., & Wesslén, A. (2000). Experimentation in software engineering: An introduction. Kluwer Academic Publishers.


II

PAPER II

Tracking Degradation in Software Product Lines through Measurement of Design Rule Violations

Enrico Johansson and Martin Höst

Proceedings of 14th International Conference on Software Engineering and Knowledge Engineering, Pages 249–254, ACM Press, 2002

Abstract

In order to increase reuse, a number of product versions may be developed based on the same software platform. The platform must, however, be managed and updated according to new requirements if it should be reusable in a series of releases. This means that the platform is constantly changed during its lifecycle, and changes can result in degradation of the platform. In this paper, a measurement approach is proposed as a means of tracking the degradation of a software platform and consequently in the product line. The tracking approach is evaluated in a case study where it is applied to a series of different releases of a product. The result of the case study indicates that the presented approach is feasible.


1 Introduction

Software organisations use tracking to promote efficiency in management and development of software. The tracking consists of monitoring and controlling that organisations stay within wanted cost, quality and lead-time boundaries. This is also valid for the tracking approach presented in this paper, though focused on software platforms and product line architectures (Parnas, 1976; Jazayeri et al., 2000; Bosch, 2000; Clements & Northrop, 2001; Bass et al., 2003). The importance of tracking is accentuated in software platforms, the core asset of a product line architecture. This is due to the fact that the same platform architecture supports a large number of products with guidelines and mechanisms, and is as such often on the critical path of product projects.

The benefit of this kind of reuse is that the quality is maintained and lead-times are improved when building many products on the same platform (Sanderson & Uzumeri, 1995; Meyer & Lehnerd, 1997).

Since the software platform is one of the core reusable assets of a product line architecture, it has influence on the quality of the whole software product line. The quality is regarded as the totality of features and characteristics of the software platform that bears on its ability to satisfy stated or implied needs from the product line (ISO 9000:2000, 2000). Therefore, tracking the deviation from a wanted platform structure (i.e. “erosion” (Perry & Wolf, 1992) and “software aging” (Parnas, 1994)) can communicate knowledge concerning the degradation of the whole product line. The degradation is characterised by the product line passing from a higher grade to a lower grade of accomplishment of its purposes. This concerns the possibility to build different versions and a family of products based on it.

The objective of the research presented in this paper is to identify and evaluate a measure for tracking degradation in product lines. The measure is based on design rules and a graph representation of the design rule violations.

The benefits of using measurement to improve and track the development and maintenance of software have been recognised in many studies, see e.g. Briand et al. (1993). Setting up a measurement program for software platforms based on the presented measure is simple and can be highly automated. The measure makes it possible to detect the trend when platform erosion starts, which can cause the product line to degrade. Computing the presented measure at consecutive points in time gives the possibility to track the degradation over the desired platform versions or product families.

Similar studies have been made to investigate and present quantifiable metrics for degradation of software structures. Approaches to explicitly find architectural metrics have been presented in several studies, for example Kazman (1998); Carriere et al. (1999); Jaktman et al. (1999). However, their main objective has not been the tracking of degradation of product lines, nor have graph measurements been used.

This paper is organised as follows. In Section 2, the usage of design rules in a product line is presented. The measure is defined in Section 3 and its usage implications are discussed in Section 4. The case study and the results derived from it are presented in Section 5. In Section 6, conclusions and suggestions for further work are presented.

2 Design rules in platform development

“The structure of the system is given by a set of design rules, a structure can be seen as the architecture of the system” (Gacek et al., 1994). This statement is also true for the structure of a software platform. The structure of the system is, however, complemented by adding elements of implementation to the picture. The implementation elements can take the form of components, interfaces, connection mechanisms or whatever elements of high-level design that are used in the platform (Garlan & Shaw, 1993). The implementation elements together with the design rules define the physical view of the architecture (Hofmeister et al., 2000). The way of building software products with these elements is described by a set of design rules, which the developers are required to follow when developing software products. The set of design rules communicates an architecture that is believed to be appropriate (Jones, 1993). The usage of design rules can be summarised as follows:

1. A design rule describes the prescribed usage of the components, mechanisms and interfaces.

2. If a design rule is applicable to a component, mechanism, interface or a combination of these, the rule must be followed.

3. It can be the case that a design rule is not applicable for the product being developed. Some of the design rules may only be valid under some specific conditions. If these conditions are not met, the design rule is not applicable.

During the development of a product, several versions of design rules may be released. This can result in situations where a design that previously was considered a violation of the design rules is accepted and no longer considered a violation. The opposite can also be the case: non-violations become violations because of new design rules that invalidate the design already in place. Therefore, the tracking approach suggested in this paper takes into consideration both different releases of the platform and different releases of design rules.

The reasons for violating the design rules can be grouped into four categories, which relate to lead-time, quality and cost discussions of software development. The four reasons are defined by Perry & Wolf (1992). The first three concern architectural deviation and the fourth concerns architectural drift.

Lead-time: Following the design rules will result in missing a market deadline and thereby the economic revenue.

Quality: Following the design rules will result in missing the quality goal (e.g. throughput) of the software products, which are based on the platform.

Cost: The cost of following the design rules will cause the software project to miss its economic goal.

Knowledge: The developers using the design rules to develop a software product do not have the knowledge to use them correctly.

3 A measure of degradation

In this paper, a measure for tracking and quantifying product line degradation is presented. The tracking is performed by investigating the number of violations of design rules in a software platform. The measure can be used for tracking different releases and versions of a product, as well as the platform itself. However, before describing the measure and its proposed usage, some basic concepts concerning development, usage and evolution of software platforms and software products are clarified.

In a product line development project the platform architecture is reused to develop several versions of a software product and even product families. Design rules are, as discussed earlier, a vital part of the architecture. During the development, these design rules may be removed, added or changed in order to preserve or change wanted quality attributes. These changes can be considered as violations of the original design rules that were defined to uphold a specific architecture in the platform. The violations will cause the product to deviate from the original perception of how an optimal platform and product architecture should look. This deviation can give an indication of possible degradation in the product line. This section describes the properties of a tracking approach based on a graph measure.

3.1 Graph measure properties

In a product line there can exist different versions of products. The product versions 1 to s are denoted P1, P2, ..., Ps. In the same product line there exist different releases of approved design rules. The design rule releases 1 to t are denoted DR1, DR2, ..., DRt.

The graph measure can be seen as a function of Pi and DRj for all i and j, m(Pi, DRj). This denotes the deviation of the architecture structure for product i compared to the wanted structure defined by release j of the design rules. The structure of Pi can be represented by a finite non-directed graph containing arcs and nodes. The nodes represent the implementation elements described in Section 2. It is important that the arcs define the existence of a relationship between nodes, not the number of relationships.

The arcs (relationships) that are not allowed by the design rules are considered as design rule violations. In the same way, arcs only represent the existence of relationships. The arcs defining violations of design rules only depict the existence of violations, not the number of violations between nodes, nor the type of the relationship between nodes.

A measure is meaningful and well formulated if the properties that are described below are true. The properties are based on the tree impurity measure properties presented by Fenton & Pfleeger (1998). The four properties are as follows:

Property 1. m(Pi, DRj) = 0 if and only if Pi does not violate any design rules of DRj.

Property 2. m(Pk, DRj) > m(Pi, DRj) if Pk differs from Pi only by the insertion of an extra arc (representing a violation of a design rule of DRj).

Property 3. Let ni denote the number of nodes in Pi and violations(Pi, DRj) the number of violations against design rules DRj. Then let Pk be an increment of Pi, created by adding new nodes to the product according to the design rules DRj. Both products use the same version of the design rules. Then the following is valid:


if nk > ni and violations(Pk, DRj) = violations(Pi, DRj), then m(Pk, DRj) ≤ m(Pi, DRj).

That is, the graph of Pk has more nodes than the graph of Pi, but in both cases the arcs that represent violations against the design rules are the same. The graph impurity should then be smaller than or equal for Pk compared to Pi. The property formalises the intuitive notion that the measure should be smaller if the number of violations is the same but the system is larger.

Property 4. Let violations(Pi, DRj) denote the number of violations of design rules DRj in product Pi. The maximum value of m(Pi, DRj) for any DRj and Pi is reached only when the product Pi contains all design rule violations that are possible for the product.

3.2 Definition of the graph measure

We can define a measure that satisfies all four properties. Let number_of_violations(Pi, DRj) denote the number of violations against the design rules DRj which are present in Pi. Let max_number_of_violations(Pi, DRj) denote the maximal number of violations against the design rules DRj which are possible in Pi. A measure can be defined as follows:

m(Pi, DRj) = number_of_violations(Pi, DRj) / max_number_of_violations(Pi, DRj)    (1)

If the design rules DRj allow max_number_of_violations(Pi, DRj) to be zero, the measure is not valid. However, there is no sense in measuring violations against the design rules if no violations are possible to make.

It is assumed that the number of violations can be measured by calculating the number of arcs in a finite non-directed graph. The graph represents the structure of the product, where the nodes represent architectural implementation elements and the arcs represent dependencies between the architectural elements. The number of arcs in Pi considered as violations against the design rules DRj is denoted arcs_violations(Pi, DRj). The nodes and arcs are also used to calculate the maximum number of possible violations against the design rules DRj for a product Pi: the maximum number of arcs in a finite non-directed graph with ni nodes, minus the number of allowed arcs. The number of allowed arcs given by the design rules in release DRj for product Pi is denoted arcs_allowed(Pi, DRj). If ni denotes the number of nodes in Pi, then the maximum number of arcs in a finite non-directed graph with ni nodes is equal to ni(ni−1)/2. Thus, Equation (1) can be rewritten as:

m(Pi, DRj) = arcs_violations(Pi, DRj) / (ni(ni−1)/2 − arcs_allowed(Pi, DRj))    (2)
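
To make the calculation concrete, a minimal sketch of Equation (2) in Python follows. The graph is assumed to be given as sets of nodes and undirected arcs; the function and variable names are ours and only illustrate the computation, they are not part of any measurement tooling described in the paper.

```python
def graph_measure(nodes, violation_arcs, allowed_arcs):
    """Degradation measure m(Pi, DRj) as in Equation (2).

    nodes          -- set of implementation elements (graph nodes) in product Pi
    violation_arcs -- set of undirected arcs that violate the design rules DRj
    allowed_arcs   -- set of undirected arcs permitted by the design rules DRj
    """
    n = len(nodes)
    max_arcs = n * (n - 1) // 2                    # max arcs in a finite non-directed graph
    max_violations = max_arcs - len(allowed_arcs)  # denominator of Equation (2)
    if max_violations == 0:
        raise ValueError("measure undefined: no design rule violations are possible")
    return len(violation_arcs) / max_violations


# Example: four components, one allowed dependency, one violating dependency.
# Maximum arcs = 6, allowed = 1, so the measure is 1 / (6 - 1) = 0.2.
m = graph_measure(nodes={"A", "B", "C", "D"},
                  violation_arcs={frozenset({"A", "C"})},
                  allowed_arcs={frozenset({"A", "B"})})
```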

3.3 Properties of the proposed measure

Below it is shown that the four properties are true for the proposed measure.

Property 1. The consequence of Pi not violating any design rules DRj is that arcs_violations(Pi, DRj) = 0, which makes Equation (1) equal to zero.

Property 2. If Pk differs from Pi only by an extra arc (representing a violation of a design rule of DRj), the value of the denominator in Equation (1) is equal for both m(Pk, DRj) and m(Pi, DRj). The relation m(Pk, DRj) > m(Pi, DRj) can thus be simplified to arcs_violations(Pk, DRj) > arcs_violations(Pi, DRj), which follows from the prerequisite of the property.

Property 3. For the property to be true, it must be shown that the maximum number of violations is always larger for Pk compared to Pi. Pk and Pi are different versions of the same product, and the number of nodes (architectural implementation elements) is larger in Pk than in Pi. Both versions of the product use the same version of the design rules. Adding new nodes gives more possibilities to violate the design rules. Thus, the maximum number of violations is increased for Pk compared to Pi, which is what should be shown. The equality stated by Property 3 holds when the number of violations is 0 for both products (i.e. Property 1 is satisfied).

Property 4. For any given Pi and DRj, the values of ni(ni − 1)/2 and arcs_allowed(Pi, DRj) are fixed. This means that the maximum value of Equation (2) is given by the maximum of the numerator arcs_violations(Pi, DRj).


The maximum value of arcs_violations(Pi, DRj) is reached when it denotes all possible violations in product Pi using DRj. The numerator arcs_violations(Pi, DRj) would then be equal to the denominator max_number_of_violations(Pi, DRj). The measure will then take the value 1.
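
As an informal illustration of Properties 1–3 (not a proof), the sketch below exercises the graph_measure function from the sketch in Section 3.2 on two small, invented graphs; all component names are made up.

```python
nodes_small = {"A", "B", "C"}                 # product Pi with three elements
nodes_large = {"A", "B", "C", "D"}            # product Pk: one extra element, same violations
allowed = {frozenset({"A", "B"})}             # the only arc permitted by the design rules

# Property 1: no violating arcs gives m = 0.
assert graph_measure(nodes_small, set(), allowed) == 0

# Property 2: inserting one extra violating arc increases the measure.
m_one = graph_measure(nodes_small, {frozenset({"A", "C"})}, allowed)
m_two = graph_measure(nodes_small, {frozenset({"A", "C"}), frozenset({"B", "C"})}, allowed)
assert m_two > m_one

# Property 3: the same violations in a larger product give a smaller (or equal) measure.
m_large = graph_measure(nodes_large, {frozenset({"A", "C"})}, allowed)
assert m_large <= m_one
```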

4 Usage implications of the measure

Using the design rule violations as an integral part of a software measurement program puts strong requirements on the design rules. They should be handled and stored in a formal way. The design rules should preferably be stored in a version handling database, be categorised and have high visibility in the organisation. Treating the design rules as any other artifact (e.g. requirements, test cases, source code) of the platform development will make this possible. This would be facilitated if the design rules had their own natural place in the version handling structure of the platform.

The design rules can be documented in different categories of quality aspects (e.g. maintainability, performance, usability, etc.). This would enable the measurement of the degradation for these particular aspects of quality (e.g. degradation of maintainability, degradation of performance, degradation of usability, etc.).

Violations against the design rules can be found by examining the source code of the product and the platform. The source code can be regarded as the building block of the software, and it is up to the measurement method to extract the information of interest and to process it to show the aspect of interest.

The measure (Equation (1)) can be used in the following three scenarios. Scenario 1 is concerned with a long-term aspect of platform degradation, while Scenario 2 and Scenario 3 are concerned with tracking short-term aspects of the degradation. A small sketch after the scenario descriptions illustrates which design rule release each scenario compares against.

Scenario 1. This scenario describes the rationale of measuring platform degradation for product version i with respect to the first release of design rules, i.e. m(Pi, DR1). The measure can be used to track how well the initial design meets the requirements from a product built on the platform. A great increase in the degradation measure can indicate that the intended design or process was improper and that many modifications had to be performed in the platform during the development of the product. This knowledge can give the developers of the software platform relevant incentives for improvements when designing a new platform for the next product families. Project managers can use the knowledge to assess the risks and resources of a project to develop a new product family. A low degradation could mean that the platform can be reused, while a high value could mean that the complete platform and the development process supporting it must be redesigned. The redesign should take into consideration the causes of degradation and find remedies against them.

Scenario 2. This scenario describes the rationale of measuring platform degradation for product version i with respect to the last release of design rules, i.e. m(Pi, DRj). This measure can be used to keep track of the platform’s degradation in the current project development. The optimal outcome would be that the degradation is zero. When the measure is plotted against different versions of the product, a horizontal line should visualise this. The measure can be of interest for the project managers and developers of the platform in order to track this type of degradation. The knowledge visualised by the tracking can be used by developers to find corrective actions before the degradation gets out of hand. Project managers can use the knowledge from the tracking, at an early stage, to detect potential quality and lead-time risks in the product release.

Scenario 3. This scenario describes the rationale of measuring platform degradation for product version i with respect to the previous release of design rules, i.e. m(Pi, DRj−1). This measure can be used to track degradation in conjunction with the one described in Scenario 2. The trend in degradation describes whether the action taken to stop the degradation tracked by Scenario 2 has been successful. If the comparison shows that the tracked degradation is still the same, the action did not have any effect. In the worst case, it could be that the action taken has increased the degradation. The scenario described should primarily be of interest for the developers of the platform and the project managers of the product. This can be motivated with the same argument as in Scenario 2.
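
As an illustration only, the sketch below shows which design rule release each scenario compares a product version against, assuming a chronologically ordered list of design rule releases and some measure function such as the graph measure of Section 3; the function and argument names are ours.

```python
def scenario_measures(product, design_rule_releases, measure):
    """Compute the measure for the three tracking scenarios described above.

    product              -- the product version Pi (in whatever form `measure` expects)
    design_rule_releases -- design rule releases DR1..DRj in chronological order (len >= 2)
    measure              -- a function measure(product, design_rules) returning m(Pi, DR)
    """
    return {
        "scenario 1 (long-term)": measure(product, design_rule_releases[0]),    # m(Pi, DR1)
        "scenario 2 (short-term)": measure(product, design_rule_releases[-1]),  # m(Pi, DRj)
        "scenario 3 (trend)": measure(product, design_rule_releases[-2]),       # m(Pi, DRj-1)
    }
```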


5 The case study

The case study is performed on a product line at Ericsson Mobile Communications AB, Sweden. The product line architecture reuses a software platform and the accompanying design rules when developing new versions and families of products. The goals of the case study are to:

1. Find out if it is possible, in a meaningful way, to represent the architecture of a platform with arcs and nodes.

2. Find out if it is possible, in a meaningful way, to represent the violations against design rules with arcs.

3. Find out if Equation (1) provides a result that is in accordance with the experiences of the organisation.

The case study is conducted by collecting data from the product line and calculating the degradation measure proposed in Section 3. Finally, it is discussed whether the case study indicates that the measure can be used to track product line degradation.

5.1 Ericsson Mobile Communications AB, Sweden

The measurements are performed on a product developed by Ericsson Mobile Communications AB. Ericsson Mobile Communications AB used a product line approach for developing consumer products within the telecommunication area. The purpose of the platform in the product line is to provide common software that can be adapted to a variety of consumer products with low costs and small effort. The platform is an embedded system with support for a variety of functionality, for example wireless protocols, data communication protocols and multimedia services.

The software architecture of the studied software product is basically a client-server solution, where software components provide public interfaces that are to be used by other software components. Standardised architectural mechanisms for communication between software components are defined and used throughout the system. The aim is to have a set of stable software components and mechanisms that can be reused between products without major redesign or rework.


5.2 Detecting design rule violations

It is important for the measure that the structure (i.e. architecture) symbolises some sort of relationship between the components. These relationships can be in the form of source structure or behavior structure (Stafford & Wolf, 1998). The source structure involves static dependencies while the behavior structure involves dynamic interaction dependencies.

The case study investigates the behavioral structure, as violations against design rules for intra-module invocations. A module is a high-level component of the system. Design rules defining the usage of function calls and intra-module calls are therefore possible to check with this method. Design rules stating which components can be called by another component can be modeled in a graph. The arcs denote the invocations between components and the nodes denote the components themselves. Therefore, a graph showing the intra-module component invocations is feasible to use as input for the proposed measure.
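
The paper does not prescribe a particular tool for this step; as an illustration only, the sketch below builds such an invocation graph from a hypothetical list of observed calls, marks the arcs not permitted by the design rules as violations, and feeds the result to the graph_measure sketch of Section 3.2. All module names and call lists are invented.

```python
# Hypothetical input: observed invocations between modules, and the invocations
# permitted by the design rules. Only the existence of a relationship matters,
# so each arc is stored as an unordered pair (a frozenset).
observed_calls = [("UI", "Audio"), ("UI", "Net"), ("Audio", "Net")]
allowed_calls = [("UI", "Audio"), ("UI", "Net")]

nodes = {module for call in observed_calls for module in call}
observed_arcs = {frozenset(call) for call in observed_calls}
allowed_arcs = {frozenset(call) for call in allowed_calls}

# Arcs present in the product but not permitted by the design rules are violations.
violation_arcs = observed_arcs - allowed_arcs

m = graph_measure(nodes, violation_arcs, allowed_arcs)  # sketch from Section 3.2
```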

5.3 Performing the measurement on a selected product

The measurement is based on data collected from five different development releases of the product stored in a configuration management (CM) database. It has been possible to retrieve all the artifacts used and generated by the building process of the product. The source code, make-files and other artifacts have been used to check for design rule violations in the product. Five releases were selected to cover the life cycle of the product stored in the database in a uniform way. This is achieved by choosing the same number of stored releases between each of the measurement points.

All five releases of the source code are checked in order to find violations against the design rules of Scenario 1, mentioned in Section 4.

The result from the measurements of the five releases is presented in Figure 1.

5.4 Result from case study

The curve in Figure 1 shows the measure per measurement point. Considering the curve, it can be noted that the gradient of the curve is positive, not negative. It can also be noted that the gradient is not constant. In other words, Figure 1 can be interpreted as the product line degrading at a non-constant rate between product versions. This interpretation was confirmed by the organisation in charge of the platform where the case study has been performed.


Figure 1: The degradation measure m(Pi, DR1) is plotted against five measurement points (i.e. five development versions). The y-axis shows the graph measure on a scale from 0.054 to 0.068.

They confirmed that the trend for the product is that the deviation from the wanted structure generally increases for every version of a product based on the platform. This shows that using the gradient of the proposed graph measure gives the possibility to track degradation in the product line.
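
A minimal sketch of how such a trend can be checked numerically is given below; the measure values are invented for illustration and are not the case-study data.

```python
# Illustrative values of m(Pi, DR1) at five measurement points (not the measured data).
measures = [0.056, 0.058, 0.059, 0.063, 0.067]

# Successive differences approximate the gradient of the curve in Figure 1.
gradient = [later - earlier for earlier, later in zip(measures, measures[1:])]

# A positive, non-constant gradient indicates that the product line is degrading
# at a varying rate between product versions.
degrading = all(step >= 0 for step in gradient) and any(step > 0 for step in gradient)
print(f"gradient per step: {gradient}, degrading trend: {degrading}")
```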

Despite this encouraging response to the case study, there is a set of issues that still must be overcome to empirically validate the measure. The issues are the following:

• Other variables from the platform also have an increasing trend and could therefore be used to track the degradation. For example, the number of design rule violations also shows an increasing trend. It can be argued that only studying the violations would be a good enough tracking measure. This is true, but one of the prerequisites of the measure was that it should be normalised with the size of the platform. This is not true for a measure containing only the number of design rule violations. There is, however, a need for a comparative study of different possible measures (including the graph measure). The study should quantitatively compare the trends of degradation to the trends in the measures.


• The rise of the graph measure calculated in the case study is very small, in fact so small that the rising slope can be considered as noise. To really investigate this issue, larger case studies should be performed and a larger number of design rules should be taken into consideration.

• The case study has only considered design rules that are related to the invocations of implementation elements. A complete set of design rules should be taken into consideration in order to form an opinion of the complete platform.

• The case study consists of measurements from one company and from only one product. Also, data is only collected at five measurement points. Case studies should be performed in other companies, on more products and with a larger set of data.

• Only Scenario 1 (see Section 4) is covered in the case study. However, it is believed that the practical implications of this scenario can be related to the other scenarios.

6 Conclusion

A significant challenge for software product lines is to develop a software platform that maintains the wanted quality between products and product families. This article takes a step towards that goal by providing a method to track possible degradation of product lines. The contributions of the method are the following:

• A measure is identified that measures the deviation of the product from the original structure depicted by the design rules of the software platform. The measure is based on graph representation and design rule violations.

• The measure is evaluated in a limited case study. A goal of the method is that it should be practical and feasible to use in a company using the concept of software platforms. This is shown in the case study. More extensive empirical studies must, however, be performed in order to validate the measure further.

• Different scenarios of how to use the measure are presented. The different scenarios describe the usage of the measure to track both short-term and long-term aspects of degradation. The short-term aspect is concerned with the degradation of the product line in the product being developed. The long-term aspect is concerned with degradation of software platforms between product lines. In all scenarios, stakeholders, usage and practical considerations are described.

In developing the basis for the measurement, a general discussion of how to use design rules in a software platform environment is presented. Four properties of a graph measure are discussed and it is shown that they are applicable to the identified measure. The software platform process is described in light of the usage of platform design rules. Possible causes for degradation are described and it is discussed how they affect the usage and definitions of design rules. The design rules are an important property of many architectural descriptions. In addition, they open a way to analyse the system. In this case, the analysis is done by counting the number of violations against the design rules and relating them to the possible number of violations.

7 Acknowledgements

This work was partly funded by The Swedish Agency for Innovation Systems (VINNOVA), under a grant for the Center for Applied Software Research at Lund University (LUCAS). We thank employees at Ericsson Mobile Platforms AB, Sweden, particularly Jan Lind and Peter Lerup, as well as Fredrik Nilsson at Enea Realtime System, Sweden. Thanks also to Daniel Karlström at the Department of Communication Systems, Lund University, Sweden, for reviewing the article.

References

Bass, L., Clements, P. C., & Kazman, R. (2003). Software architecture in practice. Addison-Wesley.

Bosch, J. (2000). Design and use of software architectures: Adopting and evolving a product-line approach. ACM Press/Addison-Wesley.

Briand, L., Morasca, S., & Basili, V. (1993). Measuring and assessing maintainability at the end of high level design. In Proceedings of 9th International Conference on Software Maintenance (pp. 88–97).

Carriere, S., Kazman, R., & Woods, S. (1999). Assessing and maintaining architectural quality. In Proceedings of 3rd European Conference on Software Maintenance and Reengineering (pp. 22–30).

Clements, P. C., & Northrop, L. (2001). Software product lines: Practices and patterns. Addison-Wesley.

Fenton, N., & Pfleeger, S. (1998). Software metrics: A rigorous and practical approach, revised (2nd ed.). Course Technology.

Gacek, C., Abd-Allah, A., Clark, B., & Boehm, B. (1994). Focused workshop on software architectures: Issue paper (Tech. Rep.). USC Center for Software Engineering.

Garlan, D., & Shaw, M. (1993). An introduction to software architecture (Vol. 1). World Scientific Publishing Company.

Hofmeister, C., Nord, R., & Soni, D. (Eds.). (2000). Applied software architecture. Addison-Wesley.

ISO 9000:2000. (2000). Quality management systems – fundamentals and vocabulary. International Organization for Standardization (ISO).

Jaktman, C., Leaney, J., & Liu, M. (1999). Structural analysis of the software architecture – a maintenance assessment case study. In Proceedings of 1st Working IFIP Conference on Software Architecture (pp. 455–470).

Jazayeri, M., Ran, A., & van der Linden, F. (2000). Software architecture for product families: Principles and practice. Addison-Wesley.

Jones, A. (1993). The maturing of software architecture. In Software engineering symposium.

Kazman, R. (1998). Assessing architectural complexity. In Proceedings of 2nd Euromicro Conference on Software Maintenance and Reengineering (pp. 104–112).

Meyer, M. H., & Lehnerd, A. P. (1997). The power of product platforms: Building value and cost leadership. Free Press.

Parnas, D. (1976). On the design and development of program families. IEEE Transactions on Software Engineering, 2(2), 1–9.

Parnas, D. (1994). Software aging. In Proceedings of 16th International Conference on Software Engineering (pp. 279–287).


Perry, D., & Wolf, A. (1992). Foundations for the study of software architecture. In Proceedings of ACM SIGSOFT (Vol. 17, pp. 40–52).

Sanderson, S., & Uzumeri, M. (1995). Managing product families: The case of the Sony Walkman. Research Policy, 24, 761–782.

Stafford, J., & Wolf, A. (1998). Architectural-level dependence analysis in support of software maintenance. In Proceedings of 3rd International Software Architecture Workshop (pp. 129–132).


III

PAPER III

A Qualitative Approach to Tailor Software Performance Activities

Enrico Johansson, Josef Nedstam, Fredrik Wartenberg and Martin Höst

To appear in Proceedings of 6th International Conference on Product Focused Software Process Improvement, Springer-Verlag, 2005

Abstract

For real-time embedded systems, software performance is one of the most important quality attributes. Controlling and predicting software performance is associated with a number of challenges. One of the challenges is to tailor the established and rather general performance activities to the needs and available opportunities of a specific organisation. This study presents a qualitative methodology for tailoring process activities to a specific organisation. The proposed methodology is used in a case study performed in a large company that develops embedded platforms. A number of suggestions for modification and addition of process activities have been brought forward as a result of the study. The result can further be summarised as follows: SPE in embedded platform development holds more opportunities for reuse, but also requires more focus on external stakeholders, continual training and coordination between projects.


1 Introduction

Software performance (i.e. response times, latency, throughput and workload) is in focus during the development and evolution of a variety of product categories. Examples of products are websites, network nodes, handheld devices, transaction systems, etc. (Smith & Williams, 2002).

A number of software process activities have been proposed to help an organisation with software performance engineering (SPE). Typical performance issues can include identifying performance bottlenecks, giving guidelines for functional partitioning, and helping to select the best alternative among a number of design proposals. A common challenge in introducing such process activities is to tailor the established and rather general performance activities to the needs and opportunities of the specific organisation.

A product platform is by definition the basis of a variety of product versions. Each product version should be constructed with a low effort, compared to developing the complete platform (Clements & Northrop, 2001; Meyer & Lehnerd, 1997). An embedded platform (Labrosse, 2002) is a specific type of product platform where a computer is built into the product platform and is not seen by the user as being a computer. Most real-time systems (Burns & Wellings, 2001) are embedded products.

For embedded platforms, specific possibilities and needs are present and should be considered in a process to support software performance work. Embedded platforms often have high-priority requirements on low cost. When this is the case, it is not possible to solve performance problems by, for example, increasing the hardware performance. A very long lead-time to change the hardware platform would be required, and it would also be expensive. During the development of an embedded platform, consideration must be given to both the software design and the hardware design. The design of the hardware architecture must be dimensioned based on the needs of the software applications that in many cases are yet to be implemented.

This study presents a qualitative methodology that can be used to tailor process activities to a specific organisation. In the methodology, a conceptual framework containing software performance activities and software process activities is mapped to the needs and possibilities of the development within a case company. The case company develops real-time embedded platforms where software performance is one of the most important quality attributes of the product. The following research questions are investigated:

1. What restrictions or opportunities for a software performance engineering process are imposed by an embedded platform?

2. Does the presented methodology provide valuable input when tailoring an SPE process?

The paper is structured as follows. Section 2 introduces the qualitative methodology used, in Section 3 the methodology is applied to a specific company, and in Section 4 the conclusions are presented.

2 Method

The research is carried out using a qualitative methodology (Seaman, 1999; Polo et al., 2002) designed to reveal which behaviors and perceptions drive the studied organisation towards a specific goal. This implies that the results of qualitative research are of a descriptive nature. Qualitative research is particularly useful for determining what is important to individuals and why it is important. In this context, qualitative research provides a process from which key research issues are identified and questions formulated by discovering what really matters to the organisations and why they matter.

The general method for qualitative studies (Lantz, 1993; Miles & Huberman, 1994; Robson, 2002) shown in Figure 1 is modified for tailoring software process activities to a particular area and company.

Figure 1: A general model for qualitative research presented by Lantz (1993). The model comprises data collection, data reduction, creating dimensions, pattern finding and critical review.

One modification is the addition of the actual tailoring activity, where new and changed activities, as well as the deletion of activities, can be proposed based on the patterns found. Also, a conceptual framework is introduced to set up the dimensions used during the pattern finding. The conceptual framework includes general process activities, development phases and stakeholders. These three different parts are initially populated by established and general knowledge of the particular process that is to be tailored. Performing an archive analysis in the studied company is done to find additional stakeholders and development phases which should be added to the initial conceptual framework.

2.1 Data collection

Interviews were performed with 16 selected members of the personnel from the case organisation. The duration of each interview was approximately one hour. During the interviews, open questions were asked, related to software performance engineering. The subjects’ own reflections on needed activities for software performance were also collected. The sampling of subjects was based on availability, knowledge and experience in a way that resulted in as broad a spectrum of knowledge as possible related to the work in the organisation. The context of the interviews was the upgrade of an existing platform by addition of new functionality. The purpose of the interviews was two-fold: one purpose was to find general needs in the process related to software performance activities; another was to investigate the possibility of different activities to promote the software performance work. The interviews were carried out as open-ended interviews in order to capture the needs and possibilities of a software performance methodology, without excluding issues that the researcher did not foresee. Furthermore, since all participants are from the same company and project, it is believed that the discourse in the interviews is equal for all interviews. The following open-ended questions were used as a starting point for the interviews.

1. In what phases do development projects need performance information about the system, or part of the system?

2. For what part of the system is it most important to get performance estimates, and how accurate do the estimates need to be?

2.2 Creating dimensions

There exists, as described above, an established process for software performance engineering (Smith & Williams, 2002). The workflow and individual activities are not described in this paper due to limitations in space, but an overview of the workflow is given in Figure 2. These SPE activities are included in the conceptual framework in order to capture knowledge about general SPE activities. The SPE process presented by Smith & Williams (2002) is deliberately kept general, with no mapping of each activity to process phases and stakeholders. For the conceptual framework to be useful in the research context, it must be complemented with a list of potential stakeholders and applicable development phases. From Wolf (1994); Suzuki & Sangiovanni-Vincentelli (1996); Clements & Northrop (2001); Russell & Jacome (2003); Johansson & Wartenberg (2004); Höst & Johansson (2005) a number of success factors for a process tailored for embedded platform development are given. Stakeholders in terms of development units must at least contain hardware development, software development, and product management. The phases performed during the development and evolution of a software platform are not believed to be specialised for that purpose. Therefore, rather general process phases related to the evolution of a platform are chosen to be part of the conceptual framework.

Figure 2: The SPE workflow presented by Smith & Williams (2002). The workflow comprises the activities assess performance risk, identify critical use cases, select key performance scenarios, establish performance objectives, construct performance models, add software resource requirements, add hardware resource requirements, evaluate performance models, modify/create scenarios, modify product concept, revise performance objectives, and verify and validate models.

A conceptual framework that includes the process activities, stakeholders and development phases involved is proposed as a starting point for the research. The initial framework includes the following parts (a small data-structure sketch of the framework is given after the list):

1. Process Activities: Those included in the SPE workflow in Figure 2.

2. Process Stakeholders: Hardware Development, Software Development, Product Management.

3. Development Phases: System Feasibility Study, System Design, Module Design, System Integration.
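
As an illustration only (not part of the paper), the initial framework can be written down as a small data structure; the values are taken from the list above and the container name is ours.

```python
# The three dimensions used to set up the pattern finding (initial population).
CONCEPTUAL_FRAMEWORK = {
    "process_activities": [  # the SPE workflow activities of Figure 2
        "Assess performance risk", "Identify critical use cases",
        "Select key performance scenarios", "Establish performance objectives",
        "Construct performance models", "Add software resource requirements",
        "Add hardware resource requirements", "Evaluate performance models",
        "Revise performance objectives", "Modify product concept",
        "Verify and validate models", "Modify/Create scenarios",
    ],
    "process_stakeholders": [
        "Hardware Development", "Software Development", "Product Management",
    ],
    "development_phases": [
        "System Feasibility Study", "System Design", "Module Design", "System Integration",
    ],
}
```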


The applicability and refinement of the conceptual framework is carried out by going through documentation produced for the development and upgrade of the platform, with the objective to search for data related to software performance. Such data is for example used to parameterise the performance models, to compare design alternatives, and when setting the requirements of the platform. The intention is that the documentation should show the available information flow used in the process of upgrading a platform.

2.3 Data reduction

The interview transcripts are reduced to a number of keywords and comments as a result of the data reduction activity. A further coding is done by grouping the keywords into process categories determined by SPICE (SPICE, 1998). The qualitative data is grouped into five process categories defined in part 2 of the standard: Customer-Supplier, Engineering, Project, Support and Organisation.

2.4 Pattern finding

The objective is to construct a visualisation of the material from the data collection in relation to the conceptual framework. The visualisation is carried out by relating the collected material to triplets consisting of an SPE activity, a project phase and a stakeholder. The result of this activity is presented in two matrices.
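
A minimal sketch of that bookkeeping is given below; the triplet assignments are invented for illustration and do not reproduce the study's actual mapping.

```python
from collections import defaultdict

# Each coded keyword is related to a triplet of (SPE activity, development phase, stakeholder).
# The assignments below are invented examples only.
triplets = [
    (2, "Identify critical use cases", "System Feasibility Study", "Product Management"),
    (7, "Establish performance objectives", "System Design", "Hardware Development"),
    (9, "Verify and validate models", "System Integration", "Software Development"),
]

# One of the two matrices: SPE activity x development phase, holding keyword numbers
# (cf. Table 1). A stakeholder x phase matrix can be built in the same way.
activity_phase = defaultdict(list)
for keyword, activity, phase, stakeholder in triplets:
    activity_phase[(activity, phase)].append(keyword)

for (activity, phase), keywords in sorted(activity_phase.items()):
    print(f"{activity:35s} {phase:25s} {keywords}")
```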

2.5 Critical reviews

During all parts of the qualitative methodology (i.e. data collection, data reduction, creating dimensions and pattern finding) critical reviews are made by the researchers and employees at EMP. In the presented research the data collection was performed and reviewed by three researchers, and contained both a document analysis and open-ended interviews (Lantz, 1993; Miles & Huberman, 1994; Robson, 2002). Data reduction and critical review were performed on the transcripts from the document analysis and from the interviews. During the document analysis the researcher carried out the data reduction and the reviewing. The interviewees performed the critical review for the open-ended interviews; they read the transcripts in search of erroneous citations and omissions. Both the pattern finding and the tailoring were done independently by two researchers and reviewed by a third researcher at the case company. It has been an iterative process, where both the activities and the reviews were performed until all participants could agree upon the result.


3 Case study

The case study was performed at Ericsson Mobile Platforms AB (EMP). EMP offers complete embedded platforms to manufacturers of mobile phones and other wireless devices. The technology covers 3G (Korhonen, 2003) and older mobile phone systems. Results from the different parts of the presented qualitative methodology are presented in the following subsections. In the last subsection, proposals for new activities are presented.

3.1 Creating dimensions

The activity resulted in an overview of currently available SPE data and a mapping of stakeholders and data to the different phases of the development process. Two stakeholders that were not listed in the initial conceptual framework were found to play a role in the SPE work. These two stakeholders, the project management and the customer or buyer of the platform, are in the further analysis added to the stakeholder list in the conceptual framework.

3.2 Data collection and reduction

The following keywords and comments are the result of the data collection and reduction of the interview material.

Customer-Supplier

1. Usage specification: Platform customers demand a performance specification of the system and its services early in the process in order to develop interfaces for the applications on top of the platform.

2. Prioritizing use-cases: Late requirement changes can be solved by a product management organisation by prioritizing the use-cases that are the basis for the requirements specifications. Such use-case analysis gives information about which parallel use cases are possible, providing a more complete picture of the system requirements. In addition, room for performance flexibility must be included in the use-case analysis. The flexibility is set in terms of the maximum execution times and response times that can be tolerated by the use-cases. In addition, the different variations of the use case should be included, for example which different multimedia codecs are needed. In this way, developers can focus on prioritized requirements, and it is easier to remove functionality that overruns the performance budget.


3. Rough performance indications: Product management needs to get a feeling for what is possible or not, in terms of parallel use cases and new functionality. Indications are required to understand the scope of the performance requirements of a new product portfolio. Early performance estimations can be used as such indicators. There is in this case not a strong demand for accuracy of the performance estimations; even rough performance estimations are acceptable for this purpose. In addition, for critical changes regarding available processing power or memory there must be immediate performance feedback to performance stakeholders. The feedback might initially be rough indications, which can be used for decisions on whether or not to perform a more thorough impact analysis.

Engineering

4. Feedback: There is a need to give feedback on the estimation from the analysis phase. This can be done through performance measurements in the subsequent phases. Making updates to documents that contain such estimates risks taking significant lead-time, which to some degree hinders continuous updates of estimates and metrics. The solution is to see the estimates as changing over time and not "a fixed truth", and validation of the estimates is necessary to promote organisational learning.

5. Static worst-case estimations: Using static worst-case memory estimation can be satisfactory when analysing sequential tasks. The demands of sequential tasks can often be considered linearly, which is however not true when several processes are running at the same time, where the worst-case scenario is rare and orders of magnitude more resource-consuming than ordinary operation. It might therefore be hard to interpret the result in a parallel, non-deterministic processing environment.

6. Define performance metric: There is generally a need for a common definition and understanding of software performance metrics, as in the case of instructions-per-second versus cycles-per-second metrics. Such metrics should be defined with the customers’ and users’ perception of performance in mind.

7. Hardware requirements: The hardware development unit needs to know the performance requirements in terms of memory sizes and processing power early enough to be able to deliver hardware in time for internal integration of the platform. These prerequisites have intricate implications on estimates, metrics collection, and any performance prediction models that are to be introduced into an organisation.

8. Hardware/software partitioning: The hardware project needs requirements on CPU and memory load and possible hardware accelerators. The requirements on hardware accelerators stem from the hardware and software trade-offs made. All these requirements are needed before the software project reaches the implementation part.

9. Measurements: When a project reaches the design and implementation phase, it becomes possible to make measurements on the actual performance and validate the performance model. In this phase the performance can be tracked on module level and be compared with the predictions of the model. The system performance due to interaction between functionality originating in different sub-projects will be validated in the integration phase.

10. Algorithms: The performance analysis of new functionality can be done on algorithm level with regard to CPU processing performance requirements, and static worst-case memory demands can be estimated. There is, however, a big step from algorithm-level estimates of new functionality to the system-level metrics collected from previous platforms. The interviewees therefore asked for models that could use performance estimates of the hardware and hardware-related software, and produce estimates of the dynamic system behavior. The modules that are performance bottlenecks are often commonly known, and they should set the level of granularity in the model (i.e. the model should use the same granularity).

11. Interaction between software processes: It is difficult to intuitively make software performance estimates of software that interacts with and is dependent on the scheduling and synchronisation of other real-time software processes. One of the reasons is the many possible states and non-deterministic use-cases.

12. Validated: In order to make a model both useful and trustworthy, the result must be validated when the project reaches implementation.

13. Forecast and outcome: If the software needs in terms of memory size and CPU are greater than the estimates, the specified functionality will not fit into the platform. On the other hand, if the outcome is less than the estimates, the platform will be more expensive than necessary, in terms of hardware costs. The performance estimates therefore need to have a high level of accuracy since they are related to both the specifications and to the outcome.

14. Accuracy needed: There is not one value defining the needed accuracy of the estimates for each different phase. What can be stated is that the need for accuracy increases for each phase, the nearer the project gets to system integration. However, when deciding the requirements of the hardware, the software development units should reach an optimistic accuracy of 10-30% in their performance estimates in the pre-study phase.

15. Optimisation: Most software in embedded platforms is optimised for the specific hardware it is running on, which means that even hardware enhancements might degrade performance if new optimisations are not done. To move functionality from one processor type to another requires further optimisations. It is also dangerous to take performance metrics from a previous product for granted, as these implementations can often be optimised further.

16. Software architecture: Technological solutions to performance issues do not only involve hardware improvements. The software architecture also has an impact on performance. An architectural change to simplify performance estimation can for example be to limit the number of modules and simultaneous processes considered.

17. Ability to understand: In order to make a model both useful and trustworthy, the assumptions behind the model must be visible.

Project

18. Knowledge and education: Software developers need to know about the hardware they are running on, understand performance aspects, and understand the different types of memory that are available. They also need to know how the compilers and operating systems treat the code they write. There is also a difference between a simulated environment and the actual hardware; things that work in the simulator might not work on hardware. Apart from training, an organisation should give credit to expert developers in order to promote performance-related development competency.

19. Iterative platform development: An iterative development process can be used to get hold of early system performance estimates. After each iteration the implemented software system can be measured. On the other hand, there are also problems related to such a process, at least when it comes to iterative specifications. Sometimes not all of the specified functionality fits in the performance constraints, so the performance requirements on the software often have to be renegotiated anyhow towards the later stages of projects.

20. Hardware/Software co-design: In a platform development scenario, hardware developers are typically ahead of software developers. A characteristic overlap is created between projects, where hardware developers start a new project before software developers have finalised theirs, making it not so obvious how to transfer resources and experiences from project to project. Software developers can therefore have a shortage of time to carry out performance estimate analysis in the beginning of projects, while the hardware developers do not have the complete set of prerequisites at hand when starting their analysis and specification activities.

Support

None of the keywords could be grouped in this category.

Organisation

21. Project focused organisations: In order to reuse metrics between two development projects, coordination is needed from the management in charge of the overall product and development quality in the company. A characteristic scenario where this prerequisite is not fulfilled is the project-focused organisation (where the project controls all resources). In such cases, the return on investment of the methodology used must be visible in the actual project budget. Therefore, it is difficult to make improvements in methods and processes that overlap two projects. An example is the collection of data in previously developed projects to be used in future projects.


Table 1: Keywords mapped contra SPE process activities.

Keywords                             SFS               SDS               MDI                  SI
Assess performance risk              2, 3
Identify critical use cases          2, 3
Select key performance scenarios     2, 3
Establish performance objectives     7
Construct performance model(s)       1, 5, 6, 10, 11   1, 5, 6, 10, 11   1, 5, 6, 10, 11
Add software resource requirements   6, 16
Add computer resource requirements   1, 8, 7
Evaluate performance model(s)        17                17                17
Modify and Create scenarios          2                 2                 2
Modify product concept               7, 8, 15*         7, 8, 15*         7, 8, 13, 15*
Revise performance objectives
Verify and validate models           4, 9, 14          4, 9, 14          4, 9, 14             4, 9, 12, 14
Other activity                       18, 19, 20, 21    18, 19, 20, 21    13, 18, 19, 20, 21   18, 19, 20, 21

3.3 Pattern finding

Each of the keywords (1-21) was during the pattern finding mapped to the conceptual framework. Table 1 visualises the keywords in relation to SPE activities and project phases. Each row in the matrix represents an SPE activity, and keywords that do not fit into any activity are placed in a row depicted as "Other activity". In cases where a keyword only partially adheres to the activity, the keyword is denoted with a star (*). The development phases have been coded as follows: SFS (System Feasibility Study), SDS (System Design), MDI (Module Design and Implementation), SI (System Integration).

Table 2 visualises the SPE activities' relation to the stakeholders. In order to further clarify the stakeholders' roles, an additional coding is used, besides the mapping to keywords. Each stakeholder is coded as an initiator of an activity, an executor of an activity or a receiver of the result from an activity. For each pair of SPE activity and stakeholder, more than one of the codes I (Initiator), E (Executor) or R (Receiver) can be appropriate. The following codes have been used to denote the stakeholders: SwD (Software Development), HwD (Hardware Development), PrM (Product Management), ProjM (Project Management) and UBPl (Users and Buyers of the Platform).


Table 2: SPE process activities contra stakeholders.

Keywords SwD HwD PrM ProjM UBPl

Assess performance risk E E R I I

Identify critical use cases E, R E, R I I

Select key performance scenarios E, R E, R I

Establish performance objectives E E I

Construct performance model(s) I, E, R

Add software resource requirements E, R R I

Add hardware resource requirements R E, R I

Evaluate performance model(s) I, E, R

Modify and Create scenarios R R I, E I

Modify product concept E, R I I

Revise performance objectives E, R E, R I

Verify and validate models I, E, R

3.4 Proposal for new activities

From Table 1 a number of observations can be made. For example, one observation is that there are a number of keywords that cannot be related to any of the SPE activities set up by the conceptual framework; these are placed in the "Other activity" row and are candidates for new activities. Another observation that can be made is which activities and project phases are mapped to one or more keywords, and in which phase most of the activities are performed.

1. The row "Other activity" contains keywords 18, 19, 20, 21 for all phases and 13 for the "Module Design and Implementation" phase. Since the keywords do not fit into the conceptual framework and are present in all phases, this implies that they should generate new SPE activities.

2. Most of the activities are involved in the early phases.

3. Keywords in the rows "Construct performance model(s)" and "Evaluate performance model(s)" are in focus in three phases.

4. The row "Verify and validate models" contains a number of keywords (4, 9, 14) that are present during all phases.


Similarly, a number of observations can be made from Table 2. In this case, an observation that can be made is which role the stakeholders play in different activities. The proposed activities should be updated to conform to the responsibility and mandate given to each stakeholder, compared to what is needed to bring the activity to accomplishment.

5. In the column HwD (Hardware Development) there are nearly as many executions of activities (coded as Executor) as in the SwD (Software Development) column. This implies that the department that develops the hardware part of the platform is significantly involved in the SPE work.

6. In the column UBPl (Users and Buyers of the Platform) there are many initiations of activities (coded as Initiator). This implies that buyers and users of the platform initiate performance activities.

The general observation is that the collected data is in line with the general and established SPE literature. There are nevertheless a few observations that can be made for the specialised case of an SPE approach to the development and evolution of an embedded software platform. When using platform development there is the possibility to learn through the maintenance and evolution of the different versions (observation 3). This is not too different from the objectives of the experience factory (Basili et al., 1992). In the case presented, the motivation and the possibility are given by the nature of platform development, where new versions are constructed from the same platform. Therefore, the same performance model can be reused and further validated and evaluated through every version. However, when the platform itself is updated by changing hardware, the reuse is less straightforward than when updates are made to build variations of the platform (i.e. versions of the platform). Another observation is that the validation and evaluation of the model is done throughout the whole evolution and maintenance cycle (observation 4). There are two reasons motivating this observation: one is that an ongoing validation and verification will give the project staff enough confidence in the results of the models; the second reason is of course that the model itself can be improved. A further observation is that the evaluation and validation of the model are closely coupled to each other, because the same model is reused in the analysis phase of the next version of the product or platform. Observation 6 shows that the performance activities largely have external driving forces, from buyers (consumers) or project managers. The reason could be that when developing many versions there is a continuous discussion of the performance requirements of the platform. In addition, buyers of the platforms are concerned with the performance of the platform since they need to build applications on top of the platforms. They therefore need to set their own performance objectives, scenarios and requirements. They perform the same activities as mentioned in the conceptual framework, regarding the embedded platform as the hardware resource. In addition, project managers and buyers do not demand a high level of accuracy concerning the performance estimations. Even accuracy levels that are usable only as indications are valuable for the platform development. The value is present as long as the estimates can be used for strategic decisions about the project or the product.

4 Conclusions

The organisation under study carries out several large and related projects, while the SPE workflow in Figure 2 is focused on single projects. Mapping the two together therefore implies that the first three activities of Figure 2 are carried out before the projects are initiated. They are carried out as a part of long-term product management, and in feasibility studies ahead of projects. The fourth activity, establish performance objectives, is then used to mediate long-term objectives with project objectives when a project is initiated. It is also used to concretise performance objectives defined in the previous phases, and therefore involves iterations where customers, product management and project management define specific performance criteria for a certain project.

The keywords that are not possible to map to the conceptual framework (observation 1) are candidates to elicit new activities needed for platform development. A number of observations can be made concerning these keywords (13, 18, 19, 20, 21). When dealing with embedded platforms and hardware/software co-design with tight real-time constraints, it is important to continuously train the software personnel about the performance implications of new hardware and hardware-related software. Competence in this field will also facilitate the performance modeling (keyword 18 and observation 2) when a change of hardware in the platform is performed.

Another observation is that in hardware/software co-design there is a need for performance estimates in practice before the general feasibility study phase is started within the software project (keywords 19, 20, 21 and observation 2). Observation 5 further emphasises this. The reason why all these activities fall outside the conceptual framework is that the SPE workflow used in the framework does not take into consideration aspects of the organisation or the product development process used. An SPE workflow cannot be introduced and used with only the activities proposed. There are other activities found in this study that must be included in an overall SPE effort. Promoting knowledge and education about SPE aspects of the system developed is one of the activities that should be added. However, it is not enough to include a training course at the start of the project; the activity must be ongoing and bridge different product lifecycles and projects. A solution could be, for this kind of product, to have a specific organisation that carries out performance modeling and estimations and that is not tied to the different projects. The final observation concerns the optimisation (keyword 15), which is similar to modifying the product concept; in this case, however, the concept is not changed but optimised within the original concept. The initial conceptual framework seems to include many of the needs required from an organisation that develops software for an embedded platform. However, the analysis of the collected data shows that new activities must be added, together with new stakeholders, compared to the ones present in the initial conceptual framework.

The goal of qualitative research is to look for principles and understanding behind actions performed under specific circumstances. The same goals appear when wanting to improve a software process by tailoring it to the needs and possibilities of a specific company. Therefore it was rather appealing to marry a qualitative methodology with a software process improvement initiative. Software performance is still a troublesome quality attribute to control and predict in the software industry, despite many suggested approaches. This is especially true for software products with limited hardware resources. An example of a software product with limited resources is an embedded product platform. Therefore the case study is performed on such a product.

The result of this paper is an updated qualitative methodology for software process tailoring. It also contributes to the understanding of software performance engineering during the maintenance and evolution of an embedded software platform. The company considered the results as both valuable and relevant, showing that the methodology presented gives highly valuable input for tailoring an SPE process. A number of observations and suggestions related to an established and general SPE process have been brought forward. They can be summarised as follows: SPE in embedded platform development holds more opportunities for reuse, but also requires more focus on external stakeholders and a continual training and SPE effort, rather than one confined to individual projects as suggested by current SPE models. These are believed to be some of the key success factors of an SPE process for the development and evolution of an embedded software platform. It is also believed that it is possible to compile a tailored workflow based on the explanations and information about the interaction between development phases, stakeholders and process activities.

To strengthen the generalisability of the results, further studies at other companies have to be made. It is, e.g., likely that different companies will have different roles responsible for performance issues.

Acknowledgment

The authors would like to thank all the participants from Ericsson Mobile Platform AB. This work was partly funded by the Swedish Agency for Innovation Systems, project number P23918-2A.

References

Basili, V. R., Caldiera, G., McGarry, F. E., Pajerski, R., Page, G. T., & Waligora, S. (1992). The software engineering laboratory: An operational software experience factory. In Proceedings of 14th International Conference on Software Engineering (pp. 370–381).

Burns, A., & Wellings, A. (2001). Real-time systems and programming languages (3 ed.). Addison-Wesley.

Clements, P. C., & Northrop, L. (2001). Software product lines: Practices and patterns. Addison-Wesley.

Höst, M., & Johansson, E. (2005). Performance prediction based on knowledge of prior product versions. In Proceedings of 9th European Conference on Software Maintenance and Reengineering (pp. 12–20).

Johansson, E., & Wartenberg, F. (2004). Proposal and evaluation for organising and using available data for software performance estimations in embedded platform development. In Proceedings of 10th IEEE Real-Time and Embedded Technology and Applications Symposium (pp. 156–163).

Korhonen, J. (2003). Introduction to 3G mobile communications (2 ed.). Artech House Mobile Communications Series.

Labrosse, J. (2002). MicroC/OS-II: The Real-Time Kernel (2 ed.). CMP Books.

Lantz, A. (1993). Intervjuteknik [Interview methods]. Studentlitteratur.

Meyer, M. H., & Lehnerd, A. P. (1997). The power of product platforms: Building value and cost leadership. Free Press.

Miles, M., & Huberman, A. (1994). Qualitative data analysis. Sage.

Polo, M., Piattini, M., & Ruiz, F. (2002). Using a qualitative research method for building a software maintenance methodology. Software: Practice and Experience, 32(13), 1239–1260.

Robson, C. (2002). Real world research: A resource for social scientists and practitioner-researchers (2 ed.). Blackwell Publishers.

Russell, J., & Jacome, M. (2003). Architecture-level performance evaluation of component-based embedded systems. In Proceedings of 40th Conference on Design Automation (pp. 396–401).

Seaman, C. (1999). Qualitative methods in empirical studies of software engineering. IEEE Transactions on Software Engineering, 25(4), 557–572.

Smith, C. U., & Williams, L. G. (2002). Performance solutions: A practical guide to creating responsive, scalable software. Addison-Wesley.

SPICE. (1998). ISO/IEC TR 15504:1998(E). Information Technology – Software Process Assessment.

Suzuki, K., & Sangiovanni-Vincentelli, A. (1996). Efficient software performance estimation methods for hardware/software codesign. In Proceedings of 33rd Annual Conference on Design Automation (pp. 605–610).

Wolf, W. (1994). Hardware-software codesign of embedded systems. Proceedings of the IEEE, 82(7), 967–989.


IV

PAPER IV

Performance Prediction Based on Knowledge of Prior Product Versions

Martin Höst and Enrico Johansson

Proceedings of 9th European Conference on Software Maintenance and Reengineering, Pages 12–20, IEEE Computer Society, 2005

Abstract

Performance estimation is traditionally carried out when measurements from a product can be obtained. In many cases there is, however, a need to start to make predictions earlier in a development project, when for example different architectures are compared. In this paper, two methods for subjective predictions of performance are investigated. With one of the methods experts estimate the relative resource usage of software tasks without using any knowledge of earlier versions of the product, and with the other method experts use their experience and knowledge of earlier versions of the system. With both methods there are rather large differences between different individual predictions, but the median of the prediction error indicates that the second method is worth further investigation.


1 Introduction

During development of software systems it is important to be able to predict the impact of different design alternatives on various quality attributes. It is not a sufficient alternative to wait until the product is developed to see what the quality is. If, for example, the performance with respect to response time for the provided functionality is not satisfactory, it is impossible to use the software without very large modifications.

For many systems, software performance, i.e. the degree to which a software system meets its objective of timeliness (Smith & Williams, 2002), is one of the most important quality attributes. This is especially true for real-time embedded systems produced in large quantities. These types of systems often have high-priority requirements on keeping down the cost of the products. When this is the case, it is not possible to solve performance problems by, for example, increasing the hardware performance. This would be expensive and it would require a very long lead-time to change the hardware platform.

Traditionally, performance prediction and estimation is carried out when there are prototypes or detailed models available. For example, in Zimran & Butchart (1993) a methodology for performance analysis is presented where, basically, the performance requirements are set in the requirements phase, but the actual estimations are not carried out until there are executable modules available from the system.

To estimate the software performance of a software system without actually building it (for example as a prototype), modeling techniques can be used. Established modeling techniques are for example discrete event simulation and analytical methods based on queuing networks, such as layered queuing networks (Rolia & Sevcik, 1995; Woodside et al., 2001). For whichever system modeling technique is chosen, there are a number of input parameters that must be quantified. The input parameters describe the run-time behavior of the components included in the models. Typical parameters needed in the models are, for example, execution times, resource utilisation, and number of visits to different hardware and software resources.
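
To make the role of such parameters concrete, the sketch below shows a deliberately simple, single-resource queueing calculation of the kind these models are parameterised with. It is only an illustration under stated assumptions, not the model used in this thesis, and the parameter names and numbers are hypothetical.

```python
# Illustrative only: a single-resource open-queue (M/M/1-style) estimate,
# parameterised with the kinds of inputs mentioned above (number of visits,
# service time per visit, offered load). All names and values are hypothetical.

def resource_estimate(visits, service_time_ms, arrival_rate_per_ms):
    """Return (utilisation, mean response time in ms) for one resource."""
    demand = visits * service_time_ms            # total service demand per request
    utilisation = arrival_rate_per_ms * demand   # fraction of the resource used
    if utilisation >= 1.0:
        raise ValueError("resource saturated: utilisation >= 100%")
    response_time = demand / (1.0 - utilisation) # M/M/1 mean response time
    return utilisation, response_time

# Example: 3 visits of 0.4 ms each, 0.5 requests per ms offered load.
print(resource_estimate(visits=3, service_time_ms=0.4, arrival_rate_per_ms=0.5))
```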

Although values of the parameters can be extracted via measurements or a prototype of the product, this requires in many cases too much effort. Instead the values are often extracted using subjective estimations. In this way the option to measure can be chosen for the parts that are considered to present performance risks.

During development of the product there is a need to explore the design-space to evaluate the performance of the system. The different design alternatives include different choices of hardware resources, software and hardware partitioning, software and hardware architecture, etc. A performance model is an effective way of performing such a design-space exploration. If the design choices are few there is the option to prototype all the design alternatives and measure the performance instead of using models parameterised via subjective estimations. This is, however, not usually the case for large-size software products in the early development phases. Even if the system for some parts relies on already implemented components, there can still be a large number of design decisions to be made about choices of software and hardware that interact with the components.

In software engineering, other important metrics are in many cases estimated subjectively. A large amount of research has for example been carried out with respect to subjective estimation of the effort that is required for developing a software product, e.g. (Höst & Wohlin, 1997, 1998; Jørgensen, 2004). It is assumed that experts are capable of predicting the required effort for developing a project, and it has been found that certain methods for prediction are better than other methods. The objective of this paper is to investigate whether it is possible to use subjective estimation in software performance engineering, and more specifically how knowledge from development of previous versions of the product could be used in this process.

System expertise is available when the same personnel as in a previous version of a system is engaged in the development of a new version of the system. The more similarity there is between the different versions, the more knowledge from the previous product can be reused. This is an option in product platform development, which consists of either upgrading a product platform or building products as variants of the product platform. A product platform is per definition the basis of a variety of product versions, where each of them is a variant of the platform. Each product version should be constructed with a low effort, compared to developing the complete platform (Meyer & Lehnerd, 1997; Clements & Northrop, 2001).

Two methods for prediction are investigated and compared in this paper. With one of the methods (method 1) experts are not encouraged to use knowledge from development of earlier versions of the products, while with the other method (method 2) they are encouraged to use this knowledge. The two methods are described in more detail in Section 2.2.


[Figure: message sequence chart of a normal call between User A, the Software and User B, with the signals OffHook, StartTone, Digit, StopTone, StartRing, OffHook, StopRing, StopTone, OnHook; the numbered points 1–8 mark the software execution times explained below, and TALKING marks the connected state.]

Figure 1: Normal usage (Scenario 1).

2 Research design

In order to compare the two methods, a controlled experiment was executed, where subjects were given the estimation tasks according to the estimation assignment described below.

2.1 Study object

The system that was used is a simple telephone exchange system, which has been used in teaching for a long time at Lund University. A normal usage (a telephone call from user A to user B) of the system is depicted in Figure 1. The software execution times in Figure 1 can be explained as follows:

1. User A lifts the receiver. Dial-tone is sent to user A.

2. User A has pressed the first digit in the telephone number of user B. This means that the dial tone should be removed. A telephone number in this simple exchange consists of four digits.


3. User A has pressed the second digit in the telephone number.

4. User A has pressed the third digit in the telephone number.

5. User A has pressed the fourth digit in the telephone number. This means that ringing should be started at user B, and that user A should receive a tone that indicates that ringing has started.

6. User B lifts the receiver, which means that ringing should be stopped at user B and that the tone at user A should also be stopped. The two parties (A and B) are now connected and they can talk to each other. In a normal telephone switch, the CPU could be involved in transferring the digital representation of the communication. This is, however, not done by this switch, and that task should consequently not be considered.

7. User B puts the receiver back on hook.

8. User A puts the receiver back on hook.

In the study two different product versions of the system were used:

Version 1: This is a basic version of the system that supports the normal usage, as described in the above scenario. Variants of the scenario, such as interrupted sequences of digits in a telephone number, and when the called party is already engaged in another telephone call, are of course handled, but no additional services are provided.

Version 2: This version also supports the following additional services: charging (i.e. possibility to keep track of charges for different customers), call fetching (i.e. possibility for the called party to answer a telephone call in another telephone than the one that the call is directed to), call forwarding, and management (i.e. possibility to add and remove customers, change charging information, change telephone numbers, etc). When the system is used in courses, students start with version 1 and implement version 2. When version 2 is developed based on version 1, changes are made to the software tasks of version 1. This means, for example, that the execution times of the tasks of scenario 1 (Figure 1) will not be the same on version 1 and version 2 of the system.

The system cannot be considered as very large, but it is not trivial for the subjects to understand exactly how every detail is implemented or how it should be implemented. It is specified in SDL (ITU, 1992), from which the code is automatically generated by a CASE tool. If the detailed design specification (SDL specification) of version 1 is printed, it requires 29 pages. A detailed design specification of version 2 is, of course, considerably larger than for version 1.

2.2 Estimation assignment

In the study, subjects were asked to estimate the execution time of a number of tasks of the software system. As described in Section 1, two different estimation methods were used during the estimation:

Method 1: In this method estimations are carried out relative to a baseline execution time. The subjects were given the task to estimate the relative execution times compared to the baseline task. For example, if they think that the execution time of a certain task is twice the execution time of the baseline task, they should answer “2”. The intention is that this method should represent a very basic method that virtually always could be used. It does, for example, not require that the experts have any knowledge of prior versions of the system. If the experts have knowledge of earlier product versions, this experience is not used in the predictions in any structured way.

Method 2: This method is based on the platform-based software development process. It is assumed that the experts have knowledge of a prior version of the product, that they are able to identify a similar task in the platform software, and that they are able to estimate the relative execution time of the task compared to the identified baseline task. For every software task for which the experts estimated the execution time, they first decided which software task in the platform system was most similar. Then they estimated the relative execution time compared to the chosen baseline task. The difference between this method and method 1 is that the experts using this method are not given a baseline task, but instead for every estimation choose one from the software platform.

The participants carried out the estimation by following the instructions in a questionnaire. In the questionnaire there were three major parts that were related to the estimation. For every part they were asked to estimate the execution times in a scenario as the one in Figure 1.

Two different scenarios were used:

Scenario 1: This corresponds to the scenario in Figure 1.


[Figure: message sequence chart between the User and the Software for ordering call forwarding: OffHook, StartTone, “*”, StopTone, “1”, “*”, the digits of the telephone number, “#”, StartTone, OnHook; the numbered points 1–10 mark the software execution times.]

Figure 2: Scenario 2.

Scenario 2: This corresponds to ordering of call forwarding, as outlined in Figure 2. Call forwarding is ordered with the following sequence:

*1* telephone_number #

The details of this scenario are not given in this paper, but can be found in the experiment material.1

When the subjects carried out the estimation they were not given any detailed design or code of the system. They were only given the scenarios as presented in Figure 1 and Figure 2, and explanations similar to the explanation in Section 2.1.

2.3 Experiment schedule

The experiment was divided into three parts, as described in Table 1.

1 http://serg.telecom.lth.se/research/packages/PerfExp/


Table 1: Experimental schedule.

          Group 1                              Group 2
Part 1    Method 1, Scenario 1, Version 1 (training; same tasks for both groups)
Part 2    Method 1, Scenario 1, Version 2      Method 2, Scenario 1, Version 2
Part 3    Method 2, Scenario 2, Version 2      Method 1, Scenario 2, Version 2

Part 1 served primarily as a “training round”, where the subjects were presented with the system and subjective estimations in general. In part 1 the subjects used method 1 to estimate the execution times of scenario 1 in version 1. All subjects carried out the same tasks, i.e. there is no difference between group 1 and group 2. The participants used the first execution time in the scenario as baseline execution time. That is, they estimated how long execution times 2–8 are compared to execution time 1 in the same scenario. The first task of the scenario was used as a baseline task, since there was no other task that naturally could serve as a baseline task.

The objective of part 2 and part 3 is to answer the research objectives of the paper. In order to provide as much data as possible, it was decided that all subjects should use both method 1 and method 2 for estimation of tasks as similar as possible. Since it is probably not possible to obtain different tasks that are exactly similar with respect to how hard it is to estimate their execution times, it was decided to use both methods on both scenarios.

Both groups estimate the same tasks in part 2 and they also estimate the same tasks in part 3 of the experiment. However, group 1 first uses method 1 and then method 2, while group 2 first uses method 2 and then method 1. In this way, unwanted effects related to the fact that it is not equally hard to estimate different tasks are decreased. Effects of the order of the methods are also decreased.

In part 2 and part 3 of the experiment, method 1 and method 2 were used for estimation of the tasks of scenario 1 and scenario 2 on version 2 of the software.


When the subjects used method 1, the first task of the scenario for which the estimation is done was used as a baseline task. That is, in part 2 and part 3, task 1 from scenario 1 or 2 on version 2 of the software system was used as the baseline task.

When the subjects used method 2, they chose a baseline task from scenario 1 on version 1 of the system. This corresponds, as described above, to doing estimation based on experience and knowledge of earlier versions of the system. Since all subjects directly before part 2 and 3 of the experiment had carried out part 1, where version 1 was studied, all of them should have fresh knowledge of version 1.

2.4 Subjects

There were in total 10 subjects in the study. All of them have experience from using the system, at least as students. One of the subjects is the developer of version 1 of the system. The system was developed several years ago (in the 1980s). This, in combination with the fact that the system is not written in a traditional programming language but instead automatically generated from a specification, makes him a relevant subject of the study. Most subjects have during the last 10 years worked with the system as teachers in courses where it is used.

Since the subjects have worked with the system as described above, they have good knowledge of the specification of version 1, and the requirements on version 2 of the system. They have, however, not investigated the actual code-level implementation of the system and they have not measured the performance or similar measures of the system. The intention is that this should be comparable to an architect working with a system in the early phases of development of new functionality. In this case it is only possible to have knowledge of the overall design and the requirements of the functionality to be implemented. It is not possible to have detailed knowledge of the parts that should be implemented or changed.

2.5 Measurement of real values

There are a number of techniques to collect timing information for a selected portion of a software program. One technique is to measure the elapsed wall clock time between two points of interest in the program; therefore this technique is normally denoted as the wall-clock or stopwatch approach. When using this approach, the waiting time, which is a result of time spent in IO transactions, memory accesses and other system activities, is included in the measured time. In a time-sharing system the waiting time also includes time consumed by other applications. Another technique is the CPU time approach, which consists of only measuring the time spent in the CPU and does not include time spent in other system activities or in running other programs (Lilja, 2003).
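
As a minimal illustration of this distinction (using Python's standard library; the thesis itself measures SDL tasks, not Python code), the two approaches can be contrasted as follows:

```python
# Illustration only: wall-clock ("stopwatch") time vs. CPU time for the same work.
import time

def busy_work(n=200_000):
    return sum(i * i for i in range(n))

wall_start = time.perf_counter()    # wall-clock time, includes waiting
cpu_start = time.process_time()     # CPU time of this process only
busy_work()
wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

# In a loaded time-sharing system wall_elapsed grows with interference from
# other processes, while cpu_elapsed does not.
print(f"wall-clock: {wall_elapsed:.4f} s, CPU: {cpu_elapsed:.4f} s")
```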

When collecting the subjective estimations, the wall-clock execution time was asked for; as a consequence, the wall-clock approach was used when measuring the execution times for each of the tasks included in the experiment.

The execution times for the SDL language constructs in each task defined in scenario 1 and scenario 2 have been measured by adding a specific measurement process and two new signals to the SDL systems under test. One signal contains parameters defining the timestamp and the start or stop of a task; the other signal contains no parameters. The tool Telelogic SDL/TTCN Suite 4.5 has been used to compile the SDL system and to perform real-time simulations of the modified system. Simulations have been chosen to make the measurements independent of the execution hardware and the telephony exchange. The measured execution times cannot be used as real-life execution times, since they would in that case be dependent on the execution environment (OS, CPU, buses, other software applications). However, the measurements can be used to calculate the ratio of the execution times for the different tasks in scenario 1 and scenario 2.

The simulation has been driven by a script (monitor-file), which defines the signals and parameters sent to the SDL system. After each arrival of a signal starting a task, one of the added signals is sent to the measurement process with the actual timestamp; the measurement process collects the data and returns control directly to the SDL system under test. When the end of a task is reached, the same signal, but with a parameter defining that this is the end of a task, is sent to the measurement process. The measurement process stores each of the timestamps in a vector. After the last signals that complete the scenario, the second added signal is sent to the measurement process, which then calculates the execution times for each task and prints them to a text file. For each scenario this is repeated 50 times.
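
The post-processing step can be sketched as follows; the record format and names below are hypothetical and only stand in for the timestamp vectors and text files produced by the measurement process:

```python
# Hedged sketch: turn (run, task, start, stop) timestamp records into
# per-task execution times averaged over the repeated runs.
from collections import defaultdict
from statistics import mean

def average_task_times(records):
    """records: iterable of (run, task, start_ts, stop_ts); times in seconds."""
    per_task = defaultdict(list)
    for run, task, start_ts, stop_ts in records:
        per_task[task].append(stop_ts - start_ts)
    return {task: mean(times) for task, times in per_task.items()}

# Hypothetical data for two runs of two tasks:
records = [
    (1, "task1", 0.000, 0.012), (1, "task2", 0.020, 0.025),
    (2, "task1", 0.000, 0.011), (2, "task2", 0.019, 0.026),
]
print(average_task_times(records))
```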

When using the real-time simulator, the SDL now-operator has been used to measure the system time. The now-operator returns a value obtained from a clock function, and the implementation of the clock function depends on which OS platform the SDL system is compiled and executed on (i.e. different Windows and UNIX families). In this study the simulation and compilation have been made on an Intel Pentium 4, 1.7 GHz processor running Microsoft Windows 2000.

2.6 Execution

The experiment was conducted as follows. First the subjects were informed that the experiment would be carried out, and that their participation was wanted. On this occasion the major objectives of the experiment were also briefly outlined.

A few weeks after that, each subject was given a form and they committed to a personal deadline for handing in the filled-out form. The subjects handed in the survey within two weeks. Since all subjects have access to the system, they were instructed not to study any code or detailed design of the system. They were, as described above, only given information similar to the scenarios in Figure 1 and Figure 2, and descriptions as in Section 2.1. They were, however, allowed and encouraged to use the knowledge that they have of the system.

The instructions and the actual estimation assignments were given to the subjects through a questionnaire. Since the research design requires that the subjects should be divided into two groups, there were two versions of the questionnaire and it was decided at random which subjects should receive which version. In order to balance the design, equally many forms of each version were, as far as possible, handed out.

The subjects filled out the forms individually without discussing individual estimates with each other. It took about half an hour to one hour to fill out the forms.

2.7 Estimation error metric

Estimations are, as described above, based on relative values of execution times compared to the execution time of baseline tasks. When the relative value has been collected from an expert, an estimate E of the absolute value of the execution time can be derived from the estimated relative value and the measured value of the baseline task.

Let F(E, A) denote the estimation error as a function of the estimated value, E, and the actual value, A.

In this study, the following properties of F are wanted:

1. If there is no estimation error, then F should be 0.


2. The sign of F should indicate whether the estimate is larger than or less than the actual value.

3. With this type of estimates it is natural that the absolute estimation error becomes larger when the experts estimate too high values than when they estimate too low values. Estimates that are higher than the actual value can fall in an infinite interval, (A, ∞). Estimates, on the other hand, that are lower than the actual value fall in a limited interval, (0, A). This means that it is not suitable to use the absolute error or the standard relative error (E − A)/A as error function.

We argue that an estimate that, for example, is twice as high as the actual value is as good as an estimate that is half as high as the actual value, and an estimate that is 0.9 times the actual value is as good as an estimate that is 1/0.9 times the actual value. More formally, this property can be expressed as

|F(aA, A)| = |F((1/a)A, A)|,  a > 0

where a is the relative value of the estimate compared to the actual value. For example, if the estimate is half the actual value, a = 0.5.

4. An estimate that, for example, is twice as high as the actual value should be considered as good as all other estimates that are twice as high as the actual value, irrespective of what the actual value is. More formally, this property can be expressed as

F(aA_i, A_i) = F(aA_j, A_j),  a > 0

It could be observed that the described properties are met by a function that is based on the logarithm of E/A. In this study, the following error function is used:

F(E, A) = ln(E/A)

For this measure, property 1 and property 2 are trivial to show, and not further discussed in this section. By observing that

|ln(aA/A)| = |−ln(A/(aA))| = |ln(((1/a)A)/A)|

property 3 is shown. Property 4 is also trivial to show, and not further discussed in this section.


It should be noted that it is possible to define other error functions that also meet the properties. For example, in (Jørgensen & Sjøberg, 2003) the Balanced Relative Error (BRE) is used for cost estimation errors. BRE is calculated as (A − E)/A if A ≤ E, and as (A − E)/E if A > E. However, in this paper the F function is used as defined above. The exact choice of metric is not critical as long as it meets the criteria.
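
A minimal sketch of the two error functions just mentioned is given below; it only illustrates their definitions, with made-up example values.

```python
# F(E, A) = ln(E/A) and the Balanced Relative Error (BRE), as defined above.
import math

def F(estimate, actual):
    return math.log(estimate / actual)

def bre(estimate, actual):
    if actual <= estimate:
        return (actual - estimate) / actual
    return (actual - estimate) / estimate

# Property 3: over- and under-estimation by the same factor give the same |F|.
A = 10.0
print(abs(F(2 * A, A)), abs(F(0.5 * A, A)))   # both are ln(2), about 0.693
print(bre(2 * A, A), bre(0.5 * A, A))         # BRE: -1.0 and 1.0
```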

2.8 Validity

In order to evaluate the validity of the study, a checklist from Wohlin et al. (2000) is used. Validity threats may be classified into the following four classes: conclusion validity, construct validity, internal validity, and external validity.

The conclusion validity is related to the possibilities to draw correct conclusions about relations between the treatment and the outcome of the experiment. Typical threats of this type are, for example, to use wrong statistical tests, to use statistical tests with too low power, or to obtain significant differences by measuring on too many dependent variables (“fishing and the error rate”). Since the design does not involve very many dependent variables and non-parametric statistics are used, these particular threats are not considered large in this study. The most important threat of this kind is probably instead the measurement reliability of the actual values of the execution time. This has been considered in the choice of measurement approach. A simulation environment has been used to rule out the effects of processes running in the OS that are not part of the SDL system under test. Also, the measurements have been repeated a number of times to assure that the same execution time for each task could be repeated. The measured value of the execution times used in the analysis is the average value of all the 50 runs. However, it should be noted that it is hard to measure these kinds of values without affecting the system at all.

The internal validity is affected by confounding factors that affect the measured values outside the control, or knowledge, of the researcher. This may, for example, be that the groups of subjects carried out their assignments under different conditions, or maturation of participants, which could result in the first assignments being carried out with more effort than later assignments. Other threats to internal validity are instrumentation and selection or grouping of subjects. In order to lower the internal threats in this experiment, effort was taken to make the study disturb the subjects as little as possible. The assignment that every person filled out was small with respect to the required effort, and they could carry out the assignment whenever they wanted before an individually decided deadline. Most subjects finished in about half an hour to one hour. This should prevent subjects from leaving the study or from being “bored” and thereby not spending equally much effort on every assignment.

There is a risk that the actions that were taken to lower the internal threats could have introduced new internal threats. Since the subjects carry out the tasks by themselves without the researcher, there is a risk that they talk too much with each other about the assignment, or in the worst case, that they hand in the same assignments. This has, however, not been a problem in this study.

Threats to construct validity denote the relation between the concepts and theories behind the experiment, and the measurements and treatments that were analysed. In this case the threat, for example, corresponds to whether the estimation methods represent estimation methods in practice, and whether the measured errors represent relevant factors. The experiment has, of course, been designed to have as good validity as possible with respect to this. Since not many similar experiments have been carried out in the past, some new concepts had to be defined for the experiment. It is our belief that the proposed methods for estimation, i.e. method 1 and method 2, represent two basically different methods and contexts for estimation. Method 1 represents the most basic method and context, where estimation is carried out for a new product without taking advantage of the organisation's knowledge of previous versions of the product. Method 2, on the other hand, is intended to model the situation when experience from previous versions is used in the estimation. It should, however, be noted that the methods represent simplifications, i.e. models, of practically usable methods.

The external validity reflects primarily how general the results are with respect to the subject population and the experiment object. The intention is that the subjects in this experiment should be representative of engineers working with this type of estimation in live projects. The subjects have, as described in Section 2.4, a knowledge of the system that is comparable to the knowledge of engineers in the early phases of a project. They know the design of it, but not the details of the implementation. One difference that could have some importance is the motivation of the subjects. In a real project, the result of the subjects' work has a large impact on the success of the project, and thereby on the reputation etc. of the individual. In this case the subjects participated in the study as anonymous experiment subjects, and the product will never be used in a real situation where performance is monitored. This means that there is a risk that estimation errors are not considered as important as in a real project. We do not believe that this threat is significantly affecting the results. We believe that the subjects actually tried to make as good estimates as possible. However, this threat should always be considered in studies where the result of the subjects' work will not be evaluated “for real”. The study object, i.e. the software system, is not a commercial system, and it is probably not as complete as a commercial system either. It is, however, as described above, probably large enough to prevent the subjects from knowing or directly understanding all details of it.

[Figure: box-plots of the estimation errors F(E, A) for scenario 1; positions 2–8 show tasks 2–8 estimated with method 1, positions 101–108 show tasks 1–8 estimated with method 2.]

Figure 3: Estimation errors, scenario 1.

3 Analysis of results

In this section data from part 2 and part 3 of the experiment are analysed. Non-parametric methods are used for the analysis and all significance tests are, if nothing else is stated, carried out at the 0.05 level of significance. Box-plots (e.g. Fenton & Pfleeger (1998)) are used to visualise the estimation errors. Notice that the box-plots are plotted with “notches” (i.e., they look like “sand-glasses” instead of ordinary rectangles). The reason for this is that in some box-plots it is otherwise hard to know whether the median coincides with the top or the bottom of the rectangle.
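
A notched box-plot of this kind can be produced, for example, as in the sketch below (matplotlib is assumed here; the paper does not state which tool produced the figures, and the data is placeholder):

```python
# Placeholder sketch of a notched box-plot of estimation errors per task.
import matplotlib.pyplot as plt

errors_by_task = [
    [-0.5, -0.3, -0.2, -0.1, 0.2],   # hypothetical F(E, A) values, one task
    [-0.2, -0.1, 0.0, 0.1, 0.4],     # hypothetical F(E, A) values, another task
]
plt.boxplot(errors_by_task, notch=True)
plt.xticks([1, 2], ["task 2", "task 3"])
plt.ylabel("F(E, A)")
plt.show()
```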

As described above, all estimations are carried out for the tasks in version 2 of the software system. Let Emste denote the absolute estimate of expert e, estimating task t in scenario s, using method m. It can be calculated as

Emste = Rmste · Bmste


[Figure: box-plots of the estimation errors F(E, A) for scenario 2; positions 3–10 show tasks 3–10 estimated with method 1, positions 101–110 show tasks 1–10 estimated with method 2 (task 2 excluded).]

Figure 4: Estimation errors, scenario 2.

where

• Rmste is the relative estimate of task t in scenario s from expert e, using method m, and

• Bmste is the baseline value that the expert used when estimating the relative value Rmste. For method m = 1, B1ste is the measured value of task 1 of scenario s, executed on version 2. For method m = 2, B2ste is the measured value of the task that was chosen by the expert from scenario 1, version 1.

Based on this, the estimation error can, as described in Section 2.7, for every estimated task, be calculated as F(Emste, Ast), where Ast denotes the measured execution time of task t in scenario s on version 2 of the software system.
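
As a small worked example of this calculation (with hypothetical numbers, not data from the experiment):

```python
# Hypothetical example: relative estimate 1.5 x a baseline of 2.0 ms for a task
# whose measured execution time is 3.5 ms.
import math

def estimation_error(relative_estimate, baseline_time, actual_time):
    absolute_estimate = relative_estimate * baseline_time   # E = R * B
    return math.log(absolute_estimate / actual_time)        # F(E, A) = ln(E/A)

print(estimation_error(relative_estimate=1.5, baseline_time=2.0, actual_time=3.5))
# about -0.154, i.e. a slight underestimation
```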

First, the estimation error for every task with method 1 and method 2 is analysed. In order to limit the number of possible combinations of method, task, etc. in one diagram, the tasks of scenario 1 are first analysed, then the tasks of scenario 2 are analysed. In Figure 3 the estimation errors are plotted for scenario 1. The box-plots to the left (2–8) denote the results when method 1 is used for tasks 2–8 and the boxes to the right (101–108) denote the results when method 2 was used for the same tasks (and task 1). It can clearly be seen that the estimation errors are smaller when method 2 is used than when method 1 is used.

When the same analysis is to be carried out for scenario 2, it is found that one of the measured values (task 2) in scenario 2 (version 2 of the product) was very low, and no estimate was near the value. Since it was so low, and it affected both methods equally, it was decided to remove the value from the study. This means that 5 data points were removed from the measurements from method 1 and 5 data points were removed from the measurements from method 2.

[Figure: box-plots of the estimation errors F(E, A) for all tasks, grouped by method 1 and method 2.]

Figure 5: Estimation errors, method 1 and method 2.

In Figure 4 the estimation errors for the tasks of scenario 2 with method 1 and method 2 are displayed in a similar way as for scenario 1 in Figure 3. The tendency that method 2 produces lower errors than method 1 is not as clear for scenario 2 as for scenario 1. Notice that the values for task 2 are removed for the reasons described above.

An explanation of why the difference between method 1 and method 2 is larger for scenario 1 than for scenario 2 may be that scenario 1 is more suited for method 2. Since baseline tasks are selected from the same scenario (but for a different version), it is probably easier to identify the best baseline task in every case.

In our next analysis we analyse all estimation errors from method 1 and method 2 in part 2 and 3 of the experiment. That is, the estimates are from scenario 1 and 2 on version 2 of the software. The estimation errors for method 1 and method 2 are plotted in Figure 5. The median value of the estimation errors of method 1 is F^M_1 = −0.2939, and the median error of method 2 is F^M_2 = 0.0128. That is, F^M_m denotes the median of the relative errors for method m.


In order to compare the estimates of the two methods, a goodness indication function G_m and a null hypothesis are defined:

H0: G_1 = G_2

In this case we use as goodness function the absolute value of the median of the estimation errors, i.e. G_m = |F^M_m|. Notice that the better the estimations are, the nearer the goodness indication function is to zero.

Since, in this case, F^M_1 < 0 and F^M_2 > 0, the null hypothesis can be reformulated as

H0: −F^M_1 = F^M_2

where −F^M_1 is the median of the individual −F_1ste values. It is not possible to reject the null hypothesis with a Wilcoxon rank sum test (p = 0.053) at the 5% level of significance, although the p-value is small. However, there is a difference between the median values. That is, the data indicates that method 2, at least under these circumstances, produces better estimates than method 1. This could be interpreted as encouraging for the ideas of method 2. It should, however, be noted that more research is needed to conclude whether there is a statistically significant difference, and more exactly under which conditions it is better. The calculation is based on 75 data points for method 1 and 84 data points for method 2.
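
For illustration, such a comparison can be run, for example, as in the sketch below (SciPy is assumed here; the paper does not state which tool was used, and the error values are placeholders, not the experiment data):

```python
# Wilcoxon rank sum test comparing -F values of method 1 with F values of method 2.
from scipy.stats import ranksums

errors_method1 = [-0.45, -0.30, -0.25, 0.10, -0.50]   # placeholder F(E, A) values
errors_method2 = [0.05, -0.10, 0.02, 0.20, 0.00]      # placeholder F(E, A) values

# H0: median(-F_method1) == median(F_method2)
stat, p_value = ranksums([-f for f in errors_method1], errors_method2)
print(f"rank-sum statistic = {stat:.3f}, p = {p_value:.3f}")
```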

4 Conclusions

It could be concluded that there are significant prediction errors with both methods. However, it seems like the prediction error, in this case, is smaller for method 2 than for method 1. For method 2, the median error corresponds to about 1% (e^0.0128 ≈ 1.01). However, there is a significant dispersion in the values, and it is not clear how to obtain a prediction with as small an error as this with fewer estimates.

That is, it can be concluded that there are significant and large errors when the estimations of individual tasks are analysed. However, if the estimates of all tasks are analysed, e.g. by looking at the mean value, the errors from different tasks seem to cancel each other out. It can also be concluded that the prediction error in this case is lower when knowledge and experience from former versions of the software are used in the predictions. This becomes more evident the more similar tasks can be identified in the former version of the software.


A set of different questions for further research can be derived from the presented study. At a more general level, further research is needed to decide whether the estimation errors are too large for subjective estimations to be used in predictions, or whether it is actually possible to use subjective estimations in performance prediction.

There is also a need to investigate method 2 in more detail. In this study it has, for example, not been determined whether the largest errors are due to the subjects having to choose a baseline task or to them having to compare the estimated task to the baseline task.

Further research is also needed to investigate methods for estimating the uncertainty of different individual predictions. It would be an advantage to receive an indication of the uncertainty of a prediction already when the prediction is made.

5 Acknowledgment

The authors would like to thank all the participants in the experiment. This work was partly funded by the Swedish Agency for Innovation Systems, project number P23918-2A.

References

Clements, P. C., & Northrop, L. (2001). Software product lines: Practices and patterns. Addison-Wesley.

Fenton, N., & Pfleeger, S. (1998). Software metrics: A rigorous and practical approach, revised (2 ed.). Course Technology.

Höst, M., & Wohlin, C. (1997). A subjective effort estimation experiment. Information and Software Technology, 39(11), 755–762.

Höst, M., & Wohlin, C. (1998). An experimental study of individual subjective effort estimations and combinations of the estimates. In Proceedings of 20th International Conference on Software Engineering (pp. 332–339).

Jørgensen, M. (2004). A review of studies on expert estimation of software development effort. Journal of Systems and Software, 70(1-2), 37–60.

Jørgensen, M., & Sjøberg, D. I. K. (2003). An effort prediction interval approach based on the empirical distribution of previous estimation accuracy. Information and Software Technology, 45(3), 123–136.


Lilja, D. J. (2003). Measuring computer performance: A practitioner's guide. New York, NY: Cambridge University Press.

Meyer, M. H., & Lehnerd, A. P. (1997). The power of product platforms: Building value and cost leadership. Free Press.

Rolia, J. A., & Sevcik, K. C. (1995). The method of layers. IEEE Transactions on Software Engineering, 21(8), 689–700.

Smith, C. U., & Williams, L. G. (2002). Performance solutions: A practical guide to creating responsive, scalable software. Addison-Wesley.

Specification and description language (SDL). (1992). ITU-T Standard Z.100, International Telecommunication Union.

Wohlin, C., Runeson, P., Höst, M., Ohlsson, M., Regnell, B., & Wesslén, A. (2000). Experimentation in software engineering: An introduction. Kluwer Academic Publishers.

Woodside, C. M., Hrischuk, C. E., Selic, B., & Bayarov, S. (2001). Automated performance modeling of software generated by a design environment. Performance Evaluation, 45(2-3), 107–123.

Zimran, E., & Butchart, D. (1993). Performance engineering throughout the product life cycle. In Proceedings of 7th Computers in Design, Manufacturing, and Production (pp. 344–349).


V

PAPER V

Proposal and Evaluation for Organising and Using Available Data for Software Performance Estimations in Embedded Platform Development

Enrico Johansson and Fredrik Wartenberg

Proceedings of 10th IEEE Real-Time and Embedded Technology and Applications Symposium, pages 156–163, IEEE Computer Society, 2004

Abstract

Embedded platform development, where a platform is a basis for a family of products, is characterised by soft- and hardware co-design as well as very tight constraints on cost and available real-time system performance. The latter implies that early software performance estimation plays a crucial role in embedded platform development. Due to the different focus of software and hardware design, the performance data and estimation techniques needed are by nature heterogeneous. In order to simplify access and availability of software performance data for an entire development organisation, a database tool is proposed and implemented, which allows organising, presenting, and to some extent evaluating the existing, heterogeneous data for software performance estimations.


1 Introduction

Estimations of software performance (Gelenbe, 1999; King, 1990) refer to providing the response times and the workloads before the software is fully implemented and integrated in a product. Software performance is a critical design criterion, especially for embedded products with restricted resources and strong cost constraints (Bechini & Prete, 2002; Micheli & Gupta, 2002; Wolf, 1994). In a large "embedded" company, like Ericsson Mobile Platforms AB (2004) (EMP), work on software performance is carried out in many parts of the development projects. EMP is committed to supplying phone manufacturers and original device manufacturers (ODMs) with products allowing them to build mobile phones for the GSM/GPRS, EDGE, and WCDMA standards (Collins & Smith, 2002). EMP's products consist of a hardware reference design and complete software stacks up to an application API, where EMP's customers can place their own applications. EMP's entire range of products is based on a small number of embedded platforms (Clements & Northrop, 2001; Meyer & Lehnerd, 1997).

In this context of embedded platform design, software performance is crucial for a variety of reasons, such as the existence of real-time critical tasks, advanced multimedia functionality, and cost and energy constraints, to name some. Thus, work on software performance is carried out in many parts of EMP, representing various needs as well as providing various kinds of expertise. Amongst others, system designers, programmers, and product managers are involved in EMP's software performance engineering (SPE) work. Typically, each of these groups has its own specific requirements on SPE data. It is not trivial to use performance data (A. Smith, 1994) effectively over the entire company, even more so as the time needed to collect, organise and retrieve data for a complex system is not negligible.

Against this background, a case study was carried out at EMP. The approach was to study which performance data is available or needed, for which purposes performance data is currently used, and which new usages are made possible by organising and structuring the available performance data. Based on this case study, a software performance database was proposed to organise and use software performance data effectively at EMP. The database was then implemented and evaluated.

2 Research Methodology

The main research question pursued is the following:


• How should SPE data be organised and made available to satisfy the needs and possibilities for the development of an embedded platform?

Three activities have been conducted to answer the research question. The first activity was an archive analysis of development documents to learn which data is available and who the users are. About 20 design documents were analysed during this activity. The second activity was a set of interviews with selected personnel from the case organisation. The goal was to understand how the actual data is used and what types of performance estimations are of interest. The sampling of subjects was based on their knowledge and experience in a way that resulted in a broad spectrum of knowledge of the work in the organisation. In total 10 experts from various fields (system design, multimedia, audio, system test, digital signal processing, Java) were interviewed. The third and final activity was to propose, implement and use a method of organising performance data and estimations to support effective SPE.

3 Results from the archive analysis and interviews

Although a software performance engineering process is presented in C. U. Smith & Williams (2002), the results lead us to believe that the development of an embedded platform poses requirements on the development process that are not covered by C. U. Smith & Williams (2002). The overall results are covered in this paper, while others of a more detailed nature are confidential to EMP and thus cannot be published.

The archive analysis revealed that the available data is highly heterogeneous, and that it is spread over the entire EMP. The heterogeneity reflects the different needs within EMP (e.g. system design vs. software optimisation) as well as the different methods used to obtain data (e.g. estimations vs. measurements), also implying that the data is of different quality, as e.g. measurements typically provide more reliable data than estimations do. That SPE data is spread over the entire organisation is a consequence of the fact that SPE is an integral part of the entire embedded platform development. Moreover, different users need the SPE data for considerably different purposes, related to their roles at EMP, i.e. product managers, software developers, hardware developers or testers.

The results can further be summarised in a number of requirements concerning the collection (Chapter 3.1), display (Chapter 3.2), and usages and analysis (Chapter 3.3) of the software performance data.


3.1 SPE Data Collection

During the interviews and the archive study, it became clear that the concept of use case (Rumbaugh et al., 1998) is central. A use case represents something a mobile phone user accomplishes using the phone. Use cases play a central role as organising entity during specification, design, implementation, and testing. It should also be possible to represent scenarios of use cases, e.g. a sequence of parallel use cases, as this is of great practical importance. In the view of a hierarchical organisation, the scenarios would constitute the level above the use cases.

Performance measurements should be associated with the use cases. These measurements should be represented at the hierarchy level below the use cases, further on referred to as tasks. The heterogeneity of the data makes it necessary that different types of measurements can be represented (e.g. workloads, call dependencies, or process timing distributions). Moreover, it mandates the storage of the data origin, since the data may originate from varying sources, for example early estimations present in design documents or measurements made on different hardware or software releases.

It should be possible to add other relevant performance measures without altering the structure of the data representation, even when this data may not be directly associated with software performance (e.g. memory bandwidth requirements, power consumption). To achieve these goals, information about the hardware architecture must be included, as the SPE data is always tied to the hardware where the measurement was made. In the hierarchical structure of the SPE data, these hardware representations constitute the lowest level.
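For illustration, the hierarchy described above (scenarios composed of use cases, use cases composed of tasks, and tasks carrying per-hardware figures) could be represented along the following lines. This is a minimal sketch only; the class and field names are invented and do not describe EMP's implementation.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class HardwareDescription:
        """Lowest level: resources a task consumes on one hardware set-up."""
        hardware: str                  # e.g. "ARM" or "DSP"
        budget_mcps: float             # maximum workload available
        task_requirement_mcps: float   # workload used by the task
        origin: str = "measurement"    # e.g. "measurement" or "estimation"

    @dataclass
    class Task:
        """Something that executes on the hardware, e.g. an RTOS process group."""
        name: str
        hardware_figures: List[HardwareDescription] = field(default_factory=list)

    @dataclass
    class UseCase:
        """Something a phone user accomplishes, e.g. 'GPRS file download'."""
        name: str
        tasks: List[Task] = field(default_factory=list)

    @dataclass
    class Scenario:
        """A sequence of parallel use cases."""
        name: str
        use_cases: List[UseCase] = field(default_factory=list)

        def workload_per_hardware(self) -> Dict[str, float]:
            """Sum the task workloads per hardware, e.g. to check a CPU budget."""
            totals: Dict[str, float] = {}
            for uc in self.use_cases:
                for task in uc.tasks:
                    for hw in task.hardware_figures:
                        totals[hw.hardware] = (
                            totals.get(hw.hardware, 0.0) + hw.task_requirement_mcps)
            return totals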

3.2 Data display and performance analysis

Concerning data display, it turned out that a main requirement was the ability to quickly retrieve performance data on the different hierarchy levels (i.e. scenarios, use cases or tasks) and to easily add more detailed analyses to the data display (e.g. process timing distributions instead of only workload). Moreover, producing customised graphs and specific statistical evaluations based on the stored data was perceived as a very valuable feature.

There should also be means to judge whether certain performance criteria are met, e.g. that the sum of the workloads for a scenario does not exceed the CPU capacity. Mcps (millions of processor cycles per second) is used throughout the case study as the measurement unit for workload. Concerning the hardware representations, one requirement was to provide a way of defining operators for transformation of measurements between different hardware architectures. Considering performance estimations based on SPE data, it was considered important that estimation models could be easily integrated with the organised SPE data.

3.3 Usages of Organised SPE Data

The usage is listed for each development phase: design phase, implementation phase, maintenance phase, and platform update.

(1) Design phase: During its initial stage, when decisions upon the system design of the platform are taken, one of the focuses is to set requirements on the hardware and software. The requirements are used as input to the next project stage, when a more detailed design is to be developed. When considering different design alternatives, the predicted performance of future applications is an important factor. Therefore, the choice between alternative designs is often based on their estimated performance. Typical design alternatives that constitute the design space during system design concern hardware choices such as: different CPU clock frequencies and architectures (i.e. ARM CPU vs. Digital Signal Processor (DSP)), hardware accelerators, communication links, hardware/software partitioning, etc. Here it is seen as crucial to use existing performance data to estimate system performance for a new system design. For an efficient SPE process, this data needs to be well organised, easily retrievable and well integrated with estimation models and methods. A further requirement was to support automated analyses, so that the impact of changes in system design can be evaluated quickly. Also, the possibility to assess performance characteristics of the entire system (e.g. by making it simple to roughly estimate the performance implications of certain critical use cases) might be appealing to product managers as an easy-to-use means of reviewing customer requirements.

(2) Implementation phase: During implementation, when the system design has been set, performance engineering can be used to find bottlenecks in the proposed design, by evaluating different design alternatives using simulations or prototypes. For these tasks, it was seen as very beneficial if the performance data could be collected, organised and analysed in a standardised and automated way. Once bottlenecks are identified, the same SPE process can be used to track the results of software or system optimisation.

(3) Maintenance phase: In this phase well-organised SPE data and analysis tools can be a means for tracking the effects of ongoing smaller design changes in terms of performance (i.e. changing the partition of functionality on different hardware or software entities). The tool can also be used for assessing the performance implications of new usages (new scenarios and use cases) of the platform. Moreover, well-organised SPE data and analysis tools will be beneficial for system testing.

(4) Platform update, major configuration changes: When making major changes in the configuration of the platform, there is a need to know how different configurations impact system performance. The configuration changes include both the software and the hardware of the system. The system performance can be defined as the response times and workload for different applications. For example, knowing the maximum workload allowed in order to achieve a desirable responsiveness makes it possible to create design guidelines stating that the CPU must not exceed this specific workload. In case the platform must be updated in order to cater for new functionality and better system performance, there is a need to know how to re-design the platform in order to meet the desired goals. The performance estimates needed for this purpose can be viewed in terms of output and input information to the development life cycle of an embedded platform, where software and hardware co-design is a vital driving force. The possibility of using existing performance data to estimate the effects of the changes or updates is greatly facilitated by standardised, accessible, and well-organised performance data. A further benefit of well-organised, accessible and easy-to-use performance data, analysis tools, and estimation methods is that the need for software engineers to get involved in the details of the hardware in order to make estimations of the software performance decreases. This is a time-consuming operation for most software developers. Hardware simulators must be set up and the software application executed on them. Still, even with these efforts, the result will suffer in accuracy and certainty. This is due to the limited accuracy of the hardware model used in the simulations, which stems from the fact that when the estimations are needed there is no hardware to run the software on. The estimation is in fact used to take strategic decisions about the design of the hardware.

The conclusion is that well-organised performance data, which is tightly integrated with analysis tools and estimation methods, is beneficial for all stages of embedded product development.


4 Proposal for organising data - a software performance database

In order to cope with the overall data heterogeneity and the distribution of data over the entire EMP, a software performance database is proposed for supporting EMP's SPE effort. The design of the database should fulfil the presented requirements on data, analysis and estimation tools. A database makes it possible to improve the effectiveness of SPE by helping to define, organise, and communicate the company performance strategy, and to translate the strategy into operational objectives related to performance work. With the continuous loop of insight into the estimates and measurements that the database can provide, the company can easily measure effectiveness and continually refine the SPE methodology, for example by continuously validating the estimations against a real product.

There exist alternatives to collecting the SPE data into a central database, which after all takes resources to build up and administrate. It is however believed that a database decreases the number of situations where each group involved in software performance work independently collects, defines, and uses its own software performance data whenever needed.

4.1 Database requirements

The requirements elicited from EMP and discussed in Chapter 3 can be rewritten in the context of using a database solution. The rewriting gives rise to a requirement list (DB1-DB7).

DB1. The database should be a central repository for all performance data in the company.

DB2. The database should allow for easily retrieving and visualising performance data.

DB3. The database should be able to represent any type of performance data (e.g. processor load, memory bandwidth, or power consumption).

DB4. The database should allow for aggregation of entries (e.g. summing up all data on OS tasks for one use case).

DB5. The database should be a tool for a wide range of prediction activities in all development phases (e.g. estimation of the CPU load of sets of use cases, calculation of response times and latencies).

DB6. The database should allow for integration with a wide range of evaluation and analysis tools for software and system performance estimation (e.g. static models, dynamic models, simulation kernels).

DB7. The database should allow for visualisation of analysis results on different levels, e.g. on use case level, RTOS process level, or algorithm level.
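A relational layout satisfying, for example, DB1, DB3 and DB4 could look roughly like the following. This is an illustrative sketch only; all table and column names are invented, and the schema actually implemented at EMP is not published.

    import sqlite3

    # Minimal illustrative schema for the scenario / use case / task /
    # hardware-description hierarchy (not the schema used at EMP).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE scenario       (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE use_case       (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scenario_uc    (scenario_id INTEGER REFERENCES scenario(id),
                                 use_case_id INTEGER REFERENCES use_case(id));
    CREATE TABLE task           (id INTEGER PRIMARY KEY, name TEXT,
                                 use_case_id INTEGER REFERENCES use_case(id));
    CREATE TABLE hw_description (task_id INTEGER REFERENCES task(id),
                                 hardware TEXT,          -- e.g. 'ARM', 'DSP'
                                 budget_mcps REAL,       -- available workload
                                 requirement_mcps REAL,  -- workload used by task
                                 origin TEXT);           -- measurement or estimate
    """)

    # DB4: aggregate all task workloads of one use case per hardware.
    query = """
    SELECT hw.hardware, SUM(hw.requirement_mcps)
    FROM hw_description hw JOIN task t ON hw.task_id = t.id
    WHERE t.use_case_id = ?
    GROUP BY hw.hardware
    """
    rows = conn.execute(query, (1,)).fetchall()  # empty until data is inserted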

4.2 Design of the Database

In the context of the development of EMP's next platform architecture, a first design of a performance database has been specified and implemented. The first usage objective at hand was to collect and organise performance data of the existing platform architecture, and use it to estimate the performance of the new platform.

For this first implementation of the database, it was decided to use the workload (Mcps) consumed by specific tasks as the primary performance measure. Workload is a very important performance measure, and much of EMP's performance data is available in terms of workload per task. It was also decided to organise the performance data in terms of (parallel) use cases. This is because a goal of the performance estimation was to judge whether the use cases the platform is designed for can be realised with the proposed system architecture.

A strong focus of the design has been to use the data existing at EMP and to have means for easily integrating it into the database. Existing data can be found in current platform design proposals, in use case validation, in measurements from the software groups, as well as in concurrency measurements conducted by the system test group. In a next step, the database is to serve as input for more sophisticated performance estimations based on queuing models and simulations. The data analysis was conceived with this aim in mind, allowing the extraction of data on e.g. timing distributions or call matrices on RTOS process level.

4.3 Structure of the data representation

The structure of the database is hierarchical and heterogeneous, see Figure 1. The top-level entities are scenarios reflecting sequences of parallel use cases. A typical scenario would e.g. be to (1) turn on the phone, (2) download a file over GPRS and at the same time receive a voice call. The entities that scenarios consist of are use cases. As most of the customer requirements and

[Figure 1 illustrates the hierarchy: scenarios (e.g. Scenario 3: UC5, video conference) consist of use cases, use cases consist of tasks (e.g. Task 1: Video Encode, Task 2: Video Decode), and each task holds hardware descriptions, e.g. measured figures for the current platform (5 ARM Mcps, 25 DSP Mcps) and, via a transformation, estimated figures for the new platform (5 CPU1 Mcps, 75 CPU2 Mcps).]

Figure 1: Data representation in the database.

internal design requirements are formulated in terms of the possibility to execute specific use cases on the platform, use cases can be seen as the central entity of the database organisation. For the initial version of the database, the use case table was directly extracted from the "use case requirements document", listing and describing the use cases the current platform has to handle. For the purpose of representing performance figures, each use case is composed of a number of tasks. A task relates to something that can execute on a device's hardware, e.g. an algorithm, an RTOS process, a group of RTOS processes, or simply a number of CPU cycles. The use case "GPRS file download" e.g. consists, amongst others, of a "datacom" task, reflecting all processes related to the datacom stack, and a "GSM access" task, related to the GSM access processes. Which hardware is involved and which resources are consumed is held in the hardware descriptions, which constitute the lowest level of the database. Each task may contain several hardware descriptions, reflecting the task executing on different hardware platforms. The hardware descriptions are flexible structures, reflecting the need to represent different hardware architectures and measurement figures, as well as the necessity to abstract the software processes from the hardware they execute on. In the current version of the database two hardware descriptions are defined, one for the current platform and one for a possible upcoming platform. Currently, the tasks only contain values on the workload, but they might also involve other hardware resources like memory bandwidth or power consumption. The values are differentiated into the hardware resources consumed per task during low CPU load and the maximum workload available for the respective hardware. The latter value can be seen as the workload budget for that specific hardware. An example of the data represented in the hardware description for a released platform is shown in Table 1, representing the datacom task of the use case mentioned earlier. The "budget" column reflects the maximum Mcps available for the respective hardware. The "task requirement" column represents the number of Mcps used by the specific task. The values used are fictive.

Table 1: Data represented for the task "Datacom" in the hardware description for the current platform.

Current Hw, task Datacom    Budget    Task requirement
ARM Mcps                    100       10
DSP Mcps                    100       2
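As a small illustration, the (fictive) Table 1 figures could be checked against the workload budget as follows; the dictionary layout is invented for this example.

    # Fictive figures from Table 1 for the "Datacom" task on the current platform.
    datacom = {
        "ARM": {"budget_mcps": 100.0, "task_requirement_mcps": 10.0},
        "DSP": {"budget_mcps": 100.0, "task_requirement_mcps": 2.0},
    }

    for hw, figures in datacom.items():
        used = figures["task_requirement_mcps"] / figures["budget_mcps"]
        print(f"{hw}: Datacom uses {used:.0%} of the workload budget")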

4.4 Transformations to new hardware architectures and queuing models

The transformation of performance data measured on one hardware set-up to different hardware architectures is done by means of scaling factors, e.g. assuming that a task consuming 100 DSP cycles will consume 300 cycles on an ARM CPU. Obviously, very detailed knowledge of the transformed task and the hardware set-up is needed to get an accurate estimation of the multiplication factors. As this knowledge may not be available in the first phases of the development cycle, these factors introduce an uncertainty into the generated performance estimations. In the current implementation, no explicit data is available on how the tasks make use of caches, nor is the bandwidth of the data and instruction buses available. This is not a restriction when performance estimations are calculated for the same hardware as the measurements are made on, since these factors are implicit in the measured performance data. This is however not true for performance estimations obtained for a new hardware architecture, where these factors can imply large uncertainties because neither the use of caches nor the bandwidth of the data and instruction buses is considered. In addition, to capture the dynamic behaviour, the transformed measures are used as input to queuing models such as the layered queuing network (LQN) model. A layered queuing network (LQN) is an extension of the queuing network models (Rolia & Sevcik, 1995). LQN defines a system in terms of requests sent and service given by different hardware and software entities. The entities can be divided into three different categories: client tasks that only request service, client-server tasks that can both receive and send requests, and server tasks that only receive requests. These entities are placed in different layers, where an entity from a higher level is allowed to request service from a lower layer, but not vice versa. Modelling a system as an LQN provides analytical means to estimate the system performance with different software and hardware configurations and different workloads.
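The scaling-factor transformation could be sketched as follows. The factors used here are purely illustrative (they reproduce the example figures in Figure 1, where 5 ARM + 25 DSP Mcps become 5 + 75 Mcps on the new CPUs); they are not validated values, and the LQN modelling mentioned above is not covered by this sketch.

    def transform_workload(measured_mcps, scaling_factors):
        """Scale measured per-hardware workloads to a new architecture.

        measured_mcps   : workload measured on the current platform,
                          e.g. {"ARM": 5.0, "DSP": 25.0}
        scaling_factors : invented illustrative factors mapping
                          (current hw, new hw) -> cycle ratio
        """
        estimated = {}
        for (src_hw, dst_hw), factor in scaling_factors.items():
            estimated[dst_hw] = (estimated.get(dst_hw, 0.0)
                                 + measured_mcps.get(src_hw, 0.0) * factor)
        return estimated

    # Example: DSP work is assumed to cost 3x on CPU2', ARM work moves 1:1
    # to CPU1' (the factors are purely illustrative).
    factors = {("ARM", "CPU1'"): 1.0, ("DSP", "CPU2'"): 3.0}
    print(transform_workload({"ARM": 5.0, "DSP": 25.0}, factors))
    # -> {"CPU1'": 5.0, "CPU2'": 75.0}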

5 Usages for EMP

5.1 Data collection

The database can be used as a central point of collection for the performance measurements regularly conducted by the various design teams. As the database is accessible from any workstation, performance figures become more easily available even outside the respective design teams. This greatly increases the possibilities of making performance testing and monitoring an integral part of system design.

5.2 Data Retrieval and visualisation

On the most elementary level, the database can be used to retrieve the stored performance figures in various ways: either by browsing scenarios, use cases or tasks, or by searching for a specific use case or task. Where scenarios or use cases are concerned, the current version of the database will report the sum of Mcps of all tasks associated with the respective hardware entity and thus give a figure for the system load. Moreover, for data obtained from trace files, detailed per-process statistics are available in addition to the task figures. The statistics currently implemented are the relative execution times per process for a use case and the distributions of the execution times for all processes per measured use case, as shown in Figure 2.


[Figure 2 shows a histogram of the execution times for the process VIE_ENCODERP (mean time 60 us, x-axis in microseconds).]

Figure 2: Distribution of the execution times for a process.
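Per-process statistics such as those in Figure 2 could be derived from trace files roughly as follows. The trace format used here is hypothetical (one line per completed execution with a timestamp, process name and execution time); EMP's actual trace-file format and analyser are not published.

    from collections import defaultdict
    import statistics

    def per_process_times(trace_lines):
        """Collect execution-time samples per RTOS process from a trace.

        Hypothetical trace format, one line per completed execution:
        '<timestamp_us> <process_name> <execution_time_us>'.
        """
        samples = defaultdict(list)
        for line in trace_lines:
            _, process, exec_time = line.split()
            samples[process].append(float(exec_time))
        return samples

    trace = [
        "1000 VIE_ENCODERP 55",
        "1200 VIE_ENCODERP 65",
        "1300 DATACOM 20",
    ]
    for process, times in per_process_times(trace).items():
        print(process, "mean time [us]:", statistics.mean(times))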

5.3 Performance estimations

The database can be used to make performance estimations based on existing measurements and estimations. One example is to estimate performance figures for parallel use cases based on the measurements of individual use cases. This is especially interesting for system design issues, as the database collects information about the entire system (e.g. DSP and application CPU). Thus the group responsible for development of the DSP gets the possibility to estimate the effects of their performance work on the entire system, something which turns out to be difficult if data and estimation methods are not available through one single interface but rather are spread out over the entire company. Another example is to use the data available for the current platform to make performance estimations for an upcoming design. By specifying the transformers for the tasks affected by the changes in the hardware platform, rough performance estimations for the new hardware configuration can easily be obtained. In our implementation, a transformer distributes the performance figures measured on the existing design's different hardware components (DSP and ARM) to the new design's processors (DSP' and ARM') applying simple scaling factors. Once again, the database as a central collection point makes such estimations much more feasible, as all data is readily available and the estimation models can be integrated into the database. If the algorithms or methods are altered and used for specific software tasks, the figures in the database and the performance estimation can be quickly updated so as to always reflect the most up-to-date system parameters. Although the performance estimation methods described above have their limitations (see Chapter 6), there is today no alternative way of quickly getting a large number of estimations based on measurements. If these estimations are treated as indications and as a means to identify possible performance bottlenecks, instead of being considered exact numbers, they can be very beneficial for the system design process.

5.4 Example - CPU Budget

During the design work with a platform upgrade, the performance database was used to calculate a "CPU budget" for the upgraded platform. The platform upgrade implied that parts of the software system were to be redistributed from CPU1 and CPU2 to two CPUs of another type, CPU1' and CPU2'. The first step in this work was to identify the most critical use cases, amongst others videoconferencing and MP3 playback. For these use cases a set of measurements was conducted on the existing platform and imported into the performance database by means of the integrated trace-file analyser. By means of these measurements and the architecture description of the upgraded platform (memory bandwidth, cache system and CPU characteristics), the task transformers were determined. Moreover, two sets of alternative hardware descriptions for the upcoming platform were integrated in the database. After these activities were completed, the database was used to produce performance estimations for the alternative hardware descriptions of the platform, which directly showed the CPU budget (Table 2). CPU1 and CPU2 represent measured Mcps for the existing platform, while CPU1' and CPU2' represent estimations for the upgraded platform. The actual CPU budget consisted of the sums of all tasks (not reported here) for the different CPUs.
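For illustration, the per-CPU summation behind such a budget could be sketched as below, using only the three task figures reported in Table 2 (the real budget summed all tasks, which are not published); the layout is invented.

    # Per-task workload figures in Mcps, following the layout of Table 2
    # (only three tasks are reported there; the full budget summed all tasks).
    tasks_mcps = {
        #        CPU1  CPU2  CPU1' CPU2'
        "PHY": ( 0.0,  0.8,  0.9,  0.2),
        "BSS": ( 0.0,  1.8,  2.0,  0.4),
        "DEC": ( 2.5,  0.0,  0.0, 10.5),
    }
    cpus = ("CPU1", "CPU2", "CPU1'", "CPU2'")

    # The CPU budget is the per-CPU sum over all tasks.
    budget = {cpu: sum(values[i] for values in tasks_mcps.values())
              for i, cpu in enumerate(cpus)}
    print(budget)   # {'CPU1': 2.5, 'CPU2': 2.6, "CPU1'": 2.9, "CPU2'": 11.1}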

6 Restrictions of the proposal

When using the database some reservations must be made. They arise from restrictions of the measurements, of the representation of the hardware and software architecture and, finally, of the methodologies used for obtaining performance estimations.


Table 2: Data in Mcps for three tasks in the videoconferencing CPU budget.

            Current          Upgraded
Task        CPU1     CPU2    CPU1'    CPU2'
PHY         0        0.8     0.9      0.2
BSS         0        1.8     2.0      0.4
DEC         2.5      0       0        10.5

6.1 Non-independent use cases

Typically, the use cases are non-independent in terms of the software that is executed. Two different use cases may share the same software resources. This is not considered in the currently implemented model, which simply adds up processor cycles for two separately conducted measurements on two use cases. This can lead to data from OS processes being included twice in the resulting performance estimations. The resulting figures are thus an overestimation of the required processor cycles.

One way to handle this problem is to detect the common resources and to consider this when calculating performance estimations. Another way to attack the problem is to assume (or, better, to investigate) that shared service times are negligible in relation to the overall service times and to explicitly state the resulting estimations as upper-bound estimations.
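The double-counting effect, and one possible way of bounding it, can be illustrated as follows; the task names, figures and the choice of deduplication rule are invented and only meant to show the idea.

    # Illustrative sketch of the double-counting problem when use cases
    # share software resources.  Task names and figures are invented.
    uc_download = {"datacom": 12.0, "gsm_access": 8.0, "os_services": 3.0}
    uc_voice    = {"speech_codec": 20.0, "gsm_access": 8.0, "os_services": 3.0}

    # What the current model does: simply add the two measurements,
    # counting the shared tasks twice (an upper bound).
    upper_bound = sum(uc_download.values()) + sum(uc_voice.values())

    # One simplistic deduplication rule: count each shared task only once,
    # keeping the larger of the two figures.  Whether shared processes really
    # behave like this when the use cases run in parallel is exactly the open
    # question discussed above.
    combined = dict(uc_download)
    for task, mcps in uc_voice.items():
        combined[task] = max(combined.get(task, 0.0), mcps)
    deduplicated = sum(combined.values())

    print(upper_bound, deduplicated)   # 54.0 vs 43.0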

6.2 Hardware transformations

Although the performance estimation mechanisms in terms of hardware transformers implemented in the database are plausible and straightforward, the lack of validation implies that the performance estimations have to be seen as rough estimates rather than facts.

7 User Experiences

This chapter reports the collected experiences from all these activities together with the authors' own experiences of using the database tool. They cover different aspects in line with the research question, such as having a centralised repository for storage of measurement data, retrieval and visualisation of data, as well as performing analyses and estimations.


7.1 Central repository

It is evident that different users can use software performance data in many different ways and therefore might have different expectations on the database. Nevertheless, all users were positive about using a central repository. This seems obvious at first glance, but it should be remembered that using the database initially imposes an additional effort and new routines on the users, compared to using their own local internal representation and storage. When asked a direct question about the benefits of the database, the users believed that the benefits gained are greater than the extra effort required by the database. So, when the total experience from the users points to the fact that all are willing to put up with the extra time and effort of using the database, this is considered a very positive sign of the usefulness of the central repository structure of the database.

7.2 Storage of measurement data

The main source of measurement data in an embedded environment is the analysis of trace files for RTOS processes. Other measurements done during development are, for example, processor cycle counts for algorithms in a digital signal processor (DSP). Both these types of data fit very well into the structure and usages of the proposed database. The flexible and rather small-scale approach has made it possible to quickly implement new features and analysis methods, as well as to easily change the database organisation to support the demands of its potential users. In this way it is believed that it has been possible to some extent to tailor the database to an embedded platform development organisation such as EMP. During the development of the database, additional suggestions for requirements have been collected. Two examples are the following:

• To present the optimisation grade of a software process or algorithm. The optimisation grade would be used to know if there are possibilities of reducing the CPU load or achieving faster response times for applications by optimising single software tasks.

• The hierarchical decomposition of the use cases has been perceived as useful and natural to use. However, there have been requests for other groupings, more related to the structure of the software development organisation.


7.3 Retrieval and visualisation of information

The user interface provided was appreciated by the users. The appreciation was largely based on the hierarchical structure implemented, which facilitated quick access to the data on the level most relevant to the respective users. It supported all parties interested in software performance in exchanging, collecting and estimating software performance. The support was present in the task view, the use case view and the scenario view. Support was also given to features like call dependency graphs and statistical data analyses.

7.4 Software performance estimations

The database has been used as the main source for retrieving data for different estimation models during a project. The decision to use a hardware abstraction layer has been experienced as very positive. Even for experts in software/hardware co-design and co-simulation, producing the estimations needed takes a non-negligible effort. This is the rationale behind the proposal of the implemented and used hardware abstraction layer. The implementation of the hardware abstraction layers, using simple multiplication factors to separate the execution time between different hardware, has been well accepted by the individuals involved in SPE. The acceptance has come not only from software developers, but also from the experts that have delivered the factors. The reason is the possibility of continuously adjusting the multiplication factors during the development, and observing the impact the estimated factors have on the software performance estimations at system level. The possibility of separating the system software estimation from the hardware model estimation is yet another experience that supports the proposed database organisation and usage. This becomes even more obvious when a new platform is to be developed, where performance estimations are made against a baseline (i.e. the current platform, which can be executed and profiled). The multiplication factors are then estimated from the old platform to the new platform, supporting the system estimation of the new hardware. A negative experience was that there was not enough support for identifying tasks that should not be counted twice when adding together separate or identical scenarios (i.e. the estimation of parallel scenarios). This restriction has been discussed in Chapter 6.1.


8 Conclusion and Future work

The first round of interviews showed an inhomogeneous requirement picture reflecting the different focuses of the various groups involved in embedded platform development. When the idea of the database was presented, a lot of expectations were raised by these different groups. The database has met the majority of the expectations and is therefore valuable as a base for the software performance work that is requested from within EMP. The structure and usages have also facilitated the communication between the different parties involved in the software performance discussions. The conclusion is that the proposed performance database is a valuable tool for embedded platform development. The hierarchically structured database, organised at the highest level in terms of scenarios of parallel use cases, and the tight integration with estimation models are perceived as an appropriate solution to the requirements discussed in Chapter 3. Moreover, the proposal was considered beneficial for most phases of embedded platform development.

In the future more analysis methods will be implemented, such as RMA calculations (Nord & Cheng, 1994) for scheduling purposes, and queuing and simulation models for estimation of responsiveness. Furthermore, the database should extend the support for parallel scenarios, for example by extracting the common and shared software processes used by different scenarios. There might also be a need for more advanced hardware models in the hardware abstraction models, if it is discovered that the ones used do not fulfil all the needs of the SPE work. Although good experiences have been gained, it is important to realise that a more extended validation is needed for the parts of the proposal concerned with performance estimations.

9 Acknowledgment

The work in this paper has partially been financed by the Swedish Agency for Innovation Systems, project number P23918-2A. Thanks and appreciation to the employees from Ericsson Mobile Platforms AB who participated in the study.


References

Bechini, A., & Prete, C. A. (2002). Performance-steered design of software architectures for embedded multicore systems. Software Practice and Experience, 32(12), 1155–1173.

Clements, P. C., & Northrop, L. (2001). Software product lines: Practices and patterns. Addison-Wesley.

Collins, D., & Smith, C. (2002). 3G wireless networks. McGraw-Hill.

Gelenbe, E. (Ed.). (1999). System performance evaluation: Methodologies and applications. CRC Press.

Ericsson Mobile Platforms AB. (2004). (http://www.ericsson.com/mobileplatforms)

King, P. J. (1990). Computer and communication system performance modelling. Prentice-Hall.

Meyer, M. H., & Lehnerd, A. P. (1997). The power of product platforms: Building value and cost leadership. Free Press.

Micheli, G. D., & Gupta, R. K. (2002). Hardware/software co-design. Proceedings of the IEEE, 85(3), 349–365.

Nord, R., & Cheng, B. (1994). Using RMA for evaluating design decisions. In Proceedings of 2nd IEEE Workshop on Real-Time Applications (pp. 76–80).

Rolia, J. A., & Sevcik, K. C. (1995). The method of layers. IEEE Transactions on Software Engineering, 21(8), 689–700.

Rumbaugh, J., Jacobson, I., & Booch, G. (1998). Unified modeling language reference manual. Addison-Wesley Publishing Company.

Smith, A. (1994). The need for measured data in computer system performance analysis or garbage in, garbage out. In Proceedings of 18th Computer Software and Applications Conference (pp. 126–431).

Smith, C. U., & Williams, L. G. (2002). Performance solutions: A practical guide to creating responsive, scalable software. Addison-Wesley.

Wolf, W. (1994). Hardware-software codesign of embedded systems. Proceedings of the IEEE, 82(7), 967–989.


VI

PAPER VI

Modelling Choices When Using Trace File Data to Parameterize a Software Performance Model

Enrico Johansson, Fredrik Wartenberg, Martin Höst and Christian Nyberg

Submitted to Information and Software Technology (IST), Elsevier Journals, April 2005

Abstract

When developing products based on a software platform, a performance model is useful for predicting the performance of the usually large number of product versions that can be developed. It is not realistic to prototype them all in order to assess the performance of each particular version. A case study is carried out to compare the response times obtained from a performance model to response times obtained from a commercial software product. The response times and the data to parameterize the performance model are derived from trace files. To acquire a high accuracy, the recording of trace files has been carried out using a specialised integrated circuit. A simulation experiment has been designed and conducted to assess the estimation error of different modelling choices for execution times and interarrival times. The choices regard whether to use an exponential distribution or samples drawn from trace files. The modelling choice that gives the lowest estimation error is when the execution times are modelled with samples drawn from trace files. This result is valid irrespective of whether the interarrival times are modelled with an exponential distribution or with samples drawn from trace files. An implication of this result could be that it is more important that the execution environment is parameterized with real values than how the interarrival times are modelled. The result can be used to make strategic decisions about how to model a software platform, or a product based on the platform, in the early phases of the development.


1 Introduction

Software performance is critical in a wide variety of products, from web sites and network nodes to handheld devices and transaction systems. Predicting performance deficits early in the development will often result in reducing the cost of activities such as hardware redesign, software architecture redesign, and manual software optimisation.

Many of these products can be developed using product platform development. A product platform is by definition the basis of a variety of product versions, each of which is a variant of the platform. Each product version should be constructed with a low effort, compared to developing the complete platform (Meyer & Lehnerd, 1997).

When the common basis of a product platform consists of software, it forms a software platform (Clements & Northrop, 2001). Software platforms are, in the same way as product platforms, used in software development for releasing different versions of a product, which are part of the same product family and software platform.

When developing products based on a software platform, a performance model is useful for predicting the performance of the usually large number of product versions that will be developed. Different product versions and product families are built by adding and removing functionality from a platform. One important aspect when deciding what functionality should be added or removed is to understand early how these changes affect the performance of the final product. The number of versions is normally large and it is not realistic to prototype them all in order to assess the performance of each particular version. Performance models are therefore useful for predicting the performance of different versions of a product.

Different modelling techniques can be used to predict software performance during the early development phases of a software product (Woodside et al., 2001; Cortellesa & Mirandola, 2000). In these phases the software has not yet been deployed or implemented in the product. Critical performance parameters that can be predicted using models include response time, latency, throughput and workload. These parameters help to detect performance bottlenecks, give guidelines for functional partitioning, and help programmers to select the best alternatives of proposed designs for new product versions.

Since models are by nature simplifications, the predictions obtained from the models might not completely reflect the reality for all cases and for all boundaries of the parameters. One way to increase the reliability of the models is to include a large number of details describing the behaviour of the real world. A more complex model tends to give more accurate results, since more information about the final product is taken into consideration in the model. The downside of such an approach is that a detailed model would require more time and effort to develop, to maintain and to use compared to a simpler model (Eklundh, 1982; Law & Kelton, 1997). Furthermore, solving and validating a detailed model also tends to be more complex than doing so for a simple model.

Achieving the wanted reliability and accuracy of models by increasing the level of detail in the model is thus not an attractive solution because of the level of effort required by such an approach. In addition, the details that are available about the final product are limited in the early phases. A reasonable accuracy and reliability must therefore be achieved without increasing the complexity of the model (and the effort of building it). A simple model with few details is thus attractive in order to reduce the effort. Instead, one option could be to increase the accuracy of the input data to the model.

A queuing model with a queue/server pair (King, 1990; Kleinrock, 1975) is attractive to use as a performance model in software platform development because of its simplicity. Queuing models are widely used for performance modelling. They are used to describe and analyse the contention when multiple job requests arrive and need to share restricted resources. A system described with a queuing model may be investigated by analytical, numerical or simulation techniques.

The research performed in this study investigates the impact that different modelling choices have on the resulting performance predictions. It is assumed that a simple queuing model, as defined by a single queue/server pair, may be as effective as more detailed models under certain conditions. The modelling choices regard the customer arrival rates to the queue (King, 1990; Kleinrock, 1975) and the average service time at the server (King, 1990; Kleinrock, 1975). The result can be used to make strategic decisions about how to model a software platform, or a product based on the platform, in the early phases of the development. Although similar studies have been carried out in for example (Rose, 1976; Lindzey & Browne, 1979; Turner, 1979; Calzarossa et al., 1995; Hrischuk et al., 1999; da Silva et al., 2000; Kim et al., 2003), none have had the same objective as the research presented in this paper.
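To make the kind of modelling choice concrete, the following sketch simulates a single queue/server and compares two ways of modelling the service (execution) times: an exponential distribution versus samples drawn from a trace. It is only an illustration under simplified assumptions: the queue is FIFO rather than the priority-preemption discipline used in the paper, and the trace values and arrival rate are invented rather than taken from the studied platform.

    import random

    def simulate_fifo_queue(interarrival, service, n_jobs=10_000):
        """Event-driven simulation of a single queue/server (FIFO).

        interarrival, service: callables returning one sample each.
        Returns the mean response time (waiting + service).
        """
        arrival = 0.0
        server_free_at = 0.0
        total_response = 0.0
        for _ in range(n_jobs):
            arrival += interarrival()
            start = max(arrival, server_free_at)
            finish = start + service()
            server_free_at = finish
            total_response += finish - arrival
        return total_response / n_jobs

    random.seed(1)

    # Invented "trace" of measured execution times (in ms); in the paper these
    # samples come from trace files recorded on the real platform.
    trace_exec_times = [0.8, 1.1, 0.7, 3.0, 0.9, 1.2, 0.8, 2.5]
    mean_exec = sum(trace_exec_times) / len(trace_exec_times)
    mean_interarrival = 2.0

    # Modelling choice A: exponential execution times with the same mean.
    resp_exp = simulate_fifo_queue(
        lambda: random.expovariate(1.0 / mean_interarrival),
        lambda: random.expovariate(1.0 / mean_exec))

    # Modelling choice B: execution times sampled from the trace.
    resp_trace = simulate_fifo_queue(
        lambda: random.expovariate(1.0 / mean_interarrival),
        lambda: random.choice(trace_exec_times))

    print(f"mean response time, exponential service:   {resp_exp:.2f} ms")
    print(f"mean response time, trace-sampled service: {resp_trace:.2f} ms")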

In Section 2 the relevance of a simple performance model in software platform development is discussed. In Section 3 the performance model and the collection of input data via trace files are presented. In Section 4 the research design is presented. The result of the research is presented in Section 5,


Figure 1: An illustration of two product families Fk and Fk+1 (with product versions 1...n and 1...m, respectively) that are based on two consecutive releases of a software platform (i.e. Release k and Release k+1).

and in Section 6 the conclusions are discussed.

2 Platform development

The more similarity there is between two products, the more knowledge from the previous product can be reused. This is an option in product platform development, which consists of either upgrading a product platform or building products as variants of the product platform.

Software platforms are used in product development for releasing different versions of a product, which are part of the same product family and platform. That is, one product family can contain versions of the product which differ in functionality and complexity, but which are still part of the same platform (Figure 1).

The advantage of this is that a large portion of the product can be reused in one product family (Parnas, 1976; Weiss & Lai, 1999). To achieve software reuse and shorten the time between releases of consecutive product families, different product families could share a common software platform. This type of process is often included in a product line architecture (Jazayeri et al., 2000; Bosch, 2000; Clements & Northrop, 2001; Bass et al., 2003), which is the basis for a number of versions of a product.

Normally the platform is more expensive to build than the product versions. There is therefore an economic incentive to build a large number of products on the same platform. To decide which product designs match the performance objectives, an even larger number of product designs must be assessed.

For a number of reasons the platform must be enhanced and changed during its lifetime. New requirements are constantly identified, and functions implemented in one product project should often be part of future products. This means that the software platform should be constantly enhanced and changed based on the requirements from the product teams. New platforms are therefore developed to handle the new requirements. Even in this case a number of design possibilities should be assessed to see if the performance objective of the platform is met. Missing the performance objective of the platform would make the reuse between two product families and within a product family very difficult to achieve.

Building performance models for each of the product families Fk and Fk+1 and the respective product versions is not feasible. The reason is the relatively large cost of parameterizing and verifying the models. Also, the incentive to build as many product versions and product families as possible results in a large number of performance models to parameterize and validate. The large number of products also impacts the total execution time for solving the performance models, which matters for optimisation purposes. This is especially true if the models are complex and not solvable by analytical methods.

Knowledge of the software performance is typical knowledge that can be reused. The reuse can be done by reusing the performance models themselves, in terms of hardware models, software models and user models. Other parts that are preferably reused, if possible, are the parameterizations and estimations of different parts of the models. In this paper, the contribution to the reuse problem in platform development is the evaluation of a simple performance model and its parameterizations.

3 Performance model

A queueing model consisting of a single queue and a single server (Figure 2) is used to model the software performance of the embedded platform. The particular choice of model is based on its low complexity and its realistic mapping to the execution environment. The execution environment consists mainly of a real-time operating system (RTOS) running on a single central processing unit (CPU). There exist a number of other software performance models that could have been used (King, 1990; Rolia & Sevcik, 1995; Balsamo et al., 1998; Smith & Williams, 2002). However, when a simple model is wanted the choices are narrowed down, and the chosen model is regarded as a sufficient trade-off between simplicity and complexity. The low complexity of the model is mainly achieved by the simplified description of the run-time behaviour of the software. Information about the calling relations between the processes, normally referred to as inter-process communication, is not present in the chosen model. This is the kind of information that, for example, can be represented by sequence charts (Rumbaugh et al., 1998) or dynamic dependency graphs.

Figure 2: An illustration of a queueing model with a single queue and a single server. Processes arrive to the execution environment, priority preemption is used to schedule processes from the queue to the CPU, and a process execution interrupted by an arriving process with higher priority is queued.

Since the inter-process communications are not modelled, there is a need for an alternative description of the transitions between the different process states. Typical process states are (Silberschatz et al., 2001):

New: a new process that was just created.
Running: the process instructions are being executed.
Waiting: the process is waiting for some event to occur.
Ready: the process is waiting to be assigned to execution.
Terminated: a process has finished execution.

In the model used in this research, the process calls are considered independent of each other. The transition from Waiting to Ready is described as processes arriving to the queue. When a process arrives to an empty queue, the transition is made directly to the state Running. No instantiation or termination is modelled; therefore the process states New and Terminated are not used. The transitions and the states are described in Figure 3. The transitions between the different states occur with specific intensities and distributions.

The scheduling in the performance model follows the priority preemption principle. The queue receives and stores the job arrivals and the server node processes them. If the server is busy at the arrival of a job, the priority of the job in the server is compared to the priority of the arrived job. If the priority of the arriving job is higher (i.e. it has a lower priority index), the job in the server is interrupted, its remaining execution time is calculated, the interrupted job is put into the queue and the job with the higher priority is executed. When a job finishes, the job with the highest priority (lowest priority index) is fetched from the queue. If the queue contains several jobs with the same priority, the first arrived of these jobs is served first. It can also occur that both the server and the queue are empty at the arrival of a job. In this case, the job is executed in the server at once.

Figure 3: A state diagram of the process states and transitions used in the performance model: a stimulus arrival moves a process from Waiting to Ready, scheduling moves it from Ready to Running, preemption moves it from Running back to Ready, and task completion moves it from Running to Waiting.

For each process, three values must be parameterized: job priority, interarrival times and execution times. The job priority is determined from the configuration files of the modelled system and is not changed during the execution. The interarrival times are the differences between two consecutive arrivals, and the execution time is the time spent in the server. The interarrival times and execution times are chosen from two different distributions: either exponential (King, 1990; Kleinrock, 1975) or samples drawn randomly (with equal probability) from trace files. An exponential distribution is characterised entirely by its mean. When the interarrival times are exponentially distributed, the number of arrivals over a given interval is an arrival process defined as the Poisson process. The Poisson process is often used to model computer systems, where the exponential distribution is used to model the execution times as well.

The duration of time that each of the processes spends in the queue and in the server can be derived by solving the model, with either simulation (Law & Kelton, 1997) or analytical (Kleinrock, 1975; King, 1990) methods. Both simulation and analytical methods can be used to derive the time durations when the exponential distributions are used. For the empirical distribution drawn from trace files, however, only simulation can be used to solve the model. It is normally very difficult to solve such a model using analytical solution methods, especially if the dynamic behaviour over time is needed. In this paper, discrete-event simulations (Law & Kelton, 1997; Banks et al., 2004) are used to solve the models. Analytical methods have only been used to verify the model.
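To make these modelling choices concrete, the following sketch shows one way a discrete-event simulation of the single-queue, single-server model with priority preemption could look. It is an illustrative reconstruction, not the simulation program used in the study; the process set, mean values and simulation horizon are hypothetical placeholders.

```python
import heapq
import random

# Hypothetical process set: name -> (priority, mean interarrival, mean execution).
# A lower priority value means a higher priority, as in the paper.
PROCESSES = {"codec": (1, 10.0, 2.0), "driver": (2, 15.0, 3.0), "ui": (3, 40.0, 5.0)}

def exp_sampler(mean):
    return lambda: random.expovariate(1.0 / mean)

def trace_sampler(samples):
    # Samples drawn with equal probability from a recorded trace file.
    return lambda: random.choice(samples)

def simulate(arrival, execution, priority, horizon=100_000.0):
    """Single queue / single server with priority preemption.

    arrival, execution: dicts mapping process name -> sampling function.
    Returns a dict mapping process name -> list of response times.
    """
    next_arrival = {p: arrival[p]() for p in priority}   # next arrival time per process
    ready = []          # heap of (priority, arrival time, seq, name, remaining service)
    seq = 0
    running = None      # (priority, name, remaining service, arrival time)
    resumed_at = 0.0
    response = {p: [] for p in priority}

    while True:
        t_arr = min(next_arrival.values())
        t_done = resumed_at + running[2] if running else float("inf")
        if min(t_arr, t_done) > horizon:
            return response
        if t_done <= t_arr:                               # the running job completes
            response[running[1]].append(t_done - running[3])
            running = None
            if ready:
                prio, arrived, _, name, remaining = heapq.heappop(ready)
                running, resumed_at = (prio, name, remaining, arrived), t_done
        else:                                             # a new job arrives
            name = min(next_arrival, key=next_arrival.get)
            now = next_arrival[name]
            next_arrival[name] = now + arrival[name]()
            job = (priority[name], name, execution[name](), now)
            seq += 1
            if running is None:
                running, resumed_at = job, now
            elif job[0] < running[0]:                     # higher priority: preempt
                remaining = running[2] - (now - resumed_at)
                heapq.heappush(ready, (running[0], running[3], seq, running[1], remaining))
                running, resumed_at = job, now
            else:
                heapq.heappush(ready, (job[0], now, seq, name, job[2]))

priority = {p: spec[0] for p, spec in PROCESSES.items()}
# Simulation run A of Table 2: exponential interarrival and execution times.
run_a = simulate({p: exp_sampler(s[1]) for p, s in PROCESSES.items()},
                 {p: exp_sampler(s[2]) for p, s in PROCESSES.items()}, priority)
```

Runs B to D of Table 2 would be obtained by replacing exp_sampler with trace_sampler for one or both of the parameters.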

3.1 Trace files

Recording trace files is a common way to trace how a software application is executed on a CPU. Instructions and data accesses executed by the processor are captured during the use of the application. The captured events are logged over a time period and saved into a file (the trace file) for later analysis.

In order to facilitate real-time recording of trace files, many manufacturers have added specialised hardware to their CPUs. The hardware captures the instructions and data accesses executed by the CPU, in real time, and stores them in a buffer. By using this hardware, it is assured that the needed information is captured without altering the behaviour of the CPU or interfering with it. In fact, real-time tracing with hardware support is one of the methods that gives the best timing information when recording trace files.

The embedded platform used in the study includes an embedded trace macrocell (ETM) (Furber, 2000) from the CPU manufacturer ARM to support real-time tracing. The ETM also includes features that allow triggers to be set and the trace to be filtered as it is captured. The triggers are set to start recording traces when specific conditions in the instruction and data buffers are met. The filtering and triggering facilities of the ETM have been used to capture events when they are sent to a process and when the process starts to execute. This has been possible by knowing the specific addresses, data and instructions the RTOS uses for inter-process communication and process swapping. By taking the scheduling principle and the priorities of the processes into consideration, the interarrival, waiting and execution times can be derived.
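The sketch below illustrates this derivation under a simplified, hypothetical event format; the real ETM output is a raw stream of instruction and data accesses, and the mapping to such events relies on the RTOS-specific addresses mentioned above.

```python
from collections import defaultdict

# Hypothetical, simplified trace format: (timestamp, process_id, event), where event is
#   "msg"  when a message/stimulus is delivered to the process,
#   "in"   when the scheduler switches the process onto the CPU,
#   "out"  when the process is switched off the CPU.
# The trace is assumed to be chronologically ordered.

def characterise(trace):
    interarrival = defaultdict(list)   # per process: times between consecutive messages
    execution = defaultdict(list)      # per process: uninterrupted CPU slices
    last_msg, switched_in = {}, {}
    for t, pid, event in trace:
        if event == "msg":
            if pid in last_msg:
                interarrival[pid].append(t - last_msg[pid])
            last_msg[pid] = t
        elif event == "in":
            switched_in[pid] = t
        elif event == "out" and pid in switched_in:
            execution[pid].append(t - switched_in.pop(pid))
    return interarrival, execution

# Tiny fabricated example: process 7 is preempted once by process 3.
trace = [(0, 7, "msg"), (1, 7, "in"), (4, 3, "msg"), (4, 7, "out"),
         (4, 3, "in"), (6, 3, "out"), (6, 7, "in"), (9, 7, "out"), (20, 7, "msg")]
ia, ex = characterise(trace)
print(ia[7], ex[7])   # [20] and [3, 3]: one interarrival sample and two CPU slices
```

For a preempted job, the CPU slices belonging to one activation would be summed to obtain that job's execution time, and the response time is the time from the message arrival to the end of the job's last slice.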

The next step is concerned with deciding which processes should be included in the performance model. There is reason for concern when few traces are collected for a process, leaving only few samples to characterise its behaviour. The problem is to decide the minimum number of samples needed without removing processes that are important for the running applications. This predicament has been solved by gathering subjective opinions from the experts themselves. Based on the following three parameters, a list has been made with potential candidates to be excluded from the model.

• Low interarrival intensity.

• Low execution times.

• Few traces and therefore a low number of samples for parameterizing the process in the performance model.

Based on this list and knowledge of the application, the experts on the system unanimously agreed on what should be removed and what should not.
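A minimal sketch of how such a candidate list could be drawn up from the trace-file characterisation is shown below; the thresholds are invented for illustration, and in the study the final decision was taken by the system experts rather than by fixed limits.

```python
def exclusion_candidates(interarrival, execution, min_samples=30,
                         max_mean_interarrival=1e6, min_mean_execution=0.01):
    """List processes that are candidates for exclusion from the model.

    interarrival, execution: dicts mapping process id -> list of samples.
    The three criteria mirror the bullet list above; all thresholds are
    illustrative placeholders.
    """
    candidates = []
    for pid in interarrival:
        ia, ex = interarrival[pid], execution.get(pid, [])
        few_samples = min(len(ia), len(ex)) < min_samples
        low_intensity = bool(ia) and sum(ia) / len(ia) > max_mean_interarrival
        low_execution = bool(ex) and sum(ex) / len(ex) < min_mean_execution
        if few_samples or low_intensity or low_execution:
            candidates.append((pid, few_samples, low_intensity, low_execution))
    return candidates
```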

4 Research Method

4.1 Case study

A case study is performed to compare the results obtained from the performance model to measurements performed on a commercial software product. The response times for the software product are derived from trace files in order to get as high accuracy as possible. The trace files are recorded from a multimedia application running on an embedded platform.

The domain used in the case study can be described as software applications running on an embedded platform. The platform used in the experiment is available for commercial use for telecommunication and multimedia applications. Both software and hardware are included in the platform used in the experiment. An embedded platform (Labrosse, 2002) is a specific type of product platform where a computer is built into the product and is not seen by the user as being a computer. Most real-time systems (Burns & Wellings, 2001) are embedded products.

Due to a non-disclosure agreement with the company that develops the embedded platforms, no details about the software applications used in the experiment are disclosed. However, the applications used can be described as stimulus- and response-based software (Sommerville, 2004) running on a real-time operating system (RTOS). The RTOS uses preemptive scheduling.

The software includes codecs (coders/decoders), hardware drivers, software for the user interface, etc. Typical software applications for the platform are multimedia or telecommunication applications. Both types of applications are software-intensive, containing software for running different algorithms as well as software for managing the internal resources (both hardware and software). The different applications can be combined in different scenarios and executed in parallel or as stand-alone applications. In the case used, a number of different applications run in parallel.

Table 1: The three research questions pursued in the experiment.

1. Which one of the distribution choices, exponential or samples drawn from trace files, models execution times best?
2. Which one of the distribution choices, exponential or samples drawn from trace files, models interarrival times best?
3. Which part of the performance model, the interarrival times or the execution times, has the major impact on the result?

4.2 Experimental design

A simulation experiment is designed and conducted with data from a commercial product to assess the accuracy of four different performance models. Experimental designs are developed to answer hypotheses, or testable statements, formulated to answer specific questions. Defining a clear objective of what is to be examined is therefore the first step in preparing an experimental design.

The general research objective is therefore broken down into three research questions. Two of them deal with the design choices regarding the interarrival times and execution times, respectively. The third deals with the contribution of each of these two parameters to the estimation error of the model. The three research questions are formulated as described in Table 1.

To find answers to the research questions, the data are analysed according to the 2x2 experimental design described in Figure 4.

The experiment contains two independent variables, each with two different value levels. The independent variables are the interarrival time distribution and the execution time distribution of the model. The dependent variable is the error of the model compared to the real system. The distributions used in the model are either an exponential distribution or samples drawn from trace files. During the experiment the interarrival time distribution and execution time distribution are manipulated, while all other factors remain constant across the cells.

Figure 4: Design of the experiment: the interarrival time distribution and the execution time distribution are the independent variables, and the estimation error is the dependent variable.

Table 2: The different simulation runs executed.

Run  Distribution of interarrival times   Distribution of execution times
A    Exponential                          Exponential
B    Exponential                          Samples from trace files
C    Samples from trace files             Exponential
D    Samples from trace files             Samples from trace files

All four possible combinations of the experiment are investigated. The four experimental runs are listed in Table 2 with the different levels of the independent variables.
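As a small illustration, the four runs of Table 2 are simply the full factorial combination of the two factors; a sketch with hypothetical labels is shown below.

```python
from itertools import product

# Two independent variables, each with two levels, give the 2x2 (full factorial) design.
levels = ["exponential", "samples from trace files"]
runs = {name: {"interarrival": ia, "execution": ex}
        for name, (ia, ex) in zip("ABCD", product(levels, levels))}
# runs["A"] is exponential/exponential and runs["D"] is trace/trace, matching Table 2.
```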

Estimation error

The best combination of design choices for the simulation model is the one that results in an output closest to the real measurements. The difference between the output from the simulation model and the real measurements is in the experiment defined as the simulation error (i.e. the smaller the simulation error, the better the simulation model).

The simulation error (E) is therefore used as the dependent variable in the experiment, and it is calculated as the relative error (Abramowitz & Stegun, 1977). Different metrics (for example the metrics presented in Höst & Johansson (2005)) were considered for calculating the simulation error. The relative error was, however, perceived as a more valuable metric by the experts on the system and was therefore chosen to define the simulation error.

Let the variable R (real value) denote the response time derived from the trace files and the variable S (simulated value) the response times for the processes derived from the simulations performed in each experimental run (simulation A, ..., simulation D). The simulation error is then

E = (R - S) / S    (1)

Equation (1) satisfies two main objectives. The first objective is to preserve the sign of the error, which is important when dealing with real-time applications; this is achieved by the numerator. The second objective is the normalisation of the simulation error, in order to make the values comparable between the different treatments. This is achieved by the division in the denominator.

For each of the different simulation runs (simulation run A, ..., simulation run D), a number of samples of the simulated response time are collected and the estimation error is calculated. The variable S in equation (1) is thus in reality an indexed variable S_nps, where the first index n denotes the sample number, the second index p the process id and the third index s the simulation run. The variable R is also an indexed variable, R_np, where the first index n denotes the sample number and the second index p the process id. The mean value over the samples for each process id is denoted R_p. The variable E is indexed by sample number, process id and experimental run, and can thus be written E_nps. Equation (1) can now be rewritten with the variables S_nps, R_p and E_nps:

E_nps = (R_p - S_nps) / S_nps    (2)
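A small sketch of how equation (2) could be evaluated over the collected samples is shown below; the array shapes are hypothetical, and in the actual experiment the number of samples differs slightly between runs (see Table 3).

```python
import numpy as np

def estimation_errors(real, simulated):
    """Relative estimation error E_nps according to equation (2).

    real:      array (n_samples, n_processes) with response times R_np from trace files.
    simulated: array (n_samples, n_processes, n_runs) with response times S_nps
               from the simulation runs A..D.
    """
    r_p = real.mean(axis=0)                          # mean response time per process, R_p
    return (r_p[None, :, None] - simulated) / simulated

# Hypothetical shapes: 1000 samples, 26 processes, 4 runs.
rng = np.random.default_rng(0)
errors = estimation_errors(1.0 + rng.random((1000, 26)),
                           1.0 + rng.random((1000, 26, 4)))
medians = np.median(errors.reshape(-1, 4), axis=0)   # one median per run, cf. Table 3
```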

Validity

Extraneous variables (external to the experiment) are variables that may influence or affect the results of the treatment on the subject. Variables related to the software application (type of applications run, the number of processes, the internal characteristics of the processes, etc.) are examples of extraneous variables. This threat is minimised by including 26 different processes in the simulation.

Special consideration must be given to the construct validity of the measurement results from the simulations. Construct validity is concerned with whether the measurements represent the constructs of interest in the study. One threat to the construct validity is whether the simulation program is a correct representation of the performance model. To minimise this threat, the simulation program has been thoroughly verified against analytical calculations. The verification has been performed by comparing analytically derived waiting times for each job class to waiting times derived from simulations. The waiting times are calculated and simulated for an M/M/1 queueing system with different job classes. Each job class is defined by an execution time, an interarrival time and a priority. In addition, the results from the simulation program have been compared to other simulation programs.

Another threat to the construct validity is whether the response times, execution times and interarrival times are correctly derived from the trace files. The derivation of execution times, interarrival times and response times from trace files is, however, straightforward and has been thoroughly investigated by two researchers. The threat to the construct validity from this aspect of the study is therefore considered minimal.

The method used to derive the measurements has been thoroughly reviewed by two researchers. In addition, the measurements gathered from trace files have been compared to measurements gathered using software timers (Lilja, 2003).

5 Results

Table 3 presents the simulation error for each of the investigated modelling choices (simulation run A, ..., simulation run D). The closer to zero the median of the error is, the more accurate this modelling choice is believed to be.

A statistical analysis is carried out to find out whether there is a significant difference in the distributions of the simulation error between the different experimental runs. The simulation errors from the simulation runs do not follow a normal distribution. Therefore, the medians are compared instead of the mean values. The investigation of whether the data follow a normal distribution or not is carried out using probability plots (Montgomery, 2000) and the Jarque-Bera test (Judge et al., 1988).

In many cases, a data sample can be transformed so that it is approximately normal. For example, square roots, logarithms, and reciprocals often take a positively skewed distribution and convert it to something close to a normal distribution. None of these transformations was, however, successful in transforming the experimental data into a normal distribution.
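A hedged sketch of this normality check, using the Jarque-Bera test and the transformations mentioned above, could look as follows; the error data used here are a random placeholder.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Placeholder for the simulation errors of one run; in the study these come
# from the discrete-event simulations and the trace-file measurements.
errors = 0.05 * rng.standard_t(df=3, size=10_000)

def plausibly_normal(x, alpha=0.05):
    # The Jarque-Bera test rejects normality based on skewness and kurtosis.
    _, p_value = stats.jarque_bera(x)
    return p_value > alpha

shifted = errors - errors.min() + 1e-9        # the transforms below need positive data
candidates = {"raw": errors, "sqrt": np.sqrt(shifted),
              "log": np.log(shifted), "reciprocal": 1.0 / shifted}
for name, data in candidates.items():
    print(f"{name:10s} plausibly normal: {plausibly_normal(data)}")
```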

Table 3: Sample length and median of the simulation error for each experimental run. For clarity, the corresponding distributions for the interarrival times (Arrival) and execution times (Execution) are included in the table.

Run  Arrival                   Execution                  Sample length  Median
A    Exponential               Exponential                236305         -0.0536
B    Exponential               Samples from trace files   236217          0.0297
C    Samples from trace files  Exponential                236870          0.0741
D    Samples from trace files  Samples from trace files   236119          0.0207

The Wilcoxon rank-sum test (Wilcoxon, 1945; Siegel & Castellan, 1987; Montgomery, 2000) is therefore used to determine whether there are significant differences between the medians of the simulation error in the different experimental runs.

The Wilcoxon test is valid for data from any distribution, whether normal or not. However, the Wilcoxon rank-sum test does assume that the shape of the distribution is similar in the two samples tested. This assumption is particularly relevant if the test is to be used as evidence that the median differs significantly between the samples. The distributions of the estimation errors from the simulation runs all had a similar shape.

The Wilcoxon rank-sum test returns a p-value, which determines the significance when testing the null hypothesis that the two independent samples are generated from the same distribution. In this paper the null hypothesis is rejected if p < 1e-5.
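The pairwise testing procedure can be sketched as below; scipy's ranksums is the large-sample Wilcoxon rank-sum test, which fits the large sample sizes here, and the run labels and error arrays are placeholders.

```python
from itertools import combinations
from scipy.stats import ranksums

def pairwise_rank_sum(errors_by_run, alpha=1e-5):
    """Pairwise Wilcoxon rank-sum tests over the simulation errors of the runs.

    errors_by_run: dict mapping run label (e.g. "A".."D") -> 1-D array of errors.
    Returns a dict such as {"H0AB": (p_value, rejected), ...}.
    """
    results = {}
    for a, b in combinations(sorted(errors_by_run), 2):
        _, p_value = ranksums(errors_by_run[a], errors_by_run[b])
        results[f"H0{a}{b}"] = (p_value, p_value < alpha)
    return results
```

Applied to the errors of runs A to D, this produces the six hypotheses listed in Table 4.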

All the experimental runs are compared pair-wise, resulting in six different hypotheses, one for each pair of experimental runs. The hypotheses are presented in Table 4 and the results of the statistical test for each hypothesis are presented in Table 5.

Table 4: The hypotheses tested with the Wilcoxon rank-sum test.

H0AB: The populations of the relative error generated from experimental runs A and B are derived from the same distribution.
H0AC: The populations of the relative error generated from experimental runs A and C are derived from the same distribution.
H0AD: The populations of the relative error generated from experimental runs A and D are derived from the same distribution.
H0BC: The populations of the relative error generated from experimental runs B and C are derived from the same distribution.
H0BD: The populations of the relative error generated from experimental runs B and D are derived from the same distribution.
H0CD: The populations of the relative error generated from experimental runs C and D are derived from the same distribution.

6 Conclusions

6.1 Research question 1

The rejection of hypotheses H0AB and H0CD shows that there is a significant difference in the simulation error between real and modelled execution times. Both simulation run B and simulation run D, where the execution times are modelled by samples drawn from trace files, show a lower simulation error than simulation runs A and C. As an answer to research question 1, it can therefore be stated that a distribution of samples drawn from trace files is the preferred choice for modelling the execution times. The choice is better irrespective of whether the interarrival times are modelled with an exponential distribution or with samples drawn from trace files.

6.2 Research question 2

There is no significant difference in the simulation error regarding the distribution of the interarrival times. Although the difference is significant when the exponential distribution is used for modelling the execution times (hypothesis H0AC), there is no significant difference when samples drawn from trace files are used for modelling the execution times (hypothesis H0BD). The answer to research question 2 is therefore that it cannot be stated which of the two distributions is the better modelling choice.

Table 5: The results of the Wilcoxon rank-sum test (calculated in Matlab Rel. 13).

H0AB: p < 1e-5. The hypothesis is rejected; the medians of the simulation error in runs A and B are significantly different.
H0AC: p < 1e-5. The hypothesis is rejected; the medians of the simulation error in runs A and C are significantly different.
H0AD: p < 1e-5. The hypothesis is rejected; the medians of the simulation error in runs A and D are significantly different.
H0BC: p < 1e-5. The hypothesis is rejected; the medians of the simulation error in runs B and C are significantly different.
H0BD: p = 0.87. The hypothesis is not rejected; the medians of the simulation error in runs B and D are not significantly different.
H0CD: p < 1e-5. The hypothesis is rejected; the medians of the simulation error in runs C and D are significantly different.

The significant difference found when the execution times are modelled with samples drawn from trace files can, however, give guidance on which distribution to choose for the interarrival times. The choice of arrival model depends on whether overestimation (positive error), underestimation (negative error) or the absolute value of the error is preferred. The median of the relative error of the performance model with both exponentially distributed interarrival and execution times is negative. This means that such models produce response times that are smaller than the real values (i.e. they underestimate the response time). On the other hand, the median of the relative error when the interarrival times are modelled with samples drawn from trace files and the execution times are modelled with an exponential distribution is positive. This means that this latter model produces response times that are greater than the real values (i.e. it overestimates the response time).

6.3 Research question 3

The modelling choice that gives the lowest estimation error is when the execution times are modelled with samples drawn from trace files. This is true irrespective of whether the interarrival times are modelled with an exponential distribution or with samples drawn from trace files. An implication of this result is that it is more important to model the execution times with real values than the interarrival times.

6.4 General comments

In general, the results indicate that the performance model presented could be used as a reusable asset in software platform development. The model together with its parameterizations can be reused to explore the response times of different design alternatives. The performance model can be used once trace files have been collected for the current platform and a product version. In addition, simpler methods for collecting execution times and interarrival times could be an alternative when trace files are not available. This is indicated by the reasonably good results achieved without using samples drawn from trace files. Simpler methods might for example be the use of software timers (Lilja, 2003).

It can also be pointed out that the worst result from the performance model yields an estimation error of approximately 7.5%. This result is achieved using an exponential distribution for both execution times and interarrival times. The rather low estimation error indicates that the model could be useful as long as the mean values are given. An application of this result could be that the more choices that are available (i.e. more freedom) when modelling the interarrival times, the easier the modelling becomes.

6.5 Future work

More modelling choices could be investigated. It could be interesting to investigate how other distributions, beyond the ones already studied, impact the simulation error. In addition, the impact of more design choices regarding the arrival and execution times could be the subject of a similar experimental set-up to the one presented in this paper.


How the modelling of priority preemption scheduling impacts the simulation error could also be studied. For example, further studies could investigate whether there are any other factors regarding the scheduling that could be accounted for and included in the model at a reasonably low cost. Other aspects of the performance model, even though it performs reasonably well, could also be improved. For example, two mechanisms that could be modelled are context switches and process communication. When the CPU switches to another process, the activity is known as a context switch. Process communication is usually provided in real-time operating systems, allowing processes to communicate with each other.

7 Acknowledgment

The work in this paper has partially been financed by the Swedish Agency for Innovation Systems, project number P23918-2A.

References

Abramowitz, M., & Stegun, I. (Eds.). (1977). Handbook of mathematical functions with formulas, graphs, and mathematical tables. Dover Publications.
Balsamo, S., Inverardi, P., & Mangano, C. (1998). An approach to performance evaluation of software architectures. In Proceedings of 1st International Workshop on Software and Performance (pp. 178–190).
Banks, J., Carson, J., Nelson, B. L., & Nicol, D. (2004). Discrete-event system simulation (4th ed.). Prentice Hall.
Bass, L., Clements, P. C., & Kazman, R. (2003). Software architecture in practice. Addison-Wesley.
Bosch, J. (2000). Design and use of software architectures: Adopting and evolving a product-line approach. ACM Press/Addison-Wesley.
Burns, A., & Wellings, A. (2001). Real-time systems and programming languages (3rd ed.). Addison-Wesley.
Calzarossa, M., Massari, L., Merio, A., Pantano, M., & Tessera, D. (1995). Medea: a tool for workload characterization of parallel systems. IEEE Parallel and Distributed Technology: Systems and Applications, 3(4), 72–80.


Clements, P. C., & Northrop, L. (2001). Software product lines: Practices and patterns. Addison-Wesley.
Cortellesa, V., & Mirandola, R. (2000). Deriving a queueing network based performance model from UML diagrams. In Proceedings of 2nd International Workshop on Software and Performance (pp. 58–70).
da Silva, P. P., Laender, A. H. F., Resende, R. S. F., & Golgher, P. B. (2000). Characterizing a synthetic workload for performance evaluation during the migration of a legacy system. In Proceedings of 4th European Software Maintenance and Reengineering, 2000 (pp. 173–181).
Eklundh, B. (1982). Simulation of queueing structures - a feasibility study. PhD dissertation, Department of Telecommunication Systems, Lund Institute of Technology, Lund, Sweden.
Furber, S. (2000). ARM system-on-chip architecture (2nd ed.). Addison-Wesley.
Höst, M., & Johansson, E. (2005). Performance prediction based on knowledge of prior product versions. In Proceedings of 9th European Conference on Software Maintenance and Reengineering (pp. 12–20).
Hrischuk, C. E., Woodside, C. M., Rolia, J. A., & Iversen, R. (1999). Trace-based load characterization for generating performance software models. IEEE Transactions on Software Engineering, 25(1), 122–135.
Jazayeri, M., Ran, A., & van der Linden, F. (2000). Software architecture for product families: Principles and practice. Addison-Wesley.
Judge, G., Hill, R., Griffiths, W. E., Lutkepohl, H., & Lee, T. (1988). Introduction to the theory and practice of econometrics. Wiley.
Kim, S., Im, C., & Ha, S. (2003). Schedule-aware performance estimation of communication architecture for efficient design space exploration. In Proceedings of 1st International Conference on Hardware/Software Codesign and System Synthesis (pp. 195–200).
King, P. J. (1990). Computer and communication system performance modelling. Prentice-Hall.
Kleinrock, L. (1975). Queueing systems, volume I: Theory. Wiley Interscience.
Labrosse, J. (2002). MicroC/OS-II: The Real-Time Kernel (2nd ed.). CMP Books.


Law, A. M., & Kelton, W. D. (1997). Simulation modeling and analysis. McGraw-Hill.
Lilja, D. J. (2003). Measuring computer performance: A practitioner's guide. New York, NY: Cambridge University Press.
Lindzey, G. E., & Browne, J. C. (1979). Response analysis of a multi-function system. In Proceedings of 11th International Conference on Simulation, Measurement and Modeling of Computer Systems (pp. 19–26).
Meyer, M. H., & Lehnerd, A. P. (1997). The power of product platforms: Building value and cost leadership. Free Press.
Montgomery, D. (2000). Design and analysis of experiments. Wiley.
Parnas, D. (1976). On the design and development of program families. IEEE Transactions on Software Engineering, 2(2), 1–9.
Rolia, J. A., & Sevcik, K. C. (1995). The method of layers. IEEE Transactions on Software Engineering, 21(8), 689–700.
Rose, C. A. (1976). Validation of a queueing model with classes of customers. In Proceedings of Conference on Computer Performance Modeling, Measurement and Evaluation (pp. 318–326).
Rumbaugh, J., Jacobson, I., & Booch, G. (1998). Unified modeling language reference manual. Addison-Wesley.
Siegel, S., & Castellan, N. (1987). Nonparametric statistics. McGraw-Hill.
Silberschatz, A., Galvin, P. B., & Gagne, G. (2001). Operating system concepts. John Wiley and Sons.
Smith, C. U., & Williams, L. G. (2002). Performance solutions: A practical guide to creating responsive, scalable software. Addison-Wesley.
Sommerville, I. (2004). Software engineering (7th ed.). Addison-Wesley.
Turner, R. (1979). An investigation of several mathematical models of queueing systems. 8(1–2), 37–44.
Weiss, D., & Lai, C. (1999). Software product-line engineering: A family-based software development process. Addison-Wesley.


Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics, 1, 80–83.
Woodside, C. M., Hrischuk, C. E., Selic, B., & Bayarov, S. (2001). Automated performance modeling of software generated by a design environment. Performance Evaluation, 45(2–3), 107–123.


Reports on Communication Systems

101. On Overload Control of SPC-systems. Ulf Körner, Bengt Wallström, and Christian Nyberg, 1989.
102. Two Short Papers on Overload Control of Switching Nodes. Christian Nyberg, Ulf Körner, and Bengt Wallström, 1990.
103. Priorities in Circuit Switched Networks. Åke Arvidsson, Ph.D. thesis, 1990.
104. Estimations of Software Fault Content for Telecommunication Systems. Bo Lennselius, Lic. thesis, 1990.
105. Reusability of Software in Telecommunication Systems. Anders Sixtensson, Lic. thesis, 1990.
106. Software Reliability and Performance Modelling for Telecommunication Systems. Claes Wohlin, Ph.D. thesis, 1991.
107. Service Protection and Overflow in Circuit Switched Networks. Lars Reneby, Ph.D. thesis, 1991.
108. Queueing Models of the Window Flow Control Mechanism. Lars Falk, Lic. thesis, 1991.
109. On Efficiency and Optimality in Overload Control of SPC Systems. Tobias Rydén, Lic. thesis, 1991.
110. Enhancements of Communication Resources. Johan M Karlsson, Ph.D. thesis, 1992.
111. On Overload Control in Telecommunication Systems. Christian Nyberg, Ph.D. thesis, 1992.
112. Black Box Specification Language for Software Systems. Henrik Cosmo, Lic. thesis, 1994.
113. Queueing Models of Window Flow Control and DQDB Analysis. Lars Falk, Ph.D. thesis, 1995.
114. End to End Transport Protocols over ATM. Thomas Holmström, Lic. thesis, 1995.
115. An Efficient Analysis of Service Interactions in Telecommunications. Kristoffer Kimbler, Lic. thesis, 1995.
116. Usage Specifications for Certification of Software Reliability. Per Runeson, Lic. thesis, May 1996.
117. Achieving an Early Software Reliability Estimate. Anders Wesslén, Lic. thesis, May 1996.
118. On Overload Control in Intelligent Networks. Maria Kihl, Lic. thesis, June 1996.
119. Overload Control in Distributed-Memory Systems. Ulf Ahlfors, Lic. thesis, June 1996.
120. Hierarchical Use Case Modelling for Requirements Engineering. Björn Regnell, Lic. thesis, September 1996.


121. Performance Analysis and Optimization via Simulation. Anders Svensson, Ph.D. thesis, September 1996.
122. On Network Oriented Overload Control in Intelligent Networks. Lars Angelin, Lic. thesis, October 1996.
123. Network Oriented Load Control in Intelligent Networks Based on Optimal Decisions. Stefan Pettersson, Lic. thesis, October 1996.
124. Impact Analysis in Software Process Improvement. Martin Höst, Lic. thesis, December 1996.
125. Towards Local Certifiability in Software Design. Peter Molin, Lic. thesis, February 1997.
126. Models for Estimation of Software Faults and Failures in Inspection and Test. Per Runeson, Ph.D. thesis, January 1998.
127. Reactive Congestion Control in ATM Networks. Per Johansson, Lic. thesis, January 1998.
128. Switch Performance and Mobility Aspects in ATM Networks. Daniel Søbirk, Lic. thesis, June 1998.
129. VPC Management in ATM Networks. Sven-Olof Larsson, Lic. thesis, June 1998.
130. On TCP/IP Traffic Modeling. Pär Karlsson, Lic. thesis, February 1999.
131. Overload Control Strategies for Distributed Communication Networks. Maria Kihl, Ph.D. thesis, March 1999.
132. Requirements Engineering with Use Cases - a Basis for Software Development. Björn Regnell, Ph.D. thesis, April 1999.
133. Utilisation of Historical Data for Controlling and Improving Software Development. Magnus C. Ohlsson, Lic. thesis, May 1999.
134. Early Evaluation of Software Process Change Proposals. Martin Höst, Ph.D. thesis, June 1999.
135. Improving Software Quality through Understanding and Early Estimations. Anders Wesslén, Ph.D. thesis, June 1999.
136. Performance Analysis of Bluetooth. Niklas Johansson, Lic. thesis, March 2000.
137. Controlling Software Quality through Inspections and Fault Content Estimations. Thomas Thelin, Lic. thesis, May 2000.
138. On Fault Content Estimations Applied to Software Inspections and Testing. Håkan Petersson, Lic. thesis, May 2000.
139. Modeling and Evaluation of Internet Applications. Ajit K. Jena, Lic. thesis, June 2000.
140. Dynamic Traffic Control in Multiservice Networks - Applications of Decision Models. Ulf Ahlfors, Ph.D. thesis, October 2000.
141. ATM Networks Performance - Charging and Wireless Protocols. Torgny Holmberg, Lic. thesis, October 2000.


142. Improving Product Quality through Effective Validation Methods. Tomas Berling, Lic. thesis, December 2000.
143. Controlling Fault-Prone Components for Software Evolution. Magnus C. Ohlsson, Ph.D. thesis, June 2001.
144. Performance of Distributed Information Systems. Niklas Widell, Lic. thesis, February 2002.
145. Quality Improvement in Software Platform Development. Enrico Johansson, Lic. thesis, April 2002.
146. Elicitation and Management of User Requirements in Market-Driven Software Development. Johan Natt och Dag, Lic. thesis, June 2002.
147. Supporting Software Inspections through Fault Content Estimation and Effectiveness Analysis. Håkan Petersson, Ph.D. thesis, September 2002.
148. Empirical Evaluations of Usage-Based Reading and Fault Content Estimation for Software Inspections. Thomas Thelin, Ph.D. thesis, September 2002.
149. Software Information Management in Requirements and Test Documentation. Thomas Olsson, Lic. thesis, October 2002.
150. Increasing Involvement and Acceptance in Software Process Improvement. Daniel Karlström, Lic. thesis, November 2002.
151. Changes to Processes and Architectures; Suggested, Implemented and Analyzed from a Project viewpoint. Josef Nedstam, Lic. thesis, November 2002.
152. Resource Management in Cellular Networks - Handover Prioritization and Load Balancing Procedures. Roland Zander, Lic. thesis, March 2003.
153. On Optimisation of Fair and Robust Backbone Networks. Pål Nilsson, Lic. thesis, October 2003.
154. Exploring the Software Verification and Validation Process with Focus on Efficient Fault Detection. Carina Andersson, Lic. thesis, November 2003.
155. Improving Requirements Selection Quality in Market-Driven Software Development. Lena Karlsson, Lic. thesis, November 2003.
156. Fair Scheduling and Resource Allocation in Packet Based Radio Access Networks. Torgny Holmberg, Ph.D. thesis, November 2003.
157. Increasing Product Quality by Verification and Validation Improvements in an Industrial Setting. Tomas Berling, Ph.D. thesis, December 2003.
158. Some Topics in Web Performance Analysis. Jianhua Cao, Lic. thesis, June 2004.
159. Overload Control and Performance Evaluation in a Parlay/OSA Environment. Jens K. Andersson, Lic. thesis, August 2004.
160. Performance Modeling and Control of Web Servers. Mikael Andersson, Lic. thesis, September 2004.
161. Integrating Management and Engineering Processes in Software Product Development. Daniel Karlström, Ph.D. thesis, December 2004.
162. Managing Natural Language Requirements in Large-Scale Software Development. Johan Natt och Dag, Ph.D. thesis, February 2005.


163. Designing Resilient and Fair Multi-layer Telecommunication Networks. Eligijus Kubilinskas, Lic. thesis, February 2005.
164. Internet Access and Performance in Ad hoc Networks. Anders Nilsson, Lic. thesis, April 2005.
165. Active Resource Management in Middleware and Service-oriented Architectures. Niklas Widell, Ph.D. thesis, May 2005.
166. Quality Improvement with Focus on Performance in Software Platform Development. Enrico Johansson, Ph.D. thesis, June 2005.