DEGREE PROJECT IN INFORMATION AND COMMUNICATION TECHNOLOGY, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2016

An Embedded Multi-Core Platform for Mixed-Criticality Systems

Study and Analysis of Virtualization Techniques

YOUSSEF ZAKI

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY

Master of Science Thesis

KTH Royal Institute of Technology
School of Information and Communication Technology

Stockholm, Sweden

17 August 2016

Alten Advisor: Detlef Scholle
KTH Advisor: Johnny Öberg
KTH Examiner: Ingo Sander

© Youssef Zaki, 17 August 2016

Abstract

The widespread availability of multiple processors in modern CPU devices and the need to reduce the cost of embedded systems have created a drive for integrating functionalities from different parts of a system into a single Multi-Processor System-on-Chip (MPSoC) device. As a result, system resources are shared amongst the critical and non-critical components of the system, which results in a mixed-criticality system (MCS). An example of an MCS is the combination of an airbag control unit with the infotainment system of a car. In such a case, both components must be certified unless an isolation mechanism is implemented that prevents the non-critical subsystems from interfering with the critical ones. This isolation can be achieved via spatial and temporal partitioning of system resources, such as static mapping of CPUs to critical tasks, memory and I/O virtualization, and time-domain multiplexing of applications.

System isolation is currently achievable through virtualization techniques, which are commonly used in data centers and personal computers. Recently, virtualization solutions have been emerging for embedded systems in order to cope with the increased design complexity and the stringent non-functional requirements, and to facilitate the certification process of MCS. The performance, safety, security, and robustness achieved in a virtualized system depend on the virtualization architecture and the hardware platform.

This thesis surveys the state of the art in the field of mixed-criticality embedded systems with a focus on virtualization of embedded systems. An in-depth study of virtualization architectures and open-source virtualization solutions is conducted in order to understand the consequences of using this technology in MCS. The work concludes with the design and implementation of a mixed-criticality embedded system that leverages the hardware capabilities of the target device (Zynq-7000 All Programmable SoC), and contributes to the Living Lab WP7 of the EMC2 project.

Keywords: Mixed-Criticality, EMC2, Safety, Security, Embedded Systems, Virtualization, Xilinx Zynq SoC

Acknowledgements

I would like to thank my advisors (Detlef Scholle and Johnny Öberg) and examiner (Ingo Sander) for their help and guidance throughout my master thesis work. I also thank my friend Mohamad Tagelsir (a.k.a. Tage) and my advisors for reviewing my thesis, the Department of Embedded Systems (ESY) at KTH Royal Institute of Technology for providing me with the development boards, and Alten Sverige AB for giving me the opportunity to be part of the EMC2 project.

Finally, I would like to thank my parents (Bouazza Zaki and Fatima Chakour) for their support and prayers, and my sister (Nada Zaki) for introducing me to KTH and for hosting me throughout my study period in Sweden.

Contents

1 Introduction
    1.1 Background
    1.2 Problem Statement
    1.3 Purpose
    1.4 Goals
        1.4.1 Team Goal
        1.4.2 Individual Goal
    1.5 Initial Resources
    1.6 Method
    1.7 Thesis Structure

2 Mixed-Criticality Systems
    2.1 Mixed-criticality Systems
        2.1.1 Definition of Safety
        2.1.2 Safety in Mixed-Criticality Systems
        2.1.3 Security in Traditional Embedded Systems
        2.1.4 Security in Mixed-Criticality Systems
    2.2 Motivation for Using MPSoC
        2.2.1 Advantages of MPSoCs in MCSs
        2.2.2 Limitations of MPSoCs in MCS

3 Hardware Platform
    3.1 Zynq Overview
        3.1.1 Processing System
        3.1.2 Programmable Logic
        3.1.3 PS-to-PL Boundary Interfaces
    3.2 Application Processing Unit
        3.2.1 ARM
        3.2.2 ARM Instruction Set
        3.2.3 ARM Processor Modes
        3.2.4 ARM Core Registers
        3.2.5 Current Program Status Register
        3.2.6 Exception Handling in ARM
        3.2.7 ARM Coprocessors
        3.2.8 Virtual Memory System Architecture
            3.2.8.1 Memory Management Unit
            3.2.8.2 Page Tables
            3.2.8.3 Translation Lookaside Buffer
        3.2.9 ARM TrustZone Architecture

4 System Virtualization
    4.1 High-Level View of System Virtualization
        4.1.1 Type II Hypervisor
        4.1.2 Type I Hypervisor
    4.2 Virtualization Architectures for Embedded Systems
        4.2.1 Full Virtualization
        4.2.2 Paravirtualization
        4.2.3 Monolithic Hypervisor
        4.2.4 Console Guest Hypervisor
        4.2.5 Microkernel-Based Hypervisor
    4.3 Resource Management
    4.4 Hypervisor robustness
    4.5 Hardware Virtualization Acceleration
        4.5.1 Memory Virtualization
        4.5.2 Device and I/O Virtualization
            4.5.2.1 Emulation
            4.5.2.2 Pass-through
            4.5.2.3 Mediated Pass-through
    4.6 Virtualization Requirements for MCS

5 Exploration of Available Hypervisor Solutions
    5.1 Xen Hypervisor
    5.2 Xen Zynq
    5.3 SEL4 Microkernel
    5.4 TrustZone-based Hypervisor
        5.4.1 SierraVisor
        5.4.2 SafeG
    5.5 SICS Thin Hypervisor
    5.6 Hypervisor Solution Matrix
    5.7 Conclusion

6 System Implementation
    6.1 Implementation Tools
        6.1.1 Xilinx Vivado
        6.1.2 Xilinx ARM Cross-Compiler
    6.2 System Architecture Overview
    6.3 Hardware Components
        6.3.1 Resource Planning
        6.3.2 Network-on-Chip Subsystem
        6.3.3 Network-on-Chip Integration
    6.4 Software Components
        6.4.1 SafeG Virtual Machine Monitor
        6.4.2 TOPPERS/FMP
        6.4.3 Linux OS
            6.4.3.1 Linux Kernel
            6.4.3.2 Root File System
            6.4.3.3 Device Tree Blob
        6.4.4 SHAPE
        6.4.5 SHAPE Services
        6.4.6 Inter OS Communication
    6.5 Project File Structure
    6.6 System Build Overview
        6.6.1 Xilinx Build
        6.6.2 OS Build
        6.6.3 VMM Build
        6.6.4 SOA Build
    6.7 Demo System
        6.7.1 System Boot
        6.7.2 Boot Sequence of Dual-OS System
        6.7.3 Hello-World Service
        6.7.4 Shared Memory Monitor Service
    6.8 System Test and Results
        6.8.1 Robustness
        6.8.2 Isolation Test
        6.8.3 Board-to-Board Communication
        6.8.4 Dual-OS Communication SHAPE Service

7 Conclusion and Future Work
    7.1 Conclusion
    7.2 Future work

Bibliography

List of Figures

3.1 Summary of Zynq SoC Device
4.1 Type II Hypervisor
4.2 Type I Hypervisor
4.3 OS Level Virtualization in 2-level Mode Hierarchy System
4.4 Type I Hypervisor Virtualization in 2-level Mode Hierarchy System
4.5 Type I Hypervisor Virtualization in 3-level Mode Hierarchy System
5.1 SafeG Architecture
6.1 Hardware and Software Synopsis of Implemented System
6.2 High-Level View of Project Directory
6.3 System Build Overview Diagram
6.4 Demo Setup
6.5 Boot Sequence of the Dual-OS System

List of Tables

3.1 CPSR Subfield Description
3.2 ARMv7 Architecture Exception Vector Table
3.3 ARM Page Table Details
5.1 Hypervisor Solution Matrix
6.1 Resource Planning
6.2 Address Mapping
6.3 Code Size Comparison of System Software Components

List of Acronyms

API Application Programming Interface

ASIC Application Specific Integrated Circuit

ASIL Automotive Safety Integrity Level

ARM Advanced RISC Machine

AXI Advanced eXtensible Interface

BSP Board Support Package

CPS Cyber Physical Systems

DTB Device Tree Blob

DTC Device Tree Compiler

DTS Device Tree Source

DMA Direct Memory Access

ECU Electronic Control Unit

EMC2 Embedded Multi-Core systems for Mixed Criticality applications in dynamic and changeable real-time environments

FPGA Field Programmable Gate Array

FSBL First Stage Boot Loader

GCC GNU Compiler Collection

GPL2 GNU General Public License version 2

GPOS General Purpose Operating System

JTAG Joint Test Action Group

IDE Integrated Development Environment

IOMMU Input Output Memory Management Unit

IP Intellectual Property

MB MicroBlaze

MCS Mixed-Criticality System

MMU Memory Management Unit

MPSoC Multi-Processor System-on-Chip

NoC Network-on-Chip

NS Non-Secure

OCM On-Chip Memory

OS Operating System

PC Personal Computer

PL Programmable Logic

POSIX Portable Operating System Interface

PS Processing System

PV Paravirtualization

QoS Quality of Service

RTOS Real-Time Operating System

RTL Register Transfer Level

SDK Software Development Kit

SDSoC Software-Defined System-on-Chip

SHAPE Self-configurable High Availability and Policy based platform for Embedded systems

SOA Service Oriented Architecture

SoC System-on-Chip

SS SHAPE Service

SSBL Second Stage Boot Loader

TFTP Trivial File Transfer Protocol

TLB Translation Lookaside Buffer

UART Universal Asynchronous Receiver/Transmitter

VE Virtual Extension

VM Virtual Machine

VMM Virtual Machine Monitor

WCET Worst-Case Execution Time

WP7 Work Package 7

Chapter 1

Introduction

Stringent non-functional requirements [1] of complex embedded systems together with the common availability of Multi-Processor System-on-Chip (MPSoC) devices have created a drive for integrating system functionalities that potentially have different criticality levels into a single computing platform [2]. As a result, these mixed-criticality systems must incorporate spatial and temporal partitioning mechanisms in order to avoid unwanted interactions between the critical and non-critical components, increase the security assurance level, and enable the certification of partitions independently of the other components in the system [3].

Virtualization is a technique that is commonly used in computers and servers to provide isolated execution environments and to support the execution of heterogeneous operating systems on the same hardware platform [4]. Virtualization of embedded systems has recently been a growing trend, mainly because it provides a mechanism to isolate execution environments. This approach provides safety and security measures, and facilitates the certification of safety-critical systems.

This thesis will investigate virtualization solutions for embedded systems, motivate the drive behind applying virtualization techniques in order to facilitate design and development of mixed-criticality systems, and conclude with the design and implementation of a Mixed-Criticality System (MCS) that demonstrates the mixed-criticality concept while remaining within the scope of the Living Lab Work Package 7 (WP7) of the European research project “Embedded Multi-Core systems for Mixed Criticality applications in dynamic and changeable real-time environments (EMC2)” [5].

1.1 Background

Modern automotive systems currently contain a large number of Electronic Control Units (ECUs) that collectively constitute many heterogeneous single-core systems [6]. Each ECU is optimized to execute an application with a specific criticality level, such as a safety-critical anti-lock braking system or a non-critical entertainment system [7]. This approach provides isolation for the numerous critical and non-critical applications in the collective system, and a simple mechanism to qualify an individual ECU. However, it yields an inefficient and expensive system implementation. In order to lower the cost of the system and increase performance, mixed-criticality applications can be integrated into a single multicore platform. This solution will reduce the number of ECUs in the system, which in turn lowers the manufacturing and maintenance costs [6].

Combining applications into a single multicore platform can greatly increase performance and reduce cost. However, this approach increases system complexity, and hinders the certification of safety-critical systems [3]. In order to facilitate the design, test, and certification of such systems, spatial and temporal partitioning can be used in the architecture of the system.

1.2 Problem Statement

Modern embedded systems are following the trend of integrating mixed-criticality applications into a single computing platform. In the automotive industry, for example, this integration would reduce the number of ECUs in a vehicle, which in turn reduces the manufacturing cost and increases the reliability of the system. However, this approach makes the certification of systems very tedious. This limitation can be overcome by providing spatial and temporal isolation to applications.

Virtualization techniques have been ported to embedded systems, and have demonstrated satisfactory results regarding the isolation of subsystems and system security. While many virtualization solutions exist, they all have different capabilities. Therefore, it is important to understand the basics of virtualization techniques in order to identify which virtualization solution is best suited for the EMC2 project.

1.3 Purpose

The purpose of this thesis is to study virtualization technologies as they apply to MCS, and develop a prototype platform for the next generation electronic systems for commercial vehicles, as described in the EMC2 Living Lab WP7 [5].

1.4 Goals

This thesis is part of a larger team project that aims to develop the next generation of electronic systems suitable for Cyber Physical Systems (CPS) in the automotive industry. The platform will aim to exploit the potential of heterogeneous multi-cores for MCSs, and create an adaptive system able to adjust to changes in real-time environments.

The heterogeneous multi-core embedded platform should be suitable to serve as a computing base for CPS in the automotive industry. This CPS will include a Service Oriented Architecture (SOA) to create an adaptive system that can respond to changes in real-time environments, a Network-on-Chip (NoC) component to provide system scalability, and system virtualization to support concurrent execution of heterogeneous operating systems and achieve isolation between the critical and non-critical tasks of the system.

1.4.1 Team Goal

The team consists of three master thesis students, each working on a different layer of the system:

• Service Oriented Architecture layer enables a system to adapt to changing environment conditions

• Mixed-criticality Architecture enables applications with different criticality levels to concurrently execute in a single computing platform

• Network-on-Chip enables efficient system scalability

1.4.2 Individual Goal

The individual goal is to investigate virtualization technologies as they relate to MCSs, establish design requirements, and develop an MCS prototype that includes the individual contribution of this thesis and integrates the work of the other team members.

1.5 Initial Resources

The selected embedded System-on-Chip (SoC) platform for the EMC2 project is based on Xilinx’s Zynq-7000 SoC device. Therefore, the resulting prototype will also target the same SoC device. The Zedboard, a development platform by Digilent with the Zynq-7000 SoC at its core, is available (provided by KTH Royal Institute of Technology) and will be used during the development process. Furthermore, Alten Sweden AB, the company where this thesis work is being conducted, is providing proprietary software (SHAPE) that represents the SOA aspects of the system.

1.6 Method

This thesis will follow the applied research methodology, where knowledge is derived from well-known and accepted theories and principles, and is applied to solve specific problems [8]. This implies that the project will commence with state-of-the-art research in the field of MCS with a focus on system safety, security, and certification. This investigation will help identify system requirements and guide the development direction for the remainder of the project.

1.7 Thesis Structure

In order for the reader to understand the work done in this thesis, it is important to gain an understanding of MCS, the Zynq-7000 SoC with a deep understanding of the ARMv7 processor architecture, and system virtualization. However, the reader can skip to Chapter 5 if he or she is already familiar with these topics. The following is a brief description of each subsequent chapter:

• Chapter 2 presents a brief background regarding mixed-criticality systems, and addresses the safety, security, and certification challenges associated with such systems. Therefore, this chapter is important for readers who do not have a background in MCS.

• Chapter 3 introduces the Zynq-based hardware platform with a focus on the ARMv7 architecture.

• Chapter 4 introduces the fundamentals of system virtualization.

• Chapter 5 explores the available open-source virtualization solutions.

• Chapter 6 presents the design, implementation, and test of the system prototype.

• Chapter 7 reflects upon the conducted work and suggests ideas for future work.

Chapter 2

Mixed-Criticality Systems

This chapter gives an introduction to mixed-criticality systems, motivates the drive for pursuing such systems, and describes design challenges.

2.1 Mixed-criticality Systems

A current trend in embedded systems is to take advantage of the availability of multicore processor chips in order to consolidate subsystems, and achieve a higher CPU utilization. Naturally, the embedded systems that are present in CPSs, such as automotive vehicles, contain components (or ECUs) with different criticality levels. As an example, the task of checking the airbag sensor and deploying the airbag when needed has a higher criticality level than controlling the volume of the infotainment system. Therefore, when these components are integrated into a single computing platform, the response time of the airbag system should not be affected by non-critical functions of the infotainment system. Scheduling the two functions on the same computing platform thus yields a mixed-criticality system.

The development of MCS must comply with safety and security regulations as dictated by each industry field (e.g. automotive, aerospace, railway) in order to certify products [2]. These industries have defined several criticality levels that depend on elements such as environment of operation and danger to human life.

2.1.1 Definition of Safety

“Safety is the absence of unacceptable risk, that is a system is safe if the risk associated with the system is acceptable” [9]. In industrial plants the safety constraint is often described as an average frequency of 10^-3 large accidents per year [10]. As a result, all of the computer systems, sensors, and other electronic components and subsystems of the plant must meet this constraint.

2.1.2 Safety in Mixed-Criticality Systems

The safety requirements in MCS are domain specific. Depending on the industry and application, the certification process will demand a specific integrity level, which is defined by industry standards such as RTCA DO-178B in aerospace [11], IEC 61508 in industrial control [12], EN 50129 in railway [13], and ISO 26262 for automotive [14]. The automotive industry, as an example, assigns Automotive Safety Integrity Levels (ASILs), from ASIL A to ASIL D, to rank the level of protection required when creating a safety-critical system, where ASIL A indicates the lowest safety integrity level. Each level dictates guidelines that should be followed in order to achieve a safe and certifiable system within the targeted industry.

The characteristics of the required services for these industries also depend on the final application of the device. In the aerospace industry, the safety process requires both service integrity and availability because an airplane cannot stop during flight if a service is no longer available. On the other hand, in railway signaling applications, service integrity alone is sufficient to meet the safety criterion because a train can simply stop if a service is no longer available. As a result, availability is a secondary function for railway signaling systems [15].

2.1.3 Security in Traditional Embedded Systems

Current embedded systems contain multiple CPUs, Direct Memory Access (DMA) enabled devices, shared memory, and other peripherals. These subsystems are typically components that are made by different vendors, and must be designed in a manner that facilitates their integration with available security solutions [16]. Traditionally, embedded systems employed external hardware security units to provide a trusted element in the system (e.g. SIM card in smart phones). However, this approach relies on software that runs outside of the protected field of the trusted component, and does not guarantee protection for all assets in the system. Furthermore, manufacturing hardware to enable security features in addition to the SoC increases the overall cost, even though only a few components actually need high security features. Alternatively, security features such as cryptographic operations, key storage, and system monitoring can be implemented internally in the SoC in order to reduce system cost and improve performance. However, the cryptographic module in this approach faces similar security robustness issues as the external hardware solution because it can only protect cryptographic key material. Moreover, the system monitor processor consumes precious silicon area, and is in general a low-performance processor, which increases the energy consumption of the system. Consequently, a new mechanism is required in order to produce a system that is secure and cost effective.

2.1.4 Security in Mixed-Criticality Systems

In addition to the traditional issues relating to securing information and protecting the system from external attacks, in MCSs, tasks (or subsystems) of different criticality levels share system resources such as the processor, memory, and I/O devices. This implies that an isolation mechanism is required in order to prevent the non-critical components from affecting the execution of critical components.

One approach to remedy this problem is to virtualize the hardware system (section 4.1). In such a configuration, the hypervisor is the trusted element that governs the resources of the system. The hypervisor holds the highest privilege level in the system and controls the access of virtual machines to system resources. Furthermore, the hypervisor contains each virtual machine in an isolated environment that does not allow the propagation of errors. Therefore, if a guest machine is damaged, it will not contaminate the rest of the system. However, this implies that the security of the system is only as robust as the hypervisor itself (section 4.4). As a result, it is recommended to minimize the size of the hypervisor in order to facilitate the verification and validation process. In some cases, the hypervisor can be formally verified, which yields a better guarantee that the system will behave as intended [17].

2.2 Motivation for Using MPSoC

Traditionally, engineers relied heavily on voltage and frequency scaling to achieve better performance in processors. However, as transistor feature size continued to shrink, this design approach became obsolete. The power and frequency walls pushed engineers to shift design strategy, and implement multicore system-on-chip (SoC) devices in order to achieve better performance. This initiative started the multicore era, where modern processors are designed with more than one core to enable parallel computing [18]. On the other hand, FPGA companies, such as Xilinx and Altera, are combining multicore ASIC processors with FPGA fabric to enable new design techniques that bring software and hardware design closer [19]. This approach has the potential of delivering systems with higher performance at a lower cost.

A recent trend in the design of embedded systems is to take advantage of the available computational power of multiprocessor system-on-chip (MPSoC) devices to integrate subsystems of different criticality levels into a single platform. This design approach reduces the number of individual physical components in the system, which can significantly reduce the costs associated with manufacturing and maintenance. Many industries, such as aerospace, railway, automotive, and industrial controls, are adopting this technique to integrate safety-critical, mission-critical, and non-critical components into a single platform in order to meet stringent non-functional requirements on size, weight, power, and cost [7] [20]. While this approach offers many benefits, it also presents many design and certification challenges.

2.2.1 Advantages of MPSoCs in MCSs

MPSoCs have the potential to provide many benefits for embedded systems compared with increasing the micro-architecture complexity of a single-core processor. According to Pollack’s Rule, within the same process technology, a leading single-core processor can achieve a 40% increase in performance compared to the previous generation of microprocessor. However, when this rule is considered under the factors of power and area, it indicates that increasing the micro-architecture complexity to gain performance yields diminishing returns. In contrast, implementing a multicore architecture can potentially result in a near-linear increase in performance [21].
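The trade-off behind Pollack’s Rule can be made concrete with a small numerical sketch. It assumes the usual statement of the rule, namely that single-core performance grows roughly with the square root of core complexity (area), and it compares this with an idealized multicore in which the same area is spent on identical cores running fully independent workloads. The function names and the sample area ratios are illustrative and do not come from the thesis.

```python
import math


def pollack_speedup(area_ratio: float) -> float:
    """Approximate single-core speedup when core complexity grows by area_ratio.

    Assumes performance ~ sqrt(complexity), the common form of Pollack's Rule.
    """
    if area_ratio <= 0:
        raise ValueError("area_ratio must be positive")
    return math.sqrt(area_ratio)


def ideal_multicore_speedup(area_ratio: float) -> float:
    """Idealized throughput gain when the extra area is spent on identical cores.

    Assumes perfectly independent workloads, so throughput scales linearly.
    """
    if area_ratio <= 0:
        raise ValueError("area_ratio must be positive")
    return area_ratio


if __name__ == "__main__":
    for ratio in (1, 2, 4, 8):
        single = pollack_speedup(ratio)
        multi = ideal_multicore_speedup(ratio)
        print(f"area x{ratio}: bigger core x{single:.2f}, more cores x{multi:.2f}")
```

Under these assumptions, doubling the area of one core gives about a 1.41x gain, which matches the roughly 40% figure quoted above, while two cores could ideally double throughput. Real multicore systems fall short of the linear ideal because of shared buses, caches, and memory.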

It could be argued that Amdahl’s Law is a limiting factor in increasing the computational performance of MPSoC platforms. This argument would be correct if the target application were the acceleration of a computation-heavy algorithm via parallel processing. In that case, even a small amount of serial code would quickly saturate the achievable gain from multiple processors. However, in the case of MCSs, the goal is to integrate multiple independent applications, most of which are inherently parallel as well [7].
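Amdahl’s Law bounds the speedup of a single program by 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. The following minimal sketch shows how a small serial fraction caps the gain; the function name and the sample values of p are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup for one program according to Amdahl's Law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # A program that is 95% parallel can never exceed 1 / 0.05 = 20x.
    for cores in (2, 4, 16, 1024):
        print(f"p=0.95, {cores} cores: x{amdahl_speedup(0.95, cores):.2f}")
```

Independent applications, as in an MCS, behave more like the p = 1 case per application: each one runs on its own core and none waits for a shared serial section, which is why the bound is less of a concern there.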

MPSoCs have the advantage of integrating special purpose blocks of hardware to accelerate application specific tasks, such as video or audio encoding/decoding, data encryption/decryption, or data transformation for control-loops in mechatronic systems. This design approach can be found in many systems in the market such as personal computers, where a general purpose processor performs normal tasks and a graphical processor accelerates video processing. Similarly, heterogeneous cores are combined into a single chip to accelerate tasks, or balance the computation load, which helps distribute the heat generated in the system for a fixed load. This integration strategy lowers the number of physical units in the system, which in turn reduces cost, and simplifies the hardware design of the system.

MPSoCs can also contain free logic components such as the FPGA fabric in Xilinx’s Zynq-7000 SoC device. The main processing system consists of two fast ARM Cortex-A9 cores, capable of operating at frequencies up to 1 GHz. The FPGA logic can provide custom blocks to accelerate the execution of specific tasks [22]. While current MPSoC platforms provide many benefits for embedded systems, they introduce some limitations with respect to safety in mixed-criticality systems.

2.2.2 Limitations of MPSoCs in MCS

Real-world electronic systems require certification in order to be used in safety-critical applications. Most of the current MPSoCs face many limitations in this aspect because they are considered highly complex electronic devices. According to the section on classification and determination of COTS device characteristics in the European Aviation Safety Agency (EASA) certification memorandum [23], a commercial off-the-shelf (COTS) micro-controller is classified as “highly complex” if it contains one of the following features:

• “more than one Central Processing Unit (CPU) are embedded and they use the same bus (which is not strictly separated or which uses the same single port memory)”

• “several controllers of complex peripherals are dependent on each other and exchange data”

• “several internal busses are integrated and are used in a dynamic way (for example, a dynamic bus switch matrix)”

These characteristics apply to most MPSoCs, which renders system certification for safety-critical applications unattainable within typical time and cost budget constraints.

Safety-critical embedded systems that are considered hard real-time systems must guarantee that all safety-critical tasks meet their deadlines. The successful development of hard real-time systems depends on the characterization of the Worst-Case Execution Time (WCET) of tasks. This characterization allows designers to optimize the system so that it continues to function properly even in worst-case scenarios. Most MPSoCs are not designed specifically for hard real-time systems. Their architectures target general-purpose systems that maximize throughput rather than minimize WCET. Therefore, they lack temporal determinism, which is a critical element in safety-critical systems with hard real-time constraints.

Lastly, most MPSoC architectures do not include an isolation mechanism that can separate the critical and non-critical subsystems. This lack of partitioning could cause errors to propagate between subsystems, and even worse, allow a less secure non-critical subsystem to infect a critical subsystem. Consequently, without any separation mechanism, the entire system must be certified, which can include non-critical components such as the entertainment system. Clearly, certifying non-critical subsystems is undesirable, especially since some of these components employ complex algorithms that would cause the certification cost to skyrocket [7].

Chapter 3

Hardware Platform

Hardware imposes constraints during the design space exploration phase of the project. Therefore, in order to reach an optimal design solution, it is imperative to have a thorough understanding of the hardware architecture and its capabilities. As was previously stated, this project is restricted to the use of the Zedboard as a hardware platform. The core of this development board is a Zynq all programmable SoC* device. Therefore, this section will cover some of the basics of Zynq’s system architecture such as the Processing System (PS), the Programmable Logic (PL), and their interconnect. Furthermore, this section will dive deeper into the Cortex-A9 architecture, which constitutes the central processing unit in Zynq.

3.1 Zynq Overview

Zynq combines the high performance of Application Specific Integrated Circuit (ASIC) devices and the flexibility of Field Programmable Gate Array (FPGA) fabric into a single die. The ASIC and FPGA components are represented by the PS and PL regions in Figure 3.1 respectively.

3.1.1 Processing System

The PS is a hardwired system that is composed of several third-party Intellectual Property (IP) blocks from vendors such as ARM, Cadence, and Arasan [24]. The main processing unit is a dual-core ARM Cortex-A9 processor, which is a high-performance processor that is commonly found in many commercial systems. Zynq is designed such that the PS processor always boots first, which enables a software-centric approach to the PL system boot and configuration.

* From this point forward, this device will simply be referred to as Zynq.

[Block diagram: the Processing System (PS) contains the Application Processing Unit (ARM Cortex-A9 CPUs with MMU, FPU/NEON engine, 32 KB I-cache and 32 KB D-cache each, snoop controller, 512 KB L2 cache, 256 KB OCM, GIC, 8-channel DMA, timers), the DDR controller, and the I/O peripherals (USB x2, GigE x2, SDIO x2, GPIO, UART x2, CAN x2, I2C x2, SPI x2). The PS-to-PL boundary carries the AXI interfaces (M_AXI_GP x2, S_AXI_GP x2, S_AXI_HP x4, S_AXI_ACP x1), EMIO, clocks, interrupts, DMA request/acknowledge, and trace signals. The Programmable Logic (PL) provides User SelectIO, XADC, and MGTX.]

Figure 3.1: Summary of Zynq SoC Device

3.1.2 Programmable Logic

The PL is based on Xilinx’s 7-series FPGA technology, which combines high-performance and low-power characteristics. Due to the flexible nature of the PL, systems can be designed to reach a new level of performance. For example, the PL region can be used to instantiate standard or custom IP hardware modules that can serve as accelerators for the PS. Additionally, the PL region enables the PS to access system resources that are only accessible by the PL, such as MGTX and User SelectIO.


3.1.3 PS-to-PL Boundary Interfaces

The PS and PL regions are tightly coupled via a number of communication ports that are visible as common regions in Figure 3.1. These interfaces can be placed into two categories:

1. Functional interfaces – include the Advanced eXtensible Interface (AXI) ports such as AXI_GP for the general-purpose master/slave device interface between the PS and PL regions, the extended MIO (EMIO), which enables PL IPs to access most I/O peripherals, interrupts, DMA flow control, clocks, and debug interfaces.

2. Configuration interfaces – these signals are connected to the configuration block of the PL, which allows the PS to control the configuration of the PL.

3.2 Application Processing Unit

The application processing unit (APU) constitutes the computing component of the PS. The APU contains a dual-core ARM Cortex-A9 processor system with a memory hierarchy of 32KB L1 instruction and data caches, and a shared 512KB L2 cache memory. Additionally, the APU includes a 256KB On-Chip Memory (OCM), dedicated local timers for each core, and shared timers for the system. The PS also serves as the main unit to access I/O peripherals such as UART, USB, and Ethernet ports [24].

The remainder of this section will focus on the ARMv7-A architecture, which is the instruction set architecture (ISA) of the Cortex-A9 system. This version of the ARMv7 architecture is the application profile that supports the Virtual Memory System Architecture (VMSA), which is based on the utilization of the Memory Management Unit (MMU) of the processor system [25].

3.2.1 ARM

The ARM (Advanced RISC Machine) ISA follows a Reduced Instruction Set Computer (RISC) architecture approach, which implies features such as:

• Large uniform register file.

• Load/store architecture: data-processing operations are performed on a register level only. Therefore, data must first be loaded from external memory into registers in order to modify their values.

• Simple addressing modes.

In addition to the core RISC architecture features, ARM includes additional instructions such as a combined shift with an arithmetic or logical operation, automatic address increment/decrement operations to optimize program loops, load and store multiple instructions, and conditional execution. These extensions enrich the core ISA and lead to enhanced performance, small code size, and low power consumption.

3.2.2 ARM Instruction Set

The ARMv7 architecture provides three instruction sets:

• ARM instruction set: 32-bit instructions that are four-byte aligned.

• Thumb instruction set: 16-bit and 32-bit instructions are available, and can be used in the same program. This can reduce code size at the cost of reduced performance.

• Jazelle instruction set: 8-bit Java™ bytecodes.

3.2.3 ARM Processor Modes

The ARMv7 architecture supports nine processor modes*, and each mode holds a privilege level (PL0 to PL2) in the system.

* Not all processors with the ARMv7 architecture feature all nine processor modes. This depends on the availability of the Security and Virtualization Extensions.

• User mode (USR): least privileged (or non-privileged) mode in the system (PL0). Operating systems run applications in User mode in order to restrict their access to system resources. Furthermore, software executing in non-privileged mode cannot cause the processor to change modes except by causing an exception.

• System mode (SYS): processor mode with privilege level PL1. Exceptions cannot cause the processor to enter this mode; it can only be entered by directly modifying the mode bits in the CPSR.

• Supervisor mode (SVC): typically, the kernel executes in Supervisor mode with PL1. The processor enters this mode when a Supervisor Call instruction is executed. This instruction is typically used to issue system calls to the OS kernel.

• Abort mode (ABT): privileged mode with PL1 that is entered when a Data Abort or Prefetch Abort exception occurs.

• Undefined mode (UND): the processor enters this privileged mode (PL1) when an exception occurs due to an undefined instruction.

• Fast Interrupt mode (FIQ): the processor enters this privileged mode (PL1) when a fast interrupt request (FIQ) is detected*

• Interrupt mode (IRQ): the processor enters this privileged mode (PL1) when an interrupt request (IRQ) is detected†

• Monitor mode (MON): privileged mode (PL1) that allows the system to access secure and non-secure resources of the system, and supports the execution of TrustZone monitor software (see subsection 3.2.9 for more details).

• Hypervisor mode (HYP): highest privilege level in the system‡ (PL2) that only exists in the non-secure zone of the system. Therefore, the Hypervisor mode cannot be accessed while the processor is in the Secure state.

3.2.4 ARM Core Registers

From the application-level view of the ARM core, 16 registers are available (R0 to R15), with the last three serving as special-purpose registers:

• R13: Stack Pointer (SP) – each processor mode has its own SP register, except for User and System modes, which share one.

• R14: Link Register (LR) – stores the subroutine return address. Hypervisor, System, and User modes share the same LR register.

• R15: Program Counter (PC) – stores the address of the next instruction to be fetched. The PC register is common to all modes.

These registers belong to a larger set of banked registers, parts of which may or may not be available depending on the presence of the Security and Virtualization Extensions in the system. Typically, these registers follow a naming convention such as R0_usr to indicate the mode association of a particular register.

* FIQs are taken only when they are not masked by bit[6] of the CPSR.
† IRQs are taken only when they are not masked by bit[7] of the CPSR.
‡ This processor mode is only available on devices that contain ARM’s Virtualization Extensions.


3.2.5 Current Program Status Register

The Current Program Status Register (CPSR) is a 32-bit register that is used to control and monitor processor internal operations.

Table 3.1: CPSR Subfield Description

Bit Field   Field Type                  Field Description
[31:28]     Condition flags             Indicate the status of arithmetic operations.
[27]        Q                           Cumulative saturation bit.
[26:25]     IT                          If-Then execution state bits for the Thumb IT instruction.
[24]        J                           Jazelle bit.
[23:20]     N/A                         Reserved for future use.
[19:16]     GE                          Status bits to indicate greater than or equal events.
[15:10]     IT                          Together with bits[26:25], make up the 8-bit status field (IT[7:0]) for the If-Then Thumb instruction.
[9]         E                           Sets the endianness for data access.
[8:6]       Mask bits                   Control exception masking for abort, IRQ, and FIQ signals. The subfield requires PL1 or higher to change its state.
[5]         Thumb execution state bit   Together with bit[24], selects the instruction set state of the processor (ARM, Thumb, Jazelle, or ThumbEE).
[4:0]       Mode field                  Used to control the processor mode; can only be written at PL1 or higher.
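The field layout in Table 3.1 can be exercised with a short decoding sketch. The bit positions follow the table; the individual flag names (N, Z, C, V), the 5-bit mode encodings, and the (J, T) to instruction-set mapping are not listed in this document and are assumed here from the ARMv7-A architecture manual, so they should be checked against [25] before being relied upon. The IT bits are left out for brevity.

```python
# Mode encodings for CPSR[4:0]. ASSUMPTION: values taken from the ARMv7-A
# architecture manual, not from this document.
MODES = {
    0b10000: "USR", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "SVC",
    0b10110: "MON", 0b10111: "ABT", 0b11010: "HYP", 0b11011: "UND",
    0b11111: "SYS",
}

# Instruction set state selected by (J, T), i.e. CPSR bits [24] and [5].
ISA_STATES = {(0, 0): "ARM", (0, 1): "Thumb", (1, 0): "Jazelle", (1, 1): "ThumbEE"}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into the subfields of Table 3.1."""
    def bit(n):
        return (cpsr >> n) & 1

    return {
        "flags": {"N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28)},
        "Q": bit(27),
        "GE": (cpsr >> 16) & 0xF,
        "big_endian": bool(bit(9)),
        # A set mask bit means the corresponding exception is masked.
        "masked": {"abort": bool(bit(8)), "irq": bool(bit(7)), "fiq": bool(bit(6))},
        "isa": ISA_STATES[(bit(24), bit(5))],
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }


if __name__ == "__main__":
    # Hypothetical value: Supervisor mode, ARM state, all three mask bits set.
    print(decode_cpsr(0x600001D3))
```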

3.2.6 Exception Handling in ARM

Exceptions cause the processor to temporarily suspend its current operation in order to handle special events, such as external interrupts or the execution of an undefined instruction. When an exception occurs, the processor execution is moved to an address that represents the exception type. These addresses are stored in an Exception Vector table, which consists of eight consecutive word-aligned memory addresses (Table 3.2).


Table 3.2: ARMv7 Architecture Exception Vector Table

Memory Offset   Exception Type
0x00            Reset
0x04            Undefined Instruction
0x08            Supervisor Call*
0x0C            Prefetch Abort
0x10            Data Abort
0x14            Hyp Trap†
0x18            IRQ Interrupt
0x1C            FIQ Interrupt
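As a small illustration of Table 3.2, the sketch below computes the address the processor branches to for a given exception type. The offsets come from the table; the dictionary keys, the function name, and the two example base addresses (0x00000000 and 0xFFFF0000, the usual low and high vector locations) are assumptions made for illustration only.

```python
# Offsets from Table 3.2; the key names are illustrative.
VECTOR_OFFSETS = {
    "reset": 0x00,
    "undefined_instruction": 0x04,
    "supervisor_call": 0x08,
    "prefetch_abort": 0x0C,
    "data_abort": 0x10,
    "hyp_trap": 0x14,  # only with the Virtualization Extensions
    "irq": 0x18,
    "fiq": 0x1C,
}


def vector_address(base, exception):
    """Return the address of the vector entry for `exception`."""
    return base + VECTOR_OFFSETS[exception]


if __name__ == "__main__":
    # ASSUMPTION: vector table located at the low or the high base address.
    for base in (0x00000000, 0xFFFF0000):
        print(hex(base), hex(vector_address(base, "irq")))
```

Each entry is one word (four bytes) wide, which is why the eight offsets are spaced 4 bytes apart.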

3.2.7 ARM Coprocessors

The ARMv7 architecture extends the ARM CPU’s ISA via coprocessor support. These are not physical coprocessors, but are a means to abstract the fundamental ISA of ARM and to enhance features of the CPU system. The ARMv7 architecture can provide access to up to 16 coprocessors (CP0 to CP15), where CP15 and CP14 are reserved and play an important role in the configuration, control, and debug of the CPU system.

• CP14 is reserved for the configuration and control of debug and trace features.

• CP15 is the System Control coprocessor. It provides configuration and control support for the ARM CPU system, such as TLB and cache management, and MMU control.

3.2.8 Virtual Memory System Architecture

The Virtual Memory System Architecture (VMSA) is the memory system architecture of the ARMv7-A implementation. VMSA allows memory virtualization by providing support for virtual-to-physical address translation, memory access permission control, and memory attribute validation. These features are enabled via the Memory Management Unit (MMU), Page Table, and Translation Lookaside Buffer (TLB).

* Depending on the processor mode, this address is used for Supervisor Call, Hypervisor Call, and Secure Monitor Call.
† Only available for devices with the Virtualization Extensions option.


3.2.8.1 Memory Management Unit

The Memory Management Unit (MMU) is a hardware block that controls access permissions, memory attributes, and address translations. The MMU is a critical component in system virtualization. It allows for efficient coexistence of applications by abstracting (or virtualizing) the physical memory. If the MMU is disabled, all virtual addresses (VAs) are mapped in a one-to-one fashion to physical addresses (flat address mapping). In this case, the address translation has to be managed in software, which lowers system performance and increases software design complexity.

The MMU features different address translation capabilities depending on the availability of the Security and Virtualization Extensions. For example, with the presence of the Security Extensions, the architecture provides two physical address spaces (Secure and Non-secure). The MMU provides, in this case, features to isolate these regions, and supporting registers to control access. In all versions of the ARMv7-A architecture, the MMU divides the physical memory into contiguous regions commonly known as pages, and stores virtual-to-physical translation information in Page Tables.

3.2.8.2 Page Tables

The VMSAv7 architecture provides two levels* of address lookup (or page tables). The page table mapping can be divided into Small Pages (4KB), Large Pages (64KB), and Sections (1MB) (Table 3.3). Each partition scheme provides a different level of access granularity. Section partitioning only requires a level-one lookup, which reduces the penalty of a full table walk. On the other hand, a page-based mapping requires both level-one and level-two lookups. The first-level (L1) table is indexed by 12 bits and therefore contains 4096 entries, and the second-level (L2) table is indexed by 8 bits, which yields 256 entries.

Table 3.3: ARM Page Table Details

Type       Page Level   Memory Size (KB)   Number of Entries
Sections   L1           1024               4096
Pages      L2           4, 64              256
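The index widths above can be made concrete with a sketch that splits a 32-bit virtual address into its lookup indices. It assumes the common layout in which the 12-bit L1 index is taken from VA[31:20], the 8-bit L2 index from VA[19:12], and the remaining 12 bits form the offset inside a 4KB small page; the exact bit positions are not stated in this document and are an assumption, as are the function names.

```python
def split_small_page(va):
    """Return (l1_index, l2_index, page_offset) for a 4KB small-page mapping."""
    l1_index = (va >> 20) & 0xFFF   # 12 bits -> 4096 L1 entries
    l2_index = (va >> 12) & 0xFF    # 8 bits  -> 256 L2 entries
    page_offset = va & 0xFFF        # 12 bits -> 4KB page
    return l1_index, l2_index, page_offset


def split_section(va):
    """Return (l1_index, section_offset) for a 1MB section mapping (L1 only)."""
    return (va >> 20) & 0xFFF, va & 0xFFFFF


if __name__ == "__main__":
    va = 0x12345678  # arbitrary example address
    print([hex(x) for x in split_small_page(va)])
    print([hex(x) for x in split_section(va)])
```

A section mapping resolves the address after a single L1 lookup, whereas a small-page mapping needs the second index as well, which is the extra table-walk cost mentioned above.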

3.2.8.3 Translation Lookaside Buffer

The Translation Lookaside Buffer (TLB) is a cache memory that serves the special purpose of storing recently translated virtual addresses. Before the MMU starts the lookup process for translating virtual addresses, it checks if the address is already present in the TLB. Similar to a regular processor cache system, the lookup operation can either result in a hit or a miss. In case of a hit, the TLB provides the translated entry of the target virtual address. Otherwise, if the TLB does not contain the translated entry, a TLB miss is generated, and the MMU continues with the address translation steps (table walk). The TLB is then updated (according to a round-robin replacement policy) with the newly translated virtual address.

* The Virtualization Extensions enable a third level of page translation.
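The hit/miss behaviour and the round-robin update described above can be sketched with a toy model. The class name, the dictionary standing in for the page tables, and the fixed 4KB page size are illustrative assumptions; a real TLB is a hardware structure and differs in many details.

```python
class RoundRobinTLB:
    """Toy TLB: a fixed number of (virtual page, physical frame) entries."""

    def __init__(self, capacity, page_table, page_shift=12):
        self.entries = [None] * capacity
        self.next_victim = 0          # round-robin replacement pointer
        self.page_table = page_table  # stand-in for the in-memory page tables
        self.page_shift = page_shift
        self.hits = 0
        self.misses = 0

    def translate(self, va):
        vpn = va >> self.page_shift
        offset = va & ((1 << self.page_shift) - 1)
        for entry in self.entries:
            if entry is not None and entry[0] == vpn:
                self.hits += 1
                return (entry[1] << self.page_shift) | offset
        # TLB miss: perform the "table walk", then refill one entry.
        self.misses += 1
        pfn = self.page_table[vpn]    # KeyError models a translation fault
        self.entries[self.next_victim] = (vpn, pfn)
        self.next_victim = (self.next_victim + 1) % len(self.entries)
        return (pfn << self.page_shift) | offset


if __name__ == "__main__":
    tlb = RoundRobinTLB(capacity=2, page_table={0: 0x100, 1: 0x101, 2: 0x102})
    for va in (0x0004, 0x0008, 0x1000, 0x2000, 0x0000):
        print(hex(va), "->", hex(tlb.translate(va)))
    print("hits:", tlb.hits, "misses:", tlb.misses)
```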

3.2.9 ARM TrustZone Architecture

TrustZone is a security extension available in modern ARM processors that creates a security infrastructure designers can use to protect critical system assets [25]. This infrastructure is achieved by enabling the partition of system components, both hardware and software, into either a Secure or a Normal world (or zone). Resources that are marked as Normal are not permitted to access Secure zone components. This mechanism is enforced by the AMBA3 (Advanced Microcontroller Bus Architecture) AXI (Advanced eXtensible Interface) bus system. It contains an extra control signal for each of the read and write channels (Non-Secure bit or NS bits) that controls the access rights of the Non-Secure bus masters to the Secure slaves.

Each processor with an enabled TrustZone security extension can be partitioned into a Normal and a Secure virtual CPU. The virtual processors execute in a time-multiplexed fashion, and use the “Monitor Mode” state to create an efficient switching mechanism between the Normal and Secure zones. The NS-bit, bit[0] of the Secure Configuration Register (SCR) in the System Control Coprocessor (CP15), controls the activation of the secure state of the processor. Whenever the NS-bit is set high, the processor state immediately switches to the Normal world. However, if the processor is in Monitor Mode, it remains in the Secure world regardless of the state of the SCR NS-bit. The processor can enter Monitor Mode either by issuing a special instruction, SMC (Secure Monitor Call), in software or by hardware exception mechanisms such as IRQ, FIQ, external Data Abort, or external Prefetch Abort. In general, the software running in Monitor Mode serves the purpose of saving the state of the current world and loading the state of the other.
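The two rules described above (Monitor Mode is always Secure, otherwise the SCR NS-bit decides) and the save/restore role of the monitor software can be sketched as a toy model. The class and method names are hypothetical, and the register context is reduced to a plain dictionary; this is not how monitor code is written on real hardware.

```python
def is_secure(mode, scr_ns):
    """Monitor mode is always Secure; otherwise the SCR NS-bit decides."""
    return mode == "MON" or scr_ns == 0


class ToyMonitor:
    """Hypothetical model of the world switch performed in Monitor Mode."""

    def __init__(self):
        self.scr_ns = 0  # start in the Secure world
        self.saved = {"secure": {}, "normal": {}}

    def world(self):
        return "normal" if self.scr_ns else "secure"

    def smc_switch(self, regs):
        """Save the current world's registers and return the other world's."""
        self.saved[self.world()] = dict(regs)
        self.scr_ns ^= 1  # toggle the NS-bit
        return dict(self.saved[self.world()])


if __name__ == "__main__":
    monitor = ToyMonitor()
    normal_regs = monitor.smc_switch({"r0": 1})      # Secure -> Normal
    secure_regs = monitor.smc_switch({"r0": 2})      # Normal -> Secure
    print(normal_regs, secure_regs, monitor.world())
```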

Chapter 4

System Virtualization

System virtualization is the abstraction and management of system resources such as CPU, memory, and peripherals. Virtualization solutions implement spatial and temporal isolation techniques in order to facilitate the integration of mixed-criticality systems [3]. This approach results in independent virtual machines that are fully contained in an execution environment that cannot infect the remainder of the system. In addition to isolation, a virtual machine can use any operating system as long as the OS is compatible with the virtual machine. As a result, a single computing platform can enable the execution of heterogeneous operating systems. This section presents the background necessary to understand the fundamentals of system virtualization.

4.1 High-Level View of System Virtualization

In the computing world, virtualization is a method used to abstract hardware resources of underlying platforms [4]. The most common virtualization environment is the operating system found in Personal Computers (PCs) (e.g. Windows, Mac OS X, or Linux). These operating systems hide details about the CPU and GPU units, hard disks, Ethernet controller, and other hardware components found in PCs. As a result, software can be developed independently of the final hardware platform via the available Application Programming Interface (API). While operating systems can abstract hardware systems and achieve a satisfactory performance level, they are not suitable for MCS. Essentially, an MCS requires deadline guarantees for hard real-time tasks while ensuring the Quality of Service (QoS) of general-purpose applications. At the same time, the system must be secure and robust such that it offers protection against external attacks and unintended interactions between the critical and non-critical components of the system.


[Layer diagram, top to bottom: VM1, VM2, ..., VMn; Hypervisor; Host GPOS (Windows/Linux); Hardware Platform]

Figure 4.1: Type II Hypervisor

Consequently, a thin software layer, the hypervisor or Virtual Machine Monitor (VMM), is introduced that can abstract and manage system resources. Since the code size of the hypervisor is small, it is easier to validate. The hypervisor provides an isolation mechanism that can encapsulate an entire OS and its applications into a Virtual Machine (VM). Furthermore, the hypervisor can support multiple virtual machines, which enables the concurrent execution of heterogeneous operating systems.

4.1.1 Type II Hypervisor

A type II hypervisor (Figure 4.1) is a software layer that runs on top of a General Purpose Operating System (GPOS) [4]. It takes advantage of Operating System (OS) services such as resource management (e.g. CPU allocation, scheduling, and memory management), and the hardware abstraction facility of the host OS, which enables the reuse of device drivers, communication stacks, and APIs. Furthermore, type II hypervisors enable execution of applications in a native as well as a virtual environment, which provides another level of flexibility in the system. However, the security of type II hypervisors is only as robust as the underlying host GPOS. Therefore, the hypervisor can be subverted through one of the security gaps in the host GPOS, thereby corrupting the entire system. Additionally, the host OS layer increases system complexity and overall code size, which is an important factor for resource-constrained embedded systems. As a result, type II hypervisors are not suited for most embedded systems.

4.1.2 Type I Hypervisor

Figure 4.2 presents a high-level view of the type I hypervisor architecture. As can be seen, a type I hypervisor runs directly on the hardware platform (bare metal). This approach avoids the complexity and inefficiency of a GPOS, and can achieve a high level of isolation for safety- and security-critical applications [4]. However, legacy type I hypervisors that do not take advantage of hardware virtualization support (section 4.5) must develop custom device drivers and hardware management services that optimize the performance and robustness of the virtualized system. Furthermore, some type I hypervisor solutions use a specialized guest to support I/O device sharing for the guest systems. With this type of solution, the code size of the trusted software becomes so large that the virtualization solution is no better than a type II hypervisor. Without the availability of hardware virtualization support, these limitations become trade-offs that vary depending on the selected type I hypervisor architecture.

[Layer diagram, top to bottom: VM1, VM2, ..., VMn; Hypervisor; Hardware Platform]

Figure 4.2: Type I Hypervisor

4.2 Virtualization Architectures for Embedded Systems

Virtualization is a widely used technique in enterprise computing systems, and recently, it has been gaining strong interest in the embedded systems domain [4]. This is particularly relevant for multicore embedded systems that integrate mixed-criticality applications, where critical tasks include hard real-time deadlines, and the system requires a high standard of safety and security.

4.2.1 Full Virtualization

In full virtualization, guest operating systems are unmodified and unaware of the virtualization environment [26]. Each virtual machine is provided with all services of the physical system (e.g. virtual BIOS, virtual devices, and virtual memory). Full virtualization employs binary translation techniques in order to trap-and-emulate non-virtualizable and sensitive system instructions. This is the only approach that does not require hardware assist or paravirtualization (see subsection 4.2.2) in order to virtualize the system. However, the computational intensity of dynamic binary translation and instruction rewrite techniques results in a performance level that is unacceptable for embedded systems [4].

4.2.2 Paravirtualization

The prefix “para” is an English affix that originated from Greek and means “beside”, “with”, or “alongside”, which yields the meaning of “alongside virtualization” [26]. Unlike full virtualization, in paravirtualization, guest operating systems are modified in order to improve the performance of the hypervisor. These modifications are applied specifically to the guest OS kernel in order to replace non-virtualizable instructions and critical kernel operations with hypercalls that can request services directly from the hypervisor. These services represent system calls that are part of the OS kernel, and they normally execute with the highest privilege level in the system. However, once the OS kernel is pushed into a virtual machine environment, the hypervisor gains the highest privilege level in the system. Consequently, the normal execution of system calls would cause system faults that must be trapped and emulated by the hypervisor. Paravirtualization removes the need to trap-and-emulate sensitive instructions. However, this process comes with high development and maintenance costs. Detailed knowledge of the OS kernel is required in order to apply the necessary changes to the source code. Xen ARM, for example, requires the modification of approximately 4500 lines of code (LOC) [27]. Nevertheless, paravirtualization is the only viable solution for embedded platforms that do not provide any hardware virtualization support.

4.2.3 Monolithic Hypervisor

Similar to an operating system, a monolithic hypervisor contains all device drivers and middleware to enable execution of guest operating systems [4]. This hypervisor architecture results in a large software layer, which makes it difficult to verify and validate. Furthermore, the monolithic hypervisor uses a single instance of the virtual environment to run multiple guest systems. Therefore, a single defect in the hypervisor could corrupt the entire system, which contradicts the isolation characteristics of system virtualization.

4.2.4 Console Guest Hypervisor

In the console guest hypervisor approach, the hypervisor layer is reduced in size. However, this architecture requires a special guest virtual machine with a special operating system called “console guest”, “Domain 0”, or “Dom0” in order to provide services to other guest operating systems and to handle I/O control. The selection of the Dom0 OS is critical because a general-purpose OS might dramatically increase the size of the abstraction layer, and as a result reduce the robustness of the system.

4.2.5 Microkernel-Based Hypervisor

In order to increase the robustness of the hypervisor, its size should be as small as possible. Microkernel-based hypervisors represent a thin software layer that runs on bare metal, and can provide strong isolation between guest operating systems. This approach implements virtualization as a service on top of the trusted microkernel. Therefore, each separate instance is as robust as the guest environment itself. Damaged guest environments cannot contaminate the rest of the system because only the microkernel executes in the highest privilege mode.

4.3 Resource Management

In a virtualized embedded system, the hypervisor is responsible for managing all system resources, including CPUs [4]. The hypervisor can use spatial and temporal partitioning techniques to distribute or consolidate system workloads in order to reach an optimized solution that satisfies specific operating conditions such as low power consumption or low heat generation. Spatial partitioning is only available in multicore systems, where only a single virtual machine is mapped to a specific set of CPUs. While this approach does not use system resources to their maximum capacity, it does provide availability-of-service guarantees for safety- and security-critical applications.

Alternatively, the hypervisor can use dynamic partitioning, where resources can be redistributed amongst the virtual machines in order to maximize system utilization. This architecture is more challenging to implement, but it unlocks many features such as load balancing, migration of virtual machines across cores, and temporal partitioning. Dynamic partitioning is a highly desirable feature in power-efficient systems. As an example, in a multicore system where two VMs only require 50% utilization each and each VM is mapped to a separate processor, the hypervisor can determine that a single core is sufficient for both VMs, consolidate the workload, and turn off the other cores. As a result, the system will be able to save the energy otherwise lost to static power consumption. The hypervisor can also use a hybrid approach that statically maps a safety-critical VM to a set of processors*, and uses dynamic partitioning for the rest of the cores in the system.

* SMP guest operating systems can take advantage of multiple cores to execute concurrent workloads.
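The consolidation example above (two VMs at 50% utilization fitting on one core) can be sketched as a simple packing problem. The sketch uses a first-fit-decreasing heuristic, which is a technique chosen here for illustration and not one prescribed by the thesis or by [4]; the function name and the utilization values are assumptions, and real-time constraints are ignored entirely.

```python
def consolidate(vm_utilization, core_capacity=1.0):
    """Pack VMs onto as few cores as a first-fit-decreasing heuristic finds.

    `vm_utilization` maps a VM name to the fraction of one core it needs.
    Returns a list of cores, each given as a list of VM names.
    """
    cores = []  # each item: [current load, [VM names]]
    for name, load in sorted(vm_utilization.items(), key=lambda kv: -kv[1]):
        if load > core_capacity:
            raise ValueError(f"{name} does not fit on a single core")
        for core in cores:
            # Small tolerance guards against floating-point rounding.
            if core[0] + load <= core_capacity + 1e-9:
                core[0] += load
                core[1].append(name)
                break
        else:
            cores.append([load, [name]])  # power up another core
    return [names for _, names in cores]


if __name__ == "__main__":
    # Two VMs at 50% each share one core; the remaining cores can be turned off.
    print(consolidate({"vm1": 0.5, "vm2": 0.5}))
```

In the hybrid approach, a safety-critical VM would be excluded from this packing and pinned to its own set of cores.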


4.4 Hypervisor Robustness

Virtualization techniques can increase system robustness against external malicious attacks [4] by isolating the virtual environments of guest systems. An attacker can only affect the guest system it penetrates, leaving the rest of the system unharmed. However, several studies were conducted to find vulnerabilities in the security of virtualized systems, such as SubVirt, Blue Pill, Ormandy, the Xen owning trilogy, and VMware’s security certification. In each case, the system was found susceptible to some type of attack that allowed the attacker to cause crashes, anomalous behavior, or run arbitrary code. This discovery led to the conclusion that using a hypervisor does not necessarily guarantee robust isolation between virtual machines. Consequently, developers have to be aware of this fact when working in highly sensitive domains. Platform attestation is an approach used to increase the robustness of virtualized environments. It proposes that only known good firmware, such as the hypervisor, is allowed to boot and control the computing platform at any given time, which prevents a corrupted hypervisor from taking control of the platform.

4.5 Hardware Virtualization Acceleration

An important aspect in the virtualization of MCS is to reduce the performance overhead of using a hypervisor. In many processors, the architecture defines two modes of hierarchy: user mode and supervisor mode [4]. The hardware is designed to efficiently switch between the two modes. In OS-level system virtualization, a two-level mode hierarchy is sufficient to achieve good performance. The OS kernel runs in supervisor mode and the applications run in user mode (Figure 4.3). Therefore, the OS kernel has the highest privilege level in the system, can execute all instructions, and can access any hardware unit in the system, while applications execute in an unprivileged mode that limits their access. However, the two-level mode hierarchy is a performance bottleneck for virtualized systems. When implementing the hypervisor layer in a system that only supports two modes of hierarchy, the hypervisor runs in the most privileged mode of the system, and both the OS kernel and the applications are pushed into user mode (Figure 4.4). As a result, the system incurs a performance and maintenance overhead, as discussed in subsection 4.2.1 and subsection 4.2.2.


[Diagram: processes P1, P2, ..., Pn run in User Mode (PL0); the kernel runs in Supervisor Mode (PL1).]

Figure 4.3: OS Level Virtualization in 2-level Mode Hierarchy System

[Diagram: inside the VM, processes P1, P2, ..., Pn run in Guest User Mode (PL0) and the kernel runs in Guest Supervisor Mode (PL0); the hypervisor runs in Hypervisor Mode (PL1).]

Figure 4.4: Type I Hypervisor Virtualization in 2-level Mode Hierarchy System

In order to reduce the overhead associated with system virtualization, CPU vendors have developed a hardware virtualization acceleration option that enables a third mode of hierarchy. The three-level mode hierarchy allows the hypervisor to run in the most privileged mode, similar to the former supervisor mode, which allows access to all instructions and hardware resources of the system. Furthermore, in the virtual machines, the guest OS kernel runs in guest supervisor mode, and applications run in guest user mode (Figure 4.5). This approach allows sensitive instructions to execute without any modifications to the operating system and provides an efficient mechanism to switch between modes. Products in this area are Intel’s VT-x (Virtualization Technology for x86) and ARM’s Virtualization Extensions (VE).


[Figure: the Hypervisor runs in Hypervisor Mode (PL2). Inside the VM, the Kernel runs in Guest Supervisor Mode (PL1) and processes P1, P2, ..., Pn run in Guest User Mode (PL0).]

Figure 4.5: Type I Hypervisor Virtualization in 3-level Mode Hierarchy System

4.5.1 Memory Virtualization

In addition to optimizing the virtualization of CPUs, it is important to find an efficient mechanism to manage the physical system memory in a virtualized MCS. Modern operating systems implement memory virtualization techniques in order to optimize the sharing and dynamic allocation of the physical system memory [26]. The key hardware components that enable efficient memory virtualization are the Memory Management Unit (MMU) and the Translation Lookaside Buffer (TLB). The MMU performs virtual-to-physical address translations and stores a copy of recently translated addresses in the TLB (a special cache dedicated to storing virtual-to-physical address mappings). In order to support multiple virtual machines, another level of address translation is required, which implies that the MMU itself has to be virtualized. The guest OS translates guest virtual addresses to guest physical addresses, and the hypervisor translates guest physical addresses to actual machine memory addresses. In order to accelerate this two-level address translation process, the hypervisor uses shadow page tables to directly map guest virtual memory addresses to machine memory addresses.
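The two-level translation and its shadow-table shortcut can be illustrated with a small model. The following is only a sketch under simplifying assumptions: page tables are flat dictionaries, the page size is assumed to be 4 KiB, and all names (`translate`, `build_shadow_table`, the example tables) are hypothetical and not taken from any particular hypervisor.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def translate(table, address):
    """Translate an address using a page-number -> frame-number table."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in table:
        raise LookupError(f"page fault at {address:#x}")
    return table[page] * PAGE_SIZE + offset


def translate_two_level(guest_table, hypervisor_table, guest_virtual):
    """Slow path: guest virtual -> guest physical -> machine address."""
    guest_physical = translate(guest_table, guest_virtual)
    return translate(hypervisor_table, guest_physical)


def build_shadow_table(guest_table, hypervisor_table):
    """Compose both tables so one lookup maps guest virtual pages to machine frames."""
    return {
        virtual_page: hypervisor_table[guest_frame]
        for virtual_page, guest_frame in guest_table.items()
        if guest_frame in hypervisor_table
    }


# Hypothetical page tables for a single guest.
guest_table = {0: 5, 1: 7}          # guest virtual page -> guest physical frame
hypervisor_table = {5: 40, 7: 12}   # guest physical frame -> machine frame
shadow_table = build_shadow_table(guest_table, hypervisor_table)
```

A single lookup in `shadow_table` yields the same machine address as the two chained lookups. In a real system the hypervisor would additionally have to keep the shadow table synchronized whenever the guest modifies its own page table, which this sketch does not model.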

4.5.2 Device and I/O Virtualization

The last important aspect of virtualizing embedded MCSs is the virtualization of devices and I/Os. The robustness and efficiency of a virtualization solution depend on the system’s architecture for handling I/O accesses between the virtual machines [4]. Three I/O virtualization techniques are widely used: emulation, pass-through, and mediated pass-through.


4.5.2.1 Emulation

In the emulation architecture, the hypervisor intercepts and validates all I/O accesses of guest operating systems, and translates them into hypervisor-initiated operations [4]. The emulation method optimizes system reliability because all I/O accesses are handled by the trusted hypervisor. Furthermore, emulation ensures system availability independent of the state of the virtual machines. However, this approach produces an overhead in all I/O operations, which may result in unsatisfactory performance. While these emulated systems provide flexibility and system reliability, they lack in performance, and require a large effort to maintain the custom device drivers.

4.5.2.2 Pass-through

The pass-through architecture allows the guest operating systems to bypass the hypervisor and gain direct access to I/O resources [4]. This model improves efficiency, but decreases system robustness. The pass-through approach reduces the level of isolation of virtual environments, which does not satisfy the safety and security requirements of MCS. In order to increase the robustness of the pass-through architecture, an Input Output Memory Management Unit (IOMMU) is required. Similar to how the MMU allows the hypervisor to manage memory accesses of virtual machines, the IOMMU manages access to I/O devices. This approach removes the risk of a virtual machine accessing a memory address that is beyond its allocated memory. The IOMMU is particularly important in the presence of DMA engines, where accesses do not necessarily originate from the CPU. While an IOMMU can improve the performance and robustness of a system that uses a pass-through model, it cannot provide sufficient reliability for safety-critical systems, mainly because if the virtual machine that is mapped to a pass-through peripheral is corrupted, all other virtual machines lose access to that device as well.

4.5.2.3 Mediated Pass-through

In the pass-through approach, regardless of the availability of an IOMMU, an I/O device is mapped to a single virtual machine, and all other virtual machines must depend on the VM owner to relay I/Os [4]. Therefore, if the VM owner is damaged, it could prevent further access to that I/O device. However, in the mediated pass-through architecture, the guest OS device drivers remain unchanged, similar to the regular pass-through architecture, but the hypervisor is allowed to trap and validate all I/O accesses that may affect the reliability and security of the system. Furthermore, the hypervisor can then allow, modify, or reject access based on system policy. Therefore, the mediated pass-through system trades an acceptable amount of performance for reliability.
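The allow, modify, or reject decision of a mediating hypervisor can be sketched as a small policy check. This is a minimal illustration only: the policy table, the VM names, the register windows, and the write masks are hypothetical values chosen for the example, and a real hypervisor would perform this check inside its trap handler.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    MODIFY = "modify"
    REJECT = "reject"


# Hypothetical system policy: which register offsets each VM may write,
# and which bits of the written value it is allowed to set.
POLICY = {
    "rtos": {"window": range(0x000, 0x100), "write_mask": 0xFFFFFFFF},
    "gpos": {"window": range(0x000, 0x040), "write_mask": 0x0000FFFF},
}


def mediate_write(vm, offset, value):
    """Validate a trapped device write and return (verdict, value to apply)."""
    rule = POLICY.get(vm)
    if rule is None or offset not in rule["window"]:
        return Verdict.REJECT, None          # unknown VM or forbidden register
    masked = value & rule["write_mask"]
    if masked != value:
        return Verdict.MODIFY, masked        # strip bits the VM may not set
    return Verdict.ALLOW, value
```

For example, under this assumed policy a write of `0xABCD1234` by the `gpos` VM to offset `0x10` would be modified to `0x1234`, while any write to offset `0x80` would be rejected.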

4.6 Virtualization Requirements for MCS

From the literature review and the fundamental theory presented thus far, a set of requirements can be defined in order to find a suitable virtualization solution that leverages the available hardware resources, and enables the safe and secure integration of mixed-criticality systems.

• Req. 1: Hardware – hypervisor shall support the ARM Cortex-A9 processor (support for Zynq is a plus).

• Req. 2: Hardware Assists – hypervisor shall leverage available hardware virtualization support.

• Req. 3: Robustness – hypervisor layer shall be as small as possible.

• Req. 4: Multiple OS – hypervisor shall support at least one GPOS and one Real-Time Operating System (RTOS) instance.

• Req. 5: Isolation – hypervisor shall provide a strong isolation mechanism.

• Req. 6: Communication – hypervisor shall provide a communication mechanism between virtual machines.

• Req. 7: Multicore – hypervisor shall leverage all available CPUs in the hardware platform.

• Req. 8: Pass-through – hypervisor shall not use the pass-through architecture, even in the presence of an IOMMU in the system.

Chapter 5

Exploration of Available Hypervisor Solutions

Many hypervisor solutions are available as either open-source or commercial products. This chapter will showcase some of the available implementations, compare their pros and cons, and conclude with the selection of a suitable virtualization solution that satisfies the system requirements from section 4.6.

5.1 Xen Hypervisor

The Xen hypervisor is widely used in enterprise and is now making its way to embedded systems. It was developed at the University of Cambridge, and is available as open-source software under the GNU General Public License (GPL). The Xen hypervisor is implemented using the guest console architecture, as discussed in subsection 4.2.4. The hypervisor layer is a thin software layer that resides above the hardware layer. It is the first program that runs after the bootloader, and is responsible for managing the CPU, memory, and interrupts.

By default, the Xen hypervisor uses Credit as the CPU scheduler, which allows the user to allocate a percentage of the CPU time for each VM, or allows the hypervisor to automatically balance the workload across active CPUs in the system. Alternatively, the user can select the Simple Earliest Deadline First (SEDF) algorithm for the scheduler, but the load-balancing feature will then be unavailable [28].

The hypervisor is responsible for launching Dom0, which is a special virtual machine that has privileged access rights to the physical I/O resources. It handles I/O accesses and interacts with the other virtual machines. All other VM instances operate in Domain U (DomU), which runs in unprivileged mode. The guest virtual machines can be either paravirtualized (PV) or fully virtualized (also known as Hardware-assisted Virtual Machine, HVM). PV guests are modified operating systems such as Linux, Solaris, FreeBSD, or other UNIX operating systems. In order to facilitate I/O sharing, Xen uses a split-driver architecture. This approach manages I/O accesses of DomU PV guests. The split-driver technique divides the driver into a front-end, located in the DomU PV guest, and a back-end, located in the Dom0 guest.

DomU PV guests are aware that they do not have direct access to the hardware and that they are running alongside other virtual machines on the same hardware. However, DomU HVM guests are unaware of the presence of other VMs, and of the fact that they are sharing hardware resources. Instead of split drivers, in the HVM architecture, a special daemon is started in the Dom0 guest for each DomU HVM guest. The Xen hypervisor is available for both Intel and ARM devices. However, it is not recommended to use Xen with devices that do not contain IOMMU units because the hypervisor can be easily subverted by DMA-capable devices [29].

5.2 Xen Zynq

The open-source Xen hypervisor has recently been ported to the new Xilinx Zynq Ultrascale+ Multi-Processor System-on-Chip (MPSoC) device [30]. The Xen Zynq Distribution is released under the GNU General Public License version 2 (GPLv2). The processing platform features a quad-core ARM Cortex-A53, a dual-core ARM Cortex-R5, a Mali-400MP2 GPU, and FPGA fabric that supports run-time reconfiguration. This device is the successor of the Xilinx Zynq SoC, which features a dual-core Cortex-A9 processor and FPGA fabric.

5.3 SEL4 Microkernel

The sel4 microkernel is based on the L4 microkernel, which is one of the smallest kernels available today. Sel4 is the first formally verified microkernel, which means that its implementation has been mathematically proven to conform to its specification. Sel4 follows the "minimality principle", which dictates that the kernel shall only contain functionalities that cannot be implemented at the user level [31]. As a result, the microkernel is small, efficient, and robust. All device drivers are excluded from the microkernel level and execute in unprivileged mode, except for a timer driver and an interrupt controller driver.

The microkernel supports a small number of services that enable applications to create and manage threads, virtual memory spaces, and interprocess communication (IPC). Furthermore, sel4 follows a "capability-based access control model" in order to manage the access rights to all kernel services. Capabilities are unforgeable tokens that contain metadata about a specific kernel object, including its access rights. The use of capabilities as a control mechanism allows the system to maintain strong isolation between software components [32].
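The idea of capabilities as unforgeable tokens that carry access rights can be illustrated with a toy model. This sketch is not the sel4 API: the `Kernel` class, its methods, and the use of random tokens are assumptions made purely for illustration, whereas the real kernel stores capabilities in kernel-managed memory that user code cannot touch.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    obj: str            # name of the kernel object the capability refers to
    rights: frozenset   # access rights granted on that object


class Kernel:
    """Toy kernel: capability metadata never leaves this object."""

    def __init__(self):
        self._caps = {}  # opaque token -> Capability

    def mint(self, obj, rights):
        """Create a capability and hand out only an opaque token."""
        token = secrets.token_hex(16)
        self._caps[token] = Capability(obj, frozenset(rights))
        return token

    def derive(self, token, rights):
        """Derive a weaker capability; rights can only be reduced, never added."""
        parent = self._caps[token]
        return self.mint(parent.obj, parent.rights & frozenset(rights))

    def invoke(self, token, obj, right):
        """Permit an operation only if the token names the object with that right."""
        cap = self._caps.get(token)
        return cap is not None and cap.obj == obj and right in cap.rights
```

A component that only holds a derived read-only token cannot write to the object, and a guessed token is simply unknown to the kernel, which is what provides the isolation between components in this model.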

The sel4 microkernel implements a fixed-priority round-robin scheduling policy, mainly because its current "time" abstraction is underdeveloped and does not yield satisfactory results. As proposed in [31], reservations can be added to sel4 in order to provide a suitable temporal isolation solution for real-time systems.

Sel4 provides IOMMU support for Intel-based architectures (IA-32), which allows the safe integration of DMA-enabled devices. Furthermore, Sel4 can support multicore systems via multikernel bootstrapping. However, this feature is only available for x86 machines; only uniprocessor operation is supported for ARM-based devices.

5.4 TrustZone-based Hypervisor

TrustZone technology refers to the security extensions available in most modern ARM systems. As discussed in subsection 3.2.9, TrustZone technology provides two working zones: the "Normal" zone and the "Secure" zone. Applications running in the Normal zone cannot access resources from the Secure zone, but Secure zone software has full system access. This mechanism facilitates system partitioning, and the creation of a secure isolated environment that can host safety- and security-critical applications.

Most ARM processors offer two privilege levels (PL0 and PL1) for executing software. Typically, in operating systems, the kernel executes in the highest privilege level (PL1) and applications run in unprivileged mode (PL0). However, in order to virtualize a system, a third privilege level is needed to accommodate the hypervisor state (section 4.5). TrustZone provides a third level by leveraging the capabilities of the "Monitor" mode (subsection 3.2.3). Therefore, ARM processor systems that include TrustZone technology can achieve an efficient virtualization implementation as an alternative to paravirtualization.

ARM’s TrustZone security extensions can be utilized to virtualize a system in two ways:

1. Use the system access capabilities of the Secure zone to build a hypervisor that can control virtual machines running in the Normal zone.

2. Use the efficient switching mechanism of the Secure zone Monitor to host a dual-OS system (Secure zone OS and Normal zone OS).

5.4.1 SierraVisor

Sierraware offers a bare metal universal hypervisor (SierraVisor) that is available as open-source under the GNU GPL v2 license or with a commercial license [33]. It supports paravirtualization, TrustZone virtualization, and hardware assisted virtualization. SierraVisor is compatible with Cortex-A9/A15 and ARM11 based SoCs, but only Cortex-A15 supports the hardware assisted virtualization option*. The TrustZone virtualization approach allows for the integration of guest operating systems without any kernel modifications. Each guest kernel and its applications run in their usual privilege modes, supervisor and user mode respectively. Furthermore, each guest executes in an isolated container with low overhead.

5.4.2 SafeG

The TOPPERS group of Nagoya University in Japan has developed an open-source dual-OS architecture (SafeG† – see Figure 5.1) designed to concurrently host a real-time operating system (RTOS) and a general purpose operating system (GPOS) on TrustZone enabled ARM SoC devices [34]. SafeG takes advantage of ARM’s TrustZone security extensions to efficiently partition the system into Trusted and Non-Trusted states, which provides full system access to trusted software, and limits the capabilities of software running in the Non-Trusted state. SafeG includes the following features:

• Enables the concurrent execution of RTOS and GPOS on either single-core or multi-core ARM-based platforms.

• Devices and memory regions that are configured as Secure are protected against illegal GPOS accesses.

• Normal world devices can be accessed from both GPOS (Non-Trusted) and RTOS (Trusted) software.

• Real-time requirements are guaranteed in the RTOS (Trusted) via the utilization of FIQ and IRQ interrupts, where FIQ interrupts are issued for the RTOS and IRQ interrupts are issued for the GPOS. While in the Trusted state, IRQ interrupts are disabled so that the GPOS cannot disturb the execution of the RTOS. Therefore, the GPOS only executes when the RTOS issues the Secure Monitor Call (SMC) instruction, which causes the SafeG monitor to switch from the Trusted world (RTOS) to the Non-Trusted world (GPOS). Furthermore, FIQ interrupts are active during the execution of the GPOS, which enables the RTOS to retake control of the system. For example, a cyclic execution of RTOS/GPOS can be controlled by an FIQ interrupt of a system timer.

• GPOS does not require any major changes, and can execute with minimal overhead.

• Includes an efficient guest-to-guest communication mechanism (referred to as SafeG COM).

* Hardware assisted virtualization is only available for devices that include ARM’s Virtualization Extensions.

† Safety Gate.

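[Figure: a TrustZone-enabled ARM core runs the RTOS (FMP) in the Trusted state and the GPOS (Linux) in the Non-Trusted state, with the SafeG Monitor between them. Memory (RTOS data, GPOS data) and I/O (RTOS device, GPOS device) are attached through the bus (NS bit). Legend: Trusted, Non-Trusted.]

Figure 5.1: SafeG Architecture [34]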

Figure 5.1 depicts the SafeG architecture. It shows a simplified view of a TrustZone enabled ARM processor together with partitioned memory and device IO. The memory and device IO are configured as either Trusted (Secure) or Non-Trusted (Non-Secure), and their access is controlled by the NS bit of the bus (see subsection 3.2.9). The SafeG Monitor is the gateway between the GPOS and the RTOS. During the switch operation, it is responsible for saving the state of one world and loading the state of the other. The RTOS tasks are statically mapped to each processor during compilation. On the other hand, the GPOS uses all available virtual CPUs in SMP (Symmetric Multi-Processor) mode. Furthermore, the GPOS does not have access to Secure memory regions and Secure device IO. However, the RTOS can access all system resources. Therefore, by designating a resource as Non-Secure and making the RTOS aware of its existence, both the GPOS and RTOS can gain access, which is essential for communication between the two regions.
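The switching behaviour described above (SMC to hand control to the GPOS, FIQ to return to the RTOS, IRQ masked while in the Trusted state, and context save and restore on every switch) can be summarized as a small state machine. The sketch below is a behavioural model only, not SafeG code; the class name, the dictionary used as a register file, and the assumption that the RTOS runs first after boot are all illustrative.

```python
TRUSTED = "trusted"          # RTOS world
NON_TRUSTED = "non-trusted"  # GPOS world


class MonitorModel:
    """Behavioural model of a dual-OS monitor with per-world saved contexts."""

    def __init__(self):
        self.world = TRUSTED  # assumption: the RTOS runs first after boot
        self.registers = {}   # stand-in for the CPU context of the running world
        self._saved = {TRUSTED: {}, NON_TRUSTED: {}}

    def _switch_to(self, target):
        # Save the state of the current world, then load the state of the other.
        self._saved[self.world] = dict(self.registers)
        self.registers = dict(self._saved[target])
        self.world = target

    def smc(self):
        """The RTOS voluntarily yields the CPU to the GPOS."""
        if self.world == TRUSTED:
            self._switch_to(NON_TRUSTED)

    def interrupt(self, kind):
        """Route an interrupt ('FIQ' or 'IRQ') according to the current world."""
        if kind == "FIQ":
            if self.world == NON_TRUSTED:
                self._switch_to(TRUSTED)  # the RTOS retakes control
            return "handled by RTOS"
        if self.world == TRUSTED:
            return "masked"               # IRQs are disabled in the Trusted state
        return "handled by GPOS"
```

In this model the GPOS can only ever run between an `smc()` call and the next FIQ, which mirrors the cyclic RTOS/GPOS execution driven by a timer FIQ.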

5.5 SICS Thin Hypervisor

SICS Thin Hypervisor (STH) is a light-weight hypervisor designed for ARM-based devices [35]. STH runs directly on top of the hardware (bare metal), and achieves system virtualization through paravirtualization. As a result, guest systems require some modifications to the OS kernel, including the addition of a hypercall* interface. STH strengthens the security of embedded systems through the isolation capabilities of virtual machines, and allows for the existence of heterogeneous operating systems on the same platform. The current STH version supports ARMv5 (926EJ-S) and ARMv7 Cortex-A8 only. However, STH is a highly flexible and portable hypervisor that uses a hardware abstraction layer with minimal size.

5.6 Hypervisor Solution Matrix

There are many hypervisor solutions available in the market; some are open-source and others commercial. Table 5.1 lists the hardware and features supported by selected open-source hypervisors. It is important to keep in mind that the target hardware platform for this project is the Zedboard, which contains the Xilinx Zynq 7000 SoC. Furthermore, the Zynq 7000 SoC contains a dual-core Cortex-A9 processor, plus FPGA fabric. Therefore, the solution matrix lists the Cortex-A9 and Zynq 7000 in the Hardware section in order to showcase the fact that most hypervisors are designed for the ARM processor and not necessarily for the Zynq 7000 SoC. Nevertheless, since the Zynq 7000 SoC contains an ARM processor, the porting process is somewhat easier, or compatibility might even be assumed. All the information presented in the table summarizes the literature review in section 5.1, section 5.3, section 5.4, and section 5.5.

Another key factor to keep in mind is that the Cortex-A8/A9/A15 are built on the ARMv7 architecture, which indicates that they share some level of compatibility. However, unlike the Cortex-A8/A9, the Cortex-A15 includes Virtualization Extensions, which enable system virtualization without the need for paravirtualization or binary translation techniques. Table 5.1 also presents hypervisor solutions that leverage TrustZone technology (available in the project’s target platform) in order to achieve better performance and security.

* Hypercalls are similar to system calls of a Linux kernel. As system calls provide an interface between user applications and the kernel, the hypercalls provide an interface between the virtual machine and the hypervisor.

Table 5.1: Hypervisor Solution Matrix

                              Xen ARM PV  Xen ARM VE  Xen ARM Zynq  STH  SafeG  SierraVisor  Sel4  XtratuM
HARDWARE
Cortex-A8                     x           -           -             x    x      x            x     -
Cortex-A9                     x           -           -             -    x      x            x     -
Cortex-A15                    -           x           -             -    -      x            x     -
Zynq 7000 SoC                 -           -           -             -    x      x(2)         x(1)  -
Zynq Ultrascale+ MPSoC        -           x           x             -    -      -            -     -
x86                           -           -           -             -    -      -            x     x
FEATURES
Multiple guests               x           x           x             x    x(3)   x            x(4)  x
Paravirtualization (PV)       x           x(5)        x(5)          x    x(5)   x            x     x
TrustZone Security Extension  -           -           -             -    x      x            -     -
Virtualization Extension (VE) -           x           x             -    -      x            x     x
Multiprocessor                x           x           x             -    x      x            x(4)  x
Documentation                 x(6)        x           x             x    x      x(6)         x     x

LEGEND:
x  Fully Supported
-  Not Available

Exceptions:
(1) Support is visible in source code, but not in documentation.
(2) Support is visible in documentation, but source code requires modifications.
(3) Dual-OS only.
(4) Available on x86 only.
(5) Lightweight PV of guests, or only needed for I/Os.
(6) Outdated or poor documentation.

Lastly, it is important to consider the quality of the documentation that accompanies each solution while making a selection. The Xen hypervisor, as an example, provides a large amount of documentation. However, since this is a large project, it is divided into multiple sub-projects. Naturally, the task of maintaining information accuracy becomes difficult. This particularly applies to older (or inactive) projects. SierraVisor is another example of an open-source solution with poor documentation. The available data sheets are designed as marketing tools to showcase the capabilities of the hypervisor solution, which makes sense because Sierraware offers its hypervisor with both open-source and commercial licenses.

Table 5.1 presents a mix of open-source hypervisor solutions that leverage different virtualization techniques and hardware assists. The following comments provide a comparative analysis of the selected hypervisor solutions.

• Xen ARM Para-Virtualization (PV) [36] is the initial port of Xen from x86 to ARM based devices. It is an older project that has been inactive since 2012, and is the only Xen release that is compatible with the Cortex-A9 processor, which indicates that it is also compatible with the ARM processor in the Zynq 7000 SoC. However, the Xen PV solution will require development effort in order to port it to the Zynq 7000 SoC platform.

• Xen ARM VE [37] is the current Xen on ARM project. It takes advantage of the virtualization support available in ARM devices, such as Cortex-A7/A15/A53, and only uses paravirtualization to support I/O interfaces. As a result, this Xen hypervisor release can achieve a higher performance and security level than older Xen solutions [38].

• Xen ARM Zynq resulted from the Xen ARM VE port. DornerWorks created a solution for Xilinx’s new SDSoC platform (Zynq Ultrascale+ MPSoC) that offers resource isolation and management, and an ARINC-653 compliant scheduler [30].

• The last official release (2013) of STH supports ARMv5 926EJ-S and ARMv7 Cortex-A8 CPUs, but does not support Cortex-A9. Since the Cortex-A8 and Cortex-A9 are based on the ARMv7 architecture, they are binary compatible. One could attempt to follow the well structured code, and compile and run STH on Zynq. However, since STH uses paravirtualization to achieve system virtualization, the end result will have lower performance than solutions that take advantage of hardware virtualization extensions.


• SafeG is a dual-OS system (Normal OS and Secure OS) that leverages ARM’s TrustZone technology to achieve system isolation and virtualization. This solution provides support for the Zynq 7000 platform, and provides sufficient English documentation to understand the system architecture and setup. However, it is only compatible with ARM devices that include TrustZone technology. Therefore, this is a viable option for this project, but it is not portable to non-ARM based systems.

• SierraVisor is a hypervisor solution that is built on top of ARM’s TrustZone technology. It supports multiple guest operating systems, and can be built to support PV, VE, and TrustZone technologies (check Table 5.1 for compatibility). Similar to SafeG, this solution is also available for ARM based devices only.

• Sel4 is a secure microkernel that has been formally verified. Sel4 represents a solid foundation that can be used to build a hypervisor. Sel4 is derived from the L4 microkernel, and it is the culmination of twenty years of experience in developing microkernels and hypervisor-based systems [17]. Examples of commercial hypervisors in this category include Fiasco, PikeOS, OKL4, CODEZERO, and NOVA.

• XtratuM is a thin hypervisor that is designed for real-time embedded systems, and its source code is currently only available for x86 based devices [39].

5.7 Conclusion

From the comparative study of the available virtualization solutions, we can conclude that XtratuM, Xen VE, and Xen Zynq are not compatible with the hardware platform for this project. Furthermore, STH and Sel4 require too much porting effort, which is outside the scope of this work due to the limited duration of the project. Therefore, the remaining options are SafeG and SierraVisor. At first glance, SierraVisor appears to be a superior solution since it provides more features than SafeG. However, a close look at the source code structure and the available documentation suggests that this solution will require a great effort to set up and run an initial system. Consequently, SafeG is identified as the optimal virtualization solution that fully leverages the capabilities of the hardware, and satisfies all the design requirements.

Chapter 6

System Implementation

This chapter presents the architecture of the implemented hardware and software components of the system, and introduces the demo setup.

6.1 Implementation Tools

Several tools were used to build and test the prototype system.

6.1.1 Xilinx Vivado

Xilinx offers a set of tools that enable developers to create hardware and software systems. One of them is Vivado, a hardware development tool that is able to synthesize an RTL design for a Xilinx FPGA or Software Defined System-on-Chip (SDSoC) device. Vivado outputs a bitstream (.bit) file that can program the FPGA directly using a programming port such as Joint Test Action Group (JTAG), or it can be wrapped into a BOOT.bin file, which combines initialization software, bitstream, and u-boot*.

6.1.2 Xilinx ARM Cross-Compiler

Xilinx Software Development Kit (SDK) is a software Integrated Development Environment (IDE) that enables full system software development and debugging. Developers can use Xilinx’s SDK without the need to use Vivado. The SDK includes a set of common Board Support Packages (BSPs) that enable developers to build a system solely from the SDK.

* Universal Bootloader is an open-source project that was cloned by Xilinx. Wiki page: http://www.wiki.xilinx.com/U-boot and git repository: https://github.com/Xilinx/u-boot-xlnx


6.2 System Architecture Overview

Figure 6.1 presents a synopsis of the implemented platform. As can be seen, this view represents both the hardware and software components of the system, and the software-to-hardware association. The software units are indicated by rectangles with rounded corners, whereas hardware units are indicated by regular rectangles. Furthermore, Figure 6.1 highlights the TrustZone partitioning of system resources in this implementation.

6.3 Hardware Components

The Zynq system, as seen in Figure 6.1, is divided into two main regions: the Processing System (PS), which consists of a hardwired application processing unit, memory controller, and peripheral devices, and the Programmable Logic (PL) (see Figure 3.1). The PS communicates with the NoC subsystem through the M_AXI_GP0 port, which is a general purpose AXI-based interface. In this configuration, the entire PL region is treated as a secure resource or subsystem, and is under the control of the RTOS. The idea is that the MicroBlaze (MB) nodes in the NoC subsystem handle data collection and processing from sensors, and control actuators. Consequently, the information they hold is critical to the safety and security aspects of the system. Therefore, it is necessary to designate them as secure resources.

6.3.1 Resource Planning

Since normal resources cannot access secure resources, it is important to plan a static allocation of system resources. Table 6.1 lists all the active resources in the system, and indicates whether the resource is dedicated to the secure or non-secure world, or to both. For example, resources such as the L1 Cache are shared between the secure and non-secure worlds, whereas the Private Timer is dedicated to the non-secure world only*. Cache lines, as an example, are marked with a Non-Secure (NS) value that sets the line as either secure or non-secure. Any attempt to access a secure cache line (NS=0) from the non-secure CPU will simply result in a cache miss.
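The effect of the NS tag on cache lookups can be modelled in a few lines. This is a simplified sketch and not a description of the actual cache hardware: the `TaggedCache` class and its methods are hypothetical, and the model assumes, in line with the note on secure-world access rights, that a secure access may hit lines of either kind.

```python
class TaggedCache:
    """Toy cache in which every line carries an NS tag (0 = secure, 1 = non-secure)."""

    def __init__(self):
        self._lines = {}  # address -> (ns_tag, data)

    def fill(self, address, data, ns):
        """Install a line together with the NS tag of the access that filled it."""
        self._lines[address] = (ns, data)

    def lookup(self, address, cpu_ns):
        """Return the cached data, or None on a miss."""
        line = self._lines.get(address)
        if line is None:
            return None
        line_ns, data = line
        if cpu_ns == 1 and line_ns == 0:
            return None  # a secure line is invisible to the non-secure CPU: plain miss
        return data
```

Note that the non-secure access does not raise a fault in this model; it cannot distinguish a secure line from an address that was never cached.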

Since the physical CPU is shared, one can think of it as a secure and a non-secure virtual CPU (NS VCPU and S VCPU), as depicted in Figure 6.1. This effectively yields four virtual CPUs in total in the Zynq-7000 SoC.

* It is important to keep in mind that the secure zone CPU can access all system resources, even when we claim that a resource is dedicated as non-secure. This only implies that only the non-secure zone intends to use this resource.


[Figure: block diagram of the Zynq-7000 SoC. The Processing System (PS) contains UART 0, UART 1, Eth 0, WDT, Timer, GT, the NS VCPU and S VCPU, and DDR3 memory divided into secure, non-secure, and shared regions. The SafeG Virtual Machine Monitor (VMM) hosts the GPOS (Linux) with SHAPE and the SHAPE Services (SS1, SS2, ..., SSn), and the RTOS (TOPPERS/FMP) with BTask (SafeG switch call) and a cyclic task that activates tasks T1, T2, ..., Tn. Across the PS-PL boundary, the Programmable Logic (PL) is reached through M_AXI_GP0 and contains the Network on Chip with nodes MB1 to MB4 and SelectIO (LEDs, buttons, switches). Legend: secure hardware, non-secure hardware, secure software, non-secure software.]

Figure 6.1: Hardware and Software Synopsis of Implemented System


Table 6.1: Resource Planning

Resources        Dedication
CPU              Both
L1 Cache         Both
L2 Cache         Both
OCM              Both
DDR3             Both
Watchdog Timer   Secure
Private Timer    Non-Secure
Ethernet Port 0  Non-Secure
USB 0            Non-Secure
UART 0           Secure
UART 1           Non-Secure
M_AXI_GP0        Secure
NoC              Secure
MB Nodes         Secure

DDR3 memory is partitioned into secure, non-secure, and shared blocks. This partitioning allows each OS to have a dedicated memory region, and enables inter-OS communication via shared memory.
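Such a static partition can be written down as a simple memory map with an access check per world. The sketch below is illustrative only: the region boundaries are hypothetical placeholder addresses and not the values used in the implemented system, and the function names are assumptions.

```python
# Hypothetical DDR3 partition as (name, start, end) with end exclusive.
# The addresses are placeholders, not the actual layout of the platform.
REGIONS = [
    ("secure",     0x0000_0000, 0x0800_0000),  # RTOS private memory
    ("non-secure", 0x0800_0000, 0x1F00_0000),  # GPOS private memory
    ("shared",     0x1F00_0000, 0x2000_0000),  # inter-OS communication buffer
]


def region_of(address):
    """Return the name of the region containing the address, or None."""
    for name, start, end in REGIONS:
        if start <= address < end:
            return name
    return None


def may_access(world, address):
    """Secure software sees all regions; non-secure software never sees 'secure'."""
    region = region_of(address)
    if region is None:
        return False
    if world == "secure":
        return True
    return region in ("non-secure", "shared")
```

Both worlds pass the check for addresses in the shared block, which is what makes it usable as a communication channel between the two operating systems.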

6.3.2 Network-on Chip Subsystem

The NoC subsystem was included in order to demonstrate system scalability. As the number of processing nodes increases into the hundreds of cores and beyond, a NoC interconnect is necessary in order to maintain acceptable power consumption and system predictability [40]. The NoC system is based on packet-switched interconnects that are distributed through an NxN mesh.

The implemented NoC subsystem was generated using the NoC System Generator tool [41] [42] developed by Johnny Öberg at KTH – Royal Institute of Technology, Sweden.

The tool creates an image of the system in the target FPGA technology’s input language. The input of the tool is in XML format, and the output format depends on the FPGA vendor. For Altera devices the output follows a Qsys format, and for Xilinx devices the output is a Tcl script. The node type, count, and process mapping are decided during the system design stage. The tool supports different processor architectures such as MicroBlaze, Nios, or Leon3, which enables the creation of a heterogeneous multi-core system.


6.3.3 Network-on Chip Integration

In order to integrate the NoC subsystem in the design, the FPGA configuration bitstream was included in the BOOT.bin file (see subsection 6.6.1). Since the PS can access the PL through the M_AXI_GP0 port, the RTOS requires the addition of a device driver that enables read and write operations to the memory mapped port.

6.4 Software Components

The system is divided into many layers and components that provide services such as local cloud, cloud services, operating systems, and system virtualization. This section gives a brief description of these software layers and components and the role they play in the system.

6.4.1 SafeG Virtual Machine Monitor

SafeG was introduced in subsection 5.4.2, but as a quick review, SafeG leverages ARM’s TrustZone technology to divide the system into secure and normal sections, which enables the concurrent execution of two operating systems. This separation includes memory and I/O devices, and in the case of Zynq, it also includes FPGA programmable logic.

6.4.2 TOPPERS/FMP

TOPPERS/FMP is an RTOS developed in the TOPPERS project, and is distributed as open source software. This RTOS follows the µITRON4.0 specification [43], which is a widely used RTOS specification for Japanese embedded systems. TOPPERS/FMP supports the following features:

• Symmetric and asymmetric multiprocessor configurations

• Static task-to-processor assignment during design time

• Load balancing capabilities through the use of the available API to migrate tasks to other processors*

* The kernel does not support task migration, but the available API can be leveraged to implement such functionality.

6.4.3 Linux OS

Linux is an open-source operating system that is available for many processor architectures and development boards. It is designed to be highly flexible so that different parts of the system can be developed independently. The Linux OS is composed of three main components: Linux Kernel, Root File System, and Device Tree Blob.

6.4.3.1 Linux Kernel

The kernel is the core software layer of the Linux operating system, which sits directly on top of the hardware layer. It is responsible for abstracting the hardware resources of the system such that user-level software can gain access through standardized APIs, such as Portable Operating System Interface (POSIX) APIs. Furthermore, the kernel performs resource management such as I/O access, memory sharing, and process scheduling [44].

6.4.3.2 Root File System

The root file system, commonly known as ROOTFS, is a compressed file that holds the root directories and the entire file system structure of the Linux OS. The root file system can be created from scratch using tools such as Buildroot or Yocto, but this process is complicated and often unnecessary [45]. In most cases, it is easier to simply modify an existing root file system.

6.4.3.3 Device Tree Blob

In order to increase the flexibility of the operating system, the Linux kernel does not contain a description of the hardware present in the system; rather, this description resides in a separate binary file: the Device Tree Blob (DTB) [46]. This binary file is generated from a Device Tree Source (DTS) file using the Device Tree Compiler (DTC) tool. The DTS is written in a dedicated language that the DTC can easily convert to a DTB file, which in turn is easily understood by the Linux kernel [47].
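The DTS-to-DTB conversion can be sketched as a small Python wrapper around the `dtc` command line tool. This is a minimal sketch, not the build script used in this project: the helper names and the input file name `system.dts` are assumptions, and the compiler is invoked only if `dtc` is found on the host.

```python
import shutil
import subprocess


def dtc_command(dts_path, dtb_path):
    """Build the dtc invocation that compiles a DTS source into a DTB blob."""
    # -I: input format, -O: output format, -o: output file
    return ["dtc", "-I", "dts", "-O", "dtb", "-o", dtb_path, dts_path]


def compile_device_tree(dts_path, dtb_path):
    """Run dtc if it is installed; return the DTB path, or None if dtc is missing."""
    if shutil.which("dtc") is None:
        return None
    subprocess.run(dtc_command(dts_path, dtb_path), check=True)
    return dtb_path


# Hypothetical input name; the output name matches the blob loaded at boot.
print(" ".join(dtc_command("system.dts", "devicetree.dtb")))
```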

6.4.4 SHAPE

The Self-configurable High Availability and Policy based platform for Embedded systems (SHAPE) is a software layer that can connect multiple nodes in the system in order to create a local cloud [48]. It is composed of three main sub-layers: the application layer, the middleware layer, and the instantiation layer.

1. Application layer – the highest software layer in SHAPE, where user-level applications and services exist.

2. Middleware layer – the core software layer in SHAPE. It is responsible for monitoring and controlling system resources, and for managing communication between services. The middleware layer is platform independent (i.e. it does not depend on the OS or hardware), which facilitates porting it to another platform.

3. Instantiation layer – the lowest software layer, which handles all operating system and hardware dependencies of the middleware. This layer sits on top of the platform layer, which consists of hardware, device drivers, and operating system. Unlike the middleware layer, the instantiation layer is platform dependent. This layer is split into two main stacks: Portability and System Interface. The former contains device drivers and larger functionalities such as the link handler, and the latter provides the mapping of the system header files to the portability layer.

6.4.5 SHAPE Services

SHAPE is the cloud environment where SHAPE Services (SSs) can be registered and linked together. These services operate on the basis of a producer-consumer system. For example, a producer SS could be responsible for reading sensor data, and a consumer SS could be responsible for displaying the collected data. SHAPE would therefore be required to match the correct producer and consumer SSs.
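The matching of producers and consumers can be illustrated with a minimal Python sketch of a service registry. It is not the SHAPE implementation: the class `ServiceRegistry`, its methods, and the idea of matching services by a shared topic string are assumptions made only for this example.

```python
from collections import defaultdict


class ServiceRegistry:
    """Toy registry that links producers and consumers sharing a topic."""

    def __init__(self):
        self._producers = defaultdict(set)  # topic -> producer names
        self._consumers = defaultdict(set)  # topic -> consumer names

    def register(self, name, role, topic):
        if role not in ("producer", "consumer"):
            raise ValueError(f"unknown role: {role}")
        table = self._producers if role == "producer" else self._consumers
        table[topic].add(name)

    def unregister(self, name):
        # Remove a terminated service from every topic it was registered on.
        for table in (self._producers, self._consumers):
            for names in table.values():
                names.discard(name)

    def links(self):
        """Return sorted (producer, consumer, topic) triples."""
        pairs = set()
        for topic, producers in self._producers.items():
            for producer in producers:
                for consumer in self._consumers.get(topic, ()):
                    pairs.add((producer, consumer, topic))
        return sorted(pairs)


registry = ServiceRegistry()
registry.register("sensor_reader", "producer", "temperature")
registry.register("display", "consumer", "temperature")
print(registry.links())
```

Recomputing the links on demand keeps the sketch short; a registry that must react to services appearing and disappearing at run time would more likely update the links incrementally.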

6.4.6 Inter OS Communication

Due to the isolation mechanism in SafeG, it is not possible for the normal zone to communicate directly with the secure zone. As a result, an inter-OS communication mechanism is required. Several solutions exist (OpenAMP [49], RPMsg [50], OP-TEE [51], and DualOSCom [52]) that can provide this type of communication. Ideally, a thorough investigation of each solution would be required in order to determine the optimal solution for this system. However, since this project has a strict time constraint and the specifications of all of the mentioned solutions satisfy the isolation requirement of the project, the solution that required the least implementation effort was selected: DualOSCom. This solution is released as part of the software distribution of SafeG, and is accompanied by sample software that facilitated its integration into the system.

The DualOSCom solution was integrated with SHAPE in order to enable SHAPE service access to the RTOS environment. This required the addition of a special shared memory monitor service that is able to read and write the shared memory location via the DualOSCom protocol.


6.5 Project File Structure

Figure 6.2 displays the root of the project directory*. As can be seen, the directory structure of the EMC2 demo project† is divided into four main categories:

• The Xilinx directory holds all files associated with the booting process of the system (system bitstream, First Stage Boot Loader (FSBL), and Second Stage Boot Loader (SSBL)).

• The VMM directory has all files related to system virtualization. It therefore contains the source code for building SafeG, Linux patch files, SafeG applications for both trusted (FMP) and non-trusted (FMP/Linux) operating systems, and the shared communication library “DualOSCom”, which enables communication between the GPOS and RTOS.

• The OS directory contains both the RTOS (TOPPERS/FMP) and GPOS (Linux) operating systems.

• The SOA directory contains all files related to the local cloud system and service oriented architecture, which consists of SHAPE and its services.

[Figure 6.2: High-Level View of Project Directory. The root directory EMC2 contains the Xilinx, VMM, OS, and SOA subdirectories.]

* The name of the project here does not reflect the actual EMC2 project. It simply refers to the root directory of the build system.
† Henceforth EMC2 project.

6.6 System Build Overview

This MCS is built from many components, each of which is built using different types of tools. One of the main challenges in this project is to maintain the different configurations for building the system. While the virtualization layer (SafeG) and the Secure OS (RTOS) remain the same, the software that runs in the normal region can be either RTOS, GPOS, or a bare metal application. Furthermore, other configurations, such as RTOS or GPOS only mode, do not include the VMM.


Figure 6.3 provides a summary of the different build configurations for the system and the required flow and dependencies for building the system. It displays the activities that take place within each major directory in the project directory. The keyword “step x” indicates instances where dependencies exist within a build directory. Software tools, such as Vivado, SDK, and GNU Compiler Collection (GCC) (make), are indicated by a circular shape.

6.6.1 Xilinx Build

In the Xilinx directory, Vivado is used to synthesize the hardware design (VHDL or Verilog code) into a bitstream file (.bit), which is used to program the PL region of the Zynq-7000 SoC. The bitstream file can also contain software for soft-processors such as the MicroBlaze. Vivado also produces a set of files that represent the designed hardware platform, which are used for software development.

The SDK tool is used to create the Board Support Package (BSP) and the First Stage Boot Loader (FSBL) that correspond to the designed system. The BSP contains a set of device drivers that can be used to accelerate the development process. The FSBL is an Executable and Linkable Format (ELF) file that contains initialization software. During system boot, the FSBL is loaded into the On-Chip Memory (OCM) in order to initialize all available components (e.g. memory controller, data cache, instruction cache). In general, after the FSBL initialization process completes, and depending on the boot sequence, the CPU can do any of the following: configure the FPGA, initiate the Second Stage Boot Loader (SSBL), or jump to the first address of the main program.

The SDK tool is also used to generate a boot file (BOOT.bin), which must at least contain the FSBL (fsbl.elf). In the implemented system, the BOOT.bin file also includes the bitstream file (system.bit) and the SSBL (u-boot.elf*).
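As an illustration of the boot file contents, the following minimal Python sketch renders a boot image description in the BIF text format accepted by the Xilinx Bootgen tool. It is an assumption that the SDK flow used here is driven by such a file; the function name `render_bif` and the image label are hypothetical, and the file names follow the ones mentioned in this section.

```python
def render_bif(fsbl, bitstream, ssbl, label="the_ROM_image"):
    """Return BIF text listing the boot image partitions in boot order."""
    lines = [
        f"{label}:",
        "{",
        f"    [bootloader] {fsbl}",  # the FSBL is mandatory and comes first
        f"    {bitstream}",          # PL configuration bitstream
        f"    {ssbl}",               # second stage boot loader (u-boot)
        "}",
    ]
    return "\n".join(lines) + "\n"


print(render_bif("fsbl.elf", "system.bit", "u-boot.elf"))
```

The partition order in the sketch mirrors the boot order described above: FSBL first, then the bitstream, then the SSBL.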

The u-boot.elf file (universal boot-loader) is the SSBL. Once the system is initialized and the PL is configured, the system starts executing the u-boot instructions present in the BOOT.bin. U-boot is a full system on its own, and has many useful features. In particular, u-boot can be used to load executables and other system files from a remote server into the DDR3 memory using protocols such as Trivial File Transfer Protocol (TFTP) (Figure 6.3). This method is extremely useful, particularly when dealing with large files during the development phase, such as the RootFS and Linux kernel.

* When u-boot is compiled from source, the executable output must be renamed from u-boot to u-boot.elf. Otherwise, the SDK tool will not recognize it.

[Figure 6.3: System Build Overview Diagram. The diagram shows the build flow within the Xilinx, VMM, OS, and SOA directories, and the transfer of the build outputs to the Zynq SoC platform through the SD card and the TFTP server.]

The u-boot subdirectory is a clone of Xilinx’s git repository, and was built for the Zedboard configuration [53]. If no changes are required in u-boot, one could simply use a prebuilt binary executable.


6.6.2 OS Build

The OS directory includes Linux as the GPOS and FMP as the RTOS. Both operating systems can run with or without SafeG. The RTOS, for example, can be built from within its directory, which enables it to run without SafeG. As will be explained in subsection 6.6.3, the SafeG configuration is initiated from the VMM directory. On the other hand, in order to configure the GPOS for a SafeG build, it must be patched with the patch sources in the Patches subdirectory. Otherwise, it will be built for no-SafeG mode.

6.6.3 VMM Build

The SafeG Monitor in the VMM directory contains the source code for building the system monitor. It also contains the sources for building libSafeG.a, which is a library that contains all the monitor system calls, such as the SafeG_System_Switch() call that is responsible for initiating a switch from one region to the other*. In this system, the library is only used by the trusted FMP OS.

* These system calls require PL1 (Privilege Level 1) or higher in order to execute. Therefore, only the kernel can initiate such an action when the call originates from the normal region.

6.6.4 SOA Build

SHAPE is built with the ARM cross compiler when targeting the Zynq platform, but it is also built for the x86 platform in order to take advantage of the control station as an additional node. Once SHAPE is built, SHAPE Services (SSs) can be constructed. Two main SSs were used in this project: Hello-World and the shared memory monitor. These services are explained in detail in subsection 6.7.3 and subsection 6.7.4.

Since the shared memory monitor service uses the DualOSCom protocol to enable communication between the RTOS and GPOS, it includes sources from the DualOSCom subdirectory in the VMM directory. The Hello-World SS, on the other hand, does not require the DualOSCom sources.

Once the SHAPE and SS executables are ready, they are added to the root file system (RootFS) folder of the Linux operating system, which resides in the Xilinx directory. This is accomplished via a script that performs the following five steps:

1. Unzip the RootFS

2. Mount the RootFS into a temporary directory

3. Copy the executables into a preferred directory, such as the root directory of the RootFS

4. Unmount the RootFS from the temporary directory

5. Re-zip the RootFS folder
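The five steps can be sketched as follows. This is a minimal sketch rather than the script used in the project: it only builds the shell commands without executing them, and it assumes a gzip-compressed ramdisk image that can be loop-mounted. The function name, the mount point, and the use of `sudo` are assumptions.

```python
def rootfs_update_commands(image_gz, executables, mount_dir, dest="/"):
    """Return the shell commands (as argument lists) for the five steps."""
    if not image_gz.endswith(".gz"):
        raise ValueError("expected a gzip-compressed RootFS image")
    image = image_gz[: -len(".gz")]
    target = mount_dir.rstrip("/") + "/" + dest.lstrip("/")
    return [
        ["gunzip", image_gz],                               # 1. unzip
        ["sudo", "mount", "-o", "loop", image, mount_dir],  # 2. mount
        ["sudo", "cp", *executables, target],               # 3. copy executables
        ["sudo", "umount", mount_dir],                      # 4. unmount
        ["gzip", image],                                    # 5. re-zip
    ]


# Hypothetical executable names and mount point.
for cmd in rootfs_update_commands(
    "ramdisk_image.gz", ["shape", "hello_world"], "/tmp/rootfs_mnt"
):
    print(" ".join(cmd))
```

Returning the commands instead of running them keeps the sketch side-effect free; each list could be passed to `subprocess.run(cmd, check=True)` on a host where mounting is permitted.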

6.7 Demo System

The demo system, as depicted in Figure 6.4, is composed of two Zedboards connected to a control station in a network configuration via Ethernet. This configuration effectively creates a network of three nodes. Each Zedboard also has a serial communication channel to the control station, which is used to monitor the status of the system. Furthermore, each Zedboard has a similar configuration, as depicted in Figure 6.1. The only difference between the figure and the actual implementation is that only two tasks were created in the Secure region. Task 1 was mapped to CPU0 and task 2 was mapped to CPU1 of the PS region.

[Figure 6.4: Demo Setup. Labelled elements: a Control Station with a Display, an Ethernet Switch, and two Zedboards, each with a Zynq SoC, DDR3 Memory, an SD Card, UART and Ethernet connections, and Buttons, Switches & LEDs.]

6.7.1 System Boot

Each Zedboard boots from the BOOT.bin file located in its SD Card. Once u-boot is online, the control station is used to send load instructions to the u-boot system via the serial channel (Universal Asynchronous Receiver Transmitter (UART)). The u-boot in each Zedboard loads the system files (SafeG monitor, trusted RTOS, and non-trusted OS) from the TFTP server that is present in the control station to the DDR3 memory according to the addresses in Table 6.2 (also see Figure 6.3), and jumps to the start address of the monitor (0x1c000000) in order to initiate the boot sequence (Figure 6.5).

Table 6.2: Address Mapping

| Program File | Description | Load Address |
|---|---|---|
| zImage | Linux Kernel | 0x00008000 |
| devicetree.dtb | Device Tree Blob | 0x00010000 |
| ramdisk_image.gz | RootFS | 0x00011000 |
| monitor.bin | Monitor | 0x1c000000 |
| fmp_t.bin | TOPPERS/FMP | 0x1c100000 |
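The load instructions sent over the serial channel can be sketched as a short Python generator. The file names and addresses are taken from Table 6.2; the use of the u-boot `tftpboot` and `go` commands is an assumption about how the loading and the jump were issued, and the function name is hypothetical.

```python
# (file name, load address) pairs from Table 6.2
LOAD_MAP = [
    ("zImage", 0x00008000),
    ("devicetree.dtb", 0x00010000),
    ("ramdisk_image.gz", 0x00011000),
    ("monitor.bin", 0x1C000000),
    ("fmp_t.bin", 0x1C100000),
]
MONITOR_ENTRY = 0x1C000000  # start address of the SafeG monitor


def uboot_load_script(load_map, entry):
    """Return the u-boot console commands that load all files and jump to entry."""
    lines = [f"tftpboot 0x{addr:08x} {name}" for name, addr in load_map]
    lines.append(f"go 0x{entry:08x}")  # hand control to the monitor
    return lines


for line in uboot_load_script(LOAD_MAP, MONITOR_ENTRY):
    print(line)
```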

6.7.2 Boot Sequence of Dual-OS System

Figure 6.5 shows the boot sequence of the system with the SafeG VMM. The start indicates that u-boot has already loaded the appropriate files to DDR3 memory, and jumped to the start address of SafeG. During the initialization of SafeG, the monitor configures the TrustZone parameters, and in the case of multiprocessor configurations, it also performs synchronization steps.

After the initialization step, the CPU jumps to the start address of the Secure (or Trusted) OS (S_OS). At that point, the S_OS begins its execution, and the entire system comes under its control. Once all the high priority tasks have finished executing, the scheduler moves a low priority background task (btask) into the running state. The sole purpose of the btask is to execute the SafeG System Switch instruction, which initiates a switch from the Secure to the Non-Secure zone. During the switch, the processor enters Monitor mode. It starts by saving the state of the zone that initiated the switch, and then loads the state of the destination zone. After the system completes the switch to the Non-Secure zone, the GPOS boots and begins to execute. When a system timer reaches the end of its period, it issues an FIQ interrupt that causes the SafeG monitor to switch back to the RTOS.
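The save-and-load behaviour of the switch can be illustrated with a toy Python model. It is not SafeG code: the class `SafeGModel`, its methods, and the single `pc` field that stands in for the full processor state are assumptions made for illustration.

```python
class SafeGModel:
    """Toy model of the Secure/Non-Secure zone switch (not SafeG code)."""

    def __init__(self):
        # One saved context per zone; "pc" stands in for the full register set.
        self.saved = {
            "secure": {"pc": "S_OS start"},
            "normal": {"pc": "NS_OS start"},
        }
        self.zone = "secure"                    # the S_OS runs first
        self.cpu = dict(self.saved["secure"])   # live CPU state
        self.trace = []

    def _switch(self, reason):
        # Monitor mode: save the outgoing zone, then load the destination zone.
        target = "normal" if self.zone == "secure" else "secure"
        self.saved[self.zone] = dict(self.cpu)
        self.cpu = dict(self.saved[target])
        self.trace.append((self.zone, target, reason))
        self.zone = target

    def btask(self):
        # Runs only when no high priority secure task is ready.
        if self.zone == "secure":
            self._switch("btask system switch")

    def timer_fiq(self):
        # The timer FIQ returns control to the secure zone.
        if self.zone == "normal":
            self._switch("timer FIQ")


model = SafeGModel()
model.btask()      # Secure -> Non-Secure: the GPOS starts
model.timer_fiq()  # Non-Secure -> Secure: the RTOS resumes
print(model.trace)
```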

[Figure 6.5: Boot Sequence of the Dual-OS System. Timeline across Monitor mode, Secure (S) Supervisor mode, and Non-Secure (NS) Supervisor mode: initialize SafeG; jump to the S_OS start address; boot the S_OS and start the high priority tasks; the low priority BTask executes the SafeG System Switch function; save the S_OS state and load the NS_OS state; jump to the NS_OS start address; boot the NS_OS and start processes; a timer interrupt causes the switch back to the S_OS; save the NS_OS state and load the S_OS state; resume S_OS operation.]

| System | File Size (Bytes) |
|---|---|
| GPOS (Linux v3.6.10) | 2,368,472 |
| SOA (SHAPE) | 248,164 |
| RTOS (FMP) | 50,160 |
| VMM (SafeG) | 4,920 |

Table 6.3: Code Size Comparison of System Software Components

6.7.3 Hello-World Service

The Hello-World Service was the initial SS used to test the system. The goal of this service is to test the capabilities of SHAPE. This includes detecting and linking available services, and updating the registry system when new services come online and when they terminate. The Hello-World service is therefore divided into two services: the first sends the message “Hello”, and the second sends the message “World”, but only after it receives the “Hello” message. If one of the services is terminated, the other stops sending messages until a new instance of the terminated service reappears in the system.
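The dependency between the two Hello-World services can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the SHAPE service code: the class `HelloWorldDemo`, the service identifiers, and the notion of a message round are hypothetical.

```python
class HelloWorldDemo:
    """Toy model of the two dependent Hello-World services."""

    SERVICES = ("hello", "world")

    def __init__(self):
        self.online = set()

    def start(self, service):
        if service not in self.SERVICES:
            raise ValueError(f"unknown service: {service}")
        self.online.add(service)

    def stop(self, service):
        self.online.discard(service)

    def exchange(self):
        """Run one message round and return the messages that were sent."""
        if self.online != set(self.SERVICES):
            return []          # a partner is missing: both services stay silent
        sent = ["Hello"]       # the first service sends its message
        sent.append("World")   # the second replies only after receiving "Hello"
        return sent


demo = HelloWorldDemo()
demo.start("hello")
demo.start("world")
print(demo.exchange())
```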

6.7.4 Shared Memory Monitor Service

This service incorporates the DualOSCom shared memory protocol in order to send and receive messages from the secure region of the system. However, the current implementation only allows the Shared Memory Monitor Service (SMMS) to accept read requests. The write function is left for future work. After other services have issued a read request to the SMMS, the values acquired from the RTOS are broadcast to all requesting services.

6.8 System Test and Results

This section validates the system requirements by investigating system properties that satisfy the requirements as presented in section 4.6. All tests were captured via video recording sessions.

6.8.1 Robustness

It is important to build a robust Trusted Computing Base (TCB) software layer in a virtualized system. The term robustness here indicates that a software component has been well tested, and in order to facilitate this process the code size of the TCB layer should be as small as possible. As can be seen in Table 6.3, the VMM has the smallest footprint (less than 5 kB).
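The size comparison can be made concrete with a little arithmetic on the values from Table 6.3. The following Python sketch only restates those numbers; the dictionary keys and the function name are chosen for illustration.

```python
# File sizes in bytes, taken from Table 6.3
SIZES_BYTES = {
    "GPOS (Linux v3.6.10)": 2_368_472,
    "SOA (SHAPE)": 248_164,
    "RTOS (FMP)": 50_160,
    "VMM (SafeG)": 4_920,
}


def size_report(sizes):
    """Return the total size, the smallest component, and each component's share."""
    total = sum(sizes.values())
    return {
        "total": total,
        "smallest": min(sizes, key=sizes.get),
        "share": {name: size / total for name, size in sizes.items()},
    }


report = size_report(SIZES_BYTES)
for name, share in report["share"].items():
    print(f"{name}: {share:.2%} of {report['total']} bytes")
```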


6.8.2 Isolation Test

TrustZone provides strong isolation between the secure and normal zones. In order to test this isolation, a fork bomb [54] test was executed in the Linux OS while observing the output of the RTOS. A fork bomb is a simple and popular test that consists of a process that continuously replicates itself. As a result, all the system resources become depleted, and consequently, the system can either slow down, become non-responsive, or even crash.

Depending on the machine, this test takes time until it creates a sufficient number of processes to deplete the system, as was observed. After a few seconds, the fork bomb test caused the Linux system to become non-responsive, while the RTOS continued to operate as expected.

6.8.3 Board-to-Board Communication

In order to present a proof of concept for a distributed system, two boards are connected via an Ethernet connection. Each board runs an instance of SHAPE along with a number of SHAPE services. The goal is to demonstrate a system that is adaptable to changing environment conditions, such as the automatic detection and matching of services. If a service moves from one node to another, it is automatically detected and reconnected with its co-services. The Ethernet medium also facilitates communication between the control station (local computer) and the boards. As a result, all devices are connected via a router as depicted in Figure 6.4.

6.8.4 Dual-OS Communication SHAPE Service

This test examines the shared memory communication SHAPE service, which allows the GPOS and RTOS to communicate. In this case, the control station was used as the first node, and a single board was used as a second node. Since the shared memory communication is local to the software running on the Zynq device, the Shared Memory Monitor Service (SMMS) is instantiated in the Zynq board, and the consumer services are instantiated in the control station.

In this configuration the RTOS has a cyclic task that continuously reads the value of the switches, writes back that value to the LEDs, and sends it to the shared memory buffer. Every time the RTOS performs a write operation to the shared memory buffer, a new data event triggers the SMMS to read and dequeue the buffer. Then, the SMMS broadcasts the information to all associated services. Therefore, as the position of the switches changes, the LED display updates, and the printed message on the screen of the RTOS and all receiving services reflect the new value. This behavior was indeed observed during this test.
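The data flow of this test can be sketched with a small Python model. It is a simulation under stated assumptions rather than the DualOSCom or SHAPE API: the classes `SharedBuffer` and `SharedMemoryMonitor`, the callback-based subscription, and the `rtos_cycle` helper are hypothetical stand-ins.

```python
from collections import deque


class SharedBuffer:
    """Stand-in for the shared memory channel between RTOS and GPOS."""

    def __init__(self):
        self._queue = deque()
        self._on_data = None

    def set_data_event(self, handler):
        self._on_data = handler

    def write(self, value):
        self._queue.append(value)
        if self._on_data is not None:
            self._on_data()  # new-data event

    def read_all(self):
        # Read and dequeue everything currently in the buffer.
        values = list(self._queue)
        self._queue.clear()
        return values


class SharedMemoryMonitor:
    """Toy SMMS: forwards buffered values to every requesting service."""

    def __init__(self, buffer):
        self._buffer = buffer
        self._subscribers = []
        buffer.set_data_event(self._handle_new_data)

    def request_read(self, callback):
        self._subscribers.append(callback)

    def _handle_new_data(self):
        for value in self._buffer.read_all():
            for deliver in self._subscribers:
                deliver(value)  # broadcast to all requesting services


def rtos_cycle(switches, board, buffer):
    """One pass of the cyclic RTOS task: mirror the switches and publish them."""
    board["leds"] = switches
    buffer.write(switches)


buffer = SharedBuffer()
smms = SharedMemoryMonitor(buffer)
smms.request_read(lambda value: print(f"consumer received {value:#06b}"))
rtos_cycle(0b1010, {"leds": 0}, buffer)
```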

Chapter 7

Conclusion and Future Work

Thoughts that reflect upon the built system, achieved results, and how it could be improved via future work are presented in this section.

7.1 Conclusion

System virtualization has proven to be a suitable technique for achieving a mixed-criticality embedded system. It can encapsulate subsystems into virtual execution environments that provide strong isolation such that the operation of critical systems cannot be affected by errors or attacks on the non-critical systems. As was presented in chapter 4, there are many virtualization solutions to consider during the design phase of a project, and most of them are influenced by the available hardware support. In previous years, before the semiconductor industry introduced virtualization support in hardware, only binary translation and paravirtualization techniques enabled system virtualization. Both of these approaches are expensive to implement, and yield less than satisfactory performance. Therefore, it is important to select appropriate hardware that facilitates system virtualization, and reduces the development and maintenance costs. The selection process for this project did not follow this pattern, but instead used the available hardware platform as a constraint, which ultimately led to the selection of a solution that leveraged ARM’s TrustZone technology in order to realize system virtualization. This hardware assisted virtualization technique enables the safe and secure concurrent execution of a GPOS (Linux) and an RTOS (TOPPERS/FMP).

In addition to creating a platform suitable for mixed-criticality applications, this project also aimed at integrating a local cloud software (SHAPE) that can manage multiple boards. SHAPE enables subsystems (Zedboards in this case) to be automatically detected and serviced in a plug-and-play fashion. As a result of this work, SHAPE was ported to the embedded Linux environment together with a shared communication mechanism that enables SHAPE services to communicate with the RTOS partition of the system without incurring additional safety and security risks. This setup unlocks the possibility of reporting the system status of internal sensors to a monitor screen, or of coordinating different control systems.

Lastly, the system included a Network-on-Chip subsystem that is used to demonstrate the integration of the PS and PL, and provide an efficient scalability mechanism for additional processing nodes.

In conclusion, the embedded platform satisfied all the project requirements.

7.2 Future Work

There are many areas of research and development available in the field of MCSs, and many useful features could be added to this system to enhance its performance. The following is a short list of such features:

• Experiment with process migration to enable load balancing and consolidation features

• Enable Normal and Secure association in the PL region

• Implement a rich multimedia OS in the Normal zone, such as Android

• Improve the SMMS by enabling it to accept write requests from other SSs

• Enable SHAPE to search a database and find consumer/producer services

Bibliography

[1] Bajpai, Vikas and Gorthi, Ravi Prakash, “On non-functional requirements: A survey,” in Electrical, Electronics and Computer Science (SCEECS), 2012 IEEE Students’ Conference on. IEEE, 2012, pp. 1–4.

[2] Burns, Alan and Davis, Robert, “Mixed criticality systems-a review,” Department of Computer Science, University of York, Tech. Rep, 2013.

[3] Trujillo, S. and Crespo, A. and Alonso, A., “MultiPARTES: Multicore Virtualization for Mixed-Criticality Systems,” in Digital System Design (DSD), 2013 Euromicro Conference on, Sept 2013. doi: 10.1109/DSD.2013.37 pp. 260–265.

[4] D. Kleidermacher, “Chapter 7 - System Virtualization in Multicore Systems,” in Real World Multicore Embedded Systems, Moyer, Bryon, Ed. Oxford: Newnes, 2013, pp. 227–267. ISBN 978-0-12-416018-7. [Online]. Available: http://www.sciencedirect.com/science/article/pii/B9780124160187000079

[5] “EMC2 public Webpage,” accessed 2016-01-26. [Online]. Available: http://www.artemis-emc2.eu/?id=23

[6] Weber, W. and Hoess, A. and Oppenheimer, F. and Koppenhoefer, B. and Vissers, B. and Nordmoen, B., “EMC2 a Platform Project on Embedded Microcontrollers in Applications of Mobility, Industry and the Internet of Things,” in Digital System Design (DSD), 2015 Euromicro Conference on, Aug 2015. doi: 10.1109/DSD.2015.12 pp. 125–130.

[7] Salloum, C.E. and Elshuber, M. and Hoftberger, O. and Isakovic, H. and Wasicek, A., “The ACROSS MPSoC – A New Generation of Multi-core Processors Designed for Safety-Critical Embedded Systems,” in Digital System Design (DSD), 2012 15th Euromicro Conference on, Sept 2012. doi: 10.1109/DSD.2012.126 pp. 105–113.


[8] Rajasekar, S. and Philominathan, P. and Chinnathambi, V., “Research methodology,” arXiv preprint physics/0601009, 2006.

[9] H. Thane, “Testing and Safety Standards,” Safety Integrity AB. [Online]. Available: http://swell.weebly.com/uploads/1/4/3/4/1434953/swell_safety_and_verification_20111007d.pdf

[10] S. Bologna, G. Dahll, G. Picciolo, and R. Taylor, “Safety application of computer based systems for the process industry,” Report for the ESSI project, vol. 21542, 1997.

[11] Hilderman, Vance and Baghi, Tony, Avionics certification: a complete guide to DO-178 (software), DO-254 (hardware). Avionics Communications, 2007.

[12] International Electrotechnical Commission and others, “Functional safety of electrical/electronic/programmable electronic safety related systems,” IEC 61508, 2000.

[13] EN50129, CENELEC, “Railway applications-Communication, signalling and processing systems-Safety related electronic systems for signalling,” British Standards Institution, United Kingdom. ISBN, pp. 0580–4181, 2003.

[14] ISO, “26262–Road vehicles-Functional safety,” International Standard ISO/FDIS, vol. 26262, 2011.

[15] M. Paulitsch and O. M. Duarte and H. Karray and K. Mueller and D. Muench and J. Nowotsch, “Mixed-Criticality Embedded Systems – A Balance Ensuring Partitioning and Performance,” in Digital System Design (DSD), 2015 Euromicro Conference on, Aug 2015. doi: 10.1109/DSD.2015.100 pp. 453–461.

[16] ARM, “Security Technology Building a Secure System Using TrustZone Technology (white paper),” ARM Limited, 2009.

[17] sel4Wiki, “Frequently Asked Questions,” accessed 03-28-2016. [Online]. Available: https://wiki.sel4.systems/FrequentlyAskedQuestions

[18] Uchiyama, Kunio and Arakawa, Fumio and Kasahara, Hironori and Nojiri, Tohru and Noda, Hideyuki and Tawara, Yasuhiro and Idehara, Akio and Iwata, Kenichi and Shikano, Hiroaki, Heterogeneous multicore processor technologies for embedded systems. Springer, 2012.


[19] J. Pendlum and M. Leeser and K. Chowdhury, “Reducing Processing Latency with a Heterogeneous FPGA-Processor Framework,” in Field-Programmable Custom Computing Machines (FCCM), 2014 IEEE 22nd Annual International Symposium on, May 2014. doi: 10.1109/FCCM.2014.13 pp. 17–20.

[20] S. Vestal, “Preemptive Scheduling of Multi-criticality Systems with Varying Degrees of Execution Time Assurance,” in Real-Time Systems Symposium, 2007. RTSS 2007. 28th IEEE International, Dec 2007. doi: 10.1109/RTSS.2007.47. ISSN 1052-8725 pp. 239–243.

[21] Borkar, Shekhar, “Thousand core chips: a technology perspective,” in Proceedings of the 44th annual Design Automation Conference. ACM, 2007, pp. 746–749.

[22] Crockett, Louise H and Elliot, Ross A and Enderwitz, Martin A and Stewart, Robert W, The Zynq Book: Embedded Processing with the Arm Cortex-A9 on the Xilinx Zynq-7000 All Programmable Soc. Strathclyde Academic Media, 2014.

[23] EASA, “Certification memorandum - development assurance of airborne electronic hardware,” Software and Complex Electronic Hardware Section, European Aviation Safety Agency, Aug 2011. [Online]. Available: http://easa.europa.eu/system/files/dfu/certification-docs-certification-memorandum-EASA-CM-SWCEH-001-Development-Assurance-of-Airborne-Electronic-Hardware.pdf

[24] Xilinx, “Zynq-7000 All Programmable SoC, Technical Reference Manual, UG585 (v1.10),” Feb 2015. [Online]. Available: http://www.xilinx.com

[25] ARM, “Architecture Reference Manual. ARMv7-A and ARMv7-R edition,” ARM DDI C, vol. 406, 2012.

[26] Marshall, David, “Understanding Full Virtualization, Paravirtualization, and Hardware Assist,” 2007.

[27] Christoffer Dall and Jason Nieh, “KVM for ARM,” in Proceedings of the 12th Annual Linux Symposium, Ottawa, Canada, July 2010. [Online]. Available: https://systems.cs.columbia.edu/archive/pub/2010/07/kvm-for-arm/

[28] Gu, Zonghua and Zhao, Qingling, “A state-of-the-art survey on real-time issues in embedded systems virtualization,” 2012.


[29] Wojtczuk, Rafal, “Subverting the Xen hypervisor,” Black Hat USA, vol. 2008, 2008.

[30] DornerWorks, “Xen Zynq Distribution, User’s Manual - BETA, Xilinx-XenZynq-DOC-0001 v0.6,” March 2016. [Online]. Available: http://dornerworks.com/wp-content/uploads/2016/03/XilinxXenUsersManual.pdf

[31] Lyons, Anna and Heiser, Gernot, “Mixed-criticality support in a high-assurance, general-purpose microkernel,” in Proc. 2nd Workshop on Mixed Criticality Systems (WMC), RTSS, 2014, pp. 9–14.

[32] Trustworthy Systems Team, “seL4 reference manual Version 2.0.0,” NICTA-National Information and Communications Technology Australia, 2006. [Online]. Available: https://github.com/seL4/seL4

[33] Sierraware, “Sierraware Overview,” accessed 03-28-2016. [Online]. Available: https://www.sierraware.com/sierraware_tee_hypervisor_overview.pdf

[34] “Introduction to the SafeG,” accessed 03-28-2016. [Online]. Available: https://www.toppers.jp/en/safeg.html

[35] Viktor Do, “STH, SICS Thin Hypervisor, Reference Manual, Version 0.4,” April 2013, accessed 10-04-2016. [Online]. Available: https://bitbucket.org/sicssec/sth/downloads

[36] Xen Project Wiki, “Archived/Xen ARM (PV).” [Online]. Available: http://wiki.xenproject.org/wiki/Archived/Xen_ARM_(PV)

[37] Xen Project Wiki, “Xen ARM with Virtualization Extensions.” [Online]. Available: http://wiki.xen.org/wiki/Xen_ARM_with_Virtualization_Extensions

[38] Xen Project Wiki, “Xen ARM with Virtualization Extensions whitepaper.” [Online]. Available: http://wiki.xen.org/wiki/Xen_ARM_with_Virtualization_Extensions_whitepaper

[39] “XtratuM Hypervisor,” accessed 10-04-2016. [Online]. Available: http://www.xtratum.org

[40] De Micheli, Giovanni and Seiculescu, Ciprian and Murali, Srinivasan and Benini, Luca and Angiolini, Federico and Pullini, Antonio, “Networks on chips: from research to products,” in Design Automation Conference (DAC), 2010 47th ACM/IEEE. IEEE, 2010, pp. 300–305.


[41] Öberg, Johnny and Robino, Francesco and Attarzadeh, Hosein andSander, Ingo, “NoC System Generator: a Tool for Fast Prototypingof Multi-Core Systems on FPGAs.” DATE, 2013.

[42] J. Öberg and F. Robino, “A NoC generator for the sea-of-cores era,” in FPGAWorld-2011, Proceedings of the 8th FPGAWorld Conference, Stockholm, 2011, available through ACM DL, 2011.

[43] Sakamura, K and Takada, H, “µ-ITRON version 4.0 Specification,” TRON Association.

[44] Karim Yaghmour, Jon Masters, Gilad Ben-Yossef, and Philippe Gerum, Building Embedded Linux Systems, 2nd ed. O’Reilly, 2008. ISBN 978-0-596-52968-0

[45] Xilinx, Inc, “Build and Modify a Rootfs.” [Online]. Available: http://www.wiki.xilinx.com/Build+and+Modify+a+Rootfs

[46] D. P. Bovet and M. Cesati, Understanding the Linux kernel. O’Reilly Media, Inc., 2005.

[47] Xilinx, Inc, “Build Device Tree Blob.” [Online]. Available: http://www.wiki.xilinx.com/Build+Device+Tree+Blob

[48] Andreas Lindell, Mikael Wånggren, Björn Berglund, Detlef Scholle, and Martin Kristensson, “Design Description - SHAPE demonstration platform,” Feb 2016, Internal documentation of SHAPE-platform.

[49] “Github OpenAMP.” [Online]. Available: https://github.com/OpenAMP/open-amp

[50] “Category:RPMsg.” [Online]. Available: http://omappedia.org/wiki/Category:RPMsg

[51] Joakim Bech, “OP-TEE, open-source security for the mass-market,” September 2014. [Online]. Available: https://www.linaro.org/blog/core-dump/op-tee-open-source-security-mass-market/

[52] Sangorrin Lopez, Daniel, “Advanced integration techniques for highly reliable dual-OS embedded systems,” 2012.

[53] Xilinx, Inc, “Build U-Boot.” [Online]. Available: http://www.wiki.xilinx.com/Build+U-Boot

[54] “Fork bomb.” [Online]. Available: https://en.wikipedia.org/wiki/Fork_bomb

TRITA-ICT-EX-2016:164

www.kth.se
