30
CMAS CPSWeek’15 Consolidation of Real-Time Systems into a Multi-Core Platform [email protected] [email protected] Raj Rajkumar Hyoseung Kim

Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Consolidation of Real-Time Systems

into a Multi-Core Platform

[email protected] [email protected]

Raj Rajkumar Hyoseung Kim

Page 2: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Consolidation onto Multicores

• Benefits

– Reduces the number of CPUs and wiring harness among them

– Leads to a significant reduction in cost and space requirements

Vendor #1 Vendor #2

Vendor #3 Vendor #4

Real-time System

Consolidation

Multi-core SBC

Single-core SBCs

2

Page 3: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Key Challenges

Timing predictability without losing efficiency

Temporal isolation among consolidated workloads

(Quantification & minimization of temporal interference)

Use of standardized COTS multi-core platforms

3

Page 4: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Multi-Core Memory Hierarchy

Core 1 Core 2 Core 3 Core P

L1/L2

Cache

L1/L2

Cache

L1/L2

Cache

L1/L2

Cache

Last-level Shared Cache

Memory Controller

DRAM

Bank 1

DRAM

Bank 2

DRAM

Bank 3

DRAM

Bank B

Cache

interference

Memory bus

interference

DRAM bank

interference

4

Page 5: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Last-Level Shared Cache

• A large cache shared among processing cores

– Reduces memory access time (7 – 10x faster than DRAM access)

– Mitigates DRAM bandwidth consumption

– Allows high cache utilization

Intel Core i7

8-15 MB L3 Cache

Freescale P4080

2MB L3 Cache

5

Page 6: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Uncontrolled Use of Shared Cache

1. Inter-core Interference

P1 P2 P3

L1

L2

L1

L2

L1

L2

L3

P4

L1

L2

Core

Private

Cache

Shared

Cache

2. Intra-core Interference

P1 P2 P3

L1

L2

L1

L2

L1

L2

L3

P4

L1

L2

Core

Private

Cache

Shared

Cache

43% Slowdown 27% Slowdown (C:2.5ms, T:10ms)

* PARSEC Benchmark on Intel i7 6

Page 7: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Coordinated Cache Management

• Challenges

– Uncontrolled shared cache: Cache interference penalties

– Cache partitioning:

• Limited number of cache partitions

• Cache under-utilization

• Key idea: Controlled sharing of partitioned caches

while ensuring timing predictability

1. Provides predictability on multi-core real-time systems

2. Addresses the limitations of cache partitioning

[ECRTS14] Hyoseung Kim, Arvind Kandhalu, and Raj Rajkumar. A Coordinated Approach for Practical OS-Level Cache Management in Multi-Core Real-Time Systems. In ECRTS, 2013.

7

Page 8: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Coordinated Cache Management

Tasks

+

+

+

Bounded Penalties

Memory partitions

Cache partitions

1

2

3

NP -1

NP

1. Per-core Cache

Reservation

2. Reserved Cache Sharing

3. Cache-Aware

Task Allocation

Task Parameters

Coordinated Cache

Management

τi :(Ci

p, Ti, Di, Mi )

Core

1

Core

2

Core

NC

Pa

rtit

ion

ed

Fix

ed

-pri

ori

ty S

ch

ed

uli

ng

Pa

ge

Co

lori

ng

(C

ac

he

Pa

rtit

ion

ing

)

Per-core cache reservation

Prevent Inter-core cache interference Reserved cache sharing: Mitigate the problems with page coloring

Considerations 1. Analyzing intra-core interference

2. Guaranteeing memory requirements

8

Page 9: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Coordinated Cache Management

Tasks

+

+

+

Bounded Penalties

Memory partitions

Cache partitions

1

2

3

NP -1

NP

1. Per-core Cache

Reservation

2. Reserved Cache Sharing

3. Cache-Aware

Task Allocation

Coordinated Cache

Management Core

1

Core

2

Core

NC

Pa

rtit

ion

ed

Fix

ed

-pri

ori

ty S

ch

ed

uli

ng

Pa

ge

Co

lori

ng

(C

ac

he

Pa

rtit

ion

ing

)

Task Parameters

τi :(Ci

p, Ti, Di, Mi )

Cache-Aware Task Allocation

• Algorithm to allocate tasks and cache partitions to cores

• Exploits the benefits of cache sharing by load concentration

9

Page 10: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Multi-Core Memory Hierarchy

Core 1 Core 2 Core 3 Core P

L1/L2

Cache

L1/L2

Cache

L1/L2

Cache

L1/L2

Cache

Last-level Shared Cache

Memory Controller

DRAM

Bank 1

DRAM

Bank 2

DRAM

Bank 3

DRAM

Bank B

Cache

interference

Memory bus

interference

DRAM bank

interference

10

Page 11: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

DRAM Organization

DRAM Rank CHIP

1

CHIP

2

CHIP

3

CHIP

4

CHIP

5

CHIP

6

CHIP

7

CHIP

8

Data bus 8-bit

Address bus

Command bus

64-bit

DRAM Chip

Bank 1

Com

ma

nd d

eco

de

r

Data bus

Address bus

Command bus

8-bit

Bank ... Bank 8

Columns

Row

s

Row buffer

Row

de

co

der

Column decoder

Ro

w a

dd

ress

Column address

DRAM access latency depends on which row is currently in the row buffer

Row hit

Row conflict

11

Page 12: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Memory Controller

Request buffer Bank 1 Priority Queue

Bank 2 Priority Queue

Bank B Priority Queue

...

Bank 1 Scheduler

Bank 2 Scheduler

Bank B Scheduler

...

Channel Scheduler

Memory scheduler

Read/

Write

Buffers

DRAM address/command buses

Processor data bus

DRAM data bus

Memory requests from CPU cores

Two-level hierarchical

scheduling structure

(FR-FCFS)

Inter-bank interference at channel scheduler

Intra-bank interference at bank scheduler Memory interference

12

Page 13: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

DRAM Bank Partitioning & Sharing

• DRAM bank partitioning: software method to assign dedicated DRAM

banks to each core (or task)

• Problems

– Limited number of DRAM bank partitions

– Reduced usable memory size

Bank 1 Scheduler

Bank 2 Scheduler

Channel Scheduler

Core 1 Core 2

(1) w/o bank partitioning

Bank 1 Scheduler

Bank 2 Scheduler

Channel Scheduler

Core 1 Core 2

(2) w/ bank partitioning

Intra-bank and inter-bank

interference Inter-bank interference only

DRAM bank sharing

13

Page 14: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Bounding Memory Interference Delay

• Provides an upper bound on memory interference delay

– Intra-bank and inter-bank interference delays

– Private and shared DRAM banks

Response-Time Based

Schedulability Analysis

[RTAS14] Hyoseung Kim et al. Bounding Memory Interference Delay in COTS-based Multi-Core Systems. In IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2014. Best Paper Award

Request-driven bounding

Task’s own memory requests

Job-driven bounding

Interfering memory requests

during the task execution

14

Page 15: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Multi-Core Memory Hierarchy

Core 1 Core 2 Core 3 Core P

L1/L2

Cache

L1/L2

Cache

L1/L2

Cache

L1/L2

Cache

Last-level Shared Cache

Memory Controller

DRAM

Bank 1

DRAM

Bank 2

DRAM

Bank 3

DRAM

Bank B …

Cache

interference

Memory bus

interference

DRAM bank

interference

Operating System

Task τ1

Task τ2

Task τ3

Task τ4

Task τ5

Task τ6

15

Page 16: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Linux/RK

• CMU’s open source real-time extensions to the Linux kernel

• Resource kernel (RK) approach

• Resource reservation: tasks can reserve a portion of system resources

– Ex) CPU reserve: (budget, replenishment period, CPU affinity, policy)

• Guarantees resource allocations at admission time

• Enforces resource usage

Resource

Set

CPU

Reserve Task

1

Network

Reserve

Memory

Reserve

Task

2

Task

3

16

Page 17: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Multi-Core Memory Management

• Linux/RK memory manager

– Supports software cache and DRAM partitioning

– Allows tasks to specify their memory size, cache, DRAM bank demands

Task i: memory reservation - Memory size = m pages - Cache partitions = h - DRAM bank partitions = b

Real-time task

c Task i: allocated pages

Linux/RK Memory Manager

Cache 1 Cache 2 Cache 3 Cache H ...

Bank 1

Bank 2

Bank B

...

...

...

Bank 1

Bank 2

Bank B

...

...

...

17

Unallocated physical pages

Page 18: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Cache Interference

• Task execution time and L3 misses

– Intel i7 2600 3.4GHz quad-core, 8MB L3 cache, DDR3-1333

Task Instance

10

11

12

13

14

0 10 20 30 40 50 60 70 80 90 100

Exe

cuti

on

Tim

e

(mse

c)

w/o Cache Reservation

w/ Cache Reservation (p=12)

0

3

6

9

12

0 10 20 30 40 50 60 70 80 90 100

L3 m

isse

s p

er

mse

c (x

10

00

)

w/o Cache Reservation

w/ Cache Reservation (p=12)

Solo WCET

Solo WC L3 misses

18

Page 19: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Memory Interference

• Shared DRAM Bank

0

200

400

600

800

1000

1200

1400

No

rm. R

esp

on

se T

ime

(%

)

black- scholes

body- track

canneal ferret fluid- animate

freq- mine

ray- trace

stream- cluster

swap- tions

vips x264

12x increase due to

memory interference

Observed

Predicted by analysis

19

Page 20: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Memory Interference

• Private DRAM Bank

black- scholes

body- track

canneal ferret fluid- animate

freq- mine

ray- trace

stream- cluster

swap- tions

vips x264 0

100

200

300

400

500

No

rm. R

esp

on

se T

ime

(%

)

Observed

Predicted by analysis

4.1x increase DRAM bank partitioning

helps reducing the memory interference

Our analysis enables the quantification of the benefit of DRAM bank partitioning

20

Page 21: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Memory Interference

• Private DRAM Bank under moderate memory contention

0

20

40

60

80

100

120

140

160

180

No

rm. R

esp

on

se T

ime

(%

) Average over-estimates are 8%

(13% for a shared bank) Observed

Predicted by analysis

black- scholes

body- track

canneal ferret fluid- animate

freq- mine

ray- trace

stream- cluster

swap- tions

vips x264

Our analysis bounds memory interference delay with low pessimism

under both severe and moderate memory contentions

21

Page 22: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Multi-Core Virtualization

22

Page 23: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Real-Time System Virtualization

• Barrier to consolidation

– Each app. could have been developed

independently by different vendors

• Bare-metal / Proprietary OS

• Linux

– Legacy software

– Different license issues

• Consolidation via virtualization

– Each application can maintain

its own implementation

– Minimizes re-certification process

– IP protection, license segregation

– Fault isolation

Virtualization

Multi-core CPU

Real-Time Hypervisor

23

Page 24: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

VM Scheduling Structure

• Two-level hierarchical scheduling structure

– Task scheduling and VCPU scheduling

VM1

VCPU1

Task τ1

Task Scheduler

Task τ2

VCPU2

Task τ3

Task Scheduler

Task τ4

Hypervisor

Physical Core 1 (PCPU1)

VCPU Scheduler

VM2

VCPU3

Task τ5

Task Scheduler

Task τ6

VCPU4

Task τ7

Task Scheduler

Task τ8

Physical Core 2 (PCPU2)

VCPU Scheduler

24

Page 25: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Virt/RK

• Real-time virtualization with resource kernel approach

– CPU reservation for VCPUs

– Memory reservation for VMs

– Current implementation: Virt/RK::KVM-x86

– Under development: Virt/RK::KVM-ARM, Virt/RK::L4

VM1

VCPU1

Task τ1

Task Scheduler

Task τ2

VCPU2

Task τ3

Task Scheduler

Task τ4

VM Resource Reservation

• VCPU1: 30% of physical CPU

• VCPU2: 30% of physical CPU

• VM1: 20% of host memory w/

cache & DRAM bank

partitioning

25

Page 26: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Resource Sharing in Virtualization

• Consolidation inevitably causes the sharing of resources

– Sensors

– Network interfaces

– I/O devices

– Shared data

• Long blocking time in a virtualized environment

– VCPU level preemptions

– VCPU budget depletion

Requires mutually-exclusive locks

to avoid race conditions

Need a synchronization mechanism with bounded blocking times

for multi-core real-time virtualization

26

Page 27: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

vMPCP

• Virtualization-aware Multiprocessor Priority Ceiling Protocol

– Provides bounded blocking time on accessing shared resources

in multi-core virtualization

• Hierarchical priority ceilings

• Two-level priority queue for a mutex waiting list

– Optional VCPU budget overrun to reduce blocking times

– VCPU budget replenishment policies

• Periodic server

• Deferrable server

– Implemented on Virt/RK::KVM-x86

[RTSS14] Hyoseung Kim, Shige Wang, and Raj Rajkumar. vMPCP: A Synchronization Framework for Multi-Core Virtual Machines. In IEEE Real-Time Systems Symposium (RTSS), 2014.

27

Page 28: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Case Study: Effect of vMPCP

Baseline

Virtualization-unaware

synchronization protocol

(MPCP)

vMPCP

Virtualization-aware

synchronization protocol

τ1

τ2

τ3

τ4

τ5

τ6

τ7

τ8 (μsec)

τ1

τ2

τ3

τ4

τ5

τ6

τ7

τ8 (μsec)

vMPCP yields 29% shorter task response time with shared resources

28

Page 29: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

Summary

• Workload Consolidation into a Multi-core Platform

– Predictability without losing efficiency

– Quantification and minimization of temporal interference

• Multi-core memory hierarchy

– Coordinated cache management

• Cache partitioning, reserve, and sharing

– Bounding memory interference

• DRAM partitioning and sharing Quantifies the benefit of partitioning

• Memory-bus and DRAM bank interference analysis

• Real-time virtualization

– Virt/RK: resource reservation approach

– vMPCP: resource sharing for real-time virtualization

29

Page 30: Consolidation of Real-Time Systems into a Multi-Core Platformrtsl-edge.cs.illinois.edu/CMAS/media/08-Hyoseung.pdf · Bounding Memory Interference Delay • Provides an upper bound

CMAS CPSWeek’15

What next?

• Modeling state-of-the-art computer architecture techniques

– Hardware prefetchers

– Future memory controllers

– Future DRAM designs

• Extending existing real-time systems work to virtualization

– Diverse locking protocols (e.g., RW-lock)

– Fault tolerance mechanisms

– Hardware accelerations (e.g., GPGPU)

30