A Fully Preemptive Multiprocessor Semaphore Protocol for ...bbb/papers/talks/ecrts13b.pdf ·...

Björn B. Brandenburgbbb@mpi-sws.org

ECRTS’13July 12, 2013

A Fully PreemptiveMultiprocessor Semaphore Protocol forLatency-Sensitive Real-Time Applications

MPI-SWS Brandenburg

A fully preemptive multiprocessor semaphore protocol for latency-sensitive real-time applications

A Rhetorical Question

On uniprocessors, why do we usethe priority inheritance protocol (PIP)or the priority ceiling protocol (PCP)

instead of simple non-preemptive sections?

AUTOSAR Non-Preemptive Critical Section:SuspendAllInterrupts(…);// critical sectionResumeAllInterrupts(…);

MPI-SWS Brandenburg

uniprocessor, non-preemptive critical sections

long lower-prioritycritical section (CS)

release deadline

unrelated, latency-sensitive high-priority task

less time-criticallower-priority tasks

RT 101: Preemptive Synchronization Matters

MPI-SWS Brandenburg

uniprocessor, non-preemptive critical sections

release deadline

Deadline miss due to latency increase!

Long non-preemptive critical section.

MPI-SWS Brandenburg

long lower-priority CS

uniprocessor, with PIP

release deadline

MPI-SWS Brandenburg

long lower-priority CS

uniprocessor, with PIP

release deadline

Latency-sensitive taskisolated from unrelated critical section!

Lower-priority critical section: fully preemptive execution.

MPI-SWS Brandenburg

partitioned multiprocessor scheduling

(on same core)

What if we host the same workload on a multiprocessor?

The Multiprocessor Case

MPI-SWS Brandenburg

partitioned multiprocessor scheduling

The Multiprocessor Case

release deadline

No existing real-time semaphore protocolfor partitioned or clustered scheduling

isolates high-priority tasks from unrelated CSs.

(on same core)

MPCP, FMLP, FMLP+, OMLP, …

MPI-SWS Brandenburg

This Paper

Independence preservation formalizes the idea that“tasks should never be delayed by unrelated critical sections.”

MPI-SWS Brandenburg

This Paper

Independence preservation isimpossible without (limited) job migrations.

MPI-SWS Brandenburg

This Paper

First independence-preserving semaphore protocolfor clustered/partitioned scheduling; the protocol also has

asymptotically optimal blocking bounds.

MPI-SWS

Brandenburg

Clustered JLFP Scheduling

Main Memory

L2 Cache

Core 1 Core 2 Core 4Core 3

Main Memory

L2 Cache

global schedulingMain Memory

L2 Cache

J1 J2 J3 J4

partitioned scheduling

c … number of processors per clusterm … number of processors (total)

c = 1 c = mclustered scheduling

1 ≤ c ≤ m

Job-Level Fixed-Priority Scheduling (JLFP)

MPI-SWS

Brandenburg

Main Memory

L2 Cache

Main Memory

L2 Cache

J1 J2 J3 J4

1 ≤ c ≤ m

This talk: Partitioned Fixed-Priority (P-FP) Scheduling

MPI-SWS

Brandenburg

Main Memory

L2 Cache

Main Memory

L2 Cache

J1 J2 J3 J4

1 ≤ c ≤ m

Task model: implicit-deadline sporadic tasks(choice of deadline constraint irrelevant to results)

MPI-SWS

Brandenburg

Real-Time Semaphore Protocols

Binary Semaphoresin POSIX

pthread_mutex_lock(…)// critical sectionpthread_mutex_unlock(…)

A blocked task suspends& yields the processor.

MPI-SWS

Brandenburg

Priority Inversion

A job should be scheduled, but is not.

PI-Blocking: increase in worst-case response time due to priority inversions.

MPI-SWS

Brandenburg

Priority Inversion

Goal: bounded pi-blocking.

Bounded in terms of critical section lengths only!

MPI-SWS

Brandenburg

Assumptions➡ Unnested critical sections.➡ Suspension-oblivious schedulability analysis.

Priority Inversion

Part 1

Avoiding Delays due toUnrelated Critical Sections

MPI-SWS Brandenburg

Independence Preservation

(specific to s-oblivious analysis)

“Tasks should never be delayed by unrelated critical sections.”

MPI-SWS Brandenburg

Let bi,q denote the maximum pi-blocking incurred by task Ti due to requests for resource q.

Let Ni,q denote the maximum number of times that any job of Ti accesses resource q.

Under an independence-preserving locking protocol,if Ni,q = 0, then bi,q = 0.

“You only pay for what you use.”

MPI-SWS Brandenburg

Let bi,q denote the maximum pi-blocking incurred by task Ti due to requests for resource q.

Let Ni,q denote the maximum number of times that any job of Ti accesses resource q.

Under an independence-preserving locking protocol,if Ni,q = 0, then bi,q = 0.

“You only pay for what you use.”

Isolation useful for:latency-sensitive workloads (if no delay can be tolerated) or

if low-priority tasks contain unknown or untrusted critical sections.

MPI-SWS Brandenburg

real-time locking protocol

queue structure

progress mechanism= +

MPI-SWS Brandenburg

real-time locking protocol = queue

structureprogress

mechanism +

Ensure that a lock holder is scheduled(while waiting tasks incur pi-blocking).

How to order conflicting critical sections(e.g., priority queue, FIFO queues).

MPI-SWS Brandenburg

structure+

global schedulingpriority inheritance

partitioned schedulingpriority boosting

clustered schedulingpriority donation

progress mechanism

MPI-SWS Brandenburg

structure+

progress mechanism

Priority boosting and Priority Donation:lock-holding jobs have higher priority than non-lock-holding jobs

➔ effectively non-preemptive ➔ not independence preserving

MPI-SWS Brandenburg

structure+

Existing independence-preserving locking protocols:

Global PIP, Global FMLP, Global OMLP, …

progress mechanism

MPI-SWS Brandenburg

Observation

Independence preservation + bounded priority inversionrequires intra-cluster job migrations.

Main Memory

L2 Cache

Main Memory

L2 Cache

J1 J2 J3 J4

partitioned scheduling clustered scheduling

MPI-SWS Brandenburg

Observation

Independence preservation + bounded priority inversionrequires intra-cluster job migrations.

Main Memory

L2 Cache

Main Memory

L2 Cache

J1 J2 J3 J4

partitioned scheduling clustered scheduling

Intra-cluster: (temporarily) execute jobs on processors/clusters they have not been assigned to.

MPI-SWS Brandenburg

Example: Job Migration is Necessary

three tasks, two cores, one resource, P-FP scheduling

MPI-SWS Brandenburg

T2 starts executing critical section…

MPI-SWS Brandenburg

e 2 Job of T1 is released.

What to do with in-progress critical section?

MPI-SWS Brandenburg

critical section

Case 1: priority boosting (=let T2 continue).

MPI-SWS Brandenburg

critical section

CSpi-blocked

MPI-SWS Brandenburg

CSpi-blocked

Benefit: T3 incurs only bounded pi-blocking, meets deadline.

critical section

MPI-SWS Brandenburg

CSpi-blocked

critical section

Problem: T1 misses its deadline.

MPI-SWS Brandenburg

Case 2: independence preservation (= preempt T2).

MPI-SWS Brandenburg

Independence preservation: T1 meets its deadline.

MPI-SWS Brandenburg

CSpi-blocked

MPI-SWS Brandenburg

CSpi-blocked

Problem: T3 incurs “unbounded” pi-blocking, misses deadline!

MPI-SWS Brandenburg

Partitioned Scheduling with Migrations?

MPI-SWS Brandenburg

Partitioned By Necessity

migrations infeasiblefor lack of technical capability

Core 1

Local Memory

Shared Memory

Local Memory

Core 2 Core 3 Core 4

J1 J2 J3 J4

E.g., SoC with heterogeneous cores (ARM, PowerPC, x86, MIPS).

MPI-SWS Brandenburg

Partitioned By Necessity

Core 1

Local Memory

Shared Memory

Local Memory

J1 J2 J3 J4

➔ independence preservation and bounded priority inversion impossible to achieve!

E.g., SoC with heterogeneous cores (ARM, PowerPC, x86, MIPS).

MPI-SWS Brandenburg

Main Memory

L2 Cache

Core 1 Core 2 Core 4Core 3Q1

J1 J2 J3 J4

migrations disallowedbut technically feasible

Partitioned By ChoicePartitioned By Necessity

Core 1

Local Memory

Shared Memory

Local Memory

J1 J2 J3 J4

MPI-SWS Brandenburg

Main Memory

L2 Cache

Core 1 Core 2 Core 4Core 3Q1

J1 J2 J3 J4

migrations disallowedbut technically feasible

Partitioned By ChoicePartitioned By Necessity

Core 1

Local Memory

Shared Memory

Local Memory

J1 J2 J3 J4

Occasional migrations not desirable, but possible!(Focus of this work.)

MPI-SWS Brandenburg

1) Ensure independence preservation (= preempt T2).

Independence preservation: T1 meets its deadline.

MPI-SWS Brandenburg

Easy fix: migrate T2 when T3 suspends.

CSpi-blocked

2) Ensure bounded pi-blocking (= schedule T2).

MPI-SWS Brandenburg

CSpi-blocked

Temporarily move T3 to Core 2…

MPI-SWS Brandenburg

CSpi-blocked

Benefit: T3 incurs only bounded pi-blocking, meets deadline.

MPI-SWS Brandenburg

Theorem

Under non-global scheduling (c ≠ m), it is impossible for a semaphore protocol to simultaneously

(i) prevent unbounded pi-blocking,(ii) be independence-preserving, and(iii) avoid inter-cluster job migrations.

Pick any two…

MPI-SWS

Brandenburg

Combinations of Properties

(i) & (iii)➡ MPCP, Part. FMLP, FMLP+, OMLP, …

(ii) & (iii)➡ Applying PIP to partitioned scheduling (not sound!)

(i) & (ii)➡ no such protocol known!

Under non-global scheduling (c ≠ m), it is impossible for a semaphore protocol to simultaneously

(i) prevent unbounded pi-blocking,(ii) be independence-preserving, and(iii) avoid inter-cluster job migrations.

Part 2

Independence Preservation+

Asymptotically OptimalPI-Blocking

MPI-SWS Brandenburg

High-Level Overview

structureprogress

mechanism +

MPI-SWS Brandenburg

High-Level Overview

structureprogress

mechanism +

Must beindependence-preserving.

Must ensureasymptotic optimality.

MPI-SWS Brandenburg

High-Level Overview

structureprogress

mechanism +

Adopt intuition from example:when lock holder is preempted,

migrate to blocked task’s processor.

MPI-SWS Brandenburg

Migratory Priority Inheritance

classic priority inheritanceinherit priority of blocked jobs

MPI-SWS Brandenburg

“cluster inheritance”inherit eligibility to execute

on assigned clustersfrom blocked jobs

MPI-SWS Brandenburg

“cluster inheritance”inherit eligibility to execute

on assigned clustersfrom blocked jobs

Jobs remain fully preemptive even in critical sections.➔ enables independence preservation

MPI-SWS Brandenburg

High-Level Overview

structureprogress

mechanism +

MPI-SWS Brandenburg

High-Level Overview

structureprogress

mechanism +

Resolve (most) contention within clusters:use a multi-level queue.

MPI-SWS Brandenburg

A 3-Level FIFO/FIFO/PRIO Queue

one 3-level queue for each resource

Cluster 1

Cluster 2

Cluster K

shared resource

MPI-SWS Brandenburg

Cluster 1

Cluster 2

Cluster K

PQ1priority queue

PQ2priority queue

PQKpriority queue

shared resource

MPI-SWS Brandenburg

Cluster 1

Cluster 2

FIFO Queue FQ1

FIFO Queue FQ2

Cluster K

FIFO Queue FQK

PQ1priority queue

PQ2priority queue

PQKpriority queue

shared resource

MPI-SWS Brandenburg

Cluster 1

Cluster 2

FIFO Queue FQ1

FIFO Queue FQ2

Cluster K

FIFO Queue FQK

PQ1priority queue

PQ2priority queue

PQKpriority queue

shared resource

Bounded length: at most c jobs (in each cluster).(c = number of cores in cluster)

MPI-SWS Brandenburg

Cluster 1

Cluster 2

FIFO Queue FQ1

FIFO Queue FQ2

Cluster K

FIFO Queue FQK

PQ1priority queue

PQ2priority queue

PQKpriority queue

shared resource

Priority queue used only if more than c jobs contend.(c = number of cores in cluster)

MPI-SWS Brandenburg

Cluster 1

Cluster 2

FIFO Queue FQ1

FIFO Queue FQ2

Cluster K

FIFO Queue FQK

PQ1priority queue

PQ2priority queue

PQKpriority queue

FIFO Queue GQ

MPI-SWS Brandenburg

Cluster 1

Cluster 2

FIFO Queue FQ1

FIFO Queue FQ2

Cluster K

FIFO Queue FQK

PQ1priority queue

PQ2priority queue

PQKpriority queue

FIFO Queue GQ

Global FIFO Queue resolves inter-cluster contention.

Bounded length: at most K = m / c jobs (one per cluster).

(m = total number of cores, c = number of cores per cluster)

MPI-SWS Brandenburg

The O(m) Independence-Preserving Locking Protocol (OMIP)

TheOMIP = 3-level F/F/P

queuemigratory priority

inheritance +

independence-preserving O(m) s-obliviouspi-blocking

MPI-SWS Brandenburg

The O(m) Independence-Preserving Locking Protocol (OMIP)

TheOMIP = 3-level F/F/P

queuemigratory priority

inheritance +

independence-preserving O(m) s-obliviouspi-blocking

Ω(m) lower bound on s-oblivious pi-blocking (— & Anderson, 2010)

➔ The OMIP ensures asymptotically optimal s-oblivious pi-blocking.

Part 3

Evaluation

MPI-SWS

Brandenburg

Prototype Implementation

Linux Testbed for Multiprocessor Scheduling in Real-Time Systems

www.litmus-rt.org

3-level queues➡ easy (reuse Linux wait queues)➡ cheap compared to syscall

Migratory priority inheritance➡ more tricky (need to avoid global locks)➡ store bitmap of cores “offering” to

schedule lock holder in each lock

MPI-SWS

Brandenburg

Response Times Experiment

Setup➡ 4 tasks on each core (one independent & latency-sensitive)➡ one shared resource➡ max. critical section length: ~1ms

on an 8-core, 2-Ghz Xeon X7550 System

Core 1 Core 8

MPI-SWS

Brandenburg

Core 1 Core 8

One latency-sensitive, independent task with

period = 1ms.

MPI-SWS

Brandenburg

Core 1 Core 8

Three tasks (on each core) withperiods 25 ms, 100 ms, and 1000 ms.

Each job of these tasks locks the resource once.

MPI-SWS

Brandenburg

Three Configurations➡ No locks (unsound!)‣ no blocking (baseline)

➡ Clustered OMLP‣ priority donation

➡ OMIP‣migratory priority inheritance

Experiment➡ Measured response times with sched_trace

➡ 30-minute traces➡ more than 45 million jobs

Core 1 Core 8

MPI-SWS Brandenburg

Response Time CDF

0.00%$

10.00%$

20.00%$

30.00%$

40.00%$

50.00%$

60.00%$

70.00%$

80.00%$

90.00%$

100.00%$

0$ 0.1$ 0.2$ 0.3$ 0.4$ 0.5$ 0.6$ 0.7$ 0.8$ 0.9$ 1$

P(respon

se)*me))≤)X))

response)*me)(in)ms)))

NONE$CDF$

OMIP$CDF$

OMLP$CDF$

higher is better

increasing response time

MPI-SWS Brandenburg

Response Time CDF of 1-ms Tasks

0.00%$

10.00%$

20.00%$

30.00%$

40.00%$

50.00%$

60.00%$

70.00%$

80.00%$

90.00%$

100.00%$

0$ 0.1$ 0.2$ 0.3$ 0.4$ 0.5$ 0.6$ 0.7$ 0.8$ 0.9$ 1$

P(respon

se)*me))≤)X))

NONE$CDF$

OMIP$CDF$

OMLP$CDF$

OMLPOMIP & NONE

(overlapping)

MPI-SWS Brandenburg

0.00%$

10.00%$

20.00%$

30.00%$

40.00%$

50.00%$

60.00%$

70.00%$

80.00%$

90.00%$

100.00%$

0$ 0.1$ 0.2$ 0.3$ 0.4$ 0.5$ 0.6$ 0.7$ 0.8$ 0.9$ 1$

P(respon

se)*me))≤)X))

NONE$CDF$

OMIP$CDF$

OMLP$CDF$

With priority donation (or priority boosting), ~20% of the jobs of the

1ms-tasks miss their deadline.

MPI-SWS Brandenburg

0.00%$

10.00%$

20.00%$

30.00%$

40.00%$

50.00%$

60.00%$

70.00%$

80.00%$

90.00%$

100.00%$

0$ 0.1$ 0.2$ 0.3$ 0.4$ 0.5$ 0.6$ 0.7$ 0.8$ 0.9$ 1$

P(respon

se)*me))≤)X))

NONE$CDF$

OMIP$CDF$

OMLP$CDF$Response time distribution under OMIP

equivalent to case without locks.

(OMIP & NONE curves overlap)

MPI-SWS Brandenburg

0.00%$

10.00%$

20.00%$

30.00%$

40.00%$

50.00%$

60.00%$

70.00%$

80.00%$

90.00%$

100.00%$

0$ 10$ 20$ 30$ 40$ 50$ 60$ 70$ 80$ 90$ 100$

P(respon

se)*me))≤)X))

response)*me)(in)ms))

NONE$CDF$

OMIP$CDF$

OMLP$CDF$

MPI-SWS Brandenburg

0.00%$

10.00%$

20.00%$

30.00%$

40.00%$

50.00%$

60.00%$

70.00%$

80.00%$

90.00%$

100.00%$

0$ 10$ 20$ 30$ 40$ 50$ 60$ 70$ 80$ 90$ 100$

P(respon

se)*me))≤)X))

NONE$CDF$

OMIP$CDF$

OMLP$CDF$

NONEOMLP

MPI-SWS Brandenburg

0.00%$

10.00%$

20.00%$

30.00%$

40.00%$

50.00%$

60.00%$

70.00%$

80.00%$

90.00%$

100.00%$

0$ 10$ 20$ 30$ 40$ 50$ 60$ 70$ 80$ 90$ 100$

P(respon

se)*me))≤)X))

NONE$CDF$

OMIP$CDF$

OMLP$CDF$OMIP: blocking shifted to lower-

priority (= later-deadline) jobs.

MPI-SWS

Brandenburg

Analytical Blocking/Latency Tradeoff

Large-scale schedulability experiments➡ Varied #tasks, #cores, #resources, max. critical section lengths, etc.➡ >150,000,000 task sets➡ 678 schedulability plots, available in online appendix

MPI-SWS

Brandenburg

In the presence of latency-sensitive tasks, the OMIP is generally the only viable option.

MPI-SWS

Brandenburg

In the presence of latency-sensitive tasks, the OMIP is generally the only viable option.

Without latency-sensitive tasks, the OMIP does not offer substantial improvements.

Conclusion

MPI-SWS Brandenburg

Summary

Independence preservation formalizes the idea that“tasks should not be delayed by unrelated critical sections.”

The OMIP is the first independence-preserving semaphore protocol for clustered scheduling. It ensures

asymptotically optimal s-oblivious pi-blocking.

MPI-SWS Brandenburg

Future Work

Suspension-Aware Analysis

Nesting Budget Overruns

MPI-SWS Brandenburg

Thanks!

Linux Testbed for Multiprocessor Scheduling in Real-Time Systems

www.litmus-rt.org

SchedCATSchedulability test

Collection And Toolkitwww.mpi-sws.org/~bbb/

projects/schedcat

Appendix

MPI-SWS

Brandenburg

Design Inspirations

Migrate to Blocked Task’s CPU➡ “Local helping” in TU Dresden’s Fiasco/L4‣Hohmuth & Peter (2001)

➡ Multiprocessor bandwidth inheritance (MBWI)‣Faggioli, Lipari, & Cucinotta (2010)

Queue Design➡ Intra-cluster queues adopted from global OMLP‣— & Anderson (2010)

➡ Inter-cluster queues similar to clustered OMLP‣— & Anderson (2011)

MPI-SWS

Brandenburg

What about overheads?

Aren’t job migrations expensive?➡ response time experiments reflect all overheads in real system➡ latency-sensitive tasks do not migrate, only lower-priority tasks do➡ only working set of critical section migrates (likely small), not

entire task working set (likely much larger)➡ the critical section would have been preempted anyway

0.00%$

10.00%$

20.00%$

30.00%$

40.00%$

50.00%$

60.00%$

70.00%$

80.00%$

90.00%$

100.00%$

0$ 10$ 20$ 30$ 40$ 50$ 60$ 70$ 80$ 90$ 100$

P(respon

se)*me))≤)X))

NONE$CDF$

OMIP$CDF$

OMLP$CDF$

0.00%$

10.00%$

20.00%$

30.00%$

40.00%$

50.00%$

60.00%$

70.00%$

80.00%$

90.00%$

100.00%$

0$ 0.1$ 0.2$ 0.3$ 0.4$ 0.5$ 0.6$ 0.7$ 0.8$ 0.9$ 1$

P(respon

se)*me))≤)X))

NONE$CDF$

OMIP$CDF$

OMLP$CDF$

MPI-SWS Brandenburg

Blocking Analysis

FIFO Queue FQPQ priority

queueFIFO Queue GQ

MPI-SWS Brandenburg

Blocking Analysis

queueFIFO Queue GQ

at mostK - 1 = m / c - 1

queued jobs

at mostc - 1

queued jobs

MPI-SWS Brandenburg

Blocking Analysis

queueFIFO Queue GQ

at mostK - 1 = m / c - 1

queued jobs

at mostc - 1

queued jobs

at mostm / c - 1

blocking CS

at most(c - 1) · (m / c)

blocking CS

MPI-SWS Brandenburg

Blocking Analysis

queueFIFO Queue GQ

at mostK - 1 = m / c - 1

queued jobs

at mostc - 1

queued jobs

at mostm / c - 1

blocking CS

at most(c - 1) · (m / c)

blocking CS

At mostm / c - 1 + (c - 1) · (m / c) = m - 1 = O(m)

blocking critical sections.

MPI-SWS Brandenburg

Blocking Analysis

queueFIFO Queue GQ

at mostK - 1 = m / c - 1

queued jobs

at mostc - 1

queued jobs

at mostm / c - 1

blocking CS

at most(c - 1) · (m / c)

blocking CS

At mostm / c - 1 + (c - 1) · (m / c) = m - 1 = O(m)

blocking critical sections.

Under s-oblivious analysis:at most O(m) critical sections

cause pi-blocking.

A Fully Preemptive Multiprocessor Semaphore Protocol for ...bbb/papers/talks/ecrts13b.pdf ·...

Documents

5. semaphore alphabet

Preemptive Scheduling Of A Multiprocessor System With Memories

The Semaphore Circular - Cobseo

Non-Preemptive and Limited Preemptive Scheduling · Non-Preemptive and Limited Preemptive Scheduling ... Scheduling costs: Time the scheduling Algorithm needs to ... It simpli es

Semaphore Twinsoft Manual

Semaphore Design Patterns

William Sandqvist william@kth.se. Some symbols Binary Semaphore Counting Semaphore Mutex Semaphore Message Queue Event Flag Mailbox Task ISR Timer

Anlgesia Preemptive

Preemptive Scheduling Model

Semaphore NS

Java Semaphore (Part 1)

Group19 Semaphore

Empirical or preemptive

Company Profile- Semaphore

3. Semaphore TN

Solution With Semaphore

Semaphore Training HARLAN R. DICKSON DIVISION. U. S. Naval Sea Cadet CorpsCOMPASS :: What is Semaphore? Semaphore is a method

PreEmptive Analytics Win32 API User's Guide · PREEMPTIVE SOLUTIONS PREEMPTIVE ANALYTICS Win32 Client API User’s Guide Version 1.0

Semaphore in UNIX Notes

THE SEMAPHORE - Fox Valley Divisionfoxvalleydivision.org/images/semaphorefiles/semaphore201212.pdf · THE SEMAPHORE December 2012 HAPPY HOLIDAYS ! from the Semaphore crew and all