Multi-core Real-Time Scheduling for Generalized Parallel Task Models Abusayeed Saifullah, Kunal Agrawal, Chenyang Lu, Christopher Gill


Multi-core processors provide an opportunity to schedule computation-intensive tasks in real time.

Most of these tasks exhibit intra-task parallelism.

Real-time systems need to be developed to exploit intra-task parallelism.

Real-Time Systems on Multi-core

Traditional multiprocessor scheduling focuses on inter-task parallelism and is mostly restricted to sequential task models.

Computation-intensive, complex real-time tasks are becoming more common, for example video surveillance, radar tracking, and hybrid real-time structural testing.

Parallel Task Model

Synchronous task model:

Each horizontal bar in the figure indicates a thread of execution (a sequence of instructions).

Parallel threads form a segment.

Threads of each segment synchronize at the end of the segment.

A task is an alternating sequence of parallel and sequential segments.

Lakshmanan et al. (RTSS ’10) addressed a restricted synchronous model in which the total number of threads in each segment is at most the number of cores, and all parallel segments have an equal number of threads.

[Figure: a synchronous task with five segments, Segment 1 to Segment 5; the threads of Segment 1 synchronize at the end of that segment.]

Our Contributions

We address a general synchronous parallel task model in which different segments may have different numbers of threads and each segment can have an arbitrary number of threads.

Example: such tasks are generated by parallel for loops in OpenMP and CilkPlus, and by barrier primitives in thread libraries.

This model is more portable: the same program can execute on machines with different numbers of cores.

A Task Example

[Figure: the threads of the example task between its start and end points.]

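void parallel_task(float *a, float *b, float *c, float *d) {
    /* parallel_for stands for a parallel loop construct, as in OpenMP or CilkPlus. */
    int n = 7; int i = 0;
    /* First parallel segment: 7 independent iterations. */
    parallel_for (; i < n; i++)
        c[i] = a[i] + b[i];
    n = 4; i = 0;
    /* Second parallel segment: 4 independent iterations. */
    parallel_for (; i < n; i++)
        d[i] = a[i] - b[i];
}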
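As one possible concrete rendering of the pseudocode above, the two loops can be written with OpenMP's `parallel for` directive. This is only a sketch: the function name `parallel_task_omp` and the `const` qualifiers are assumptions introduced here, and the number of threads actually used depends on the OpenMP runtime, not on the loop bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical OpenMP version of the slide's parallel_task (name assumed).
 * Compile with an OpenMP flag (for example -fopenmp) to run the loops in
 * parallel; without it the pragmas are ignored and the loops run serially. */
void parallel_task_omp(const float *a, const float *b, float *c, float *d)
{
    assert(a != NULL && b != NULL && c != NULL && d != NULL);

    /* First parallel segment: 7 independent iterations. */
    #pragma omp parallel for
    for (int i = 0; i < 7; i++)
        c[i] = a[i] + b[i];
    /* The implicit barrier at the end of the parallel region plays the role
     * of the synchronization point at the end of a segment. */

    /* Second parallel segment: 4 independent iterations. */
    #pragma omp parallel for
    for (int i = 0; i < 4; i++)
        d[i] = a[i] - b[i];
}
```

The caller must supply arrays `a`, `b`, and `c` with at least 7 elements and `d` with at least 4.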

Our Contributions (contd.)

We propose a task decomposition for the general synchronous parallel task model. It decomposes each parallel task into a set of sequential subtasks, and the subtasks are scheduled like traditional tasks.

Why decomposition? We can exploit the rich literature on multiprocessor scheduling. The proposed decomposition ensures that if the decomposed tasks are schedulable, the original task set is also schedulable.

Our Contributions (contd.)

We analyze schedulability in terms of a processor speed augmentation bound.

Speed augmentation bound ν for an algorithm A: if an optimal algorithm can schedule a synchronous parallel task set on unit-speed processor cores, then A can schedule the decomposed tasks on ν-speed processor cores.

We prove that the proposed decomposition requires a speed augmentation of at most 4 for Global Earliest Deadline First (G-EDF) scheduling and at most 5 for Partitioned Deadline Monotonic (P-DM) scheduling.

Overview of a Task Decomposition

Each thread of the task becomes an individual task with an intermediate subdeadline and a release offset that retains the precedence relations of the original task.

Deadlines are assigned by distributing slack among segments.

Deadline of a thread = execution requirement + assigned slack

How much slack a segment demands depends on the available slack of the task and on the execution requirement of the segment.

The execution requirement of a segment is the product of the total number of parallel threads in the segment and the execution requirement of each thread in the segment.

A larger execution requirement implies more demand for slack. In the figure, Segment 1 requires more slack than Segment 2.

Slack Distribution

We use the following principle to distribute slack: all segments that receive slack achieve an equal density.

Reasons to equalize the density among segments:

Fairness: the deadline of each segment becomes proportional to its execution requirement.

We can bound the density of the decomposed tasks.

We can exploit existing density-based analyses for multiprocessor scheduling.

Density of a task = execution requirement / deadline

Density of a segment S = (total threads in S) * (execution requirement of a thread) / (assigned deadline)
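The two definitions above translate directly into small helper functions. The sketch below is illustrative only: the `Segment` type and the function names are assumptions introduced here, not part of the original material.

```c
#include <assert.h>

/* Hypothetical description of one segment of a synchronous parallel task. */
typedef struct {
    int threads;        /* total number of parallel threads in the segment */
    double thread_exec; /* execution requirement of each thread */
} Segment;

/* Execution requirement of a segment: threads * per-thread requirement. */
double segment_exec_requirement(Segment s)
{
    assert(s.threads > 0 && s.thread_exec > 0.0);
    return s.threads * s.thread_exec;
}

/* Density of a segment: its execution requirement / its assigned deadline. */
double segment_density(Segment s, double assigned_deadline)
{
    assert(assigned_deadline > 0.0);
    return segment_exec_requirement(s) / assigned_deadline;
}
```

For instance, a segment with 5 threads of execution requirement 4 each and an assigned deadline of 20 has an execution requirement of 20 and a density of 1.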

Slack Distribution (contd.)

The slack of each segment is determined by solving the following equalities:

Sum of subdeadlines = task deadline (equivalently, total assigned slack = task slack)

Density of Segment 1 = density of Segment 2 = ..., and so on

All threads in a segment have the same deadline and release offset:

Deadline = execution requirement of the thread + segment slack

Release offset = sum of the deadlines of the preceding segments
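The equalities above can be solved in closed form in the simplest case. The sketch below makes a loud simplifying assumption: every segment receives slack, so all segments share one density. The text only requires equal density among the segments that receive slack, so this is not the full decomposition; the function reports failure when a segment would end up with negative slack. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int threads;        /* number of parallel threads in the segment */
    double thread_exec; /* execution requirement of each thread */
} Segment;

typedef struct {
    double deadline; /* subdeadline shared by all threads of the segment */
    double offset;   /* release offset shared by all threads of the segment */
    double slack;    /* deadline minus the per-thread execution requirement */
} SubtaskParams;

/*
 * Simplified sketch (assumes EVERY segment receives slack).
 * Equal density means deadline_i = task_deadline * work_i / total_work,
 * where work_i = threads_i * thread_exec_i, so the subdeadlines sum to
 * task_deadline. Returns 0 on success, or -1 if some segment would need
 * negative slack, in which case this simplification does not apply and
 * the contents of out are unspecified.
 */
int decompose_equal_density(const Segment *segs, size_t n,
                            double task_deadline, SubtaskParams *out)
{
    assert(segs != NULL && out != NULL && n > 0 && task_deadline > 0.0);

    double total_work = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(segs[i].threads > 0 && segs[i].thread_exec > 0.0);
        total_work += segs[i].threads * segs[i].thread_exec;
    }

    double offset = 0.0; /* sum of the deadlines of the preceding segments */
    for (size_t i = 0; i < n; i++) {
        double work = segs[i].threads * segs[i].thread_exec;
        double deadline = task_deadline * work / total_work;
        double slack = deadline - segs[i].thread_exec;
        if (slack < -1e-9)
            return -1;
        out[i].deadline = deadline;
        out[i].offset = offset;
        out[i].slack = slack;
        offset += deadline;
    }
    return 0;
}
```

A caller passes the segments in execution order together with the task deadline and reads back one `SubtaskParams` entry per segment; every thread of segment `i` then becomes a sequential subtask with that deadline and release offset.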

An Example of Task Decomposition

Segment 1: deadline = 20, density = (5*4)/20 = 1

Segment 2: deadline = 4, density = (2*2)/4 = 1

Segment 3: deadline = 9, density = (3*3)/9 = 1

Segment 4: deadline = 16, density = (4*4)/16 = 1

Segment 5: deadline = 3, density = (1*3)/3 = 1

All segments have an equal density!
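As a check on these numbers, the slack-distribution rules stated earlier (the subdeadlines sum to the task deadline, and a segment's release offset is the sum of the deadlines of the preceding segments) imply the task deadline and release offsets worked out below. The symbols $d_i$, $\phi_i$, and $D$ are notation introduced here for the subdeadline of Segment $i$, its release offset, and the task deadline; the task deadline itself is derived from the rule, not quoted from the example.

```latex
\begin{align*}
D      &= \sum_{i=1}^{5} d_i = 20 + 4 + 9 + 16 + 3 = 52, \\
\phi_1 &= 0, \\
\phi_2 &= d_1 = 20, \\
\phi_3 &= d_1 + d_2 = 24, \\
\phi_4 &= d_1 + d_2 + d_3 = 33, \\
\phi_5 &= d_1 + d_2 + d_3 + d_4 = 49.
\end{align*}
```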

Global EDF (G-EDF) Schedulability

A sufficient condition for G-EDF scheduling on m unit-speed cores [Baruah RTSS ’07] is expressed in terms of the total density and the maximum density of the task set.

There is also a necessary condition that holds for any task set under any scheduler.

Using the density bounds for the decomposed tasks: if the original task set is schedulable by any scheduler on m unit-speed cores, then the decomposed tasks are schedulable under G-EDF on 4-speed cores.
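To make the role of total and maximum density concrete, the sketch below assumes the sufficient condition takes the commonly cited density-based form, total density ≤ m − (m − 1) × max density. That exact form is an assumption here and may differ in detail from the cited result; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed density-based sufficient test for G-EDF on m unit-speed cores:
 *     total_density <= m - (m - 1) * max_density
 * densities[i] is (execution requirement / deadline) of task i.
 * Returns 1 if the test passes and 0 otherwise. Because the test is only
 * sufficient, a result of 0 does not prove the task set unschedulable.
 */
int gedf_density_test(const double *densities, size_t n, int m)
{
    assert(densities != NULL && n > 0 && m > 0);

    double total = 0.0;
    double max = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(densities[i] > 0.0);
        total += densities[i];
        if (densities[i] > max)
            max = densities[i];
    }
    return total <= (double)m - (double)(m - 1) * max;
}
```

Under this assumed form, three tasks of density 0.5 pass on 2 cores (1.5 ≤ 2 − 0.5), while three tasks of density 1 do not (3 > 2 − 1).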

Partitioned DM (P-DM) Schedulability

FBB-FFD (Fisher-Baruah-Baker First-Fit Decreasing) is a well-known P-DM scheduler [ECRTS ’06]. A sufficient condition is known for FBB-FFD scheduling on m unit-speed cores.

A necessary condition for any scheduler is expressed in terms of the load: the maximum cumulative execution requirement of the tasks over a time interval, divided by the length of that interval.

Using the load and density bounds for the decomposed tasks: if the original task set is schedulable by any scheduler on m unit-speed cores, then the decomposed tasks are FBB-FFD schedulable on 5-speed cores.
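The load described above can be approximated numerically. The sketch below is built on assumptions not stated in the text: tasks are modeled as sporadic tasks (execution requirement, relative deadline, period), cumulative demand is measured with the standard demand bound function, and the maximum is searched only at absolute deadlines up to a caller-chosen horizon. It also assumes the necessary condition has the form load ≤ m. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sporadic task: all three fields must be positive. */
typedef struct {
    double exec;     /* execution requirement */
    double deadline; /* relative deadline */
    double period;   /* minimum inter-arrival time */
} SporadicTask;

/* Demand bound function: the largest execution requirement of jobs of t
 * that are both released and due within any interval of length len. */
static double demand_bound(SporadicTask t, double len)
{
    if (len < t.deadline)
        return 0.0;
    /* Truncation equals floor here because the quotient is non-negative. */
    long long full_periods = (long long)((len - t.deadline) / t.period);
    return (double)(full_periods + 1) * t.exec;
}

/* Approximate load: max over the checked interval lengths of
 * (total demand / length). Only absolute deadlines up to horizon are
 * checked, since the demand changes only at those points. */
double approx_load(const SporadicTask *tasks, size_t n, double horizon)
{
    assert(tasks != NULL && n > 0 && horizon > 0.0);
    for (size_t i = 0; i < n; i++)
        assert(tasks[i].exec > 0.0 && tasks[i].deadline > 0.0 &&
               tasks[i].period > 0.0);

    double load = 0.0;
    for (size_t i = 0; i < n; i++) {
        for (long long k = 0; ; k++) {
            double len = tasks[i].deadline + (double)k * tasks[i].period;
            if (len > horizon)
                break;
            double demand = 0.0;
            for (size_t j = 0; j < n; j++)
                demand += demand_bound(tasks[j], len);
            if (demand / len > load)
                load = demand / len;
        }
    }
    return load;
}

/* Assumed form of the necessary condition on m unit-speed cores. */
int passes_load_necessary_condition(double load, int m)
{
    assert(m > 0);
    return load <= (double)m;
}
```

Because the horizon is finite, the value returned is a lower bound on the true load; a task set that fails the check is infeasible under these assumptions, while passing it proves nothing by itself.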

Conclusion

Multi-core processors provide opportunities to schedule computation-intensive tasks in real time, and real-time systems need to exploit intra-task parallelism.

We have addressed real-time scheduling for a generalized synchronous parallel task model in which different segments may have different numbers of threads and each segment can have an arbitrary number of threads.

We have proposed a task decomposition that achieves a processor-speed augmentation bound of 4 for Global EDF and of 5 for Partitioned DM.
