
Static Process Scheduling

Yi Sun

Overview

Before execution, processes must be scheduled and allocated resources

Objective: enhance an overall system performance metric
  Process completion time and processor utilization
  In distributed systems: location and performance transparency

In distributed systems, scheduling must also account for
  Local scheduling (on each node) + global scheduling
  Communication overhead
  Effect of the underlying architecture
  Dynamic behavior of the system

Process Interaction Models

Precedence process model: Directed Acyclic Graph (DAG)
  Represents precedence relationships between processes
  Minimize the total completion time of the task (computation + communication)

Communication process model
  Represents the need for communication between processes
  Optimize the total cost of communication and computation

Disjoint process model
  Processes can run independently and complete in finite time
  Maximize utilization of processors and minimize turnaround time of processes

Process Models

Partition 4 processes onto two nodes

Communication overhead

System Performance Model

Attempt to minimize the total completion time (makespan) of a set of interacting processes

System Performance Model (Cont.)

Related parameters
  OSPT: optimal sequential processing time; the best time that can be achieved on a single processor using the best sequential algorithm
  CPT: concurrent processing time; the actual time achieved on an n-processor system with the concurrent algorithm and the specific scheduling method being considered
  OCPTideal: optimal concurrent processing time on an ideal system; the best time that can be achieved with the concurrent algorithm being considered on an ideal n-processor system (no interprocessor communication overhead) scheduled by an optimal scheduling policy
  Si: the ideal speedup obtained by using a multiple-processor system over the best sequential time
  Sd: the degradation of the system due to the actual implementation compared to an ideal system


Pi: the computation time of the concurrent algorithm on node i

$RP = \frac{\sum_{i=1}^{n} P_i}{OSPT}$   (RP ≥ 1; the smaller, the better)

$RC = \frac{\sum_{i=1}^{n} P_i}{OCPT_{ideal} \times n}$   (the larger, the better)


RP: Relative processing (algorithm)
  How much speedup is lost by substituting the best sequential algorithm with an algorithm better adapted for concurrent implementation but which may have a greater total processing need
  Loss of parallelism due to algorithm conversion
  Increase in total computation requirement

Sd: Degradation of parallelism due to algorithm implementation

RC: Relative concurrency (algorithm?)
  How far from optimal the usage of the n processors is
  RC = 1: best use of the processors
  Theoretical loss of parallelism

ρ: loss of parallelism when implemented on a real machine (system architecture + scheduling)
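The parameters above can be tied together numerically. The following is a minimal sketch, assuming the usual decomposition S = OSPT / CPT = Si × Sd with Si = OSPT / OCPTideal = (RC / RP) × n, Sd = OCPTideal / CPT = 1 / (1 + ρ), RP = ΣPi / OSPT, and RC = ΣPi / (OCPTideal × n). The function name `speedup_factors` and all input numbers are hypothetical, chosen only for illustration.

```python
def speedup_factors(ospt, cpt, ocpt_ideal, node_times):
    """Decompose the speedup of an n-processor run (assumed formulas, see lead-in).

    node_times[i] is P_i, the computation time of the concurrent
    algorithm on node i; n is the number of nodes.
    """
    n = len(node_times)
    total = sum(node_times)
    rp = total / ospt                      # relative processing, >= 1
    rc = total / (ocpt_ideal * n)          # relative concurrency, <= 1
    s_i = ospt / ocpt_ideal                # ideal speedup
    rho = (cpt - ocpt_ideal) / ocpt_ideal  # efficiency loss
    s_d = ocpt_ideal / cpt                 # degradation factor
    return {"RP": rp, "RC": rc, "Si": s_i, "Sd": s_d,
            "rho": rho, "S": ospt / cpt, "n": n}


# Hypothetical measurements for a 4-node run.
f = speedup_factors(ospt=20.0, cpt=10.0, ocpt_ideal=8.0,
                    node_times=[6.0, 6.0, 6.0, 6.0])
for name, value in f.items():
    print(name, value)
```

With these inputs the concurrent algorithm does 20% more total work than the best sequential one (RP = 1.2) and keeps the processors 75% busy on the ideal system (RC = 0.75).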

Efficiency Loss

Impact factors: scheduling, system, and communication

Efficiency Loss (Cont.)

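$\rho = \frac{CPT(X, Y') - OCPT_{ideal}}{OCPT_{ideal}} = \frac{CPT(X, Y') - CPT_{ideal}(Y)}{OCPT_{ideal}} + \frac{CPT_{ideal}(Y) - OCPT_{ideal}}{OCPT_{ideal}} = \rho_{syst} + \rho'_{sched}$

$\rho = \frac{CPT(X, Z) - OCPT_{ideal}}{OCPT_{ideal}} = \frac{CPT(X, Z) - OCPT(X)}{OCPT_{ideal}} + \frac{OCPT(X) - OCPT_{ideal}}{OCPT_{ideal}} = \rho'_{sched} + \rho_{syst}$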
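A small numeric check of the efficiency-loss decomposition may help. This sketch assumes ρ = (CPT − OCPTideal) / OCPTideal and splits it at an intermediate time: the time the same scheduling policy would achieve on the ideal system. The function name `efficiency_loss` and the three times are hypothetical.

```python
def efficiency_loss(cpt_real, cpt_ideal_policy, ocpt_ideal):
    """Split the efficiency loss into a system part and a scheduling part.

    cpt_real:         CPT on the real system under the chosen policy
    cpt_ideal_policy: CPT of the same policy on the ideal system
    ocpt_ideal:       optimal concurrent processing time on the ideal system
    """
    rho_syst = (cpt_real - cpt_ideal_policy) / ocpt_ideal
    rho_sched = (cpt_ideal_policy - ocpt_ideal) / ocpt_ideal
    return rho_syst, rho_sched


# Hypothetical times, for illustration only.
rho_syst, rho_sched = efficiency_loss(cpt_real=12.0,
                                      cpt_ideal_policy=10.0,
                                      ocpt_ideal=8.0)
rho_total = (12.0 - 8.0) / 8.0
print(rho_syst, rho_sched, rho_total)
```

The two parts always add up to the total loss because the intermediate term cancels.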

Workload Distribution

Performance can be further improved by workload distribution

Load sharing: static workload distribution
  Dispatch processes to idle processors statically upon arrival
  Corresponds to the processor-pool model

Load balancing: dynamic workload distribution
  Migrate processes dynamically from heavily loaded processors to lightly loaded processors
  Corresponds to the migration workstation model

Modeled by queueing theory: X/Y/c
  X: process arrival time distribution; Y: service time distribution; c: number of servers
  λ: arrival rate; μ: service rate; γ: migration rate
  γ depends on channel bandwidth, migration protocol, and the context and state information of the process being transferred

Processor-Pool and Workstation Queueing Models

Static Load Sharing

Dynamic Load Balancing

M for Markovian distribution

Comparison of Performance for Workload Sharing

Figure labels: (Communication overhead); (Negligible communication overhead)

$TT_1 = \frac{1}{\mu - \lambda}$

$TT_2 = \frac{\mu}{(\mu - \lambda)(\mu + \lambda)}$
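The comparison can be made concrete with the standard queueing results for mean turnaround time. This is a sketch under stated assumptions: two workstations, each with arrival rate λ and service rate μ; without migration each behaves as an M/M/1 queue, and with cost-free migration the pair behaves as one M/M/2 queue with arrival rate 2λ. The function names and the sample rates are hypothetical.

```python
def tt_mm1(lam, mu):
    """Mean turnaround time of an M/M/1 queue: 1 / (mu - lam)."""
    if not 0 <= lam < mu:
        raise ValueError("need 0 <= lam < mu for a stable queue")
    return 1.0 / (mu - lam)


def tt_mm2(lam, mu):
    """Mean turnaround time of an M/M/2 queue fed at rate 2 * lam,
    each server working at rate mu: mu / ((mu - lam) * (mu + lam))."""
    if not 0 <= lam < mu:
        raise ValueError("need 0 <= lam < mu for a stable queue")
    return mu / ((mu - lam) * (mu + lam))


# Hypothetical rates: each workstation is 50% utilized.
lam, mu = 0.5, 1.0
tt1 = tt_mm1(lam, mu)
tt2 = tt_mm2(lam, mu)
print(tt1, tt2, tt2 / tt1)
```

The ratio TT2 / TT1 equals μ / (μ + λ), so the benefit of sharing grows with the load.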

Static Process Scheduling

Static process scheduling: deterministic scheduling policy
  Schedule a set of partially ordered tasks on a non-preemptive multiprocessor system of identical processors to minimize the overall finishing time (makespan)
  Optimizing the makespan is NP-complete
    Approximate or heuristic algorithms are needed
    Attempt to balance and overlap computation and communication
  The mapping of processes to processors is determined before execution
    Once a process starts, it stays on its processor until completion
  Needs prior knowledge about process behavior (execution time, precedence relationships, communication patterns)
  The scheduling decision is centralized and non-adaptive

Precedence Process and Communication System Models

Figure labels: execution time; communication overhead for one message; number of messages to communicate

Communication overhead for A (P1) and E (P3) = 4 * 2 = 8

Precedence Process Model

Precedence process model: NP-complete
  A program is represented by a DAG (Figure 5.5 (a))
    Node: task with a known execution time
    Edge: weight showing the message units to be transferred
  Communication system model: Figure 5.5 (b)

Scheduling strategies
  List Scheduling (LS): no processor remains idle if there are tasks available that it could process (no communication overhead)
  Extended List Scheduling (ELS): LS first + communication overhead
  Earliest Task First (ETF) scheduling: the earliest schedulable task is scheduled first

Critical path: longest execution path
  Lower bound of the makespan
  Try to map all tasks in a critical path onto a single processor
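List scheduling and the critical-path lower bound can be sketched as follows. This is a minimal illustration, not the textbook's algorithm: it ignores communication overhead (plain LS), and it assumes one particular priority rule, namely that among ready tasks the one with the longest remaining path to the end of the DAG goes first. The task names, execution times, and edges are hypothetical and are not those of Figure 5.5.

```python
import heapq


def path_levels(exec_time, succ):
    """level[t] = longest execution path starting at task t (inclusive)."""
    level = {}

    def visit(t):
        if t not in level:
            level[t] = exec_time[t] + max(
                (visit(s) for s in succ.get(t, [])), default=0)
        return level[t]

    for t in exec_time:
        visit(t)
    return level


def list_schedule(exec_time, succ, num_procs):
    """Non-preemptive list scheduling on identical processors.

    A processor is never left idle while a ready task exists.
    Returns (makespan, start_times).
    """
    level = path_levels(exec_time, succ)
    indeg = {t: 0 for t in exec_time}
    for targets in succ.values():
        for s in targets:
            indeg[s] += 1
    ready = [(-level[t], t) for t in exec_time if indeg[t] == 0]
    heapq.heapify(ready)
    running = []                 # heap of (finish_time, task)
    free, now, start = num_procs, 0, {}
    while ready or running:
        while ready and free > 0:        # fill every idle processor
            _, t = heapq.heappop(ready)
            start[t] = now
            heapq.heappush(running, (now + exec_time[t], t))
            free -= 1
        now, done = heapq.heappop(running)   # advance to next completion
        free += 1
        for s in succ.get(done, []):
            indeg[s] -= 1
            if indeg[s] == 0:
                heapq.heappush(ready, (-level[s], s))
    return now, start


# Hypothetical DAG: task -> execution time, task -> successors.
exec_time = {"A": 2, "B": 3, "C": 1, "D": 4, "E": 2}
succ = {"A": ["C", "D"], "B": ["D"], "C": ["E"], "D": ["E"]}

critical_path = max(path_levels(exec_time, succ).values())
makespan_2, starts_2 = list_schedule(exec_time, succ, num_procs=2)
makespan_1, _ = list_schedule(exec_time, succ, num_procs=1)
print(critical_path, makespan_2, makespan_1)
```

In this example two processors already reach the critical-path bound, so adding more processors cannot shorten the schedule. Extending the sketch to ELS or ETF would require adding the communication delays to the times at which successors become ready.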

Makespan Calculation for LS, ELS, and ETF

Communication Process Model

Communication process model
  Maximize resource utilization and minimize inter-process communication
  Undirected graph G = (V, E)
    V: processes
    E: edges; weight = amount of interaction between processes

Cost equation

$Cost(G, P) = \sum_{j \in V(G)} e_j(p_i) + \sum_{(i,j) \in E(G)} c_{i,j}(p_i, p_j)$

  e = process execution cost (cost to run process j on processor p_i)
  c = communication cost (c = 0 if both processes are assigned to the same processor)
  Again, NP-complete
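The cost equation can be evaluated directly for a given assignment. The sketch below assumes a simplified communication term: the edge weight is charged when the two processes sit on different processors and nothing is charged when they share one. The function name `total_cost`, the process names, and all costs are hypothetical.

```python
def total_cost(exec_cost, comm, assignment):
    """Execution cost plus inter-processor communication cost.

    exec_cost[j][p]: cost of running process j on processor p
    comm[(i, j)]:    communication cost between processes i and j
                     when they are placed on different processors
    assignment[j]:   processor chosen for process j
    """
    execution = sum(exec_cost[j][assignment[j]] for j in exec_cost)
    communication = sum(weight for (i, j), weight in comm.items()
                        if assignment[i] != assignment[j])
    return execution + communication


# Hypothetical three-process, two-processor instance.
exec_cost = {
    "p1": {"A": 5, "B": 10},
    "p2": {"A": 2, "B": 4},
    "p3": {"A": 12, "B": 3},
}
comm = {("p1", "p2"): 6, ("p2", "p3"): 4}

all_on_a = total_cost(exec_cost, comm, {"p1": "A", "p2": "A", "p3": "A"})
split = total_cost(exec_cost, comm, {"p1": "A", "p2": "A", "p3": "B"})
print(all_on_a, split)
```

Placing everything on one processor removes the communication term but can raise the execution term, which is the trade-off the model tries to optimize.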

Stone's two-processor model to achieve minimum total execution and communication cost

Example: Figure 5.7 (execution cost not considered)
  Partition the graph by drawing a line cutting through some edges
    Results in two disjoint graphs, one for each processor
    Set of removed edges: cut set
    Cost of cut set: sum of the weights of its edges
      Total inter-process communication cost between processors
  The cost of a cut set is 0 if all processes are assigned to the same node
    Computation constraints (no more than k, distribute evenly…)

Example: Figure 5.8 (execution cost considered)
  Maximum flow and minimum cut in a commodity-flow network
    Find the maximum flow from source to destination
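The max-flow/min-cut idea can be sketched on a tiny instance. The construction assumed here is the usual one for the two-processor case: add nodes A and B for the processors, connect each process to A with the cost of running it on B, and to B with the cost of running it on A, so that a cut separating A from B pays exactly the execution and communication cost of the corresponding assignment. The max-flow routine is a plain Edmonds-Karp written for this illustration, and the data are hypothetical, not those of Figure 5.8.

```python
from collections import deque
from itertools import product


def build_network(exec_cost, comm):
    """Undirected capacity graph; process names must differ from 'A' and 'B'."""
    cap = {}

    def add_edge(u, v, w):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + w
        cap[v][u] = cap[v].get(u, 0) + w

    for proc, cost in exec_cost.items():
        add_edge("A", proc, cost["B"])  # cut when proc runs on B
        add_edge(proc, "B", cost["A"])  # cut when proc runs on A
    for (i, j), w in comm.items():
        add_edge(i, j, w)               # cut when i and j are separated
    return cap


def max_flow(cap, source, sink):
    """Edmonds-Karp; cap holds residual capacities and is modified in place."""
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:   # BFS for a shortest path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:          # push flow along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def brute_force_min_cost(exec_cost, comm):
    """Try every assignment; only practical for very small instances."""
    procs = list(exec_cost)
    costs = []
    for choice in product("AB", repeat=len(procs)):
        where = dict(zip(procs, choice))
        cost = sum(exec_cost[p][where[p]] for p in procs)
        cost += sum(w for (i, j), w in comm.items() if where[i] != where[j])
        costs.append(cost)
    return min(costs)


# Hypothetical instance.
exec_cost = {"p1": {"A": 5, "B": 10},
             "p2": {"A": 2, "B": 4},
             "p3": {"A": 12, "B": 3}}
comm = {("p1", "p2"): 6, ("p2", "p3"): 4}

min_cut = max_flow(build_network(exec_cost, comm), "A", "B")
exhaustive = brute_force_min_cost(exec_cost, comm)
print(min_cut, exhaustive)
```

The maximum flow equals the cheapest assignment found by exhaustive search, which is what the max-flow min-cut theorem predicts. The polynomial-time result holds for two processors; the general problem with more processors remains hard.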

Computation Cost and Communication Graphs

Minimum-Cost Cut

Only the cuts that separate A and B are feasible

Discussion – Static Process Scheduling

Once a process is assigned to a processor, it remains there until its execution has completed

Needs prior knowledge of execution time and communication behavior
  Not realistic

Reference

Distributed Operating Systems & Algorithms, by Randy Chow and Theodore Johnson, 1997
