Page 1: QoPS: A QoS based Scheme for Parallel Job Scheduling

QoPS: A QoS based Scheme for Parallel Job Scheduling

M. Islam, P. Balaji, P. Sadayappan and D. K. Panda

Computer and Information Science

The Ohio State University

Presented by Gerald Sabin

Page 2: QoPS: A QoS based Scheme for Parallel Job Scheduling

The Ohio State University, 06/24/2003

Job Schedulers Today

• Independent Parallel Job Scheduling Model
  – Dynamically arriving independent parallel jobs
  – Resource mapping: submitted jobs to the resources present
• A number of techniques studied over the years
  – Backfilling (e.g. Conservative, EASY, No Guarantee)
  – Priority-based scheduling
    • Differentiated service to different classes of jobs
    • Soft real-time or best-effort guarantees on the completion time
• Hard Real-Time or "Deadline-based" scheduling
  – Allow users to specify the deadline they desire
  – Cost model based on resources used AND deadline specified
  – Requires a deadline-based scheduling algorithm: LONG OVERDUE!

Page 3: QoPS: A QoS based Scheme for Parallel Job Scheduling

QoS for Job Scheduling

• Two components in providing QoS
  – Cost Model Component
    • Based on resources used AND deadline specified
    • More urgent jobs are charged more
    • Guarantees the service requested
  – Job Scheduling Component
    • Admission control: can we meet the specified deadline?
    • Once admitted, a job cannot miss its specified deadline
• We only deal with the Job Scheduling Component

Page 4: QoPS: A QoS based Scheme for Parallel Job Scheduling


Overview

Related Work

The QoPS Algorithm

Simulation Approach

Experimental Results

Conclusions and Future Work

Page 5: QoPS: A QoS based Scheme for Parallel Job Scheduling

Related Work

• Feitelson's Slack-Based (SB) Scheduling [feit97]
  – Focused on improving utilization and turnaround time
  – Jobs have an associated slack, based on their priority
    • This determines how much they can be delayed
• Ramamritham's Real-Time (RT) Scheduling [krithi90]
  – Deadline-based scheduling algorithm
  – Non-periodic single-processor jobs
  – Statically available at start time

[feit97]: "Supporting Priorities and Improving Utilization of the IBM SP2 Scheduler using Slack based Backfilling", D. Talby, D. G. Feitelson, IPPS, Apr '97
[krithi90]: "Efficient Scheduling Algorithms for Real-Time Multiprocessor Systems", K. Ramamritham, J. A. Stankovic, P-F. Shiah, TPDS, Apr '90

Page 6: QoPS: A QoS based Scheme for Parallel Job Scheduling

Slack-Based (SB) Scheduling Algorithm

• When a job (JN+1) arrives
  – Calculate its slack (based on its priority)
  – If J1, J2, …, JN are already present and scheduled in that order
    • Try placing the job (JN+1) in each possible position in this list
  – For each of the resulting N+1 schedules that is feasible, calculate a cost function 'f'
    • A schedule is feasible if no job exceeds the slack given to it
  – Choose the schedule with the "best cost function value"

[Figure: the new job J7 is tried at each position of the existing schedule J1 … J6; a cost function is evaluated for each candidate, giving f0 … f6, and the best value fbest is selected.]
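
The insertion steps above can be illustrated with a small sketch. It is not the authors' or Feitelson's implementation: it assumes a single sequential resource, models slack as a per-job latest finish time (`limit`), and uses mean finish time as a stand-in for the cost function 'f'. The names `Job`, `limit`, `cost` and `sb_insert` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Job(NamedTuple):
    name: str
    runtime: float
    limit: float  # assumed model: latest acceptable finish (promised finish + slack)


def finish_times(order: List[Job]) -> List[float]:
    """Finish time of each job when the list runs back to back from t = 0."""
    now, times = 0.0, []
    for job in order:
        now += job.runtime
        times.append(now)
    return times


def feasible(order: List[Job]) -> bool:
    """A schedule is feasible if no job finishes past its slack limit."""
    return all(t <= job.limit for job, t in zip(order, finish_times(order)))


def cost(order: List[Job]) -> float:
    """Placeholder for the cost function 'f': mean finish time, lower is better."""
    times = finish_times(order)
    return sum(times) / len(times)


def sb_insert(schedule: List[Job], new_job: Job) -> Optional[List[Job]]:
    """Try the new job at each of the N+1 positions; keep the cheapest feasible one."""
    best: Optional[List[Job]] = None
    for pos in range(len(schedule) + 1):
        candidate = schedule[:pos] + [new_job] + schedule[pos:]
        if feasible(candidate) and (best is None or cost(candidate) < cost(best)):
            best = candidate
    return best
```

Returning `None` when no position is feasible is a choice made for this sketch; the slides do not say what SB does in that case.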

Page 7: QoPS: A QoS based Scheme for Parallel Job Scheduling

Real-Time (RT) Scheduling Algorithm

• Static scheme, so there is no concept of new jobs arriving
• Sort jobs based on a heuristic function
• Start from a NULL schedule
• For each of the jobs
  – If placing the job in the current schedule misses its deadline
    • Backtrack to the last known feasible schedule
  – If (number of backtracks > p)
    • Discard the schedule
• If all jobs have been placed within their deadlines
  – Accept the schedule
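
A minimal sketch of the backtracking search described above, under stated assumptions: a single processor, Earliest Deadline First as the heuristic ordering, and a per-job `release` time (not mentioned on the slide) added so that backtracking can actually occur. The names `Job`, `release` and `rt_schedule` are hypothetical, and `max_backtracks` plays the role of 'p'.

```python
from typing import List, NamedTuple, Optional


class Job(NamedTuple):
    name: str
    release: float   # assumed: earliest time the job may start
    runtime: float
    deadline: float  # absolute deadline


def rt_schedule(jobs: List[Job], max_backtracks: int) -> Optional[List[Job]]:
    """Build a feasible order by depth-first search, or give up after too many backtracks."""
    ordered = sorted(jobs, key=lambda j: j.deadline)  # heuristic: EDF
    backtracks = 0

    def extend(partial: List[Job], now: float, remaining: List[Job]) -> Optional[List[Job]]:
        nonlocal backtracks
        if not remaining:
            return partial  # every job placed within its deadline: accept
        # Dead end if some unplaced job would be late even if it ran next.
        if any(max(now, j.release) + j.runtime > j.deadline for j in remaining):
            backtracks += 1
            return None
        for i, job in enumerate(remaining):
            finish = max(now, job.release) + job.runtime
            found = extend(partial + [job], finish, remaining[:i] + remaining[i + 1:])
            if found is not None:
                return found
            if backtracks > max_backtracks:
                return None  # backtrack budget exhausted: discard the schedule
        backtracks += 1
        return None

    return extend([], 0.0, ordered)
```

With jobs X (release 4, runtime 1, deadline 6) and Y (release 0, runtime 4, deadline 7), EDF tries X first, which makes Y late; one backtrack then finds the order Y, X.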

Page 8: QoPS: A QoS based Scheme for Parallel Job Scheduling

Working of the RT Algorithm

[Figure: search tree grown from a NULL schedule over the jobs JN, JN-1, JN-2, …, J3, J2, J1, which are sorted by Earliest Deadline First (EDF); the nodes shown include J1, J2, J3 and J4.]

Page 9: QoPS: A QoS based Scheme for Parallel Job Scheduling

Modified Slack-Based (MSB) Algorithm

– Motivation of SB: to improve utilization and response time
– SB assigns slack to jobs based on job priorities
– MSB assigns slack to jobs based on the deadline specified
– Rest of the algorithm is unchanged

Page 10: QoPS: A QoS based Scheme for Parallel Job Scheduling

Modified Real-Time (MRT) Algorithm

– RT was designed for non-periodic uni-processor jobs
– All jobs are statically available at the start of the execution
– MRT involves two modifications to RT
  • To allow dynamically arriving jobs: run the algorithm every time a job arrives
  • To allow scheduling of parallel jobs: allow backfilling of jobs

Page 11: QoPS: A QoS based Scheme for Parallel Job Scheduling


Overview

Related Work

The QoPS Algorithm

Simulation Approach

Experimental Results

Conclusions and Future Work

Page 12: QoPS: A QoS based Scheme for Parallel Job Scheduling

The Basic QoPS Algorithm

• Similar to the MSB algorithm, but…
  – Provides more flexibility in reordering scheduled jobs
• When a job (JN+1) arrives
  – If J1, J2, …, JN are already present and scheduled in that order
  – Place the job (JN+1) at the start of all jobs
• Try scheduling the jobs in that order
  – If all jobs are able to meet their deadlines, great! Admit it!
  – If some job fails, we have two options:
  – Option 1:
    • Consider the failed job as a critical job
    • Push the failed job to the start of the schedule and retry
    • 'k' such re-orderings of existing jobs are allowed
    • If (number of re-orderings > k) switch to Option 2
  – Option 2:
    • Back off exponentially in the position at which you try placing job (JN+1) and retry
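
The two options above can be sketched as an admission test. This is an illustration under assumptions, not the authors' code: jobs are started strictly in list order without backfilling, every job needs no more processors than the machine has, and the back-off sequence of positions (0, N/2, 3N/4, …, N) is one possible reading of "back off exponentially". The names `Job`, `first_miss`, `backoff_positions` and `qops_admit` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Job:
    name: str
    procs: int
    runtime: float
    deadline: float  # absolute deadline


def first_miss(order: List[Job], total_procs: int) -> Optional[Job]:
    """Run jobs in list order (no backfilling); return the first job that misses."""
    running: List[Tuple[float, int]] = []  # (end time, processors held)
    free, now = total_procs, 0.0
    for job in order:  # assumes job.procs <= total_procs
        running.sort()
        while free < job.procs:  # wait for earlier jobs to release processors
            end, procs = running.pop(0)
            now = max(now, end)
            free += procs
        if now + job.runtime > job.deadline:
            return job
        running.append((now + job.runtime, job.procs))
        free -= job.procs
    return None


def backoff_positions(n: int) -> List[int]:
    """Assumed sequence of insert positions: 0, n/2, 3n/4, ..., n."""
    positions, pos, step = [0], 0, n
    while pos < n:
        step = max(1, step // 2)
        pos = min(n, pos + step)
        positions.append(pos)
    return positions


def qops_admit(schedule: List[Job], new_job: Job, total_procs: int,
               max_violations: int) -> Optional[List[Job]]:
    """Return a feasible order that includes new_job, or None to reject it."""
    for pos in backoff_positions(len(schedule)):          # Option 2
        order = schedule[:pos] + [new_job] + schedule[pos:]
        violations = 0
        while True:
            failed = first_miss(order, total_procs)
            if failed is None:
                return order                               # admit
            violations += 1
            if violations > max_violations:
                break
            # Option 1: treat the failed job as critical and move it to the front.
            order = [failed] + [j for j in order if j is not failed]
    return None
```

Here `max_violations` stands for 'k'. A returned list is the new schedule; `None` means the job is not admitted and the existing schedule stays as it was.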

Page 13: QoPS: A QoS based Scheme for Parallel Job Scheduling

Working of the QoPS Algorithm

[Figure: the new job J13 is tried against the existing schedule J1 … J12. Max. Violations Allowed = 2; successive frames show Current Violations = 0, 1, 2, and then 0 again.]

Page 14: QoPS: A QoS based Scheme for Parallel Job Scheduling


Overview

Related Work

The QoPS Algorithm

Simulation Approach

Experimental Results

Conclusions and Future Work

Page 15: QoPS: A QoS based Scheme for Parallel Job Scheduling

Simulation Approach

[Figure: flow diagram with the stages CTC/SDSC Trace, Load Variation (Duplication/Expansion), Deadline Calculator and Deadline-based Trace, feeding the QoPS, MSB, MRT and EASY simulations.]

Page 16: QoPS: A QoS based Scheme for Parallel Job Scheduling

Trace Generation

• Many job logs available, but no associated deadlines
• Synthetic deadline generation
  – Generate a schedule for the job trace using EASY
  – For any job J, let T be its turnaround time in this schedule
  – Deadline for J = Arrival Time + max(runtime, (1 - SF) x T)
  – SF is the "Stringency Factor" (0 < SF < 1)
    • 0 would give the least stringent deadlines and 1 the most stringent
• Some jobs might not come with deadlines
  – Very lax deadlines to prevent starvation
  – If 'T' is the current expected turnaround time,
    Deadline = Arrival Time + max(24 hrs, R x T)
  – R is the "Relaxation Factor" of the schedule
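
The two deadline formulas above translate directly into code. This is a small sketch; the function names, the choice of seconds as the time unit, and the range check on SF are assumptions made for illustration.

```python
DAY_SECONDS = 24 * 3600  # assumed time unit: seconds


def synthetic_deadline(arrival: float, runtime: float,
                       turnaround: float, stringency: float) -> float:
    """Deadline = arrival + max(runtime, (1 - SF) * T), with 0 < SF < 1."""
    if not 0.0 < stringency < 1.0:
        raise ValueError("stringency factor must lie strictly between 0 and 1")
    return arrival + max(runtime, (1.0 - stringency) * turnaround)


def relaxed_deadline(arrival: float, expected_turnaround: float,
                     relaxation: float) -> float:
    """Lax deadline for jobs submitted without one: arrival + max(24 hrs, R * T)."""
    return arrival + max(DAY_SECONDS, relaxation * expected_turnaround)
```

The `max` with the runtime keeps a generated deadline from ever falling before the job could finish if it started on arrival. For example, a job arriving at t = 100 with a runtime of 150 and an EASY turnaround of 200 gets the deadline 250 at SF = 0.5, not 200.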

Page 17: QoPS: A QoS based Scheme for Parallel Job Scheduling


Overview

Related Work

The QoPS Algorithm

Simulation Approach

Experimental Results

Conclusions and Future Work

Page 18: QoPS: A QoS based Scheme for Parallel Job Scheduling

Experimental Results

• Two evaluation scenarios
  – Scenario 1: All jobs have deadlines
    • Pure comparison of the three algorithms
  – Scenario 2: Mixed jobs: some have deadlines, others are given artificial deadlines
    • More realistic
• Tests conducted:
  – Job acceptance rate
  – Impact on non-deadline jobs
  – Utilization variation, etc.

Page 19: QoPS: A QoS based Scheme for Parallel Job Scheduling

Admittance Capacity Comparison

[Figure: two plots, "Unadmitted Jobs Vs Load" (y-axis: % Unadmitted Jobs) and "Unadmitted Proc. Secs Vs Load" (y-axis: % Unadmitted Proc. Secs.); x-axis: Load (Duplication), 1 to 1.6; series: QoPS, MRT, MSB.]

• All jobs have deadlines; Stringency Factor = 0.2; CTC Trace
• QoPS admits the most jobs (and Processor Seconds)

Page 20: QoPS: A QoS based Scheme for Parallel Job Scheduling

Utilization Comparison

[Figure: two plots of Utilization Vs Load, one for Stringency Factor = 0.2 and one for Stringency Factor = 0.5; x-axis: Load (Duplication), 1 to 1.6; y-axis: Utilization; series: QoPS, MRT, MSB, EASY.]

• All jobs have deadlines; CTC Trace
• Deadline-based schemes lose about 10% Utilization

Page 21: QoPS: A QoS based Scheme for Parallel Job Scheduling

Admittance Capacity Comparison (Mixed Jobs)

[Figure: two plots, "Unadmitted Jobs Vs Load" (y-axis: % Unadmitted Jobs) and "Unadmitted Proc. Secs Vs Load" (y-axis: % Unadmitted Proc. Secs.); x-axis: Load (Duplication), 1 to 1.6; series: QoPS, MRT, MSB.]

• 20% of jobs have deadlines; Stringency Factor = 0.2; CTC Trace
• QoPS admits the most jobs (and Processor Seconds)

Page 22: QoPS: A QoS based Scheme for Parallel Job Scheduling

Response Time and Slow Down Vs Load

[Figure: two plots, "Response Time Vs Load" (y-axis: Response Time (secs)) and "Slow Down Vs Load" (y-axis: Slow Down); x-axis: Load (Duplication), 1 to 1.6; series: QoPS, MRT, MSB, EASY.]

• 20% of jobs have deadlines; Stringency Factor = 0.2; CTC Trace
• QoPS gives the best slow-down in spite of accepting more jobs; Unfair to EASY

Page 23: QoPS: A QoS based Scheme for Parallel Job Scheduling

Utilization Vs Load (Mixed Jobs)

[Figure: plot of Utilization Vs Load; x-axis: Load, 1 to 1.6; y-axis: Utilization; series: QoPS, MRT, MSB, EASY.]

• EASY has a higher Utilization
• Accepts more (all) jobs; Unfair to the deadline-based schemes

Page 24: QoPS: A QoS based Scheme for Parallel Job Scheduling

Response Time and Slow Down Vs Utilization

• 20% of jobs have deadlines; Stringency Factor = 0.2; CTC Trace
• Fairer comparison; QoPS still performs better in most cases, especially Slow Down

Page 25: QoPS: A QoS based Scheme for Parallel Job Scheduling


Overview

Related Work

The QoPS Algorithm

Simulation Approach

Experimental Results

Conclusions and Future Work

Page 26: QoPS: A QoS based Scheme for Parallel Job Scheduling

Conclusions

• "Deadline-based" scheduling is desirable
  – No such scheme for parallel jobs
  – Previous schemes can be extended, but…
    • Not proposed for this kind of scheduling
    • Might not fit in perfectly
  – Proposed the QoPS algorithm
• Allows jobs to specify required deadlines
  – Admission control checks admissibility
  – Job scheduler schedules admitted jobs
• Outperforms extended previous schemes (MSB and MRT)
  – But the main idea is not performance
  – Deadline scheduling is a necessity and QoPS is an effort to meet it

Page 27: QoPS: A QoS based Scheme for Parallel Job Scheduling

Future Work

• Cost Metric component in QoS
• Currently using a first-fit mechanism
  – Best fit is expected to do much better
• Job Shedding Vs Non Job Shedding
  – If the deadline can't be met
    • Should we reject the job (will the user try again?)
    • Should we give it the best available deadline?
• Grid-based extensions to QoPS

Page 28: QoPS: A QoS based Scheme for Parallel Job Scheduling


Thank You !

Page 29: QoPS: A QoS based Scheme for Parallel Job Scheduling

Backup Slides

Page 30: QoPS: A QoS based Scheme for Parallel Job Scheduling

Admittance Capacity for SDSC trace

[Figure: two plots, "Unadmitted Jobs Vs Load" (y-axis: % Unadmitted Jobs) and "Unadmitted Proc. Secs Vs Load" (y-axis: % Unadmitted Proc. Secs.); x-axis: Load (Duplication), 1 to 1.6; series: QoPS, MRT, MSB.]

• All jobs have deadlines; Stringency Factor = 0.2; CTC Trace
• QoPS admits the most jobs (and Processor Seconds)

Page 31: QoPS: A QoS based Scheme for Parallel Job Scheduling

Admittance Capacity with Job Expansion

[Figure: two plots, "Unadmitted Jobs Vs Load" (y-axis: % Unadmitted Jobs) and "Unadmitted Proc. Secs Vs Load" (y-axis: % Unadmitted Proc. Secs.); x-axis: Load (Expansion), 1 to 1.6; series: QoPS, MRT, MSB.]

• All jobs have deadlines; Stringency Factor = 0.2; CTC Trace
• QoPS admits the most jobs (and Processor Seconds)

Page 32: QoPS: A QoS based Scheme for Parallel Job Scheduling

Impact of Relaxation Factor

[Figure: two plots, "Average Response Time Vs Load" (y-axis: Average Response Time (secs)) and "Average Slow Down Vs Load" (y-axis: Average Slow Down); x-axis: Load (Duplication), 1 to 1.6; series: Factor = 2, Factor = 5, Factor = 10.]

• 80% of jobs have deadlines; Stringency Factor = 0.2; CTC Trace
• With low "R", longer jobs perform better (reflected in Resp. Time)