Simulating the usage of SLAs for job scheduling in an HPC environment

DESCRIPTION

How to simulate job scheduling using SLAs in a high-performance computing environment by extending the Alea Grid Scheduling Simulator.

Page 1: Simulating the usage of SLAs for job scheduling in an HPC environment

Outline

Introduction

Job Scheduling - with and without SLAs

Simulating SLA-based scheduling

Conclusions and next steps

Discussion

Simulating the usage of SLAs for job scheduling in an HPC environment

Roland Kübert

Höchstleistungsrechenzentrum Stuttgart

January 31, 2010

Roland Kübert: Simulating the usage of SLAs for job scheduling in an HPC environment

Page 2: Simulating the usage of SLAs for job scheduling in an HPC environment

1 Introduction

2 Job Scheduling - with and without SLAs

3 Simulating SLA-based scheduling

4 Conclusions and next steps

5 Discussion

Page 4: Simulating the usage of SLAs for job scheduling in an HPC environment

Motivation

HPC services are offered only on a best-effort basis

Scheduling parameters are few and trivial

Work on SLAs has been performed at HLRS, but at a higher level

Page 5: Simulating the usage of SLAs for job scheduling in an HPC environment

Job scheduling

scheduling: “to plan (something) at a certain time”

Scheduling is used in many fields

Job scheduling assigns computational jobs to processing units

Page 6: Simulating the usage of SLAs for job scheduling in an HPC environment

Service Level Agreements in one sentence

“The purpose of [a] Service Level Agreement (SLA) is to define the services and responsibilities of the [service provider] and its clients.” (Michigan State University High Performance Computing Center Service Level Agreement)

Page 8: Simulating the usage of SLAs for job scheduling in an HPC environment

Classical job scheduling

Objective is mostly to maximize utilization or minimize waiting time

Various algorithms with different advantages

Either schedule-based or queue-based
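
To make the two objectives concrete, here is a minimal sketch of a queue-based first-come-first-served (FCFS) scheduler that reports average waiting time and utilization. It is not taken from the talk or from Alea: the `Job` fields, the `fcfs` function and the choice to measure utilization from time 0 to the last completion are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; all fields are illustrative assumptions."""
    submit: float   # submission time
    runtime: float  # execution time
    cpus: int       # number of CPUs requested


def fcfs(jobs, total_cpus):
    """Start jobs strictly in submission order (no backfilling).

    Returns (average waiting time, utilization), where utilization is the
    used CPU time divided by total_cpus * time of the last completion.
    """
    if not jobs:
        return 0.0, 0.0
    running = []            # min-heap of (end_time, cpus) for running jobs
    free = total_cpus
    clock = 0.0             # start time of the most recently started job
    waits = []
    busy_cpu_time = 0.0
    makespan = 0.0
    for job in sorted(jobs, key=lambda j: j.submit):
        if job.cpus > total_cpus:
            raise ValueError("job requests more CPUs than the machine has")
        clock = max(clock, job.submit)
        # Release CPUs of jobs that have finished by now.
        while running and running[0][0] <= clock:
            free += heapq.heappop(running)[1]
        # Still not enough CPUs: wait for further completions.
        while free < job.cpus:
            end, cpus = heapq.heappop(running)
            clock = max(clock, end)
            free += cpus
        waits.append(clock - job.submit)
        free -= job.cpus
        end = clock + job.runtime
        heapq.heappush(running, (end, job.cpus))
        busy_cpu_time += job.runtime * job.cpus
        makespan = max(makespan, end)
    return sum(waits) / len(waits), busy_cpu_time / (total_cpus * makespan)


avg_wait, utilization = fcfs(
    [Job(0, 10, 4), Job(0, 5, 4), Job(1, 5, 2)], total_cpus=4
)
print(avg_wait, utilization)
```

In this toy run the small third job waits behind the wide second job even though CPUs would be free for it later, which is the kind of trade-off between waiting time and utilization that different algorithms handle differently.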

Page 9: Simulating the usage of SLAs for job scheduling in an HPC environment

Job scheduling - with SLAs

A quite popular field

Two main streams

SLAs per job

Trivial QoS parameters (timing and resource requirements)

Relies on precise specification of job execution times
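
As an illustration of why such per-job SLAs depend on accurate runtime estimates, the following sketch models an SLA with one resource term and one timing term. The class `JobSLA`, its fields and both helper functions are hypothetical and do not come from the talk or from any existing SLA framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSLA:
    """Hypothetical per-job SLA; field names are illustrative assumptions."""
    cpus: int            # resource requirement
    est_runtime: float   # user-supplied runtime estimate, in seconds
    deadline: float      # latest acceptable completion time, in seconds


def latest_start(sla: JobSLA) -> float:
    """Latest start time that meets the deadline if the estimate holds."""
    return sla.deadline - sla.est_runtime


def is_met(sla: JobSLA, start: float, actual_runtime: float) -> bool:
    """Check the timing term against what actually happened."""
    return start + actual_runtime <= sla.deadline


sla = JobSLA(cpus=8, est_runtime=3600, deadline=7200)
start = latest_start(sla)
print(is_met(sla, start, actual_runtime=3600))  # estimate was exact
print(is_met(sla, start, actual_runtime=4000))  # job ran longer than estimated
```

A scheduler that plans the job for its latest possible start keeps the SLA only when the estimate is exact; any underestimate turns into a violation.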

Page 11: Simulating the usage of SLAs for job scheduling in an HPC environment

Simulating SLA-based job scheduling

Just implementing some scheduling won’t work

Production use is not possible without prior investigation

Therefore, use a simulation tool: Alea

Alea needs to be extended in order to investigate SLAs

Page 12: Simulating the usage of SLAs for job scheduling in an HPC environment

Alea’s features

Supports different workload formats

Various scheduling algorithms already implemented

Visualization features

Free software (LGPL)

Page 13: Simulating the usage of SLAs for job scheduling in an HPC environment

Alea’s graphs

Figure: Screenshot of Alea

Page 14: Simulating the usage of SLAs for job scheduling in an HPC environment

Alea’s shortcomings

Many hard-coded settings (magic numbers)

Extensibility was not foreseen in the design

Not very user-friendly

No further development

Page 15: Simulating the usage of SLAs for job scheduling in an HPC environment

Alea’s architecture

Figure: High-level architecture of Alea 2.1

Page 16: Simulating the usage of SLAs for job scheduling in an HPC environment

Simulation of service levels

Simulation of three different service levels: gold, silver, bronze

Different service level distributions were generated and simulated against a workload (San Diego Supercomputer Center’s Blue Horizon, 144 nodes x 8 CPUs)

Investigated how waiting time changes with different distributions of service levels

Example (Gold-Silver-Bronze): 0-0-100, 0-5-95, 1-4-95, 2-3-95, etc.
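
The distributions in the example can be generated mechanically. The sketch below is one possible way to do it and is not the code used for the talk: `gold_silver_splits` and `assign_levels` are hypothetical helpers, and the assumption that the series continues past 2-3-95 with 3-2-95, 4-1-95 and 5-0-95 is mine, since the slide only says "etc.".

```python
import random


def gold_silver_splits(bronze_pct):
    """List every (gold, silver, bronze) percentage triple for a fixed
    bronze share, with the gold share increasing in steps of 1."""
    rest = 100 - bronze_pct
    return [(gold, rest - gold, bronze_pct) for gold in range(rest + 1)]


def assign_levels(n_jobs, gold, silver, bronze, seed=0):
    """Label n_jobs jobs so that the shares match the given percentages.

    Gold and silver counts are rounded down; any remainder becomes bronze.
    The labels are shuffled with a fixed seed so runs are repeatable.
    """
    if gold + silver + bronze != 100:
        raise ValueError("percentages must sum to 100")
    n_gold = n_jobs * gold // 100
    n_silver = n_jobs * silver // 100
    levels = ["gold"] * n_gold + ["silver"] * n_silver
    levels += ["bronze"] * (n_jobs - len(levels))
    random.Random(seed).shuffle(levels)
    return levels


for gold, silver, bronze in gold_silver_splits(95):
    levels = assign_levels(200, gold, silver, bronze)
    print(gold, silver, bronze, levels.count("gold"), levels.count("silver"))
```

Each labelled job list would then be fed to the simulator together with the workload, one simulation run per distribution.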

Page 17: Simulating the usage of SLAs for job scheduling in an HPC environment

Simulation results

Machine usage did not change

Introducing service levels increases the average wait time

Increasing the number of prioritized jobs increases the wait time for lower-prioritized classes

Ensuring that not too many high-priority jobs exist enables the service provider to give “soft” guarantees on wait time
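
The effect on lower-prioritized classes can be seen even in a toy model. The following sketch is not the Alea extension and does not reproduce the results above; it assumes a single machine, non-preemptive execution and a strict gold-before-silver-before-bronze order, and the names `PRIORITY` and `mean_wait_by_level` are hypothetical.

```python
import heapq
from collections import defaultdict

# Assumed ordering: a lower value is served first.
PRIORITY = {"gold": 0, "silver": 1, "bronze": 2}


def mean_wait_by_level(jobs):
    """jobs: list of (submit, runtime, level) tuples.

    Simulates one machine that, whenever it becomes free, starts the
    waiting job with the highest service level (earliest submission
    first within a level). Returns {level: mean waiting time}.
    """
    pending = sorted(jobs)   # ordered by submission time
    queue = []               # heap of (priority, submit, index, runtime, level)
    waits = defaultdict(list)
    clock, i = 0.0, 0
    while i < len(pending) or queue:
        if not queue and pending[i][0] > clock:
            clock = pending[i][0]          # idle until the next arrival
        while i < len(pending) and pending[i][0] <= clock:
            submit, runtime, level = pending[i]
            heapq.heappush(queue, (PRIORITY[level], submit, i, runtime, level))
            i += 1
        _, submit, _, runtime, level = heapq.heappop(queue)
        waits[level].append(clock - submit)
        clock += runtime
    return {level: sum(w) / len(w) for level, w in waits.items()}


all_bronze = [(0, 10, "bronze"), (1, 10, "bronze"), (2, 10, "bronze")]
one_gold = [(0, 10, "bronze"), (1, 10, "bronze"), (2, 10, "gold")]
print(mean_wait_by_level(all_bronze))
print(mean_wait_by_level(one_gold))
```

Promoting the last job to gold lets it overtake the bronze job submitted before it, so the mean bronze wait goes up while the gold job waits less. Capping the share of gold jobs bounds how often this can happen, which is the basis for a "soft" guarantee.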

Page 19: Simulating the usage of SLAs for job scheduling in an HPC environment

Conclusions

Using SLAs for scheduling is possible (duh)

Can range from trivial to complex

Simulation is a good way to examine different parameters, combinations, workloads, objective functions, ...

Publication has been accepted at PARENG 2011

Page 20: Simulating the usage of SLAs for job scheduling in an HPC environment

Next steps

Improvements on Alea

Conceptual implementation

Queue-based versus schedule-based algorithms

Additional, more complex service levels

Page 22: Simulating the usage of SLAs for job scheduling in an HPC environment

Questions

Figure: Flammarion’s wood engraving
