
1

A Performance Study of Grid Workflow Engines

Alexandru Iosup and Dick Epema

PDS Group, Delft University of Technology

The Netherlands

Corina Stratan

Parallel and Distributed Systems Group, Politehnica University of Bucharest

Romania

IEEE/ACM Grid 2008, Tsukuba, JP.

2

Why are Grid Workflows Interesting?

• Grids promise reliable and easy-to-use computational infrastructure for e-Science
• Full automation from experiment design to final result
• Often, automation = workflows
• Jobs comprising inter-related computing and data-transfer tasks

3

Why is the Performance of Real Grid Workflow Engines Interesting?

• For our users
  • Is this system suitable for its users?
  • Are other systems better?
• For focusing on the right research problems
  • What are the interesting problems? System configuration? Which workflow characteristics? Other problems…
• For simulation studies
  • Unrealistic assumptions limit the applicability of results. How scalable are Grid Workflow Engines (GWFEs)? What overheads do they have?

4

Problem: How to Assess the Performance of Grid Workflow Engines?

• What do we want to assess?
• Is testing in real environments appropriate?
• What performance metrics are important?
• What workflows to use?

Our goal is to develop and validate a methodology for assessing GWFEs.

5

Outline

1. Introduction
2. Methodology for Testing GWFEs
3. The Methodology in Practice
4. Conclusion and Future Work

6

2. Methodology for Testing GWFEs
What to Assess?

• Traditional: raw performance metrics
  1. Runtime, wait time, etc.
• In addition, for Grids (failure-prone, complex environments):
  2. Overhead: What is the cost of using a GWFE?
  3. Stability: Does the system behave consistently?
  4. Scalability: Does the system support grid-size workloads?
  5. Reliability: What is the impact of dynamic resource availability?

7

2. Methodology for Testing GWFEs
Is Testing in Real Environments Appropriate?

• Our approach (novel)
  Testing complete grid middleware stacks in real grid environments.
• Alternatives
  • Simulation [Ahmad & Kwok, JPDC’99]
  • Mathematical analysis
  • Testing GWFEs in isolation (think unit vs. integration testing)

8

2. Methodology for Testing GWFEs
What Performance Metrics are Important?

[Figure: Workflow Tasks, Grid Workflow Engine, Grid Resource Manager]

• Overhead components: Oi, Oa, Os, Ost, Of
• Raw performance: Makespan (MS), Speed-Up vs. Single/Infinite Machine, …
• Stability: internal (MS IQR/Med.), overall (MS Range/Median)
• Scalability, Reliability [see article].
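The two stability ratios named on this slide (makespan IQR over median, and makespan range over median) can be computed from a list of measured makespans. The following is a minimal sketch, not the authors' tooling: the function name `stability_ratios` is hypothetical, the choice of which makespans go into the sample (per workflow, per run) is left to the caller, and the "inclusive" quartile method is an assumption, since the slide does not say how the quartiles are computed.

```python
import statistics


def stability_ratios(makespans):
    """Return (IQR / median, range / median) for a sample of makespans.

    Assumption: quartiles use the 'inclusive' method of
    statistics.quantiles; other quartile conventions give slightly
    different IQR values on small samples.
    """
    if len(makespans) < 2:
        raise ValueError("need at least two makespan samples")
    median = statistics.median(makespans)
    if median <= 0:
        raise ValueError("median makespan must be positive")
    q1, _, q3 = statistics.quantiles(makespans, n=4, method="inclusive")
    iqr_over_median = (q3 - q1) / median
    range_over_median = (max(makespans) - min(makespans)) / median
    return iqr_over_median, range_over_median


if __name__ == "__main__":
    # Made-up makespans in seconds, for illustration only.
    sample = [100, 110, 120, 130, 140]
    internal, overall = stability_ratios(sample)
    print(f"MS IQR/Median:   {internal:.3f}")
    print(f"MS Range/Median: {overall:.3f}")
```

Lower values of either ratio mean the makespan varies less relative to its typical value. The range-based ratio is sensitive to a single outlier run, while the IQR-based ratio is not.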

9

2. Methodology for Testing GWFEs
What Workflows to Use?

• No accepted workload; no real system traces.
• Sources: related simulation work, Standard Task Graph Set, our investigation of test workflows from 2 long-term grid traces [CG Symp.’08], our model of grid bags-of-tasks validated with 7 long-term grid traces [HPDC’08].

[Figure: workflow characteristics; labels: Number of graph nodes, Graph traversal height]
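The two workflow characteristics labeled on this slide, number of graph nodes and graph traversal height, can be derived from a workflow's task graph. The sketch below assumes the workflow is a directed acyclic graph given as a mapping from each task to its successors, and it takes "graph traversal height" to mean the number of levels on the longest dependency chain. That reading is an assumption, since the slide does not define the term. The function name `workflow_size_and_height` is hypothetical.

```python
from collections import deque


def workflow_size_and_height(successors):
    """Return (number of nodes, number of levels) of a workflow DAG.

    `successors` maps a task name to the tasks that depend on it.
    Entry tasks are at level 1; every other task sits one level below
    its deepest predecessor (assumed meaning of "traversal height").
    """
    nodes = set(successors)
    for targets in successors.values():
        nodes.update(targets)

    indegree = {node: 0 for node in nodes}
    for targets in successors.values():
        for target in targets:
            indegree[target] += 1

    # Kahn-style traversal: process a task once all predecessors are done.
    level = {node: 1 for node in nodes if indegree[node] == 0}
    queue = deque(level)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for target in successors.get(node, ()):
            level[target] = max(level.get(target, 1), level[node] + 1)
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)

    if visited != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return len(nodes), max(level.values(), default=0)


if __name__ == "__main__":
    # A small diamond-shaped workflow: a -> {b, c} -> d.
    diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
    print(workflow_size_and_height(diamond))
```

A chain of n tasks has height n, and a bag of n independent tasks has height 1, so the pair separates deep workflows from wide ones.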



11

3. The Methodology in Practice (Selected Results)
Experimental Setup

• Testing complete grid middleware stacks
• Generic GWFE: a baseline GWFE implementation
• 15 PCs, [email protected], 2GB RAM, 1Gbps Ethernet
• Tools: MonALISA, ServMark = DiPerF + GrenchMark.

12

3. The Methodology in Practice (Selected Results)
Overhead: Impact of Workload (WL) Size and Type

• Setup: DAGMan, empty jobs, C-4 (left) / many (right).
• Oi >> Ost = Of. Internal state update very important.
• S-1, S-3: many frequent updates lower system throughput.

13

3. The Methodology in Practice (Selected Results)
Raw Perf.: Performance vs. Consumption

[Figure: Karajan vs. DAGMan]

Karajan performs better than DAGMan, but quickly runs out of resources.

14

3. The Methodology in Practice (Selected Results)
Stability: Internal and Overall Stability

• Setup: DAGMan, 10 independent runs, C-4, 10 WFs.
• System is:
  • Internally stable
  • Overall not stable
• Need to react to system dynamics to favor under-served workflows.



16

Conclusion and Future Work

• Methodology for testing Grid Workflow Engines
  • Goals
  • Metrics
  • Workflows
  • Testing grid middleware stacks, not GWFEs in isolation!
• Analysis of two widely used GWFEs vs. a baseline GWFE
• Future work
  • Apply method to more middleware stacks, in more environments
  • Design domain-specific workloads and assess the performance impact of the inter-domain differences (do different domains raise different challenges?)

17

Thank you! Questions? Remarks? Observations?

Help build our community’s Grid Workloads Archive: http://gwa.ewi.tudelft.nl

• Contact: [email protected] [google “Iosup”]

• Web site: http://www.pds.ewi.tudelft.nl (PDS group articles & software)

• Have (workflow-based) grid traces?

• Additional References

[HPDC’08] A. Iosup, O. Sonmez, S. Anoep, and D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, in IEEE HPDC’08, 2008.

[CG Symp.’08] S. Ostermann, R. Prodan, T. Fahringer, and A. Iosup, On the Characteristics of Grid Workflows, in CoreGRID Symp., 2008.
