A Performance Study of Grid Workflow Engines

Alexandru Iosup and Dick Epema
PDS Group, Delft University of Technology, The Netherlands

Corina Stratan
Parallel and Distributed Systems Group, Politehnica University of Bucharest, Romania

IEEE/ACM Grid 2008, Tsukuba, JP.



Why Are Grid Workflows Interesting?

• Grids promise reliable and easy-to-use computational infrastructure for e-Science
• Full automation from experiment design to final result
• Often, automation = workflows
  • Jobs comprising inter-related computing and data-transfer tasks
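Such a workflow can be modeled as a directed acyclic graph (DAG) of tasks. The following is a minimal Python sketch, assuming each task is labeled either "compute" or "transfer" and lists the tasks it depends on; the Task class and execution_order helper are hypothetical and not part of any engine discussed in this talk. It uses the standard-library graphlib.TopologicalSorter (Python 3.9+) to derive an order that respects every dependency.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Task:
    """Hypothetical workflow task: a computation or a data transfer."""
    name: str
    kind: str  # "compute" or "transfer" (labels assumed for this sketch)
    depends_on: list[str] = field(default_factory=list)


def execution_order(tasks: list[Task]) -> list[str]:
    """Return task names in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    i.e. if the tasks do not form a valid workflow DAG.
    """
    graph = {t.name: set(t.depends_on) for t in tasks}
    return list(TopologicalSorter(graph).static_order())


# Example: stage input data in, compute, then stage results out.
workflow = [
    Task("stage_in", "transfer"),
    Task("simulate", "compute", ["stage_in"]),
    Task("stage_out", "transfer", ["simulate"]),
]
print(execution_order(workflow))  # ['stage_in', 'simulate', 'stage_out']
```

A real engine also decides where and when each ready task runs; this sketch only captures the dependency structure that any valid execution must respect.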

Why Is the Performance of Real Grid Workflow Engines (GWFEs) Interesting?

• For our users
  • Is this system suitable for its users?
  • Are other systems better?
• For focusing on the right research problems
  • What are the interesting problems? System configuration? Which workflow characteristics? Other problems…
• For simulation studies
  • Unrealistic assumptions limit the applicability of results.
  • How scalable are GWFEs? What overheads do they have?

Problem: How to Assess the Performance of Grid Workflow Engines?

• What do we want to assess?
• Is testing in real environments appropriate?
• What performance metrics are important?
• What workflows to use?

Our goal is to develop and validate a methodology for assessing GWFEs.

Outline

1. Introduction
2. Methodology for Testing GWFEs
3. The Methodology in Practice
4. Conclusion and Future Work

2. Methodology for Testing GWFEs
What to Assess?

• Traditional: raw performance metrics
  1. Runtime, wait time, etc.
• In addition, for grids (failure-prone, complex environments):
  2. Overhead: What is the cost of using a GWFE?
  3. Stability: Does the system behave consistently?
  4. Scalability: Does the system support grid-size workloads?
  5. Reliability: What is the impact of dynamic resource availability?

2. Methodology for Testing GWFEs
Is Testing in Real Environments Appropriate?

• Our approach (novel): testing complete grid middleware stacks in real grid environments.
• Alternatives:
  • Simulation [Ahmad & Kwok, JPDC’99]
  • Mathematical analysis
  • Testing GWFEs in isolation (think unit vs. integration testing)

2. Methodology for Testing GWFEs
What Performance Metrics Are Important?

[Figure: Workflow Tasks, Grid Workflow Engine, and Grid Resource Manager]

• Overhead components: Oi, Oa, Os, Ost, Of
• Raw performance: Makespan (MS), Speed-Up vs. Single/Infinite Machine, …
• Stability: internal (MS IQR/Median), overall (MS Range/Median)
• Scalability, Reliability [see article]
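The raw-performance metrics can be made concrete with a small Python sketch. It rests on assumptions that are our own reading, not definitions taken from the article: the makespan (MS) runs from workflow submission to the end of the last task; speed-up vs. a single machine is the total task runtime divided by MS; and speed-up vs. an infinite machine is the critical-path length divided by MS. All function names are hypothetical.

```python
def makespan(submit_time: float, end_times: dict[str, float]) -> float:
    """MS: time from workflow submission until its last task finishes."""
    return max(end_times.values()) - submit_time


def critical_path(runtimes: dict[str, float], deps: dict[str, list[str]]) -> float:
    """Length of the longest runtime-weighted path through the workflow DAG.

    deps maps a task to the tasks it depends on (assumed acyclic).
    """
    finish: dict[str, float] = {}

    def earliest_finish(task: str) -> float:
        # Earliest finish on an infinite machine: own runtime plus the
        # latest earliest-finish among its predecessors (memoized).
        if task not in finish:
            preds = deps.get(task, [])
            finish[task] = runtimes[task] + max(
                (earliest_finish(p) for p in preds), default=0.0
            )
        return finish[task]

    return max(earliest_finish(t) for t in runtimes)


def speedups(runtimes: dict[str, float], deps: dict[str, list[str]],
             ms: float) -> tuple[float, float]:
    """Return (speed-up vs. single machine, speed-up vs. infinite machine)."""
    return sum(runtimes.values()) / ms, critical_path(runtimes, deps) / ms


# Hypothetical diamond workflow A -> {B, C} -> D, runtimes in seconds.
runtimes = {"A": 2.0, "B": 3.0, "C": 5.0, "D": 1.0}
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
ms = makespan(0.0, {"A": 2.5, "B": 6.0, "C": 8.0, "D": 10.0})
print(ms, speedups(runtimes, deps, ms))
```

Under these assumptions, a single-machine value above 1 means tasks overlapped in time, while the infinite-machine value can be at most 1 and shows how close the stack came to the bound imposed by the dependencies alone.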

2. Methodology for Testing GWFEs
What Workflows to Use?

• No accepted workload; no real system traces.
• Sources: related simulation work, the Standard Task Graph Set, our investigation of test workflows from 2 long-term grid traces [CG Symp.’08], and our model of grid bags-of-tasks validated with 7 long-term grid traces [HPDC’08].

[Figure: test workflows characterized by number of graph nodes and graph traversal height]
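The two workflow characteristics named above can be computed directly from a task graph. This minimal Python sketch assumes that "graph traversal height" means the number of levels on the longest path from an entry task to an exit task; graph_shape is a hypothetical helper. It uses Kahn's topological sort, which also rejects graphs that contain a cycle.

```python
from collections import deque


def graph_shape(deps: dict[str, list[str]]) -> tuple[int, int]:
    """Return (number of graph nodes, traversal height) of a workflow DAG.

    deps maps each task to the tasks it depends on. Height is taken here
    as the number of levels on the longest entry-to-exit path (assumption).
    """
    nodes = set(deps) | {p for preds in deps.values() for p in preds}
    indegree = {n: 0 for n in nodes}
    successors: dict[str, list[str]] = {n: [] for n in nodes}
    for task, preds in deps.items():
        for p in preds:
            successors[p].append(task)
            indegree[task] += 1

    # Kahn's algorithm: a task is dequeued only after all its predecessors,
    # so its level is final by the time it is processed.
    level = {n: 1 for n in nodes}
    ready = deque(n for n in nodes if indegree[n] == 0)
    processed = 0
    while ready:
        n = ready.popleft()
        processed += 1
        for s in successors[n]:
            level[s] = max(level[s], level[n] + 1)
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    if processed != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return len(nodes), max(level.values(), default=0)


diamond = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(graph_shape(diamond))  # (4, 3)
```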


3. The Methodology in Practice (Selected Results)
Experimental Setup

• Testing complete grid middleware stacks
  • Generic GWFE: a baseline GWFE implementation
• 15 PCs, 2 GB RAM, 1 Gbps Ethernet
• Tools: MonALISA, ServMark = DiPerF + GrenchMark

3. The Methodology in Practice (Selected Results)
Overhead: Impact of Workload Size and Type

• Setup: DAGMan, empty jobs, C-4 (left) / many (right).
• Oi >> Ost = Of: internal state updates are very important.
• S-1, S-3: many, frequent updates lower system throughput.
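With empty jobs, tasks do no useful work, so the time each one spends in the system approximates the overhead of the middleware stack. The Python sketch below assumes per-job (submit, end) timestamps in seconds and illustrates only that idea; it is not the article's decomposition into Oi, Oa, Os, Ost, and Of, and empty_job_stats is a hypothetical name.

```python
from statistics import mean


def empty_job_stats(records: list[tuple[float, float]]) -> tuple[float, float]:
    """Summarize a run of empty jobs.

    records: one (submit_time, end_time) pair per job, in seconds.
    Returns (mean per-job overhead in s, throughput in jobs/s).
    Assumption: an empty job does no useful work, so its whole time
    in the system is counted as overhead.
    """
    if not records:
        raise ValueError("no job records")
    overheads = [end - submit for submit, end in records]
    # Observation window: first submission to last completion.
    window = max(end for _, end in records) - min(submit for submit, _ in records)
    throughput = len(records) / window if window > 0 else float("inf")
    return mean(overheads), throughput


# Hypothetical: three empty jobs submitted at t=0, 0, 1 and finished at t=2, 4, 5.
print(empty_job_stats([(0, 2), (0, 4), (1, 5)]))
```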

3. The Methodology in Practice (Selected Results)
Raw Performance: Performance vs. Consumption

Karajan performs better than DAGMan, but quickly runs out of resources.

[Figure: Karajan vs. DAGMan]

3. The Methodology in Practice (Selected Results)
Stability: Internal and Overall Stability

• Setup: DAGMan, 10 independent runs, C-4, 10 WFs.
• The system is:
  • Internally stable
  • Overall not stable
• Need to react to system dynamics to favor under-served workflows.
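The two stability measures from the metrics slide, internal (MS IQR/Median) and overall (MS Range/Median), can be computed from a sample of makespans. In this minimal Python sketch, both ratios are computed over one list of makespans for illustration; the helper name and the sample values are hypothetical. Values near 0 indicate consistent behavior.

```python
import statistics


def stability_ratios(makespans: list[float]) -> tuple[float, float]:
    """Return (IQR/median, range/median) for a sample of makespans (MS).

    Lower values mean more consistent behavior across executions.
    """
    if len(makespans) < 2:
        raise ValueError("need at least two makespans")
    med = statistics.median(makespans)
    if med <= 0:
        raise ValueError("makespans must be positive")
    # Quartiles via linear interpolation over the sample itself.
    q1, _, q3 = statistics.quantiles(makespans, n=4, method="inclusive")
    iqr_ratio = (q3 - q1) / med
    range_ratio = (max(makespans) - min(makespans)) / med
    return iqr_ratio, range_ratio


# Hypothetical makespans (s) from five independent runs.
print(stability_ratios([100, 102, 98, 101, 140]))
```

In this made-up sample, one slow run barely changes the IQR ratio but inflates the range ratio, showing how the two measures can disagree.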


Conclusion and Future Work

• Methodology for testing Grid Workflow Engines
  • Goals
  • Metrics
  • Workflows
  • Testing grid middleware stacks, not GWFEs in isolation!
• Analysis of two widely used GWFEs (DAGMan and Karajan) vs. a baseline GWFE
• Future work
  • Apply the method to more middleware stacks, in more environments
  • Design domain-specific workloads and assess the performance impact of inter-domain differences (do different domains raise different challenges?)

Thank you! Questions? Remarks? Observations?

• Help build our community’s Grid Workloads Archive: http://gwa.ewi.tudelft.nl
  • Have (workflow-based) grid traces?
• Contact: [google “Iosup”]
• Web site: http://www.pds.ewi.tudelft.nl (PDS group articles & software)

• Additional References

[HPDC’08] A. Iosup, O. Sonmez, S. Anoep, and D.H.J. Epema, “The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems,” in Proc. IEEE HPDC, 2008.

[CG Symp.’08] S. Ostermann, R. Prodan, T. Fahringer, and A. Iosup, “On the Characteristics of Grid Workflows,” in Proc. CoreGRID Symposium, 2008.