19-06-22 Challenge the future Delft University of Technology Overprovisioning for Performance Consistency in Grids Nezih Yigitbasi and Dick Epema Parallel and Distributed Systems Group Delft University of Technology http:// guardg.st.ewi.tudelft.nl/


10-04-23

Challenge the future

Delft University of Technology

Overprovisioning for Performance Consistency in Grids

Nezih Yigitbasi and Dick Epema

Parallel and Distributed Systems Group, Delft University of Technology

http://guardg.st.ewi.tudelft.nl/


The Problem: Performance inconsistency in grids

• Inconsistent performance common in grids
  • bursty workloads
  • variable background loads
  • high rate of failures
  • highly dynamic & heterogeneous environment

[Figure: a Bag-of-Tasks with 128 tasks submitted every 15 minutes; performance varies by ~70X]

How can we provide consistent performance in grids?


Our goals

GOAL-1: Realistic performance evaluation of static and dynamic overprovisioning strategies (system’s perspective)

GOAL-2: Dynamically determine the overprovisioning factor (Κ) for user-specified performance requirements (user’s perspective)


Outline

Overprovisioning Strategies

Experimental Setup

Results

Dynamically Determining Κ

Conclusions


Overprovisioning (I)

• Increasing the system capacity to provide better, and in particular, consistent performance even under variable workloads and unexpected demands

Pros
• simple
• obviates the need for complex algorithms
• easy to deploy & maintain

Cons
• cost-ineffective
• workloads may evolve (e.g., increasing user base)
• underutilized systems


Overprovisioning (II)

• Preferred way of providing performance guarantees
  • typical data center utilization is no more than 15-50%
  • telecommunication systems have ~30% utilization on average

L. A. Barroso and U. Hölzle, The Case for Energy-Proportional Computing, IEEE Computer, December 2007.

• High overprovisioning factors (Κ) are common in modern systems
  • Google: 450,000 (2005)
  • Microsoft: 218,000 (mid-2008)
  • Facebook: 10,000+ (2009)


Overprovisioning strategies

1. Static
  i. Largest
  ii. All
  iii. Number
  • Where should we deploy the resources?
  • Does it make any difference?

2. Dynamic
  • Dynamic overprovisioning, a.k.a. auto-scaling
  • low/high thresholds for acquiring/releasing resources

• Given Κ, it is straightforward to determine the number of processors for a strategy

[Figure: Static vs. Dynamic overprovisioning; labels: Time, Demand, Waste]
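As a rough illustration of how a processor count might follow from Κ, here is a minimal sketch. It rests on assumptions that the slides do not spell out: Κ is taken to be the ratio of total capacity after overprovisioning to the initial capacity, Largest is read as adding all extra processors to the largest cluster, All as spreading them evenly over the existing clusters, and Number as adding new clusters. The function names, the `new_cluster_size` parameter, and the cluster sizes in the usage note are hypothetical.

```python
def extra_processors(initial_sizes, k):
    """Processors to add so that total capacity becomes k times the initial total.

    Assumes k = (capacity after overprovisioning) / (initial capacity).
    """
    return round((k - 1) * sum(initial_sizes))


def deploy_largest(initial_sizes, k):
    """Assumed reading of 'Largest': all extra processors go to the largest cluster."""
    sizes = list(initial_sizes)
    extra = extra_processors(sizes, k)
    sizes[sizes.index(max(sizes))] += extra
    return sizes


def deploy_all(initial_sizes, k):
    """Assumed reading of 'All': extra processors are spread evenly over all clusters."""
    sizes = list(initial_sizes)
    share, remainder = divmod(extra_processors(sizes, k), len(sizes))
    # The first `remainder` clusters receive one additional processor.
    return [s + share + (1 if i < remainder else 0) for i, s in enumerate(sizes)]


def deploy_number(initial_sizes, k, new_cluster_size):
    """Assumed reading of 'Number': extra processors are deployed as new clusters."""
    sizes = list(initial_sizes)
    extra = extra_processors(sizes, k)
    while extra > 0:
        size = min(new_cluster_size, extra)
        sizes.append(size)
        extra -= size
    return sizes
```

With hypothetical clusters of 10, 20, and 30 processors and Κ = 2, all three functions add the same 60 processors and differ only in where those processors are placed.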




System model

• DAS-3 multi-cluster grid
• Global Resource Managers (GRM) interacting with Local Resource Managers (LRM)

[Figure: a GRM with a global queue for global jobs, and LRMs with local queues for local jobs]


Workload

• Realistic workloads consisting of Bags-of-Tasks (BoTs)

• Simulations using 10 workloads with 80% load
  • each workload has ~1650 BoTs and ~10K tasks
  • duration of each workload is between 1 day and 1 week

• Real background load trace
  • DAS-3 trace of June’08 (http://gwa.ewi.tudelft.nl/)

(Distribution parameters are determined after base-two log transformation)


Scheduling model


Methodology

• Compare the overprovisioned system with the initial system (NO)

• For Dynamic
  • 69/129 s min/max for resource acquisition and 18/23 s min/max for resource release
  • 60%/70% for low/high thresholds
  • Κ varies over time, so for a fair comparison it is kept within a ±10% range
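The threshold rule for the Dynamic strategy can be sketched as follows. This is an illustration under stated assumptions, not the simulator used in the study: it assumes that utilization above the high threshold triggers acquisition and utilization below the low threshold triggers release, it ignores the acquisition and release delays, and the `Pool` class, `autoscale_step` function, and `step` parameter are hypothetical names.

```python
from dataclasses import dataclass

LOW_THRESHOLD = 0.60   # release below this utilization (value from the slide)
HIGH_THRESHOLD = 0.70  # acquire above this utilization (value from the slide)


@dataclass
class Pool:
    base: int       # processors of the initial system
    extra: int = 0  # overprovisioned processors currently held

    @property
    def total(self) -> int:
        return self.base + self.extra


def autoscale_step(pool: Pool, busy: int, step: int = 1) -> str:
    """Apply one threshold check and return the action taken.

    Assumption: processors are acquired or released `step` at a time,
    and only overprovisioned processors can be released.
    """
    utilization = busy / pool.total
    if utilization > HIGH_THRESHOLD:
        pool.extra += step
        return "acquire"
    if utilization < LOW_THRESHOLD and pool.extra > 0:
        pool.extra -= min(step, pool.extra)
        return "release"
    return "hold"
```

The gap between the two thresholds acts as a dead band, so a utilization that hovers between 60% and 70% causes neither acquisition nor release.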


Traditional performance metrics

• Makespan: time from when the first task is submitted until the last task is done
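As a small worked example of the makespan definition, the sketch below computes it for one bag of tasks. The `(submit_time, finish_time)` tuple layout and the numbers in the usage note are hypothetical.

```python
def makespan(tasks):
    """Makespan of one bag of tasks.

    `tasks` is a list of (submit_time, finish_time) pairs in the same time unit.
    The result is the time from the first submission to the last completion.
    """
    first_submit = min(submit for submit, _ in tasks)
    last_finish = max(finish for _, finish in tasks)
    return last_finish - first_submit
```

For three tasks submitted at minutes 0, 5, and 10 and finishing at minutes 30, 45, and 40, the makespan is 45 minutes.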


Consistency metrics

• We define two metrics to capture the notion of consistency across two dimensions

• The system gets more consistent as Cd gets closer to 1 and Cs gets closer to 0

• A tighter range of the NSL is a sign of better consistency
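The formulas for Cd and Cs are not reproduced in this text. Purely as an illustration of the idea that a tighter range means better consistency, the sketch below uses two stand-in statistics that happen to share the stated limits: a max/min ratio, which approaches 1 for identical values, and the coefficient of variation, which approaches 0. These are assumptions chosen for illustration and should not be read as the authors' definitions of Cd and Cs.

```python
from statistics import mean, pstdev


def range_ratio(values):
    """Stand-in spread measure: largest value divided by smallest (1.0 if all equal)."""
    return max(values) / min(values)


def coefficient_of_variation(values):
    """Stand-in variability measure: population std. deviation over mean (0.0 if all equal)."""
    return pstdev(values) / mean(values)
```

Applied to a list of per-BoT performance values, both statistics grow as the values spread out, so a system whose values stay within a tight range scores closer to 1 and 0 respectively.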




Performance of scheduling policies

• ECT is the worst
• Dynamic Per Task is the best


Performance of different strategies

[Figure panels: Different Overprovisioning Factors (Κ); Different Strategies]

• Consistency obtained with overprovisioning is much better than in the initial system (NO)
• Static strategies provide similar performance (only K matters)
• All and Largest are viable alternatives to Number, as Number increases the administration, installation, and maintenance costs
• Dynamic strategy has better performance compared to static strategies
• K = 2.5 is the critical value


Cost of different strategies

• Use CPU-hours
  • time a processor is used [h]
  • round up partial instance-hours to one hour, similar to the Amazon EC2 on-demand instances pricing model

• Significant reduction, as high as ~40%, in cost
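The CPU-hours accounting with round-up can be sketched as below. This is a minimal illustration that assumes usage is recorded per processor in seconds; the function name and the sample durations in the usage note are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def billed_cpu_hours(usage_seconds):
    """Total CPU-hours, rounding each processor's partial hour up to a full hour.

    `usage_seconds` holds one usage duration (in whole seconds) per processor.
    Processors that were never used contribute nothing.
    """
    total = 0
    for seconds in usage_seconds:
        if seconds > 0:
            # Integer ceiling division avoids floating-point rounding issues.
            total += (seconds + SECONDS_PER_HOUR - 1) // SECONDS_PER_HOUR
    return total
```

For three processors used for 1800 s, 3600 s, and 3601 s, the charge is 1 + 1 + 2 = 4 CPU-hours, which shows how a single extra second can cost a full additional hour under this model.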




Determining Κ dynamically

• So far the system’s perspective; now the user’s perspective

• How can we dynamically determine Κ given the user performance requirements?

• We use a simple feedback-control approach to deploy additional resources dynamically to meet user performance requirements
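A feedback loop of this kind might look like the sketch below. The actual controller is not described in this text, so everything here is an assumption: the controller compares the mean makespan of recently completed BoTs against a user-specified target range, raises Κ by a fixed step when the system is too slow, and lowers it when the system is faster than required. The function name, the step size, and the bounds `k_min` and `k_max` are hypothetical, and Κ = 1 is assumed to mean no overprovisioning.

```python
from statistics import mean


def adjust_k(k, recent_makespans, target_low, target_high,
             step=0.1, k_min=1.0, k_max=4.0):
    """Return the next overprovisioning factor from recent performance.

    recent_makespans: makespans (e.g., in minutes) of recently completed BoTs.
    target_low/high:  the user-specified acceptable makespan range.
    """
    observed = mean(recent_makespans)
    if observed > target_high:
        k += step  # too slow: deploy additional resources
    elif observed < target_low:
        k -= step  # faster than required: give resources back
    # Keep K within the assumed bounds.
    return min(k_max, max(k_min, k))
```

Calling this after each batch of completed BoTs nudges Κ toward a value at which the observed makespan falls inside the requested range; a fixed step is the simplest choice and trades convergence speed for stability.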


Evaluation

• Simulated DAS-3 without background load

• ~1.5 month workload consisting of ~33K BoTs

• Empirically show that the controller stabilizes

• Average makespan for the workload in the initial system (without the controller) is ~3120 minutes

• Three scenarios from tight to loose performance requirements
  • [250m-300m]
  • [700m-750m]
  • [1000m-1250m]


Results (I)

• Significant improvement, as high as ~65%, when the performance requirements are tight

• ~40%-50% improvement for loose performance requirements


Results (II)

[Figure panels, one per scenario: [250m-300m], [700m-750m], [1000m-1250m]]


Conclusions

GOAL-1: Realistic Performance Evaluation of Different Strategies

• Overprovisioning improves performance consistency significantly
• Static strategies provide similar performance (only K matters)
• Dynamic strategy performs better than the static strategies
• Need to determine the critical value to maximize the benefit of overprovisioning

GOAL-2: Dynamically Determining Κ for Given User Performance Requirements

• Feedback-controlled system tuning K dynamically using historical performance data and specified performance requirements
• The number of BoTs meeting the performance requirements increases significantly, as high as 65%, compared to the initial system


More Information:

• Guard-g Project: http://guardg.st.ewi.tudelft.nl/

• PDS publication database: http://www.pds.twi.tudelft.nl

Thank you! Questions? Comments?

http://www.st.ewi.tudelft.nl/~nezih/