Data Placement and Task Scheduling in cloud, Online and Offline

2014.11.27

赵青 (Zhao Qing), 天津科技大学 (Tianjin University of Science and Technology), [email protected]

Motivation

● Increase response speed and throughput

● Guarantee QoS

● Energy efficiency and green computing

Overview

● Data placement for data-intensive applications

● Task scheduling for QoS and energy efficiency

● Online task scheduling

1. Data Placement for Data-Intensive Applications

● Data clustering based on data correlation

If every two data items are placed on different nodes, how much would the data transfer amount increase?

BEA (Bond Energy Algorithm)

Hierarchical clustering tree

Objective: Place closely related data items together so as to decrease data transfers.

Contributions:
1. Introduced data size factors
2. Proposed “First Order Conduction Correlation” from intermediate data
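The question above (how much extra transfer results if two items are placed on different nodes) can be sketched in code. This is only an illustration under an assumption not stated on the slide: the correlation of two data items is the number of tasks that use both, multiplied by the size of the smaller item, since that is the least data a shared task would have to move if the pair were split. The names `correlation_matrix`, `task_inputs`, and `sizes` are hypothetical.

```python
from itertools import combinations

def correlation_matrix(task_inputs, sizes):
    """Size-weighted correlation for every pair of data items.

    Assumed weighting (not from the slides): tasks sharing both items
    times the size of the smaller item.
    """
    corr = {}
    for a, b in combinations(sorted(sizes), 2):
        shared = sum(1 for used in task_inputs.values() if a in used and b in used)
        corr[(a, b)] = shared * min(sizes[a], sizes[b])
    return corr

# Hypothetical workload: which data items each task reads, and item sizes.
task_inputs = {"t1": {"d1", "d2"}, "t2": {"d1", "d2", "d3"}, "t3": {"d3", "d4"}}
sizes = {"d1": 10, "d2": 5, "d3": 2, "d4": 8}

corr = correlation_matrix(task_inputs, sizes)
closest_pair = max(corr, key=corr.get)  # the pair most worth co-locating
```

A clustering step such as BEA or hierarchical clustering would then operate on this matrix; that step is not shown here.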

● Data distribution

Storage capacity, computation load balance

“Tree-to-Tree” greedy allocation strategy

Modified PSO algorithm


● Cloud platform modeling

Physical network structure / BEA

Objective: Make frequent data movements happen on high-speed channels so as to improve network utilization and the efficiency of the whole cloud system.


● Runtime data placement
— A newly generated dataset is saved to the data center with which it has the maximum dependency
— The cost of re-distribution itself is also taken into account
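A minimal sketch of the maximum-dependency rule, assuming dependency is given as a numeric weight between the new dataset and each existing dataset, and assuming a simple free-space check. The function name `place_new_dataset` and all inputs are hypothetical, and the re-distribution cost is not modeled here.

```python
def place_new_dataset(dependencies, placement, free_space, size):
    """Return the data center with the largest total dependency on the new dataset.

    dependencies: {existing_dataset: dependency weight with the new dataset}
    placement:    {existing_dataset: data center currently holding it}
    free_space:   {data center: remaining storage}
    """
    score = {}
    for dataset, weight in dependencies.items():
        center = placement.get(dataset)
        if center is not None:
            score[center] = score.get(center, 0) + weight
    # Only data centers that can actually store the new dataset are candidates.
    candidates = [c for c in free_space if free_space[c] >= size]
    if not candidates:
        raise ValueError("no data center has enough free space")
    return max(candidates, key=lambda c: score.get(c, 0))

placement = {"d1": "dc1", "d2": "dc1", "d3": "dc2"}
dependencies = {"d1": 3, "d2": 1, "d3": 5}
free_space = {"dc1": 100, "dc2": 20}

chosen = place_new_dataset(dependencies, placement, free_space, size=10)
```

With these inputs `dc2` has the larger total dependency (5 versus 4), so it is chosen as long as the dataset fits there.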

● Results: by the greedy allocation strategies

[Figure: Total Data Movement Amount vs. prediction error rate (x-axis: 10%, 20%, 30%, 50%; y-axis: 1800 to 2500). Series: No.3 strategy (without runtime algorithm), No.4 strategy (with runtime algorithm), DongYuan's strategy, No.5 strategy.]

[Figure: Total Time Consumed by Movement vs. prediction error rate (x-axis: 10%, 20%, 30%, 50%; y-axis: 1800 to 2500). Series: No.3 strategy (without runtime algorithm), No.4 strategy (with runtime algorithm), DongYuan's strategy, No.5 strategy.]

2. Task Scheduling and Virtual Machine Allocation

● Objective:
— Distribute tasks with strong data dependences to servers on a high-bandwidth connection, and turn off the servers with low utilization
— Therefore:
• the response time can be reduced
• system-wide utilization can be improved
• some idle network devices can also be turned off

● Task clustering by
— Hypergraph partitioning
— BEA transformation

Efficient & Energy Saving!


● Requirements of tasks
— Storage requirement
— Computing resource requirement: represented by VMs

● Task scheduling and deadline constraint:

$VM_x\ (1 \le x \le m):\ (VC^{x}_{cpu},\ VC^{x}_{mem})$

$t_i\ (1 \le i \le n):\ (VM_i,\ WCET_i,\ T^{deadline}_i)$

Decrease the number of VMs as much as possible, while ensuring users’ Service Level Agreements.
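One way to illustrate the goal of using few VMs while meeting deadlines is a first-fit heuristic in deadline order. This is a hedged sketch, not the method from the slides: it assumes all VMs are identical (the required VM type of each task is ignored), each VM runs its tasks one after another starting at time 0, and each task is described only by a worst-case execution time and a deadline. The name `pack_tasks` is hypothetical, and the result is not guaranteed to be optimal.

```python
def pack_tasks(tasks):
    """Greedy first-fit packing of (name, wcet, deadline) tasks onto VMs.

    Tasks are visited in deadline order, so each VM executes its queue
    earliest-deadline-first and already-placed tasks are never delayed.
    """
    vms = []   # task names per VM, in execution order
    busy = []  # time at which each VM finishes its current queue
    for name, wcet, deadline in sorted(tasks, key=lambda t: t[2]):
        for i, finish in enumerate(busy):
            if finish + wcet <= deadline:
                vms[i].append(name)
                busy[i] = finish + wcet
                break
        else:
            # No existing VM can finish the task in time: open a new one.
            if wcet > deadline:
                raise ValueError(f"task {name} cannot meet its deadline")
            vms.append([name])
            busy.append(wcet)
    return vms

tasks = [("t1", 4, 5), ("t2", 3, 6), ("t3", 2, 9), ("t4", 5, 20)]
schedule = pack_tasks(tasks)
```

Here `t2` cannot follow `t1` on the same VM without missing its deadline, so a second VM is opened, while `t3` and `t4` fit behind `t1`.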


● Physical machine allocation
— Optimization objective: energy efficiency, high-bandwidth networks, load balance
— Greedy strategy:
• Each server’s energy efficiency
• TRD (Task Requirement Degree)
• Top-down & bottom-up: reduce data transfers and improve network utilization
• Load balance
— Constraint conditions: storage capacity, CPU and memory constraints
— Other methods: genetic algorithms, PSO algorithms

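Optimal utilization level in terms of performance-per-watt: commonly, $Opt_y \approx 70\%$.

[Equation: piecewise definition of $TRD_{xy}$ in terms of $Util_y$ and $Opt_y$, taking the value 0 in one of its cases; the exact form is not recoverable from the extracted text.]

$Util_y = \frac{\sum_{x=1}^{m} RQ^{x}_{CPU}}{C^{y}_{CPU}} \times 100\%$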
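The idea of steering servers toward an optimal utilization level can be sketched as follows. This is an assumption-laden illustration rather than the TRD formula itself: utilization is taken to be the sum of CPU requests on a server divided by its CPU capacity, in percent, and the selection rule (pick the server that ends up closest to the 70% optimum without exceeding it) is a stand-in heuristic. The names `utilization`, `pick_server`, and `OPT_UTIL` are hypothetical.

```python
OPT_UTIL = 70.0  # percent; the "commonly 70%" optimum mentioned above

def utilization(requests, capacity):
    """CPU utilization in percent, assuming requests simply add up."""
    return 100.0 * sum(requests) / capacity

def pick_server(servers, new_request, opt=OPT_UTIL):
    """servers: {name: (cpu_capacity, [cpu_requests])}.

    Returns the server whose utilization after adding new_request is the
    highest value not exceeding opt, or None if every server would exceed it.
    """
    best, best_util = None, -1.0
    for name, (capacity, requests) in servers.items():
        util = utilization(requests + [new_request], capacity)
        if util <= opt and util > best_util:
            best, best_util = name, util
    return best

servers = {"s1": (10, [2, 2]), "s2": (10, [5]), "s3": (10, [6])}
chosen = pick_server(servers, new_request=2)
```

Filling servers up to the optimum rather than spreading load evenly leaves lightly used servers free to be switched off, which is the energy-saving motivation stated earlier.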

3. Online Scheduling

● Problems:
— How to schedule the tasks in a fine-grained workflow?
— How to deal with variable conditions at runtime?

● Reinforcement learning based methods

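$R(h) = \sum_{t=1}^{T} \gamma^{t-1}\, r(s_t, a_t, s_{t+1})$

$J(\theta) = \int p(h \mid \theta)\, R(h)\, dh$

$p(h \mid \theta) = p(s_1) \prod_{t=1}^{T} p(s_{t+1} \mid s_t, a_t)\, \pi(a_t \mid s_t, \theta)$

The goal of RL is to find the optimal policy parameter

$\theta^{*} = \arg\max_{\theta} J(\theta)$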
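As a small illustration of the quantities involved, the return of one trajectory can be computed as a discounted sum of rewards, and the expected return of a policy can be approximated by averaging over sampled trajectories. This is a sketch under assumptions: the discount factor and its value of 0.95 are assumed, and `discounted_return` and `estimate_expected_return` are hypothetical names. How trajectories are sampled from a policy is left out.

```python
def discounted_return(rewards, gamma=0.95):
    """Discounted sum of the rewards of one trajectory.

    The first reward is weighted by gamma**0, the second by gamma**1, etc.
    """
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def estimate_expected_return(episodes, gamma=0.95):
    """Monte Carlo estimate of the expected return: the mean over trajectories."""
    returns = [discounted_return(rewards, gamma) for rewards in episodes]
    return sum(returns) / len(returns)

# Two hypothetical reward sequences collected under the same policy.
episodes = [[1.0, 1.0, 1.0], [2.0]]
estimate = estimate_expected_return(episodes, gamma=0.5)
```

A policy search method would adjust the policy parameter to increase this estimate, for example by following a gradient of it.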

[Diagram: agent-environment loop. The agent sends action a to the environment; the environment returns state s and reward r to the agent.]

3. Online Scheduling

Example: Cart-Pole Swing-up

● Task: swing up the pole by moving the cart

● State (2-D continuous): angle $\varphi \in [0, 2\pi]$ and angular velocity $\dot{\varphi} \in [-3\pi, 3\pi]$ of the pole

● Action (1-D continuous): force applied to the cart

● Reward: $r(s_t, a_t, s_{t+1}) = \cos(\varphi_{t+1})$
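A tiny sketch of a cosine reward for the swing-up task, assuming the reward is the cosine of the pole angle in the next state and that an angle of 0 corresponds to the upright position (an inference from the reward being largest there). The name `reward` is hypothetical.

```python
import math

def reward(next_angle):
    """Cosine of the pole angle after the action: 1 when upright, -1 when hanging down."""
    return math.cos(next_angle)

upright = reward(0.0)
hanging = reward(math.pi)
```

The reward depends only on the resulting angle, so the agent is rewarded for keeping the pole near upright regardless of how much force it applies.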

Thank you for your time!
