Ignacio (Nacho) Cano
Ph.D. Defense
Paul G. Allen School of Computer Science & Engineering
University of Washington
December 2018
HDD TIER
SSD TIER
Virtualization Layer
Physical Layer
HYPERVISOR
VM VM VM VM
HYPERVISOR
VM VM VM VM
…
West-US East-US
…
Sub-optimal systems efficiency and performance
Improve systems efficiency and performance
Machine Learning to the rescue!
1. CURATOR: ML-based policy
2. ADARES: ML-based mechanism
3. PULPO: ML-System co-design
• Motivation
• Challenges
• CURATOR
• ADARES
• PULPO
• Conclusions
historical traces
transfer learning
sensing
reward
safe online exploration
uncertainty
Scheduling of these tasks is key to overall cluster performance
• reinforcement learning
50% of clusters have 75% usage
Many clusters waste 25% of fast storage
Need smarter scheduling policies
[Diagram: RL loop. The AGENT in state s_t takes action a_t; the ENVIRONMENT returns reward r_{t+1} and next state s_{t+1}]
[Bar chart: Improvement (%) in latency and SSD reads across workloads: oltp, oltp skewed, oltp varying, oltp and vdi, dss]

METRIC      AVG IMPROVEMENT
LATENCY     12%
SSD READS   16%
reinforcement learning
Need a system that adaptively changes resources allocated to VMs in a cluster
• contextual bandits
• decoupled, highly configurable
[Diagram: contextual-bandit loop. The AGENT observes context x_t and pulls arm/action a_t; the ENVIRONMENT returns reward r_t]
• Adaptive and improves over time
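The loop above can be sketched with a simple epsilon-greedy contextual bandit. This is an illustrative stand-in, not ADARES itself: the per-(context, arm) running-average estimates, the EPSILON value, and the mem_up/mem_noop arms are all assumptions for the example.

```python
import random

# Illustrative epsilon-greedy contextual bandit (a sketch, not ADARES):
# keep a running-average reward estimate per (context, arm) pair.
EPSILON = 0.1

estimates = {}  # (context, arm) -> (count, mean reward)

def choose(context, arms, explore=True):
    """Pick an arm: explore uniformly with probability EPSILON,
    otherwise exploit the best estimated arm for this context."""
    if explore and random.random() < EPSILON:
        return random.choice(arms)
    return max(arms, key=lambda a: estimates.get((context, a), (0, 0.0))[1])

def learn(context, arm, reward):
    """Fold the observed reward into the running mean for (context, arm)."""
    n, mean = estimates.get((context, arm), (0, 0.0))
    estimates[(context, arm)] = (n + 1, mean + (reward - mean) / (n + 1))

# Example: a low-memory context where scaling memory up always pays off.
for _ in range(50):
    arm = choose("low-mem", ["mem_up", "mem_noop"])
    learn("low-mem", arm, 1.0 if arm == "mem_up" else 0.0)
```

After a few rounds the agent exploits mem_up in the low-memory context, while the EPSILON fraction of exploration keeps probing the other arm.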
[Diagram: control loop between SOFTWARE AGENT and CLUSTER (ENVIRONMENT)]
• Sensing: cluster-, node-, and VM-level measurements
• Filtering
• Predictive: predict + learn
• Decision: explore/exploit + business logic
• Execution: recommendation
[Timeline: under- and over-provisioning at the start vs. less under- and over-provisioning at the end]
• Increases utilization
• Keeps up with IOPS
• contextual bandits
• decoupled, highly configurable
Need a system to efficiently train ML models from geo-distributed datasets
• Trade off in-DC computation and communication for X-DC communication
• Apache Hadoop YARN and Apache REEF
[Diagram: DC Coordinator spanning West-US, Germany, and Canada]
More in-DC computation and communication, less X-DC communication
• Reduce communication between data centers
• Multi-level communication trees across DCs
• Learning algorithms expressed in terms of broadcast/reduce (B/R) primitives
Application Layer (DML/GDML)
Apache REEF
• Generalized control plane
• Data aggregation, communication, etc.
Apache Hadoop YARN
• Federated version: single massive YARN cluster
• Network-aware resource requests
[Plot: X-DC transfer (GB, log scale) vs. number of data centers (2-8) for centralized, distributed, and distributed-fadl. CRITEO 10M]
[Plot: relative objective function vs. X-DC transfer (GB, log scale) for centralized, distributed, and distributed-fadl. CRITEO 50M, 2 DCs]
Better models sooner; orders of magnitude reduction in X-DC transfer
[Plots: relative objective function vs. relative time for centralized-bulk, centralized-stream, distributed, and distributed-fadl. KAGGLE 500K and KAGGLE 5M]
Close to optimal in low-dimensional models; deteriorates with model size
Trade off in-DC computation and communication for X-DC communication
Apache Hadoop YARN and Apache REEF
We can use Machine Learning to optimize Distributed Systems
• CURATOR: RL-based scheduling
• ADARES: contextual-bandits-based adjustments
• PULPO: co-designed ML-System
Operations interposed at the hypervisor level and redirected to CVMs
Global view of cluster state
Integrated Compute-Storage
• Data replication
• Disk balancing
• VM migration
Scale
KEY-VALUE STORE
I/O SERVICE (foreground)
STORAGE MGMT (background, uses MapReduce)
• extents / extent groups

Extent Id Map
eid    egid
33     1984
44     1984
56     2010

Extent Group Id Map
egid   physical location
1984   disk1, disk2
…      …

[Diagram: a file's regions 0-4 map to extents 33, 44, 56]
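The two metadata maps compose into a two-level lookup: extent id to extent group id to physical disks. A minimal sketch with plain dictionaries, using the example rows from the tables above (the locate_extent helper is hypothetical, not Curator's API):

```python
# The two metadata maps above as plain dictionaries: an extent resolves
# to its extent group, which resolves to the disks holding its replicas.
extent_id_map = {33: 1984, 44: 1984, 56: 2010}       # eid -> egid
extent_group_id_map = {1984: ["disk1", "disk2"]}     # egid -> disks

def locate_extent(eid):
    """Resolve an extent id to the physical disks of its extent group."""
    egid = extent_id_map[eid]
    return extent_group_id_map.get(egid, [])

# Extents 33 and 44 share group 1984, replicated on disk1 and disk2.
```

The indirection is the point of the design: moving an extent group between disks only touches the Extent Group Id Map, leaving every extent-to-group entry unchanged.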
[Diagram: four-node cluster. Curator runs as a slave on Nodes A, C, and D and as the master on Node B; all nodes share a metadata ring]
Map emits 120 -> mtime and 120 -> atime; reduce combines them into (120, mtime, atime)
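The reduce step above, grouping every attribute emitted for the same extent id, can be sketched as follows (reduce_by_key is a hypothetical helper, not Curator's API):

```python
from collections import defaultdict

# Map emits (extent id, attribute) pairs; reduce groups every attribute
# seen for the same extent id into one combined record.
def reduce_by_key(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {k: tuple([k] + vs) for k, vs in grouped.items()}

emitted = [(120, "mtime"), (120, "atime")]
combined = reduce_by_key(emitted)  # {120: (120, "mtime", "atime")}
```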
Q(s_t, a_t) ← (1 − α) · Q(s_t, a_t) + α · (r_{t+1} + γ · max_a Q(s_{t+1}, a))

Old Value: Q(s_t, a_t). Learned Value: r_{t+1} + γ · max_a Q(s_{t+1}, a), where max_a Q(s_{t+1}, a) is the estimate of the optimal future value. Learning Rate: 0 ≤ α ≤ 1. Discount Factor: 0 ≤ γ ≤ 1.
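The update rule above translates directly into a tabular Q-learning step. A minimal sketch, assuming a toy state/action space and placeholder values for α and γ:

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate, 0 <= alpha <= 1 (placeholder value)
GAMMA = 0.9   # discount factor, 0 <= gamma <= 1 (placeholder value)

Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: blend the old value with the reward plus
    the discounted estimate of the optimal future value."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] = (1 - ALPHA) * Q[(state, action)] + \
        ALPHA * (reward + GAMMA * best_next)

# One toy transition: skipping a background task in a busy cluster
# yielded reward 1.0 and led to an idle cluster.
actions = ["run", "skip"]
q_update("busy", "skip", 1.0, "idle", actions)
```

With an empty table, the updated entry is (1 − 0.1) · 0 + 0.1 · (1.0 + 0.9 · 0) = 0.1.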
Without learning, tasks run even when the cluster is highly utilized and latency is low; the agent LEARNED not to run in those cases
[Diagram: three nodes, each with a Hypervisor, CVM, CPU/Memory, two SSDs, two HDDs, and a SCSI Controller; I/O is sensed per node. The VM Management Software and Cluster Manager host the Sensing, Filtering, Predictive, Decision, and Execution stages]
Low-memory context: without transfer learning the agent chooses mem noop; with transfer learning it chooses mem up
[Animated walkthrough: coordinator at DC-1 (West-US), with DC-2 and DC-3. More computation in-DC, less communication X-DC]
1. The coordinator initializes w0 and sends it to all DCs
2. DCs compute their gradients in parallel and send them to the coordinator
3. The coordinator aggregates the gradient and sends the aggregate back
4. DCs run local optimization in parallel, each optimizing its local surrogate f̂_k (no X-DC communication)
5. DCs send their descent directions; the coordinator aggregates them and sends the aggregate back
6. DCs do line search in parallel and send the results
7. The coordinator updates the model with the best step size and sends w1
8. Repeat
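The loop can be simulated in a single process. A sketch under strong simplifications: three in-memory lists stand in for per-DC shards, the objective is a toy quadratic, the local optimization and line search steps are collapsed into plain gradient descent with a fixed step, and everything runs sequentially rather than in parallel.

```python
# Single-process simulation of the coordinator loop, with heavy
# simplifications: three in-memory lists stand in for per-DC data shards,
# the objective is f(w) = sum_i (w - x_i)^2, and local optimization plus
# line search are collapsed into gradient descent with a fixed step size.

dcs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # per-"DC" data partitions

def local_gradient(w, shard):
    # Gradient of sum (w - x)^2 over one shard.
    return sum(2.0 * (w - x) for x in shard)

w = 0.0                                          # 1. initialize w0 (broadcast)
for _ in range(100):                             # 8. repeat
    grads = [local_gradient(w, s) for s in dcs]  # 2. per-DC gradients
    grad = sum(grads)                            # 3. aggregate (reduce)
    direction = -grad                            # 4./5. descent direction
    step = 0.05                                  # 6. stands in for line search
    w += step * direction                        # 7. update model (broadcast)

# w converges to the mean of all data points, 3.5.
```

Only the broadcast and reduce points (steps 1, 3, 5, 7) would cross DC boundaries in the real system; everything else stays inside a data center.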
[Diagram: federated YARN cluster. A Global Master connects over X-DC links to DC Masters in DC-1, DC-2, and DC-3, each managing DC Slaves that run containers]