DYNAMICALLY CONTROLLING NODE LEVEL PARALLELISM IN HADOOP
PRESENTED BY: AMARJIT SINGH, VIGNESH VENKATARANANUJAM
NODE LEVEL PARALLELISM
Node memory: 5 GB
Containers: 5 × 1 GB
• MAXIMUM: 5 MAP/REDUCE TASKS
Increase / decrease of CCS
DYNAMIC CONTROLLING
Starting point: 5 CCS
• Manual tuning: time consuming, with performance implications
• Dynamic tuning: faster, better performance
DYNAMIC TUNING OF CCS USING FEEDBACK CONTROLLERS
Controllers: PD controller, Water-level, PD + pruning
Controller output: increase (+) or decrease (-) CCS
Monitored values: CPU % (User_cpu), blocked I/O processes (Proc_blocked), context switches (Ctxt)
Error = Reference value - Value
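The three monitored values have the same names as fields of Linux `/proc/stat` (user time on the `cpu` line, `procs_blocked`, `ctxt`). The slides do not say where the values are read from, so treating `/proc/stat` as the source is an assumption. A minimal sketch that parses two snapshots, derives user CPU %, and applies Error = Reference value - Value; `NodeMetrics`, `parse_proc_stat`, `user_cpu_percent`, `error` and the 90% reference are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    user_cpu: int      # user time from the aggregate "cpu" line (jiffies)
    total_cpu: int     # sum of all fields on the "cpu" line (jiffies)
    proc_blocked: int  # "procs_blocked": processes currently blocked on I/O
    ctxt: int          # "ctxt": context switches since boot (cumulative)


def parse_proc_stat(text: str) -> NodeMetrics:
    """Pull the three controller inputs out of a /proc/stat snapshot (assumed data source)."""
    user = total = blocked = ctxt = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "cpu":  # aggregate line; per-core lines are "cpu0", "cpu1", ...
            values = [int(v) for v in fields[1:]]
            user, total = values[0], sum(values)
        elif fields[0] == "procs_blocked":
            blocked = int(fields[1])
        elif fields[0] == "ctxt":
            ctxt = int(fields[1])
    return NodeMetrics(user, total, blocked, ctxt)


def user_cpu_percent(prev: NodeMetrics, cur: NodeMetrics) -> float:
    """User CPU % over the interval between two snapshots."""
    elapsed = cur.total_cpu - prev.total_cpu
    return 100.0 * (cur.user_cpu - prev.user_cpu) / elapsed if elapsed else 0.0


def error(reference: float, value: float) -> float:
    """Error = Reference value - Value, as on the slide."""
    return reference - value


before = parse_proc_stat("cpu 100 0 50 850\nctxt 1000\nprocs_blocked 0\n")
after = parse_proc_stat("cpu 180 0 60 960\nctxt 1500\nprocs_blocked 2\n")
cpu = user_cpu_percent(before, after)  # 80 of 200 elapsed jiffies -> 40.0
print(cpu, error(90.0, cpu))           # reference of 90% is hypothetical
```

Because `ctxt` is cumulative, a real controller would difference it between snapshots, as done for CPU time here.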
YARN ARCHITECTURE
* Config files (memory + virtual core limit)
One RM and N NMs (NM 1 ... N)
Iteration 1: CCS = 7 → compute CCS → Iteration 2: CCS = 4
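In stock YARN, the per-node limits in the config files come from properties such as `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores`, and they fix a static upper bound on concurrent containers. A minimal sketch of that bound, assuming every container requests the same memory and vcores; `static_ccs` is a hypothetical helper, and the real scheduler's rounding and minimum-allocation rules are not modelled.

```python
def static_ccs(node_memory_mb: int, node_vcores: int,
               container_memory_mb: int, container_vcores: int) -> int:
    """Static upper bound on concurrent containers per NodeManager.

    The tighter of the memory limit and the virtual-core limit wins.
    """
    if container_memory_mb <= 0 or container_vcores <= 0:
        raise ValueError("container request must be positive")
    by_memory = node_memory_mb // container_memory_mb
    by_vcores = node_vcores // container_vcores
    return min(by_memory, by_vcores)


# Node-level parallelism example: 5 GB node, 1 GB / 1-vcore containers, 8 vcores (assumed)
print(static_ccs(5 * 1024, 8, 1024, 1))  # 5
```

The dynamic controllers replace this fixed value with one recomputed every iteration.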
SUSPENDING CONTAINERS – PROPOSED METHOD
Components: RM, NMs, wait queue, ready queue
• Uses the existing CCS to allocate containers
• Periodically computes the CCS value (API)
• IF { New CCS < Old CCS } Then { Suspend containers }
• IF { New CCS > Old CCS & suspended containers exist } Then { Resume old containers before new containers spawn }
• IF { New CCS > Old CCS & no suspended containers } Then { Assign new containers }
CCS Allotted = 7 | CCS Allotted = 14 | CCS Allotted = 21
CCS Allotted = 14
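The three IF rules above can be sketched as per-node bookkeeping. This assumes the wait queue holds suspended containers and the ready queue holds new containers not yet spawned; the slides do not define the queues precisely, and how a container is actually paused is not shown, so suspension here is only list movement. `NodeContainerManager` and its methods are hypothetical.

```python
from collections import deque


class NodeContainerManager:
    """Hypothetical per-node manager following the slide's suspend/resume rules."""

    def __init__(self, ccs: int):
        self.ccs = ccs
        self.running = []           # containers currently executing
        self.wait_queue = deque()   # suspended containers (assumed meaning)
        self.ready_queue = deque()  # new containers waiting to spawn (assumed meaning)

    def submit(self, container_id: str) -> None:
        """Allocate with the existing CCS: run now if there is room, else queue."""
        self.ready_queue.append(container_id)
        self._fill()

    def update_ccs(self, new_ccs: int) -> None:
        """Apply a periodically recomputed CCS value."""
        old_ccs, self.ccs = self.ccs, new_ccs
        if new_ccs < old_ccs:
            # New CCS < Old CCS: suspend the most recently started containers
            while len(self.running) > new_ccs:
                self.wait_queue.append(self.running.pop())
        # New CCS > Old CCS: _fill resumes suspended containers first,
        # and only then spawns new ones from the ready queue
        self._fill()

    def _fill(self) -> None:
        while len(self.running) < self.ccs and self.wait_queue:
            self.running.append(self.wait_queue.popleft())
        while len(self.running) < self.ccs and self.ready_queue:
            self.running.append(self.ready_queue.popleft())


node = NodeContainerManager(ccs=3)
for c in ["c1", "c2", "c3", "c4", "c5"]:
    node.submit(c)
node.update_ccs(1)  # c3 and c2 are suspended
node.update_ccs(4)  # c3 and c2 resume before c4 is spawned
print(node.running)  # ['c1', 'c3', 'c2', 'c4']
```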
WATER-LEVEL CONTROLLER
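[Figure: water level (0 to 1) vs. CCS, with lower bound (LB) and upper bound (UB)]
Inputs: User_cpu, Proc_blocked, Ctxt (0: lower threshold, 1: upper threshold)
The water level is compared against LB and UB to adjust CCS:
• Continuous increase of CCS
• Continuous decrease of CCS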
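A minimal sketch of one water-level control step. It assumes that each metric is scaled so 0 is its lower threshold and 1 its upper threshold, that the water level is the highest of the scaled metrics (so any single saturated resource triggers a decrease), and that CCS rises while the level is below LB and falls while it is above UB. The combination rule, step size and CCS limits are assumptions, and `normalize`, `water_level` and `water_level_step` are hypothetical names.

```python
def normalize(value: float, lower: float, upper: float) -> float:
    """Map a raw metric onto [0, 1]: 0 at the lower threshold, 1 at the upper threshold."""
    if upper <= lower:
        raise ValueError("upper threshold must exceed lower threshold")
    return min(1.0, max(0.0, (value - lower) / (upper - lower)))


def water_level(normalized: list) -> float:
    """Assumed combination rule: the most saturated resource sets the level."""
    return max(normalized)


def water_level_step(ccs: int, level: float, lb: float, ub: float,
                     step: int = 1, min_ccs: int = 1, max_ccs: int = 64) -> int:
    """One control step: raise CCS below LB, lower it above UB, hold in between."""
    if level < lb:
        ccs += step  # resources under-used: keep increasing
    elif level > ub:
        ccs -= step  # resources saturated: keep decreasing
    return max(min_ccs, min(max_ccs, ccs))  # max_ccs = 64 is an assumed cap


# Hypothetical thresholds: CPU 0-100 %, 0-16 blocked processes, 0-100000 switches/s
level = water_level([normalize(35.0, 0.0, 100.0),
                     normalize(1.0, 0.0, 16.0),
                     normalize(20000.0, 0.0, 100000.0)])
print(water_level_step(5, level, lb=0.5, ub=0.8))  # 6
```

Applying the step every monitoring interval yields the continuous increase or decrease shown on the slide until the level settles between LB and UB.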
PD CONTROLLER
Inputs: User_cpu, Proc_blocked, Ctxt → Score
[Figure: score vs. CCS along the current timeline (ccs = 10, ccs = 12, ccs = 14)]
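The error is computed from the score:
$$\mathrm{CCS} = k_P \cdot E(J) + k_D \cdot \big[E(J) - E(J-1)\big]$$
where E(J) is the current error, E(J) - E(J-1) is the change in error, k_P is the proportional constant and k_D is the derivative constant.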
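A minimal PD controller sketch built from the formula above. Two points are assumptions: the error is taken as Error = Reference value - score, following the general feedback-controller slide, and the PD output is applied as an adjustment to the current CCS (the slide writes it as CCS = ...). The gains, the reference score, the rounding and the CCS limits are illustrative only; `PDController` is a hypothetical name.

```python
class PDController:
    """PD controller for CCS: u(J) = kP * E(J) + kD * (E(J) - E(J-1))."""

    def __init__(self, kp: float, kd: float, reference: float,
                 min_ccs: int = 1, max_ccs: int = 64):
        self.kp, self.kd, self.reference = kp, kd, reference
        self.min_ccs, self.max_ccs = min_ccs, max_ccs
        self.prev_error = None  # E(J-1); undefined before the first sample

    def step(self, ccs: int, score: float) -> int:
        error = self.reference - score  # assumed: E(J) = reference - score
        change = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        u = self.kp * error + self.kd * change
        # Assumption: the PD output adjusts the current CCS rather than replacing it
        new_ccs = ccs + round(u)
        return max(self.min_ccs, min(self.max_ccs, new_ccs))


pd = PDController(kp=10.0, kd=5.0, reference=0.8)  # hypothetical gains and reference
ccs = 10
for score in (0.6, 0.6, 0.8):
    ccs = pd.step(ccs, score)
    print(ccs)  # 12, then 14, then 13
```

The derivative term damps the change: when the score jumps to the reference, the error drops to zero and the kD term pulls CCS back by one instead of leaving it where the proportional term alone would.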
Experimental Setup
• Ten IBM Power PC machines
• 10 Gbps Ethernet network between the RM and NMs
• Each node: 16 cores, 64 CPU threads, 124 GB RAM
• Topology: 1 RM and 9 NMs (NM 1 ... NM 9)
Applications used for Testing
Applications are selected based on two factors: CPU utilization and I/O demand.
Performance Comparison
• The default configuration is at least 50% slower for all applications
• All three dynamic approaches outperform best practice (7-31% better)
• PD is better than WaterLevel and PD+pruning, except for the grep application
Tuning methods compared:
• Default
• Best practice
• Three dynamic controlling methods (PD, WL & PD+pruning)
Table: Relative comparison of map completion time for various tuning methods
Performance Comparison
• Default, best case, and exhaustive search use static CCS values
• Among the dynamic approaches, PD and WaterLevel keep changing CCS
• PD+pruning changes CCS initially, but stabilizes it at a fixed value after the 350-second mark
Fig: Change in CCS value for all tuning approaches
• Dynamic tuning achieves the most satisfactory performance and CCS responsiveness
Resource Usage Comparison
Fig: Default tuning | Best practice tuning | PD tuning
Conclusion
The dynamic approach does not underutilize resources.
In the performance comparison, the PD-based dynamic controller outperformed both the best-practice and default methods.
The dynamic approach changes the CCS value at runtime for efficient resource utilization.
The dynamic approach suspends containers when the CCS value decreases, which reduces CPU contention.