
Scheduling and sharing resources in Data Clusters


DESCRIPTION

This presentation is part of my work for the course 'Big Data Seminar' at TU Berlin within the IT4BI (Information Technology for Business Intelligence) master programme.


Jose Luis Lopez Pino

December 12, 2013


Table of contents

1 Introduction
    The problem
    Solutions

2 YARN
    Architecture
    Advantages
    Drawbacks
    Performance

3 Mesos
    Architecture
    Advantages
    Drawbacks
    Performance

4 Omega
    Architecture
    Advantages
    Drawbacks
    Performance

5 Related work
    Resource managers
    Scheduling techniques

6 Conclusions

1 Introduction

The problem

Data lake

Multiple frameworks[6]

Duplicated data

Cluster utilization


Solutions[8]

1 Static partitioning

2 Monolithic schedulers

3 Two-level schedulers

4 Shared state approach
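To make the cluster utilization problem concrete, the sketch below compares static partitioning with a single shared pool of slots. It is only an illustration: the framework names, demands, slot counts, and the simple "give the next slot to the framework holding the fewest" rule are assumptions for this example and are not taken from any of the systems discussed here.

```python
def static_partitioning(demands, total_slots):
    """Each framework owns an equal, fixed share of the slots."""
    share = total_slots // len(demands)
    return {name: min(demand, share) for name, demand in demands.items()}


def shared_pool(demands, total_slots):
    """Hand out slots one at a time to frameworks that still have demand."""
    allocation = {name: 0 for name in demands}
    free = total_slots
    while free > 0:
        pending = [n for n in demands if allocation[n] < demands[n]]
        if not pending:
            break
        # Give the next slot to the pending framework holding the fewest slots.
        neediest = min(pending, key=lambda n: allocation[n])
        allocation[neediest] += 1
        free -= 1
    return allocation


def utilization(allocation, total_slots):
    """Fraction of the cluster's slots that are actually in use."""
    return sum(allocation.values()) / total_slots


# Hypothetical demands (in slots) for three frameworks on a 90-slot cluster.
demands = {"mapreduce": 70, "mpi": 10, "streaming": 20}
TOTAL_SLOTS = 90

print(utilization(static_partitioning(demands, TOTAL_SLOTS), TOTAL_SLOTS))
print(utilization(shared_pool(demands, TOTAL_SLOTS), TOTAL_SLOTS))
```

With these made-up numbers, the static split leaves the partitions of the two small frameworks partly idle while the large one is starved, whereas the shared pool can use every slot.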

2 YARN

Architecture[10]

Figure: YARN architecture
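In YARN [10], a per-application ApplicationMaster negotiates containers from a central ResourceManager, and a NodeManager on each node hosts them. The sketch below mimics that request flow in a few lines. The Python classes, method names, and the memory-only resource model are assumptions made for illustration; they are not the Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    free_memory_mb: int


@dataclass
class ResourceManager:
    nodes: list

    def allocate(self, memory_mb, preferred_node=None):
        """Return the name of a node that can host the container, or None."""
        # Try the preferred node first (data locality), then the others.
        candidates = sorted(self.nodes, key=lambda n: n.name != preferred_node)
        for node in candidates:
            if node.free_memory_mb >= memory_mb:
                node.free_memory_mb -= memory_mb
                return node.name
        return None


@dataclass
class ApplicationMaster:
    rm: ResourceManager
    containers: list = field(default_factory=list)

    def request(self, memory_mb, preferred_node=None):
        """Ask the ResourceManager for one container and record the grant."""
        node = self.rm.allocate(memory_mb, preferred_node)
        if node is not None:
            self.containers.append((node, memory_mb))
        return node


# Hypothetical two-node cluster.
rm = ResourceManager([NodeManager("node1", 2048), NodeManager("node2", 4096)])
am = ApplicationMaster(rm)
print(am.request(1024, preferred_node="node2"))  # granted on the preferred node
print(am.request(4096, preferred_node="node1"))  # no node has enough memory left
```

The point of the sketch is the division of labour: the application only states what it needs and where it would like it, and the central component decides the placement.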


Advantages

Scale

Data locality

Easy to port a new framework


Drawbacks

Failure recovery

High latency?

Network overload?


Performance

Job throughput

Application Master flooding

Preemption

3 Mesos

Architecture[9]

Figure: Mesos architecture
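Mesos is a two-level scheduler: the master decides which free resources to offer to each framework, and each framework's own scheduler decides how much of an offer to use. The sketch below shows one such offer round. The class and function names, the CPU-only resource model, and the fixed order in which frameworks receive offers are simplifying assumptions for illustration; they are not the Mesos API or its allocation policy.

```python
class Framework:
    """Second level: the framework decides what to do with an offer."""

    def __init__(self, name, cpus_per_task, pending_tasks):
        self.name = name
        self.cpus_per_task = cpus_per_task
        self.pending_tasks = pending_tasks

    def on_offer(self, node, cpus):
        """Return how many of the offered CPUs this framework accepts."""
        launch = min(self.pending_tasks, cpus // self.cpus_per_task)
        self.pending_tasks -= launch
        return launch * self.cpus_per_task


def master_offer_round(free_cpus, frameworks):
    """First level: offer each node's free CPUs to the frameworks in turn."""
    launched = []
    for node in free_cpus:
        for fw in frameworks:
            if free_cpus[node] == 0:
                break
            used = fw.on_offer(node, free_cpus[node])
            if used:
                free_cpus[node] -= used
                launched.append((fw.name, node, used))
    return launched


# Hypothetical cluster state and frameworks.
free = {"n1": 4, "n2": 2}
frameworks = [Framework("spark", 2, 1), Framework("mpi", 1, 3)]
for name, node, cpus in master_offer_round(free, frameworks):
    print(f"{name} uses {cpus} CPU(s) on {node}")
```

Because the master never sees the frameworks' task queues, a framework that declines an offer simply leaves the resources for the next framework in line.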


Advantages

Flexible

Extensible

Fault tolerance

    Backup master node
    Recreate master using communication
    Use checkpoints for the slaves


Drawbacks

Complex to port a framework

Intensive communication

Revocation might be dangerous

Penalizes long jobs


Performance

Missing: comparison of different policies and modules

Scalable

    +18% memory
    +10% CPU utilization
    Less than 1 s launching tasks with 50k nodes

Small tasks

Data locality with delay scheduling[12]

MPI

    Torque and gang scheduling
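Delay scheduling [12] trades a little fairness for locality: when the job that should run next has no data on the node that just became free, it is skipped for a bounded number of scheduling opportunities before it is allowed to run non-locally. The sketch below is a simplified illustration of that idea. The `Job` class, the `pick_job` function, and the assumption that every queued job always has unlaunched tasks are choices made for this example.

```python
class Job:
    def __init__(self, name, preferred_nodes):
        self.name = name
        self.preferred_nodes = set(preferred_nodes)  # nodes holding its input
        self.skips = 0  # times this job was passed over for lack of locality


def pick_job(free_node, queue, max_skips):
    """Choose which job gets the free node.

    `queue` is assumed to be ordered by fairness priority.
    Returns (job, is_local), or None if every job was skipped this round.
    """
    for job in queue:
        if free_node in job.preferred_nodes:
            job.skips = 0
            return job, True  # local launch
        if job.skips >= max_skips:
            return job, False  # waited long enough: launch non-locally
        job.skips += 1  # give up this opportunity and try the next job
    return None


# Hypothetical queue: job "a" is first in line but its data lives on n1.
queue = [Job("a", ["n1"]), Job("b", ["n2"])]
for _ in range(3):
    job, is_local = pick_job("n2", queue, max_skips=2)
    print(job.name, "local" if is_local else "non-local")
```

In this run, job "a" yields the node to job "b" twice and only then accepts a non-local launch, which is the relaxation of fairness the technique relies on.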

4 Omega

Architecture[8]

Figure: Omega architecture
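Omega [8] lets several schedulers work against a shared view of the cluster state and resolves races with optimistic concurrency control: each scheduler plans against a snapshot and then tries to commit, and a commit that conflicts with an earlier one fails and has to be retried. The sketch below illustrates that pattern. The `CellState` class, the per-machine version counter used to detect conflicts, and the CPU-only model are assumptions for this example, not details of Omega's implementation.

```python
class CellState:
    """Shared cluster state that every scheduler can read and try to update."""

    def __init__(self, free_cpus):
        self.free_cpus = dict(free_cpus)
        self.version = {machine: 0 for machine in free_cpus}

    def snapshot(self):
        """Private copy of the state for one scheduler to plan against."""
        return dict(self.free_cpus), dict(self.version)

    def commit(self, machine, cpus, seen_version):
        """Apply a claim, or fail if the machine changed since the snapshot."""
        if self.version[machine] != seen_version:
            return False  # another scheduler committed first: conflict
        if self.free_cpus[machine] < cpus:
            return False
        self.free_cpus[machine] -= cpus
        self.version[machine] += 1
        return True


def schedule(cell, cpus, snapshot=None):
    """Pick the first machine that fits in the snapshot and try to commit."""
    free, versions = snapshot if snapshot is not None else cell.snapshot()
    for machine, available in free.items():
        if available >= cpus:
            return machine, cell.commit(machine, cpus, versions[machine])
    return None, False


# Two schedulers plan against the same snapshot of a hypothetical cell.
cell = CellState({"m1": 4})
snap_a, snap_b = cell.snapshot(), cell.snapshot()
print(schedule(cell, 3, snap_a))  # first commit succeeds
print(schedule(cell, 3, snap_b))  # second commit conflicts
print(schedule(cell, 1))          # retry with a fresh snapshot and a smaller claim
```

The failed second commit is the cost of letting schedulers work in parallel; how often it happens depends on how busy the schedulers are and how much their decisions overlap.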


Advantages

Schedulers work in parallel

Very scalable


Drawbacks

Unfair distribution

Conflicts


Performance

Decision time and busyness of the scheduler

Real workloads

5 Related work

Resource managers

Heterogeneous environments: Corona and Cosmos [1]

Homogeneous environments: Quincy[4]


Scheduling techniques

Lottery scheduling[11]

Dynamic Proportional Share Scheduling[7]

Calibration: how does a particular task perform in a particular node?[5]

Stragglers and speculative relaunch[13]

Delay scheduling: achieve locality, relax fairness[12]

Rich resource-requests[2]

Optimize short jobs[3]
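Of the techniques listed above, lottery scheduling [11] is the easiest to show in a few lines: every client holds a number of tickets, and each scheduling decision draws one ticket at random, so a client wins in proportion to its share of the tickets. The sketch below is a minimal illustration; the function name, the client names, and the ticket counts are invented for this example.

```python
import random


def hold_lottery(tickets, rng):
    """Return the client that holds a randomly drawn ticket.

    `tickets` maps each client to its ticket count; the total must be positive.
    """
    winning = rng.randrange(sum(tickets.values()))
    for client, count in tickets.items():
        if winning < count:
            return client
        winning -= count


# Hypothetical clients: "batch" holds three times as many tickets as "adhoc".
tickets = {"batch": 75, "adhoc": 25}
rng = random.Random(0)  # fixed seed so the run is repeatable
wins = {client: 0 for client in tickets}
for _ in range(10_000):
    wins[hold_lottery(tickets, rng)] += 1
print(wins)
```

Over many draws the share of wins approaches the share of tickets, while any single draw can still go to the client with fewer tickets.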

6 Conclusions

Different models

YARN:

    Easier to port a new framework
    Data locality

Mesos:

    Flexible and modular
    Fault tolerance
    More scalable

Omega:

    Flexible
    Highly scalable

References

[1] Ronnie Chaiken, Bob Jenkins, Per-Ake Larson, Bill Ramsey, Darren Shakib, Simon Weaver, and Jingren Zhou. Scope: easy and efficient parallel processing of massive datasets. Proceedings of the VLDB Endowment, 1(2):1265–1276, 2008.

[2] Carlo Curino, Djellel Difallah, Chris Douglas, Raghu Ramakrishnan, and Sriram Rao. Reservation-based scheduling: If you're late don't blame us!

[3] Khaled Elmeleegy. Piranha: Optimizing short jobs in Hadoop. Proceedings of the VLDB Endowment, 6(11):985–996, 2013.


[4] Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, and Andrew Goldberg. Quincy: fair scheduling for distributed computing clusters. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, pages 261–276. ACM, 2009.

[5] Gunho Lee, Byung-Gon Chun, and Randy H Katz. Heterogeneity-aware resource allocation and scheduling in the cloud. In Proceedings of the 3rd USENIX Workshop on Hot Topics in Cloud Computing, HotCloud, volume 11, 2011.


[6] Kyong-Ha Lee, Yoon-Joon Lee, Hyunsik Choi, Yon Dohn Chung, and Bongki Moon. Parallel data processing with MapReduce: a survey. ACM SIGMOD Record, 40(4):11–20, 2012.

[7] Thomas Sandholm and Kevin Lai. Dynamic proportional share scheduling in Hadoop. In Job scheduling strategies for parallel processing, pages 110–131. Springer, 2010.


[8] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: Flexible, scalable schedulers for large compute clusters. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys ’13, pages 351–364, New York, NY, USA, 2013. ACM.

[9] Facebook Engineering Team. Under the hood: Scheduling MapReduce jobs more efficiently with Corona.


[10] Vinod K. Vavilapalli. Apache Hadoop YARN: Yet Another Resource Negotiator. In Proc. SOCC, 2013.

[11] Carl A Waldspurger and William E Weihl. Lottery scheduling: Flexible proportional-share resource management. In Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation, page 1. USENIX Association, 1994.


[12] Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma, Khaled Elmeleegy, Scott Shenker, and Ion Stoica. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In Proceedings of the 5th European conference on Computer systems, pages 265–278. ACM, 2010.

[13] Matei Zaharia, Andy Konwinski, Anthony D Joseph, Randy H Katz, and Ion Stoica. Improving MapReduce performance in heterogeneous environments. In OSDI, volume 8, page 7, 2008.
