Apache Hadoop YARN: The Next-generation Distributed Operating System Zhijie Shen & Jian He @ Hortonworks

ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distributed Operating System

DESCRIPTION

For diverse organizations, Apache Hadoop has become the de facto place where data and computational resources are shared. This broad usage has stretched its design beyond its intended target. To address this, the Apache Hadoop community has come up with the next generation of Hadoop’s compute platform: YARN. YARN, in a nutshell, is the distributed operating system of the big-data world. In this talk, we will introduce YARN, covering how the new architecture decouples the programming model from resource management, as well as scheduling functions, the platform’s fault tolerance and high availability, and tools for application tracing and analysis. We will then discuss the exciting ecosystem of Apache Software Foundation projects forming around YARN. We will conclude with coverage of the applications and services being built around the YARN platform, which lets users choose the programming model of their choice, all on the same data.

Page 1

Apache Hadoop YARN: The Next-generation Distributed Operating System

Zhijie Shen & Jian He @ Hortonworks

Page 2

About Us

● Two hats
  o Software Engineer @ Hortonworks, Inc.
  o Hadoop Committer @ The Apache Software Foundation

● We’re doing Apache YARN!

Page 3

Agenda

● What Is YARN
● YARN Basics
● Recent Development
● Writing Your YARN Applications

Page 4

What is YARN (1)

Frequent questions on the mailing list:
● What is Hadoop 2.0?
● What is YARN?

Hadoop 2.0 = YARN = the operating system for running various distributed data processing applications

Page 5

What is YARN (2) - Motivation

● Hadoop is 8 years old
● The world has changed significantly
  o The hardware
  o Diverse data processing models: interactive, streaming

Page 6

What is YARN (3) - Motivation

● Flexibility - Enabling data processing models beyond MapReduce

Page 7

What is YARN (5) - Motivation

● Scalability - Splitting resource management from job life-cycle management and monitoring
● Efficiency - Improve cluster utilization
  o MRv1 had distinct map/reduce slots on a host

Page 8

What Is YARN (4) - Motivation

● Resource Sharing - Multiple workloads in cluster

Page 9

What Is YARN (5) - Status

● YARN is shipped
● YARN is going to be better
● Hadoop 2.4 is coming this month!

● ResourceManager high availability
● Application historic data services
● Long-running applications optimization

Page 10

What Is YARN (6) - Status

YARN ecosystem

Page 11

What is YARN (7) - Status

YARN community

● Active committers working on YARN: 10+
● Other active contributors: 20+
● Total Jira tickets: 1800+
● Jira tickets that are resolved/closed: 1100+

Page 12

Agenda

● What Is YARN
● YARN Basics
● Recent Development
● Writing Your YARN Applications

Page 13

YARN Basics (1) - Concept

● ResourceManager - Master of a cluster

● NodeManager - Slave to take care of one host

● ApplicationMaster - Master of an application

● Container - Resource abstraction, process to complete a task

Page 14

YARN Basics (2) - RM

Component interfacing the RM to the clients:
● ClientRMService

Component interacting with the per-application AMs:
● ApplicationMasterService

Component connecting the RM to the nodes:
● ResourceTrackerService

Core of the ResourceManager:
● Scheduler, managing apps

Page 15

YARN Basics (3) - NM

Component for NM-RM communication:
● NodeStatusUpdater

Core component managing containers on the node:
● ContainerManager

Component interacting with the OS to start/stop the container process:
● ContainerExecutor

Page 16

YARN Basics (4) - Workflow

Execution Sequence:
1. Client submits an application
2. RM allocates a container to start the AM
3. AM registers with the RM
4. AM requests containers from the RM
5. AM notifies the NM to launch containers
6. Application code is executed in the container
7. Client contacts the RM/AM to monitor the application’s status
8. AM unregisters with the RM

[Diagram: steps 1-8 as interactions among Client, RM, NM, and AM]

Page 17

YARN Basics (5) - Scheduler

● FIFO
  o Simply first come, first served
● Fair
  o Evenly shares resources across queues and among applications
● Capacity
  o Allocates a certain resource capacity to each queue
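
The difference between the three policies can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the actual YARN scheduler code: resources are a single integer count, every name (`SchedulerSketch`, `fifo`, `fair`, `capacity`) is hypothetical, and features such as locality, preemption, and queue elasticity are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: real YARN schedulers are far richer than this.
final class SchedulerSketch {

    // FIFO: serve demands in arrival order (the map's iteration order)
    // until the cluster capacity runs out.
    static Map<String, Integer> fifo(Map<String, Integer> demands, int capacity) {
        Map<String, Integer> granted = new LinkedHashMap<>();
        int remaining = capacity;
        for (Map.Entry<String, Integer> e : demands.entrySet()) {
            int share = Math.min(e.getValue(), remaining);
            granted.put(e.getKey(), share);
            remaining -= share;
        }
        return granted;
    }

    // Fair: hand out one unit at a time, round-robin, to everyone
    // whose demand is not yet satisfied.
    static Map<String, Integer> fair(Map<String, Integer> demands, int capacity) {
        Map<String, Integer> granted = new LinkedHashMap<>();
        for (String name : demands.keySet()) {
            granted.put(name, 0);
        }
        int remaining = capacity;
        boolean progress = true;
        while (remaining > 0 && progress) {
            progress = false;
            for (Map.Entry<String, Integer> e : demands.entrySet()) {
                if (remaining == 0) {
                    break;
                }
                int have = granted.get(e.getKey());
                if (have < e.getValue()) {
                    granted.put(e.getKey(), have + 1);
                    remaining--;
                    progress = true;
                }
            }
        }
        return granted;
    }

    // Capacity (simplified): each queue may use at most its configured
    // fraction of the cluster; unused capacity is NOT lent to other queues here.
    static Map<String, Integer> capacity(Map<String, Integer> demands,
                                         Map<String, Double> fractions,
                                         int capacity) {
        Map<String, Integer> granted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : demands.entrySet()) {
            int guaranteed = (int) (fractions.getOrDefault(e.getKey(), 0.0) * capacity);
            granted.put(e.getKey(), Math.min(e.getValue(), guaranteed));
        }
        return granted;
    }
}
```

With demands A=8 and B=2 on a cluster of 6 units, FIFO gives everything to A, while the fair policy satisfies B completely and gives A the rest.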

Page 18

Agenda

● What Is YARN
● YARN Basics
● Recent Development
● Writing Your YARN Applications

Page 19

Apache Hadoop release

● Hadoop 2.4 is coming out very soon

● ResourceManager high availability
  o RM Restart
  o RM Failover

● Application Historic Data Service
  o MRv1 JobHistoryServer

Page 20

ResourceManager High Availability

● RM is potentially a single point of failure.
  o Restart for various reasons: bugs, hardware failures, deliberate downtime for upgrades

● Goal: RM downtime invisible to end users.

● Includes two parts:
  o RM Restart
  o RM Failover

Page 21

ResourceManager Restart

● RMState includes:
  o Static state (appSubmissionContext, credentials), which belongs to the RM
  o Running state, held per AM
    ■ Scheduler state, i.e. per-app scheduling state and resource consumption

● This was overly complex in MRv1 because the JobTracker had to save too much information: both static state and running state.

● The YARN RM persists only static state, because on YARN the AM and the RM run on different machines
  o It persists application submission metadata and gathers running state from the AMs and NMs
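
The split between persisted static state and rebuilt running state can be sketched as follows. This is a toy illustration, not the RM's real state-store code: `RmRestartSketch` and all of its methods are hypothetical, the "state store" is just a map that outlives the RM instance, and running state is reduced to a container count per application.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an RM that persists only static state.
final class RmRestartSketch {
    // Static state: survives a restart because it lives in the external store.
    private final Map<String, String> stateStore;
    // Running state: kept in memory only and lost on restart.
    private final Map<String, Integer> runningContainers = new HashMap<>();

    // Starting (or restarting) an RM: reload the known applications from the
    // store; their running state starts empty until AMs/NMs report again.
    RmRestartSketch(Map<String, String> stateStore) {
        this.stateStore = stateStore;
        for (String appId : stateStore.keySet()) {
            runningContainers.put(appId, 0);
        }
    }

    // On submission, only the submission metadata is written to the store.
    void submit(String appId, String submissionContext) {
        stateStore.put(appId, submissionContext);
        runningContainers.put(appId, 0);
    }

    // Running state is rebuilt from what AMs and NMs report after a restart.
    void report(String appId, int containers) {
        if (stateStore.containsKey(appId)) {
            runningContainers.put(appId, containers);
        }
    }

    boolean knows(String appId) {
        return stateStore.containsKey(appId);
    }

    int containersOf(String appId) {
        return runningContainers.getOrDefault(appId, 0);
    }
}
```

Creating a second `RmRestartSketch` over the same store models a restart: the application is still known, and its container count is filled in again by the next `report` call.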

Page 22

ResourceManager Failover

● Active/Standby
  o Leader election (ZooKeeper)

● On transition to Active, the Standby loads all the state from the state store.

● NMs, AMs, and clients redirect to the new RM
  o RMProxy
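
The redirect idea can be sketched as a small retrying proxy. This is not the real RMProxy implementation: `FailoverProxySketch`, `StandbyException`, and the way a standby RM signals its state are all assumptions made for illustration.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical client-side proxy that retries a call against the next RM
// whenever the current one turns out to be in standby.
final class FailoverProxySketch {

    // Assumed signal from an RM endpoint that is not the active one.
    static final class StandbyException extends RuntimeException {
        StandbyException(String rmId) {
            super(rmId + " is in standby state");
        }
    }

    private final List<String> rmIds;
    private int current = 0;

    FailoverProxySketch(List<String> rmIds) {
        this.rmIds = rmIds;
    }

    // Try the current RM first; on standby, move to the next one.
    // Gives up after every configured RM has been tried once.
    <T> T invoke(Function<String, T> call) {
        for (int attempt = 0; attempt < rmIds.size(); attempt++) {
            String rmId = rmIds.get(current);
            try {
                return call.apply(rmId);
            } catch (StandbyException e) {
                current = (current + 1) % rmIds.size();
            }
        }
        throw new IllegalStateException("no active ResourceManager found");
    }

    // The RM that the next call will be sent to first.
    String currentRm() {
        return rmIds.get(current);
    }
}
```

Because the proxy remembers the last RM that answered, callers pay the failover cost only once after a transition.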

Page 23

Application Historic Data Service

● Motivation: MRv1 JobHistoryServer (JHS)
  o An MR job writes its history data into HDFS.
  o When a job finishes and is removed from memory, client requests are redirected to the JHS.
  o The JHS reads this persisted history data on demand to serve the requests.

Page 24

Application Historic Data Service

● Goal: Solve the application-history problem in a generic and more scalable way.

● A single application history server to serve all applications’ requests.

● ResourceManager records generic application information
  o Application, ApplicationAttempt, Container

● ApplicationMaster writes framework-specific information
  o Free for users to define

● Multiple interfaces to query the historic information
  o RPC, RESTful services
  o History server Web UI

Page 25

Application Historic Server

Page 26

Long Running Service (YARN-896)

● YARN is intended to be general purpose.
  o Short-lived (e.g. MR) apps vs long-lived apps

● Goal: minimal disruption to running applications.
  o Work-preserving AM restart.
    ■ The AM is a single point of failure from the application’s point of view
    ■ Containers are not killed
    ■ The AM rebinds with previously running containers
  o Work-preserving NM restart.
    ■ Container processes are not killed
    ■ Previously running containers rebind with the new NM

Page 27

Agenda

● What Is YARN
● YARN Basics
● Recent Development
● Writing Your YARN Applications

Page 28

Write a YARN Application

Page 29

Write client and AM

Client

● Client – RM --- YarnClient
  o submitApplication

ApplicationMaster

● AM – RM --- AMRMClient
  o registerApplicationMaster
  o allocate
  o finishApplicationMaster

● AM – NM --- NMClient
  o startContainer
  o getContainerStatus
  o stopContainer

Page 30

Writing the YarnClient

1. Get the application Id from RM

2. Construct ApplicationSubmissionContext

a. Shell command to run the AM

b. Environment (class path, env-variable)

c. LocalResources (Job jars downloaded from HDFS)

3. Submit the request to RM

a. submitApplication

Page 31

Example: client submits an application

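// conf is assumed to be a YarnConfiguration created earlier
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(conf);
yarnClient.start();

// 1. Get a new application (and its id) from the RM
YarnClientApplication app = yarnClient.createApplication();

// 2. Construct the ApplicationSubmissionContext
ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();

// ... client logic ...
// set command: "$JAVA_HOME/bin/java" + "ApplicationMaster" + command
// set environments: class path etc.
// set jars:
// ...

// 3. Submit the request to the RM
yarnClient.submitApplication(appContext);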

Page 32

Writing ApplicationMaster

● The ApplicationMaster is started.

● In the AM code, to start a container for each task, we repeat a procedure similar to the one used to start the master.
  o Construct a ContainerLaunchContext
    ■ commands
    ■ env
    ■ jars

Page 33

Writing ApplicationMaster

1. AM registers with the RM (registerApplicationMaster)

2. AM heartbeats (allocate) with the RM (asynchronously)

a. Send the request

i. Request new containers.

ii. Release containers.

b. Receive containers and send requests to the NMs to start them

3. AM unregisters with the RM (finishApplicationMaster)

Page 34

Writing ApplicationMaster

● The RM assigns containers asynchronously.
● Containers are likely not returned immediately by the current call.
● The user needs to send empty requests until it gets the containers it requested.
● ResourceRequest is incremental.

Page 35

ContainerManagementProtocol

1. Start container (task) on NM

a. startContainer

2. Monitor the container (task) status

a. getContainerStatus

3. Stop the container (task)

a. stopContainer

Page 36

Example - AM: request containers

AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();

// capability (Resource) and priority (Priority) are assumed to be defined earlier
ContainerRequest containerAsk = new ContainerRequest(capability, null, null, priority);

rmClient.addContainerRequest(containerAsk);

// Wait until the containers are allocated:
AllocateResponse response;
do {
    response = rmClient.allocate(0);
    Thread.sleep(100);
} while (response.getAllocatedContainers().isEmpty());

Page 37

Example - AM: start/stop containers

NMClient nmClient = NMClient.createNMClient();

for (Container container : response.getAllocatedContainers()) {
    // Records is org.apache.hadoop.yarn.util.Records
    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    ctx.setCommands(command);
    ctx.setEnvironment(envs);
    nmClient.startContainer(container, ctx);
}

- monitor container: nmClient.getContainerStatus(containerId, nodeId)

- stop container: nmClient.stopContainer(containerId, nodeId)

Page 38

Takeaway

1. YARN is a resource-management platform that can host different data-processing frameworks (beyond MapReduce).
  o No longer limited to MR (batch processing, interactive query, streaming)
  o All applications can now co-exist and share the same data on HDFS.

2. YARN is in production now.
  o Yahoo, eBay.

3. We welcome your contributions.
  o Patches, tests, writing applications on YARN
  o https://issues.apache.org/jira/browse/YARN
  o https://github.com/hortonworks/simple-yarn-app

Page 39

The book is out!

http://yarn-book.com/