Apache Hadoop YARN: Present and Future

Vinod Kumar Vavilapalli, Hortonworks

© Hortonworks Inc. 2014

vinodkv [at] apache.org

@tshooter

A quick show of hands...

• Hadoop 2

Architecting the Future of Big Data

[Image: real-life Hadoop logo]

Who am I?

• 6.75 Hadoop-years old
• Last thing at school: a two-node Tomcat cluster. Three months later, first thing at my job: brought down an 800-node cluster ;)
• Previously @Yahoo!
• Now @Hortonworks
• Two hats
  – Hortonworks: Hadoop MapReduce and YARN development lead
  – Apache: Apache Hadoop YARN lead, Apache Hadoop PMC member, Apache Member
• Worked/working on
  – YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security
  – Apache Ambari: kickstarted the project and its first release
  – Stinger: high-performance data processing with Hadoop/Hive
• Lots of troubleshooting on clusters
• 99%+ of code in Apache Hadoop

Agenda

• Apache Hadoop 2: Overview
• Past
• Present
• Future

Apache Hadoop 2: Next Generation Architecture

What is YARN?

• Resource management platform
  – MapReduce v2
  – Beyond MapReduce with Tez, Storm, Spark; in Hadoop!
  – Did I mention services like HBase and Accumulo on YARN with HoYA/Slider?

• How is it different from Hadoop 1?

Hadoop 1 vs Hadoop 2

HADOOP 1.0: Single Use System (Batch Apps)
• HDFS (redundant, reliable storage)
• MapReduce (cluster resource management & data processing)

HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
• HDFS2 (redundant, highly-available & reliable storage)
• YARN (cluster resource management)
• MapReduce (data processing), Others
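To make the Hadoop 2 split concrete, here is a toy Java sketch of the idea: a resource manager that only tracks node capacity and hands out containers, while frameworks such as MapReduce or Tez decide what runs inside them. This is not YARN's actual API; every class name is hypothetical, and the naive first-fit placement is an assumption chosen for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not the real YARN API: the resource manager only
// tracks capacity and grants containers; frameworks decide what runs in them.
class ClusterNode {
    final String name;
    int freeMemoryMb;
    int freeVcores;

    ClusterNode(String name, int memoryMb, int vcores) {
        this.name = name;
        this.freeMemoryMb = memoryMb;
        this.freeVcores = vcores;
    }
}

class ContainerGrant {
    final String app;
    final String node;
    final int memoryMb;
    final int vcores;

    ContainerGrant(String app, String node, int memoryMb, int vcores) {
        this.app = app;
        this.node = node;
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

class ToyResourceManager {
    private final List<ClusterNode> nodes = new ArrayList<>();

    void addNode(ClusterNode node) {
        nodes.add(node);
    }

    // First-fit: grant on the first node with enough free memory and vcores,
    // or return null if nothing fits right now.
    ContainerGrant allocate(String app, int memoryMb, int vcores) {
        for (ClusterNode n : nodes) {
            if (n.freeMemoryMb >= memoryMb && n.freeVcores >= vcores) {
                n.freeMemoryMb -= memoryMb;
                n.freeVcores -= vcores;
                return new ContainerGrant(app, n.name, memoryMb, vcores);
            }
        }
        return null;
    }

    // Return a finished container's resources to the node it ran on.
    void release(ContainerGrant grant) {
        for (ClusterNode n : nodes) {
            if (n.name.equals(grant.node)) {
                n.freeMemoryMb += grant.memoryMb;
                n.freeVcores += grant.vcores;
                return;
            }
        }
    }
}
```

A MapReduce job and a Tez DAG would both call the same `allocate` method; the resource manager never needs to know which framework is asking.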

Key Benefits of YARN

• Scale

• New Programming Models & Services

• Improved cluster utilization

• Agility

• To infinity and beyond…

Why Migrate?

• 2.0 >= 2 * 1.0
  – HDFS: lots of ground-breaking features
  – YARN: next-generation architecture

• Return on investment: 2x throughput on the same hardware!
• Ready for improvements in hardware
• Not convinced? Let’s see what others are saying!

Yahoo!

• Leader/visionary on all things Hadoop!
• On YARN (0.23.x)
• Moving fast to 2.x

http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html

Twitter

eBay

• Has one of the largest Hadoop clusters in the industry, with many petabytes of data
• Migrated production clusters to Hadoop 2
• Go to Mayank’s talk
  – “Hadoop-2 @ ebay”!
  – Thursday, April 3
  – Track: Deployment and Operations

• Should be convinced by now... No?

YARN: the Data Operating System

Present

Apache Hadoop releases: Apache Hadoop 2.2

• 15 October 2013
• The first GA release of Apache Hadoop 2.x
• YARN
  – First stable and supported release of YARN
  – Binary compatibility for MapReduce applications built on hadoop-1.x
  – YARN-level APIs solidified for the future
  – Performance
  – Scale!

• HDFS
  – High Availability for HDFS
  – HDFS Federation
  – HDFS Snapshots
  – NFSv3 access to data in HDFS

• Support for running Hadoop on Microsoft Windows
• Substantial amount of integration testing with the rest of the projects in the ecosystem

Apache Hadoop releases (contd): Apache Hadoop 2.3

• 24 February 2014
• First post-GA release of 2014

• Alpha features in YARN
  – ResourceManager HA
  – Application History
  – Will cover these in the 2.4 content

• HDFS
  – Details follow...

• A number of bug fixes and enhancements

HDFS: Heterogeneous Storage

HDFS: DataNode caching

Apache Hadoop releases (contd): Apache Hadoop 2.4

• Very soon!

• YARN
  – Details follow...
  – ResourceManager restart and fail-over for high availability
  – Preemption
  – Application History and Timeline

• HDFS
  – FileSystem ACLs
  – Rolling upgrades

ResourceManager Restart and fail-over

[Diagram: ZooKeeper]
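A minimal sketch of the restart idea, assuming a simplified model: the ResourceManager persists each application it accepts to a state store, and a restarted (or newly active, after fail-over) ResourceManager reloads that state instead of losing track of applications. Hadoop ships a ZooKeeper-backed store (`ZKRMStateStore`); here an in-memory map stands in for it, and all class names below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Stand-in for a durable store such as ZooKeeper: it outlives any single
// ResourceManager instance. Hypothetical and in-memory only.
class AppStateStore {
    private final Map<String, String> apps = new LinkedHashMap<>();

    void storeApp(String appId, String state) { apps.put(appId, state); }

    void removeApp(String appId) { apps.remove(appId); }

    Map<String, String> loadAll() { return new LinkedHashMap<>(apps); }
}

class RecoverableResourceManager {
    private final AppStateStore store;
    private final Map<String, String> apps = new LinkedHashMap<>();

    RecoverableResourceManager(AppStateStore store) {
        this.store = store;
        // Recovery: a restarted (or newly active) RM reloads what its predecessor persisted.
        apps.putAll(store.loadAll());
    }

    void submit(String appId) {
        store.storeApp(appId, "SUBMITTED"); // persist before accepting the app
        apps.put(appId, "SUBMITTED");
    }

    void finish(String appId) {
        store.removeApp(appId); // finished apps no longer need recovery
        apps.remove(appId);
    }

    Set<String> knownApps() { return apps.keySet(); }
}
```

The key design point is ordering: state is written to the store before the application is acknowledged, so a crash never loses an application the client believes was accepted.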

Capacity Scheduler Preemption
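The idea behind preemption can be sketched under simplifying assumptions: a single resource (memory in MB), queues with a guaranteed capacity, and a policy that reclaims only what under-served queues need to reach their guarantee, after first counting any free cluster capacity. The real CapacityScheduler policy (`ProportionalCapacityPreemptionPolicy`) is considerably more involved; the names below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-resource (memory MB) view of a scheduler queue.
class QueueSnapshot {
    final String name;
    final int guaranteedMb;
    final int usedMb;
    final int pendingMb;

    QueueSnapshot(String name, int guaranteedMb, int usedMb, int pendingMb) {
        this.name = name;
        this.guaranteedMb = guaranteedMb;
        this.usedMb = usedMb;
        this.pendingMb = pendingMb;
    }
}

class PreemptionSketch {
    // How much memory to reclaim from each over-capacity queue so that starved
    // queues can grow toward (never beyond) their guarantee.
    static Map<String, Integer> plan(List<QueueSnapshot> queues, int freeClusterMb) {
        int deficit = 0;
        for (QueueSnapshot q : queues) {
            int belowGuarantee = Math.max(0, q.guaranteedMb - q.usedMb);
            deficit += Math.min(q.pendingMb, belowGuarantee);
        }
        // Free capacity can cover part of the deficit; only the rest is preempted.
        deficit = Math.max(0, deficit - freeClusterMb);

        Map<String, Integer> toReclaim = new LinkedHashMap<>();
        for (QueueSnapshot q : queues) {
            if (deficit == 0) break;
            int excess = q.usedMb - q.guaranteedMb;
            if (excess > 0) {
                int take = Math.min(excess, deficit);
                toReclaim.put(q.name, take);
                deficit -= take;
            }
        }
        return toReclaim;
    }
}
```

For example, if queue A holds 80,000 MB against a 50,000 MB guarantee, queue B holds 10,000 MB of its 50,000 MB guarantee with 30,000 MB pending, and 10,000 MB are free, the sketch reclaims 20,000 MB from A.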

Application History and Timeline

• A few MR-specific implementations: history and web UI
• Not just MR anymore!
• History
  – MapReduce-specific Job History Server
  – Beyond ResourceManager restart

• Timeline
  – Framework-specific event collection and UIs

• Run analytics on historical apps!
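A toy illustration of framework-specific event collection feeding analytics: frameworks publish events tagged with an entity type and id, and a query computes something across historical runs (here, average duration per entity type). The types, event names and store are hypothetical and do not mirror the actual Timeline Server API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event record; the real Timeline Server has its own entity model.
class FrameworkEvent {
    final String entityType; // framework-defined, e.g. "TEZ_DAG" or "MR_JOB"
    final String entityId;
    final String eventType;
    final long timestampMs;

    FrameworkEvent(String entityType, String entityId, String eventType, long timestampMs) {
        this.entityType = entityType;
        this.entityId = entityId;
        this.eventType = eventType;
        this.timestampMs = timestampMs;
    }
}

class ToyTimelineStore {
    private final List<FrameworkEvent> events = new ArrayList<>();

    void put(FrameworkEvent e) { events.add(e); }

    // Analytics over history: mean time between a start and an end event,
    // over every entity of the given type that has both events.
    double averageDurationMs(String entityType, String startEvent, String endEvent) {
        Map<String, Long> starts = new HashMap<>();
        Map<String, Long> ends = new HashMap<>();
        for (FrameworkEvent e : events) {
            if (!e.entityType.equals(entityType)) continue;
            if (e.eventType.equals(startEvent)) starts.put(e.entityId, e.timestampMs);
            if (e.eventType.equals(endEvent)) ends.put(e.entityId, e.timestampMs);
        }
        long total = 0;
        int count = 0;
        for (Map.Entry<String, Long> s : starts.entrySet()) {
            Long end = ends.get(s.getKey());
            if (end != null) {
                total += end - s.getValue();
                count++;
            }
        }
        return count == 0 ? 0.0 : (double) total / count;
    }
}
```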

Future

Future: Operational enhancements

• Rolling upgrades
  – No/minimal impact to users
  – Ideal: always rolling!

• HDFS in
• YARN

Future: Enabling more apps

• Beyond MR
• Discussing next
  – Long-running services
  – Isolation
  – Multi-dimensional resource scheduling

Future: Long-running services

• You can run them already!
• A few enhancements needed
  – Logs
  – Security
  – Management/monitoring

• Resource sharing across workload types

• Project Slider

Fine-grained isolation for multi-tenancy

• Custom memory monitoring
• Cgroups
• Linux Containers
• VMs
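Custom memory monitoring in the NodeManager can be sketched like this: each container has a physical-memory limit, its virtual-memory limit is that limit multiplied by a ratio (the real setting is `yarn.nodemanager.vmem-pmem-ratio`, default 2.1), and a container exceeding either limit is killed. The class names and the use of MB values instead of bytes are simplifications for illustration.

```java
// Hypothetical per-container memory check; values are in MB for readability.
enum MemoryVerdict { OK, KILL_PHYSICAL, KILL_VIRTUAL }

class ContainerMemoryCheck {
    // Mirrors the default of yarn.nodemanager.vmem-pmem-ratio.
    static final double VMEM_PMEM_RATIO = 2.1;

    // Exceeding either limit is enough to have the container killed.
    static MemoryVerdict check(long pmemLimitMb, long pmemUsedMb, long vmemUsedMb) {
        long vmemLimitMb = (long) (pmemLimitMb * VMEM_PMEM_RATIO);
        if (pmemUsedMb > pmemLimitMb) {
            return MemoryVerdict.KILL_PHYSICAL;
        }
        if (vmemUsedMb > vmemLimitMb) {
            return MemoryVerdict.KILL_VIRTUAL;
        }
        return MemoryVerdict.OK;
    }
}
```

With a 1024 MB container, the virtual limit works out to 2150 MB, so a process tree using 900 MB physical and 2200 MB virtual memory would be killed. Cgroups, containers and VMs move this enforcement from periodic polling into the kernel or hypervisor.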

Multi-resource scheduling

• Today: memory & CPU
  – Physical memory / virtual memory
  – CPU cores / virtual cores

• CPU scheduling: needs more bake-in
• Disks
  – Space
  – IOPS

• Network
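Scheduling on more than one resource needs a notion of fairness across dimensions. One well-known approach is Dominant Resource Fairness (DRF), which YARN's CapacityScheduler can use through `DominantResourceCalculator`. The sketch below is a simplified progressive-filling variant over resource vectors `{vcores, memoryGb}` that skips users whose next task no longer fits; it is an illustration, not YARN's implementation.

```java
// Simplified Dominant Resource Fairness sketch; resource vectors are
// {vcores, memoryGb}. Assumes every task demands a positive amount of some resource.
class DrfSketch {
    // Dominant share = the largest fraction of any single resource a user holds.
    static double dominantShare(long[] used, long[] capacity) {
        double max = 0.0;
        for (int r = 0; r < capacity.length; r++) {
            max = Math.max(max, (double) used[r] / capacity[r]);
        }
        return max;
    }

    static boolean fits(long[] demand, long[] free) {
        for (int r = 0; r < demand.length; r++) {
            if (demand[r] > free[r]) return false;
        }
        return true;
    }

    // Progressive filling: repeatedly launch one task for the user with the
    // lowest dominant share whose next task still fits. Ties go to the lower index.
    static int[] allocate(long[] capacity, long[][] demandPerTask) {
        int users = demandPerTask.length;
        long[] free = capacity.clone();
        long[][] used = new long[users][capacity.length];
        int[] tasks = new int[users];
        while (true) {
            int best = -1;
            double bestShare = Double.MAX_VALUE;
            for (int u = 0; u < users; u++) {
                if (!fits(demandPerTask[u], free)) continue;
                double share = dominantShare(used[u], capacity);
                if (share < bestShare) {
                    best = u;
                    bestShare = share;
                }
            }
            if (best < 0) return tasks; // nobody's next task fits: done
            for (int r = 0; r < capacity.length; r++) {
                free[r] -= demandPerTask[best][r];
                used[best][r] += demandPerTask[best][r];
            }
            tasks[best]++;
        }
    }
}
```

With 9 vcores and 18 GB, a user whose tasks need {1 vcore, 4 GB} and one whose tasks need {3 vcores, 1 GB} end up with 3 and 2 tasks respectively, both at a dominant share of 2/3. Adding disk or network would just mean longer resource vectors.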

Other features

• Application SLAs
• Node labels
• Node affinity/anti-affinity
• Better online queue management
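Node labels and affinity are listed here as future work, so the following is only a hypothetical sketch of what such placement constraints mean: a request may require a node label (the "gpu" label is an invented example) and may ask not to share a node with another container of the same application. Nothing here corresponds to an actual YARN API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical node with admin-assigned labels and the apps it already hosts.
class LabeledNode {
    final String name;
    final Set<String> labels;
    final Set<String> appsRunning = new HashSet<>();

    LabeledNode(String name, String... labels) {
        this.name = name;
        this.labels = new HashSet<>(Arrays.asList(labels));
    }
}

class PlacementSketch {
    // Picks the first node that carries requiredLabel (null means any node) and,
    // when antiAffinity is true, does not already host a container of this app.
    static String place(List<LabeledNode> nodes, String app, String requiredLabel,
                        boolean antiAffinity) {
        for (LabeledNode n : nodes) {
            if (requiredLabel != null && !n.labels.contains(requiredLabel)) continue;
            if (antiAffinity && n.appsRunning.contains(app)) continue;
            n.appsRunning.add(app);
            return n.name;
        }
        return null; // no node satisfies the constraints
    }
}
```

Anti-affinity is what lets, for instance, HBase region servers on YARN spread across machines instead of piling onto one.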

YARN Ecosystem: Beyond the core YARN project, briefly

Ecosystem

Applications Powered by YARN
• Apache Giraph – Graph Processing
• Apache Hama – BSP
• Apache Hadoop MapReduce – Batch
• Apache Tez – Batch/Interactive
• Apache S4 – Stream Processing
• Apache Samza – Stream Processing
• Apache Storm – Stream Processing
• Apache Spark – Iterative applications
• HOYA – HBase on YARN

YARN Frameworks
• Apache Twill
• REEF by Microsoft
• Spring support for Hadoop 2

There's an app for that...
YARN App Marketplace!

Apache Tez

• Moving beyond MR
• A data processing framework that can execute a complex DAG of tasks

• “Apache Tez - A New Chapter in Hadoop Data Processing”
  – By Siddharth Seth: YARN & Tez committer/PMC member
  – Thursday, April 3 (4:20-5:00pm)
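To illustrate what executing a DAG of tasks involves, the sketch below uses Kahn's topological-sort algorithm to group vertices into waves: every vertex depends only on vertices in earlier waves, so vertices within one wave could run concurrently. Vertex names are hypothetical, and Tez itself schedules far more dynamically than this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DagSketch {
    // edges maps each vertex to the vertices that consume its output.
    // Returns "waves": each vertex depends only on vertices in earlier waves.
    static List<List<String>> waves(Map<String, List<String>> edges) {
        Map<String, Integer> inDegree = new TreeMap<>(); // sorted for deterministic output
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String dst : e.getValue()) {
                inDegree.merge(dst, 1, Integer::sum);
            }
        }
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<List<String>> result = new ArrayList<>();
        int scheduled = 0;
        while (!ready.isEmpty()) {
            result.add(ready);
            scheduled += ready.size();
            List<String> next = new ArrayList<>();
            for (String v : ready) {
                for (String dst : edges.getOrDefault(v, Collections.emptyList())) {
                    // A consumer becomes runnable once its last producer is done.
                    if (inDegree.merge(dst, -1, Integer::sum) == 0) next.add(dst);
                }
            }
            Collections.sort(next);
            ready = next;
        }
        if (scheduled != inDegree.size()) {
            throw new IllegalArgumentException("not a DAG: the graph contains a cycle");
        }
        return result;
    }
}
```

For a join of two table scans followed by an aggregation, the waves come out as both scans first, then the join, then the aggregate. MapReduce can only express this as a chain of separate jobs; a DAG engine runs it as one.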

Recap

• Apache Hadoop 2 is, at least, twice as good!

• Exciting journey with Hadoop for this decade…
  – Hadoop is no longer a one-trick pony, err, elephant
  – Beyond just HDFS & MapReduce

• Architecture for the future
  – Centralized data
  – Exciting spectrum of application types, workloads and use cases

A couple more things...

The Book is out!

http://yarn-book.com/

Thank you!

Download Sandbox: Experience Apache Hadoop

Both 2.x and 1.x Versions Available!

http://hortonworks.com/products/hortonworks-sandbox/

Question time!
