37
YUMING LIANG VOIDBOX - DOCKER ON YARN

BDTC2015 hulu-梁宇明-voidbox - docker on yarn

Embed Size (px)

Citation preview

Page 1: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

YUMING LIANG VOIDBOX - DOCKER ON YARN

Page 2: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

WHAT IS HULU

GAMING • CONNECTED TVS • MOBILE • COMPUTER

Page 3: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

540 CONTENT PARTNERS

+!75+ DISTRIBUTION PARTNERS 100+

HULU BEIJING

Page 4: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

WHAT IS AUDIENCE PLATFORM

WORKFLOW ENGINE

HADOOP ECOSYSTEM: YARN + HDFS + MAP/REDUCE + HBASE + ZOOKEEPER

CONFIGURATION SERVICE

KILLUA / MATRIX

FIREWORK

SPARK

VOIDBOX DOCKER ON YARN

ESTIMATION SERVICE

NESTO BATCH FLOW: ETL + PROCESSING + SERVING

REAL-TIME FLOW: ETL + PROCESSING + SERVING

Lambda Architecture

SEGMENTATION SYSTEM OLAP

Page 5: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

AGENDA

•  Why Voidbox"

•  What is Voidbox"•  Architecture"•  How to use Voidbox"

•  Voidbox in Hulu"•  What’s next"

•  Q&A"

Page 6: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

WHY VOIDBOX

YARN: DATA OPERATION SYSTEM

CLUSTER

MAP / REDUCE

TEZ SLIDER

HIVE PIG HBASE STORM

USER APPLICATION

Why only big data application? Can we run other application on yarn? What about task processing, web service?

Page 7: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

WHY VOIDBOX

YARN: DISTRIBUTED OPERATION SYSTEM

CLUSTER

MAP / REDUCE

TEZ SLIDER

HIVE PIG HBASE STORM

USER APPLICATION

TASK SYSTEM

WEB SERVICE OTHER

Data App Other App

Is there any demand?

VOIDBOX

Page 8: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

WHY VOIDBOX

20+ TASK IN DIFFERENT LANGUAGE

20+ VIRTUAL MACHINE

LOW UTILIZATION

MACHINE FAILURE VOIDBOX

Page 9: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

WHAT IS VOIDBOX

LIGHTWEIGHT VM

RUNTIME ISOLATION

UNION FILE SYSTEM

RESOURCE MANAGEMENT

SCHEDULING

FAULT TOLERANCE

DISTRIBUTE COMPUTATION

ARBITRARY APPLICATION

FAULT TOLERANCE

VOIDBOX IS A PROGRAMMING FRAMEWORK

Page 10: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

WHAT IS VOIDBOX

Core

VOIDBOX MASTER VOIDBOX PROXY

RUNTIME MODE REGISTRY / MGMT

API

Framework

DAG SERVICE TASK

Page 11: BDTC2015 hulu-梁宇明-voidbox - docker on yarn
Page 12: BDTC2015 hulu-梁宇明-voidbox - docker on yarn
Page 13: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

ARCHITECTURE – DOCKER

Host OS Server

bins/libs

App A

App B

Hypervisor

App A

Guest OS Guest OS

bins/libs bins/libs

App B

Docker Engine

App A

bins/libs bins/libs

App B

Docker Image

Docker Registry

DOCKER IMAGE

DOCKER ENGINE

DOCKER CONTAINER DOCKER

CONTAINER DOCKER CONTAINER

Runtime Isolation

bins/libs

App B’

Page 14: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

ARCHITECTURE – DOCKER

Page 15: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

ARCHITECTURE – VOIDBOX

YARN modules

Resource Manager

Client

NodeManager

HDFS

NodeManager

HDFS

NodeManager

HDFS

YARN application1

Container (MapTask)

MR AppMaster

Container (MapTask)

Client

Application Master

Container YARN application2

Page 16: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

ARCHITECTURE – VOIDBOX

YARN modules

Resource Manager

Voidbox Client

NodeManager

NodeManager

NodeManager

Docker Engine

Docker Engine

Voidbox Master

State Server

Container

Voidbox Proxy

Container

Voidbox Proxy

Docker Registry

In GlusterFs

HDFS

HDFS

HDFS

Voidbox Driver

Pull

Voidbox modules

Docker modules

Voidbox is a standard appliction on top of YARN, just like MapReduce

server

server

server

server

Page 17: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

ARCHITECTURE – VOIDBOX YARN CLUSTER MODE

YARN modules

Resource Manager

Voidbox Client

NodeManager

NodeManager

Docker Engine

Voidbox Master

Container

Voidbox Proxy

HDFS

HDFS

Voidbox Driver

Voidbox modules

Docker modules

Page 18: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

ARCHITECTURE – VOIDBOX YARN CLIENT MODE

YARN modules

Resource Manager

Voidbox Client

NodeManager

NodeManager

Docker Engine

Voidbox Master

Container

Voidbox Proxy

HDFS

HDFS

Voidbox modules

Docker modules

Voidbox Driver

Page 19: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

ARCHITECTURE – VOIDBOX YARN LOCAL MODE

YARN modules

Resource Manager

Voidbox Client

NodeManager

NodeManager

Docker Engine

Voidbox Master

Container

Voidbox Proxy

HDFS

HDFS

Voidbox modules

Docker modules

Voidbox Driver

Page 20: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

ARCHITECTURE – VOIDBOX FAULT TOLERANCE

•  Work preserving in case ResourceManager restarts –  https://issues.apache.org/jira/browse/YARN-556 target version:2.6.0 –  Improvement of ResourceManager HA

•  Container preserving in case NodeManager restarts –  https://issues.apache.org/jira/browse/YARN-1336 –  Avoid multiple ApplicationMasters –  Avoid out-of-control containers in NM crashed machine

•  ApplicationMaster retry count reset window –  https://issues.apache.org/jira/browse/YARN-611

•  YARN container -> Docker cli -> Docker container –  Add shutdown hook to recycle docker container in case container unexpected exit –  Deal with voidboxproxy exits without running shutdown hook(kill -9 proxy.pid)

Page 21: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

ARCHITECTURE – VOIDBOX FAULT TOLERANCE

yarn-site.xml <property> <name>yarn.resourcemanager.work-preserving-recovery.enabled</name> <value>true</value> </property> <property> <name>yarn.resourcemanager.am.max-attempts</name> <value>100</value> </property> <property> <name>yarn.nodemanager.recovery.enabled</name> <value>true</value> </property>

ApplicationSubmissionContext public abstract void setMaxAppAttempts(int maxAppAttempts); public abstract void setAttemptFailuresValidityInterval(long attemptFailuresValidityInterval); public abstract void setKeepContainersAcrossApplicationAttempts(boolean keepContainers);

Page 22: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

ARCHITECTURE – VOIDBOX RESOURCE MANAGEMENT

•  Management Center –  Add/remove Docker engine in cluster –  Garbage collection of out-of-control Container(NodeManage crashes) –  History Log Server –  Service discovery

Page 23: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

ARCHITECTURE – LOGS

•  VoidboxMaster’s log –  Specify log4j.properties to rotate master’s log when submit an application

•  Docker container’s log

–  Stdout, stderr log •  Copyfile, truncate(0) to rotate docker container’s log •  Rotate log file in host(default:/var/lib/docker/containers/../containerid-json.log)

–  A log dir in Docker image •  Map specify log volume, by user-defined log rotate in Docker image

Page 24: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

CORE

ARCHITECTURE – KEY COMPONENTS

FRAMEWORK

APP

Page 25: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

CORE

ARCHITECTURE – KEY COMPONENTS

•  Functionality •  Allocate/release container •  Launch/stop task in container •  Multi run mode: yarn-cluster, yarn-client, yarn-local

•  Component •  VoidboxMaster

•  Container resource allocation/release/preserving •  Container context management

•  VoidboxProxy •  Docker container lifecycle management

•  Management Center •  Docker engine management •  Garbage collection FRAMEWORK

APP

Page 26: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

CORE

ARCHITECTURE – KEY COMPONENTS

•  DAG framework •  DAGDriver

•  Task scheduler •  Task retry policy

•  DAGContext •  Describe DAG docker job

•  Service framework •  ServiceDriver

•  Replication controller •  Health check, warm up

•  ServiceContext •  Describe service docker job

FRAMEWORK

APP

Page 27: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

CORE

ARCHITECTURE – KEY COMPONENTS

•  DAG APP •  DAG programming based on Framework

•  Service APP •  Service programming based on Framework •  Register to load balance

FRAMEWORK

APP

Page 28: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

HOW TO USE VOIDBOX

Advanced API:

•  AMClient –  requestResource(cpu, memory) –  releaseResource(container)

•  DockerTaskRunner –  run(task)

Page 29: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

P1

P2

C1

C2

C3

How to increase / decrease number of consumers automatically?

HOW TO USE VOIDBOX

Page 30: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

HOW TO USE VOIDBOX

P1

P2

Message Queue Monitor

Consumer Manager

Docker

Consumer

Docker

Consumer

Docker

Consumer

AMClient TaskRunner

Page 31: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

VOIDBOX IN HULU

Page 32: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

VOIDBOX IN HULU

RESOURCE BOUND HIGHLY PARALLELIZABLE

COMPLEX ENVIRONMENT DEPENDENCIES

VOIDBOX APPLICATION

Page 33: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

VOIDBOX BENEFITS

•  Ease creating distributed applications –  Build-in service discovery, fault tolerance, scheduling, communication –  Support complex environment dependencies

•  Improve cluster efficiency –  Share cluster with other data application

•  Simplify deployment –  Package only –  No deployment

•  Flexibility –  Programming interface

Page 34: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

WHAT’S NEXT

YARN

VOIDBOX

MESOS

MAP REDUCE / SPARK /

AURORA / MARATHON /

KUBERNETES

DOCKER

M / R SPARK HIVE TEZ

HBASE STORM SLIDER

PAAS SAAS DISTRIBUTED APPLICATION

Page 35: BDTC2015 hulu-梁宇明-voidbox - docker on yarn
Page 36: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

QUESTIONS Q & A

Page 37: BDTC2015 hulu-梁宇明-voidbox - docker on yarn

THANK YOU