On Friday, I presented "Building Applications on YARN" at the Apache YARN meet-up at Hortonworks. This post contains my slides.
Building Applications on YARN
Chris Riccomini
10/11/2012
Staff Software Engineer at LinkedIn
http://riccomini.name
@criccomini
What I Want to Talk About
Anatomy of a YARN Application
Things to consider when building your application:
Architecture
Operations
Anatomy of a YARN App
Client
Application Master
Container Code
Resource Manager
Node Manager
[Diagram, simplified: Client, Resource Manager (RM), two Node Managers (NM), an Application Master (AM), and two containers (C).]
A lot to consider
Deployment
Metrics
Configuration
Security
Language
Logging
Fault Tolerance
Isolation
Dashboard
State
Deployment
HDFS
HTTP
File (NFS)
DDoS’ing your servers
What we do: Tarball over HTTP. Life is easier with HDFS, but the operational overhead is too high.
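A minimal sketch of the container side of "tarball over HTTP", assuming each container downloads a `.tar.gz` package and unpacks it into its working directory before starting. The function name, the 60-second timeout, and the optional random jitter (one way to soften the "DDoS’ing your servers" problem when many containers start at once) are illustrative assumptions, not the deck's actual implementation.

```python
import os
import random
import tarfile
import time
import urllib.request


def fetch_and_unpack(url: str, dest: str, max_jitter_s: float = 0.0) -> str:
    """Download a .tar.gz package from `url` and unpack it into `dest`.

    Hypothetical helper. If `max_jitter_s` > 0, sleep a random 0..max_jitter_s
    seconds first so a large job's containers do not all request the
    tarball from the HTTP server at the same instant. Returns `dest`.
    """
    if max_jitter_s > 0:
        time.sleep(random.uniform(0.0, max_jitter_s))
    os.makedirs(dest, exist_ok=True)
    # Use the "data" extraction filter (rejects absolute paths, "..",
    # unsafe links) on Python versions that provide it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with urllib.request.urlopen(url, timeout=60) as resp:
        # "r|gz" reads the archive as a forward-only stream, so the
        # response never has to be seekable or buffered in full.
        with tarfile.open(fileobj=resp, mode="r|gz") as tar:
            tar.extractall(dest, **kwargs)
    return dest
```

Usage would look like `fetch_and_unpack("http://packages.example.com/myapp-1.0.tar.gz", "./app", max_jitter_s=30)`, where the URL is a placeholder. Streaming the archive keeps memory flat regardless of package size.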
Metrics
Application-level metrics
YARN-level metrics
metrics2
Containers are transient
What we do: Both app-level and framework-level metrics use the same metrics framework and are piped to an in-house metrics dashboard. We don’t use metrics2, since we don’t want a dependency on Hadoop in our core jar.
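To illustrate "one metrics framework for both app-level and framework-level metrics, with no Hadoop dependency", here is a hypothetical, minimal registry. The class name, the `group.name` key format, and the `job`/`container` tags are assumptions; the tags reflect the point that containers are transient, so each snapshot should identify where it came from rather than relying on the host.

```python
import threading
import time
from collections import defaultdict


class MetricsRegistry:
    """A minimal, Hadoop-free metrics registry (hypothetical sketch).

    Framework code and application code share one registry, so both kinds
    of metrics flow through the same reporting path.
    """

    def __init__(self, job: str, container: str):
        self._tags = {"job": job, "container": container}
        self._counters = defaultdict(int)
        self._gauges = {}
        self._lock = threading.Lock()

    def inc(self, group: str, name: str, delta: int = 1) -> None:
        with self._lock:
            self._counters[(group, name)] += delta

    def set_gauge(self, group: str, name: str, value: float) -> None:
        with self._lock:
            self._gauges[(group, name)] = value

    def snapshot(self) -> dict:
        """Return a JSON-friendly dict to ship to an external dashboard.

        Containers come and go, so every snapshot carries job and container
        tags; a dashboard can then aggregate by tag instead of by host.
        """
        with self._lock:
            metrics = {f"{g}.{n}": v for (g, n), v in self._counters.items()}
            metrics.update({f"{g}.{n}": v for (g, n), v in self._gauges.items()})
        return {"timestamp": time.time(), "tags": dict(self._tags), "metrics": metrics}
```

A framework counter and an application gauge land in the same snapshot, e.g. `reg.inc("framework", "messages-processed", 10)` alongside `reg.set_gauge("app", "cache-size", 42)`.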
Configuration
YARN config (yarn-site.xml, core-site.xml, etc.)
Application configuration
Transporting configuration
What we do: Config is fully resolved at client execution time. There is no admin-override/locked-config protection yet. Config is passed from the client to the AM, and from the AM to the containers, via environment variables.
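A sketch of "fully resolve on the client, then ship it through environment variables", assuming the resolved config is a flat string map. The variable name `APP_CONFIG` and the JSON-plus-base64 encoding are hypothetical choices; base64 is used so that quotes, spaces, and newlines survive however the launch environment is materialized. Environment variables have OS-imposed size limits, so this suits small configs.

```python
import base64
import json
import os

CONFIG_ENV_VAR = "APP_CONFIG"  # hypothetical variable name


def resolve_config(defaults: dict, overrides: dict) -> dict:
    """Client side: fully resolve config before submission (overrides win)."""
    resolved = dict(defaults)
    resolved.update(overrides)
    return resolved


def config_to_env(config: dict) -> dict:
    """Encode resolved config as an env-var mapping for a launch context."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return {CONFIG_ENV_VAR: base64.b64encode(payload).decode("ascii")}


def config_from_env(environ=os.environ) -> dict:
    """AM or container side: decode the config placed in the environment."""
    return json.loads(base64.b64decode(environ[CONFIG_ENV_VAR]).decode("utf-8"))
```

In a Java AM, the same mapping would be copied into each container's environment (YARN's `ContainerLaunchContext` has a `setEnvironment(Map<String, String>)` setter), so every hop sees identical, already-resolved values.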
Security
Kerberos?
Firewalls are your friend
Gateway machine
Dashboard
What we do: Firewall all YARN machines so they can only talk to each other. All users go through an LDAP-controlled dashboard.
Language
Favor complexity in the Application Master, and keep container logic thin
Talk to the RM via REST
Potential to talk to the RM via Protobuf RPC
What we do: The AM is written in Java. The task side of the application has both Python and Java implementations.
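Talking to the RM over REST keeps non-Java code free of Hadoop client libraries. A minimal sketch, assuming the RM web address is known (8088 is the usual default web port) and using the RM's `/ws/v1/cluster/apps` endpoint, which wraps the list as `{"apps": {"app": [...]}}` and returns `{"apps": null}` when empty. The helper names are hypothetical; check field names against your Hadoop version's REST docs.

```python
import json
import urllib.request


def parse_apps(body: dict) -> list:
    """Extract the app list; handles the {"apps": null} empty case."""
    apps = body.get("apps") or {}
    return apps.get("app", [])


def apps_in_state(apps: list, state: str) -> list:
    """Keep only applications in the given YARN state (e.g. "RUNNING")."""
    return [a for a in apps if a.get("state") == state]


def list_apps(rm_base_url: str, timeout: float = 10.0) -> list:
    """Fetch all applications from the RM REST API."""
    req = urllib.request.Request(
        f"{rm_base_url.rstrip('/')}/ws/v1/cluster/apps",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_apps(json.load(resp))
```

For example, `apps_in_state(list_apps("http://rm-host:8088"), "RUNNING")` (placeholder host) would return the running applications as plain dicts.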
Logging
Local storage (while the application is running)
HDFS storage (after the application has been stopped for a while)
Be careful with STDOUT/STDERR (rollover)
What we do: No HDFS. Logs sit for 7 days, then disappear. Not ideal.
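Two pieces of the logging advice sketched in Python: write task logs to size-capped rotating files instead of unbounded STDOUT/STDERR, and sweep away files older than a retention window. The 64 MiB cap, 5 backups, and function names are arbitrary assumptions; the sweep only stands in for whatever enforces the 7-day window in the deck's setup, which the slide does not specify.

```python
import logging
import os
import time
from logging.handlers import RotatingFileHandler
from typing import Optional

RETENTION_SECONDS = 7 * 24 * 60 * 60  # the 7-day window from the slide


def make_container_logger(log_dir: str, name: str = "task") -> logging.Logger:
    """Log to rotating files with a size cap (call once per process)."""
    os.makedirs(log_dir, exist_ok=True)
    handler = RotatingFileHandler(
        os.path.join(log_dir, f"{name}.log"),
        maxBytes=64 * 1024 * 1024,  # roll over at 64 MiB (arbitrary choice)
        backupCount=5,              # keep at most 5 rolled-over files
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


def purge_old_logs(log_dir: str, now: Optional[float] = None,
                   retention_s: float = RETENTION_SECONDS) -> list:
    """Delete files last modified more than `retention_s` ago; return their paths."""
    now = time.time() if now is None else now
    removed = []
    for root, _dirs, files in os.walk(log_dir):
        for fname in files:
            path = os.path.join(root, fname)
            if now - os.path.getmtime(path) > retention_s:
                os.remove(path)
                removed.append(path)
    return removed
```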
Fault Tolerance
Failure matrix
HA RM/NM
Orphaned processes
Pay attention to process trees
What we do: No HA. Manual failover when the RM dies. An orphaned-process monitor treats any process whose start time is earlier than the RM’s start time as orphaned.
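The orphan check on the slide is a simple comparison: a process that started before the current RM is assumed not to belong to anything this RM launched. A sketch of that rule, assuming process start times are available in epoch seconds (for example from `psutil.Process.create_time()`) and the RM start time comes from the RM REST endpoint `/ws/v1/cluster/info`, whose `clusterInfo.startedOn` field is in milliseconds. The `Proc` type and the `is_candidate` filter are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Proc:
    pid: int
    start_time: float  # seconds since the epoch
    command: str


def rm_start_time_from_cluster_info(body: dict) -> float:
    """Convert clusterInfo.startedOn (milliseconds) to epoch seconds."""
    return body["clusterInfo"]["startedOn"] / 1000.0


def find_orphans(procs: Iterable[Proc], rm_start_time: float,
                 is_candidate: Callable[[Proc], bool] = lambda p: True) -> List[Proc]:
    """Return candidate processes that started before the current RM did.

    `is_candidate` narrows the check (e.g. to processes run by the YARN
    user) so unrelated system daemons are never flagged.
    """
    return [p for p in procs if is_candidate(p) and p.start_time < rm_start_time]
```

Acting on the result (alerting versus killing) is a policy decision; because processes can escape their container's process tree, the check should scan all candidate processes on the host, not just the NM's children.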
Isolation
Memory
Disk
CPU
Network
What we do: Nothing, right now. Hoping YARN will solve this before we need it (cgroups?).
Dashboard
Application-specific information
Integrate with YARN
Application Master or standalone?
What we do: The dashboard enforces security and talks to the RM/AM via HTTP/JSON to get information about jobs.
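One way a dashboard could "enforce security" on top of the RM's JSON: after the user has authenticated (via LDAP, outside this sketch), show only the applications whose `user` field matches, with an admin group that sees everything. The `Viewer` type, the `yarn-admins` group name, and the row format are hypothetical; the app dicts are assumed to have the shape returned by the RM's `/ws/v1/cluster/apps` endpoint.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Viewer:
    """An already-authenticated dashboard user (e.g. resolved via LDAP)."""
    username: str
    groups: frozenset = field(default_factory=frozenset)


ADMIN_GROUP = "yarn-admins"  # hypothetical LDAP group name


def visible_apps(viewer: Viewer, apps: List[dict]) -> List[dict]:
    """Filter the RM's app list so users only see jobs they submitted."""
    if ADMIN_GROUP in viewer.groups:
        return list(apps)
    return [a for a in apps if a.get("user") == viewer.username]


def job_rows(viewer: Viewer, apps: List[dict]) -> List[dict]:
    """Rows for a dashboard table: id, name, state, and progress."""
    return [
        {"id": a["id"], "name": a["name"], "state": a["state"],
         "progress": a.get("progress", 0.0)}
        for a in visible_apps(viewer, apps)
    ]
```

Because users never reach the firewalled cluster directly, filtering in the dashboard is the single place the policy has to be applied.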
State
HDFS
Deployed with application
Remote data store
What we do: Nothing, right now.
Takeaways
There’s a lot more than just the YARN API
Look for examples (Spark, Storm, MapReduce)
Decide your level of Hadoop integration:
Metrics2
HDFS
Config
Kerberos and doAs