Proprietary & Confidential. Copyright © 2014. Hadoop Operations @ Rocket Fuel We’re Hiring rocketfuel.com/careers Kishore Kumar Yellamraju

Big data summit


The Web Is Monetized By Advertising

Delivery Methods

» Display
» Video
» Mobile
» Social

Overview

[Diagram: ad-serving flow among Browser, Publishers, Ad Exchange (some exchange partners), and the Rocket Fuel Platform, with Data Partners and Advertisers also shown.]

1. Page Request
2. Ad Request
3. Bid Request
4. Bid & Ad
5. Rocket Fuel Winning Ad
6. Ad Served

Rocket Fuel Platform components: Real-time Bidder (Automated Decisions), Models, Data Store.

Other diagram labels: User Segments, User Engagements, Ads & Budget, Model Scores, Events, Refresh learning, Optimize.

Real Time Bidding and Serving

Always buying the best impressions & serving the best ad

[Figure: a grid of candidate impressions with bid prices, evaluated on Site/Page, Geo/Weather, Time of Day, Brand Affinity, and User.]

Real Time Bidding and Serving

[Figure: three impression scorecards, one per campaign goal (leads & sales, coupon downloads, brand awareness). Each scorecard rates an impression on Demo, Brand Affinity, Time of Day, Geo/Weather, Site/Page, Ad Position, In-Market Behavior, and Response, and the per-factor scores combine into a total per goal.]


Throughput

Requests per day:

» Facebook likes: 5 B
» Searches on Google: 6 B
» Bid requests considered by Rocket Fuel: 45 B
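To put the daily figure in per-second terms, here is a small arithmetic sketch. It takes the slide's 45 B bid requests per day at face value and assumes traffic is spread evenly over the day, which real traffic is not, so peak rates would be higher. The function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(requests_per_day: float) -> float:
    """Average request rate, assuming a perfectly even spread over the day."""
    return requests_per_day / SECONDS_PER_DAY


# Figure taken from the slide; the even-spread assumption is ours.
bid_requests_per_day = 45e9
rate = average_rate_per_second(bid_requests_per_day)
print(f"{rate:,.0f} bid requests per second on average")
```

On these assumptions the average works out to a little over half a million bid requests per second.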

Latency

Time (ms):

» Blink of an eye: 400
» SF to Tokyo network round trip: 100
» One beat of a hummingbird's wing: 20
» Look up in Blackbird: 2

Architecture and Scale

» Datacenters
» Scale
» Growth
» Architecture

Data Center Expansion

» abc

Data Center Design

• Racks custom built at Rocket Fuel
• Leased space/bandwidth in colocation facilities

Hadoop Server: 24 2U servers (85kW)

Bidders: 40 2-U Twin 2 servers (17kW)

Rocket Fuel Scale

» 34474 CPU processor cores
  – 2655 servers
  – 1874 Teraflops of computing
» 188 Terabytes of memory
  – 13X the memory of the IBM computer Watson that played Jeopardy
» 42 PB of storage
  – 106X the data volume of the entire Library of Congress

Hadoop at Rocket Fuel

» 1400 servers
» 15K Disks
» 15K Cores
» 90 TB
» 30K MR slots
» 12K daily MR jobs

Growth

In 1 year:

» Servers: 200 to 1400
» Storage: 5 PB to 41 PB (8x)


Data Architecture 30

Batch and Real Time Pipelines

[Diagram: components shown are Webservers, Scribe (three instances), Storm, MySQL, and ZooKeeper.]

Hadoop Setup

[Diagram: Active NN and Standby NN, each with a ZKFC, backed by a QJM / ZK quorum; a JT; and six DN/TT worker nodes.]

Hardware callouts on the diagram:

» 6x2TB disks, 2x6 core, 196 GB RAM, 2x1G NIC
» 12x3TB disks, 2x6 core, 64 GB RAM, 10G NIC
» Same as DNs, with a dedicated disk for ZK or JN

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Maintenance is Not Easy

Puppet + Infradb

Automation is key

Puppet and Infradb

» Automate as much as you can
» Adding a slave node to a Hadoop cluster: < 120 seconds
» Bringing up a new Hadoop cluster: < 500 seconds
» MR slots are automatically determined based on hardware config

Isn't it cool?

Just define once
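The slide does not say how MR slots are derived from the hardware config, so the following is only a sketch of one plausible heuristic, not Rocket Fuel's actual rule. Every name and default here (`NodeHardware`, `mr_slots`, two task slots per core, 2 GB heap per task, 8 GB reserved for the OS and daemons, a 70/30 map/reduce split) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeHardware:
    """Hardware facts as a config-management tool might report them (hypothetical)."""
    cores: int
    ram_gb: int


def mr_slots(hw: NodeHardware,
             slots_per_core: int = 2,
             task_heap_gb: int = 2,
             reserved_gb: int = 8,
             map_percent: int = 70) -> tuple[int, int]:
    """Return (map_slots, reduce_slots) for one TaskTracker.

    The total is capped by whichever runs out first, CPU or memory,
    and every node gets at least one slot of each kind.
    """
    by_cpu = hw.cores * slots_per_core
    by_memory = max(hw.ram_gb - reserved_gb, 0) // task_heap_gb
    total = min(by_cpu, by_memory)
    map_slots = max(total * map_percent // 100, 1)
    reduce_slots = max(total - map_slots, 1)
    return map_slots, reduce_slots


# A 2x6-core, 64 GB worker like the one on the Hadoop Setup slide.
print(mr_slots(NodeHardware(cores=12, ram_gb=64)))
```

Computing the values from facts like these, rather than hand-editing them per host, is what lets a new node class be defined once and rolled out automatically.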

Performance Tuning

No issues when the cluster is small. Problems start when it grows.

Performance Tuning

» HDFS: dfs.datanode.handler.count, dfs.namenode.handler.count, dfs.datanode.max.transfer.threads, dfs.image.transfer.timeout
» MR: mapred.reduce.parallel.copies, mapreduce.reduce.shuffle.parallelcopies, mapred.job.tracker.handler.count, io.sort.mb, io.sort.factor
» IMP: MAPREDUCE-2026
» ZK: maxClientCnxns
» JVM: -XX:+UseConcMarkSweepGC, -XX:CMSFullGCsBeforeCompaction=1, -XX:CMSInitiatingOccupancyFraction=60
» ha-timeoutms


Monitoring

Wall of Ops

Monitoring

» hadoopnamenodeCallQueueLength
» hadoopjobtrackerjvmmemheapusedm

Don't fly blind; you will crash.

MR Workload Monitoring

Network Monitoring

Don't blame the network; monitor it instead. A network mesh can be a mess.

Alerting

Monitoring is not enough; you need better alerting.

Alerts

>> Checking whether NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We rely heavily on custom scripts that query JMX for NN and JT

http://hostname:port/jmx

» qry=Hadoop:service=NameNode,name=NameNodeInfo
  – NameDirStatuses, DeadNodes, NumberOfMissingBlocks
» qry=Hadoop:service=NameNode,name=FSNamesystemState
  – FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
» qry=hadoop:service=JobTracker,name=JobTrackerInfo
  – Blacklisted TTs, jobs, slots_used, ThreadCount
» qry=java.lang:type=Memory
  – Used JVM, free JVM, etc.
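A custom JMX check of the kind described above might look roughly like the sketch below. It is not the speaker's script. The hostname, the port 50070, the thresholds, and all function names are assumptions for illustration. The sketch relies on the Hadoop `/jmx` servlet returning JSON with a top-level `beans` list and on `FSState` reading `Operational` when the NameNode is out of safe mode.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

FS_STATE_BEAN = "Hadoop:service=NameNode,name=FSNamesystemState"


def jmx_url(host: str, port: int, bean: str) -> str:
    """Build the /jmx query URL for one MBean."""
    query = urlencode({"qry": bean}, safe=":=,")
    return f"http://{host}:{port}/jmx?{query}"


def fetch_bean(host: str, port: int, bean: str, timeout: float = 10) -> dict:
    """Fetch one MBean as a dict; returns {} if the servlet reports no match."""
    with urlopen(jmx_url(host, port, bean), timeout=timeout) as resp:
        beans = json.load(resp)["beans"]
    return beans[0] if beans else {}


def check_namesystem(bean: dict,
                     max_under_replicated: int = 1000,
                     min_capacity_bytes: int = 10 * 1024**4) -> list[str]:
    """Compare FSNamesystemState attributes against hypothetical thresholds."""
    problems = []
    if bean.get("FSState") != "Operational":
        problems.append(f"FSState is {bean.get('FSState')!r}")
    if bean.get("NumDeadDataNodes", 0) > 0:
        problems.append(f"{bean['NumDeadDataNodes']} dead DataNodes")
    if bean.get("UnderReplicatedBlocks", 0) > max_under_replicated:
        problems.append(f"{bean['UnderReplicatedBlocks']} under-replicated blocks")
    if bean.get("CapacityRemaining", min_capacity_bytes) < min_capacity_bytes:
        problems.append("remaining capacity below threshold")
    return problems


def summarize(problems: list[str]):
    """Fold all findings into one aggregate alert to keep noise down."""
    if not problems:
        return None
    return f"NameNode: {len(problems)} issue(s): " + "; ".join(problems)


# Usage against a live NameNode (hostname and port are placeholders):
# print(summarize(check_namesystem(fetch_bean("nn.example.com", 50070, FS_STATE_BEAN))))
```

Returning a single summary string rather than one alert per attribute follows the slide's advice to reduce noise with aggregate alerts.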

MR Workload Alerting

» Monitor the MR workload and alert
  – In-house tool that uses the "houdah" Ruby gem monitors:
  – Long-running jobs, jobs with more map tasks, blacklisted TTs with more failure counts, etc.
» Collect details and auto-restart blacklisted TTs
» Parse the JT logfile for rogue jobs
» Parse the JT log and collect all job-related info
» White-elephant or hRaven could help
» Parse the scheduler HTML page or use the metrics page
  – http://<JT-hostname>:50030/scheduler?advanced
  – http://<JT-hostname>:50030/metrics
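The in-house tool is not shown in the deck, so the sketch below only illustrates the flagging step for the first two conditions listed above. It assumes job records have already been collected somehow (from the JT, its logs, or a tool such as hRaven). The `RunningJob` type, field names, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningJob:
    """One running MR job, as a collector might record it (hypothetical shape)."""
    job_id: str
    user: str
    start_epoch_s: int
    map_tasks: int


def flag_jobs(jobs, now_epoch_s, max_runtime_s=6 * 3600, max_map_tasks=50_000):
    """Return (job_id, reasons) for jobs that run too long or have too many maps."""
    flagged = []
    for job in jobs:
        reasons = []
        runtime_s = now_epoch_s - job.start_epoch_s
        if runtime_s > max_runtime_s:
            reasons.append(f"running for {runtime_s // 3600}h")
        if job.map_tasks > max_map_tasks:
            reasons.append(f"{job.map_tasks} map tasks")
        if reasons:
            flagged.append((job.job_id, reasons))
    return flagged


jobs = [
    RunningJob("job_a", "etl", start_epoch_s=0, map_tasks=10),
    RunningJob("job_b", "adhoc", start_epoch_s=99_000, map_tasks=60_000),
    RunningJob("job_c", "ops", start_epoch_s=99_000, map_tasks=10),
]
for job_id, reasons in flag_jobs(jobs, now_epoch_s=100_000):
    print(job_id, ", ".join(reasons))
```

Keeping the reasons as a list per job makes it easy to send one alert per job rather than one per condition.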

Multi Tenancy

[Diagram: workloads sharing the cluster: Modeling, OPS, ETL, Ad-hoc.]

Multi Tenancy

» Create separate queues
» Enable ACLs for queues
» Limit the number of jobs per user and per queue
» Set pre-emption timeouts based on priority
» Set weight based on priority
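As one way to picture these settings, the sketch below generates an allocation file in the style of the MR1 Fair Scheduler, where queues are called pools. The deck does not name the scheduler or give any values, so the choice of scheduler, the pool names (borrowed from the workloads on the previous slide), and every number are assumptions. Queue ACLs are configured separately and are not covered here.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-pool settings: higher-priority pools get a larger weight
# and a shorter pre-emption timeout (seconds).
POOLS = {
    "modeling": {"weight": 4.0, "max_running_jobs": 50, "preempt_timeout_s": 120},
    "etl":      {"weight": 3.0, "max_running_jobs": 40, "preempt_timeout_s": 300},
    "ops":      {"weight": 2.0, "max_running_jobs": 10, "preempt_timeout_s": 300},
    "adhoc":    {"weight": 1.0, "max_running_jobs": 20, "preempt_timeout_s": 900},
}


def build_allocations(pools: dict, user_max_jobs_default: int) -> str:
    """Render pool settings as an <allocations> document."""
    root = ET.Element("allocations")
    for name, cfg in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "weight").text = str(cfg["weight"])
        ET.SubElement(pool, "maxRunningJobs").text = str(cfg["max_running_jobs"])
        ET.SubElement(pool, "minSharePreemptionTimeout").text = str(cfg["preempt_timeout_s"])
    # Cap on concurrently running jobs for any user without an explicit limit.
    ET.SubElement(root, "userMaxJobsDefault").text = str(user_max_jobs_default)
    return ET.tostring(root, encoding="unicode")


print(build_allocations(POOLS, user_max_jobs_default=5))
```

Generating the file from a single table of pools keeps weights, job limits, and timeouts consistent with the stated priorities when a queue is added or reprioritized.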

Scheduling

No scheduler is perfect unless you understand and tune it properly


BCP

» BCP: Business Continuity Plan
» Near-real-time reporting over 15+ TB of daily data
» Freshness of models trained over petabytes of data

Data BCP Cluster

[Diagram: components shown are the US, EU, and HK serving clusters, the INW data cluster, the LSV data cluster, and an Amazon backup. Workloads shown over processed data are modeling, reporting, user queries, research, and ad-hoc queries.]

YARN

YARN splits the JobTracker's role into:

» Resource Manager
  – Global resource scheduler
  – Hierarchical queues
  – Application management
» Node Manager
  – Per-machine agent
  – Manages the life cycle of containers
  – Container resource monitoring
» Application Master
  – Per-application
  – Manages application scheduling and task execution

YARN at Rocket Fuel

» YARN in production
» 1000+ nodes
» 51TB RAM, 123K disks, 123K cores
» Primary use case: MapReduce
» HBase on YARN
» Tez, Spark, and Storm are in the race

We Are Hiring

THANKS

kishore@rocketfuel.com

Page 2: Big data summit

Proprietary amp Confidential Copyright copy 2014

The Web Is Monetized By Advertising

Proprietary amp Confidential Copyright copy 2014

Delivery Methods

raquoDisplayraquoVideoraquoMobileraquoSocial

Proprietary amp Confidential Copyright copy 2014

6 Ad Served

User Segments

3 Bid Reques

t

Overview

Publishers

2 Ad Request

1 Page Request

4 Bid amp Ad

User Engagemen

ts

Data Partners

Advertisers

Browser

Some Exchange Partners

Ad Exchange

Optimize

Rocket Fuel Platform

Real-time BidderAutomated Decisions

Models

Refresh learning

Data Store

Ads ampBudget

ModelScores

Events

5 RocketfuelWinning Ad

Proprietary amp Confidential Copyright copy 2014

$238965$06782$17234

$009$178964$16782$17234$0809$242125

$211$126

$2178$2056$0809$242125

$211$126$278$156

$1809$242125

$211$126$278$056$242125

$211$126$278

$0756$0809$242125

$211$126$278

$1256$1809$242125

$211$126$278

$0586$2009

125$211$126$278$156

$000

[ + ][ + ]

SitePageGeoWeatherTime of DayBrand AffinityUser

Always buying the best impressions amp serving the best ad

Real Time Bidding and Serving

Proprietary amp Confidential Copyright copy 2014

GoalLeadsamp sales

GoalCoupondownloads

GoalBrandawareness

SitePageGeoWeatherTime of DayBrand AffinityDemo

Impression ScorecardDemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-marketBehaviorResponse

Impression ScorecardDemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehaviorResponse X

Impression ScorecardDemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehaviorResponse

+100+40-20+20+15+10+40+35

+97

+40-70-20+10+15-25-40-18

+07

+10-10-20+20+10-35-25+10

+14

Real Time Bidding and Serving

Xuuml

Proprietary amp Confidential Copyright copy 2014

6 Ad Served

User Segments

3 Bid Reques

t

Overview

Publishers

2 Ad Request

1 Page Request

4 Bid amp Ad

User Engagemen

ts

Data Partners

Advertisers

Browser

Some Exchange Partners

Ad Exchange

Optimize

Rocket Fuel Platform

Real-time BidderAutomated Decisions

Models

Refresh learning

Data Store

Ads ampBudget

ModelScores

Events

5 RocketfuelWinning Ad

Proprietary amp Confidential Copyright copy 2014

Facebook likes

Searches on Google

Bid Requests Considered by Rocketfuel

5 B

6 B

45 B

Requests per day

Throughput

Proprietary amp Confidential Copyright copy 2014

Blink of an eye

SF to Tokyo network round trip

One beat of a hummindbirds wing

Look up in Blackbird

400

100

20

2

Time (ms)

Latency

Proprietary amp Confidential Copyright copy 2014

Architecture and Scale

raquoDatacentersraquoScaleraquoGrowthraquoArchitecture

Proprietary amp Confidential Copyright copy 2014

Data Center Expansion

raquoabc

Proprietary amp Confidential Copyright copy 2014

Data Center Design

bull Racks custom built at Rocket Fuelbull Leased spacebandwidth in colocation facilities

Hadoop Server24 2U servers (85kW)

Bidders40 2-U Twin 2 servers (17kW)

Proprietary amp Confidential Copyright copy 2014

Rocket Fuel Scale

raquo34474 CPU processor coresndash2655 serversndash1874 Teraflops of computing

raquo188 Terabytes of memoryndash13X the memory of IBM computer Watson that

played Jeopardy

raquo42PB Petabytes of storagendash106X the data volume of the entire Library of

Congress

Proprietary amp Confidential Copyright copy 2014

Hadoop at Rocket Fuel

raquo 1400 servers

raquo 15K Disks

raquo 15K Cores

raquo 90 TB

raquo 30K MR slots

raquo 12K daily MR jobs

Proprietary amp Confidential Copyright copy 2014

200 Servers 1400 Servers

1 Year

5 PB

41 PB8x

Growth

Proprietary amp Confidential Copyright copy 2014

Data Architecture 30

Proprietary amp Confidential Copyright copy 2014

Batch and Real Time Pipelines

Webservers

STORM

scribe

scribe

scribe

MySql

Zookeeper

Proprietary amp Confidential Copyright copy 2014

Hadoop Setup

QJM ZK Quorum

raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC

raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC

raquo same as DNrsquosraquo Dedicated disk

to ZK or JN

JT

Standby NN

ZKFCZKFC

Active NN

DNTT

DNTT

DNTT

DNTT

DNTT

DNTT

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you canraquo Adding a slave node to Hadoop cluster lt 120 secondsraquo Bringing up a new Hadoop cluster lt 500 secondsraquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsdatanodehandlercount dfsnamenodehandlercount

dfsdatanodemaxtransferthreads dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alerts gtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted TTrsquos

with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page httpltJT-hostnamegt50030scheduleradvanced httpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

Multi Tenancy

raquo create separate Queuesraquo Enable ACLrsquos for queuesraquo limit no of jobs per user and per queueraquo set pre-emption timeouts based on priorityraquo set weight based on priority

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Planraquo Near real time reporting over 15+ TB of daily dataraquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Quer

ies

Amazon Backup

LSV Data Clust

er

USEUHK Serving Clusters

Research

Ad-

hoc

Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

JobTracker

raquo Resource Manager - Global resource scheduler - Hierarchical queues - Application management

raquo Node Manager - Per-machine agent - Manages life cycle of container - Container resource monitoring

raquo Application Master - Per-application - Manages application scheduling and task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn in production raquo 1000+ nodesraquo 51TB RAM 123K disks 123K cores raquo Primary use case Map-Reduceraquo HBase on Yarnraquo Tez Spark Storm are in race

Proprietary amp Confidential Copyright copy 2014

We Are Hiring

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

  • Hadoop Operations Rocket Fuel
  • The Web Is Monetized By Advertising
  • Delivery Methods
  • Overview
  • Always buying the best impressions amp serving the best ad
  • Real Time Bidding and Serving
  • Overview (2)
  • Throughput
  • Latency
  • Architecture and Scale
  • Data Center Expansion
  • Data Center Design
  • Rocket Fuel Scale
  • Hadoop at Rocket Fuel
  • Growth
  • Data Architecture 30
  • Batch and Real Time Pipelines
  • Hadoop Setup
  • Operations
  • Maintenance is Not Easy
  • Puppet and Infradb
  • Performance Tuning
  • Performance Tuning (2)
  • Operations (2)
  • Monitoring
  • Monitoring (2)
  • MR Workload Monitoring
  • Network Monitoring
  • Alerting
  • Alerts
  • MR Workload Alerting
  • Multi Tenancy
  • Multi Tenancy (2)
  • Scheduling
  • Operations (3)
  • BCP
  • Data BCP Cluster
  • YARN
  • YARN at Rocket FueI
  • We Are Hiring
  • Slide 41
Page 3: Big data summit

Proprietary amp Confidential Copyright copy 2014

Delivery Methods

raquoDisplayraquoVideoraquoMobileraquoSocial

Proprietary amp Confidential Copyright copy 2014

6 Ad Served

User Segments

3 Bid Reques

t

Overview

Publishers

2 Ad Request

1 Page Request

4 Bid amp Ad

User Engagemen

ts

Data Partners

Advertisers

Browser

Some Exchange Partners

Ad Exchange

Optimize

Rocket Fuel Platform

Real-time BidderAutomated Decisions

Models

Refresh learning

Data Store

Ads ampBudget

ModelScores

Events

5 RocketfuelWinning Ad

Proprietary amp Confidential Copyright copy 2014

$238965$06782$17234

$009$178964$16782$17234$0809$242125

$211$126

$2178$2056$0809$242125

$211$126$278$156

$1809$242125

$211$126$278$056$242125

$211$126$278

$0756$0809$242125

$211$126$278

$1256$1809$242125

$211$126$278

$0586$2009

125$211$126$278$156

$000

[ + ][ + ]

SitePageGeoWeatherTime of DayBrand AffinityUser

Always buying the best impressions amp serving the best ad

Real Time Bidding and Serving

Proprietary amp Confidential Copyright copy 2014

GoalLeadsamp sales

GoalCoupondownloads

GoalBrandawareness

SitePageGeoWeatherTime of DayBrand AffinityDemo

Impression ScorecardDemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-marketBehaviorResponse

Impression ScorecardDemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehaviorResponse X

Impression ScorecardDemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehaviorResponse

+100+40-20+20+15+10+40+35

+97

+40-70-20+10+15-25-40-18

+07

+10-10-20+20+10-35-25+10

+14

Real Time Bidding and Serving

Xuuml

Proprietary amp Confidential Copyright copy 2014

6 Ad Served

User Segments

3 Bid Reques

t

Overview

Publishers

2 Ad Request

1 Page Request

4 Bid amp Ad

User Engagemen

ts

Data Partners

Advertisers

Browser

Some Exchange Partners

Ad Exchange

Optimize

Rocket Fuel Platform

Real-time BidderAutomated Decisions

Models

Refresh learning

Data Store

Ads ampBudget

ModelScores

Events

5 RocketfuelWinning Ad

Proprietary amp Confidential Copyright copy 2014

Facebook likes

Searches on Google

Bid Requests Considered by Rocketfuel

5 B

6 B

45 B

Requests per day

Throughput

Proprietary amp Confidential Copyright copy 2014

Blink of an eye

SF to Tokyo network round trip

One beat of a hummindbirds wing

Look up in Blackbird

400

100

20

2

Time (ms)

Latency

Proprietary amp Confidential Copyright copy 2014

Architecture and Scale

raquoDatacentersraquoScaleraquoGrowthraquoArchitecture

Proprietary amp Confidential Copyright copy 2014

Data Center Expansion

raquoabc

Proprietary amp Confidential Copyright copy 2014

Data Center Design

bull Racks custom built at Rocket Fuelbull Leased spacebandwidth in colocation facilities

Hadoop Server24 2U servers (85kW)

Bidders40 2-U Twin 2 servers (17kW)

Proprietary amp Confidential Copyright copy 2014

Rocket Fuel Scale

raquo34474 CPU processor coresndash2655 serversndash1874 Teraflops of computing

raquo188 Terabytes of memoryndash13X the memory of IBM computer Watson that

played Jeopardy

raquo42PB Petabytes of storagendash106X the data volume of the entire Library of

Congress

Proprietary amp Confidential Copyright copy 2014

Hadoop at Rocket Fuel

raquo 1400 servers

raquo 15K Disks

raquo 15K Cores

raquo 90 TB

raquo 30K MR slots

raquo 12K daily MR jobs

Proprietary amp Confidential Copyright copy 2014

200 Servers 1400 Servers

1 Year

5 PB

41 PB8x

Growth

Proprietary amp Confidential Copyright copy 2014

Data Architecture 30

Proprietary amp Confidential Copyright copy 2014

Batch and Real Time Pipelines

Webservers

STORM

scribe

scribe

scribe

MySql

Zookeeper

Proprietary amp Confidential Copyright copy 2014

Hadoop Setup

QJM ZK Quorum

raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC

raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC

raquo same as DNrsquosraquo Dedicated disk

to ZK or JN

JT

Standby NN

ZKFCZKFC

Active NN

DNTT

DNTT

DNTT

DNTT

DNTT

DNTT

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you canraquo Adding a slave node to Hadoop cluster lt 120 secondsraquo Bringing up a new Hadoop cluster lt 500 secondsraquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsdatanodehandlercount dfsnamenodehandlercount

dfsdatanodemaxtransferthreads dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alerts gtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted TTrsquos

with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page httpltJT-hostnamegt50030scheduleradvanced httpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

Multi Tenancy

raquo create separate Queuesraquo Enable ACLrsquos for queuesraquo limit no of jobs per user and per queueraquo set pre-emption timeouts based on priorityraquo set weight based on priority

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Planraquo Near real time reporting over 15+ TB of daily dataraquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Quer

ies

Amazon Backup

LSV Data Clust

er

USEUHK Serving Clusters

Research

Ad-

hoc

Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

JobTracker

raquo Resource Manager - Global resource scheduler - Hierarchical queues - Application management

raquo Node Manager - Per-machine agent - Manages life cycle of container - Container resource monitoring

raquo Application Master - Per-application - Manages application scheduling and task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn in production raquo 1000+ nodesraquo 51TB RAM 123K disks 123K cores raquo Primary use case Map-Reduceraquo HBase on Yarnraquo Tez Spark Storm are in race

Proprietary amp Confidential Copyright copy 2014

We Are Hiring

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

  • Hadoop Operations Rocket Fuel
  • The Web Is Monetized By Advertising
  • Delivery Methods
  • Overview
  • Always buying the best impressions amp serving the best ad
  • Real Time Bidding and Serving
  • Overview (2)
  • Throughput
  • Latency
  • Architecture and Scale
  • Data Center Expansion
  • Data Center Design
  • Rocket Fuel Scale
  • Hadoop at Rocket Fuel
  • Growth
  • Data Architecture 30
  • Batch and Real Time Pipelines
  • Hadoop Setup
  • Operations
  • Maintenance is Not Easy
  • Puppet and Infradb
  • Performance Tuning
  • Performance Tuning (2)
  • Operations (2)
  • Monitoring
  • Monitoring (2)
  • MR Workload Monitoring
  • Network Monitoring
  • Alerting
  • Alerts
  • MR Workload Alerting
  • Multi Tenancy
  • Multi Tenancy (2)
  • Scheduling
  • Operations (3)
  • BCP
  • Data BCP Cluster
  • YARN
  • YARN at Rocket FueI
  • We Are Hiring
  • Slide 41
Page 4: Big data summit

Proprietary amp Confidential Copyright copy 2014

6 Ad Served

User Segments

3 Bid Reques

t

Overview

Publishers

2 Ad Request

1 Page Request

4 Bid amp Ad

User Engagemen

ts

Data Partners

Advertisers

Browser

Some Exchange Partners

Ad Exchange

Optimize

Rocket Fuel Platform

Real-time BidderAutomated Decisions

Models

Refresh learning

Data Store

Ads ampBudget

ModelScores

Events

5 RocketfuelWinning Ad

Proprietary amp Confidential Copyright copy 2014

$238965$06782$17234

$009$178964$16782$17234$0809$242125

$211$126

$2178$2056$0809$242125

$211$126$278$156

$1809$242125

$211$126$278$056$242125

$211$126$278

$0756$0809$242125

$211$126$278

$1256$1809$242125

$211$126$278

$0586$2009

125$211$126$278$156

$000

[ + ][ + ]

SitePageGeoWeatherTime of DayBrand AffinityUser

Always buying the best impressions amp serving the best ad

Real Time Bidding and Serving

Proprietary amp Confidential Copyright copy 2014

GoalLeadsamp sales

GoalCoupondownloads

GoalBrandawareness

SitePageGeoWeatherTime of DayBrand AffinityDemo

Impression ScorecardDemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-marketBehaviorResponse

Impression ScorecardDemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehaviorResponse X

Impression ScorecardDemoBrand AffinityTime of DayGeoWeatherSitePageAd PositionIn-MarketBehaviorResponse

+100+40-20+20+15+10+40+35

+97

+40-70-20+10+15-25-40-18

+07

+10-10-20+20+10-35-25+10

+14

Real Time Bidding and Serving

Xuuml

Throughput

Requests per day:
» Facebook likes: 5 B
» Searches on Google: 6 B
» Bid Requests Considered by Rocketfuel: 45 B

Latency

Time (ms):
» Blink of an eye: 400
» SF to Tokyo network round trip: 100
» One beat of a hummingbird's wing: 20
» Look up in Blackbird: 2

Architecture and Scale

» Datacenters
» Scale
» Growth
» Architecture

Data Center Expansion

» abc

Data Center Design

• Racks custom built at Rocket Fuel
• Leased space/bandwidth in colocation facilities

Hadoop Server: 24 2U servers (85kW)

Bidders: 40 2-U Twin 2 servers (17kW)

Rocket Fuel Scale

» 34474 CPU processor cores
  – 2655 servers
  – 1874 Teraflops of computing
» 188 Terabytes of memory
  – 13X the memory of IBM computer Watson that played Jeopardy
» 42 Petabytes of storage
  – 106X the data volume of the entire Library of Congress

Hadoop at Rocket Fuel

» 1400 servers
» 15K Disks
» 15K Cores
» 90 TB
» 30K MR slots
» 12K daily MR jobs

Growth

1 Year:
» Servers: 200 → 1400
» Storage: 5 PB → 41 PB (8x)

Data Architecture 30

Batch and Real Time Pipelines

[Diagram components: Webservers, scribe, Storm, MySQL, ZooKeeper]

Hadoop Setup

[Diagram: Active NN and Standby NN, each with a ZKFC; JT; QJM / ZK Quorum; six DN/TT nodes]

Node specs shown on the slide:
» 6x2TB Disks, 2x6 core, 196 GB RAM, 2x1G NIC
» 12x3TB Disks, 2x6 core, 64 GB RAM, 10G NIC
» Same as DN's, with a dedicated disk for ZK or JN

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Maintenance is Not Easy

Puppet + Infradb

Automation is key

Puppet and Infradb

» Automate as much as you can
» Adding a slave node to Hadoop cluster: < 120 seconds
» Bringing up a new Hadoop cluster: < 500 seconds
» MR slots are automatically determined based on hardware config

Isn't it cool?

Just define once
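
The slide does not say how MR slots are derived from the hardware config, so the following is only a minimal sketch of one plausible rule. The names `NodeHardware` and `mr_slots`, the memory reservation, the per-slot memory, and the 2:1 map/reduce split are all assumptions made for illustration. The default of two slots per core simply mirrors the ratio of 30K MR slots to 15K cores quoted on the "Hadoop at Rocket Fuel" slide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NodeHardware:
    """Hardware facts for one worker node (hypothetical structure)."""
    cores: int
    ram_gb: int


def mr_slots(hw: NodeHardware, slots_per_core: int = 2,
             reserved_gb: int = 8, gb_per_slot: int = 2) -> Tuple[int, int]:
    """Return (map_slots, reduce_slots) for a node.

    Assumed heuristic: the slot count is capped both by CPU cores and by the
    RAM left after reserving memory for the OS and the Hadoop daemons.
    The total is then split roughly 2:1 between map and reduce slots.
    """
    by_cpu = hw.cores * slots_per_core
    by_memory = max(hw.ram_gb - reserved_gb, 0) // gb_per_slot
    total = min(by_cpu, by_memory)
    map_slots = (total * 2) // 3
    return map_slots, total - map_slots


if __name__ == "__main__":
    # A 2x6-core, 64 GB node like the one on the "Hadoop Setup" slide.
    print(mr_slots(NodeHardware(cores=12, ram_gb=64)))
```

A configuration management tool could evaluate a rule of this kind per node and write the result into the TaskTracker configuration, which is one way to "define once" and let the hardware decide.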

Performance Tuning

No issues when the cluster is small. Problems start when it grows.

Performance Tuning

HDFS
» dfs.datanode.handler.count
» dfs.namenode.handler.count
» dfs.datanode.max.transfer.threads
» dfs.image.transfer.timeout

MR
» mapred.reduce.parallel.copies
» mapreduce.reduce.shuffle.parallelcopies
» mapred.job.tracker.handler.count
» io.sort.mb
» io.sort.factor
» IMP: MAPREDUCE-2026

ZK
» maxClientCnxns

JVM
» -XX:+UseConcMarkSweepGC
» -XX:CMSFullGCsBeforeCompaction=1
» -XX:CMSInitiatingOccupancyFraction=60

ha-timeoutms

Monitoring

Wall of Ops

Monitoring

hadoop.namenode.CallQueueLength
hadoop.jobtracker.jvm.memheapusedm

Don't fly blind, you will crash

MR Workload Monitoring

Network Monitoring

Don't blame the network; monitor it instead. A network mesh can be a mess.

Alerting

Monitoring is not enough; you need better alerting

Alerts

>> Checking whether NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We heavily rely on custom scripts that query JMX for NN and JT

http://hostname:port/jmx

» ?qry=Hadoop:service=NameNode,name=NameNodeInfo
  – NameDirStatuses, DeadNodes, NumberOfMissingBlocks
» ?qry=Hadoop:service=NameNode,name=FSNamesystemState
  – FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks
» ?qry=hadoop:service=JobTracker,name=JobTrackerInfo
  – Blacklisted TT's, jobs, slots_used, ThreadCount
» ?qry=java.lang:type=Memory
  – Used JVM, free JVM, etc.
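
The custom scripts themselves are not shown in the talk. As a rough sketch of the idea, the snippet below fetches beans from the `/jmx` servlet, compares a few of the attributes named above against limits, and folds all findings into one summary message to keep alert noise down. The function names and the threshold values are assumptions for illustration; only the servlet path, the query strings, and the attribute names come from the slide.

```python
import json
from urllib.request import urlopen

# Illustrative limits only; the talk does not give actual thresholds.
DEFAULT_LIMITS = {
    "NumberOfMissingBlocks": 0,
    "NumDeadDataNodes": 5,
    "UnderReplicatedBlocks": 1000,
}


def fetch_beans(host: str, port: int, query: str) -> list:
    """Fetch the beans matching `query` from the Hadoop /jmx servlet.

    Example query: "Hadoop:service=NameNode,name=FSNamesystemState".
    """
    url = f"http://{host}:{port}/jmx?qry={query}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp).get("beans", [])


def check_beans(beans: list, limits: dict = DEFAULT_LIMITS) -> list:
    """Return one line per numeric attribute that exceeds its limit."""
    problems = []
    for bean in beans:
        for attr, limit in limits.items():
            value = bean.get(attr)
            if isinstance(value, (int, float)) and value > limit:
                name = bean.get("name", "?")
                problems.append(f"{name}: {attr}={value} (limit {limit})")
    return problems


def summarize(problems: list) -> str:
    """Collapse all findings into a single aggregate alert message."""
    if not problems:
        return "OK"
    return f"CRITICAL: {len(problems)} issue(s); " + "; ".join(problems)
```

Keeping `fetch_beans` separate from `check_beans` means the checking logic can be exercised with a saved JSON payload, without a live NameNode or JobTracker.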

MR Workload Alerting

» Monitoring MR workload and alert
  – In-house tool that uses the "houdah" ruby gem monitors
  – Long running jobs, jobs with more map tasks, blacklisted TT's with more failure counts, etc.
» Collect details and auto-restart blacklisted TT's
» Parse the JT logfile for rogue jobs
» Parse the JT log and collect all job related info
» White-elephant or hraven could help
» Parse the scheduler html page or use the metrics page
  http://<JT-hostname>:50030/scheduler?advanced
  http://<JT-hostname>:50030/metrics
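
The in-house tool is not described beyond the bullets above, so this is only a small sketch of the kind of rule it might apply once job details have been collected (for example from the JT log). `JobInfo`, `flag_jobs`, and both thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobInfo:
    """Minimal job record; the fields are assumed, not taken from the talk."""
    job_id: str
    user: str
    runtime_s: int
    map_tasks: int


def flag_jobs(jobs: List[JobInfo], max_runtime_s: int = 6 * 3600,
              max_map_tasks: int = 50000) -> List[str]:
    """Return an alert line for each long-running or oversized job."""
    alerts = []
    for job in jobs:
        reasons = []
        if job.runtime_s > max_runtime_s:
            reasons.append(f"running {job.runtime_s // 3600}h")
        if job.map_tasks > max_map_tasks:
            reasons.append(f"{job.map_tasks} map tasks")
        if reasons:
            alerts.append(f"{job.job_id} ({job.user}): " + ", ".join(reasons))
    return alerts
```

Emitting one line per offending job, with every reason attached, keeps the output easy to fold into a single summary alert.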

Multi Tenancy

[Diagram: tenants sharing the cluster: Modeling, OPS, ETL, Ad-hoc]

Multi Tenancy

» Create separate queues
» Enable ACLs for queues
» Limit the number of jobs per user and per queue
» Set pre-emption timeouts based on priority
» Set weight based on priority

Scheduling

No scheduler is perfect unless you understand and tune it properly

BCP

» BCP: Business Continuity Plan
» Near real time reporting over 15+ TB of daily data
» Freshness of models trained over petabytes of data

Data BCP Cluster

[Diagram: INW Data Cluster and LSV Data Cluster; US, EU, and HK Serving Clusters; Amazon Backup; Processed Data]

Workloads shown: Modeling, Reporting, User Queries, Research, Ad-hoc Queries

YARN

JobTracker

» Resource Manager
  - Global resource scheduler
  - Hierarchical queues
  - Application management
» Node Manager
  - Per-machine agent
  - Manages life cycle of container
  - Container resource monitoring
» Application Master
  - Per-application
  - Manages application scheduling and task execution

YARN at Rocket Fuel

» Yarn in production
» 1000+ nodes
» 51TB RAM, 123K disks, 123K cores
» Primary use case: Map-Reduce
» HBase on Yarn
» Tez, Spark, Storm are in race

We Are Hiring

THANKS

kishore@rocketfuel.com

Page 7: Big data summit

Proprietary amp Confidential Copyright copy 2014

6 Ad Served

User Segments

3 Bid Reques

t

Overview

Publishers

2 Ad Request

1 Page Request

4 Bid amp Ad

User Engagemen

ts

Data Partners

Advertisers

Browser

Some Exchange Partners

Ad Exchange

Optimize

Rocket Fuel Platform

Real-time BidderAutomated Decisions

Models

Refresh learning

Data Store

Ads ampBudget

ModelScores

Events

5 RocketfuelWinning Ad

Proprietary amp Confidential Copyright copy 2014

Facebook likes

Searches on Google

Bid Requests Considered by Rocketfuel

5 B

6 B

45 B

Requests per day

Throughput

Proprietary amp Confidential Copyright copy 2014

Blink of an eye

SF to Tokyo network round trip

One beat of a hummindbirds wing

Look up in Blackbird

400

100

20

2

Time (ms)

Latency

Proprietary amp Confidential Copyright copy 2014

Architecture and Scale

raquoDatacentersraquoScaleraquoGrowthraquoArchitecture

Proprietary amp Confidential Copyright copy 2014

Data Center Expansion

raquoabc

Proprietary amp Confidential Copyright copy 2014

Data Center Design

bull Racks custom built at Rocket Fuelbull Leased spacebandwidth in colocation facilities

Hadoop Server24 2U servers (85kW)

Bidders40 2-U Twin 2 servers (17kW)

Proprietary amp Confidential Copyright copy 2014

Rocket Fuel Scale

raquo34474 CPU processor coresndash2655 serversndash1874 Teraflops of computing

raquo188 Terabytes of memoryndash13X the memory of IBM computer Watson that

played Jeopardy

raquo42PB Petabytes of storagendash106X the data volume of the entire Library of

Congress

Proprietary amp Confidential Copyright copy 2014

Hadoop at Rocket Fuel

raquo 1400 servers

raquo 15K Disks

raquo 15K Cores

raquo 90 TB

raquo 30K MR slots

raquo 12K daily MR jobs

Proprietary amp Confidential Copyright copy 2014

200 Servers 1400 Servers

1 Year

5 PB

41 PB8x

Growth

Proprietary amp Confidential Copyright copy 2014

Data Architecture 30

Proprietary amp Confidential Copyright copy 2014

Batch and Real Time Pipelines

Webservers

STORM

scribe

scribe

scribe

MySql

Zookeeper

Proprietary amp Confidential Copyright copy 2014

Hadoop Setup

QJM ZK Quorum

raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC

raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC

raquo same as DNrsquosraquo Dedicated disk

to ZK or JN

JT

Standby NN

ZKFCZKFC

Active NN

DNTT

DNTT

DNTT

DNTT

DNTT

DNTT

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you canraquo Adding a slave node to Hadoop cluster lt 120 secondsraquo Bringing up a new Hadoop cluster lt 500 secondsraquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsdatanodehandlercount dfsnamenodehandlercount

dfsdatanodemaxtransferthreads dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alerts gtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted TTrsquos

with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page httpltJT-hostnamegt50030scheduleradvanced httpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

Multi Tenancy

raquo create separate Queuesraquo Enable ACLrsquos for queuesraquo limit no of jobs per user and per queueraquo set pre-emption timeouts based on priorityraquo set weight based on priority

Scheduling

No scheduler is perfect unless you understand and tune it properly.

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

BCP

» BCP: Business Continuity Plan
» Near-real-time reporting over 15+ TB of daily data
» Freshness of models trained over petabytes of data

Data BCP Cluster

[Diagram components: INW Data Cluster; LSV Data Cluster; US, EU, and HK Serving Clusters; Amazon Backup. Labels: Modeling, Reporting, User Queries, Research, Ad-hoc Queries, Processed Data.]

YARN

JobTracker

» Resource Manager
  – Global resource scheduler
  – Hierarchical queues
  – Application management
» Node Manager
  – Per-machine agent
  – Manages the life cycle of containers
  – Container resource monitoring
» Application Master
  – Per-application
  – Manages application scheduling and task execution

YARN at Rocket Fuel

» YARN in production
» 1000+ nodes
» 51TB RAM, 123K disks, 123K cores
» Primary use case: Map-Reduce
» HBase on YARN
» Tez, Spark, and Storm are in the race

We Are Hiring

THANKS

kishore@rocketfuel.com

  • Hadoop Operations Rocket Fuel
  • The Web Is Monetized By Advertising
  • Delivery Methods
  • Overview
  • Always buying the best impressions & serving the best ad
  • Real Time Bidding and Serving
  • Overview (2)
  • Throughput
  • Latency
  • Architecture and Scale
  • Data Center Expansion
  • Data Center Design
  • Rocket Fuel Scale
  • Hadoop at Rocket Fuel
  • Growth
  • Data Architecture 30
  • Batch and Real Time Pipelines
  • Hadoop Setup
  • Operations
  • Maintenance is Not Easy
  • Puppet and Infradb
  • Performance Tuning
  • Performance Tuning (2)
  • Operations (2)
  • Monitoring
  • Monitoring (2)
  • MR Workload Monitoring
  • Network Monitoring
  • Alerting
  • Alerts
  • MR Workload Alerting
  • Multi Tenancy
  • Multi Tenancy (2)
  • Scheduling
  • Operations (3)
  • BCP
  • Data BCP Cluster
  • YARN
  • YARN at Rocket Fuel
  • We Are Hiring
  • Slide 41
Page 8: Big data summit

Throughput

Requests per day:
» Facebook likes: 5 B
» Searches on Google: 6 B
» Bid requests considered by Rocket Fuel: 45 B
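As a quick back-of-the-envelope check on what 45 B bid requests per day means per second, the Python sketch below averages the daily figure evenly over the day. Real traffic has peaks, so this is a lower bound on the peak rate, not a measured number.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(requests_per_day):
    """Average request rate per second for a given daily volume."""
    return requests_per_day / SECONDS_PER_DAY
```

With the slide's figure, per_second(45e9) comes to a little over half a million bid requests per second on average.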

Latency

Time (ms):
» Blink of an eye: 400
» SF to Tokyo network round trip: 100
» One beat of a hummingbird's wing: 20
» Lookup in Blackbird: 2

Architecture and Scale

» Datacenters
» Scale
» Growth
» Architecture

Data Center Expansion

» abc

Data Center Design

• Racks custom built at Rocket Fuel
• Leased space/bandwidth in colocation facilities

Hadoop Server: 24 2U servers (85kW)

Bidders: 40 2-U Twin 2 servers (17kW)

Rocket Fuel Scale

» 34474 CPU processor cores
  – 2655 servers
  – 1874 Teraflops of computing
» 188 Terabytes of memory
  – 13X the memory of IBM's Watson, the computer that played Jeopardy
» 42PB of storage
  – 106X the data volume of the entire Library of Congress

Hadoop at Rocket Fuel

» 1400 servers
» 15K Disks
» 15K Cores
» 90 TB
» 30K MR slots
» 12K daily MR jobs

Growth

In 1 year:
» 200 servers to 1400 servers
» 5 PB to 41 PB (8x)

Data Architecture 30

Batch and Real Time Pipelines

[Diagram components: Webservers, scribe (three instances), STORM, MySql, Zookeeper]

Hadoop Setup

[Diagram components: Active NN and Standby NN, each with a ZKFC; JT; QJM / ZK quorum; six DN/TT nodes]

Hardware profiles shown in the diagram:
» 6x2TB disks, 2x6 cores, 196 GB RAM, 2x1G NIC
» 12x3TB disks, 2x6 cores, 64 GB RAM, 10G NIC
» Same as the DNs, with a dedicated disk for ZK or JN

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Maintenance is Not Easy

Puppet + Infradb

Automation is key.

Puppet and Infradb

» Automate as much as you can
» Adding a slave node to a Hadoop cluster: < 120 seconds
» Bringing up a new Hadoop cluster: < 500 seconds
» MR slots are automatically determined based on hardware config

Isn't it cool?

Just define once.
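The deck does not say how the MR slot counts are derived from the hardware config. The Python sketch below shows one possible heuristic, purely as an illustration: reserve some cores and RAM for the OS and daemons, cap the slot count by both CPU and memory, and give roughly two thirds of the slots to map tasks. Every constant here is an assumption, not Rocket Fuel's actual rule.

```python
def mr_slots(cores, ram_gb, reserved_cores=2, reserved_ram_gb=8, ram_per_slot_gb=2):
    """Derive (map_slots, reduce_slots) from a node's cores and RAM.

    Hypothetical heuristic: two slots per usable core, capped by how many
    slots of ram_per_slot_gb fit in the usable RAM; about 2/3 go to maps.
    """
    usable_cores = max(cores - reserved_cores, 1)
    usable_ram_gb = max(ram_gb - reserved_ram_gb, ram_per_slot_gb)
    total = max(min(usable_cores * 2, usable_ram_gb // ram_per_slot_gb), 2)
    map_slots = max(total * 2 // 3, 1)
    reduce_slots = max(total - map_slots, 1)
    return map_slots, reduce_slots
```

In a Puppet-style setup, a function like this would run once per node from inventory facts, so that the slot settings never have to be written by hand for each hardware profile.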

Performance Tuning

No issues when the cluster is small. Problems start when it grows.

Performance Tuning

HDFS
» dfs.datanode.handler.count, dfs.namenode.handler.count
» dfs.datanode.max.transfer.threads, dfs.image.transfer.timeout

MR
» mapred.reduce.parallel.copies
» mapreduce.reduce.shuffle.parallelcopies
» mapred.job.tracker.handler.count
» io.sort.mb, io.sort.factor
» IMP: MAPREDUCE-2026

ZK
» maxClientCnxns
» ha-timeoutms

JVM
» -XX:+UseConcMarkSweepGC
» -XX:CMSFullGCsBeforeCompaction=1
» -XX:CMSInitiatingOccupancyFraction=60
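The slide names the properties but not the values used. As a sketch of how such overrides can be kept in one place and rendered into a Hadoop *-site.xml file, the Python helper below turns a dict into the standard configuration/property/name/value layout. The helper name is hypothetical, and any values passed to it are placeholders for illustration, not tuning recommendations.

```python
import xml.etree.ElementTree as ET


def render_site_xml(props):
    """Render a {name: value} dict as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in sorted(props.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For example, render_site_xml({"dfs.namenode.handler.count": 64, "dfs.datanode.handler.count": 16}) produces two property entries; the numbers 64 and 16 are arbitrary placeholders.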

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Monitoring

Wall of Ops

Page 11: Big data summit

Proprietary amp Confidential Copyright copy 2014

Data Center Expansion

raquoabc

Proprietary amp Confidential Copyright copy 2014

Data Center Design

bull Racks custom built at Rocket Fuelbull Leased spacebandwidth in colocation facilities

Hadoop Server24 2U servers (85kW)

Bidders40 2-U Twin 2 servers (17kW)

Proprietary amp Confidential Copyright copy 2014

Rocket Fuel Scale

raquo34474 CPU processor coresndash2655 serversndash1874 Teraflops of computing

raquo188 Terabytes of memoryndash13X the memory of IBM computer Watson that

played Jeopardy

raquo42PB Petabytes of storagendash106X the data volume of the entire Library of

Congress

Proprietary amp Confidential Copyright copy 2014

Hadoop at Rocket Fuel

raquo 1400 servers

raquo 15K Disks

raquo 15K Cores

raquo 90 TB

raquo 30K MR slots

raquo 12K daily MR jobs

Proprietary amp Confidential Copyright copy 2014

200 Servers 1400 Servers

1 Year

5 PB

41 PB8x

Growth

Proprietary amp Confidential Copyright copy 2014

Data Architecture 30

Proprietary amp Confidential Copyright copy 2014

Batch and Real Time Pipelines

Webservers

STORM

scribe

scribe

scribe

MySql

Zookeeper

Proprietary amp Confidential Copyright copy 2014

Hadoop Setup

QJM ZK Quorum

raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC

raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC

raquo same as DNrsquosraquo Dedicated disk

to ZK or JN

JT

Standby NN

ZKFCZKFC

Active NN

DNTT

DNTT

DNTT

DNTT

DNTT

DNTT

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you canraquo Adding a slave node to Hadoop cluster lt 120 secondsraquo Bringing up a new Hadoop cluster lt 500 secondsraquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsdatanodehandlercount dfsnamenodehandlercount

dfsdatanodemaxtransferthreads dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alerts gtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted TTrsquos

with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page httpltJT-hostnamegt50030scheduleradvanced httpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

Multi Tenancy

raquo create separate Queuesraquo Enable ACLrsquos for queuesraquo limit no of jobs per user and per queueraquo set pre-emption timeouts based on priorityraquo set weight based on priority

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Planraquo Near real time reporting over 15+ TB of daily dataraquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Quer

ies

Amazon Backup

LSV Data Clust

er

USEUHK Serving Clusters

Research

Ad-

hoc

Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

JobTracker

raquo Resource Manager - Global resource scheduler - Hierarchical queues - Application management

raquo Node Manager - Per-machine agent - Manages life cycle of container - Container resource monitoring

raquo Application Master - Per-application - Manages application scheduling and task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn in production raquo 1000+ nodesraquo 51TB RAM 123K disks 123K cores raquo Primary use case Map-Reduceraquo HBase on Yarnraquo Tez Spark Storm are in race

Proprietary amp Confidential Copyright copy 2014

We Are Hiring

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

  • Hadoop Operations Rocket Fuel
  • The Web Is Monetized By Advertising
  • Delivery Methods
  • Overview
  • Always buying the best impressions & serving the best ad
  • Real Time Bidding and Serving
  • Overview (2)
  • Throughput
  • Latency
  • Architecture and Scale
  • Data Center Expansion
  • Data Center Design
  • Rocket Fuel Scale
  • Hadoop at Rocket Fuel
  • Growth
  • Data Architecture 3.0
  • Batch and Real Time Pipelines
  • Hadoop Setup
  • Operations
  • Maintenance is Not Easy
  • Puppet and Infradb
  • Performance Tuning
  • Performance Tuning (2)
  • Operations (2)
  • Monitoring
  • Monitoring (2)
  • MR Workload Monitoring
  • Network Monitoring
  • Alerting
  • Alerts
  • MR Workload Alerting
  • Multi Tenancy
  • Multi Tenancy (2)
  • Scheduling
  • Operations (3)
  • BCP
  • Data BCP Cluster
  • YARN
  • YARN at Rocket Fuel
  • We Are Hiring
  • Slide 41
Data Center Design

• Racks custom built at Rocket Fuel
• Leased space/bandwidth in colocation facilities

Hadoop servers: 24 2U servers (85kW)

Bidders: 40 2U Twin 2 servers (17kW)

Rocket Fuel Scale

» 34,474 CPU processor cores
  – 2,655 servers
  – 1874 Teraflops of computing

» 188 Terabytes of memory
  – 13X the memory of Watson, the IBM computer that played Jeopardy

» 42 PB of storage
  – 106X the data volume of the entire Library of Congress

Hadoop at Rocket Fuel

» 1400 servers
» 15K disks
» 15K cores
» 90 TB
» 30K MR slots
» 12K daily MR jobs

Growth

In 1 year: 200 servers to 1400 servers; 5 PB to 41 PB (8x)
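
As a quick check of the growth figures above, the ratios can be computed directly. The sketch assumes the slide's "8x" refers to the storage increase, which the slide layout suggests but does not state.

```python
servers_before, servers_after = 200, 1400
storage_before_pb, storage_after_pb = 5, 41

# Growth factor = size after one year divided by size before.
server_growth = servers_after / servers_before
storage_growth = storage_after_pb / storage_before_pb

print(f"servers: {server_growth:.1f}x, storage: {storage_growth:.1f}x")
# servers: 7.0x, storage: 8.2x
```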

Data Architecture 3.0

Batch and Real Time Pipelines

[Diagram labels: Webservers, Storm, Scribe, MySQL, ZooKeeper]

Hadoop Setup

[Diagram labels: QJM / ZK Quorum, JT, Active NN, Standby NN, ZKFC (x2), DN/TT (x6)]

» 6x2TB disks
» 2x6 core
» 196 GB RAM
» 2x1G NIC

» 12x3TB disks
» 2x6 core
» 64 GB RAM
» 10G NIC

» Same as DNs
» Dedicated disk to ZK or JN

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Maintenance is Not Easy

Puppet + Infradb

Automation is key

Puppet and Infradb

» Automate as much as you can
» Adding a slave node to a Hadoop cluster: < 120 seconds
» Bringing up a new Hadoop cluster: < 500 seconds
» MR slots are automatically determined based on hardware config

Isn't it cool?

Just define once
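
To make the idea of deriving MR slots from the hardware config concrete, here is a minimal sketch. The heuristic is made up for illustration (reserved memory, memory per slot, two slots per core, a 2:1 map-to-reduce split); the slides do not describe the rule the Puppet setup actually applies, and `derive_mr_slots` is a hypothetical helper.

```python
def derive_mr_slots(cores, ram_gb, gb_per_slot=2, reserved_gb=8):
    """Return (map_slots, reduce_slots) for one worker, using a made-up heuristic."""
    # Leave memory for the OS and the DataNode/TaskTracker daemons.
    by_memory = max(ram_gb - reserved_gb, 0) // gb_per_slot
    # Allow up to two task slots per core.
    by_cpu = cores * 2
    total = min(by_memory, by_cpu)
    # Give two thirds of the slots to maps, the rest to reduces.
    map_slots = total * 2 // 3
    reduce_slots = total - map_slots
    return map_slots, reduce_slots


print(derive_mr_slots(cores=12, ram_gb=64))  # (16, 8)
```

Keeping the rule in one function is what allows the hardware profile to be defined once and the slot counts to follow from it.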

Performance Tuning

No issues when the cluster is small. Problems start when it grows.

Performance Tuning

HDFS
» dfs.datanode.handler.count
» dfs.namenode.handler.count
» dfs.datanode.max.transfer.threads
» dfs.image.transfer.timeout

MR
» mapred.reduce.parallel.copies
» mapreduce.reduce.shuffle.parallelcopies
» mapred.job.tracker.handler.count
» io.sort.mb
» io.sort.factor
» Important: MAPREDUCE-2026

ZK
» maxClientCnxns

JVM
» -XX:+UseConcMarkSweepGC
» -XX:CMSFullGCsBeforeCompaction=1
» -XX:CMSInitiatingOccupancyFraction=60

ha-timeoutms
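
Parameters like the HDFS ones above are set as name/value properties in the Hadoop site files. The sketch below renders a few of them in that layout; the values are placeholders chosen for illustration, not recommendations and not Rocket Fuel's settings, and `render_site_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder values for illustration only; they are NOT recommendations.
HDFS_OVERRIDES = {
    "dfs.namenode.handler.count": "64",
    "dfs.datanode.handler.count": "16",
    "dfs.datanode.max.transfer.threads": "8192",
}


def render_site_xml(overrides):
    """Render name/value pairs in the Hadoop *-site.xml <configuration> layout."""
    root = ET.Element("configuration")
    for name, value in sorted(overrides.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


site_xml = render_site_xml(HDFS_OVERRIDES)
print(site_xml)
```

Generating the file from a single dictionary fits the deck's "just define once" approach, since the same source can feed every node's configuration.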

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Monitoring

Wall of Ops

Monitoring

[Graphs: hadoop.namenode.CallQueueLength, hadoop.jobtracker.jvm.memheapusedm]

Don't fly blind, you will crash

MR Workload Monitoring

Network Monitoring

Don't blame the network; monitor it instead. A network mesh can be a mess.

Alerting

Monitoring is not enough; you need better alerting.

Alerts

>> Checking whether NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We heavily rely on custom scripts that query JMX for NN and JT

http://hostname:port/jmx

qry=Hadoop:service=NameNode,name=NameNodeInfo
  NameDirStatuses, DeadNodes, NumberOfMissingBlocks

qry=Hadoop:service=NameNode,name=FSNamesystemState
  FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks

qry=hadoop:service=JobTracker,name=JobTrackerInfo
  Blacklisted TTs, jobs, slots_used, ThreadCount

qry=java.lang:type=Memory
  Used JVM, free JVM, etc.
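
A custom check along the lines described above might look like the following sketch. It builds the /jmx query URL and folds two NameNode conditions into a single summary alert. The host name, port, thresholds, and sample payload are assumptions; the sample is hand-written to resemble the JSON "beans" list the /jmx servlet returns, so no live cluster is contacted.

```python
import json
from urllib.parse import urlencode


def jmx_url(host, port, bean):
    """Build a /jmx query URL for one MBean."""
    return f"http://{host}:{port}/jmx?" + urlencode({"qry": bean})


def summarize(jmx_json, max_under_replicated=1000):
    """Collapse several NameNode checks into one aggregate alert line."""
    bean = json.loads(jmx_json)["beans"][0]
    problems = []
    if bean.get("NumDeadDataNodes", 0) > 0:
        problems.append(f"{bean['NumDeadDataNodes']} dead DataNodes")
    if bean.get("UnderReplicatedBlocks", 0) > max_under_replicated:
        problems.append(f"{bean['UnderReplicatedBlocks']} under-replicated blocks")
    return "OK" if not problems else "ALERT: " + "; ".join(problems)


# Hand-written sample payload standing in for a live NameNode response.
SAMPLE = json.dumps({"beans": [{
    "name": "Hadoop:service=NameNode,name=FSNamesystemState",
    "FSState": "Operational",
    "NumDeadDataNodes": 2,
    "UnderReplicatedBlocks": 4500,
}]})

# nn.example.com and 50070 are placeholder values.
print(jmx_url("nn.example.com", 50070, "Hadoop:service=NameNode,name=FSNamesystemState"))
print(summarize(SAMPLE))
```

Emitting one line per service rather than one alert per attribute is a simple way to get the summary/aggregate behaviour the slide recommends.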

MR Workload Alerting

» Monitor the MR workload and alert
  – An in-house tool that uses the "houdah" Ruby gem monitors:
  – Long-running jobs, jobs with a large number of map tasks, blacklisted TTs with high failure counts, etc.

» Collect details and auto-restart blacklisted TTs
» Parse the JT log file for rogue jobs
» Parse the JT log and collect all job-related info
» White Elephant or hRaven could help
» Parse the scheduler HTML page or use the metrics page:
  http://<JT-hostname>:50030/scheduler?advanced
  http://<JT-hostname>:50030/metrics
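
The kind of rule such a workload monitor applies can be sketched as follows. This is not the in-house tool and does not use the houdah gem; the `JobSnapshot` record, its fields, and the thresholds are all hypothetical, standing in for whatever is collected from the JobTracker.

```python
from dataclasses import dataclass


@dataclass
class JobSnapshot:
    """Hypothetical per-job record, e.g. assembled from parsed JobTracker data."""
    job_id: str
    user: str
    runtime_minutes: int
    map_tasks: int


def find_offenders(jobs, max_runtime_minutes=240, max_map_tasks=50000):
    """Return (job_id, reason) pairs for jobs that exceed either threshold."""
    offenders = []
    for job in jobs:
        if job.runtime_minutes > max_runtime_minutes:
            offenders.append((job.job_id, "long running"))
        if job.map_tasks > max_map_tasks:
            offenders.append((job.job_id, "too many map tasks"))
    return offenders


jobs = [
    JobSnapshot("job_0001", "etl", 30, 1200),
    JobSnapshot("job_0002", "adhoc", 600, 800),
    JobSnapshot("job_0003", "modeling", 45, 90000),
]
offenders = find_offenders(jobs)
for job_id, reason in offenders:
    print(job_id, reason)
```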

Page 16: Big data summit

Proprietary amp Confidential Copyright copy 2014

Data Architecture 30

Proprietary amp Confidential Copyright copy 2014

Batch and Real Time Pipelines

Webservers

STORM

scribe

scribe

scribe

MySql

Zookeeper

Proprietary amp Confidential Copyright copy 2014

Hadoop Setup

QJM ZK Quorum

raquo 6x2TB Disksraquo 2x6 coreraquo 196 GB RAMraquo 2x1G NIC

raquo 12x3TB Disksraquo 2x6 coreraquo 64 GB RAMraquo 10G NIC

raquo same as DNrsquosraquo Dedicated disk

to ZK or JN

JT

Standby NN

ZKFCZKFC

Active NN

DNTT

DNTT

DNTT

DNTT

DNTT

DNTT

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

Puppet+

Infradb

Automation is key

Maintenance is Not Easy

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you canraquo Adding a slave node to Hadoop cluster lt 120 secondsraquo Bringing up a new Hadoop cluster lt 500 secondsraquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsdatanodehandlercount dfsnamenodehandlercount

dfsdatanodemaxtransferthreads dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alerts gtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted TTrsquos

with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page httpltJT-hostnamegt50030scheduleradvanced httpltJT-hostnamegt50030metrics

Multi Tenancy

[Diagram labels: Modeling, OPS, ETL, Ad-hoc]

Multi Tenancy

» Create separate queues
» Enable ACLs for queues
» Limit the number of jobs per user and per queue
» Set pre-emption timeouts based on priority
» Set weights based on priority
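As an illustration of per-queue weights, job limits, and pre-emption timeouts, the sketch below generates an allocation file in the style of the MR1 Fair Scheduler. The slide does not say which scheduler is used, so that choice is an assumption, as are the pool names, user names, and every number. Queue ACLs are configured separately and are not covered here.

```python
import xml.etree.ElementTree as ET


def build_allocations(pools, user_max_jobs):
    """Build a Fair Scheduler style allocation document.

    pools: {pool_name: {"weight": float, "max_jobs": int, "preempt_timeout_s": int}}
    user_max_jobs: {user_name: int}
    """
    root = ET.Element("allocations")
    for name, cfg in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "weight").text = str(cfg["weight"])
        ET.SubElement(pool, "maxRunningJobs").text = str(cfg["max_jobs"])
        # Higher-priority pools get a shorter wait before pre-empting others.
        ET.SubElement(pool, "minSharePreemptionTimeout").text = str(cfg["preempt_timeout_s"])
    for user, limit in user_max_jobs.items():
        user_el = ET.SubElement(root, "user", name=user)
        ET.SubElement(user_el, "maxRunningJobs").text = str(limit)
    return ET.tostring(root, encoding="unicode")


# Placeholder pools and limits, not values from the talk.
example = build_allocations(
    {"etl": {"weight": 3.0, "max_jobs": 50, "preempt_timeout_s": 60},
     "adhoc": {"weight": 1.0, "max_jobs": 10, "preempt_timeout_s": 600}},
    {"alice": 5},
)
```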

Scheduling

No scheduler is perfect unless you understand and tune it properly.

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

BCP

» BCP: Business Continuity Plan
» Near-real-time reporting over 15+ TB of daily data
» Freshness of models trained over petabytes of data

Data BCP Cluster

[Diagram labels: INW Data Cluster; LSV Data Cluster; US, EU, and HK Serving Clusters; Modeling; Reporting; User Queries; Research; Ad-hoc Queries; Processed Data; Amazon Backup]

YARN

JobTracker

» Resource Manager
- Global resource scheduler
- Hierarchical queues
- Application management

» Node Manager
- Per-machine agent
- Manages life cycle of containers
- Container resource monitoring

» Application Master
- Per-application
- Manages application scheduling and task execution

YARN at Rocket Fuel

» YARN in production
» 1000+ nodes
» 51TB RAM, 123K disks, 123K cores
» Primary use case: Map-Reduce
» HBase on YARN
» Tez, Spark, and Storm are in the race

We Are Hiring

THANKS

kishore@rocketfuel.com

  • Hadoop Operations @ Rocket Fuel
  • The Web Is Monetized By Advertising
  • Delivery Methods
  • Overview
  • Always buying the best impressions & serving the best ad
  • Real Time Bidding and Serving
  • Overview (2)
  • Throughput
  • Latency
  • Architecture and Scale
  • Data Center Expansion
  • Data Center Design
  • Rocket Fuel Scale
  • Hadoop at Rocket Fuel
  • Growth
  • Data Architecture 30
  • Batch and Real Time Pipelines
  • Hadoop Setup
  • Operations
  • Maintenance is Not Easy
  • Puppet and Infradb
  • Performance Tuning
  • Performance Tuning (2)
  • Operations (2)
  • Monitoring
  • Monitoring (2)
  • MR Workload Monitoring
  • Network Monitoring
  • Alerting
  • Alerts
  • MR Workload Alerting
  • Multi Tenancy
  • Multi Tenancy (2)
  • Scheduling
  • Operations (3)
  • BCP
  • Data BCP Cluster
  • YARN
  • YARN at Rocket Fuel
  • We Are Hiring
  • Slide 41
Page 17: Big data summit

Batch and Real Time Pipelines

[Diagram labels: Webservers, scribe, STORM, MySQL, ZooKeeper]

Hadoop Setup

[Diagram labels: Active NN and Standby NN, each with a ZKFC; JT; QJM / ZK Quorum; DN/TT worker nodes]

» 6x2TB Disks
» 2x6 core
» 196 GB RAM
» 2x1G NIC

» 12x3TB Disks
» 2x6 core
» 64 GB RAM
» 10G NIC

QJM / ZK Quorum
» Same as DN's
» Dedicated disk to ZK or JN

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Maintenance is Not Easy

Puppet + Infradb

Automation is key

Puppet and Infradb

» Automate as much as you can
» Adding a slave node to a Hadoop cluster: < 120 seconds
» Bringing up a new Hadoop cluster: < 500 seconds
» MR slots are automatically determined based on hardware config

Isn't it cool?

Just define once
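Deriving MR slots from hardware facts could look something like the sketch below. The slide does not give the actual rule, so the heuristic here is purely an assumption: the memory per slot, the reserved memory, and the two-thirds map share are hypothetical defaults, not values from the talk.

```python
def mr_slots(cores, ram_gb, gb_per_slot=2, reserved_gb=8):
    """Return (map_slots, reduce_slots) for one node under an assumed heuristic."""
    # Slots that fit in memory after reserving some for the OS and daemons.
    by_memory = max(ram_gb - reserved_gb, 0) // gb_per_slot
    # Never exceed the core count, and always leave room for one map and one reduce.
    total = max(min(cores, by_memory), 2)
    # Assumed split: roughly two thirds of the slots go to map tasks.
    map_slots = max(total * 2 // 3, 1)
    reduce_slots = total - map_slots
    return map_slots, reduce_slots
```

In a Puppet-driven setup, a function like this would typically be fed from per-host facts so that the slot counts are defined once and follow the hardware.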

Performance Tuning

No issues when the cluster is small. Problems start when it grows.

Performance Tuning

HDFS
dfs.datanode.handler.count
dfs.namenode.handler.count
dfs.datanode.max.transfer.threads
dfs.image.transfer.timeout
ha-timeoutms

MR
mapred.reduce.parallel.copies
mapreduce.reduce.shuffle.parallelcopies
mapred.job.tracker.handler.count
io.sort.mb
io.sort.factor
IMP: MAPREDUCE-2026

ZK
maxClientCnxns

JVM
-XX:+UseConcMarkSweepGC
-XX:CMSFullGCsBeforeCompaction=1
-XX:CMSInitiatingOccupancyFraction=60
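To show where these settings live, the sketch below joins the GC flags into a JVM options string and renders properties as a Hadoop `*-site.xml` document. The helper names, the heap size parameter, and the property value in the usage line are assumptions for illustration; the slide names the knobs but gives no values for the Hadoop properties.

```python
import xml.etree.ElementTree as ET

# GC flags as listed on the slide.
GC_FLAGS = [
    "-XX:+UseConcMarkSweepGC",
    "-XX:CMSFullGCsBeforeCompaction=1",
    "-XX:CMSInitiatingOccupancyFraction=60",
]


def jvm_opts(heap_gb, flags=GC_FLAGS):
    """Join a heap size (an assumed parameter) and the GC flags into one string."""
    return " ".join([f"-Xmx{heap_gb}g", *flags])


def site_xml(properties):
    """Render {name: value} as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Placeholder value, not a recommendation from the talk.
example_site = site_xml({"dfs.namenode.handler.count": 64})
```

Appropriate values depend on cluster size and workload, which is why the sketch takes them as inputs rather than hard-coding them.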

Page 21: Big data summit

Proprietary amp Confidential Copyright copy 2014

Puppet and Infradb

raquo Automate as much as you canraquo Adding a slave node to Hadoop cluster lt 120 secondsraquo Bringing up a new Hadoop cluster lt 500 secondsraquo MR slots are automatically determined based on hardware config

Isnrsquot it cool

Just define once

Proprietary amp Confidential Copyright copy 2014

No issues when cluster is small Problems starts when it grows

Performance Tuning

Proprietary amp Confidential Copyright copy 2014

dfsdatanodehandlercount dfsnamenodehandlercount

dfsdatanodemaxtransferthreads dfsimagetransfertimeout

mapredreduceparallelcopies

mapredjobtrackerhandlercount

iosortmbiosortfactor

maxClientCnxns ZK

HDFS

MR

IMP MAPREDUCE-2026

-XX+UseConcMarkSweepGC

-XXCMSFullGCsBeforeCompaction=1

-XXCMSInitiatingOccupancyFraction=60

ha-timeoutms

JVM

Performance Tuning

mapreducereduceshuffleparallelcopies

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

Monitoring

Wall of Ops

Proprietary amp Confidential Copyright copy 2014

Monitoring

hadoopnamenodeCallQueueLength hadoopjobtrackerjvmmemheapusedm

Donrsquot fly blind you will crash

Proprietary amp Confidential Copyright copy 2014

MR Workload Monitoring

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alerts gtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted TTrsquos

with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page httpltJT-hostnamegt50030scheduleradvanced httpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

Multi Tenancy

raquo create separate Queuesraquo Enable ACLrsquos for queuesraquo limit no of jobs per user and per queueraquo set pre-emption timeouts based on priorityraquo set weight based on priority

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Planraquo Near real time reporting over 15+ TB of daily dataraquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Quer

ies

Amazon Backup

LSV Data Clust

er

USEUHK Serving Clusters

Research

Ad-

hoc

Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

JobTracker

raquo Resource Manager - Global resource scheduler - Hierarchical queues - Application management

raquo Node Manager - Per-machine agent - Manages life cycle of container - Container resource monitoring

raquo Application Master - Per-application - Manages application scheduling and task execution

Proprietary amp Confidential Copyright copy 2014

YARN at Rocket FueI

raquo Yarn in production raquo 1000+ nodesraquo 51TB RAM 123K disks 123K cores raquo Primary use case Map-Reduceraquo HBase on Yarnraquo Tez Spark Storm are in race

Proprietary amp Confidential Copyright copy 2014

We Are Hiring

Proprietary amp Confidential Copyright copy 2014

THANKS

kishorerocketfuelcom

  • Hadoop Operations Rocket Fuel
  • The Web Is Monetized By Advertising
  • Delivery Methods
  • Overview
  • Always buying the best impressions amp serving the best ad
  • Real Time Bidding and Serving
  • Overview (2)
  • Throughput
  • Latency
  • Architecture and Scale
  • Data Center Expansion
  • Data Center Design
  • Rocket Fuel Scale
  • Hadoop at Rocket Fuel
  • Growth
  • Data Architecture 30
  • Batch and Real Time Pipelines
  • Hadoop Setup
  • Operations
  • Maintenance is Not Easy
  • Puppet and Infradb
  • Performance Tuning
  • Performance Tuning (2)
  • Operations (2)
  • Monitoring
  • Monitoring (2)
  • MR Workload Monitoring
  • Network Monitoring
  • Alerting
  • Alerts
  • MR Workload Alerting
  • Multi Tenancy
  • Multi Tenancy (2)
  • Scheduling
  • Operations (3)
  • BCP
  • Data BCP Cluster
  • YARN
  • YARN at Rocket Fuel
  • We Are Hiring
  • Slide 41
Performance Tuning

No issues when the cluster is small. Problems start when it grows.

Performance Tuning

HDFS
  dfs.datanode.handler.count
  dfs.namenode.handler.count
  dfs.datanode.max.transfer.threads
  dfs.image.transfer.timeout

MR
  mapred.reduce.parallel.copies
  mapreduce.reduce.shuffle.parallelcopies
  mapred.job.tracker.handler.count
  io.sort.mb, io.sort.factor
  IMP: MAPREDUCE-2026

ZK
  maxClientCnxns

JVM
  -XX:+UseConcMarkSweepGC
  -XX:CMSFullGCsBeforeCompaction=1
  -XX:CMSInitiatingOccupancyFraction=60

ha-timeoutms

Operations

» Maintenance
» Performance Tuning
» Monitoring
» BCP
» YARN

Monitoring

Wall of Ops

Monitoring

hadoop.namenode.CallQueueLength
hadoop.jobtracker.jvm.memheapusedm

Don't fly blind; you will crash.

MR Workload Monitoring

Network Monitoring

Don't blame the network; monitor it instead. A network mesh can be a mess.

Alerting

Monitoring is not enough; you need better alerting.

Alerts

http://hostname:port/jmx

>> Checking whether NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We heavily rely on custom scripts that query JMX for NN and JT

qry=Hadoop:service=NameNode,name=NameNodeInfo
  NameDirStatuses, DeadNodes, NumberOfMissingBlocks

qry=Hadoop:service=NameNode,name=FSNamesystemState
  FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks

qry=hadoop:service=JobTracker,name=JobTrackerInfo
  Blacklisted TTs, jobs, slots_used, ThreadCount

qry=java.lang:type=Memory
  Used JVM, free JVM, etc.
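A custom JMX check of the kind described above might look like the sketch below. It is not the author's script. It assumes the `/jmx` endpoint returns JSON with a top-level `beans` list, that `NameDirStatuses` is a JSON-encoded string with a `failed` map, and that a healthy `FSState` reads `Operational`. The function names and the threshold are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

NN_INFO = "Hadoop:service=NameNode,name=NameNodeInfo"
NN_STATE = "Hadoop:service=NameNode,name=FSNamesystemState"


def jmx_url(host, port, bean):
    """Build the /jmx URL for a single MBean query."""
    return "http://%s:%d/jmx?%s" % (host, port, urlencode({"qry": bean}))


def fetch_bean(host, port, bean, timeout=10):
    """Fetch one MBean as a dict (performs a real HTTP request)."""
    with urlopen(jmx_url(host, port, bean), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["beans"][0]  # assumed response layout


def namenode_alerts(info, state, max_under_replicated=1000):
    """Fold several NameNode checks into one summary list of alerts."""
    alerts = []
    if int(info.get("NumberOfMissingBlocks", 0)) > 0:
        alerts.append("missing blocks: %s" % info["NumberOfMissingBlocks"])
    # Assumed format: JSON string with "active" and "failed" directory maps.
    name_dirs = json.loads(info.get("NameDirStatuses", "{}"))
    if name_dirs.get("failed"):
        alerts.append("failed name dirs: %s" % sorted(name_dirs["failed"]))
    if state.get("FSState") != "Operational":
        alerts.append("FSState is %s" % state.get("FSState"))
    if int(state.get("NumDeadDataNodes", 0)) > 0:
        alerts.append("dead datanodes: %s" % state["NumDeadDataNodes"])
    if int(state.get("UnderReplicatedBlocks", 0)) > max_under_replicated:
        alerts.append("under-replicated blocks: %s" % state["UnderReplicatedBlocks"])
    return alerts
```

Returning a single list per NameNode lets the caller send one aggregate alert instead of one page per metric, which is the noise reduction the slide recommends. In use, `info` and `state` would come from `fetch_bean(host, port, NN_INFO)` and `fetch_bean(host, port, NN_STATE)`.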

MR Workload Alerting

» Monitor the MR workload and alert
  – An in-house tool that uses the "houdah" Ruby gem monitors:
  – Long-running jobs, jobs with a large number of map tasks, blacklisted TTs with high failure counts, etc.
» Collect details and auto-restart blacklisted TTs
» Parse the JT log file for rogue jobs
» Parse the JT log and collect all job-related info
» White Elephant or hRaven could help
» Parse the scheduler HTML page or use the metrics page:
  http://<JT-hostname>:50030/scheduler?advanced
  http://<JT-hostname>:50030/metrics
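The workload checks listed above can be sketched as a simple filter over job records. This is an illustration only: it does not use the "houdah" gem or any JobTracker API. The `JobSnapshot` record, its fields, and both thresholds are hypothetical stand-ins for whatever the in-house tool collects.

```python
from dataclasses import dataclass


@dataclass
class JobSnapshot:
    """Hypothetical per-job record gathered from the JobTracker."""
    job_id: str
    user: str
    runtime_minutes: float
    map_tasks: int


def flag_jobs(jobs, max_runtime_minutes=240, max_map_tasks=50000):
    """Return (job_id, user, reasons) for each job that breaks a limit."""
    flagged = []
    for job in jobs:
        reasons = []
        if job.runtime_minutes > max_runtime_minutes:
            reasons.append("long-running")
        if job.map_tasks > max_map_tasks:
            reasons.append("too many map tasks")
        if reasons:
            flagged.append((job.job_id, job.user, reasons))
    return flagged
```

Keeping the reasons as a list means a job that is both long-running and oversized produces one alert entry rather than two.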

Multi Tenancy

[Diagram labels: Modeling, OPS, ETL, Ad-hoc]

Page 28: Big data summit

Proprietary amp Confidential Copyright copy 2014

Network Monitoring

Donrsquot blame network instead monitor it Network Mesh can be mess

Proprietary amp Confidential Copyright copy 2014

Alerting

Monitoring is not enough need better Alerting

Proprietary amp Confidential Copyright copy 2014

Alerts

httphostnameportjmx

qry=Hadoopservice=NameNodename=NameNodeInfo

gtgt Checking whether NN and JT are up is a no brainer gtgt Reduce alert noise by having summaryaggregate alerts gtgt We heavily rely on custom scripts that query jmx for NN and JT

qry=hadoopservice=JobTrackername=JobTrackerInfo

NameDirStatuses DeadNodes NumberOfMissingBlocks

qry=Hadoopservice=NameNodename=FSNamesystemState

FSState CapacityRemaining NumDeadDataNodes UnderReplicatedBlocks

Blacklisted TTrsquos jobs slots_used ThreadCount

qry=javalangtype=Memory

Used jvm free jvm etc

Proprietary amp Confidential Copyright copy 2014

MR Workload Alerting

raquo Monitoring MR workload and alertndash In-house tool that use ldquohoudahrdquo ruby gem monitorsndash Long running jobs jobs with more map tasks blacklisted TTrsquos

with more failure counts etchellip

raquo Collect details and auto-restart blacklisted TTrsquosraquo Parse the JT logfile for rouge jobsraquo Parse the JT log and collects all Job related inforaquo White-elephant or hraven could helpraquo Parse the scheduler html page or use metrics page httpltJT-hostnamegt50030scheduleradvanced httpltJT-hostnamegt50030metrics

Proprietary amp Confidential Copyright copy 2014

Modeling

OPS

ETL

Ad-hoc

Multi Tenancy

Proprietary amp Confidential Copyright copy 2014

Multi Tenancy

raquo create separate Queuesraquo Enable ACLrsquos for queuesraquo limit no of jobs per user and per queueraquo set pre-emption timeouts based on priorityraquo set weight based on priority

Proprietary amp Confidential Copyright copy 2014

No Scheduler is perfect unless you understand and tune it properly

Scheduling

Proprietary amp Confidential Copyright copy 2014

Operations

raquo Maintenanceraquo Performance Tuningraquo Monitoringraquo BCPraquo YARN

Proprietary amp Confidential Copyright copy 2014

BCP

raquo BCP Business Continuity Planraquo Near real time reporting over 15+ TB of daily dataraquo Freshness of models trained over petabytes of data

Proprietary amp Confidential Copyright copy 2014

Data BCP Cluster

INW Data Cluster

US Serving Clusters

EU Serving Clusters

HK Serving Clusters

Modeling

Reporting

User Quer

ies

Amazon Backup

LSV Data Clust

er

USEUHK Serving Clusters

Research

Ad-

hoc

Queries

Processed Data

Proprietary amp Confidential Copyright copy 2014

YARN

JobTracker

raquo Resource Manager - Global resource scheduler - Hierarchical queues - Application management

raquo Node Manager - Per-machine agent - Manages life cycle of container - Container resource monitoring

raquo Application Master - Per-application - Manages application scheduling and task execution

YARN at Rocket Fuel

» YARN in production
» 1000+ nodes
» 51TB RAM, 123K disks, 123K cores
» Primary use case: MapReduce
» HBase on YARN
» Tez, Spark and Storm are in the race

We Are Hiring

THANKS

kishore@rocketfuel.com

  • Hadoop Operations Rocket Fuel
  • The Web Is Monetized By Advertising
  • Delivery Methods
  • Overview
  • Always buying the best impressions & serving the best ad
  • Real Time Bidding and Serving
  • Overview (2)
  • Throughput
  • Latency
  • Architecture and Scale
  • Data Center Expansion
  • Data Center Design
  • Rocket Fuel Scale
  • Hadoop at Rocket Fuel
  • Growth
  • Data Architecture 30
  • Batch and Real Time Pipelines
  • Hadoop Setup
  • Operations
  • Maintenance is Not Easy
  • Puppet and Infradb
  • Performance Tuning
  • Performance Tuning (2)
  • Operations (2)
  • Monitoring
  • Monitoring (2)
  • MR Workload Monitoring
  • Network Monitoring
  • Alerting
  • Alerts
  • MR Workload Alerting
  • Multi Tenancy
  • Multi Tenancy (2)
  • Scheduling
  • Operations (3)
  • BCP
  • Data BCP Cluster
  • YARN
  • YARN at Rocket Fuel
  • We Are Hiring
  • Slide 41
Alerting

Monitoring is not enough; we need better alerting.

Alerts

http://hostname:port/jmx

>> Checking whether the NN and JT are up is a no-brainer
>> Reduce alert noise by having summary/aggregate alerts
>> We rely heavily on custom scripts that query JMX for the NN and JT

qry=Hadoop:service=NameNode,name=NameNodeInfo
  NameDirStatuses, DeadNodes, NumberOfMissingBlocks

qry=Hadoop:service=NameNode,name=FSNamesystemState
  FSState, CapacityRemaining, NumDeadDataNodes, UnderReplicatedBlocks

qry=hadoop:service=JobTracker,name=JobTrackerInfo
  Blacklisted TTs, jobs, slots_used, ThreadCount

qry=java.lang:type=Memory
  Used JVM memory, free JVM memory, etc.
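A custom check of this kind might look like the sketch below. It is a minimal Python illustration, not the scripts used at Rocket Fuel. It assumes the `/jmx` servlet returns JSON of the form `{"beans": [...]}`; the function names, the thresholds, and the example host are hypothetical. Several NameNode checks are folded into one message, in the spirit of summary/aggregate alerts.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def jmx_url(host, port, qry):
    """Build a /jmx servlet URL for one MBean query."""
    return "http://%s:%d/jmx?%s" % (host, port, urlencode({"qry": qry}))


def fetch_bean(host, port, qry, timeout=10):
    """Fetch the first bean matching qry; returns {} if nothing matched."""
    with urlopen(jmx_url(host, port, qry), timeout=timeout) as resp:
        beans = json.load(resp).get("beans", [])
    return beans[0] if beans else {}


def summarize(nn_info, fs_state, limits):
    """Collapse several NameNode checks into one alert message (or None)."""
    problems = []
    missing = nn_info.get("NumberOfMissingBlocks", 0)
    if missing > limits["missing_blocks"]:
        problems.append("%d missing blocks" % missing)
    dead = fs_state.get("NumDeadDataNodes", 0)
    if dead > limits["dead_datanodes"]:
        problems.append("%d dead DataNodes" % dead)
    under = fs_state.get("UnderReplicatedBlocks", 0)
    if under > limits["under_replicated"]:
        problems.append("%d under-replicated blocks" % under)
    return "NameNode: " + "; ".join(problems) if problems else None
```

In use, `fetch_bean("nn.example.com", 50070, "Hadoop:service=NameNode,name=NameNodeInfo")` and the matching `FSNamesystemState` query would supply the two dictionaries passed to `summarize`; the host name and port are placeholders.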
