Why is my Hadoop cluster slow?

Page 1: Why is my Hadoop cluster slow?

Why is my Hadoop* job slow?

Bikas Saha @bikassaha

*Apache Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Zeppelin and the Hadoop elephant logo are trademarks of the Apache Software Foundation.

Hitesh Shah

Page 2: Why is my Hadoop cluster slow?

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Agenda

Metrics and Monitoring

Logging and Correlation

Tracing and Analysis

Page 3: Why is my Hadoop cluster slow?

Metrics and Monitoring

Metrics as high level pointers

Ambari Metrics System

Ambari Grafana Integration

HBase, HDFS, YARN Dashboards

Metrics based alerting

Page 4: Why is my Hadoop cluster slow?

Metrics as high level pointers

Machine-level metrics like CPU load

Application-level metrics like HDFS counters

Metrics at a point in time

Metric anomalies along a time series

Correlated anomalies

The problem: you need to know what to look for
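To make "anomalies along a time series" and "correlated anomalies" concrete, here is a minimal sketch that flags samples deviating sharply from a trailing window and then pairs anomalies from two series that occur close together. The function names, window size, threshold, and sample values are illustrative assumptions, not something the slides prescribe, and this is not how Ambari Metrics itself detects anomalies.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


def correlated(anomalies_a, anomalies_b, slack=1):
    """Pair up anomaly indices from two series that are at most
    `slack` samples apart."""
    return [(a, b) for a in anomalies_a for b in anomalies_b
            if abs(a - b) <= slack]


# Made-up sample data: a machine-level and an application-level metric.
cpu_load = [1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 1.0, 6.5, 1.1, 1.0]
rpc_queue_ms = [5, 6, 5, 7, 6, 5, 6, 90, 7, 6]

cpu_anomalies = find_anomalies(cpu_load)
rpc_anomalies = find_anomalies(rpc_queue_ms)
pairs = correlated(cpu_anomalies, rpc_anomalies)
```

A single spike in one metric is a weak signal; the same spike showing up in a second, independent metric at the same time is a much stronger pointer to where to look.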

Page 5: Why is my Hadoop cluster slow?

Ambari Metrics Service - Motivation

Limited Ganglia capabilities

OpenTSDB: GPL license and needs a Hadoop cluster

Need service-level as well as time-based aggregation

Alerts based on the metrics system

Ability to scale past 1,000 nodes

Ability to perform analytics based on a use case

Allow fine-grained control over aspects like retention, collection intervals and aggregation

Pluggable and extensible

First version released with Ambari 2.0.0

Page 6: Why is my Hadoop cluster slow?

Ambari Grafana Integration

Open source dashboard builder integrated with AMS

Available from Ambari 2.2.2

Pre-defined host-level and service-level (HDFS, HBase, YARN, etc.) dashboards

Added to Ambari through API after upgrade

Page 7: Why is my Hadoop cluster slow?

HBase Dashboard

Page 8: Why is my Hadoop cluster slow?

HDFS Dashboard

Page 9: Why is my Hadoop cluster slow?

YARN Dashboard

Page 10: Why is my Hadoop cluster slow?

Metrics based Alerting

Top N support to quickly identify potential offenders

Alerting based on time series
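As a rough illustration of the two ideas on this slide, the sketch below ranks hosts by a metric to surface the top N offenders and raises an alert only when a threshold is breached for several consecutive samples rather than on a single spike. The host names, values, and function names are hypothetical; this is not the Ambari alerting API.

```python
import heapq


def top_n(metric_by_host, n=3):
    """Return the n (host, value) pairs with the largest metric value."""
    return heapq.nlargest(n, metric_by_host.items(), key=lambda kv: kv[1])


def sustained_breach(samples, threshold, min_consecutive=3):
    """True if `samples` stays above `threshold` for at least
    `min_consecutive` samples in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical per-host RPC queue times in milliseconds.
queue_time_ms = {"host-a": 120, "host-b": 15, "host-c": 480, "host-d": 75}
offenders = top_n(queue_time_ms, n=2)

# Hypothetical CPU utilisation series for one host, in percent.
spiky = [10, 95, 20, 96, 97, 20]
sustained = [10, 95, 20, 96, 97, 98]
```

Requiring a sustained breach is one simple way to use the time series rather than a point-in-time value, at the cost of a slightly delayed alert.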

Page 11: Why is my Hadoop cluster slow?

Agenda

Metrics and Monitoring

Logging and Correlation

Tracing and Analysis

Page 12: Why is my Hadoop cluster slow?

Logging and Correlation

HDFS, YARN Audit logs

Caller Context

YARN Application Timeline Service

Lineage tracking of operations across workloads

Ambari Log Search

Page 13: Why is my Hadoop cluster slow?

HDFS Audit Logs and Caller Context

FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.32 cmd=create src=/tmp/in/_temporary/1/_temporary/attempt_14644848874070_0009_m_009995_0/part-m-09995 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=tez_ta:attempt_1464484887407_0009_1_00_009995_0

FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.33 cmd=create src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000097_0/part-m-00097 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=mr_attempt_1464484887407_0011_m_000097_0

FSNamesystem.audit: allowed=true ugi=userB (auth:SIMPLE) ip=/172.22.68.34 cmd=create src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000095_0/part-m-00095 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=mr_attempt_1464484887407_0011_m_000095_0
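The audit lines above are whitespace-separated `key=value` pairs, so the caller context can be pulled out with a small parser and grouped per user. The sketch below is one way to do that, assuming the field layout shown on the slide; the function names are illustrative, and the sample lines are copied from the slide.

```python
import re
from collections import defaultdict

# key=value tokens; tokens without "=" such as "(auth:SIMPLE)" are skipped.
FIELD = re.compile(r"(\w+)=(\S+)")


def parse_audit_line(line):
    """Turn one HDFS audit log line into a dict of its key=value fields."""
    return dict(FIELD.findall(line))


def contexts_by_user(lines):
    """Map each user (ugi) to the caller contexts seen in their operations."""
    result = defaultdict(list)
    for line in lines:
        fields = parse_audit_line(line)
        result[fields.get("ugi", "unknown")].append(
            fields.get("callerContext", "unknown"))
    return dict(result)


SAMPLE_LINES = [
    "FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.32 cmd=create src=/tmp/in/_temporary/1/_temporary/attempt_14644848874070_0009_m_009995_0/part-m-09995 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=tez_ta:attempt_1464484887407_0009_1_00_009995_0",
    "FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.33 cmd=create src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000097_0/part-m-00097 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=mr_attempt_1464484887407_0011_m_000097_0",
    "FSNamesystem.audit: allowed=true ugi=userB (auth:SIMPLE) ip=/172.22.68.34 cmd=create src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000095_0/part-m-00095 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=mr_attempt_1464484887407_0011_m_000095_0",
]

first = parse_audit_line(SAMPLE_LINES[0])
grouped = contexts_by_user(SAMPLE_LINES)
```

With the caller context extracted, a burst of NameNode operations can be attributed to a specific Tez or MapReduce task attempt instead of just a user and an IP address.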

Page 14: Why is my Hadoop cluster slow?

ResourceManager Audit Logs and Caller Context

resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.32 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0001 CALLERCONTEXT=PIG-pigSmoke.sh-8a052588-0013-4e39-83b1-ebad699d8e2e

resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.30 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0009 CALLERCONTEXT=CLI

resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.34 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0008 CALLERCONTEXT=mr_attempt_1464484887407_0007_m_000000_0

resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.30 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0012 CALLERCONTEXT=HIVE_SSN_ID:f3aadf99-9e36-494b-84a1-99b685ac344b
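The ResourceManager entries can be correlated in a similar way. The sketch below maps each submitted application to its caller context and, when that context is a MapReduce task attempt, derives the application that the attempt belongs to. It assumes each `KEY=value` field is separated from the next by whitespace and that keys are upper-case, as in the entries above; values may contain spaces (for example `Submit Application Request`). The function names are illustrative.

```python
import re

# An upper-case key, then a value that runs until the next " KEY=" or line end.
RM_FIELD = re.compile(r"([A-Z]+)=(.*?)(?=\s+[A-Z]+=|$)")
ATTEMPT = re.compile(r"attempt_(\d+)_(\d+)_")


def parse_rm_audit_line(line):
    """Turn one RMAuditLogger line into a dict of its KEY=value fields."""
    return dict(RM_FIELD.findall(line.strip()))


def submitting_app(caller_context):
    """If the caller context names a task attempt, return the id of the
    application that attempt belongs to; otherwise return None."""
    match = ATTEMPT.search(caller_context)
    if match is None:
        return None
    return f"application_{match.group(1)}_{match.group(2)}"


RM_LINES = [
    "resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.30 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0009 CALLERCONTEXT=CLI",
    "resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.34 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0008 CALLERCONTEXT=mr_attempt_1464484887407_0007_m_000000_0",
]

parsed = [parse_rm_audit_line(line) for line in RM_LINES]
context_by_app = {p["APPID"]: p["CALLERCONTEXT"] for p in parsed}
parent_by_app = {app: submitting_app(ctx) for app, ctx in context_by_app.items()}
```

In the sample data this shows that application 0008 was submitted from inside a map task of application 0007, while application 0009 came directly from the command line.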

Page 15: Why is my Hadoop cluster slow?

YARN Application Timeline Service

YARN service for fine-grained application-level tracing

Enables complex metadata to be recorded as the YARN app makes progress

Allows retrieval of this timeline data based on filters

Can be used to drive limited online analytics and extensive post-hoc analysis
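Filter-based retrieval goes through the Timeline Server REST interface. The sketch below only builds a query URL using the v1 query parameters `primaryFilter`, `windowStart`, `windowEnd` and `limit`. The host name, the `http` scheme, and the assumption that the entities carry a primary filter named `user` are illustrative; check what your framework actually publishes before relying on a filter name.

```python
from urllib.parse import urlencode


def timeline_query_url(base_url, entity_type, primary_filter=None,
                       window_start_ms=None, window_end_ms=None, limit=None):
    """Build a Timeline Server v1 URL listing entities of one type,
    optionally narrowed by a primary filter and a time window."""
    params = {}
    if primary_filter is not None:
        name, value = primary_filter
        params["primaryFilter"] = f"{name}:{value}"
    if window_start_ms is not None:
        params["windowStart"] = window_start_ms
    if window_end_ms is not None:
        params["windowEnd"] = window_end_ms
    if limit is not None:
        params["limit"] = limit
    url = f"{base_url}/ws/v1/timeline/{entity_type}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# Hypothetical host; the "user" filter name is an assumption.
example_url = timeline_query_url(
    "http://timeline:8188", "TEZ_DAG_ID",
    primary_filter=("user", "userA"), limit=10)
```

The resulting URL can be fetched with any HTTP client; the response is JSON describing the matching entities.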

Page 16: Why is my Hadoop cluster slow?

Lineage Tracking using YARN Timeline

Timeline:8188/ws/v1/timeline/TEZ_DAG_ID/dag_1464484887407_0013_1

dagContext: { callerId: "root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2", callerType: "HIVE_QUERY_ID", context: "HIVE", description: "select user, count(visit_id) as visits from users group by user order by visits" }

Timeline:8188/ws/v1/timeline/HIVE_QUERY_ID/root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2

hiveContext: { callerId: "workflow_abcd", callerType: "OOZIE_ID", context: "OOZIE", description: "Daily ETL Summary Job" }
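The two lookups above form a chain: each entity's context names its caller by type and id, and that pair is itself a Timeline entity to look up next. The sketch below walks such a chain. `fetch_context` is a hypothetical callable standing in for the HTTP request plus whatever JSON unwrapping is needed to reach the `dagContext` or `hiveContext` block, since the slide does not show where that block sits in the full response. Here it is stubbed with the data from the slide.

```python
def trace_lineage(entity_type, entity_id, fetch_context, max_depth=10):
    """Follow callerType/callerId links upward from an entity and
    return the chain of (type, id) pairs, starting with the entity itself."""
    chain = [(entity_type, entity_id)]
    for _ in range(max_depth):
        context = fetch_context(entity_type, entity_id)
        if not context:
            break  # no recorded caller: top of the chain
        entity_type, entity_id = context["callerType"], context["callerId"]
        chain.append((entity_type, entity_id))
    return chain


# Stub data taken from the slide; a real fetch_context would query
# <timeline host>:8188/ws/v1/timeline/<type>/<id> instead.
HIVE_QUERY = "root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2"
STUB = {
    ("TEZ_DAG_ID", "dag_1464484887407_0013_1"): {
        "callerId": HIVE_QUERY, "callerType": "HIVE_QUERY_ID"},
    ("HIVE_QUERY_ID", HIVE_QUERY): {
        "callerId": "workflow_abcd", "callerType": "OOZIE_ID"},
}


def stub_fetch(entity_type, entity_id):
    return STUB.get((entity_type, entity_id))


lineage = trace_lineage("TEZ_DAG_ID", "dag_1464484887407_0013_1", stub_fetch)
```

Starting from a slow Tez DAG, the chain leads back to the Hive query that produced it and then to the Oozie workflow that launched the query. The `max_depth` bound guards against accidental cycles in the recorded contexts.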

Page 17: Why is my Hadoop cluster slow?

Ambari Log Search

Page 18: Why is my Hadoop cluster slow?

Ambari Log Search

Page 19: Why is my Hadoop cluster slow?

Agenda

Metrics and Monitoring

Logging and Correlation

Tracing and Analysis

Page 20: Why is my Hadoop cluster slow?

Tracing and Analysis

Use Big Data methods to solve Big Data problems

Apache Zeppelin as analytical tool

Hive/Tez/YARN notebook for analysis

Page 21: Why is my Hadoop cluster slow?

Zeppelin for Ad-hoc Analytics

Page 22: Why is my Hadoop cluster slow?

YARN Analyzer

Page 23: Why is my Hadoop cluster slow?

Tez Analyzer

Page 24: Why is my Hadoop cluster slow?

Tez Analyzer

Page 25: Why is my Hadoop cluster slow?

Tez Analyzer

Page 26: Why is my Hadoop cluster slow?

Thank You