ACHIEVING OPERATIONAL EXCELLENCE WITH HIVE AND MAPREDUCE

5 Best Practices to Achieve Operational Excellence with Hive and MapReduce

Confidential

CHALLENGES

Heterogeneous Application Environments

Production Hadoop environments contain a variety of application technologies.

Cluster Performance Monitoring

Cluster monitoring products do not provide application insight.

Application Performance Monitoring

Existing tools offer limited value for monitoring application performance, leaving us blind to the business context, priority, ownership and performance of our data applications.

PERFORMANCE MONITORING & VISIBILITY

Enterprise Scale Monitoring and Management for Big Data Apps

Business & Operational Context

Data & Technology

Connecting Business and Data

5 BEST PRACTICES TO ACHIEVE OPERATIONAL EXCELLENCE

Visibility

1. Performance monitoring and visibility into all of your big data applications
• Increase the quality and efficiency of your deployments with a single integrated view of your data applications and real-time performance metrics across all environments.

2. Segmenting users, applications and environments
• Quickly understand what is happening, where and by whom in ways that are meaningful and aligned to how your business operates.

3. Identify performance issues, bottlenecks and noncompliant applications and queries
• Spend less time wading through Hadoop logs, the ResourceManager and source code to find issues with your data pipelines. Instead, use that time optimizing your environment.

4. Add business context to better monitor your applications
• Immediately understand the business impact of an issue, including the downstream implications, so you can rapidly take the right corrective action.

5. Collaborate across teams to resolve issues faster
• When all roles that interact with an application (data scientists, developers and operations) collaborate, the quality and efficiency of your application increase.

PERFORMANCE MONITORING & VISIBILITY

Pinpoint bottlenecks and identify causes

Monitor current executions and performance

Comprehensive view of all your data processing executions

Fully visualize your entire data pipeline

Immediately understand the status of all your data applications

See all successful, failed, pending processes…

PERFORMANCE MONITORING & VISIBILITY

Fully visualize your queries and data pipelines

Comprehensive view of all your data processing executions

[Figure: pipeline visualization with callouts for Results, Join Operations, Source, Sink and Surface HQL]

5 BEST PRACTICES TO ACHIEVE OPERATIONAL READINESS

Segmentation

SEGMENTATION

Pinpoint bottlenecks and identify causes

Signal Over Noise

Quickly find and filter what you are looking for and save as a custom view

Views can be private, shared with a team, or made public

Quickly view application data by cluster, owner, technology, etc.
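
As a rough illustration of the saved-view idea, the sketch below treats a view as a named filter over application run records. It does not use Driven's API; the `AppRun` fields, the `make_view` and `apply_view` helpers, and all sample values are hypothetical and chosen only to mirror the cluster, owner and technology dimensions mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppRun:
    # Hypothetical record of one application execution.
    name: str
    cluster: str
    owner: str
    technology: str
    status: str


def make_view(**criteria):
    """Build a predicate that matches runs whose fields equal every criterion."""
    def matches(run):
        return all(getattr(run, field) == value for field, value in criteria.items())
    return matches


# Hypothetical sample data.
RUNS = [
    AppRun("daily-load", "A", "etl-team", "Hive", "SUCCESSFUL"),
    AppRun("clickstream", "B", "web-team", "MapReduce", "FAILED"),
    AppRun("adhoc-report", "A", "analyst", "Hive", "PENDING"),
]

# A saved view is just a named filter that can be reused later.
SAVED_VIEWS = {
    "etl-hive-on-a": make_view(owner="etl-team", technology="Hive", cluster="A"),
    "all-hive": make_view(technology="Hive"),
}


def apply_view(view_name, runs):
    """Return the runs selected by a saved view, in their original order."""
    view = SAVED_VIEWS[view_name]
    return [run for run in runs if view(run)]
```

Calling `apply_view("all-hive", RUNS)` would return the two Hive runs; sharing a view with a team would then amount to sharing its name and criteria.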

5 BEST PRACTICES TO ACHIEVE OPERATIONAL READINESS

Identify Problems

QUICKLY DRILL DOWN TO EXPOSE ROOT CAUSE

Create Jira issues with views and data to collaborate quickly on resolving performance problems

With one click, create a Jira issue with a link to this view

IDENTIFY BOTTLENECKS AND SLOWDOWNS

Pinpoint bottlenecks and identify causes

CHOOSE METRICS

UNDERSTAND BEHAVIORS

VISUALIZE SLOWDOWNS

DRILL DOWN TO QUERY PERFORMANCE VIEW

5 BEST PRACTICES TO ACHIEVE OPERATIONAL READINESS

Add Context

BUSINESS CONTEXT

Leverage metadata to align applications with their business context

View and sort by application metadata

Visualize executions and resource contention

Understand concurrency

SURFACE ALL FAILURES

Quickly identify all failing applications

App Name, Owner, Organization, Cluster A or B, Privacy Level, Production or Dev, Custom Tags, More …

Not all problems are created equal
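
One way to picture "not all problems are created equal" is to rank failing applications by their business metadata. The sketch below is an assumption-laden illustration, not Driven functionality: the dictionary keys (`app_name`, `owner`, `environment`, `status`), the sample records and the rule "production failures first" are all hypothetical.

```python
# Hypothetical application records tagged with business metadata.
APPS = [
    {"app_name": "sandbox-test", "owner": "dev", "environment": "dev", "status": "FAILED"},
    {"app_name": "daily-revenue", "owner": "finance", "environment": "production", "status": "FAILED"},
    {"app_name": "nightly-etl", "owner": "etl-team", "environment": "production", "status": "SUCCESSFUL"},
    {"app_name": "billing-export", "owner": "finance", "environment": "production", "status": "FAILED"},
]


def failing_apps(apps):
    """Return failed apps, production failures first, then ordered by name."""
    failed = [app for app in apps if app["status"] == "FAILED"]
    # False sorts before True, so production entries come first.
    return sorted(failed, key=lambda app: (app["environment"] != "production", app["app_name"]))


def failures_by_owner(apps):
    """Count failed apps per owner to show who is affected."""
    counts = {}
    for app in failing_apps(apps):
        counts[app["owner"]] = counts.get(app["owner"], 0) + 1
    return counts
```

With the sample data, the two production failures owned by finance would surface ahead of the dev sandbox failure.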

5 BEST PRACTICES TO ACHIEVE OPERATIONAL READINESS

Collaborate

NURTURE A CULTURE OF OPERATIONAL EXCELLENCE

Ensure that business, development and IT operations can collaborate seamlessly when it matters

LET’S TAKE A TOUR

For a walk-through of all the features of Driven, go to our Showcase interactive demo:

http://showcase.driven.io

THANK YOU

APPENDIX

• End-to-end operational telemetry metadata for big data applications
• Accessible via Web browser, command-line interface (CLI), or simple search queries
• Easy integrations through JMX and the upcoming Driven SDK

… THROUGH A SCALABLE, SEARCHABLE METADATA STORE

[Architecture diagram. Labels: Hadoop apps and infrastructure (applications, plugin, YARN, Hadoop clusters); telemetry metadata (SSL); web app servers (WAR files), scale out; access via Web, CLI and JMX]

SUPPORT OPTIONS

Supporting our Customers & Community

Community
Free community support through our mailing list and our online forums
www.cascading.org/support | forums.cascading.io

Consulting
We offer short-term consulting engagements designed to help customers with mentoring and best practices

Training
Developer training for Cascading and Scalding; private training is also available
www.cascading.io/services/training/

Commercial
Varying levels of technical support are available to support your production deployments
www.cascading.io/services/support