Jumbune optimize hadoop-solutions

Community 1.3.0 (Optimize both YARN and non-YARN Hadoop clusters)

Agenda

• Big Data Trends

• What is Jumbune?

• Description of Components

Big Data Trends

• Resource sharing/isolation frameworks: YARN, Mesos, etc.

• Shared cluster workers (resources)

• Multiple execution engines: MapReduce, Spark, Hama, Storm, Giraph, etc.

• Data ETLing from all possible sources to the Data Lake

Hadoop-based solution life stages (as on ground): cyclic execution

[Diagram: Business User, Data Analyst, MapReduce Dev, Logic & Data Test, Devops; Staging Data, Production]

Bad Logic?

Resource Utilization?

Bad Data?

Monitoring Needs

Challenges in Analytical Solutions

1. No common platform across actors to detect root causes

2. Incremental imports may ingest bad data

3. Cluster resources are shared, so optimal utilization is key

4. Getting models right in custom MR on the first attempts is like hitting a bull's eye

5. Bad logic or bad data

Intersecting Solution Lifecycle Stages

[Diagram: Solution Development, Quality Test, Devops, Bulk & Incremental Data]

Jumbune

Flow Analyzer, Data Validation, Cluster Monitor, Job Profiler

“A catalyst to accelerate realization of analytical solutions”

Niche offerings

• In-depth, code-level analysis of cluster-wide flow

• Record-level data violation reports

• No deployment on workers: ultra-light agent installation on the Hadoop master only

• Ability to turn cluster monitoring on/off at will, which lessens resource load

• Customizable rack-aware monitoring

• Correlated profiling analysis of phases, throughput and resource consumption

• Ability to work across all Hadoop distributions

Components - Recommended Environments

Dev

• Flow Debugger

• Data Validation

• MR Job Profiler

QA

• Data Validation

Stage + Perf

• MR Job Profiler

Prod

• Cluster Monitoring

• Data Validation

Supported Deployments

Jumbune

Azure, EC2

All major distributions

On Premise

MapReduce Flow Debugger

• Verifies the flow of input records through the user's MapReduce implementation

• Drill-down visualization helps the developer quickly identify the problem

• Only tool that helps developers find MapReduce implementation faults without any extra coding
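
The idea of verifying record flow can be illustrated with a small sketch. This is not Jumbune's implementation; it is a plain-Python toy that assumes a hypothetical mapper (`parse_sale`) and a hypothetical wrapper (`traced_map`) that counts how many records enter the map step, how many are emitted, and how many produce no output at all.

```python
from collections import Counter


def traced_map(map_fn, records):
    """Run map_fn over records and count what flows in and out.

    Hypothetical helper for illustration only.
    """
    counters = Counter()
    output = []
    for record in records:
        counters["map_input"] += 1
        emitted = list(map_fn(record))
        if not emitted:
            # The record entered the mapper but nothing came out.
            counters["map_unmatched"] += 1
        counters["map_output"] += len(emitted)
        output.extend(emitted)
    return output, counters


def parse_sale(line):
    """Hypothetical mapper: emit (region, amount) for well-formed lines only."""
    parts = line.split(",")
    if len(parts) == 2 and parts[1].isdigit():
        yield parts[0], int(parts[1])


pairs, counters = traced_map(parse_sale, ["east,10", "west,x", "bad"])
print(pairs)
print(dict(counters))
```

A gap between the input and output counters points at the branch of the mapper where records are silently dropped, which is the kind of question a flow debugger answers without hand-written logging.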

Data Validator

• Detects inconsistencies in data through:

– Null checks

– Data type checks

– Regular expression checks

• Generic way of specifying validation rules

• Provides record level report for found anomalies

• Currently supports HDFS as the lake file system
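
The three kinds of checks listed above can be sketched as a generic rule table applied to delimited records. This is a minimal illustration, not Jumbune's rule format: the `RULES` layout, the `Violation` record, and the `validate` function are all hypothetical names, and the input is an in-memory list rather than a file on HDFS.

```python
import re
from dataclasses import dataclass


@dataclass
class Violation:
    line_no: int  # 1-based record number
    field: int    # 0-based field index
    check: str    # "null", "type", or "regex"
    value: str


# Hypothetical rule table: field index -> checks to apply.
RULES = {
    0: {"null": True, "type": int},
    1: {"null": True, "regex": r"[A-Z]{2}\d{4}"},
    2: {"type": float},
}


def validate(lines, rules, sep=","):
    """Return one Violation per failed check, in record order."""
    violations = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        for idx, rule in rules.items():
            value = fields[idx] if idx < len(fields) else ""
            if value == "":
                # Empty field: report it only if a null check is requested.
                if rule.get("null"):
                    violations.append(Violation(line_no, idx, "null", value))
                continue
            caster = rule.get("type")
            if caster is not None:
                try:
                    caster(value)
                except ValueError:
                    violations.append(Violation(line_no, idx, "type", value))
            pattern = rule.get("regex")
            if pattern and not re.fullmatch(pattern, value):
                violations.append(Violation(line_no, idx, "regex", value))
    return violations


records = ["1,AB1234,9.5", ",AB1234,1.0", "x,ab12,oops"]
report = validate(records, RULES)
for v in report:
    print(v)
```

Each entry in the report identifies the record, the field, and the check that failed, which is the shape of a record-level anomaly report.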

MR Job Profiling

• Per job, phase-wise:

– Performance for each JVM

– Data flow rate

– Resource usage

• Per job, heap sites for Mapper & Reducer

• Per job, CPU cycles for Mapper & Reducer

Hadoop Cluster Monitoring

• Data centre and rack-aware node view of YARN and non-YARN daemons

• Dynamic interval-based monitoring

• Hadoop JMX, node resource statistics

• Per-file, node-wise replica placement (which nodes have replicas of a given file?)

• HDFS data placement view (is HDFS balanced?)
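
One way to answer the "is HDFS balanced?" question is the criterion used by the HDFS balancer: a cluster counts as balanced when every DataNode's utilization is within a threshold (10 percentage points by default) of the cluster-wide utilization. The sketch below applies that criterion to made-up numbers. Whether Jumbune's placement view uses the same rule is an assumption, and the function name `balance_report` and the node figures are hypothetical.

```python
def balance_report(nodes, threshold=10.0):
    """Check per-node utilization against the cluster average.

    nodes maps a node name to (used, capacity) in the same unit.
    Returns (balanced, outliers), where outliers maps each offending
    node to its deviation from the cluster average in percentage points.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    outliers = {}
    for name, (used, capacity) in nodes.items():
        deviation = 100.0 * used / capacity - cluster_pct
        if abs(deviation) > threshold:
            outliers[name] = round(deviation, 1)
    return not outliers, outliers


# Hypothetical figures (GB used, GB capacity) for three DataNodes.
skewed = {"dn1": (80, 100), "dn2": (50, 100), "dn3": (20, 100)}
even = {"dn1": (50, 100), "dn2": (55, 100)}

print(balance_report(skewed))
print(balance_report(even))
```

In the skewed example, `dn1` and `dn3` sit 30 points above and below the cluster average, so they would be the nodes to highlight in a placement view.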

How are we building Jumbune?

http://jumbune.org/jira/secure/Dashboard.jspa

https://github.com/impetus-opensource/jumbune

https://travis-ci.org/impetus-opensource/jumbune

http://www.jumbune.org/

https://bintray.com/jumbune

Let’s Collaborate

Website

• http://jumbune.org

Contribute

• http://github.com/impetus-opensource/jumbune

• http://jumbune.org/jira/JUM

Social

• Follow @jumbune, use #jumbune

• Jumbune Group: http://linkd.in/1mUmcYm

Forums

• Users: [email protected]

• Dev: [email protected]

• Issues: [email protected]

Downloads

• http://jumbune.org

• https://bintray.com/jumbune/downloads/jumbune

Thanks
