Extending Hadoop for Fun & Profit Milind Bhandarkar Chief Scientist, Pivotal Software, (Twitter: @techmilind)


DESCRIPTION

The Apache Hadoop project and the Hadoop ecosystem have been designed to be extremely flexible and extensible. HDFS, YARN, and MapReduce combined have more than 1000 configuration parameters that allow users to tune the performance of Hadoop applications and, more importantly, to extend Hadoop with application-specific functionality without modifying any of the core Hadoop code. In this talk, I will start with simple extensions, such as writing a new InputFormat to efficiently process video files. I will then present some extensions that boost application performance, such as optimized compression codecs and pluggable shuffle implementations. With the refactoring of the MapReduce framework and the emergence of YARN as a generic resource manager for Hadoop, one can extend Hadoop further by implementing new computation paradigms. I will discuss one such computation framework, which allows message-passing applications to run in the Hadoop cluster alongside MapReduce. I will conclude by outlining some of our ongoing work that extends HDFS by removing the namespace limitations of the current Namenode implementation.


Page 1: Extending Hadoop for Fun & Profit

Extending Hadoop for Fun & Profit

Milind Bhandarkar Chief Scientist, Pivotal Software,

(Twitter : @techmilind)

Page 2: Extending Hadoop for Fun & Profit

About Me

• http://www.linkedin.com/in/milindb

• Founding member of Hadoop team at Yahoo! [2005-2010]

• Contributor to Apache Hadoop since v0.1

• Built and led Grid Solutions Team at Yahoo! [2007-2010]

• Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu)

• Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems (acquired by Oracle), Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and Pivotal (formerly Greenplum)

Page 3: Extending Hadoop for Fun & Profit

Agenda

• Extending MapReduce

• Functionality

• Performance

• Beyond MapReduce with YARN

• Hamster & GraphLab

• Extending HDFS

• Q & A

Page 4: Extending Hadoop for Fun & Profit

Extending MapReduce

Page 5: Extending Hadoop for Fun & Profit

MapReduce Overview

• Record = (Key, Value)

• Key : Comparable, Serializable

• Value: Serializable

• Logical Phases: Input, Map, Shuffle, Reduce, Output

Page 6: Extending Hadoop for Fun & Profit

Map

• Input: (Key1, Value1)

• Output: List(Key2, Value2)

• Projections, Filtering, Transformation

Page 7: Extending Hadoop for Fun & Profit

Shuffle

• Input: List(Key2, Value2)

• Output

• Sort(Partition(List(Key2, List(Value2))))

• Provided by Hadoop : Several Customizations Possible

Page 8: Extending Hadoop for Fun & Profit

Reduce

• Input: List(Key2, List(Value2))

• Output: List(Key3, Value3)

• Aggregations
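The logical phases above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the Map, Shuffle, and Reduce signatures, not Hadoop code: `map_fn`, `reduce_fn`, `run_job`, and the byte-sum partitioner are hypothetical names chosen for this example, and word count is used as the aggregation.

```python
from collections import defaultdict


def map_fn(_key, line):
    # Map: (Key1, Value1) -> List(Key2, Value2)
    return [(word, 1) for word in line.split()]


def reduce_fn(word, counts):
    # Reduce: (Key2, List(Value2)) -> List(Key3, Value3)
    return [(word, sum(counts))]


def run_job(records, num_partitions=2):
    # Map phase: apply map_fn to every input record.
    mapped = [kv for k, v in records for kv in map_fn(k, v)]

    # Shuffle phase: partition by key, then group values per key.
    # A byte sum is used as a deterministic stand-in for a hash partitioner.
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for k, v in mapped:
        partitions[sum(k.encode()) % num_partitions][k].append(v)

    # Reduce phase: keys are visited in sorted order within each partition.
    output = []
    for part in partitions:
        for k in sorted(part):
            output.extend(reduce_fn(k, part[k]))
    return output


# Keys are byte offsets of each line, as a text input format would supply.
records = [(0, "a b a"), (6, "b c")]
result = run_job(records)
```

Each key lands in exactly one partition, so every reducer sees all values for the keys it owns.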

Page 9: Extending Hadoop for Fun & Profit

MapReduce DataFlow

Page 10: Extending Hadoop for Fun & Profit

Configuration

• Unified Mechanism for

• Configuring Daemons

• Runtime environment for Jobs/Tasks

• Defaults: *-default.xml

• Site-Specific: *-site.xml

• final parameters

Page 11: Extending Hadoop for Fun & Profit

Example

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>head.server.node.com:9001</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://head.server.node.com:9000</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
    <final>true</final>
  </property>
  ....
</configuration>
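The effect of a final parameter can be illustrated with a small sketch. It assumes the layering described on the Configuration slide: resources are merged in order (defaults, then site-specific, then job settings), and a parameter marked final cannot be overridden by anything loaded later. `load_resources` is a hypothetical helper, not a Hadoop API, and the default and job-level values below are placeholders for illustration.

```python
def load_resources(*resources):
    """Merge configuration resources in order, honoring 'final' flags.

    Each resource maps a parameter name to a (value, is_final) pair.
    """
    values, finals = {}, set()
    for resource in resources:
        for name, (value, is_final) in resource.items():
            if name in finals:
                # An earlier resource marked this parameter final: ignore.
                continue
            values[name] = value
            if is_final:
                finals.add(name)
    return values


# Illustrative stand-ins for *-default.xml, *-site.xml, and a job's settings.
defaults = {
    "fs.default.name": ("file:///", False),
    "mapred.child.java.opts": ("-Xmx200m", False),
}
site = {
    "fs.default.name": ("hdfs://head.server.node.com:9000", False),
    "mapred.child.java.opts": ("-Xmx512m", True),  # <final>true</final>
}
job = {
    "mapred.child.java.opts": ("-Xmx4g", False),  # attempted override
}

conf = load_resources(defaults, site, job)
```

Here the job's attempt to raise the child JVM heap is ignored because the site file marked `mapred.child.java.opts` as final, while the non-final `fs.default.name` is overridden by the site file as usual.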

Page 12: Extending Hadoop for Fun & Profit

Extending Input Phase

• Convert ByteStream to List(Key, Value)

• Several Formats pre-packaged

• TextInputFormat<LongWritable, Text>

• SequenceFileInputFormat<K,V>

• KeyValueTextInputFormat<Text,Text>

• Specify InputFormat for each job

• JobConf.setInputFormat()

Page 13: Extending Hadoop for Fun & Profit

InputFormat

• getSplits() : From Input descriptors, get Input Splits, such that each Split can be processed independently

• <FileName, startOffset, length>

• getRecordReader() : From an InputSplit, get list of Records
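As a rough illustration of what a file-based getSplits() produces, the sketch below carves a file into <FileName, startOffset, length> triples using only its length and a split size. It is a simplified model, not the Hadoop implementation: `FileSplit` and `get_splits` are hypothetical names, and the file name and sizes are made up.

```python
from collections import namedtuple

# Mirrors the <FileName, startOffset, length> triple from the slide.
FileSplit = namedtuple("FileSplit", "file_name start_offset length")


def get_splits(file_name, file_length, split_size):
    """Cut a file into independent splits using metadata only."""
    splits = []
    offset = 0
    while offset < file_length:
        # The last split may be shorter than split_size.
        length = min(split_size, file_length - offset)
        splits.append(FileSplit(file_name, offset, length))
        offset += length
    return splits


MB = 1024 * 1024
splits = get_splits("video.mpg", 300 * MB, 128 * MB)
```

No file contents are read here; only the file length is needed, which keeps split computation cheap even for very large inputs.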

Page 14: Extending Hadoop for Fun & Profit

Industry Use Case

Surveillance Video Anomaly Detection

Page 15: Extending Hadoop for Fun & Profit

Acknowledgements

• Victor Fang

• Regu Radhakrishnan

• Derek Lin

• Sameer Tiwari

Page 16: Extending Hadoop for Fun & Profit

Anomaly Detection in Surveillance Video

• Detect anomalous objects in a restricted perimeter

• A typical large enterprise collects terabytes of video per day

• Hadoop MapReduce runs computer vision algorithms in parallel and captures violation events

• Post-Incident monitoring enabled by Interactive Query

Page 17: Extending Hadoop for Fun & Profit

Video DataFlow

• Timestamped Video Files as input

• Distributed Video Transcoding : ETL in Hadoop

• Distributed Video Analytics in Hadoop/HAWQ

• Insights in relational DB

Page 18: Extending Hadoop for Fun & Profit

Real World Video Data

• Benchmark surveillance videos from the UK Home Office (i-LIDS)

• CCTV video footage depicting scenarios central to government requirements

Page 19: Extending Hadoop for Fun & Profit

Common Video Standards

• MPEG & ITU responsible for most video standards

• MPEG-2 (1995): widely adopted in DVDs, TV, and set-top boxes

Page 20: Extending Hadoop for Fun & Profit

MPEG Standard Format

• Sequence of encoded video frames

• Compression by eliminating:

• Redundancy in Time: Inter-Frame Encoding

• Redundancy in Space: Intra-Frame Encoding

Page 21: Extending Hadoop for Fun & Profit

Motion Compensation

• I-Frame: Intra-Frame encoding

• P-Frame: Predicted frame from previous frame

• B-Frame: Predicted frame from both previous & next frame

Page 22: Extending Hadoop for Fun & Profit

Distributed MPEG Decoding

• HDFS splits large files into 64 MB/128 MB blocks

• Each HDFS block can be processed independently by a Map task

• Can we decode individual video frames from an arbitrary HDFS block in an MPEG file?

Page 23: Extending Hadoop for Fun & Profit

Splitting MPEG-2

• Header Information available only once per file

• Group of Pictures (GOP) header repeats

• Each GOP starts with an I-Frame and ends with an I-Frame

• Each GOP can be decoded independently

• First and last GOP may straddle HDFS blocks

Page 24: Extending Hadoop for Fun & Profit

MPEG2InputFormat

• Derived from FileInputFormat

• getSplits() : Identical to FileInputFormat

• InputSplit = HDFS Block

• getRecordReader()

• MPEG2RecordReader

Page 25: Extending Hadoop for Fun & Profit

MPEG2RecordReader

• Start from beginning of block

• Search for the first GOP Header

• Locate an I-Frame, decode, keep in memory

• If P-Frame, decode using last frame

• If B-Frame, keep current frame in memory, read next frame, decode current frame
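The first two steps of the record reader (start at the beginning of the block, search for the first GOP header) can be sketched as a byte scan. This is a toy model, not the MPEG2RecordReader from the talk: it assumes the whole stream is available as one bytes object, relies on the MPEG-2 group start code 0x000001B8, does no decoding, and uses the hypothetical helpers `gop_offsets` and `gops_for_split`. A GOP is assigned to the split in which its header begins, and the reader is assumed to read past the end of its block to finish a straddling GOP.

```python
GOP_START_CODE = b"\x00\x00\x01\xb8"  # MPEG-2 group_start_code


def gop_offsets(stream):
    """Return the byte offset of every GOP header in the stream."""
    offsets = []
    pos = stream.find(GOP_START_CODE)
    while pos != -1:
        offsets.append(pos)
        pos = stream.find(GOP_START_CODE, pos + 1)
    return offsets


def gops_for_split(stream, start, length):
    """Return (begin, end) byte ranges of GOPs whose header lies in the split.

    A GOP that starts inside the split but ends in the next block is still
    returned in full; a GOP that started in the previous block is skipped.
    """
    offsets = gop_offsets(stream) + [len(stream)]
    end = start + length
    return [
        (begin, offsets[i + 1])
        for i, begin in enumerate(offsets[:-1])
        if start <= begin < end
    ]


# Toy stream: a 3-byte file header followed by three GOPs of dummy payload.
stream = (
    b"HDR"
    + GOP_START_CODE + b"aaaa"
    + GOP_START_CODE + b"bbbbbb"
    + GOP_START_CODE + b"cc"
)
# Pretend the "HDFS block size" is 10 bytes.
per_split = [gops_for_split(stream, start, 10) for start in range(0, len(stream), 10)]
```

With this assignment rule every GOP is processed by exactly one split, even though the second GOP in the toy stream crosses a block boundary.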

Page 26: Extending Hadoop for Fun & Profit

Considerations for Input Format

• Use as little metadata as possible

• Number of Splits = Number of Map Tasks

• Combine small files

• Split determination happens in a single process, so should be metadata-based

• Affects scalability of MapReduce

Page 27: Extending Hadoop for Fun & Profit

Scalability

• If one node processes k MB/s, then N nodes should process (k*N) MB/s

• If some fixed amount of data is processed in T minutes on one node, then N nodes should process the same data in (T/N) minutes

• Linear Scalability

Page 28: Extending Hadoop for Fun & Profit

Reduce Latency

Minimize job execution time

Page 29: Extending Hadoop for Fun & Profit

Increase Throughput

Maximize amount of data processed per unit time

Page 30: Extending Hadoop for Fun & Profit

Amdahl’s Law

$$S = \frac{N}{1 + \alpha (N - 1)}$$
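A quick numerical check helps build intuition for this form of Amdahl's Law. The sketch assumes the usual reading of the formula, speedup S on N nodes equals N divided by (1 + alpha * (N - 1)), where alpha is the fraction of the work that cannot be parallelized; `amdahl_speedup` is a name chosen for this example.

```python
def amdahl_speedup(n, alpha):
    """Speedup on n nodes when a fraction alpha of the work is serial."""
    return n / (1 + alpha * (n - 1))


# With no serial work the speedup is linear in the number of nodes.
linear = amdahl_speedup(100, 0.0)

# With 5% serial work, 100 nodes give far less than a 100x speedup.
limited = amdahl_speedup(100, 0.05)

# As n grows, the speedup approaches 1 / alpha.
asymptote = amdahl_speedup(10**9, 0.05)
```

Even a small serial fraction dominates at scale, which is why a single-process step such as split computation affects the scalability of the whole job.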

Page 31: Extending Hadoop for Fun & Profit

Multi-Phase Computations

• If computation C is split into N different parts, C1..CN

• If partial computation Ci can be speeded up by a factor of Si

Page 32: Extending Hadoop for Fun & Profit

Amdahl’s Law, Restated

$$S = \frac{\sum_{i=1}^{N} C_i}{\sum_{i=1}^{N} \frac{C_i}{S_i}}$$

Page 33: Extending Hadoop for Fun & Profit

Amdahl’s Law

• Suppose Job has 5 phases: P0 is 10 seconds, P1, P2, P3 are 200 seconds each, and P4 is 10 seconds

• Sequential runtime = 620 seconds

• P1, P2, P3 parallelized on 100 machines with speedup of 80 (each executes in 2.5 seconds)

• After parallelization, runtime = 27.5 seconds

• Effective speedup: (620s/27.5s) = 22.5
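The arithmetic in this example can be reproduced with the restated form of Amdahl's Law: total sequential cost divided by the sum of each phase's cost over its own speedup. The function name `effective_speedup` is chosen for this sketch; the phase costs and speedups are the ones given on the slide.

```python
def effective_speedup(costs, speedups):
    """Overall speedup when phase i of cost C_i is sped up by factor S_i."""
    return sum(costs) / sum(c / s for c, s in zip(costs, speedups))


# P0 and P4 stay sequential (speedup 1); P1..P3 each get a speedup of 80.
costs = [10, 200, 200, 200, 10]
speedups = [1, 80, 80, 80, 1]

sequential_runtime = sum(costs)
parallel_runtime = sum(c / s for c, s in zip(costs, speedups))
overall = effective_speedup(costs, speedups)
```

The two 10-second sequential phases account for 20 of the 27.5 seconds after parallelization, so they cap the overall speedup well below the per-phase factor of 80.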

Page 34: Extending Hadoop for Fun & Profit

MapReduce Workflow

Page 35: Extending Hadoop for Fun & Profit

Extending Shuffle

Page 36: Extending Hadoop for Fun & Profit

Why Shuffle ?

• Often the most expensive phase in MapReduce, since it involves slow disks and the network

• Map tasks partition, sort and serialize outputs, and write to local disk

• Reduce tasks pull individual Map outputs over network, merge, and may spill to disk
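The two sides of the shuffle described above can be modeled in memory. This is a conceptual sketch only: there is no serialization, disk, or network involved, `zlib.crc32` stands in for a hash partitioner, and `partition`, `map_side`, and `reduce_side` are hypothetical names.

```python
import heapq
import zlib
from itertools import groupby


def partition(key, num_reducers):
    # Deterministic stand-in for a hash partitioner.
    return zlib.crc32(key.encode()) % num_reducers


def map_side(pairs, num_reducers):
    """Partition one map task's output and sort each partition."""
    runs = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        runs[partition(key, num_reducers)].append((key, value))
    return [sorted(run) for run in runs]


def reduce_side(runs):
    """Merge the sorted runs pulled from every map task, then group by key."""
    merged = heapq.merge(*runs)
    return [
        (key, [value for _, value in group])
        for key, group in groupby(merged, key=lambda kv: kv[0])
    ]


NUM_REDUCERS = 2
map_outputs = [
    map_side([("b", 1), ("a", 1)], NUM_REDUCERS),
    map_side([("a", 1), ("c", 1)], NUM_REDUCERS),
]
# Reducer r pulls partition r from every map task's output.
reduced = [
    reduce_side([m[r] for m in map_outputs]) for r in range(NUM_REDUCERS)
]
```

Because each map-side run is already sorted, the reduce side only needs a streaming merge rather than a full sort, which is the property that pluggable shuffle and sort implementations try to exploit or replace.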

Page 37: Extending Hadoop for Fun & Profit

Message Cost Model

T = α + Nβ

Page 38: Extending Hadoop for Fun & Profit

Message Granularity

• For Gigabit Ethernet

• α = 300 μs (per-message latency)

• β = 1 / (100 MB/s) (per-byte transfer time at 100 MB/s bandwidth)

• 100 Messages of 10KB each = 40 ms

• 10 Messages of 100 KB each = 13 ms
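The two totals above follow from applying T = α + Nβ to each message and summing. The sketch below assumes α is a fixed per-message latency, β is the per-byte transfer time (the reciprocal of the 100 MB/s bandwidth), and decimal units (1 KB = 1000 bytes); the constant and function names are chosen for this example.

```python
ALPHA = 300e-6        # per-message latency in seconds (300 microseconds)
BANDWIDTH = 100e6     # bytes per second (100 MB/s)
BETA = 1 / BANDWIDTH  # seconds per byte


def transfer_time(num_messages, message_bytes):
    """Total time to send num_messages messages of message_bytes each."""
    return num_messages * (ALPHA + message_bytes * BETA)


# The same 1 MB of data, sent at two different granularities.
many_small = transfer_time(100, 10_000)  # 100 messages of 10 KB
few_large = transfer_time(10, 100_000)   # 10 messages of 100 KB
```

Both cases move the same 1 MB, so the bandwidth term is identical (10 ms); the difference comes entirely from paying the per-message latency 100 times instead of 10.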

Page 39: Extending Hadoop for Fun & Profit

Alpha-Beta

• Common Mistake: Assuming that α is constant

• Scheduling latency for responder

• MR daemons' time slice is inversely proportional to the number of concurrent tasks

• Common Mistake: Assuming that β is constant

• Network congestion

• TCP incast

Page 40: Extending Hadoop for Fun & Profit

Efficient Hardware Platforms

• Mellanox - Hadoop Acceleration through Network-assisted Merge

• RoCE - Brocade, Cisco, Extreme, Arista...

• SSD - Velobit, Violin, FusionIO, Samsung..

• Niche - Compression, Encryption...

Page 41: Extending Hadoop for Fun & Profit

Pluggable Shuffle & Sort

• Replace HTTP-based pull with RDMA

• Avoid spilling altogether

• Replace default Sort implementation with Job-optimized sorting algorithm

• Experimental APIs

• Documentation: search for PluggableShuffleAndPluggableSort.html

Page 42: Extending Hadoop for Fun & Profit

Mellanox UDA

• Developed jointly with Auburn University

• 2x Performance on TeraSort

• Reduces disk writes by 45%, disk reads by 15%

Page 43: Extending Hadoop for Fun & Profit

Syncsort DMX-h

Page 44: Extending Hadoop for Fun & Profit

Beyond MapReduce with YARN

Page 45: Extending Hadoop for Fun & Profit

[Figure: separate single-application silos labeled BATCH, INTERACTIVE, and ONLINE, with HDFS as storage]

Hadoop 1.0 (Image Courtesy Arun Murthy, Hortonworks)

Page 46: Extending Hadoop for Fun & Profit

MapReduce 1.0 (Image Courtesy Arun Murthy, Hortonworks)

Page 47: Extending Hadoop for Fun & Profit

Hadoop 2.0 (Image Courtesy Arun Murthy, Hortonworks)

[Figure: two stack diagrams side by side. HADOOP 1.0: HDFS (redundant, reliable storage); MapReduce (cluster resource management & data processing); Pig (data flow), Hive (SQL), Others (Cascading). HADOOP 2.0: HDFS2 (redundant, reliable storage); YARN (cluster resource management); Tez (execution engine), MR (batch), RT Stream/Graph (Storm, Giraph), Services (HBase); Pig (data flow), Hive (SQL), Others (Cascading).]

Page 48: Extending Hadoop for Fun & Profit

[Figure: Applications Run Natively IN Hadoop. On top of HDFS2 (redundant, reliable storage) and YARN (cluster resource management): BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, S4, …), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), ONLINE (HBase), OTHER (Search, Weave, …).]

YARN Platform (Image Courtesy Arun Murthy, Hortonworks)

Page 49: Extending Hadoop for Fun & Profit

[Figure: a ResourceManager with its Scheduler and a grid of NodeManagers; a client (Client2) submits to the ResourceManager; Application Masters AM 1 and AM2 run on NodeManagers alongside their containers (1.1 to 1.3 and 2.1 to 2.4).]

YARN Architecture (Image Courtesy Arun Murthy, Hortonworks)

Page 50: Extending Hadoop for Fun & Profit

YARN

• Yet Another Resource Negotiator

• Resource Manager

• Node Managers

• Application Masters

• Specific to paradigm, e.g. MR Application master (aka JobTracker)

Page 51: Extending Hadoop for Fun & Profit

Beyond MapReduce

• Apache Giraph - BSP & Graph Processing

• Storm on Yarn - Streaming Computation

• HOYA - HBase on Yarn

• Hamster - MPI on Hadoop

• More to come ...

Page 52: Extending Hadoop for Fun & Profit

Hamster

• Hadoop and MPI on the same cluster

• OpenMPI Runtime on Hadoop YARN

• Hadoop Provides: Resource Scheduling, Process monitoring, Distributed File System

• Open MPI Provides: Process launching, Communication, I/O forwarding

Page 53: Extending Hadoop for Fun & Profit

Hamster Components

• Hamster Application Master

• Gang Scheduler, YARN Application Preemption

• Resource Isolation (lxc Containers)

• ORTE: Hamster Runtime

• Process launching, Wireup, Interconnect

Page 54: Extending Hadoop for Fun & Profit

[Figure: Hamster Architecture. Components: Client; Resource Manager (Scheduler, AMService); Node Managers with Aux Srvcs; MPI AM (Scheduler, HNP); Proc/Containers running Framework Daemon, NS, and MPI processes. Labeled links: Client-RM, RM-AM, AM-NM, RM-NodeManager.]

Hamster Architecture

Page 55: Extending Hadoop for Fun & Profit

Hamster Scalability

• Sufficient for small to medium HPC workloads

• Job launch time gated by YARN resource scheduler

            Launch     WireUp     Collectives   Monitor
  OpenMPI   O(logN)    O(logN)    O(logN)       O(logN)
  Hamster   O(N)       O(logN)    O(logN)       O(logN)
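The gap between O(N) and O(logN) launch can be made concrete with a toy step count. The model below is an assumption made for illustration, not a description of how YARN or Open MPI actually launch processes: the O(N) case starts one process per step, while the O(logN) case doubles the number of running launchers each round, as in a tree-based launch. Both function names are hypothetical.

```python
def sequential_launch_steps(n):
    """Steps to start n processes when they are started one at a time."""
    return n


def tree_launch_rounds(n):
    """Rounds to reach n processes when the running set doubles each round.

    Starting from one process, r rounds reach 2**r processes, so the
    answer is the smallest r with 2**r >= n.
    """
    return (n - 1).bit_length()


sizes = [16, 1024, 4096]
comparison = {
    n: (sequential_launch_steps(n), tree_launch_rounds(n)) for n in sizes
}
```

Under this model a 1024-process job needs 1024 sequential steps but only 10 doubling rounds, which is consistent with the slide's remark that launch time is the part gated by the resource scheduler.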

Page 56: Extending Hadoop for Fun & Profit

GraphLab + Hamster on Hadoop


Page 57: Extending Hadoop for Fun & Profit

About GraphLab

• Graph-based, High-Performance distributed computation framework

• Started by Prof. Carlos Guestrin at CMU in 2009

• GraphLab Inc. was recently founded to commercialize GraphLab.org

Page 58: Extending Hadoop for Fun & Profit

GraphLab Features

• Topic Modeling (e.g. LDA)

• Graph Analytics (Pagerank, Triangle counting)

• Clustering (K-Means)

• Collaborative Filtering

• Linear Solvers

• etc...

Page 59: Extending Hadoop for Fun & Profit

Graphs Alone Are Not Enough

• A full data processing workflow requires ETL/post-processing, visualization, data wrangling, and serving

• MapReduce excels at data wrangling

• OLTP/NoSQL Row-Based stores excel at Serving

• GraphLab should co-exist with other Hadoop frameworks

Page 60: Extending Hadoop for Fun & Profit

Coming Soon…

Page 61: Extending Hadoop for Fun & Profit

Extending HDFS

Page 62: Extending Hadoop for Fun & Profit

HCFS

• Hadoop Compatible File Systems

• FileSystem, FileContext

• S3, Local FS, webhdfs

• Azure Blob Storage, CassandraFS, Ceph, CleverSafe, Google Cloud Storage, Gluster, Lustre, QFS, EMC ViPR (more to come)

Page 63: Extending Hadoop for Fun & Profit

New Dataset

• Reuse Namenode and Datanode implementations

• Substitute a different DataSet implementation: FsDatasetSpi, FsVolumeSpi

• Jira: HDFS-5194

Page 64: Extending Hadoop for Fun & Profit

Extending Namenode

• Pluggable Namespace: HDFS-5324, HDFS-5389

• Pluggable Block Management: HDFS-5477

• Requires fine-grained locking in Namenode: HDFS-5453

Page 65: Extending Hadoop for Fun & Profit

Questions ?