11
In-Memory Accelerator for Hadoop™ www.gridgain.com #gridgain

October 2013 HUG: GridGain-In memory accelaration

Embed Size (px)

DESCRIPTION

As Apache Hadoop adoption continues to advance, customers are depending more and more on Hadoop for critical tasks, and deploying Hadoop for use cases with more real-time requirements. In this session, we will discuss the desired performance characteristics of such a deployment and the corresponding challenges. Leave with an understanding of how performance-sensitive deployments can be accelerated using In-Memory technologies that merge the Big Data capabilities of Hadoop with the unmatched performance of In-Memory data management.

Citation preview

Page 1: October 2013 HUG: GridGain-In memory accelaration

In-Memory Accelerator for Hadoop™

www.gridgain.com #gridgain

Page 2: October 2013 HUG: GridGain-In memory accelaration

Slide www.gridgain.com

Hadoop: Pros and Cons

2

What is Hadoop?> Hadoop is a batch system> HDFS - Hadoop Distributed File System> Data must be ETL-ed into HDFS> Parallel processing over data in HDFS> Hive, Pig, HBase, Mahout...> Most popular data warehouse

Pros:> Scales very well> Fault tolerant and resilient> Very active and rich eco-system> Process TBs/PBs in parallel fashion

Cons:> Batch oriented - real time not possible> Complex deployment> Significant execution overhead> HDFS is IO and network bound

Page 3: October 2013 HUG: GridGain-In memory accelaration

Slide www.gridgain.com

In-Memory Accelerator For Hadoop: Overview

3

Up To 100x Faster:1. In-Memory File System

100% compatible with HDFSBoost HDFS performance by removing IO overheadDual-mode: standalone or cachingBlend into Hadoop ecosystem

2. In-Memory MapReduceEliminate Hadoop MapReduce overheadAllow for embedded executionRecord-based

Page 4: October 2013 HUG: GridGain-In memory accelaration

Slide www.gridgain.com

GridGain: In-Memory Computing Platform

4

Page 5: October 2013 HUG: GridGain-In memory accelaration

Slide www.gridgain.com

In-Memory Accelerator For Hadoop: Details

> PnP IntegrationMinimal or zero code change

> Any Hadoop distroHadoop v1 and v2

> In-Memory File System100% compatible with HDFSDual-mode: no ETL needed, read/write-throughBlock-level caching & smart evictionAutomatic pre-fetchingBackground fragmentizerOn-heap and off-heap memory utilization

> In-Memory MapReduceIn-process co-located computations - access GGFS in-processEliminate unnecessary IPCEliminate long task startup timeEliminate mandatory sorting and re-shuffling on reduction

5

Page 6: October 2013 HUG: GridGain-In memory accelaration

Slide www.gridgain.com

GridGain Visor: Unified DevOps

6

HDFS Profiler

File Manager

Page 7: October 2013 HUG: GridGain-In memory accelaration

Slide www.gridgain.com

Benchmarks: GGFS vs HDFS

7

10 nodes cluster of Dell R610> Each has dual 8-core CPU> Ubuntu 12.4, Java 7> 10 GBE network> Stock Apache Hadoop 2.x

Page 8: October 2013 HUG: GridGain-In memory accelaration

Slide www.gridgain.com

Comparison: Hadoop Accelerator vs. Spark

8

> No ETL requiredAutomatic HDFS read-through and write-throughData is loaded on demand

> Per-block file cachingOnly hot data blocks are in memory

> Strong management capabilitiesGridGain Visor - Unified DevOps

> Requires data ETL-ed into SparkChanges to data do not get propagated to HDFSExplicit ETL step consumes time

> Needs to have full file loadedIf does not fit - gets offloaded to disk

> No management capabilities

Page 9: October 2013 HUG: GridGain-In memory accelaration

Slide www.gridgain.com

Customer Use Case: Task & Challenge

9

Task:> Real time search with MapReduce> Dataset size is 5TB > Writes 80%, reads 20%> Perceptual real time SLA (few seconds)

Challenge:> Hadoop MapReduce too slow (> 30 sec)> Data scanning slow due to constant IO> Overall job takes > 1 minute

Page 10: October 2013 HUG: GridGain-In memory accelaration

Slide www.gridgain.com

Customer Use Case: Solution

10

> Utilize existing serversStart GridGain data node on every server

> Only put highly utilized files in GGFSUser controlled caching

> In-Memory MapReduce over GGFSEmbedded processing

> Results under 3 seconds

Page 11: October 2013 HUG: GridGain-In memory accelaration

GridGain Systemswww.gridgain.com

1065 East Hillsdale Blvd, Suite 230Foster City, CA 94404

@gridgain