Steve Totman, Director of Strategy, Syncsort. March 20th 2013. Making Hadoop Ready for Prime Time. Hadoop Summit Amsterdam, March 2013. Photo credit: Aaron Sikkink, http://www.flickr.com/people/housequakecom/

Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk


Page 1: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk

Steve Totman

Director Of Strategy

Syncsort

March 20th 2013

Making Hadoop Ready for Prime Time

Hadoop Summit Amsterdam March 2013

Photo Credit Aaron Sikkink http://www.flickr.com/people/housequakecom/

Page 2: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk


Page 3: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk

Syncsort Confidential and Proprietary - do not copy or distribute 3

Page 4: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk

The Big Data Continuum


Evolved, Dynamic, Plateauing, Advancing, Traditional

BI

Data Awakening

Big Data Continuum

•Early Hadoop adoption, prototyping & experimentation
•Hand-coding: SQL, JCL. Basic ETL tools
•Standardization & heavy platforms. Demand for MF data
•Hitting arch limits + exponential costs. Growing MIPS
•Big Data is the new standard for both MF & open systems data

Challenges

•Long development cycles
•Unsustainable costs
•Hadoop connectivity & sort gaps
•Efficiency, ETL & skills gaps
•Hand-coding nightmare

Value: Max, Min

Integrating Big Data… Smarter

DMExpress

MFX

•SQL Migration
•Hadoop ETL
•Hadoop Sort & Connectivity
•ETL & Rehosting Optimization
•High-performance ETL

Page 5: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk


Mandatory sort steps in MapReduce processing
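The sort step this slide refers to can be illustrated with a minimal sketch. It is a plain Python word count that mimics the map, sort and reduce stages, not Hadoop code; the function names (`map_phase`, `shuffle_sort`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Emit (word, 1) pairs, as a word-count mapper would."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def shuffle_sort(pairs):
    """Sort the intermediate pairs by key before they reach the reducer."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """Sum the values of each run of equal keys (assumes key-sorted input)."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (key, sum(count for _, count in group))


def word_count(lines):
    """Chain the three stages: map, sort, reduce."""
    return list(reduce_phase(shuffle_sort(map_phase(lines))))


if __name__ == "__main__":
    print(word_count(["hadoop sort", "sort merge sort"]))
```

The reducer groups adjacent equal keys, so skipping `shuffle_sort` would split a key that appears in several places into several partial results. That dependency is why the sort cannot be left out in this sketch.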

Page 6: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk


Page 7: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk


Page 8: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk

Smart Contributions to Improve Hadoop


JIRA    Description
4807    Allow MapOutputBuffer to be pluggable
4808    Allow Reduce-side merge to be pluggable
4809    Make classes required for 2454 public
4812    Create reduce input merger plug-in
4842    Shuffle race can hang reducer
2461    HDFS file name globbing in libhdfs
4482    Backport of 2454 to MapReduce 1 & 1.2

…and more!!

Native Sort:

•Not modular
•Limited capabilities
•Difficult to fine-tune & configure (requires coding & compilation)

[Diagram: Hadoop nodes, each with Native Sort]

Contribution:

•Modular
•Extensible
•Configurable through use of external sorters on MapReduce nodes
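The idea of a pluggable sorter can be sketched in a few lines. This is a conceptual Python illustration only: Hadoop's actual plug-in points are Java interfaces, and every name here (`MapOutputCollector`, `native_sort`, `combining_sort`, `run`) is hypothetical rather than part of any Hadoop API.

```python
from typing import Callable, Iterable, List, Tuple

Record = Tuple[str, int]
Sorter = Callable[[Iterable[Record]], List[Record]]


def native_sort(records: Iterable[Record]) -> List[Record]:
    """Default behaviour: a stable sort of the buffered records by key."""
    return sorted(records, key=lambda record: record[0])


def combining_sort(records: Iterable[Record]) -> List[Record]:
    """Alternative plug-in: sum values per key, then return keys in order."""
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())


class MapOutputCollector:
    """Buffers map output and hands it to whichever sorter was plugged in."""

    def __init__(self, sorter: Sorter = native_sort):
        self._sorter = sorter
        self._buffer: List[Record] = []

    def collect(self, key: str, value: int) -> None:
        self._buffer.append((key, value))

    def flush(self) -> List[Record]:
        """Sort the buffered records with the configured sorter and reset."""
        output = self._sorter(self._buffer)
        self._buffer = []
        return output


def run(sorter: Sorter) -> List[Record]:
    """Feed the same records through a collector built with `sorter`."""
    collector = MapOutputCollector(sorter)
    for key, value in [("b", 1), ("a", 1), ("b", 2)]:
        collector.collect(key, value)
    return collector.flush()


if __name__ == "__main__":
    print(run(native_sort))
    print(run(combining_sort))
```

Because the collector only depends on the sorter's call signature, a different sorter can be swapped in through configuration without touching or recompiling the collector.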

[Diagram: Hadoop nodes, each with Native Sort]

First included in a Hadoop distribution, CDH4.2, on February 26th

Page 9: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk


[Chart: TeraSort Benchmark. x-axis: File Size (GB), 0 to 5000; y-axis: Elapsed Time (min), 0 to 250]

Benefits to the Community

•JOIN
•MERGE
•AGGREGATION
•CDC
•COMPRESSION
•LOOKUP
•RANK
•MATCH

Page 10: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk


50% Data Access:

Today

Run Mainframes

Page 11: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk

Syncsort. A Bridge to Scalable, Cost-effective Big Data

Connect

•HDFS Connectivity
•Mainframe
•Teradata
•Files
•RDBMS, Appliances

Pre-process

•Sort, Join
•Aggregate
•Compress
•Partition

Facilitate

•Graphical UI
•No Manual Coding
•No Tuning

Optimize

•Up to 6x Faster Load
•Up to 2x Faster Sort
•Faster MapReduce Jobs
•Less Storage

Over 40 Years Solving Big Data Challenges with Fast. Efficient. Simple. Cost Effective DI Technology

Page 12: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk

© comScore, Inc. Proprietary. 12

Hourly Load into comScore’s Hadoop Cluster

Syncsort’s DMExpress saves comScore over 4 TB of data per day!

That’s 1,460 TB a year, or 1.42 petabytes
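The yearly figure follows from simple arithmetic, sketched below. The sketch assumes a 365-day year and 1,024 TB per petabyte, which is the conversion that reproduces the slide's 1.42 figure (the exact quotient is about 1.426). The constant names are illustrative.

```python
# Assumptions for this illustration: 365 days per year, 1024 TB per PB.
TB_SAVED_PER_DAY = 4      # lower bound; the slide says "over 4 TB"
DAYS_PER_YEAR = 365
TB_PER_PB = 1024

tb_per_year = TB_SAVED_PER_DAY * DAYS_PER_YEAR   # 4 * 365
pb_per_year = tb_per_year / TB_PER_PB            # convert TB to PB

print(f"{tb_per_year} TB per year is about {pb_per_year:.3f} PB")
```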

[Chart: hourly load over hours 1 to 24. y-axis: bytes, 0 to 500,000,000,000; series: Input Data in Bytes, Output Data in Bytes]

Page 13: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk


comScore’s Daily Trend of Event Volume

[Chart: daily event volume. Two y-axes, # of panel records and # of census records, scaled 0 to 6,000,000,000 and 0 to 60,000,000,000; series: Beacon Records, Panel Records]

Please attend Mike Brown’s session, Analyzing 1.4 Trillion Events with Hadoop, tomorrow

Page 14: Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort Lightening Talk


(No elephants were harmed during the creation of this talk but some are now a lot faster & meaner)

Please visit our booth to register for a free evaluation