Hadoop at Rakuten. Rakuten Inc. Architect Group, Hamba Mitsuharu & Nakagawa Gen, 2011/07/06 (Wed)


Page 1: Hadoop at Rakuten, 2011/07/06


Hadoop at Rakuten.

Rakuten Inc. Architect Group
Hamba Mitsuharu & Nakagawa Gen
2011/07/06 (Wed)

Page 2: Hadoop at Rakuten, 2011/07/06


Today’s Agenda.


1. Our Profile.
2. What is Hadoop?
3. Our Current Hadoop System Overview.
4. Our Hadoop Usage.
5. Our Challenge.
6. Our Future Plan.

Page 3: Hadoop at Rakuten, 2011/07/06


Our Profile.


Page 4: Hadoop at Rakuten, 2011/07/06


From ACT Group
Nakagawa Gen
Hamba Mitsuharu

Our Profile.

Page 5: Hadoop at Rakuten, 2011/07/06


Our Profile.

Our Mission

Enhancing Hadoop at Rakuten.

Page 6: Hadoop at Rakuten, 2011/07/06


Our Profile.

Our Latest Tasks. Done.

1. Implementing Ganglia.
2. Implementing HA.

Page 7: Hadoop at Rakuten, 2011/07/06


Our Profile.

Our Latest Tasks. Now Handing Over.

1. Keeping Up Our Hadoop Cluster.
2. Modifying Our Hadoop Configurations.
3. Implementing Scripts for Daily Chores.

Page 8: Hadoop at Rakuten, 2011/07/06


Our Profile.

Our Latest Tasks. Concentrating on This!

1. Evaluating The Related Products.

Page 9: Hadoop at Rakuten, 2011/07/06


What is Hadoop?


Page 10: Hadoop at Rakuten, 2011/07/06


One of The Most Powerful Distributed Processing Frameworks for Large Data Sets.

What is Hadoop?

Page 11: Hadoop at Rakuten, 2011/07/06


Distributions.

What is Hadoop?

Page 12: Hadoop at Rakuten, 2011/07/06


Ecosystem.

What is Hadoop?

ETC...

Page 13: Hadoop at Rakuten, 2011/07/06


What is Hadoop?

HDFS : Hadoop Distributed File System.
MapReduce : Map & Reduce (Includes Shuffle & Sort).

HDFS & MapReduce Constitute Hadoop.
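To make the Map, Shuffle & Sort, and Reduce phases concrete, here is a minimal single-process sketch of a word count in Python. It is only an illustration of the data flow, not Hadoop's API: the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) are hypothetical, and in a real cluster the input and output would live on HDFS and the phases would run on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle & Sort: group all values by key, then order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: collapse all values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an in-memory list of lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    for word, count in word_count(["a b a", "b c"]):
        print(word, count)
```

The mapper and reducer never see the whole data set, only one record or one key at a time, which is what lets the real framework spread them across machines.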

Page 14: Hadoop at Rakuten, 2011/07/06


What is Hadoop?

Source : http://horicky.blogspot.com/2008_11_01_archive.html

Input from HDFS.
Process by MapReduce.
Output to HDFS.

Page 15: Hadoop at Rakuten, 2011/07/06


What is Hadoop?

Simple Example.

Source : http://techblog.yahoo.co.jp/cat207/cat209/hadoop/

Page 16: Hadoop at Rakuten, 2011/07/06


What is Hadoop?

Source : http://horicky.blogspot.com/2008_11_01_archive.html

In The Common Case, Several Simple Jobs Are Combined.
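As a rough illustration of combining simple jobs, the sketch below chains two tiny in-memory jobs in Python: the first counts words, and the second takes that output as its input and groups the words by count. The helper `run_job` and all mapper and reducer names are hypothetical stand-ins, not Hadoop APIs; on a real cluster each job would read its input from HDFS and write its output back to HDFS.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one simplified job: map every record, group by key, reduce per key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


def count_mapper(line):
    """Job 1 map: emit (word, 1) for each word."""
    for word in line.split():
        yield word, 1


def count_reducer(word, ones):
    """Job 1 reduce: total occurrences of one word."""
    return word, sum(ones)


def invert_mapper(pair):
    """Job 2 map: turn (word, count) into (count, word)."""
    word, count = pair
    yield count, word


def invert_reducer(count, words):
    """Job 2 reduce: list every word that has this count."""
    return count, sorted(words)


def words_by_frequency(lines):
    """Chain the jobs: the output of job 1 is the input of job 2."""
    counts = run_job(lines, count_mapper, count_reducer)
    return run_job(counts, invert_mapper, invert_reducer)


if __name__ == "__main__":
    print(words_by_frequency(["a b a", "b c"]))
```

Each job stays simple on its own; the more complex result comes from the order in which they are wired together.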

Page 17: Hadoop at Rakuten, 2011/07/06


What is Hadoop?

NameNode & DataNode Constitute HDFS.

Source : http://horicky.blogspot.com/2008_11_01_archive.html

Page 18: Hadoop at Rakuten, 2011/07/06


What is Hadoop?

Read & Write on HDFS.

Source : http://hadoop.apache.org/common/docs/current/hdfs_design.html#NameNode+and+DataNodes

Page 19: Hadoop at Rakuten, 2011/07/06


What is Hadoop?

JobTracker & TaskTracker Constitute MapReduce.

Source : http://horicky.blogspot.com/2008_11_01_archive.html

Page 20: Hadoop at Rakuten, 2011/07/06


What is Hadoop?

Good & Bad Points of Hadoop.

Good!
Easy to Scale Out The System.
Easy to Implement Distributed Processing.

Bad...
There is a SPoF at The NameNode.

Page 21: Hadoop at Rakuten, 2011/07/06


Our Current Hadoop System Overview.


Page 22: Hadoop at Rakuten, 2011/07/06


Our Current Hadoop System Overview.

The Cluster Infrastructure. #1 For Instance.

Source : http://www.ibm.com/developerworks/linux/library/l-hadoop/

Page 23: Hadoop at Rakuten, 2011/07/06


Our Current Hadoop System Overview.

The Cluster Infrastructure. #2 In Our Case.

[Figure: cluster network diagram. Labels: Switch (x4), 1Gbps links (x4), Rack (x12), Client, NN&JT Active, NN&JT Standby, SNN, DN&TT x10 (six times), DN&TT x3 (three times), Others x18 (three times).]

3 Masters & 69 Slaves.
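The master and slave totals can be cross-checked against the figure labels with a little arithmetic. The short Python sketch below assumes one reading of the flattened diagram: six rack positions holding 10 DN&TT nodes each and three holding 3 each, with NN&JT Active, NN&JT Standby, and SNN as the three masters. That grouping is an interpretation of the labels, not something the slide states in words.

```python
# Assumed reading of the diagram labels: (number of racks, DN&TT nodes per rack).
SLAVE_GROUPS = [(6, 10), (3, 3)]

# The three master roles named on the slide.
MASTERS = ["NN&JT Active", "NN&JT Standby", "SNN"]


def total_slaves(groups):
    """Sum racks * nodes-per-rack over every group of racks."""
    return sum(racks * nodes for racks, nodes in groups)


if __name__ == "__main__":
    print(len(MASTERS), "masters,", total_slaves(SLAVE_GROUPS), "slaves")
```

Under this reading, 6 x 10 + 3 x 3 gives the 69 slaves quoted on the slide.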

Page 24: Hadoop at Rakuten, 2011/07/06


Our Current Hadoop System Overview.

The Monitoring System.
Using Ganglia (& MRTG).
We Can Easily Check The Resource Usage at Any Time, Not Only for Each Machine But Also for The Cluster as a Whole.

Page 25: Hadoop at Rakuten, 2011/07/06


Our Current Hadoop System Overview.

High Availability.
Using DRBD & Heartbeat.

[Figure: Active and Standby master nodes. Labels: Client, v-host.rakuten.co.jp, eth0, eth1, NN, JT, /foo/drbd0, /foo/drbd1, "DRBD Sync The Change."]

Source : Gen

NN : NameNode
JT : JobTracker

Page 26: Hadoop at Rakuten, 2011/07/06


Our Hadoop Usage.


Page 27: Hadoop at Rakuten, 2011/07/06


Our Hadoop Usage.

1. Generating Recommendation Engine Index.
2. Analyzing Redirect Log.
3. Calculating AD Targeting Index.
4. Measuring AD Effects.
5. Analyzing Ichiba Merchandise & Order Info.
6. Calculating Ichiba Product Ranking.
7. Analyzing Search Log.
8. Analyzing Rakuten Travel’s Access Log. (Coming Soon...)
9. Analyzing Search Word N-gram. (Coming Soon...)

Who Is Using Our Hadoop.
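For readers unfamiliar with the search word N-gram item in the list above, the following Python sketch counts character n-grams over a handful of search queries. It is a guess at the general technique, not the team's actual job: the slides do not say whether character or word n-grams are meant, and the function names `char_ngrams` and `count_ngrams` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def count_ngrams(queries, n=2):
    """Count character n-grams across all search queries."""
    counts = Counter()
    for query in queries:
        counts.update(char_ngrams(query.strip(), n))
    return counts


if __name__ == "__main__":
    for gram, count in count_ngrams(["abab", "abc"]).most_common():
        print(gram, count)
```

The per-query extraction is independent of every other query, so the same logic maps naturally onto a map step (emit n-grams) and a reduce step (sum counts).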

Page 28: Hadoop at Rakuten, 2011/07/06


Our Hadoop Usage.

The Issues of The Previous System.

[Figure: Previous System. Labels: Purchase, Shop, ITEM, Category, Mail, Marketing, Utility, File (x6), Intermediate (x2), NFS, Batch Server; steps: Unload, Load, Manipulate.]

1. High Cost to Keep Up The RDBMS.
2. Needs More & More Storage Space.
3. The System Cannot Handle So Many Job Requests Due to Low Performance.

Page 29: Hadoop at Rakuten, 2011/07/06


Our Hadoop Usage.

The Effect of The New System.

[Figure: New System! 1st Step. Labels: Purchase, Shop, ITEM, Category, Mail, Marketing, Utility, File (x6), Intermediate, NFS, "Batch Server with" (the image that followed is not recoverable); steps: Unload, Load, Manipulate.]

1. Get a Scalable System at Very Low Cost. (80% OFF for Storage.)
2. Transaction Time is Dramatically Improved. (50-75% OFF.)

Page 30: Hadoop at Rakuten, 2011/07/06


Our Hadoop Usage.

The Remaining Subjects of The New System.

1. Still Halfway to The DWH We Are Aiming For.
2. The Negative Influence Due to The Migration from a Dedicated Environment to a Shared Environment.
   1. Security.
   2. Sharing Cluster Resources.

Page 31: Hadoop at Rakuten, 2011/07/06


Our Challenge.


Page 32: Hadoop at Rakuten, 2011/07/06


Our Challenge.

1. Likely to Use Up The HDFS Space.
2. Need Much Electric Power.
3. Share The Cluster Resource Efficiently.
4. Need More Network Bandwidth.

The Issues with Our Hadoop.

Page 33: Hadoop at Rakuten, 2011/07/06


Our Future Plan.


Page 34: Hadoop at Rakuten, 2011/07/06


Our Future Plan.

Considering New Slave Machine.

?

Now Looking for a Machine Which Has...
Low Electric Power Consumption,
About 6 Cores CPU x2,
About 10TB HDD,
About 96GB Memory,
& Naturally Compatible With Our Data Center.
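To put the 10TB figure in context, here is a back-of-the-envelope Python sketch of how raw disk translates into usable HDFS space once block replication is taken into account. The replication factor of 3 is the HDFS default; the slides do not state the cluster's actual setting, and the function name `usable_hdfs_tb`, the `non_hdfs_fraction` parameter, and the 69-node example are hypothetical.

```python
def usable_hdfs_tb(nodes, raw_tb_per_node, replication=3, non_hdfs_fraction=0.0):
    """Estimate usable HDFS capacity in TB.

    replication: copies kept of every block (3 is the HDFS default; the
        cluster's real value is an assumption here).
    non_hdfs_fraction: share of each disk reserved for non-HDFS use such as
        logs and intermediate data (hypothetical knob, 0.0 means none).
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    if not 0.0 <= non_hdfs_fraction < 1.0:
        raise ValueError("non_hdfs_fraction must be in [0, 1)")
    raw_for_hdfs = nodes * raw_tb_per_node * (1.0 - non_hdfs_fraction)
    return raw_for_hdfs / replication


if __name__ == "__main__":
    # Hypothetical: every one of 69 slaves replaced by a 10TB machine.
    print(round(usable_hdfs_tb(69, 10), 1), "TB usable")
```

With replication, each 10TB machine contributes only about a third of its raw disk as usable space, which is worth keeping in mind when sizing against the HDFS space concern raised earlier in the deck.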

Page 35: Hadoop at Rakuten, 2011/07/06


Our Future Plan.

Upgrade from Apache to CDH3.

Source : http://www.quora.com/What-are-the-advantages-of-getting-Apache-Hadoop-from-Cloudera-rather-than-the-Apache-Software-Foundation

1. A version of Hadoop that has frequent releases (quarterly) that include bug fixes and back ported features (append for HBase, Kerberos security from Y!, etc.).

2. Related projects (Hive, Pig, Oozie, HBase, Flume, Sqoop, etc.) tested together and work as a cohesive system.

3. Simplified installation via Yum / Apt repositories.

4. Tighter integration with the OS (init scripts for daemons, installation of things in common paths, logs in their proper location).

5. A fixed release schedule.

6. Support available from Cloudera with SLAs.

Mr. Eric Sammer (Solution Architect at Cloudera) Described the Advantages of Hadoop from Cloudera on Quora.

Page 36: Hadoop at Rakuten, 2011/07/06


Our Future Plan.

Evaluating HBase Using AWS.

Constructing an HBase Cluster on Amazon EC2.
Doing Evaluation & Verification This Summer!

Page 37: Hadoop at Rakuten, 2011/07/06



We Need Many More Hadoopers!
Come With Us!

Need Your Help!

Page 38: Hadoop at Rakuten, 2011/07/06


Thank You.
