Why Every Couchbase Deployment Should Be Paired With Hadoop Couchbase Server + Cloudera CDH


Page 2: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop


pol·y·glot /ˈpäliˌglät/ Adjective: Knowing or using several languages. Noun: A person who knows several languages. Synonyms: multilingual

per·sist·ence /pərˈsistəns/ Noun: The continued or prolonged existence of something. Synonyms: perseverance - tenacity - pertinacity - stubbornness

Page 3: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

Motivation

Until recently, the architecture behind our persistence systems was designed for:
•  Extremely limited RAM
•  Limited storage capacity
•  Limited I/O throughput
•  Simple transformation on data, if any

Page 4: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

What is Hadoop?

•  Highly scalable
•  Unstructured data
•  Open source
•  Big Data Operating System
•  Changing the World One Petabyte at a Time

Page 5: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

What is Hadoop?

  Simplest unit of compute and storage

[Diagram labels: CPU, Disks, Application, Data]

Page 6: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

What is Hadoop?

  And when it grows?

[Diagram labels: Application, Data]

Page 7: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

What is Hadoop?

  And when it grows more?

Page 8: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

What is Hadoop?

  NoSQL to the rescue!

[Diagram labels: Application, Data]

Page 9: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

What is Hadoop?

  Hadoop is a different paradigm

[Diagram labels: Application, Data]

Page 10: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

What is Sqoop?

Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.

sqoop.apache.org
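
As a rough sketch of the round trip described above, the script below assembles a JDBC import and a matching export. The host `db.example.com`, the database `shop`, the tables `ORDERS` and `ORDER_SUMMARY`, the user `etl`, and the HDFS paths are all made up for illustration; the flags themselves (`--connect`, `--username`, `--table`, `--target-dir`, `--export-dir`) are standard Sqoop 1 options. The commands are only printed, so the sketch runs without a cluster.

```shell
#!/usr/bin/env bash
# Sketch: build (not run) the Sqoop commands for an RDBMS -> HDFS -> RDBMS round trip.
# All hosts, databases, tables, users, and paths below are hypothetical.
set -euo pipefail

JDBC_URL="jdbc:mysql://db.example.com/shop"   # assumed MySQL database
DB_USER="etl"                                 # assumed database user

# Print a Sqoop import that copies one table into an HDFS directory.
build_import() {
  local table="$1" target_dir="$2"
  echo "sqoop import --connect ${JDBC_URL} --username ${DB_USER} --table ${table} --target-dir ${target_dir}"
}

# Print a Sqoop export that loads an HDFS directory back into a table.
build_export() {
  local table="$1" export_dir="$2"
  echo "sqoop export --connect ${JDBC_URL} --username ${DB_USER} --table ${table} --export-dir ${export_dir}"
}

build_import ORDERS /data/raw/orders
# A MapReduce job would turn /data/raw/orders into /data/out/order_summary here.
build_export ORDER_SUMMARY /data/out/order_summary
```

To run the printed commands for real, a password option such as `-P` (prompt on the console) would normally be added.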

Page 11: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

What is Sqoop?

  Traditional ETL

[Diagram labels: Application, Data, Data, T]

Page 12: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

What is Sqoop?

  A different paradigm

[Diagram labels: Data, Application, Data]

Page 13: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

What is Sqoop?

  A very scalable different paradigm

[Diagram labels: Data, followed by three Application/Data pairs]

Page 14: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

What is Sqoop?

  Where did the Transform go?

[Diagram labels: Application, Data, and four groups of "TTT"]

Page 15: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

Sqoop Details

•  Sqoop 1.4.1 bundled in CDH4
•  Sqoop 2.0 coming soon
•  Default connection is via JDBC
•  Lots of custom connectors
   -  Couchbase, VoltDB, Vertica
   -  Teradata, Netezza
   -  Oracle, MySQL, Postgres

Page 16: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop


COMMON  USE  CASES  

 


Page 17: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

Ad and offer targeting

[Diagram labels: events profiles, campaigns; profiles, real time campaign statistics; step markers 1, 2, 3]

40 milliseconds to respond with the decision.

Page 18: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

Ad Targeting: Moving Parts

[Diagram labels: Ad Targeting Platform, Logs, Couchbase Server Cluster, Hadoop Cluster, sqoop import, sqoop export, flume flow]

Page 19: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

Content and Recommendation Targeting

[Diagram labels: Content Oriented Site, Legacy Relational Database, events, user profiles, make recommendations; step markers 1, 2, 3]

Page 20: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

Content Driven Site: Moving Parts

[Diagram labels: Content Driven Web Site, Logs, Couchbase Server Cluster, Hadoop Cluster, Legacy RDBMS, sqoop import, sqoop export, flume flow]

To keep up with demand for richer, more targeted content delivered very quickly to ever larger audiences, the data behind content driven sites is shifting to Couchbase.

Hadoop excels at complex analytics, which may involve multiple processing steps that incorporate a number of different data sources.

Page 21: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop


DEMO!  


Page 22: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

Demo: run a workload against Couchbase, sqoop that data over into Hadoop, run some processing there, then sqoop the results back to Couchbase, possibly using Oozie to drive the Sqoop processing.
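
The demo flow can be sketched as a three-step script. This is only an illustration of the sequencing, not the presenter's demo: the job jar `demo-job.jar`, the class `com.example.DemoJob`, and the output path `DUMP_OUT` are hypothetical, and the input path `DUMP` assumes Sqoop's default of writing an import to a directory named after the table. With `DRY_RUN=1` (the default here) each step is printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch of the demo flow: Couchbase -> Hadoop -> Couchbase.
# The job jar, class name, and DUMP_OUT path are hypothetical placeholders.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each step; 0 = actually execute it

# Print a step, and execute it only when DRY_RUN=0.
run_step() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

demo_pipeline() {
  # 1. Pull every key currently in Couchbase into HDFS.
  run_step sqoop import --connect http://localhost:8091/pools --table DUMP
  # 2. Process the imported data (hypothetical MapReduce job).
  run_step hadoop jar demo-job.jar com.example.DemoJob DUMP DUMP_OUT
  # 3. Push the processed records back into Couchbase.
  run_step sqoop export --connect http://localhost:8091/pools --table DUMP --export-dir DUMP_OUT
}

demo_pipeline
```

Because of `set -e`, a failed step stops the script when it is run with `DRY_RUN=0`, so the export never runs against incomplete output. An Oozie workflow would express the same ordering declaratively.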

Page 23: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop


RUNNING  SQOOP  AND  OPTIONS  

Page 24: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

Couchbase Import and Export

$ sqoop import --connect http://localhost:8091/pools --table DUMP

$ sqoop import --connect http://localhost:8091/pools --table BACKFILL_5

$ sqoop export --connect http://localhost:8091/pools --table DUMP --export-dir DUMP

•  For imports, table must be:
   –  DUMP: All keys currently in Couchbase
   –  BACKFILL_n: All key mutations for n minutes
•  Specified --username maps to bucket
   –  By default set to "default" bucket
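
The table-name convention and the bucket mapping above can be sketched as two small helpers. The function names `backfill_table` and `couchbase_import` and the bucket name `profiles` are hypothetical, and the check that n is a positive integer is an assumption rather than something the slide states. Commands are printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch: assemble Couchbase connector commands using the slide's conventions.
# Function names and the "profiles" bucket are hypothetical; commands are printed, not run.
set -euo pipefail

COUCHBASE_URL="http://localhost:8091/pools"   # connect string from the slide

# BACKFILL_n covers key mutations for n minutes; n is assumed to be a positive integer.
backfill_table() {
  local minutes="$1"
  if ! [[ "$minutes" =~ ^[1-9][0-9]*$ ]]; then
    echo "minutes must be a positive integer" >&2
    return 1
  fi
  echo "BACKFILL_${minutes}"
}

# Print an import command; --username selects the bucket ("default" if omitted).
couchbase_import() {
  local table="$1" bucket="${2:-default}"
  echo "sqoop import --connect ${COUCHBASE_URL} --username ${bucket} --table ${table}"
}

couchbase_import DUMP                              # every key in the default bucket
couchbase_import "$(backfill_table 5)" profiles    # 5 minutes of mutations from "profiles"
```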

Page 25: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop


QUESTIONS?  

Page 26: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

THANK YOU!

Get Couchbase Server at http://www.couchbase.com/download

Give us feedback at: http://www.couchbase.com/forums

Page 27: CCSF12_Why_Every_Couchbase_Deployment_Should_be_paired_with_Hadoop

Image attribution

•  TRS-80 computer: http://www.fotopedia.com/items/flickr-455238557