Why Every Couchbase Deployment Should Be Paired With Hadoop
Couchbase Server + Cloudera CDH
pol·y·glot /ˈpäliˌglät/ Adjective: Knowing or using several languages. Noun: A person who knows several languages. Synonyms: multilingual
per·sist·ence /pərˈsistəns/ Noun: The continued or prolonged existence of something. Synonyms: perseverance - tenacity - pertinacity - stubbornness
Motivation
Until recently, the architecture behind our persistence systems was designed for:
• Extremely limited RAM
• Limited storage capacity
• Limited I/O throughput
• Simple transformation on data, if any
What is Hadoop?
• Highly scalable
• Unstructured data
• Open source
• Big Data Operating System
Changing the World One Petabyte at a Time
What is Hadoop?
Simplest unit of compute and storage
[Diagram labels: CPU, Disks, Application, Data]
What is Hadoop?
And when it grows?
Application
Data
What is Hadoop?
And when it grows more?
What is Hadoop?
NoSQL to the rescue!
Application
Data
What is Hadoop?
Hadoop is a different paradigm
Application
Data
What is Sqoop?
Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
sqoop.apache.org
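The import, transform, export cycle described above can be sketched as a pair of Sqoop invocations. This is a minimal sketch, not a definitive setup: the JDBC URL, user name, table names, and HDFS directories are all hypothetical, and the MapReduce transform between the two steps is left as a comment. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: host, database, user, tables, and HDFS paths are hypothetical.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands instead of running them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

JDBC_URL="jdbc:mysql://db.example.com/shop"   # hypothetical MySQL database
DB_USER="etl"                                 # hypothetical account

# Step 1: copy an RDBMS table into HDFS (-P prompts for the password).
import_orders() {
  run sqoop import --connect "$JDBC_URL" --username "$DB_USER" -P \
    --table orders --target-dir /data/orders
}

# Step 2 (not shown): a MapReduce job reads /data/orders and writes
# its results to /data/order_summary.

# Step 3: push the transformed files back into an RDBMS table.
export_summary() {
  run sqoop export --connect "$JDBC_URL" --username "$DB_USER" -P \
    --table order_summary --export-dir /data/order_summary
}

import_orders
export_summary
```

Set DRY_RUN=0 to execute the commands against a real cluster once the placeholder values have been replaced.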
What is Sqoop?
Traditional ETL
[Diagram labels: Application, Data, T (Transform), Data]
What is Sqoop?
A different paradigm
Data
Application
Data
What is Sqoop?
A very scalable different paradigm
[Diagram labels: Data, followed by several Application and Data pairs]
What is Sqoop?
Where did the Transform go?
[Diagram labels: Application, Data, and many repeated T (Transform) markers]
Sqoop Details
• Sqoop 1.4.1 bundled in CDH4
• Sqoop 2.0 coming soon
• Default connection is via JDBC
• Lots of custom connectors
  – Couchbase, VoltDB, Vertica
  – Teradata, Netezza
  – Oracle, MySQL, Postgres
COMMON USE CASES
Ad and offer targeting
events profiles, campaigns
profiles, real time campaign statistics
40 milliseconds to respond with the decision.
Ad Targeting: Moving Parts
[Diagram: Ad Targeting Platform, Logs, Couchbase Server Cluster, Hadoop Cluster; labels: sqoop import, sqoop export, flume flow]
Content and Recommendation Targeting
events
user profiles
make recommendations
Content Oriented Site
Legacy Relational Database
Content Driven Site: Moving Parts
[Diagram: Content Driven Web Site, Logs, Couchbase Server Cluster, Hadoop Cluster, Legacy RDBMS; labels: sqoop import, sqoop export, flume flow]
To keep up with the demand for richer, more targeted content delivered quickly to ever-larger audiences, the data behind content-driven sites is shifting to Couchbase.
Hadoop excels at complex analytics, which may involve multiple processing steps and incorporate a number of different data sources.
DEMO!
Demo: run a workload against Couchbase, sqoop the data over into Hadoop, run some processing there, then sqoop the results back to Couchbase, possibly using Oozie to drive the Sqoop processing.
RUNNING SQOOP AND OPTIONS
Couchbase Import and Export
$ sqoop import --connect http://localhost:8091/pools --table DUMP
$ sqoop import --connect http://localhost:8091/pools --table BACKFILL_5
$ sqoop export --connect http://localhost:8091/pools --table DUMP --export-dir DUMP
• For imports, table must be:
  – DUMP: All keys currently in Couchbase
  – BACKFILL_n: All key mutations for n minutes
• Specified --username maps to bucket
  – By default set to "default" bucket
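The table and bucket rules above can be sketched as a small wrapper script. This is an illustration under stated assumptions: the bucket name "events" is hypothetical, the connect URL is the one shown on the slide, and passing --username default explicitly is assumed to behave the same as omitting the option. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: the bucket name "events" is hypothetical; the URL and the
# DUMP / BACKFILL_n table names come from the slide above.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands instead of running them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

COUCHBASE_URL="http://localhost:8091/pools"

# Build the import table name covering n minutes of key mutations.
backfill_table() {
  echo "BACKFILL_$1"
}

# Import from a bucket; the bucket is passed via --username and
# falls back to "default" when no second argument is given.
import_bucket() {
  table="$1"
  bucket="${2:-default}"
  run sqoop import --connect "$COUCHBASE_URL" --table "$table" --username "$bucket"
}

import_bucket DUMP                            # all keys, "default" bucket
import_bucket "$(backfill_table 5)" events    # 5 minutes of mutations
```

Set DRY_RUN=0 to run the imports against a live Couchbase cluster.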
QUESTIONS?
THANK YOU!
Get Couchbase Server at http://www.couchbase.com/download
Give us feedback at: http://www.couchbase.com/forums
Image attribution
• TRS-80 computer: http://www.fotopedia.com/items/flickr-455238557