Couchbase Server and IBM BigInsights: One + One = Three

DESCRIPTION

Session presented at CouchConf San Francisco http://www.couchbase.com/couchconf-san-francisco

Frequently the terms NoSQL and Big Data are used as synonyms. While both technologies diverge from the traditional RDBMS data model and spread data across clusters of servers, the “problems” they address are quite different. Hadoop is focused on data analysis: gleaning insights from large volumes of data. NoSQL databases focus on interactive applications: delivering high-performance, cost-effective data management for massive numbers of users. In this session, we share how IBM BigInsights and Couchbase Server can be used together to build better applications.

Page 1: Couchbase Server and IBM BigInsights: One + One = Three

© 2012 IBM Corporation

Couchbase 2012

Steve Beier Program Director, Big Data Applications & Solutions, IBM Dipti Borkar Director, Product Management, Couchbase

Couchbase Server and IBM BigInsights: One + One = Three

Page 2: Couchbase Server and IBM BigInsights: One + One = Three

OLTP  

Analytics

2 kinds of database management system

Page 5: Couchbase Server and IBM BigInsights: One + One = Three

Big Data

Big Users

2 kinds of database management system

Page 6: Couchbase Server and IBM BigInsights: One + One = Three

Map-reduce against huge datasets to cook up insights and answers

Simple, fast, elastic NoSQL database with sub-millisecond performance at scale

2 kinds of database management system

Page 7: Couchbase Server and IBM BigInsights: One + One = Three

Ad and offer targeting

[Diagram labels: raw event data; cooked insights; profiles, campaigns / offers, cooked insights; actionable insights]

40 milliseconds to pick the right offer

Ad Targeting

Page 8: Couchbase Server and IBM BigInsights: One + One = Three

Content Recommendation Targeting

[Diagram labels: content oriented site; relational database; events; user profiles; targeted recommendations; steps numbered 1 to 3]

Page 9: Couchbase Server and IBM BigInsights: One + One = Three

sqoop

sqoop == sql RDBMS + hadoop

• a data transfer tool for Hadoop
• for moving data from non-Hadoop datasources (like relational databases, NoSQL) into/out-of Hadoop

Couchbase provides a Cloudera-certified sqoop connector

Page 10: Couchbase Server and IBM BigInsights: One + One = Three

Ad Targeting

[Diagram labels: Ad Targeting Platform; Logs; Couchbase Server Cluster; Hadoop Cluster; sqoop import; sqoop export; flume flow]

Page 11: Couchbase Server and IBM BigInsights: One + One = Three

Content Driven Site

[Diagram labels: Content Driven Web Site; Logs; Couchbase Server Cluster; Hadoop Cluster; Original RDBMS; sqoop import; sqoop export]

To keep up with changing demands for richer, more targeted content delivered very quickly to ever larger audiences, the data behind content-driven sites is shifting to Couchbase.

Hadoop excels at complex analytics that may involve multiple processing steps and incorporate a number of different data sources.

sqoop import

flume flow

Page 13: Couchbase Server and IBM BigInsights: One + One = Three

Couchbase → Hadoop

$ sqoop import \
    --connect http://couchbase-01:8091/pools \
    --table DUMP

$ sqoop import \
    --connect http://couchbase-01:8091/pools \
    --table BACKFILL_5

For import, table must be either:

•  DUMP: All items currently in Couchbase
•  BACKFILL_n: All item mutations for n minutes
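The table rule above can be checked before a job is launched. What follows is a minimal sketch under stated assumptions, not part of sqoop or the Couchbase connector: `is_valid_import_table` and `print_import_cmd` are hypothetical helper names, and the connect URL is the one shown on the slide. The helper only prints the command; it never runs sqoop.

```shell
# Hypothetical helpers (not part of sqoop or the Couchbase plugin).

# Return 0 if the name is one of the import tables listed on the slide:
# DUMP, or BACKFILL_n where n consists only of digits (minutes).
is_valid_import_table() {
  case "$1" in
    DUMP) return 0 ;;
    BACKFILL_*)
      n="${1#BACKFILL_}"            # strip the prefix, keep the minutes part
      case "$n" in
        ''|*[!0-9]*) return 1 ;;    # empty or contains a non-digit
        *) return 0 ;;
      esac ;;
    *) return 1 ;;
  esac
}

# Print (do not execute) the import command for a connect URL and table.
print_import_cmd() {
  connect_url="$1"
  table="$2"
  if ! is_valid_import_table "$table"; then
    echo "error: table must be DUMP or BACKFILL_n, got '$table'" >&2
    return 1
  fi
  echo "sqoop import --connect $connect_url --table $table"
}
```

For example, `print_import_cmd http://couchbase-01:8091/pools BACKFILL_5` prints the second command from the slide, while a table name such as `users` is rejected with an error on stderr.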

Page 14: Couchbase Server and IBM BigInsights: One + One = Three

Hadoop → Couchbase

$ sqoop export \
    --connect http://couchbase-01:8091/pools \
    --table REQUIRED_BUT_IGNORED \
    --export-dir HDFS_DIRECTORY_TO_EXPORT
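As the placeholder name on the slide suggests, `--table` has to be supplied on export even though its value is not used. The sketch below wraps that detail in a dry-run helper. `print_export_cmd` is a hypothetical name, and the HDFS path in the usage note is an assumption for illustration only; nothing here calls sqoop.

```shell
# Hypothetical helper: print (do not execute) a sqoop export command.
# Per the slide, --table must be present but its value is ignored,
# so a fixed placeholder is passed.
print_export_cmd() {
  connect_url="$1"
  export_dir="$2"
  if [ -z "$connect_url" ] || [ -z "$export_dir" ]; then
    echo "usage: print_export_cmd CONNECT_URL HDFS_DIRECTORY" >&2
    return 1
  fi
  echo "sqoop export --connect $connect_url --table REQUIRED_BUT_IGNORED --export-dir $export_dir"
}
```

Calling `print_export_cmd http://couchbase-01:8091/pools /user/demo/out` (the directory is a made-up example) prints the full export command so it can be reviewed before being run by hand.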

Page 15: Couchbase Server and IBM BigInsights: One + One = Three

sqoop Versions

sqoop 1.4.2 Cloudera CDH3

•  Ubuntu 10.10 – 11.10; later versions are missing a package needed for CDH3

Cloudera CDH4 update 1 needed
•  sqoop bug fix in Cloudera CDH4u1 required

Page 16: Couchbase Server and IBM BigInsights: One + One = Three

Couchbase sqoop - Resources

http://www.couchbase.com/develop/connectors/hadoop
http://www.couchbase.com/docs/hadoop-plugin/
https://github.com/couchbase/couchbase-hadoop-plugin
http://www.ibm.com/developerworks/opensource/library/ba-hadoop-couchbase/ba-hadoop-couchbase-pdf.pdf

Page 17: Couchbase Server and IBM BigInsights: One + One = Three

Big Data platform: Bring Together a Large Volume and Variety of Data to Find New Insights

Identify network security intrusions

Optimization and monitoring of public transportation

Predict weather patterns to plan optimal wind turbine usage

Detect life-threatening conditions in time to intervene

Multi-channel customer experience analysis

•  Analyzing a variety of data at enormous volumes
•  Insights on streaming data
•  Large volume structured, semi-structured and unstructured data analysis

Big Data Platform

•  Variety
•  Velocity
•  Volume

T-Mobile

UOIT

Vestas

Dublin City Council

Brocade

Page 18: Couchbase Server and IBM BigInsights: One + One = Three

•  Weather and geographic data analysis for wind turbine and wind farm site planning

•  Deployed IBM Big Data to store, manage and analyze location-specific data

•  Analyzing 2.8 petabytes of public and private weather data for each geographic location

•  Reduced the modeling time for wind forecasting information by 97%, from weeks to hours

Green Energy: Vestas Wind Systems A/S

Volume

Page 19: Couchbase Server and IBM BigInsights: One + One = Three

IBM Watson Demonstrated the Power of Big Data Analytics

Can we design a computing system that rivals a human’s ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of information in real-time?

Variety

Page 20: Couchbase Server and IBM BigInsights: One + One = Three

Big Data Analytics in Smarter Hospitals

IBM Data Baby youtube.com

Big Data enabled doctors from the University of Ontario to apply neonatal infant monitoring to predict infection in the ICU 24 hours in advance

Velocity

Page 21: Couchbase Server and IBM BigInsights: One + One = Three

Asian telco reduces billing costs and improves customer satisfaction.

Capabilities:
•  Stream Computing
•  Analytic Accelerators

Real-time mediation and analysis of 6B CDRs per day

Data processing time reduced from 12 hrs to 1 sec

Hardware cost reduced to 1/8th

Proactively address issues (e.g. dropped calls) impacting customer satisfaction.

Page 22: Couchbase Server and IBM BigInsights: One + One = Three

Telecommunications – Analyze in real time 500K/sec, 6B+ IPDRs analyzed per day on more than 4 PBs/yr. sustaining 1GBps.

•  A Telco processing Call Detail Records
   –  6 Billion CDRs per day
   –  Deduplicating data over 7 days
   –  Processing latency reduced from 12 hours to a few seconds

•  A Telco implementing a solution to access and analyze call, internet usage and texting detail records (xDRs) in real-time
   –  91% reduction in time to merge data
   –  93% reduction in storage requirements
   –  85% reduction in servers used

•  A Telco requiring a solution to analyze up to 25M messages per second. At these volumes, in-motion analysis is the only option
   –  “Streams handled at least an order of magnitude more events per second on the same hardware than competitors.” (Telco’s Chief Architect)
   –  Even at these volumes, Streams provided near linear scalability

Page 23: Couchbase Server and IBM BigInsights: One + One = Three

Business Analytic Applications (e.g. Cognos, SPSS) and Solutions

Warehouse and Appliances

Traditional data sources

Operational Data Store

Big Data is an integral part of an enterprise data platform
•  Manage Big Data from the instant it enters the enterprise
•  High fidelity – no changes to original format
•  Available for new uses, analyses, and integrations

Big Data Applications

Big Data Enterprise Engine

IBM Big Data Solutions

Developers End Users Admin.

Big Data User Environment

Client and Partner Solutions

Big Data Platform

Source data (Web, sensors, logs, media, etc. )

Streaming analytics

Internet-scale analytics

Govern: Quality, Lifecycle Management, Security, Privacy

Page 24: Couchbase Server and IBM BigInsights: One + One = Three

IBM’s Big Data Platform

Big Data Enterprise Engines

IBM Big Data Solutions

Internet Scale Analytics Streaming Analytics

Developers End Users Administrators

Big Data User Environments

Bringing Big Data to the Enterprise

Client and Partner Solutions

Open Source Foundational Components

Hadoop HBase Pig Lucene Jaql Hive

AGENTS

INTEGRATION

Information Server

Marketing

Warehouse Appliances

Data Warehouse

Database

Content Analytics

Business Analytics

Master Data Mgmt

InfoSphere Warehouse

Netezza

InfoSphere MDM

DB2, Informix

Cognos & SPSS

Unica

ECM

Data Growth Management InfoSphere Optim

Page 25: Couchbase Server and IBM BigInsights: One + One = Three

IBM Big Data Platform Tools

•  Determine product sentiment, intent, customer segmentation
•  Execute reusable Apps to classify users, predict sales, and forecast trends
•  Create spreadsheets and dashboards

Analyzing big data

•  Productive environment for executing analysis (cluster, rank, score with R, ML, Text)
•  Create reusable analytic Apps without programming
•  Dynamic open dashboard

Business Users Data Scientists Business Analysts Developers Administrators

Page 26: Couchbase Server and IBM BigInsights: One + One = Three

THANK YOU