DESCRIPTION
Session presented at CouchConf San Francisco (http://www.couchbase.com/couchconf-san-francisco). Frequently the terms NoSQL and Big Data are used as synonyms. While both technologies diverge from the traditional RDBMS data model and spread data across clusters of servers, the problems they address are quite different. Hadoop focuses on data analysis: gleaning insights from large volumes of data. NoSQL databases focus on interactive applications: delivering high-performance, cost-effective data management for massive numbers of users. In this session, we share how IBM BigInsights and Couchbase Server can be used together to build better applications.
© 2012 IBM Corporation
Couchbase 2012
Steve Beier, Program Director, Big Data Applications & Solutions, IBM
Dipti Borkar, Director, Product Management, Couchbase
Couchbase Server and IBM BigInsights: One + One = Three
OLTP
Analytics
2 kinds of database management system
Big Data
Big Users
2 kinds of database management system
MapReduce against huge datasets to cook up insights and answers
Simple, fast, elastic NoSQL database with sub-millisecond performance at scale
2 kinds of database management system
Ad and offer targeting
raw event data cooked insights
profiles, campaigns / offers, cooked insights
40 milliseconds to pick the right offer
raw event data
actionable insights
Ad Targeting
Content Recommendation Targeting
events
user profiles
targeted recommendations
content oriented site
relational database
sqoop
sqoop == sql RDBMS + hadoop
• a data transfer tool for Hadoop
• for moving data from non-Hadoop data sources (such as relational databases and NoSQL stores) into and out of Hadoop
Couchbase provides a Cloudera-certified sqoop connector
Ad Targeting
Logs
Couchbase Server Cluster
Hadoop Cluster
sqoop import
Ad Targeting Platform
sqoop export
flume flow
Content Driven Site
Logs
Couchbase Server Cluster
Hadoop Cluster
sqoop import
Content Driven Web Site
sqoop export
Original RDBMS
To keep up with demand for richer, more targeted content delivered quickly to ever larger audiences, the data behind content-driven sites is shifting to Couchbase.
Hadoop excels at complex analytics, which may involve multiple processing steps and incorporate a number of different data sources.
sqoop import
flume flow
Couchbase → Hadoop
$ sqoop import \
    --connect http://couchbase-01:8091/pools \
    --table DUMP
$ sqoop import \
    --connect http://couchbase-01:8091/pools \
    --table BACKFILL_5
For import, table must be either:
• DUMP: All items currently in Couchbase
• BACKFILL_n: All item mutations for n minutes
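The table-name rule above can be checked before sqoop is ever invoked. The following is a minimal sketch, assuming a hypothetical helper named `build_import_cmd` (it is not part of sqoop or the Couchbase connector). It only validates the table argument and prints the command line; it does not run sqoop.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a sqoop feature):
# validates the --table value for a Couchbase import and prints the command.
build_import_cmd() {
  bic_url="$1"
  bic_table="$2"
  case "$bic_table" in
    DUMP)
      ;;  # all items currently in Couchbase
    BACKFILL_*)
      # Everything after the prefix must be digits (the number of minutes).
      bic_minutes="${bic_table#BACKFILL_}"
      case "$bic_minutes" in
        ''|*[!0-9]*)
          echo "error: BACKFILL_n needs a numeric n, got '$bic_table'" >&2
          return 1
          ;;
      esac
      ;;
    *)
      echo "error: table must be DUMP or BACKFILL_n, got '$bic_table'" >&2
      return 1
      ;;
  esac
  # Dry run: print the command instead of executing it.
  printf 'sqoop import --connect %s --table %s\n' "$bic_url" "$bic_table"
}

build_import_cmd http://couchbase-01:8091/pools DUMP
build_import_cmd http://couchbase-01:8091/pools BACKFILL_5
```

Printing rather than executing keeps the check usable on a machine without sqoop installed; piping the output to `sh` would run the real import.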
Hadoop → Couchbase
$ sqoop export \
    --connect http://couchbase-01:8091/pools \
    --table REQUIRED_BUT_IGNORED \
    --export-dir HDFS_DIRECTORY_TO_EXPORT
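Because the export command needs a `--table` argument even though its value is ignored, a small wrapper can supply the placeholder automatically. This is a sketch under stated assumptions: `build_export_cmd` is a hypothetical helper, and the HDFS path `/user/demo/cooked_insights` is an invented example. It prints the command rather than running sqoop.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a sqoop feature):
# prints a Couchbase export command for a given HDFS directory.
build_export_cmd() {
  bec_url="$1"
  bec_dir="$2"
  if [ -z "$bec_dir" ]; then
    echo "error: an HDFS directory to export is required" >&2
    return 1
  fi
  # --table must be present, but per the slide its value is ignored,
  # so a fixed placeholder is passed.
  printf 'sqoop export --connect %s --table %s --export-dir %s\n' \
    "$bec_url" "REQUIRED_BUT_IGNORED" "$bec_dir"
}

# Example HDFS path is made up for this sketch.
build_export_cmd http://couchbase-01:8091/pools /user/demo/cooked_insights
```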
sqoop Versions
sqoop 1.4.2
Cloudera CDH3
• Ubuntu 10.10 – 11.10; later versions are missing a package needed for CDH3
Cloudera CDH4 update 1 needed
• sqoop bug fix in Cloudera CDH4u1 required
Couchbase sqoop - Resources
http://www.couchbase.com/develop/connectors/hadoop
http://www.couchbase.com/docs/hadoop-plugin/
https://github.com/couchbase/couchbase-hadoop-plugin
http://www.ibm.com/developerworks/opensource/library/ba-hadoop-couchbase/ba-hadoop-couchbase-pdf.pdf
Big Data platform: Bring Together a Large Volume and Variety of Data to Find New Insights
Identify network security intrusions
Optimization and monitoring of public transportation
Predict weather patterns to plan optimal wind turbine usage
Detect life-threatening conditions in time to intervene
Multi-channel customer experience analysis
• Analyzing a variety of data at enormous volumes
• Insights on streaming data
• Large-volume structured, semi-structured and unstructured data analysis
Big Data Platform
• Variety
• Velocity
• Volume
T-Mobile
UOIT
Vestas
Dublin City Council
Brocade
• Weather and geographic data analysis for wind turbine and wind farm site planning
• Deployed IBM Big Data to store, manage, and analyze location-specific data
• Analyzing 2.8 petabytes of public and private weather data for each geographic location
• Reduced the modeling time for wind forecasting information by 97%, from weeks to hours
Green Energy: Vestas Wind Systems A/S
Volume
IBM Watson Demonstrated the Power of Big Data Analytics
Can we design a computing system that rivals a human’s ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of information in real-time?
Variety
Big Data Analytics in Smarter Hospitals
IBM Data Baby youtube.com
Big Data enabled doctors from the University of Ontario to apply neonatal infant monitoring to predict infection in the ICU 24 hours in advance
Velocity
Asian telco reduces billing costs and improves customer satisfaction.
Capabilities: Stream Computing, Analytic Accelerators
Real-time mediation and analysis of 6B CDRs per day
Data processing time reduced from 12 hrs to 1 sec
Hardware cost reduced to 1/8th
Proactively address issues (e.g. dropped calls) impacting customer satisfaction.
Telecommunications – Analyze in real time: 500K/sec, 6B+ IPDRs analyzed per day, on more than 4 PB/yr, sustaining 1 GBps.
• A Telco processing Call Detail Records
  – 6 billion CDRs per day
  – Deduplicating data over 7 days
  – Processing latency reduced from 12 hours to a few seconds
• A Telco implementing a solution to access and analyze call, internet usage and texting detail records (xDRs) in real time
  – 91% reduction in time to merge data
  – 93% reduction in storage requirements
  – 85% reduction in servers used
• A Telco requiring a solution to analyze up to 25M messages per second. At these volumes, in-motion analysis is the only option
  – “Streams handled at least an order of magnitude more events per second on the same hardware than competitors.” (Telco’s Chief Architect)
  – Even at these volumes, Streams provided near linear scalability
Business Analytic Applications (e.g. Cognos, SPSS) and Solutions
Warehouse and Appliances
Traditional data sources
Operational Data Store
Big Data is an integral part of an enterprise data platform
• Manage Big Data from the instant it enters the enterprise
• High fidelity – no changes to original format
• Available for new uses, analyses, and integrations
Big Data Applications
Big Data Enterprise Engine
IBM Big Data Solutions
Developers End Users Admin.
Big Data User Environment
Client and Partner Solutions
Big Data Platform
Source data (Web, sensors, logs, media, etc. )
Streaming analytics
Internet-scale analytics
Govern: Quality, Lifecycle Management, Security, Privacy
IBM’s Big Data Platform
Big Data Enterprise Engines
IBM Big Data Solutions
Internet Scale Analytics Streaming Analytics
Developers End Users Administrators
Big Data User Environments
Bringing Big Data to the Enterprise
Client and Partner Solutions
Open Source Foundational Components
Hadoop HBase Pig Lucene Jaql Hive
AGENTS
INTEGRATION
Information Server
Marketing
Warehouse Appliances
Data Warehouse
Database
Content Analytics
Business Analytics
Master Data Mgmt
InfoSphere Warehouse
Netezza
InfoSphere MDM
DB2, Informix
Cognos & SPSS
Unica
ECM
Data Growth Management InfoSphere Optim
IBM Big Data Platform Tools
• Determine product sentiment, intent, customer segmentation
• Execute reusable Apps to classify users, predict sales, and forecast trends
• Create spreadsheets and dashboards
Analyzing big data
• Productive environment for executing analysis (cluster, rank, score with R, ML, Text)
• Create reusable analytic Apps without programming
• Dynamic open dashboard
Business Users Data Scientists Business Analysts Developers Administrators
THANK YOU