My presentation at NoSQL Now 2013
How Companies use NoSQL and Couchbase
Dipti Borkar
Director, Product Management
Couchbase Server NoSQL Document Database
Couchbase Open Source Project
• Leading NoSQL database project focused on distributed database technology and surrounding ecosystem
• Supports both key-value and document-oriented use cases
• All components are available under the Apache License 2.0
• Obtained as packaged software in both enterprise and community editions
2 kinds of database management system
OLTP: Operational database for web and mobile apps with high performance at scale
Analytics: Map-reduce against huge datasets to analyze and find insights and answers
NoSQL + Big Data
Common Use Cases
Social Gaming
• Couchbase stores player and game data
• Example customers include: Zynga, Tapjoy, Ubisoft, Tencent
Mobile Apps
• Couchbase stores user info and app content
• Example customers include: Kobo, Playtika
Ad Targeting
• Couchbase stores user information for fast access
• Example customers include: AOL, Mediamind, Convertro
Session store
• Couchbase Server as a key-value store
• Example customers include: Concur, Sabre
User Profile Store
• Couchbase Server as a key-value store
• Example customers include: Tunewiki
High availability cache
• Couchbase Server used as a cache tier replacement
• Example customers include: Orbitz
Content & Metadata Store
• Couchbase document store with ElasticSearch
• Example customers include: McGraw Hill
3rd party data aggregation
• Couchbase stores social media and data feeds
• Example customers include: LivePerson
High availability caching
Use Case: High-Availability Caching
Data Cached in Couchbase
• Application objects
• Popular search query results
• Session information
• Heavily accessed web landing pages
Application Requirements
• Speed up RDBMS
• Consistently low response times for document / key lookups
• High availability 24x7x365
• Replacement for entire caching tier
Why NoSQL and Couchbase
• Sub-millisecond latency with consistently high read / write throughput using built-in cache
• Always-on operations even for database upgrades and maintenance with zero downtime
• memcached compatibility for easy migration to Couchbase without any application changes
• High availability and disaster recovery with intra-cluster and cross-cluster replication (XDCR)
[Diagram: User Requests reach the Application Layer, which sends Read-Write Requests to the Couchbase Distributed Cache; Cache Misses and Write Requests go to the RDBMS]
Key Verticals
- E-commerce
- Travel
- High-tech
- HR (ADP)
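The caching arrangement on this slide, where reads are served from the cache and misses fall through to the RDBMS, is often called cache-aside. The following is only a minimal sketch of that idea, assuming plain in-memory dictionaries as stand-ins for both the Couchbase cache and the RDBMS. The class name, method names and keys are hypothetical and are not part of any Couchbase SDK.

```python
class CacheAsideStore:
    """Cache-aside reads with write-through writes (illustrative only)."""

    def __init__(self, database):
        self.database = database  # dict standing in for the RDBMS
        self.cache = {}           # dict standing in for the distributed cache
        self.misses = 0           # number of reads that had to hit the database

    def get(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall through to the database and remember the result.
        self.misses += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Writes go to the database and refresh the cached copy.
        self.database[key] = value
        self.cache[key] = value


store = CacheAsideStore({"landing:home": "<html>home</html>"})
first = store.get("landing:home")   # miss, loaded from the database
second = store.get("landing:home")  # hit, served from the cache
```

Only the first read of a key reaches the database stand-in; later reads of the same key are answered from the cache, which is the effect the "Speed up RDBMS" requirement is after.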
Session Store
Use Case: Session Store
Data Stored in Couchbase
• Session values or cookies (stored as key-value pairs)
• Examples include: items in a shopping cart, flights selected, search results, etc.
Application Requirements
• Extremely fast access to session data using unique session ID
• Easy scalability to handle fast growing number of users and user-generated data
• Always-on functionality for global user base
Why NoSQL and Couchbase
• Sub-millisecond latency with consistently high read / write throughput for session data via the built-in object-level cache
• Linear throughput scalability to grow the database as user and data volume grow
• Always-on operations, in particular high availability, using Couchbase replication and failover
• Intra-cluster and cross-cluster (XDCR) replication for globally distributed active-active platform
Key Verticals
• Ad Targeting
• Travel
• E-commerce
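A session store of this kind boils down to key-value lookups by a unique session ID. The sketch below illustrates that access pattern with an in-memory dictionary standing in for the database. The sliding expiry time is an assumption added for illustration (the slides do not mention expiry), and every name here is hypothetical.

```python
import time
import uuid


class SessionStore:
    """Key-value session storage keyed by a unique session ID (illustrative)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds  # assumed idle timeout, not from the slides
        self.clock = clock              # injectable clock so expiry can be tested
        self._items = {}                # session_id -> (expires_at, data)

    def create(self, data):
        # Generate a unique ID and store a copy of the session data under it.
        session_id = uuid.uuid4().hex
        self._items[session_id] = (self.clock() + self.ttl_seconds, dict(data))
        return session_id

    def get(self, session_id):
        entry = self._items.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if self.clock() >= expires_at:
            # Session has been idle too long: drop it.
            del self._items[session_id]
            return None
        # Each successful read extends the session (sliding expiry).
        self._items[session_id] = (self.clock() + self.ttl_seconds, data)
        return data
```

A typical use would be `sid = store.create({"cart": ["book"]})` at login and `store.get(sid)` on each later request.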
User ID / Profile Store
Use Case: Globally Distributed User Profile Store
Data Stored in Couchbase
• User profile with unique ID
• User settings / preferences
• User's network
• User application state
Application Requirements
• Extremely fast access to individual profiles
• Always online system as multiple applications access user profiles
• Flexibility to add and update user attributes
• Easy scalability to handle fast growing number of users
Why NoSQL and Couchbase
• Low latency and high throughput for very quick lookups for millions of concurrent users using built-in cache
• Intra-cluster and cross-cluster (XDCR) replication for high availability and disaster recovery
• Active-active geo-distributed system to handle globally distributed user base
• Online admin operations eliminate system downtime
Key Verticals
- E-commerce
- Social gaming
- Software as a service
Content and Metadata Store
Use Case: Content and Metadata Store
Data Stored in Couchbase
• Content metadata
• Content: articles, text
• Landing pages for website
• Digital content: eBooks, magazines, research material
Application Requirements
• Flexibility to store any kind of content
• Fast access to content metadata (most accessed objects) and content
• Full-text search across data set
• Scales horizontally as more content gets added to the system
Why NoSQL and Couchbase
• Fast access to metadata and content via object-managed cache
• JSON provides schema flexibility to store all types of content and metadata
• Indexing and querying provides real-time analytics capabilities across dataset
• Integration with ElasticSearch for full-text search
• Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows
Key Verticals
- Media & Publishing
- High-tech
- Social
- Financial services
Data Aggregation
Use Case: Data Aggregation
Data Stored in Couchbase
• Social media feeds: Twitter, Facebook, LinkedIn
• Blogs, news, press articles
• Data service feeds: Hoovers, Reuters
• Data from other systems
Application Requirements
• Flexibility to store any kind of content
• Flexibility to handle schema changes
• Full-text search across data set
• High speed data ingestion
• Scales horizontally as more content gets added to the system
Why NoSQL and Couchbase
• JSON provides schema flexibility to store all types of content and metadata
• Fast access to individual documents via built-in cache, high write throughput
• Indexing and querying provides real-time analytics capabilities across dataset
• Integration with ElasticSearch for full-text search
• Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows
Key Verticals
- Ad targeting
- High-tech
- Media & Publishing
McGraw Hill Education Labs Learning portal
Use Case: Content and metadata store
Building a self-adapting, interactive learning portal with Couchbase
The Problem
As learning moves online in great numbers, there is a growing need to build interactive learning environments that:
• Scale to millions of learners
• Serve MHE as well as third-party content, including open content
• Support learning apps
• Self-adapt via usage data
The Challenge
Backend is an Interactive Content Delivery Cloud that must:
• Allow for elastic scaling under spike periods
• Catalog & deliver content from many sources
• Provide consistent low latency for metadata and stats access
• Support full-text search for content discovery
• Offer tunable content ranking & recommendation functions
Experimented with a combination of:
• XML Databases
• SQL/MR Engines
• In-memory Data Grids
• Enterprise Search Servers
The Learning Portal
• Designed and built as a collaboration between MHE Labs and Couchbase
• Serves as proof-of-concept and testing harness for Couchbase + ElasticSearch integration
• Available for download and further development as open source code
Techniques Used
• Document Modeling
• Metadata & Content Storage
• View Querying to support Content Browsing
• Elastic Search Integration (Full Text Search)
  - Content Updated in near Real-Time
  - Search Content Summaries
  - Relevancy boosted based on User Preferences
• Real-Time Content Updates
• Event Logging for offline analysis
Couchbase 2.0 + Elasticsearch
• Store full-text articles as well as document metadata for image, video and text content in Couchbase
• Continuously accept updates from Couchbase with new content & stats
• Log user behavior to calculate user preference statistics (e.g. video > text)
• Combine user preference statistics with custom relevancy scoring to provide personalized search results
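The slides do not give the scoring formula used by the portal. One plausible way to combine user preference statistics with a search relevancy score is sketched below; the multiplicative boost, the `boost_weight` parameter, the field names and the function names are all assumptions made for illustration.

```python
def preference_shares(view_counts):
    """Turn per-type view counts into fractions that sum to 1."""
    total = sum(view_counts.values())
    if total == 0:
        return {}
    return {content_type: count / total for content_type, count in view_counts.items()}


def personalized_ranking(hits, view_counts, boost_weight=1.0):
    """Re-rank search hits, boosting content types the user views most.

    Each hit is a dict with hypothetical keys "id", "type" and "score",
    where "score" is the relevancy score returned by the search engine.
    """
    shares = preference_shares(view_counts)
    scored = [
        (hit["score"] * (1.0 + boost_weight * shares.get(hit["type"], 0.0)), hit["id"])
        for hit in hits
    ]
    # Highest boosted score first; the stable sort keeps ties in search order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


hits = [
    {"id": "article-1", "type": "text", "score": 1.0},
    {"id": "video-1", "type": "video", "score": 0.8},
]
ranking = personalized_ranking(hits, {"video": 8, "text": 2})
```

For a user who mostly watches videos (8 of 10 views), the video hit overtakes the article even though its base score is lower; with no view history the original search order is kept.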
Data Model
Content Metadata Bucket
• Stores content metadata for media objects and content for articles
• Includes tags, contributors, type information
• Includes pointer to the media
User Profiles Bucket
• Stores user view details per type
• Updated every time a user views a doc, with a running count
• To be used for customizing ES search results per user preference
Content Stats Bucket
• Stores content view details
• Updated every time a document is viewed
• To be used for boosting ES search results based on popularity
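The running counts described for the User Profiles and Content Stats buckets can be pictured as two counters that are bumped on every view. This is a small sketch under that reading, with in-memory `Counter` objects standing in for the two buckets; the function name and document fields are hypothetical.

```python
from collections import Counter


def record_view(user_views_by_type, content_views, doc):
    """Update both running counts for a single document view.

    user_views_by_type: Counter of views per content type for one user
    content_views: Counter of total views per document ID
    doc: dict with hypothetical "id" and "type" fields
    """
    user_views_by_type[doc["type"]] += 1
    content_views[doc["id"]] += 1


user_views_by_type = Counter()
content_views = Counter()

for viewed in [
    {"id": "video-1", "type": "video"},
    {"id": "video-1", "type": "video"},
    {"id": "article-1", "type": "text"},
]:
    record_view(user_views_by_type, content_views, viewed)
```

After these three views the per-type counter reflects the user's preference (video over text) and the per-document counter reflects popularity, which are the two inputs the slide names for customizing and boosting search results.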
Architecture
3rd Party Data Aggregation
Use Case: 3rd party data aggregation
Types of Data
• Social media feeds: Twitter, Facebook, LinkedIn
• Blogs, news, press articles
• Data service feeds: Hoovers, Reuters
Application Requirements
• Flexibility to store any kind of content
• Flexibility to handle schema changes
• Full-text search across data set
• High speed data ingestion
• Scales horizontally as more content gets added to the system
Why NoSQL and Couchbase
• JSON provides schema flexibility to store all types of content and metadata
• Fast access to individual documents via built-in cache, high write throughput
• Indexing and querying provides real-time analytics capabilities across dataset
• Integration with ElasticSearch for full-text search
• Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows
LivePerson – Real-time visitor engagement
Use Case: 3rd party data aggregation with analytics
Real-time Analytics for LivePerson's customers
LiveEngage DASHBOARD
LivePerson: Leading customer engagement platform
The Problem
Requirements
• High throughput, really fast
• Linear scale
• Searchable (Views and M/R)
• Supports both K/V & Document store
• Cross data center replication
• "Always on", resilient solution
VOLUME
• 1.8 B visits per month
• 13 TB per month
• ~1 PB in total
Architecture
[Diagram: application server (Tomcat) with REST API and Couchbase Java SDK; two Storm topologies, each with the Couchbase Java SDK; two clusters with M/R views, connected by XDCR]
Data flow
Components: Visitor; Visitor Monitoring Service; Kafka; Stream Event Processing (Visitor Feed - Storm Topology); Couchbase; Visitor Feed API; Customer Representative
(1) Visitor browsing
(2) Visitor events
(3) Analyze relevant events and persist
(4) Write event to user document
(5) Get visitors list every 3 sec
(6) Return relevant visitors
(7) Return relevant visitors
Document structure
{
"accountId": "64302875",
"id": 121640710013,
"rtSessionId": "643028754295878498",
"eventSequence": 5104,
"ipAddress": {
"fieldValue": "194.39.63.10",
"seq": 1
},
"browser": {
"fieldValue": "Chrome 27.0.1453.116",
"seq": 1
},
"state": {
"fieldValue": "LEFT_SITE",
"seq": 5104
}
......................................
}
Annotations:
• Multi tenant DB
• Basic visitor information
• Sequence use due to Kafka
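Each field in the document above carries its own "seq" value. The slides do not say how it is used beyond attributing it to Kafka. A likely reading, offered here as an assumption, is that events can arrive out of order or more than once, so a writer only applies an event whose sequence number is newer than the one already stored. A minimal sketch of that rule, using a hypothetical `apply_event` helper:

```python
def apply_event(document, field, value, seq):
    """Apply a field update only if it is newer than what the document holds.

    Returns True when the document was changed, False for stale or duplicate
    events. Treating "eventSequence" as the highest sequence number seen so
    far is an assumption based on the sample document.
    """
    current = document.get(field)
    if current is not None and current["seq"] >= seq:
        return False  # an equal or newer value is already stored
    document[field] = {"fieldValue": value, "seq": seq}
    document["eventSequence"] = max(document.get("eventSequence", 0), seq)
    return True


visitor = {"accountId": "64302875"}
apply_event(visitor, "browser", "Chrome 27.0.1453.116", 1)
apply_event(visitor, "state", "LEFT_SITE", 5104)
late = apply_event(visitor, "state", "BROWSING", 5102)  # arrives after a newer event
```

The late "BROWSING" event is ignored, so the document keeps the "LEFT_SITE" state with sequence 5104, matching the shape of the sample document.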
Couchbase Server
Easy Scalability
Grow cluster without application changes, without downtime, with a single click
Consistent High Performance
Consistent sub-millisecond read and write response times with consistent high throughput
Always On 24x365
No downtime for software upgrades, hardware maintenance, etc.
Flexible Data Model
JSON document model with no fixed schema
Couchbase Server
Features in Couchbase Server 2.0
• JSON support
• Indexing and Querying
• Cross data center replication
• Incremental Map Reduce
Additional Features
• Built-in clustering – All nodes equal
• Data replication with auto-failover
• Zero-downtime maintenance
• Built-in managed cache
• Append-only storage layer
• Online compaction
• Monitoring and admin API & UI
• SDK for a variety of languages
Questions?
Thank you!