Mobile Data with Couchbase Lite &
Big Data HPCC Systems
By Fujio Turner
What is Couchbase Lite ?
NoSQL JSON Document Database for Mobile
+Your Code
Embedded Database
Couchbase Lite 0.5 MB
Why do I need Couchbase Lite ?
Mobile Myths:
The mobile network is:
1. Always Available
2. Always High Performing
How Couchbase Lite tackles the Mobile Myths
Local data is always faster, but
I need to save the data non-locally
and/or
I need to send data to other mobile devices
EZ Data Syncing with Couchbase Sync Gateway
https://github.com/couchbase/sync_gateway
Channels
{"data":"yes"}
• Authentication & Sessions
• Definable channel rules via JavaScript
http(s):// REST server
How Sync Gateway Works
Written in:
Data Flow:
CRUD:
Who is using Couchbase Lite ?
How [company logo] Uses Couchbase Lite
https://youtu.be/tYolHnbCavA
What Big Data solution is ready for the next 20 plus years ?
LexisNexis is a provider of legal, tax, regulatory, news, business information, and analysis to legal, corporate, government, accounting and academic markets.
LexisNexis has been in business since 1977 with over 30,000 employees worldwide.
What is HPCC Systems?
Who is LexisNexis ?
LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking and vertical expertise, and supports HPCC Systems as an open source project under the Apache 2.0 License.
Comparison
JAVA C++
Petabytes
1-80,000 Jobs/day
Since 2005
Exabytes
Since 2000
Indexed: 2K-3K Jobs/sec*
? ? ? ? ? ?
Thor Roxie
Block Based File Based
In-Memory: 30 - 40 Jobs/min*
Non-Indexed: 4-1,040,000 Jobs/day
*based on job (size / result set / complexity)
“I’m sub-second fast.”
“I can query all or part of your data.”
Thor: Single Threaded; Hard Disk; Index (optional)
Roxie: Multi-Threaded; Hard Disk, SSD, In-memory (Either/Both); Index (optional)
Architecture
[Chart: Business / Development / Customers, scale 1 to 20]
Non-Indexed Full Data Set
http://hpccsystems.com/why-hpcc/benchmarks
300GB File
How is Data Stored on HPCC Systems ?
Example: Customer Data May 2010
Name   State  Age
Kevin  CA     45
Mark   MI     27
Sara   FL     64
K.. CA 45 M.. MI 27 S.. FL 64
Thor Master
Thor Slaves
Kevin CA 45 Mark MI 27 Sara FL 64
Store Data
File Name ~/customers_2010-05
Data is distributed evenly in the cluster with replica copies and is seen as a file (example below).
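A minimal sketch of what "distributed evenly with replica copies" can look like. Round-robin placement and storing each replica on the next node are assumptions made for illustration, not a description of HPCC Systems' actual placement logic; `distribute` is a hypothetical helper.

```python
from collections import defaultdict


def distribute(rows, num_nodes, replicas=1):
    """Spread rows round-robin over nodes and copy each part to the next node(s)."""
    if replicas >= num_nodes:
        raise ValueError("need more nodes than replica copies")
    primary = defaultdict(list)
    for i, row in enumerate(rows):
        primary[i % num_nodes].append(row)
    # Assumed replica placement: the part held by node n is also kept on n+1, n+2, ...
    replica = defaultdict(list)
    for node in range(num_nodes):
        for offset in range(1, replicas + 1):
            replica[(node + offset) % num_nodes].extend(primary[node])
    return dict(primary), dict(replica)


rows = [("Kevin", "CA", 45), ("Mark", "MI", 27), ("Sara", "FL", 64)]
primary, replica = distribute(rows, num_nodes=3)
```

With three rows and three nodes, each node holds one row as its primary part and one row as a replica of its neighbour's part, while a client still refers to the whole set by a single file name.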
Dali
File Location & Job Scheduler
File locations are stored on disk.
What state do most people live in?
ESP
1a. A pre-compiled query is triggered. (Mostly used in Roxie)
1b. Ad-hoc query.
2. Query is sent to Dali to get file locations.
3. Job is placed in a queue to be sent to Thor Master. Thor Master coordinates job execution on Thor Slave nodes.
Jobs are done locally on slaves and/or coordinated globally by the master.
4. Job is returned, optionally grouped and sorted at run time.
MI 500
CA 120
FL 7
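The flow for "What state do most people live in?" can be sketched as a local count on each slave followed by a global merge and sort on the master. This is an illustrative Python sketch, not ECL and not how Thor is implemented; the function names and the extra rows (Ann, Bob, Eve) are hypothetical.

```python
from collections import Counter


def local_count(rows):
    """Runs on one slave: count people per state in its local part of the file."""
    return Counter(state for _name, state, _age in rows)


def merge_counts(partials):
    """Runs on the master: merge partial counts, sorted by count descending."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total.most_common()


# Each inner list stands for the part of the file held by one slave.
parts = [
    [("Kevin", "CA", 45), ("Ann", "MI", 31)],
    [("Mark", "MI", 27)],
    [("Sara", "FL", 64), ("Bob", "MI", 52), ("Eve", "CA", 19)],
]
result = merge_counts(local_count(part) for part in parts)
print(result)  # [('MI', 3), ('CA', 2), ('FL', 1)]
```

Only the small per-state counts travel to the master, which is why grouping and sorting can be applied to the result at run time without moving the raw rows.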
SORT, GROUP, DEDUP, JOIN, MERGE, BETWEEN, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, CASE, NORMALIZE, DENORMALIZE, K-MEANS, more ….
Multiple other actions can be done on the data in a single job.
Sort
Count
Group
Classification
(ROXIE) 0.27 seconds to (THOR) a few hours
Country = ‘US’
Join
Index of ~/facebook_2013
Query is Completed in a Single Job Asynchronously
~/facebook_2013
Country = ‘US’
~/twitter_2013
optional
Speed - Part 1: Indexing
• index per file
• customize by field(s)
File Name ~/customers_2010-05
File Name ~/customers_2010-05_index
[Chart: from Non-Indexed (scale 1 to 40) to Indexed (scale 1 to 200)]
Example Index:
CA row #3
MI row #17
MI row #4
FL row #5
Example Index:
male row #345
female row #4
male row #97
female row #267
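The example indexes above map a field value to row numbers. A minimal Python sketch of that idea follows; `build_index` is a hypothetical helper and the in-memory dictionary is an assumption for illustration, not the on-disk index format HPCC Systems uses.

```python
from collections import defaultdict


def build_index(rows, field):
    """Map each value of `field` to the row numbers where it occurs."""
    index = defaultdict(list)
    for row_number, row in enumerate(rows):
        index[row[field]].append(row_number)
    return dict(index)


rows = [
    {"name": "Kevin", "state": "CA", "age": 45},
    {"name": "Mark", "state": "MI", "age": 27},
    {"name": "Sara", "state": "FL", "age": 64},
]
state_index = build_index(rows, "state")

# Look up matching rows through the index instead of scanning every row.
mi_rows = [rows[n] for n in state_index.get("MI", [])]
```

Because the index is built per file and per chosen field, a second call such as `build_index(rows, "age")` would give a separate index over the same data.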
Speed - Part 2: Roxie
Roxie Master
Roxie Slaves
Index In-Memory
or
Index In-Memory & Part or All Data
Roxie is Multi-Threaded
SSDs are OK - write few / read many
2004
Thor Master
Thor Slaves
Dali ESP
Roxie Master
Roxie Slaves
Common Cluster
Data is a mix of structured and unstructured. Use Thor to do ETL and send results to Roxie for user queries.
HPCC Systems 5.2
New JSON file support
https://github.com/couchbase/sync_gateway/wiki/Webhooks
Flow Data
From: Sync Gateway
To: HPCC Systems
{"data":"yes"}
Sync Gateway’s Webhooks API lets you catch every JSON document coming into Sync Gateway
Couchbase Lite to HPCC Systems Transport
{"data":"yes"}
A simple Python web server that catches every HTTP POST from Sync Gateway and writes it to a file for HPCC Systems to store.
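A minimal sketch of such a transport using only the Python standard library. It is not the author's code from the GitHub link; the output file name, the port, and the one-JSON-document-per-line layout are assumptions chosen for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

OUTPUT_FILE = "sync_gateway_docs.json"  # hypothetical output path
_write_lock = threading.Lock()


def append_document(body, path=OUTPUT_FILE):
    """Parse the POST body as JSON and append it to `path` as one line."""
    doc = json.loads(body)  # raises ValueError on invalid JSON
    with _write_lock, open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(doc) + "\n")
    return doc


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            append_document(body)
            self.send_response(200)
        except ValueError:
            self.send_response(400)
        self.end_headers()


def serve(host="0.0.0.0", port=8000):
    """Listen for webhook POSTs until interrupted (port 8000 is an assumption)."""
    HTTPServer((host, port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener; the Sync Gateway webhook would then be pointed at that host and port. Writing one document per line keeps the file append-only, which suits a file that is later picked up in bulk.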
https://github.com/househippo
Learning More - Couchbase Lite
INSTALL in 5 Minutes
Download: http://couchbase.com/download
Source Code: https://github.com/couchbase
Mountain View, CA; San Francisco, CA
http://developer.couchbase.com/mobile/get-started/get-started-mobile/index.html
Learning More - HPCC Systems
INSTALL in 5 Minutes
Download: http://hpccsystems.com/download/
or
Source Code: https://github.com/hpcc-systems
Atlanta, GA; Mountain View, CA
https://youtu.be/8SV43DCUqJg