Big Data - Fast Machine Learning at Scale + Couchbase

Fast Machine Learning with

by Fujio Turner

@FujioTurner

Current & Future Problems

Churn Prediction

Truth and Veracity

Recommendations

Online Advertisement

News Aggregation

Scalability

Content Discovery/Search

Intelligent Learning

Machine Learning for Medicine

Source: Abhishek Shivkumar

Who is LexisNexis?

LexisNexis is a provider of legal, tax, regulatory, news, business information, and analysis to legal, corporate, government, accounting and academic markets.

LexisNexis has been in business since 1977 with over 30,000 employees worldwide.

What is HPCC Systems?

LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking, and vertical expertise, and supports HPCC Systems as an open source project under the Apache 2.0 License.

http://hpccsystems.com/

Problems

Data from 10,000+ Different Sources

Different Needs for the Data

Different Levels of Proficiency

Lots of Data

Normalized / Denormalized

Structured / Unstructured

Solutions

DEDUP, JOIN, INDEX, COUNT, REGEX, K-Means, BETWEEN, GROUP, CASE, Custom

1 Easy Language (ECL) or SQL, R, Java, Python, C++, SAS

Reliable Data Distribution & Processing System that scales to exabytes+

Machine Learning Built-in

Regression: Linear Regression

Classification: Naive Bayes, Perceptron, Decision Trees, Logistic Regression

Clustering: K-Means, KD Trees, Agglomerative/Hierarchical

Association Analysis: AprioriN, EclatN, Rules
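K-Means is one of the clustering algorithms named in the built-in list. As a rough illustration of what such an algorithm does, here is a minimal single-machine sketch in plain Python. It is not the HPCC Systems ML library and does not use ECL; the function names, the sample points, and the fixed random seed are all assumptions made for this example.

```python
import random

def assign(points, centroids):
    """Assignment step: group each point under its nearest centroid."""
    clusters = [[] for _ in centroids]
    for x, y in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2,
        )
        clusters[nearest].append((x, y))
    return clusters

def kmeans(points, k, max_iterations=50, seed=0):
    """Plain Lloyd-style k-means on 2-D points (illustrative only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
        if updated == centroids:  # no centroid moved, so we have converged
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical sample data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

On data this small everything fits in one process; the point of a distributed platform is to run the same assignment and update steps across many nodes.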

http://hpccsystems.com/ml

Michael Payne, of Clemson University, on high-speed machine learning with PB-BLAS in HPCC Systems.

http://youtu.be/s_HWlMwi6iI

“I’m sub-second fast.”

“I can query all or part of your data.”

Thor: Single Threaded, Hard Disk, Index (optional)

Roxie: Multi-Threaded, Hard Disk, Index (optional), In-memory, SSD

Either/Both

Cluster Architecture

Sort

Count

Group

Classification

(ROXIE) 0.27 seconds to (THOR) a few hours

Country = ‘US’

Join

Index of ~/facebook_2013

Query is Completed in a Single Job Asynchronously

~/facebook_2013

Country = ‘US’

~/twitter_2013

SORT, GROUP, DEDUP, JOIN, MERGE, BETWEEN, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, CASE, NORMALIZE, DENORMALIZE, K-MEANS, more …
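The query slide combines a filter (Country = ‘US’) on ~/facebook_2013 with a join against ~/twitter_2013. The sketch below shows the same sequence of filter, DEDUP, JOIN, and SORT on small in-memory lists in plain Python. It is not ECL and says nothing about how Thor or Roxie execute the job; the record fields (`user_id`, `name`, `country`, `handle`) and the function name are hypothetical, since the slides do not show the file layouts.

```python
# Hypothetical records standing in for ~/facebook_2013 and ~/twitter_2013.
facebook_2013 = [
    {"user_id": 1, "name": "Ann", "country": "US"},
    {"user_id": 2, "name": "Bob", "country": "JP"},
    {"user_id": 3, "name": "Cho", "country": "US"},
    {"user_id": 3, "name": "Cho", "country": "US"},  # duplicate row
]
twitter_2013 = [
    {"user_id": 1, "handle": "@ann"},
    {"user_id": 3, "handle": "@cho"},
    {"user_id": 4, "handle": "@dee"},
]

def filter_dedup_join(left, right, country):
    """Filter left by country, drop duplicate ids, inner-join to right, sort."""
    # Filter: keep only rows for the requested country.
    selected = [row for row in left if row["country"] == country]

    # DEDUP: keep the first row seen for each user_id.
    seen = set()
    unique = []
    for row in selected:
        if row["user_id"] not in seen:
            seen.add(row["user_id"])
            unique.append(row)

    # INDEX: build a lookup on the join key for the right-hand side.
    by_id = {row["user_id"]: row for row in right}

    # JOIN: inner join, keeping only ids present on both sides.
    joined = [
        {**row, "handle": by_id[row["user_id"]]["handle"]}
        for row in unique
        if row["user_id"] in by_id
    ]

    # SORT: order the result by user_id.
    return sorted(joined, key=lambda row: row["user_id"])

result = filter_dedup_join(facebook_2013, twitter_2013, "US")
```

Each step here is a separate pass over a Python list; the slide's claim is that the platform runs the whole chain as a single job.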

http://www.youtube.com/watch?v=8SV43DCUqJg

Watch how to install HPCC Systems in 5 Minutes

Download HPCC Systems Open Source

Community Edition: http://hpccsystems.com/download/

or

Source Code: https://github.com/hpcc-systems

Common Big Data Setup

What is Couchbase?

Open Source

Memcached Built-In w/ Replicas

Flexible Schema (JSON)

Key/Value & Distributed

Cross Data Center Replication

SQL++ (N1QL)
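To make "Key/Value" and "Flexible Schema (JSON)" concrete, here is a small plain-Python sketch of documents stored under keys, where each document can carry different fields. It deliberately does not use the Couchbase SDK: the `store` dictionary, the `put` and `get` helpers, and the `user::N` key naming are stand-ins invented for this example.

```python
import json

# Hypothetical stand-in for a key/value bucket: key -> JSON text.
store = {}

def put(key, document):
    """Serialize the document to JSON and store it under the key."""
    store[key] = json.dumps(document)

def get(key):
    """Return the decoded document, or None if the key is absent."""
    raw = store.get(key)
    return None if raw is None else json.loads(raw)

# Two documents of the same kind with different shapes: no schema
# has to be declared or migrated before the second one is written.
put("user::1", {"type": "user", "name": "Ann", "country": "US"})
put("user::2", {"type": "user", "name": "Bob", "devices": ["phone", "tablet"]})

ann = get("user::1")
bob = get("user::2")
```

A real cluster adds what this sketch leaves out: distribution of keys across nodes, replicas, and a query language over the stored JSON.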

Sub-Millisecond

SQL++ (N1QL)

JSON

Distributed & Reliable

Distributed & Reliable

1 Language

Flexible Data Types

Ready for the Future

XDCR

Couchbase Mobile

Embedded JSON NoSQL Database

+ Sync Data Online / Offline

+ Sync & Channel Data Peer-To-Peer

+ Sync Data Peer-To-Peer (directly)

Couchbase Mobile + HPCC Systems

Process & Store Data to Scale

INSTALL in 5 Minutes

Download: http://couchbase.com/download

Source Code: https://github.com/couchbase

Learning More - Couchbase Server & Lite

Mountain View, CA

San Francisco, CA

https://www.youtube.com/user/CouchbaseVideo
