Big Data eXposed: From Big Data to Smart Data

This is the deck I presented at the Big Data eXposed event on September 30 at the David Intercontinental, Israel. In this session I take the audience on a short trip through the eXelate cloud and present three big-data challenges and how we addressed them.

© 2013 eXelate Inc. Confidential and Proprietary. #bdx2013

From Big data to Smart data

A journey into the eXelate cloud

Motty Cohen, Chief Architect, eXelate

eXelate is the smart data company that powers smarter digital marketing decisions worldwide

Advertiser 1st Party Data

Data Providers

Offline Data

Online Data

Media Platforms

Modeling

Scoring

Segmentation

Analytics

Distribution

Marketing

Data Exchange Platform

• Demographic
  • Age: 40-55
  • Urbanicity: Suburban
  • Income: High
  • Education: Graduate Plus
  • Employment: Management

• Interest
  • Sport
  • Travels
  • Wines
  • Gadgets

• Intent
  • Travel to Barcelona
  • 4-star resort

Smart Data: Accurate & actionable audience segmentation

Our journey begins in the browser

The Internet

Inside eXelate Cloud: Real-time Serving & Smart data delivery

Get Event Info

Add History Data

Apply Rules & Models

Sell to buyers

200ms

100+ platforms

~500K Rules

~20K Segments

5B Events/Day

~850M Unique Users

14TB Storage

27GB daily
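For a rough sense of scale, the daily figures above can be converted into average per-second rates. This is back-of-the-envelope arithmetic only. It assumes traffic is spread evenly over the day (real peaks will be higher), that "27GB daily" refers to daily data growth, and that a gigabyte is 10^9 bytes; none of these assumptions come from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Figures taken from the slide; the interpretation of "27GB daily" as
# storage growth in decimal gigabytes is an assumption.
events_per_day = 5_000_000_000
daily_growth_bytes = 27 * 10**9


def per_second(daily_total):
    """Average rate per second, assuming an even spread over 24 hours."""
    return daily_total / SECONDS_PER_DAY


avg_events_per_sec = per_second(events_per_day)
avg_growth_bytes_per_sec = per_second(daily_growth_bytes)

print(f"average events/sec: {avg_events_per_sec:,.0f}")
print(f"average growth: {avg_growth_bytes_per_sec / 1e3:,.1f} KB/sec")
```

Under these assumptions the average works out to roughly 58K events per second, which gives some context for the 200ms serving budget.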

Challenges

Big Data

Relevancy

Access Time

On demand Analytics

big data = noise

smart data = signal

Challenge 1: Relevancy

Grabbing the relevant audience on site, on time

Generating Models

Model

Model

Model

Data Mining

Analytics

Create Models

eXtream

Netezza tables

Running Analytics on Amazon

Java Packages

Real time segmentation: Running rules and models

Basic Rules

Association Rules

Analytic Models

Model

Model

Model

Real-time scoring

Real-time learning

Can we run all these within the limited time frame?

~500K Rules

Complex Models

Continuous Incremental Segmentation

Users Info

Serving Cluster

Segmentation Cluster

0MQ

Continuous Incremental Segmentation
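The idea of continuous incremental segmentation can be sketched as follows: the serving side pushes user events onto a queue, and a separate segmentation worker consumes them, updates per-user state, and re-evaluates only the rules that depend on the attribute that just changed. This is a minimal sketch, not eXelate's implementation. The rule format, the names `RULES`, `SegmentationWorker`, and `run` are all hypothetical, and an in-process `deque` stands in for the 0MQ link between the two clusters.

```python
from collections import defaultdict, deque

# Hypothetical rules: segment name -> (attribute, minimum number of events).
RULES = {
    "travel_intent": ("travel", 2),
    "sports_fan": ("sport", 3),
}

# Index rules by the attribute they depend on, so that one event
# triggers only the relevant rules instead of the whole rule set.
RULES_BY_ATTRIBUTE = defaultdict(list)
for segment, (attribute, threshold) in RULES.items():
    RULES_BY_ATTRIBUTE[attribute].append((segment, threshold))


class SegmentationWorker:
    """Keeps per-user counters and updates segments one event at a time."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # user -> attribute -> count
        self.segments = defaultdict(set)                      # user -> segment names

    def handle(self, user_id, attribute):
        self.counts[user_id][attribute] += 1
        seen = self.counts[user_id][attribute]
        # Re-evaluate only the rules attached to this attribute.
        for segment, threshold in RULES_BY_ATTRIBUTE.get(attribute, []):
            if seen >= threshold:
                self.segments[user_id].add(segment)


def run(events):
    """Drain a queue of (user_id, attribute) events through one worker."""
    queue = deque(events)  # stand-in for the message queue between clusters
    worker = SegmentationWorker()
    while queue:
        user_id, attribute = queue.popleft()
        worker.handle(user_id, attribute)
    return worker


worker = run([
    ("u1", "travel"),
    ("u1", "sport"),
    ("u1", "travel"),
    ("u2", "sport"),
])
print(sorted(worker.segments["u1"]))  # ['travel_intent']
```

Because the heavy work happens off the request path and in small increments, the serving cluster only has to read the already-computed segments within its time budget.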

Challenge 2: Fast access to distributed big storage

User Object

• User Info
  • Segments, Delivery info, Intermediate results
  • Object Size: tens to hundreds of KB
  • ~850M unique users (UU)

• Access time
  • Read / Write within a few ms

• Availability
  • For any machine in the cluster
  • For any cluster in every data center

Aerospike: Frontend storage for fast access

Aerospike Cluster

Serving Cluster

XDR: Cross Data Center Replication

Optimized for SSD, Indexed in RAM

Smart Eviction Policy

Fast read/writes: 500K+ TPS

Key-value NoSQL distributed DB

Replicated storage across data centers

US WEST, CA

US CENTRAL, TX

EUROPE, NL

US EAST, NY

Aerospike XDR: Cross Datacenter Replication

Challenge 3: On demand analytics

Show me the data, Now!

optiX: Interactive data analytics

On Demand Calculation

Data Center

Elasticsearch: Using a search engine for counting

Netezza DWH

Aggregator

ES Cluster (30 Nodes)

Reporter

S3

Loader

optiX

REST FTP
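Why a search engine suits counting: search engines keep an inverted index that maps each term to the set of documents containing it, so "how many users match these attributes" becomes a set intersection rather than a table scan. The toy index below illustrates that idea only. It is not the Elasticsearch API, and the class `SegmentIndex`, its methods, and the `key:value` term format are hypothetical.

```python
from collections import defaultdict


class SegmentIndex:
    """Toy inverted index: term -> set of user ids carrying that term."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_user(self, user_id, terms):
        for term in terms:
            self.postings[term].add(user_id)

    def count(self, *terms):
        """Count users that carry ALL of the given terms (AND query)."""
        if not terms:
            return 0
        # Start from the smallest posting set to keep intersections cheap.
        sets = sorted((self.postings.get(t, set()) for t in terms), key=len)
        result = set(sets[0])
        for other in sets[1:]:
            result &= other
        return len(result)


index = SegmentIndex()
index.add_user("u1", ["age:40-55", "interest:travel", "income:high"])
index.add_user("u2", ["age:40-55", "interest:sport"])
index.add_user("u3", ["age:40-55", "interest:travel"])

print(index.count("age:40-55"))                       # 3
print(index.count("age:40-55", "interest:travel"))    # 2
print(index.count("interest:travel", "income:high"))  # 1
```

The cost of a query depends on the size of the posting sets involved, not on the total number of stored users, which is what makes interactive audience counts feasible.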

What have we covered so far?

• Data relevancy
  • Real-time scoring
  • Parallel processing
  • Split processing over time

• Big data access time
  • Front end, Replicated, Aerospike cluster

• On-demand analytics
  • Change your schema to optimize query time
  • Move processing from querying to loading phase
  • Trade off: Space + Processing -> Performance
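The "move processing from querying to loading" trade-off can be illustrated with a small sketch: instead of scanning raw events for every question, aggregate them once at load time into a lookup keyed the way queries will ask. The event layout and the function names below are hypothetical and only serve to show the pattern.

```python
from collections import Counter

# Hypothetical raw events, as they might arrive from the loader.
RAW_EVENTS = [
    {"day": "2013-09-29", "segment": "travel"},
    {"day": "2013-09-29", "segment": "travel"},
    {"day": "2013-09-29", "segment": "sport"},
    {"day": "2013-09-30", "segment": "travel"},
]


def count_at_query_time(events, day, segment):
    """Baseline: every query scans all raw events."""
    return sum(1 for e in events if e["day"] == day and e["segment"] == segment)


def build_daily_counts(events):
    """Load phase: pay the processing and storage cost once, up front."""
    return Counter((e["day"], e["segment"]) for e in events)


def count_precomputed(daily_counts, day, segment):
    """Query phase: a single dictionary lookup."""
    return daily_counts[(day, segment)]


daily_counts = build_daily_counts(RAW_EVENTS)

print(count_at_query_time(RAW_EVENTS, "2013-09-29", "travel"))  # 2
print(count_precomputed(daily_counts, "2013-09-29", "travel"))  # 2
```

The precomputed table takes extra space and extra work during loading, but each query then costs a lookup instead of a scan, which is the Space + Processing -> Performance trade-off named above.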

Thank You

Questions?
