WSO2 Analytics Platform: The One Stop Shop for All Your Data Needs. Anjana Fernando, Senior Technical Lead, WSO2; Sriskandarajah Suhothayan, Technical Lead, WSO2

WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Page 1: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

WSO2 Analytics Platform: The One Stop Shop for All Your Data Needs

Anjana Fernando, Senior Technical Lead, WSO2

Sriskandarajah Suhothayan, Technical Lead, WSO2

Page 2: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

WSO2 Analytics Platform

The WSO2 Analytics Platform combines real-time, interactive, batch, and predictive analytics to turn data from IoT, mobile, and web apps into actionable insights.

Page 3: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

WSO2 Analytics Platform

Page 4: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

WSO2 Data Analytics Server

• Fully open source solution for building systems and applications that collect and analyze both real-time and persisted data and communicate the results.

• Part of WSO2 Big Data Analytics Platform

• High performance data capture framework

• Highly available and scalable by design

• Pre-built Data Agents for WSO2 products

Page 5: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

WSO2 DAS Architecture

Page 6: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Data Processing Pipeline

Collect Data

• Define schema for data

• Send events to batch and/or real-time pipeline

• Publish events

Analyze

• Spark SQL for batch analytics

• Siddhi Query Language for real-time analytics

• Predictive models for machine learning

Communicate

• Alerts

• Dashboards

• API

Page 7: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Highly Pluggable Event Receiver Architecture

Page 8: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Data Model

{
  'name': 'stream.name',
  'version': '1.0.0',
  'nickName': 'stream nick name',
  'description': 'description of the stream',
  'metaData': [
    {'name': 'meta_data_1', 'type': 'STRING'}
  ],
  'correlationData': [
    {'name': 'correlation_data_1', 'type': 'STRING'}
  ],
  'payloadData': [
    {'name': 'payload_data_1', 'type': 'BOOL'},
    {'name': 'payload_data_2', 'type': 'LONG'}
  ]
}

● Data published conforming to a strongly typed data stream
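To make "strongly typed" concrete, here is a minimal Python sketch of checking an event against a stream definition shaped like the one above. The `conforms` function, the `PY_TYPES` mapping, and the event layout are assumptions made for illustration; they are not part of any WSO2 API.

```python
# Hypothetical validator (not a WSO2 API): checks that an event carries every
# attribute declared in the stream definition, with the declared type.
STREAM_DEF = {
    "name": "stream.name",
    "version": "1.0.0",
    "metaData": [{"name": "meta_data_1", "type": "STRING"}],
    "correlationData": [{"name": "correlation_data_1", "type": "STRING"}],
    "payloadData": [
        {"name": "payload_data_1", "type": "BOOL"},
        {"name": "payload_data_2", "type": "LONG"},
    ],
}

# Assumed mapping from the stream's attribute types to Python types.
PY_TYPES = {"STRING": str, "BOOL": bool, "LONG": int}


def conforms(stream_def, event):
    """Return True if each declared attribute is present with the declared type."""
    for section in ("metaData", "correlationData", "payloadData"):
        values = event.get(section, {})
        for attr in stream_def.get(section, []):
            value = values.get(attr["name"])
            expected = PY_TYPES[attr["type"]]
            # bool is a subclass of int in Python, so reject True/False for LONG.
            if isinstance(value, bool) and expected is not bool:
                return False
            if not isinstance(value, expected):
                return False
    return True


good_event = {
    "metaData": {"meta_data_1": "device-7"},
    "correlationData": {"correlation_data_1": "req-42"},
    "payloadData": {"payload_data_1": True, "payload_data_2": 1024},
}
# Same event, but payload_data_2 is a string instead of a LONG.
bad_event = dict(good_event, payloadData={"payload_data_1": True, "payload_data_2": "1024"})
```

In this sketch `conforms(STREAM_DEF, good_event)` accepts the event and `conforms(STREAM_DEF, bad_event)` rejects it, which is the kind of check a typed stream makes possible before events reach the pipeline.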

Page 9: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Data Persistence

● Data Abstraction Layer to enable pluggable data connectors

○ RDBMS, Cassandra, HBase, custom, ...

● Analytics Tables

○ The data persistence entity in WSO2 Data Analytics Server

○ Provides a backend data source agnostic way of storing and retrieving data

○ Allows applications to be written so that they do not depend on a specific data source API, e.g. JDBC (RDBMS), Cassandra APIs, etc.

○ WSO2 DAS provides a standard REST API for accessing the Analytics Tables

● Analytics Record Stores

○ An Analytics Record Store stores a specific set of Analytics Tables

○ Event persistence can be configured with the Analytics Record Store to use for storing incoming events

○ Single Analytics Table namespace; the target record store is given only at table creation time

○ Useful for creating Analytics Tables whose data will be stored in multiple target databases

● Analytics File System
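The relationship between Analytics Tables and Analytics Record Stores described above can be pictured with a small Python sketch. The class and method names (`InMemoryRecordStore`, `AnalyticsTables`, `create_table`, `put`, `get`) are hypothetical stand-ins, not the DAS API. The point is only that callers address tables by name in one namespace, and the backing store is chosen once, at creation time.

```python
class InMemoryRecordStore:
    """Stand-in for a pluggable backend (RDBMS, Cassandra, HBase, ...)."""

    def __init__(self):
        self.tables = {}

    def put(self, table, record):
        self.tables.setdefault(table, []).append(record)

    def get(self, table):
        return list(self.tables.get(table, []))


class AnalyticsTables:
    """Single table namespace on top of several named record stores."""

    def __init__(self, record_stores):
        self.record_stores = record_stores
        self.table_to_store = {}  # table name -> record store name

    def create_table(self, table, store_name):
        # The target record store is named only here, at creation time.
        if table in self.table_to_store:
            raise ValueError("table already exists: " + table)
        self.table_to_store[table] = store_name

    def put(self, table, record):
        self.record_stores[self.table_to_store[table]].put(table, record)

    def get(self, table):
        return self.record_stores[self.table_to_store[table]].get(table)


stores = {"rdbms": InMemoryRecordStore(), "cassandra": InMemoryRecordStore()}
tables = AnalyticsTables(stores)
tables.create_table("PURCHASES", "rdbms")
tables.create_table("LOGS", "cassandra")
tables.put("PURCHASES", {"price": 5})
```

After creation, `tables.put` and `tables.get` never mention the backend, so application code stays independent of the data source.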

Page 10: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Interactive Analytics

Page 11: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Interactive Analysis

● Full text data indexing support powered by Apache Lucene

● Drill down search support

● Distributed data indexing

○ Designed to support scalability

● Near real time data indexing and retrieval

○ Data indexed immediately as received

Page 12: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Interactive Analysis

Page 13: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Batch Analytics

Page 14: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Batch Analytics

● Powered by Apache Spark, with up to 30x higher performance than Hadoop

● Parallel, distributed, with optimized in-memory processing

● Scalable script-based analytics written in an easy-to-learn, SQL-like query language powered by Spark SQL

● Interactive built-in web interface for ad-hoc query execution

● Scheduled query script execution with HA/FO support

● Run Spark on a single node, Spark embedded Carbon server cluster or

Page 15: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Batch Analytics

Page 16: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Batch Analytics

Page 17: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Communicate: Dashboards

● The idea is to give the “overall idea” at a glance (e.g. a car dashboard)

● Support for personalization: you can build your own dashboard

● Also the entry point for drill-down

● How to build?

○ Dashboard via Google Gadget and content via HTML5 + JavaScript

○ Use WSO2 User Engagement Server to build a dashboard (or JSP/PHP)

○ Use charting libraries like Vega or D3

Page 18: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Gadget Generation Wizard

• Start with data in tabular format

• Map each column to a dimension in your plot, such as X, Y, color, point size, etc.

• Also do drill-downs

• Create a chart with a few clicks

Page 19: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Realtime Analysis

Page 20: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

What’s Realtime Analytics?

Realtime Analytics in Complex Event Processing

Page 21: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

What’s Realtime Analytics?...

Realtime Analytics in Complex Event Processing

• Gather data from multiple sources

• Correlate data streams over time

• Find interesting occurrences

• And notify

• All in real time!

Page 22: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

What is WSO2 CEP ?

Page 23: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Event Flow of WSO2 CEP

Page 24: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Realtime Execution

• Process in streaming fashion (one event at a time)

• Execution logic written as Execution Plans

• Execution Plan

– An isolated logical execution unit

– Includes a set of queries, and relates to multiple input and output event streams

– Executed using a dedicated WSO2 Siddhi engine

Page 25: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Realtime Processing Patterns

• Transformation - project, translate, enrich, split

• Filter

• Composition / Aggregation / Analytics

• basic stats, group by, moving averages

• Join multiple streams

• Detect patterns

• Coordinating events over time

• Trends – increasing, decreasing, stable, non-increasing, non-decreasing, mixed

• Integrate with historical data

Page 26: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Siddhi Query Structure

define stream <event stream> (<attribute> <type>, <attribute> <type>, ...);

from <event stream>
select <attribute>, <attribute>, ...
insert into <event stream> ;

Page 27: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Siddhi Query ...

define stream SoftDrinkSales (region string, brand string, quantity int, price double);

from SoftDrinkSales
select brand, quantity
insert into OutputStream ;

Output streams are inferred:

define stream OutputStream (brand string, quantity int);

Page 28: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Siddhi Query ...

define stream SoftDrinkSales (region string, brand string, quantity int, price double);

Enriching streams:

from SoftDrinkSales
select brand, avg(price*quantity) as avgCost, 'USD' as currency
insert into AvgCostStream ;

Using functions:

from AvgCostStream
select brand, toEuro(avgCost) as avgCost, 'EURO' as currency
insert into OutputStream ;

Page 29: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Siddhi Query (Filter & Window) ...

define stream SoftDrinkSales (region string, brand string, quantity int, price double);

Filtering:

from SoftDrinkSales[region == 'USA' and quantity > 99]
select brand, price, quantity
insert into WholeSales ;

Aggregation over 1 hour:

from SoftDrinkSales#window.time(1 hour)
select region, brand, avg(quantity) as avgQuantity
group by region, brand
insert into LastHourSales ;

Other supported window types: timeBatch(), length(), lengthBatch(), etc.
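For readers new to windows, the filter and the one-hour sliding aggregation above can be approximated in plain Python. This is a sketch of the idea only, not how Siddhi is implemented. The names `is_whole_sale` and `LastHourAverage` are made up for illustration, timestamps are assumed to be milliseconds supplied by the caller, and old events are expired only when a new event arrives.

```python
from collections import deque

HOUR_MS = 60 * 60 * 1000


def is_whole_sale(event):
    """Filter condition: region == 'USA' and quantity > 99."""
    return event["region"] == "USA" and event["quantity"] > 99


class LastHourAverage:
    """Average quantity per (region, brand) over a sliding time window."""

    def __init__(self, window_ms=HOUR_MS):
        self.window_ms = window_ms
        self.events = deque()  # (timestamp_ms, key, quantity), oldest first
        self.totals = {}       # key -> [sum of quantity, event count]

    def on_event(self, ts_ms, event):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= ts_ms - self.window_ms:
            _, old_key, old_qty = self.events.popleft()
            total = self.totals[old_key]
            total[0] -= old_qty
            total[1] -= 1
            if total[1] == 0:
                del self.totals[old_key]
        # Add the new event and return the current average for its group.
        key = (event["region"], event["brand"])
        self.events.append((ts_ms, key, event["quantity"]))
        total = self.totals.setdefault(key, [0, 0])
        total[0] += event["quantity"]
        total[1] += 1
        return total[0] / total[1]


def sale(quantity):
    """Build a sample event (values are made up)."""
    return {"region": "USA", "brand": "cola", "quantity": quantity, "price": 1.0}
```

Keeping running sums per group means each event costs a constant amount of work plus whatever has to be expired, rather than a rescan of the whole window.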

Page 30: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Siddhi Query (Pattern) ...

define stream Purchase (price double, cardNo long, place string);

from every (a1 = Purchase[price < 10]) ->
     a2 = Purchase[price > 10000 and a1.cardNo == a2.cardNo]
     within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud ;
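The pattern above (a small purchase followed within a day by a very large purchase on the same card) can be sketched in Python as follows. This is an approximation for illustration, not Siddhi's matching algorithm. `detect_potential_fraud` is a hypothetical name, events are assumed to be `(timestamp_ms, price, card_no, place)` tuples in arrival order, and the handling of the exact one-day boundary is a guess.

```python
DAY_MS = 24 * 60 * 60 * 1000


def detect_potential_fraud(purchases):
    """Emit one alert per pending small purchase matched by a large one."""
    pending = []  # (timestamp_ms, card_no) of small purchases awaiting a match
    alerts = []
    for ts, price, card_no, place in purchases:
        # Forget small purchases older than one day.
        pending = [(t, c) for (t, c) in pending if ts - t <= DAY_MS]
        if price > 10000:
            matched = [(t, c) for (t, c) in pending if c == card_no]
            for _ in matched:
                alerts.append({"cardNo": card_no, "price": price, "place": place})
            # Matched small purchases are consumed.
            pending = [(t, c) for (t, c) in pending if c != card_no]
        elif price < 10:
            # "every": each small purchase starts a new partial match.
            pending.append((ts, card_no))
    return alerts


# Made-up sample data: card 111 matches, card 222 is too late.
purchases = [
    (0, 5.0, 111, "A"),
    (1000, 20000.0, 111, "B"),
    (2000, 5.0, 222, "C"),
    (2000 + DAY_MS + 1, 50000.0, 222, "D"),
]
alerts = detect_potential_fraud(purchases)
```

The list of pending small purchases is the state the engine has to keep, which is why patterns are treated as stateful queries later in the deck.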

Page 31: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Siddhi Query (Trends & Partition) ...

define stream StockStream (symbol string, price double, volume int);

partition with (symbol of StockStream)
begin
    from t1 = StockStream,
         t2 = StockStream[(t2[last] is null and t1.price < price) or (t2[last].price < price)]+
         within 5 min
    select t1.price as initialPrice, t2[last].price as finalPrice, t1.symbol
    insert into IncreasingMyStockPriceStream ;
end;

Page 32: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Siddhi Query (Table) ...

define table CardUserTable (name string, cardNum long) ;

@from(eventtable = 'rdbms', datasource.name = 'CardDataSource', table.name = 'UserTable', caching.algorithm = 'LRU')
define table CardUserTable (name string, cardNum long) ;

Supported for RDBMS, In-Memory, Analytics Table, Hazelcast

Cache types supported

• Basic: A size-based algorithm based on FIFO.

• LRU (Least Recently Used): The least recently used event is dropped when the cache is full.

• LFU (Least Frequently Used): The least frequently used event is dropped when the cache is full.
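As an illustration of the LRU policy listed above, here is a minimal Python sketch of a read-through cache in front of a slower table lookup. It is not the event table implementation. `LRUCache`, `load_from_table`, and the sample data are assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Read-through cache that evicts the least recently used entry when full."""

    def __init__(self, capacity, load_from_table):
        self.capacity = capacity
        self.load_from_table = load_from_table  # fallback lookup, e.g. a DB query
        self.entries = OrderedDict()            # least recently used first

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)       # mark as most recently used
            return self.entries[key]
        value = self.load_from_table(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # drop the least recently used
        return value


# Made-up backing table: card number -> user name.
backing_table = {1: "alice", 2: "bob", 3: "carol"}
loads = []  # records which keys had to be fetched from the table


def load_from_table(card_num):
    loads.append(card_num)
    return backing_table[card_num]


cache = LRUCache(capacity=2, load_from_table=load_from_table)
for card_num in (1, 2, 1, 3):
    cache.get(card_num)
```

With a capacity of two, the lookup for card 3 evicts card 2 rather than card 1, because card 1 was read more recently. A FIFO ("Basic") cache would have evicted card 1 instead.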

Page 33: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Siddhi Query (Table) ...

define stream Purchase (price double, cardNo long, place string);
define stream CardUserStream (name string, cardNo long) ;
define table CardUserTable (name string, cardNum long) ;

from Purchase#window.length(1) join CardUserTable
    on Purchase.cardNo == CardUserTable.cardNum
select Purchase.cardNo as cardNo, CardUserTable.name as name, Purchase.price as price
insert into PurchaseUserStream ;

from CardUserStream
select name, cardNo as cardNum
update CardUserTable
    on CardUserTable.name == name ;

Similarly, insert into and delete are also supported!
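The two table queries above amount to a lookup join and a keyed update. A rough Python equivalent, with the table held as a list of rows, might look like this. The function names and sample rows are hypothetical, and the sketch ignores persistence and concurrency.

```python
# In-memory stand-in for CardUserTable (rows are made-up sample data).
card_user_table = [{"name": "alice", "cardNum": 111}]


def on_purchase(purchase):
    """Join: enrich a purchase with the names of matching card users."""
    return [
        {"cardNo": purchase["cardNo"], "name": row["name"], "price": purchase["price"]}
        for row in card_user_table
        if row["cardNum"] == purchase["cardNo"]
    ]


def on_card_user(event):
    """Update: set cardNum on rows whose name matches the incoming event."""
    for row in card_user_table:
        if row["name"] == event["name"]:
            row["cardNum"] = event["cardNo"]


before = on_purchase({"price": 50.0, "cardNo": 111, "place": "X"})
on_card_user({"name": "alice", "cardNo": 222})
after_old_card = on_purchase({"price": 50.0, "cardNo": 111, "place": "X"})
after_new_card = on_purchase({"price": 75.0, "cardNo": 222, "place": "Y"})
```

Once the update has run, purchases on the old card number no longer join to any user, while purchases on the new number do.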

Page 34: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Siddhi Query (Extension) ...

• Function extension

• Aggregator extension

• Window extension

• Stream Processor extension

define stream SalesStream (brand string, price double, currency string);

from SalesStream
select brand, custom:toUSD(price, currency) as priceInUSD
insert into OutputStream ;

Extensions are referred to with namespaces (here, custom).
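The namespace idea can be illustrated with a small Python registry keyed by `namespace:name`. This is only an analogy; real Siddhi extensions are written against Siddhi's own extension mechanism, which is not shown here. The `extension` decorator, the `EXTENSIONS` dictionary, and the exchange rates are invented for the example.

```python
EXTENSIONS = {}  # "namespace:name" -> callable


def extension(namespace, name):
    """Register a function under a namespaced key (hypothetical helper)."""
    def register(fn):
        EXTENSIONS[namespace + ":" + name] = fn
        return fn
    return register


# Made-up exchange rates, for illustration only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25}


@extension("custom", "toUSD")
def to_usd(price, currency):
    return price * RATES_TO_USD[currency]


def run_query(events):
    """Mirror of: select brand, custom:toUSD(price, currency) as priceInUSD."""
    fn = EXTENSIONS["custom:toUSD"]
    return [
        {"brand": e["brand"], "priceInUSD": fn(e["price"], e["currency"])}
        for e in events
    ]


output = run_query([{"brand": "cola", "price": 8.0, "currency": "EUR"}])
```

The namespace prefix keeps user-defined functions from colliding with built-ins or with other extension packs.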

Page 35: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Siddhi Extensions

• geo: Geographical processing

• nlp: Natural language processing (with Stanford NLP)

• ml: Running machine learning models of WSO2 Machine Learner

• pmml: Running PMML models learnt by R

• timeseries: Regression and time series

• math: Mathematical operations

• str: String operations

• regex: Regular expressions

Page 36: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Demo on Realtime Analytics

Page 37: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

WSO2 CEP (Realtime) High Availability

Page 38: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

WSO2 CEP (Realtime) Scalability

Distributed Realtime = Siddhi +

Advantages over Apache Storm

• No need to write Java code (supports an SQL-like query language)

• No need to start from basic principles (supports a high-level language)

• Adapting to change is fast

• Govern artifacts using Toolboxes

• etc ...

Page 39: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

How do we scale?

Page 40: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Scaling with Storm

Handling Stateless & Stateful Queries

Page 41: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Siddhi QL

define stream StockStream (symbol string, volume int, price double);

@name('Filter Query')
from StockStream[price > 75]
select *
insert into HighPriceStockStream ;

@name('Window Query')
from HighPriceStockStream#window.time(10 min)
select symbol, sum(volume) as sumVolume
insert into ResultStockStream ;

Page 42: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Siddhi QL - with partition

define stream StockStream (symbol string, volume int, price double);

@name('Filter Query')
from StockStream[price > 75]
select *
insert into HighPriceStockStream ;

@name('Window Query')
partition with (symbol of HighPriceStockStream)
begin
    from HighPriceStockStream#window.time(10 min)
    select symbol, sum(volume) as sumVolume
    insert into ResultStockStream ;
end;

Page 43: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Siddhi QL - distributed

define stream StockStream (symbol string, volume int, price double);

@name('Filter Query')
@dist(parallel = '3')
from StockStream[price > 75]
select *
insert into HighPriceStockStream ;

@name('Window Query')
@dist(parallel = '2')
partition with (symbol of HighPriceStockStream)
begin
    from HighPriceStockStream#window.time(10 min)
    select symbol, sum(volume) as sumVolume
    insert into ResultStockStream ;
end;
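One way to read the distributed plan above: the stateless filter can run on any of its three parallel instances, while the partitioned window query must send all events for a given symbol to the same instance so that its per-symbol state stays together. The Python sketch below simulates that routing. It is an interpretation for illustration, not the Storm topology WSO2 CEP generates. `partition_for` and `run_distributed` are hypothetical names, round-robin is an assumed policy for the filter, and window expiry is left out to keep the routing visible.

```python
import zlib
from itertools import cycle


def partition_for(symbol, parallelism):
    """Key-based routing: the same symbol always maps to the same instance."""
    # crc32 is used because Python's built-in hash() of str varies per process.
    return zlib.crc32(symbol.encode("utf-8")) % parallelism


def run_distributed(events, filter_parallel=3, window_parallel=2):
    next_filter = cycle(range(filter_parallel))
    filter_load = [0] * filter_parallel                    # events seen per filter instance
    window_state = [dict() for _ in range(window_parallel)]  # per-instance sums by symbol
    for event in events:
        # Stateless step: any instance will do, so spread events round-robin.
        filter_load[next(next_filter)] += 1
        if event["price"] > 75:
            # Stateful step: route by partition key.
            state = window_state[partition_for(event["symbol"], window_parallel)]
            state[event["symbol"]] = state.get(event["symbol"], 0) + event["volume"]
    return filter_load, window_state


# Made-up sample events.
events = [
    {"symbol": "WSO2", "volume": 10, "price": 80.0},
    {"symbol": "IBM", "volume": 7, "price": 70.0},
    {"symbol": "WSO2", "volume": 5, "price": 90.0},
    {"symbol": "IBM", "volume": 3, "price": 100.0},
]
filter_load, window_state = run_distributed(events)
```

Because routing depends only on the symbol, no two window instances ever hold state for the same symbol, so their outputs can be combined without reconciliation.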

Page 44: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Distributed Execution on Storm UI

Page 45: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Notifying Events

Page 46: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Event Publisher

*Supports custom event publishers via its pluggable architecture!

Page 47: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Realtime Dashboard

• Dashboard

– Google Gadget

– HTML5 + JavaScript

• Support gadget generation

– Using D3 and Vega

• Gather data for UI from

– WebSockets

– Polling

• Support custom gadgets and dashboards

Page 48: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Beyond Boundaries

• Expose analytics results as API

– Mobile apps, third party

• Provides

– Security, billing

– Throttling, quotas & SLA

• How?

– Write data to database from DAS

– Build services via WSO2 Data Services Server

– Expose them as APIs via WSO2 API Manager

Page 49: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Demo on Notifying Events

Page 50: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Predictive Analysis

Page 51: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

What’s Realtime Analytics?...

Predictive Analytics in→

• Extract, pre-process, and explore data

• Create models, tune algorithms and make predictions

• Integrate for better intelligence

Page 52: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Predictive Analytics

• Guided UI to build machine learning models

– Via Spark MLlib

– Via R and export them as PMML (from WSO2 ML 1.1)

• Run models using CEP, DAS and ESB

• Run R scripts, regression and anomaly detection in real time

Page 53: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Machine Learning Pipeline

Page 54: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

ML Models

ML_Algo(Data) => Model

• The outcomes of ML algorithms are models

– E.g. learning a classification generates a model that you can use to classify data.

• The ML Wizard helps you create models

• These models are published to the registry or downloaded

• They can then be applied in CEP, DAS, ESB, etc. for prediction
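The `ML_Algo(Data) => Model` idea can be shown with a deliberately tiny example: a nearest-centroid classifier in plain Python, where training returns a function (the model) that can later be applied to new data. This is a sketch only. It is not one of the algorithms WSO2 ML exposes as far as this deck states, and the feature values are made-up numbers loosely shaped like petal measurements.

```python
def train_nearest_centroid(rows, labels):
    """ML_Algo(Data) => Model: learn one centroid per label, return a classifier."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {
        label: [value / counts[label] for value in acc]
        for label, acc in sums.items()
    }

    def model(row):
        # Predict the label whose centroid is closest (squared Euclidean distance).
        return min(
            centroids,
            key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
        )

    return model


# Made-up training data: [petal length, petal width].
rows = [[1.4, 0.2], [1.5, 0.2], [5.5, 2.0], [6.0, 2.2]]
labels = ["setosa", "setosa", "virginica", "virginica"]
model = train_nearest_centroid(rows, labels)
```

The returned `model` is self-contained, which mirrors the workflow on the slide: train once, then ship the model to wherever predictions are needed.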

Page 55: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Data Exploration

Page 56: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Visualizing Results

Page 57: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Upcoming ML features

• Out-of-the-box model generation support for R

• Deep learning algorithms

• NLP techniques

• Data pre-processing techniques

Page 58: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Demo on Predictive Analytics

Page 59: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Iris DataSet

setosa, versicolor, virginica

Page 60: WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Data Needs

Thank You