WSO2 Data Analytics Server - Product Overview


WSO2 Analytics Platform

<Presenter Name><Title>


WSO2 Analytics Platform

WSO2 Analytics Platform uniquely combines simultaneous real-time and batch analysis with predictive analytics to turn data from IoT, mobile and Web apps into actionable insights.


Analytics Strategy

• We deliver a single platform to address all analytics styles. This was driven by the increasing market requirement to expand analytics in enterprises beyond pure BI and to start exploiting big data in real time.

• We deliver, together:

• Batch Analytics: analysis of data at rest, typically running every hour or every day, and focused on historical dashboards and reports.

• Real-time Analytics: analyze event streams in real time and detect patterns and conditions.

• Predictive Analytics: leverage machine learning to create a mathematical model that predicts future behavior.

• Interactive Analytics: execute queries on the fly on top of data at rest.


Analytics Strategy

• Focus on supporting high-level, SQL-like query languages across the analytics platform

• No Java programming involved

• Lowest learning curve

• Client Applications are agnostic of the part of the platform being used, so customers can increase their usage of the platform without changing their apps.

• Common set of receivers/publishers for all analytics types

• Common format for events

• Leverage leading open source projects such as Storm and Spark, and contribute back to open source (for example, Siddhi).

• Although they are packaged together, each component of the platform can scale independently


Key Differentiators

• Open Source, under Apache 2 license

• Integrated Batch, Streaming, Interactive and Predictive Analytics

• Rich, extensible, SQL-like configuration language

• Rich set of data connectors, which can be easily extended

• Events only need to be published once from applications to the platform, and can be consumed by the batch pipeline, the real-time pipeline, or both.

• Part of the overall WSO2 platform


Key Differentiators

• Rich set of data connectors, which can be easily extended

• Integrated with batch analytics (same receivers/publishers architecture)

• Events only need to be published once from applications to the platform, and can be consumed by the batch pipeline, the real-time pipeline, or both.

• Single-node performance satisfies 90% of use cases


Market Recognition

• Named a Strong Performer in The Forrester Wave™: Big Data Streaming Analytics, Q1 2016.

• Highest possible score in the 'Acquisition and Pricing' criteria, and among the second-highest scores in the 'Ability to execute' criteria.

• The Forrester report notes:

“WSO2 is an open source middleware provider that includes a full spectrum of architected-as-one components such as application servers, message brokers, enterprise service bus, and many others.

Its streaming analytics solution follows the complex event processor architectural approach, so it provides very low-latency analytics. Enterprises that already use WSO2 middleware can add CEP seamlessly. Enterprises looking for a full middleware stack that includes streaming analytics will find a place for WSO2 on their shortlist as well.”


IoT / Edge Analytics

• We provide a solid foundation for an IoT analytics solution, whether for device manufacturers or device users.

• Customers can today:

• React to a condition within a few hours, a few minutes or a few milliseconds, leveraging batch and streaming analytics.

• Implement closed-loop control (autonomic computing) leveraging machine learning.

• Embed the streaming engine in IoT devices or gateways.

• Use an SDK and data agent to publish events directly at the device hardware level.


Reference: https://iwringer.wordpress.com/2015/10/15/thinking-deeply-about-iot-analytics/


Case Studies


Smart Home

• DEBS (Distributed and Event-Based Systems) is a premier academic conference that posts a yearly event processing challenge (http://www.cse.iitb.ac.in/debs2014/?page_id=42)

• Smart Home electricity data: 2,000 sensors, 40 houses, 4 billion events

• We posted the fastest single-node solution measured (400K events/sec) and close to one million events/sec of distributed throughput.

• The WSO2 CEP-based solution was one of the four finalists (along with Dresden University of Technology, Fraunhofer Institute, and Imperial College London)

• The only generic solution to become a finalist


Customer Stories


Experian delivers a digital marketing platform in which CEP plays a key role, analyzing customer behavior in real time and offering targeted promotions. CEP was chosen after careful analysis, primarily for its openness and open source nature, the fact that support is provided by engineers, and the availability of a complete middleware platform, integrated with CEP, for additional use cases.

Eurecat is the Catalunya innovation center (in Spain). It uses CEP to analyze data from iBeacons deployed within department stores, to offer instant rebates to users or to send them help if they seem to be “stuck” in an area of the shop. They chose WSO2 for its real-time processing, the variety of IoT connectors available, the extensible framework and the rich configuration language. They also use WSO2 ESB in conjunction with WSO2 CEP.

Pacific Controls is an innovative company delivering an IoT platform of platforms: Galaxy 2021. The platform manages all kinds of devices within a building and takes automated decisions based on certain conditions, such as moving an elevator or starting the air conditioning. Within Galaxy 2021, CEP is used to monitor alarms and specific conditions. Pacific Controls also uses other products from the WSO2 platform, such as WSO2 ESB and WSO2 Identity Server.

A leading airline uses CEP to enhance customer experience by calculating the average time it takes to reach the boarding gate (going through security, walking, etc.). They also want to track the time it takes to clean a plane, in order to better streamline the boarding process and notify both the airline and customers about potential delays. They evaluated WSO2 CEP first, as they were already using our platform, and adopted it because it addressed all their requirements.


Cloud IDE Analytics

• Custom solution created in partnership with Codenvy to bring analytics to the Codenvy management team and its customers

• Developed in less than a month, with a custom plug-in for MongoDB.

• Deployed on the codenvy.com platform.


Healthcare Data Monitoring

• Allows users to search, visualize and analyze healthcare records (HL7) across 20 hospitals in Italy

• Used in combination with WSO2 ESB

• Custom toolbox tailored to the customer’s requirements (to replace an existing system)


Data Processing Pipeline


Collect Data

• Define a schema for the data.

• Send events to the batch and/or real-time pipeline.

• Publish events.

Analyze

• Spark SQL for batch analytics.

• Siddhi Query Language for real-time analytics.

• Predictive models for machine learning.

Communicate

• Alerts

• Dashboards

• API


Collect & Publish Data


Extensible Receiver Architecture

* Supports custom event receivers via its pluggable architecture!


Extensible Publisher Architecture

* Supports custom event publishers via its pluggable architecture!


Event Streams

• An event stream is a sequence of events.

• Event streams are defined by stream definitions.

• Event streams have inflows and outflows.

• Inflows can come from event receivers and execution plans.

• Outflows go to event publishers and execution plans.

{
  "name": "phone.retail.shop",
  "version": "1.0.0",
  "nickName": "Phone_Retail_Shop",
  "description": "Phone Sales",
  "metaData": [
    {"name": "clientType", "type": "STRING"}
  ],
  "correlationData": [
    {"name": "transactionID", "type": "STRING"}
  ],
  "payloadData": [
    {"name": "brand", "type": "STRING"},
    {"name": "quantity", "type": "INT"},
    {"name": "total", "type": "INT"},
    {"name": "user", "type": "STRING"}
  ]
}
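The stream definition above fixes the shape of every event: meta, correlation and payload attributes, each with a type. The following plain-Python sketch (not a WSO2 API) shows how a client might check an event against such a definition before publishing it. The type mapping, the `build_event` helper and the `name:version` identifier are illustrative assumptions, and `nickName`/`description` are omitted for brevity.

```python
import json

# The stream definition from the slide (nickName/description omitted).
STREAM_DEF = """
{
  "name": "phone.retail.shop", "version": "1.0.0",
  "metaData": [{"name": "clientType", "type": "STRING"}],
  "correlationData": [{"name": "transactionID", "type": "STRING"}],
  "payloadData": [
    {"name": "brand", "type": "STRING"},
    {"name": "quantity", "type": "INT"},
    {"name": "total", "type": "INT"},
    {"name": "user", "type": "STRING"}
  ]
}
"""

# Hypothetical mapping from definition type names to Python types.
PY_TYPES = {"STRING": str, "INT": int, "LONG": int,
            "FLOAT": float, "DOUBLE": float, "BOOL": bool}


def stream_id(definition: dict) -> str:
    """Identify a stream as name:version (an assumption for this sketch)."""
    return f"{definition['name']}:{definition['version']}"


def build_event(definition: dict, meta: list, correlation: list, payload: list) -> dict:
    """Check each attribute group against the definition and return a flat event."""
    event = {}
    for section, values in (("metaData", meta),
                            ("correlationData", correlation),
                            ("payloadData", payload)):
        attrs = definition.get(section, [])
        if len(attrs) != len(values):
            raise ValueError(f"{section}: expected {len(attrs)} values, got {len(values)}")
        for attr, value in zip(attrs, values):
            expected = PY_TYPES[attr["type"]]
            # bool is a subclass of int in Python, so reject it explicitly for INT/LONG.
            if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
                raise TypeError(f"{attr['name']} must be {attr['type']}")
            event[attr["name"]] = value
    return event
```

For example, `build_event(json.loads(STREAM_DEF), ["mobile"], ["tx-001"], ["Acme", 2, 400, "alice"])` yields a flat dictionary, while passing `"2"` as the quantity raises a `TypeError`.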


Data Connectors

• We provide a complete set of data connectors, which customers can extend.

• The following connectors are available out of the box:

• Sources: Email, File, HTTP, JMS, Kafka, MQTT, SOAP, WebSocket, Thrift, Binary, Log and JMX receivers

• Sinks: RDBMS, Cassandra, SMS, Email, File, HTTP, JMS, Kafka, MQTT, SOAP, WebSocket, Thrift, Binary

• Custom connectors can be written in Java. A sample connector is available as a starting point, and the source code of the out-of-the-box connectors can be used as a reference.

• Incoming/outgoing data can be mapped using XPath, regular expressions, or JSON paths.

• Data connectors are common across the analytics platform.
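Mapping incoming text with regular expressions can be pictured as one capture group per stream attribute. The sketch below is a hypothetical Python illustration of that idea, not the DAS mapping configuration; the `key=value` message format and the `MAPPING` table are assumptions.

```python
import re

# Hypothetical mapping: attribute name -> (regex with one capture group, converter).
MAPPING = {
    "brand": (r"brand=(\w+)", str),
    "quantity": (r"quantity=(\d+)", int),
    "total": (r"total=(\d+)", int),
    "user": (r"user=(\w+)", str),
}


def map_text_event(message: str, mapping=MAPPING) -> dict:
    """Extract each attribute with its regex; fail if any attribute is missing."""
    event = {}
    for name, (pattern, convert) in mapping.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"attribute '{name}' not found in message")
        event[name] = convert(match.group(1))
    return event
```

Failing loudly on a missing attribute keeps malformed messages from entering a stream with silently empty fields.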


Process Data


Batch Analytics

● Powered by Apache Spark, with up to 30x higher performance than Hadoop

● Parallel, distributed, with optimized in-memory processing

● Scalable script-based analytics written in an easy-to-learn, SQL-like query language powered by Spark SQL

● Interactive built-in web interface (Spark Console) for ad-hoc query execution

● Scheduled query script execution with high availability / failover (HA/FO) support

● Run Spark on a single node, on a cluster of Carbon servers with embedded Spark, or connect to an external Spark cluster


Batch Analytics with Spark SQL

create temporary table product_data using carbonanalytics
options (schema …);

create temporary table products using carbonanalytics
options (schema …);

insert into products select product_name from product_data
group by …;
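The statements above read from one analytics table and write a grouped result into another. As an illustration only, the sketch below runs the same insert-select-group-by shape on Python's built-in SQLite instead of Spark SQL; the `quantity` column and the `SUM` are hypothetical additions that make the grouping visible, and the elided schema options are not reproduced.

```python
import sqlite3


def summarize_products(records):
    """records: iterable of (product_name, quantity) tuples (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE product_data (product_name TEXT, quantity INTEGER)")
        conn.execute("CREATE TABLE products (product_name TEXT, total_quantity INTEGER)")
        conn.executemany("INSERT INTO product_data VALUES (?, ?)", records)
        # Same shape as the slide's last statement: read one table, group, write another.
        conn.execute(
            "INSERT INTO products "
            "SELECT product_name, SUM(quantity) FROM product_data GROUP BY product_name"
        )
        return conn.execute(
            "SELECT product_name, total_quantity FROM products ORDER BY product_name"
        ).fetchall()
    finally:
        conn.close()
```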


Interactive Analytics

• Full text data indexing support powered by Apache Lucene

• Drill down search support

• Distributed data indexing.

• Designed to support scalability

• Near real-time data indexing and retrieval

• Data indexed immediately as received

• Distributed indexing implementation for scalability

• Index sharding with Lucene indices
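To make the indexing bullets concrete, here is a toy Python sketch of an inverted index split into hash-based shards, with a drill-down filter on a field. It is not how Lucene or DAS lay out their indices; the class, field names and shard count are assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict


class ShardedIndex:
    """Toy full-text index: records are hash-partitioned across shards, and
    each shard keeps its own inverted index (term -> set of record ids)."""

    def __init__(self, num_shards: int = 3):
        self.shards = [defaultdict(set) for _ in range(num_shards)]
        self.docs = {}

    def _shard_for(self, doc_id: str) -> int:
        # crc32 is stable across runs, unlike Python's built-in str hash.
        return zlib.crc32(doc_id.encode()) % len(self.shards)

    def add(self, doc_id: str, fields: dict) -> None:
        """Index a record as soon as it arrives (no separate batch step)."""
        self.docs[doc_id] = fields
        shard = self.shards[self._shard_for(doc_id)]
        for value in fields.values():
            for term in str(value).lower().split():
                shard[term].add(doc_id)

    def search(self, term: str) -> set:
        """Query every shard and merge the partial results."""
        hits = set()
        for shard in self.shards:
            hits |= shard.get(term.lower(), set())
        return hits

    def drill_down(self, term: str, field: str, value) -> set:
        """Narrow a search result to records whose `field` equals `value`."""
        return {d for d in self.search(term) if self.docs[d].get(field) == value}
```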


Realtime Analytics

• Process in streaming fashion (one event at a time)

• Execution logic written as Execution Plans

• Execution Plan

• An isolated logical execution unit

• Includes a set of queries, and relates to multiple input and output event streams

• Executed using dedicated WSO2 Siddhi engine


CEP Operators with Siddhi

• Filter

from SoftDrinkSales[region == 'USA' and quantity > 99]
select brand, price, quantity

• Window

from SoftDrinkSales#window.time(1 hour)
from SoftDrinkSales#window.timeBatch(15 min)
from SoftDrinkSales#window.length(100)

• Join

from PizzaOrder#window.time(1 hour) as o join PizzaDelivery as d
on o.id == d.id
select o.id as id, d.ts - o.ts as ts
insert into DeliveryTime;
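The filter and join queries above can be read as follows. This plain-Python sketch mimics their semantics for illustration only. It assumes each event carries a millisecond timestamp `ts`, that orders arrive in timestamp order, and that orders expire based on those timestamps, whereas Siddhi's time window is driven by the engine's own clock.

```python
from collections import deque


def filter_sales(events):
    """Like: from SoftDrinkSales[region == 'USA' and quantity > 99]
    select brand, price, quantity"""
    for e in events:
        if e["region"] == "USA" and e["quantity"] > 99:
            yield {"brand": e["brand"], "price": e["price"], "quantity": e["quantity"]}


class OrderDeliveryJoin:
    """Like: from PizzaOrder#window.time(1 hour) as o join PizzaDelivery as d
    on o.id == d.id select o.id as id, d.ts - o.ts as ts
    Orders are retained for one hour; deliveries are not retained (no window)."""

    WINDOW_MS = 60 * 60 * 1000

    def __init__(self):
        self.orders = deque()  # (ts, order) pairs, oldest first

    def on_order(self, order):
        self.orders.append((order["ts"], order))

    def on_delivery(self, delivery):
        # Drop orders that are one hour or more older than this delivery.
        while self.orders and delivery["ts"] - self.orders[0][0] >= self.WINDOW_MS:
            self.orders.popleft()
        # Emit one DeliveryTime event per matching order still in the window.
        return [{"id": o["id"], "ts": delivery["ts"] - o["ts"]}
                for _, o in self.orders if o["id"] == delivery["id"]]
```

Only the windowed side keeps state: a delivery triggers matching against recent orders, but is itself discarded afterwards.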


CEP Operators with Siddhi

• Event Table

@from(eventtable = 'rdbms', datasource.name = 'CardDataSource', table.name = 'UserTable', caching.algorithm = 'LRU')
define table CardUserTable (name string, cardNum long);

• Patterns

from every a1 = PizzaOrder -> a2 = PizzaOrder[custid == a1.custid]

• Custom Extensions

select brand, custom:toUSD(price, currency) as priceInUSD
insert into OutputStream;
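An event table with `caching.algorithm = 'LRU'` lets a query enrich events from a database without querying it for every event. The sketch below imitates that idea with SQLite and `functools.lru_cache`; the card numbers, names and the `enrich` helper are hypothetical, and this is not how the RDBMS event table is implemented.

```python
import sqlite3
from functools import lru_cache

# In-memory stand-in for the RDBMS table behind the event table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserTable (name TEXT, cardNum INTEGER)")
conn.executemany("INSERT INTO UserTable VALUES (?, ?)", [("alice", 4111), ("bob", 4222)])


@lru_cache(maxsize=1024)  # least-recently-used entries are evicted first
def lookup_user(card_num):
    """Return the user name for a card number, or None if unknown."""
    row = conn.execute(
        "SELECT name FROM UserTable WHERE cardNum = ?", (card_num,)
    ).fetchone()
    return row[0] if row else None


def enrich(events):
    """Add a `name` attribute to each event, hitting the database only on cache misses."""
    for event in events:
        yield {**event, "name": lookup_user(event["cardNum"])}
```

One trade-off: a cached lookup can return stale rows if the table changes, so cache size and lifetime should match how often the underlying data is updated.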


Operators Summary


Event Sequencing

We handle out-of-order events using a variant of the K-Slack algorithm, a well-known solution for handling disorder in event streams, which buffers data until order can be guaranteed. Compensation for missed events is not supported in the current version, but is on the roadmap. Additionally, filtering (based on a Kalman filter) can be used to reduce noisy events in a stream.
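The basic K-Slack idea can be sketched as follows: keep a buffer, track the largest timestamp seen so far, and release buffered events in timestamp order once they are at least K time units older than that maximum. The Python below is the textbook form, not WSO2's variant; an event delayed by more than K is still released, but possibly out of order, since this sketch does nothing special for such late events.

```python
import heapq


class KSlackReorderer:
    """Buffer events and release them in timestamp order once they are at
    least `k` time units older than the newest timestamp seen so far."""

    def __init__(self, k):
        self.k = k
        self.max_ts = float("-inf")
        self.buffer = []  # min-heap of (ts, seq, event)
        self.seq = 0      # tie-breaker so equal timestamps keep arrival order

    def push(self, ts, event):
        """Add an event and return the (ts, event) pairs that are now safe to release."""
        self.max_ts = max(self.max_ts, ts)
        heapq.heappush(self.buffer, (ts, self.seq, event))
        self.seq += 1
        released = []
        while self.buffer and self.buffer[0][0] <= self.max_ts - self.k:
            oldest_ts, _, oldest = heapq.heappop(self.buffer)
            released.append((oldest_ts, oldest))
        return released

    def flush(self):
        """Release everything still buffered, e.g. at the end of the stream."""
        released = [(ts, ev) for ts, _, ev in sorted(self.buffer)]
        self.buffer.clear()
        return released
```

A larger K tolerates more disorder at the cost of added latency, because every event waits until the stream has advanced K units past it.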

Enrichment

Enrichment is done in two ways: event tables, to access historical data from any JDBC data source, and custom extensions, to connect to custom data sources such as files.

Business Logic

Scripting can be used to add any business logic to any execution plan. JavaScript, Scala and R are supported out of the box. Additionally, customers can easily invoke custom logic through their own operators.

Transformation

The filter operator can be used to filter streams on a set of conditions, which can be combined with and/or. Conditions can be expressed using mathematical operators, regular expressions, string manipulation and logical operators. Additionally, queries can select information from an input stream, project it to an output stream or a new stream, and replace certain elements.


Operators Summary


Time Windows

Siddhi provides very strong support for time windows, a domain where an SQL-like query language brings much more simplicity than a programming language. Several types of windows are supported, including sliding and tumbling (batch) windows, time windows starting from a point in time, and CRON-based time windows. Additionally, we support applying stream processing to events based on the number of events (length window), the uniqueness of events, or the frequency of events.
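The difference between a sliding window (`#window.time`) and a tumbling batch window (`#window.timeBatch`) can be sketched with a running sum. This Python is illustrative only: it uses the events' own timestamps, aligns batches to multiples of the window size, and emits only sums, whereas Siddhi manages its own clock, batch boundaries and expired events.

```python
from collections import deque


def sliding_sum(events, window):
    """Like #window.time(window): after each event, sum the values whose
    timestamp lies in (ts - window, ts]."""
    buf, total, out = deque(), 0, []
    for ts, value in events:
        buf.append((ts, value))
        total += value
        # Expire values that have slid out of the window.
        while buf[0][0] <= ts - window:
            total -= buf.popleft()[1]
        out.append((ts, total))
    return out


def tumbling_sums(events, window):
    """Like #window.timeBatch(window): group events into fixed, non-overlapping
    buckets [k*window, (k+1)*window) and return one sum per bucket."""
    buckets = {}
    for ts, value in events:
        start = (ts // window) * window
        buckets[start] = buckets.get(start, 0) + value
    return sorted(buckets.items())
```

With events `[(0, 1), (5, 2), (10, 3), (16, 4)]` and a window of 10, the sliding version produces an updated sum after every event, while the tumbling version produces one sum per 10-unit bucket.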

Aggregation/Correlation

Using the Join and Pattern operators, we can aggregate and correlate two or more streams of data. Join combines events based on a condition, while Pattern correlates multiple events based on time, logical relationships or event counting.

Pattern Matching

We detect patterns based on temporal order (arrival order), logical relationships (between two events), or counting (to limit the number of events matching the pattern). A pattern may allow other events to occur between the events it matches; if no intervening (foreign) events are allowed, the sequence operator must be used.
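The pattern/sequence distinction can be illustrated with a pizza-order query of the form "an order followed by another order from the same customer". In this Python sketch (not the Siddhi engine), the pattern version lets other events occur in between, while the sequence version only accepts the immediately next event. The event fields are assumptions.

```python
def match_pattern(events, key):
    """every a1 -> a2[key == a1.key]: other events may occur between a1 and a2."""
    pending, matches = [], []
    for event in events:
        waiting = []
        for first in pending:
            if event[key] == first[key]:
                matches.append((first, event))  # completes this partial match
            else:
                waiting.append(first)           # keep waiting for a matching event
        pending = waiting + [event]             # 'every': each event also starts a new match
    return matches


def match_sequence(events, key):
    """every a1, a2[key == a1.key]: a2 must be the event immediately after a1."""
    return [(a, b) for a, b in zip(events, events[1:]) if b[key] == a[key]]
```

Here pending partial matches grow without bound; a real deployment would limit them, for example with a time constraint.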

Custom

Developers can create their own functions, operators, time windows and processing operators. The extensions are written in Java. Once implemented, they can be used like any out-of-the-box operator or function.

Libraries to support custom operators

Developers can use the existing operators as a reference when developing their own; this is one of the key advantages of an open source distribution. We deliver dozens of extensions on GitHub that third parties can adapt. At the implementation level, implementing an extension just involves extending a well-defined interface.

Other operators

We support more than 100 custom operators on top of those listed above, including geographical operators for location-based applications, time series, math and natural language processing operators, and integration with machine learning models created in PMML or with our own Machine Learning product.


Predictive Analytics (with WSO2 Machine Learner)


• Powered by Apache Spark MLlib

• Manage and explore your data

• Analyze the data using machine learning algorithms

• Build machine learning models

• Compare and manage generated machine learning models

• Predict using the built models


Manage Data set


• Supported data sources

• CSV/TSV files from local file systems.

• Files from HDFS.

• Tables from WSO2 Data Analytics Server

• Supports dataset versioning.

• Version data collected over time from the same dataset

• Generate models from the different versions.

• Manage datasets by project and user.


Pre-process & Explore Data


• Find key details in the feature set

• Scatter plots to understand the relationships between features

• Supported graphs: scatter plots, parallel sets, trellis charts, cluster diagrams, histograms

• Missing-value handling by mean imputation or by discarding
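Mean imputation replaces each missing value with the mean of the values that are present, while discarding drops the affected rows instead. A minimal Python sketch of both options, assuming rows are dictionaries and missing values are `None` (the Machine Learner's own implementation is not shown here):

```python
from statistics import mean


def impute_mean(rows, column):
    """Return new rows where None in `column` is replaced by the observed mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    if not observed:
        raise ValueError(f"column '{column}' has no observed values")
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def discard_missing(rows, column):
    """Return only the rows that have a value in `column`."""
    return [r for r in rows if r[column] is not None]
```

Imputation keeps every row but can shrink the feature's variance; discarding keeps the data honest but loses rows.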


Analysis with ML Algorithm


• Supports deep learning

• Supports supervised and unsupervised learning.

• Includes algorithms for numerical prediction, classification and clustering.

• Supports an anomaly detection algorithm.

• Supports recommendations with the Collaborative Filtering recommendation algorithm


Analysis with ML Algorithm


• Includes algorithms for numerical prediction, classification and clustering.

Numerical prediction: Linear Regression, Ridge Regression, Lasso Regression

Classification: Logistic Regression, Naive Bayes, Decision Tree, Random Forest and Support Vector Machines

Clustering: K-Means


Model Evaluation & Comparison


• Evaluate generated models based on metrics: accuracy, area under the ROC curve, confusion matrix, predicted vs. actual graphs, and feature importance

• Compare models generated from different analyses.

• Set the fraction of data used for training
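Accuracy, the confusion matrix and the training-fraction setting can be sketched in a few lines of Python. This illustrates the definitions, not the Machine Learner's implementation; the fixed shuffle seed is an assumption added for reproducibility.

```python
import random


def train_test_split(rows, train_fraction, seed=42):
    """Shuffle deterministically, then split off `train_fraction` of the rows for training."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(actual, predicted, labels):
    """matrix[i][j] counts rows whose actual label is labels[i] and prediction is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total
```

The confusion matrix shows which classes are confused with each other, which a single accuracy figure hides.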


Development Tools

• SiddhiTryIt

• Query Editor

• Query verification

• Wizard-like support to create an execution plan

• Event flow viewer

• Event tracer

• Event Simulator


Learning the language


Editing Execution Plans


Testing Execution Plans

• Events can be sent individually or by reading a CSV file.


Activating Statistics and Tracing

• Statistics and tracing can be activated individually for:

• Execution Plans

• Event receivers

• Event publishers


Event Flow Tracing


Event Flow Representation


Data Connectors


Dynamic Query Behavior

• Developers can create dynamic queries using template support.

• Templates can be deployed from the Execution Manager by authorized personnel.
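Template-based queries can be pictured as a Siddhi query with placeholders that are filled in when the template is deployed. The Python sketch below uses `string.Template` only to illustrate the idea; the placeholder names and the output stream are hypothetical and do not reflect the Execution Manager's template format.

```python
from string import Template

# Hypothetical template; placeholder names are illustrative only.
FILTER_TEMPLATE = Template(
    "from SoftDrinkSales[region == '$region' and quantity > $min_quantity]\n"
    "select brand, price, quantity\n"
    "insert into $output_stream;"
)


def render_query(params: dict) -> str:
    """Fill every placeholder; Template.substitute raises KeyError if one is missing."""
    return FILTER_TEMPLATE.substitute(params)
```

Because parameters are pasted into query text, values entered by users should be validated (for example, that `min_quantity` is an integer) before rendering.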


Snippets support & Code Completion


Error Markers & Suggestions


Communication


Realtime Dashboard

• Visualization of the event stream flow in CEP


Execution Manager Dashboard

• Easy-to-use UI to configure predefined real-time analyses


Communicate: Alerts

• Detecting conditions can be done via CEP queries

• Key is the “Last Mile”

• Email

• SMS

• Push notifications to a UI

• Pager

• Trigger physical Alarm

• How?

• Select the Email sender “output adaptor” in DAS (real-time profile), or send events from DAS (real-time profile) to the ESB, which has many connectors


Communicate: APIs

• With mobile apps, most data is exposed and shared with end users as APIs (REST/JSON).

• Analytics results need to be exposed as APIs.

• Some of the challenges:

• Security and permissions

• API discovery

• Billing, throttling, quotas & SLA

• How?

• Write data to a database from DAS (Realtime profile) event tables

• Build a service with WSO2 Data Services

• Expose it as an API via WSO2 API Manager


Securing WSO2 DAS

• User Management

• Users are managed through the administration console. Administrators can create specific groups and assign them to new/existing users. Users and groups can be stored in LDAP, Active Directory, a database or any custom user store.

• Permissions are assigned to users to access all or part of the DAS artifacts, either via the admin console or via APIs. For example, a user could have the right to use the simulation tools and view statistics, but not be able to deploy applications.

• Auditing

• All actions performed in the admin console or via the CLI can be written to an external audit log.


Securing WSO2 DAS

• Event Transmission

• HTTP-based, TCP-based, JMS and binary transports support encryption (TLS and SSL) at both source and sink level. Receivers can be configured so that they only accept secure connections.


Scaling & High Availability (HA)


Fully Distributed Deployment


Minimum HA Deployment


Scalability on WSO2 CEP & Apache Storm


WSO2 Machine Learner: Deployment Model


Solutions

• Pre-built solutions by third parties:

• Apache Eagle: an open source monitoring solution, contributed by eBay Inc., to instantly identify access to sensitive data, recognize attacks and malicious activities in Hadoop, and take action in real time.

• OpenMRS: an open source project used to manage electronic health records.

• Pre-built solutions from us:

• Fraud Detection solution, focused on credit card fraud

• GeoDashboard solution

• Auto-scaling manager for Apache Stratos

• Throttling manager for API Management


Use Cases


Fraud Detection


Customers can:

• Use or change the generic rules we provide, and add as many rules as they like

• Change weights of Fraud Scoring Model to suit their business needs

• Use the Markov Modelling and Clustering capabilities to learn unknown Fraud Patterns in their domain

• Use the dashboard provided or plug the Fraud Detection Toolkit to their own Fraud Detection UI

http://wso2.com/library/webinars/2015/02/catch-them-in-the-act-fraud-detection-with-wso2-cep-and-wso2-bam/
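The slide says customers can change the weights of the fraud scoring model. Assuming a simple weighted sum of triggered rules (the actual model is not described here), adjusting a weight shifts which events cross the alert threshold. The rule names, weights and threshold below are hypothetical.

```python
# Hypothetical rule names and weights; the toolkit's actual rules are not listed in the source.
WEIGHTS = {"high_amount": 0.4, "new_merchant_country": 0.35, "rapid_transactions": 0.25}


def fraud_score(triggered_rules, weights=WEIGHTS):
    """Weighted sum of the rules an event triggered (in [0, 1] when weights sum to 1)."""
    return sum(weights[rule] for rule in triggered_rules)


def is_suspicious(triggered_rules, threshold=0.5, weights=WEIGHTS):
    """Flag the event when its score reaches the threshold."""
    return fraud_score(triggered_rules, weights) >= threshold
```

For instance, `is_suspicious({"rapid_transactions"})` is false with the default weights, but becomes true if a business raises that rule's weight to 0.6.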


Fleet Management

• Updating the locations in real time and showing the route a device has travelled

• Showing visual indicators to represent status and alerts

• Displaying and plotting useful information, such as location, speed, etc.


http://wso2.com/library/articles/2015/01/article-geo-spatial-data-analysis-using-wso2-complex-event-processor-0/


Football Game Analysis

• Measures each player’s running speed and calculates how long they spent in different speed ranges

• Calculates the duration each player kept the ball in their possession throughout the match

• Detects hits on the ball and detects goals

• Derives the duration each player has spent in a given position

http://www.slideshare.net/hemapani/analyzing-a-soccer-game-with-wso2-cep
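Time spent in each speed range can be computed by assuming each speed reading holds until the next one and adding that interval to the matching range. The Python sketch below does this for one player; the range boundaries (km/h) and labels are illustrative assumptions, not necessarily those used in the referenced analysis.

```python
from bisect import bisect_right

# Hypothetical lower bounds (km/h) of every range after the first, and the range labels.
BOUNDARIES = [1, 11, 14, 17, 24]
LABELS = ["stop", "trot", "low", "medium", "high", "sprint"]


def time_in_speed_ranges(samples):
    """samples: (time_s, speed_kmh) pairs sorted by time. Each speed is assumed
    to hold until the next sample; returns seconds spent in each range."""
    totals = dict.fromkeys(LABELS, 0.0)
    for (t0, speed), (t1, _) in zip(samples, samples[1:]):
        label = LABELS[bisect_right(BOUNDARIES, speed)]
        totals[label] += t1 - t0
    return totals
```

The last sample contributes no time because there is no following reading to bound its interval.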
