BIG DATA ANALYTICS
ROME, 17-10-2013
BIG DATA FOUNDATIONS
2
“Big Data” is #1 on the 2012 and 2013 lists of most ambiguous terms (Global Language Monitor)
BIG DATA FOUNDATIONS
3
■ “Big Data” refers to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.
VOLUME
■ Volume: Big Data Analytics processes a very large number of records
■ The “size” of “Big” Data is constantly moving:
• From terabytes to petabytes of data
• From 100 million to billions of rows (and growing..)
VARIETY
■ Variety: public, social media, commercial, operational, and enterprise dark data
VELOCITY
■ Velocity: not real-time (nor near-real-time) processing; typically batch processing
BIG DATA FOUNDATIONS
4
BIG DATA KILLED DWH?
5
BIG DATA TECHNOLOGY LANDSCAPE
6
Hadoop
Appliances
In memory
NoSQL Column Oriented
BIG DATA FOUNDATIONS
7
In Database | In Memory | MapReduce
■ The general solution to the big data problem is massively parallel processing on distributed hardware (for both storage and processing), possibly using “commodity” hardware
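A minimal sketch of this partition-and-parallelize pattern, using Python's standard multiprocessing module (the word-count task and all names here are illustrative, not from the deck):

```python
from collections import Counter
from multiprocessing import Pool

def map_count(lines):
    # "Map" step: each worker counts the words in its own partition,
    # independently of the others.
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def word_count(lines, workers=4):
    # Partition the input, run the map step in parallel, then "reduce"
    # by merging the per-partition counters into one result.
    chunks = [lines[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        partials = pool.map(map_count, chunks)
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

For example, `word_count(["a b a", "b c"], workers=2)` merges the two partition counters into `Counter({'a': 2, 'b': 2, 'c': 1})`. On a real cluster the partitions live on separate machines, but the map-then-merge structure is the same.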
BIG DATA FOUNDATIONS
8
■ Performing Big Data Analytics involves manipulating very large datasets; as explained above, these datasets need to be partitioned, distributed, and stored with care
■ Moving such large datasets into main memory for elaboration would be difficult, so the idea is to move part of the computation into the database itself. Dimensions are first reduced there, and then the data can be moved to memory for further elaboration
■ This is a challenge both for Business Analytics/BI, which mainly needs to aggregate the data, and for Advanced Analytics, which needs to perform more complicated algorithms such as regression, clustering, and time-series algorithms
■ Moving computation to the database + nodes model = algorithms performed on a subset
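The two-step idea above can be sketched with an in-memory SQLite database standing in for the analytical store (the table and data are hypothetical; any SQL engine with GROUP BY would serve):

```python
import sqlite3

# An in-memory SQLite database stands in for the analytical database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10.0), ("north", 5.0), ("south", 7.5)])

# Step 1: aggregate inside the database -- only the reduced result
# (one row per region) travels to application memory.
rows = conn.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region"
).fetchall()

# Step 2: further elaboration in memory, on the small aggregate only.
averages = {region: total / n for region, total, n in rows}
# averages == {'north': 7.5, 'south': 7.5}
```

The raw rows never leave the database; only the per-region aggregates do, which is exactly the dimension reduction the slide describes.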
In Database | In Memory | MapReduce
BIG DATA FOUNDATIONS
9
■ In-memory processing speeds up BI by reducing or removing the need for disk input/output (I/O)
■ The key benefit is fast response times: the ability to return queries and deliver analysis to BI users much more rapidly
■ For most organizations, in-memory processing reduces, but does not eliminate, the need to create aggregates or summaries in advance
■ In-memory BI's "sweet spot" lies in powering interactive visualizations of large multidimensional datasets. It is less important for reporting use cases, where interactivity is less intensive and traditional performance-improvement techniques are applicable
In Database | In Memory | MapReduce
NEED FOR SPEED POWERS IN-MEMORY BUSINESS INTELLIGENCE, JAMES RICHARDSON
BIG DATA FOUNDATIONS
10
■ One of the bottlenecks in processing large datasets is the need to store all data in memory, which limits users to datasets that fit within the memory limit. To avoid this, the natural approach is to split statistical algorithms into two steps
■ In the first step, data processing is performed in the database or on flat text files, resulting in pre-computed data aggregates. In the second step, these aggregates are imported into the analytical engine, where the rest of the analysis is performed. Such data aggregates are called sufficient statistics, because they contain all the information necessary to compute parameter estimates, test statistics, confidence intervals, and model summaries, while being much smaller than the original dataset
MASSIVELY PARALLEL ANALYTICS FOR LARGE DATASETS, PRZEMYSLAW BIECEK, PAWEL CHUDZIAN, CEZARY DENDEK, JUSTIN LINDSEY
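For simple linear regression the sufficient statistics are just five numbers per partition, so the two-step scheme can be sketched in a few lines of plain Python (the data is a made-up exact line, purely for illustration):

```python
def sufficient_stats(xs, ys):
    # First step (per partition, e.g. in-database): reduce the partition
    # to (n, sum x, sum y, sum x^2, sum x*y) -- all that simple linear
    # regression ever needs, regardless of how many rows were scanned.
    n = len(xs)
    return (n, sum(xs), sum(ys),
            sum(x * x for x in xs),
            sum(x * y for x, y in zip(xs, ys)))

def fit_from_stats(stats):
    # Second step (in the analytical engine): merge partition aggregates
    # and solve for intercept a and slope b of y = a + b*x.
    n = sum(s[0] for s in stats)
    sx = sum(s[1] for s in stats)
    sy = sum(s[2] for s in stats)
    sxx = sum(s[3] for s in stats)
    sxy = sum(s[4] for s in stats)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Exact line y = 2 + 3*x, split across two "partitions".
xs, ys = [1, 2, 3, 4], [5, 8, 11, 14]
parts = [sufficient_stats(xs[:2], ys[:2]),
         sufficient_stats(xs[2:], ys[2:])]
a, b = fit_from_stats(parts)  # a == 2.0, b == 3.0
```

Each partition ships back five numbers instead of its rows, and merging aggregates is just addition, which is why this decomposition parallelizes so well.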
In Database | In Memory | MapReduce
11
I4C BIG DATA PROPOSITION
Advanced Analytics
12
I4C BIG DATA PROPOSITION
BUSINESS NEED                       CAPABILITY   TECHNOLOGY
Big relational                      Appliance    Netezza, SAP Hana
Real-time analytics                 In memory    Kognition, ParDB, VoltDB
Semistructured, unstructured data   NoSQL        Hadoop, Cassandra, Mongo
ACE LOGICAL SCHEMA
13
POLYGLOT PERSISTENCE
15
■ One size does NOT fit all, because of the diversity of functional and technical requirements in enterprise applications. The factors driving innovation in the data persistence space are:
• Data Volume
• Scalability
• High availability
• Fault tolerance
• Distributability
• Flexibility (i.e. "schemaless" databases)
■ Polyglot Persistence is all about choosing the right persistence option for each analytical task
POLYGLOT PERSISTENCE, MARTIN FOWLER, SCOTT LEBERKNIGHT
CONNECTORS
16
■ The i4C proposition for polyglot persistence on Big Data databases is based on connectors
■ Connectors adapt ACE's general-purpose analytic features to specific persistence stores, leveraging each store's own technical capabilities
■ ACE Connectors:
• RDBMS (in-database analytics):
− Oracle Database
− IBM DB2
• Appliances (parallel in-database and in-memory analytics):
− IBM PureData (formerly Netezza)
− SAP HANA DB
• Hadoop (massively parallel MapReduce)
• An ACE-internal NoSQL in-memory storage (high scalability and in-memory analytics)
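The connector idea can be sketched as a common analytic interface with per-store implementations. Everything below is hypothetical: the class names, the `aggregate` operation, and the in-memory backend are illustrations of the pattern, not the actual ACE API:

```python
from abc import ABC, abstractmethod

class Connector(ABC):
    # Hypothetical interface: every connector exposes the same analytic
    # operation, however the underlying store implements it.
    @abstractmethod
    def aggregate(self, records, key, value):
        ...

class InMemoryConnector(Connector):
    def aggregate(self, records, key, value):
        # A plain dict stands in for an in-memory store; a Hadoop or
        # RDBMS connector would push this work down to its engine.
        out = {}
        for r in records:
            out[r[key]] = out.get(r[key], 0) + r[value]
        return out

def run_aggregation(connector, records):
    # The analytic layer talks only to the abstract interface, so the
    # persistence store can be swapped without touching this code.
    return connector.aggregate(records, key="store", value="amount")

records = [{"store": "A", "amount": 10},
           {"store": "A", "amount": 5},
           {"store": "B", "amount": 2}]
totals = run_aggregation(InMemoryConnector(), records)
# totals == {"A": 15, "B": 2}
```

The design choice shown is the usual one for polyglot persistence: one abstract contract, many store-specific implementations that each exploit their own engine's strengths.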
ACE LOGICAL SCHEMA
17
BIG DATA ADVANCED ANALYTICS PARADIGM
18
BFSI BUSINESS QUESTION
• Total Amount of Money in
Accounts?
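The slide's summation tree (partial totals combined across nodes) can be sketched in a few lines; the partition layout and balances are made up for illustration:

```python
def node_total(accounts):
    # Leaf step: each data node sums only the account balances it
    # stores locally -- no raw rows leave the node.
    return sum(accounts)

def total_money(partitions):
    # Combine step: the partial totals flow up the tree and are added
    # together to answer "total amount of money in accounts".
    return sum(node_total(p) for p in partitions)

partitions = [[100.0, 250.0], [75.0], [20.0, 5.0]]
print(total_money(partitions))  # 450.0
```

Because addition is associative, the partial sums can be combined in any order and at any depth of the tree, which is what makes this question trivially parallelizable.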
BIG DATA ADVANCED ANALYTICS PARADIGM
19
[Diagram: BFSI nodes each running LIN REG (linear regression) in parallel]
• Regression: money / customer age
• Linear model
• Library of distributed exact algorithms
• Approximations
ADVANCED ANALYTICS
BIG DATA ADVANCED ANALYTICS PARADIGM
20
BIG DATA ADVANCED ANALYTICS
ACE LOGICAL SCHEMA
21
BIG DATA ANALYTICS
22
USE CASES
OPERATIONS | RISK | FEEDBACK | MARKETING
PRICING
SOLVENCYTARGETING
PROPENSITY
NEXT BEST
CUSTOMER VALUE
LOYALTY
CREDIT COLLECTION
FRAUD DETECTION
DEMAND FORECAST
CC FORECASTING
IT PLANNING
CUSTOMER SATISFACTION
COMPLAINTS MGMT
SENTIMENT ANALYSIS
PREDICTIVE MAINT.
REVENUE FORECASTING
AUTOGRILL
25
Autogrill is the world’s leading provider of food &
beverage and retail services for travellers. Autogrill serves people
on the move and operates primarily under concession agreements.
The Group operates mainly in airports and motorways, followed by
railway stations and a selective presence in high street, shopping
centres, trade fairs, museums, and other cultural facilities.
■ Increasing shrink rate, hard to manage
■ Lack of an efficient loss-prevention process
■ Need to promptly detect frauds and take action
■ Understanding new fraud patterns
■ Managing fraud in a multi-country group
CUSTOMER PROFILE
CHALLENGES
AUTOGRILL
26
■ i4C APP Fraud Detection for Retail
■ Ability to work at the single-transaction row from any point of view: store, cashier
■ Ability to link actions to high-risk transactions, with user-driven and automatic alerting
■ Using both business knowledge and predictive analytics to evaluate every single transaction
■ Easy-to-read indicators, including analytical insights
i4C APPLICATION
KEY SUCCESS FACTORS
■ Leverage business knowledge and find new anomalous behaviours
■ Detect frauds starting from a 360° view of stores and cashiers
■ Introduce analytics into the loss-prevention process to gain insights
GOAL
AMADORI
27
Amadori is one of the European leaders in the production and trade of meat products; in particular, it holds a 30% share of the total poultry-meat market. An innovative Italian food company and a reference point for meat-based dishes. Turnover in 2011 was over 1.2 billion euros.
Listen to and analyze consumer conversations about the Amadori brand, products, and market, in order to define marketing and communication strategies.
CUSTOMER PROFILE
CHALLENGES
AMADORI
28
■ Analysis of conversations related to food
■ Segmentation based on consumer sentiment towards hot topics, interests, and shopping habits
■ Root-cause analysis of the brand strengths and weaknesses perceived by the market
■ Launch of initiatives aimed at food bloggers and a young target audience
■ Review and improvement of Corporate Social Responsibility
■ Improved online presence to respond to consumer habits
■ Development and launch of new products based on consumers' hot topics and desires
i4C APPLICATION
KEY SUCCESS FACTORS
■ Detect news, hot topics, and viral phenomena that drive the evolution of tastes, behaviour, and consumption habits
■ Profile consumers in the food vertical
■ Analyze reputation & brand awareness of the brand, products, and people within the company
GOAL
CREDEM
29
Credem is a major private Italian bank, but behind its
modern appearance lies a century-long history. Founded in 1910,
it took on its present name in 1983, and today CREDEM can be
found in 16 regions of Italy, offering a coverage achieved through a
mixture of acquisitions and new branches. The bank focuses
considerable attention on innovative channels, offering advanced
remote banking systems to meet the needs for transaction speed
and security. Credem combines technological innovation with a
completely customer-centric approach to banking.
Having accurate customer information allows modern
banking to anticipate customers’ needs.
CUSTOMER PROFILE
CHALLENGES
CREDEM
30
Simone Parrotto, CRM at CREDEM: “We have built the logic and models we use to offer our
customers products (complementary to their existing portfolios) based on their history
and contact channel. With i4C Analytics, we have found a partner with both knowledge of
the finance market and in-depth advanced analytics expertise. The application is based on
customer segmentation and propensity models which enhance the information stored in
the DWH.“
This activity has already resulted in tangible benefits, namely increased revenue per
campaign alongside lower costs. Mr Parrotto summed up: “i4C Analytics offers an
analytic journey, starting with the adoption of innovative tools, moving along a pathway
of gradually increased automation, and ending with a highly effective embedding of these
tools into the bank’s processes”.
i4C APPLICATION
KEY SUCCESS FACTORS
■ Carrying out prospecting that brings in high-potential customers
■ Developing the existing customer base
■ Stopping valued customers going elsewhere
GOAL