BIG DATA ANALYTICS
ROME, 17-10-2013
BIG DATA FOUNDATIONS
2
“Big Data” is #1 on the 2012 and 2013 lists of most ambiguous terms (Global Language Monitor)
BIG DATA FOUNDATIONS
3
■ “Big Data” refers to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.
VOLUME
■ Volume: Big Data Analytics processes a very large number of records
■ The “size” of “Big” Data is constantly moving:
• From terabytes to petabytes of data
• From 100 million to billions of rows (and growing..)
VARIETY
■ Variety: public, social media, commercial, operational, and enterprise dark data
VELOCITY
■ Velocity: not real-time (nor near-real-time) processing; typically batch processing
BIG DATA FOUNDATIONS
4
BIG DATA KILLED DWH?
5
BIG DATA TECHNOLOGY LANDSCAPE
6
Hadoop
Appliances
In memory
NoSQL Column Oriented
BIG DATA FOUNDATIONS
7
In Database | In Memory | MapReduce
■ The general solution to the big data problem is massively parallel processing on distributed hardware (for both storage and processing), possibly using “commodity” hardware
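A minimal sketch of this partition-and-parallelize pattern, using Python's standard multiprocessing module (the word-count task and all names here are illustrative, not from the deck):

```python
from collections import Counter
from multiprocessing import Pool

def map_count(lines):
    # "Map" step: each worker counts the words in its own partition,
    # independently of the others.
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def word_count(lines, workers=4):
    # Partition the input, run the map step in parallel, then "reduce"
    # by merging the per-partition counters into one result.
    chunks = [lines[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        partials = pool.map(map_count, chunks)
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

For example, `word_count(["a b a", "b c"], workers=2)` merges the two partition counters into `Counter({'a': 2, 'b': 2, 'c': 1})`. On a real cluster the partitions live on separate machines, but the map-then-merge structure is the same.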
BIG DATA FOUNDATIONS
8
■ Performing Big Data Analytics involves manipulating very large datasets; as explained above, these datasets need to be partitioned, distributed, and stored with care
■ Moving such large datasets into main memory for elaboration would be difficult, so the idea is to move part of the computation into the database itself. Dimensions are first reduced there, and then the data can be moved to memory for further elaboration
■ This is a challenge both for Business Analytics/BI, which mainly needs to aggregate the data, and for Advanced Analytics, which needs to perform more complicated algorithms such as regression, clustering, and time-series algorithms
■ Moving computation to the database + nodes model = algorithms performed on a subset
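The two-step idea above can be sketched with an in-memory SQLite database standing in for the analytical store (the table and data are hypothetical; any SQL engine with GROUP BY would serve):

```python
import sqlite3

# An in-memory SQLite database stands in for the analytical database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10.0), ("north", 5.0), ("south", 7.5)])

# Step 1: aggregate inside the database -- only the reduced result
# (one row per region) travels to application memory.
rows = conn.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region"
).fetchall()

# Step 2: further elaboration in memory, on the small aggregate only.
averages = {region: total / n for region, total, n in rows}
# averages == {'north': 7.5, 'south': 7.5}
```

The raw rows never leave the database; only the per-region aggregates do, which is exactly the dimension reduction the slide describes.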
In Database | In Memory | MapReduce
BIG DATA FOUNDATIONS
9
■ In-memory processing speeds up BI by reducing or removing the need for disk input/output (I/O)
■ The key benefit is fast response times: the ability to return queries and deliver analysis to BI users much more rapidly
■ For most organizations, in-memory processing reduces, but does not eliminate, the need to create aggregates or summaries in advance
■ In-memory BI's "sweet spot" lies in powering interactive visualizations of large multidimensional datasets. It is less important for reporting use cases, where interactivity is less intensive and traditional performance-improvement techniques are applicable
In Database | In Memory | MapReduce
NEED FOR SPEED POWERS IN-MEMORY BUSINESS INTELLIGENCE, JAMES RICHARDSON
BIG DATA FOUNDATIONS
10
■ One of the bottlenecks in processing large datasets is the need to store all data in memory, which limits users to datasets that fit within the memory limit. To avoid this, the natural approach is to split statistical algorithms into two steps
■ In the first step, data processing is performed in the database or on flat text files, resulting in pre-computed data aggregates. In the second step, these aggregates are imported into the analytical engine, where the rest of the analysis is performed. Such data aggregates are called sufficient statistics, because they contain all the information necessary to compute parameter estimates, test statistics, confidence intervals, and model summaries, while being much smaller than the original dataset
MASSIVELY PARALLEL ANALYTICS FOR LARGE DATASETS, PRZEMYSLAW BIECEK, PAWEL CHUDZIAN, CEZARY DENDEK, JUSTIN LINDSEY
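For simple linear regression the sufficient statistics are just five numbers per partition, so the two-step scheme can be sketched in a few lines of plain Python (the data is a made-up exact line, purely for illustration):

```python
def sufficient_stats(xs, ys):
    # First step (per partition, e.g. in-database): reduce the partition
    # to (n, sum x, sum y, sum x^2, sum x*y) -- all that simple linear
    # regression ever needs, regardless of how many rows were scanned.
    n = len(xs)
    return (n, sum(xs), sum(ys),
            sum(x * x for x in xs),
            sum(x * y for x, y in zip(xs, ys)))

def fit_from_stats(stats):
    # Second step (in the analytical engine): merge partition aggregates
    # and solve for intercept a and slope b of y = a + b*x.
    n = sum(s[0] for s in stats)
    sx = sum(s[1] for s in stats)
    sy = sum(s[2] for s in stats)
    sxx = sum(s[3] for s in stats)
    sxy = sum(s[4] for s in stats)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Exact line y = 2 + 3*x, split across two "partitions".
xs, ys = [1, 2, 3, 4], [5, 8, 11, 14]
parts = [sufficient_stats(xs[:2], ys[:2]),
         sufficient_stats(xs[2:], ys[2:])]
a, b = fit_from_stats(parts)  # a == 2.0, b == 3.0
```

Each partition ships back five numbers instead of its rows, and merging aggregates is just addition, which is why this decomposition parallelizes so well.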
In Database | In Memory | MapReduce
11
I4C BIG DATA PROPOSITION
Advanced Analytics
12
I4C BIG DATA PROPOSITION
BUSINESS NEED                       CAPABILITY   TECHNOLOGY
Big relational                      Appliance    Netezza, SAP Hana
Real-time analytics                 In memory    Kognition, ParDB, VoltDB
Semistructured, unstructured data   NoSQL        Hadoop, Cassandra, Mongo
ACE LOGICAL SCHEMA
13
POLYGLOT PERSISTENCE
15
■ One size does NOT fit all, because of the diversity of functional and technical requirements in enterprise applications. The factors driving innovation in the data persistence space are:
• Data Volume
• Scalability
• High availability
• Fault tolerance
• Distributability
• Flexibility (i.e. "schemaless" databases)
■ Polyglot Persistence is all about choosing the right persistence option for each analytical task
POLYGLOT PERSISTENCE, MARTIN FOWLER, SCOTT LEBERKNIGHT
CONNECTORS
16
■ The i4C proposition for polyglot persistence on Big Data databases is based on connectors
■ Connectors adapt ACE's general-purpose analytic features to specific persistence stores, leveraging each store's own technical capabilities
■ ACE Connectors:
• RDBMS (in-database analytics):
− Oracle Database
− IBM DB2
• Appliances (parallel in-database and in-memory analytics):
− IBM PureData (formerly Netezza)
− SAP HANA DB
• Hadoop (massively parallel MapReduce)
• An ACE-internal NoSQL in-memory storage (high scalability and in-memory analytics)
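The connector idea can be sketched as a common analytic interface with per-store implementations. Everything below is hypothetical: the class names, the `aggregate` operation, and the in-memory backend are illustrations of the pattern, not the actual ACE API:

```python
from abc import ABC, abstractmethod

class Connector(ABC):
    # Hypothetical interface: every connector exposes the same analytic
    # operation, however the underlying store implements it.
    @abstractmethod
    def aggregate(self, records, key, value):
        ...

class InMemoryConnector(Connector):
    def aggregate(self, records, key, value):
        # A plain dict stands in for an in-memory store; a Hadoop or
        # RDBMS connector would push this work down to its engine.
        out = {}
        for r in records:
            out[r[key]] = out.get(r[key], 0) + r[value]
        return out

def run_aggregation(connector, records):
    # The analytic layer talks only to the abstract interface, so the
    # persistence store can be swapped without touching this code.
    return connector.aggregate(records, key="store", value="amount")

records = [{"store": "A", "amount": 10},
           {"store": "A", "amount": 5},
           {"store": "B", "amount": 2}]
totals = run_aggregation(InMemoryConnector(), records)
# totals == {"A": 15, "B": 2}
```

The design choice shown is the usual one for polyglot persistence: one abstract contract, many store-specific implementations that each exploit their own engine's strengths.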
ACE LOGICAL SCHEMA
17
BIG DATA ADVANCED ANALYTICS PARADIGM
18
BFSI BUSINESS QUESTION
• Total Amount of Money in
Accounts?
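The slide's summation tree (partial totals combined across nodes) can be sketched in a few lines; the partition layout and balances are made up for illustration:

```python
def node_total(accounts):
    # Leaf step: each data node sums only the account balances it
    # stores locally -- no raw rows leave the node.
    return sum(accounts)

def total_money(partitions):
    # Combine step: the partial totals flow up the tree and are added
    # together to answer "total amount of money in accounts".
    return sum(node_total(p) for p in partitions)

partitions = [[100.0, 250.0], [75.0], [20.0, 5.0]]
print(total_money(partitions))  # 450.0
```

Because addition is associative, the partial sums can be combined in any order and at any depth of the tree, which is what makes this question trivially parallelizable.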
BIG DATA ADVANCED ANALYTICS PARADIGM
19
[Diagram: BFSI nodes each running LIN REG (linear regression) in parallel]
• Regression: money / customer age
• Linear model
• Library of distributed exact algorithms
• Approximations
ADVANCED ANALYTICS
BIG DATA ADVANCED ANALYTICS PARADIGM
20
BIG DATA ADVANCED ANALYTICS
ACE LOGICAL SCHEMA
21
BIG DATA ANALYTICS
22
USE CASES
OPERATIONS | RISK | FEEDBACK | MARKETING
PRICING
SOLVENCYTARGETING
PROPENSITY
NEXT BEST
CUSTOMER VALUE
LOYALTY
CREDIT COLLECTION
FRAUD DETECTION
DEMAND FORECAST
CC FORECASTING
IT PLANNING
CUSTOMER SATISFACTION
COMPLAINTS MGMT
SENTIMENT ANALYSIS
PREDICTIVE MAINT.
REVENUE FORECASTING
AUTOGRILL
25
Autogrill is the world’s leading provider of food &
beverage and retail services for travellers. Autogrill serves people
on the move and operates primarily under concession agreements.
The Group operates mainly in airports and motorways, followed by
railway stations and a selective presence in high street, shopping
centres, trade fairs, museums, and other cultural facilities.
■ Increasing shrink rate, hard to manage
■ Lack of an efficient loss-prevention process
■ Need to promptly detect frauds and take action
■ Understanding new fraud patterns
■ Managing fraud in a multi-country group
CUSTOMER PROFILE
CHALLENGES
AUTOGRILL
26
■ i4C APP Fraud Detection for Retail
■ Ability to work at the single-transaction row from any point of view: store, cashier
■ Ability to link actions to high-risk transactions, with user-driven and automatic alerting
■ Using both business knowledge and predictive analytics to evaluate every single transaction
■ Easy-to-read indicators, including analytical insights
i4C APPLICATION
KEY SUCCESS FACTORS
■ Leverage business knowledge and find new anomalous behaviours
■ Detect frauds starting from a 360° view of stores and cashiers
■ Introduce analytics into the loss-prevention process to gain insights
GOAL
AMADORI
27
Amadori is one of the European leaders in the production and trade of meat products; in particular, it holds a 30% share of the total poultry-meat market. An innovative Italian food company and a reference point for meat-based dishes. Turnover in 2011 was over 1.2 billion euros.
Listen to and analyze consumer conversations about the Amadori brand, products, and market, in order to define marketing and communication strategies.
CUSTOMER PROFILE
CHALLENGES
AMADORI
28
■ Analysis of conversations related to food
■ Segmentation based on consumer sentiment towards hot topics, interests, and shopping habits
■ Root-cause analysis of the brand strengths and weaknesses perceived by the market
■ Launch of initiatives aimed at food bloggers and a young target audience
■ Review and improvement of Corporate Social Responsibility
■ Improved online presence to respond to consumer habits
■ Development and launch of new products based on consumers' hot topics and desires
i4C APPLICATION
KEY SUCCESS FACTORS
■ Detect news, hot topics, and viral phenomena that drive the evolution of tastes, behaviour, and consumption habits
■ Profile consumers in the food vertical
■ Analyze reputation & brand awareness of the brand, products, and people within the company
GOAL
CREDEM
29
Credem is a major private Italian bank, but behind its
modern appearance lies a century-long history. Founded in 1910,
it took on its present name in 1983, and today CREDEM can be
found in 16 regions of Italy, offering a coverage achieved through a
mixture of acquisitions and new branches. The bank focuses
considerable attention on innovative channels, offering advanced
remote banking systems to meet the needs for transaction speed
and security. Credem combines technological innovation with a
completely customer-centric approach to banking.
Having accurate customer information allows modern
banking to anticipate customers’ needs.
CUSTOMER PROFILE
CHALLENGES
CREDEM
30
Simone Parrotto, CRM at CREDEM: “We have built the logic and models we use to offer our
customers products (complementary to their existing portfolios) based on their history
and contact channel. With i4C Analytics, we have found a partner with both knowledge of
the finance market and in-depth advanced analytics expertise. The application is based on
customer segmentation and propensity models which enhance the information stored in
the DWH.“
This activity has already resulted in tangible benefits, namely increased revenue per
campaign alongside lower costs. Mr Parrotto summed up: “i4C Analytics offers an
analytic journey, starting with the adoption of innovative tools, moving along a pathway
of gradually increased automation, and ending with a highly effective embedding of these
tools into the bank’s processes”.
i4C APPLICATION
KEY SUCCESS FACTORS
■ Carrying out prospecting that brings in high-potential customers
■ Developing the existing customer base
■ Stopping valued customers going elsewhere
GOAL