Session content from IBM Information On Demand 2013 provides an overview of the IBM Business Intelligence Pattern with BLU Acceleration and explains the underlying technology employed to deliver high speed analysis more quickly and easily than ever before.
Analytics Patterns of Expertise -- the Fast Path to Amazing Solutions
Session Number BBI-3423
Rachel Bland, IBM
Trent Gray-Donald, IBM
Neeraj Sharma, IBM
© 2013 IBM Corporation
Please note
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Agenda
Market Problem Today
New Markets/Opportunities Possible
What is the “IBM Business Intelligence Pattern with BLU Acceleration”?
Performance Overview
Architecture
Evolving Business Requirements Challenge the Status Quo
• Big Data: exploding volumes, exponential demand
• Self Service: increasingly independent knowledge workers
• Business Analytics: recognizing the power of knowledge
• Integrated Systems: lead-times for hardware & software platforms
Interactive Exploration: Transform Information to Innovation
System response time is directly correlated with users’ propensity to experiment, explore and discover.
Interactive exploration: it’s all about getting more data faster!
[Chart: response time vs. request volume, complexity & concurrency, with zones ranging from Good through Satisfactory and Tolerable to Unacceptable relative to the interactive user expectation]
Data Volume & System Complexity Lead to Risk & Unpredictable TCO
• Environment: a variety of middleware and independent configurations; multi-terabyte data volumes
• Database: DBA database & hardware tuning
• Complexity: many query strategies, which may result in content rewrite
• Performance: complex custom infrastructure means unpredictable time to value; traditional deployment practices yield variable results; multiple approaches require multiple iterations to achieve performance
Expert Integrated Systems: Predictable Time to Value
• Simplified: in-memory columnar acceleration with Dynamic Cubes
• Streamlined: fit-for-purpose performance
• Pattern-encoded deployment: repeatable results
• Simple, streamlined approach: fast path to performance
In-Memory Acceleration & Patterns of Expertise Provide Agility and Predictability
Pattern deployment
Expert Integrated Systems
Pre-configured deployment for predictable, high performance analytics solution delivery
IBM Business Intelligence Pattern with BLU Acceleration
Fast on fast: tailored for volume, concurrency, and complexity
• Choose a system that learns, grows and keeps getting faster!
• Layers of in-memory acceleration:
• Results caching, at the speed of memory: more use means more results in memory
• Dynamic Cubes: prime the system for the workloads you can predict
• Memory-exploiting columnar database: acceleration for every combination and permutation
• Evolutionary innovation:
• Parallel vector processing: greater query and user concurrency
• Data skipping: less I/O
• Active compression: reduced time spent decompressing data
• Memory-exploiting, not memory-bound! Not all in-memory solutions are created equal: Dynamic Cubes and BLU leverage SSD and HDD to ensure stable, continuous operation.
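The "data skipping" idea above can be shown with a toy sketch. This is illustrative Python, not IBM code; in DB2 with BLU Acceleration the per-block min/max synopsis is maintained automatically, but the principle is the same: a block whose min/max cannot satisfy the predicate is never read.

```python
# Illustrative sketch of data skipping: a synopsis records (min, max)
# per fixed-size block; a range predicate skips blocks it cannot match.

BLOCK_SIZE = 4

def build_synopsis(column):
    """Record (min, max) for each fixed-size block of the column."""
    synopsis = []
    for start in range(0, len(column), BLOCK_SIZE):
        block = column[start:start + BLOCK_SIZE]
        synopsis.append((min(block), max(block)))
    return synopsis

def scan_greater_than(column, synopsis, threshold):
    """Return matching values, skipping blocks the synopsis rules out."""
    matches, blocks_read = [], 0
    for i, (lo, hi) in enumerate(synopsis):
        if hi <= threshold:          # whole block cannot match: no I/O
            continue
        blocks_read += 1
        start = i * BLOCK_SIZE
        matches.extend(v for v in column[start:start + BLOCK_SIZE] if v > threshold)
    return matches, blocks_read

sales = [10, 12, 11, 9,   55, 60, 58, 57,   8, 7, 12, 11]
syn = build_synopsis(sales)
found, read = scan_greater_than(sales, syn, 50)
print(found, read)   # only the middle block is touched
```

With selective predicates over large columnar tables, most blocks are skipped outright, which is where the "less I/O" claim comes from.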
[Chart: 38x average acceleration of database queries for reporting1, covering frequent, expected, and inevitable requests]
1. Based on internal testing comparing DB2 10.1 traditional row store vs. DB2 10.5 with BLU Acceleration. SQL queries for 20 different reports and dashboards were run in isolation against the database to measure database response time. Full report generation time would include data transfer and processing by the BI server. Performance gains will vary by workload and system specifications.
• Low-touch optimization with instrumented self-tuning
• Automated query performance tuning: create objects, schedule & load, auto-mapping to models
• Streamlined workflows
• Built-in data landing zone: import data from anywhere into the in-memory columnar repository
• Simplified administration: integration of data movement scheduling with Cognos Administration
• Built-in expertise
• Memory optimization: programmatic allocation of cores and memory
• Automated management of the data source and Business Intelligence
Rich Pattern-based Deployment for Agility
Request, select, go:
• Pattern-based deployment for agility
• Complete stack: OS, middleware, database, Business Intelligence
• Load data and go!
• Purpose-built integration
• Reduced skill thresholds
• Automated deployment
• Pattern-specific product extensions
• Expert Integrated System support
• Deploy to PureApplication System for fastest time to value

1 person + 1 hour = 1 fully deployed stack: simple economics and agility.
Industry-Specific Use Cases

Industry | Use Case | Solution Attributes
Retail | Household and market-basket analysis | Exploration analysis of billions of rows per month, with millions of customers and product SKUs
Insurance | Claims analysis | In-depth dimensional analysis of millions of customers, policies and itemized claims
Manufacturing & Logistics | Parts supply and location identification | Millions of parts, thousands of locations, hundreds of thousands of processes
Life Science | Large standardized data sets cross-referenced by patient and practitioners | Millions of rows of “aggregator” data cross-referenced by attribute sets
Cross-Industry Use Cases

Use Case | Solution Attributes
Self-service acceleration: pockets of advanced analysts impacting data warehouse performance | Self-contained data acceleration layer; agility of deployment; re-establish connection with Single-Trusted Data
Local telecom limitations require replica infrastructure; data privacy requirements necessitate isolated tenants | Agility and standardization of deployment; self-contained data acceleration layer; support for a hub & spoke approach to distributed IT or replication hosting
Replacement for aging MOLAP infrastructure | Robust OLAP functionality; faster cube load times, larger volumes; synchronized with Single-Trusted Data
New deployments: reduce risk and cost of deployment; reduce the skill and experience threshold to adopt BA | Prescriptive pattern-based deployment, available in general-purpose and specialized varieties; time to value
Cognos Dynamic Cubes: Goals
• Provide a high-performance OLAP solution accessing terabytes of data
• Provide an aggregate-aware solution: routing to database summary/aggregate tables, and routing to in-memory aggregate values
• Provide an aggregate advisor to assist with selection of database/memory aggregates
• Cache data and share it amongst all users
• Provide compelling features: parent/child (recursive) hierarchies, multiple hierarchies per dimension, hidden measures, virtual cubes, relative time, and dimensional (member) security
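The "aggregate aware" goal can be sketched as a routing decision (illustrative Python with hypothetical table names, not the Cognos implementation): a query is sent to a summary table when the query's grouping columns are covered by that aggregate's grain, otherwise it falls back to the detail fact table.

```python
# Illustrative aggregate-aware routing: pick a covering aggregate
# table for a query, preferring the coarsest (smallest) one.

AGGREGATES = {  # hypothetical aggregate tables and their grains
    "sales_by_month_region": {"month", "region"},
    "sales_by_month":        {"month"},
}

def route(group_by_columns):
    """Return the table a query should run against."""
    candidates = [name for name, grain in AGGREGATES.items()
                  if set(group_by_columns) <= grain]
    if candidates:
        # the coarser the grain, the fewer rows to scan
        return min(candidates, key=lambda n: len(AGGREGATES[n]))
    return "sales_fact"   # detail-level fallback

print(route(["month"]))            # sales_by_month
print(route(["month", "region"]))  # sales_by_month_region
print(route(["month", "sku"]))     # sales_fact
```

The aggregate advisor's job, in these terms, is recommending which grains are worth materializing in the database or in memory.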
[Diagram: initial query. The Query Processor (DQM) passes the request through the result set cache to the MDX Engine, which consults the expression, data, aggregate, and member caches of the dynamic cube. On a miss, SQL queries run against the secured data warehouse to obtain member information, fact and summary data, and aggregate data; the aggregate cache is searched for an exact match first.]
[Diagram: subsequent query. The member and aggregate caches are already populated, so the MDX Engine only issues SQL queries to obtain fact and summary data when the result set, data, and aggregate caches (searched for an exact match) cannot answer the request.]
What is BLU Acceleration?
• New, innovative technology for analytic queries:
• Columnar storage
• New run-time engine with vector (aka SIMD) processing, deep multi-core optimizations and cache-aware memory management
• “Active compression”: unique encoding for further storage reduction beyond DB2 10 levels, plus run-time processing without decompression
• “Revolution through evolution”:
• Built directly into the DB2 kernel
• BLU tables can coexist with traditional row tables in the same schema, tablespaces and bufferpools
• Query any combination of BLU and row data
• Memory-optimized (not “in-memory”)
• Value: order-of-magnitude benefits in performance, storage savings and time to value
Slide callouts:
• Deep multi-core optimization means it can run more work at the same time.
• Active compression means analytic queries with filters and calculations don’t wait for data to decompress.
• Memory-optimized is really important: the system keeps running even if it fills up memory, whereas other solutions in the market are “memory-bound”.
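The "run-time processing without decompression" point can be illustrated with a toy dictionary-encoding sketch (illustrative only, not DB2 internals): if codes are assigned in sorted value order, equality and range predicates can be evaluated directly on the small integer codes.

```python
import bisect

# Illustrative active-compression sketch: order-preserving dictionary
# encoding lets a range filter run on codes, never decoding values.

def encode(values):
    """Dictionary-encode a column; sorted dictionary => order-preserving codes."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def count_less_than(dictionary, codes, bound):
    """Evaluate `value < bound` directly on the encoded column."""
    bound_code = bisect.bisect_left(dictionary, bound)
    return sum(1 for c in codes if c < bound_code)   # no decode needed

dictionary, codes = encode(["DE", "US", "FR", "US", "BR", "DE"])
print(count_less_than(dictionary, codes, "FR"))  # counts the BR and DE rows
```

Because the codes are smaller than the original values, more of the column fits in cache, which compounds with the SIMD processing mentioned above.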
How fast is it? Current DB2 10.5 results: an 8x-25x improvement is common.

“It was amazing to see the faster query times compared to the performance results with our row-organized tables. The performance of four of our queries improved by over 100-fold! The best outcome was a query that finished 137x faster by using BLU Acceleration.”
- Kent Collins, Database Solutions Architect, BNSF Railway
Customer Workload | Speedup over DB2 10.1
Analytic ISV | 37.4x
Large European Bank | 21.8x
BI Vendor (Simple) | 124x
BI Vendor (Complex) | 6.1x
Manufacturer | 9.2x
Investment Bank | 36.9x
1. Based on internal testing comparing DB2 10.1 traditional row store vs. DB2 10.5 with BLU Acceleration. SQL queries for 20 different reports and dashboards were run in isolation against the database to measure database response time. Full report generation time would include data transfer and processing by the BI server. Performance gains will vary by workload and system specifications.
Significant Storage Savings with DB2 with BLU Acceleration
• ~2x-3x storage reduction vs. DB2 10.1 adaptive compression (comparing all objects: tables, indexes, etc.)
• New advanced compression techniques
• Fewer storage objects required
DB2 10.5 & Cognos BI Dynamic Cubes
Cube start-up:
• Member cache filled with queries to the data warehouse dimension tables
• Aggregate cache filled with queries to the data warehouse (or database aggregates, if defined)

Report processing: waterfall lookup for data, in descending order, until all data is provided:
1. Result set cache
2. Query data cache
3. Aggregate cache
4. Database aggregate
5. Data warehouse
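The waterfall lookup above can be sketched as a cascade of caches (hypothetical class and layer names, not the Cognos implementation): each layer is tried in order, and a miss falls through to the next, slower layer, with the answer cached on the way back.

```python
# Illustrative waterfall lookup: result set cache -> query data cache
# -> aggregate cache -> database aggregate -> data warehouse.

class WaterfallLookup:
    def __init__(self, warehouse):
        self.layers = [
            ("result set cache", {}),
            ("query data cache", {}),
            ("aggregate cache", {}),
            ("database aggregate", {}),
        ]
        self.warehouse = warehouse   # layer 5: always answers

    def get(self, key):
        for name, cache in self.layers:
            if key in cache:
                return cache[key], name      # answered by a cache layer
        value = self.warehouse[key]          # full warehouse query
        self.layers[0][1][key] = value       # populate the result set cache
        return value, "data warehouse"

lookup = WaterfallLookup({"q1": 42})
print(lookup.get("q1"))   # first ask reaches the data warehouse
print(lookup.get("q1"))   # served from the result set cache afterwards
```

This is why "more use = more results in-memory": repeated workloads settle into the fastest layers.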
Cognos BI 10.2 Dynamic Cubes Ad-hoc Reports with DB2 10.5 BLU Acceleration
Server: POWER7+ 780
CPU: 64 cores @ 4.4GHz , 1TB RAM
Cognos/DB2 client LPAR: 32 cores, 512GB
DB2 server LPAR: 32 cores, 512GB RAM
V7000 with 1.6TB SSD and 4TB HDD
Operating system: AIX 7.1 TL2 SP2
DB2 versions:
DB2 10.1 FP2 Enterprise Server Edition
DB2 10.5 Advanced Enterprise Server Edition
Cognos Business Intelligence 10.2.1
[Chart: report workload elapsed time, DB2 10.1 vs. DB2 10.5: 24x faster with DB2 10.5]
“Our BI solution at Taikang Life is built on a Cognos/DB2 solution. In order to ensure reports run fast and meet our service level commitments to the business, we have to perform pre-aggregation each night in the database. While our end users experience fast report times, this batch work has become a challenge because of limited and shrinking batch windows and an ever-increasing database size, because we want to analyze more data. With BLU Acceleration, we’ve been able to reduce the time spent on pre-aggregation by 30x, from one hour to two minutes! BLU Acceleration is truly amazing.” –Yong Zhou, BI Manager
DB2 with BLU Acceleration: Summary
• Breakthrough technology: combines and extends the leading technologies; over 25 patents filed and pending; leverages years of IBM R&D spanning 10 laboratories in 7 countries worldwide
• Typical experience: 8x-25x performance gains; 10x storage savings vs. uncompressed data with indexes; simple to implement and use
• Order-of-magnitude improvements in consumability, speed and storage savings

DB2 10.5 with BLU Acceleration: super analytics, super easy.
A Virtual Application represents a collection of application components, behavioral policies and their relationships.
• The definition is agnostic to middleware product or topology
• It lets customers focus on what’s important to them: applications and SLAs
• The system manages the end-to-end lifecycle: deploy, update, monitor, scale, undeploy

Virtual Application Pattern: the deployer defines the application, and the system deploys it; for example, a load balancer and a WAS cluster configured with session replication, with an initial instance count of 3.
Lifecycle of Business Intelligence Pattern with BLU Acceleration
Closed-loop automation creates and populates aggregates. A self-contained acceleration layer minimizes impact on the warehouse and provides a landing zone for operational data. Exploration and discovery are faster with layers of acceleration.
Fully functioning self-service environments can be deployed in minutes. Closed-loop automation maps aggregates to the model instantly.
IBM Business Intelligence Pattern with BLU Acceleration
Architecture
[Architecture diagram: sources 1..N feed data loading tools into the Data Accelerator (DB2 BLU), which serves the Analytics Engine (Cognos BI) and its virtual cubes; a metadata store, content store and LDAP support the stack; users connect through an HTTP server (ELB service), and an administrator manages the deployment through the PureApp console. The two main nodes are sized at roughly 500GB RAM / 30 cores and 200GB RAM / 30 cores, and data flows between all components, including ETL.]
Virtual Cube Design and Aggregate Advisor

[Diagram: design flow, data write and data read paths. ETL/DDL scripts and data tools build the core star schema and aggregate tables in the BLU in-memory warehouse; ETL jobs (ETL design for the core star, ETL for aggregates, in-database update jobs) populate them; the cube is published with in-memory aggregates and the model is updated for aggregates; cube/virtual-cube design feeds report & act activities.]
Deployment Characteristics

[Chart: complexity vs. data size]

• Space and CPU are both highly dependent on two main factors: report & model complexity, and data volumes.
• Both are hard to model ahead of time, so there are no hard-and-fast rules. However, based on real-world experiments, we suggest the following allocation sizes on an IBM PureApplication System box as a starting point:

Deployment | Cores | RAM | Uncompressed DB size
Small (eg: dev) | 12 | 100GB | 200GB
Medium | 32 | 512GB | 1TB
Large | 64 | 1024GB | 2TB
*Examples provided for education only in the context of IBM PureApplication System Power Mini 32 and 64. Pattern capable of leveraging more RAM.
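The suggested starting points above can be encoded as a small sizing helper (illustrative only, using exactly the table's numbers): pick the smallest allocation whose rated uncompressed DB size covers the workload.

```python
# Illustrative sizing helper for the deployment table above.

TIERS = [  # (name, cores, ram_gb, max_uncompressed_db_gb)
    ("Small (eg: dev)", 12, 100, 200),
    ("Medium",          32, 512, 1000),
    ("Large",           64, 1024, 2000),
]

def suggest_tier(uncompressed_db_gb):
    """Return (name, cores, ram_gb) for the smallest covering tier."""
    for name, cores, ram, limit in TIERS:
        if uncompressed_db_gb <= limit:
            return name, cores, ram
    raise ValueError("beyond the documented examples; size individually")

print(suggest_tier(150))   # a dev-sized workload
print(suggest_tier(800))   # fits the Medium allocation
```

As the slide notes, complexity and data volume are hard to model ahead of time, so treat any such lookup as a starting point, not a guarantee.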
Business Intelligence Across the Spectrum of Information Management Needs

[Diagram: an IBM PureApplication System / pattern-enabled environment hosts IBM BI with BLU Acceleration (DB2 BLU plus Cognos BI) alongside other patterns such as app servers, real-time analytics, middleware hosting and other consolidation scenarios. The pattern connects to the data warehouse, supports export and explore, and serves reporting, analysis and dashboards.]
Acknowledgements and Disclaimers
Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.
The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
© Copyright IBM Corporation 2013. All rights reserved.
•U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM, the IBM logo, ibm.com, Cognos and DB2 are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.
Performance Disclaimers
38X Average Acceleration of database queries for reporting- Based on internal testing comparing DB2 10.1 traditional row store vs. DB2 10.5 with BLU Acceleration. SQL queries for 20 different reports and dashboards were run in isolation against the database to measure database response time. Full report generation time would include data transfer and processing by the BI server. Performance gains will vary by workload and system specifications. *Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Communities
• On-line communities, User Groups, Technical Forums, Blogs, Social networks, and more
o Find the community that interests you …
• Information Management bit.ly/InfoMgmtCommunity
• Business Analytics bit.ly/AnalyticsCommunity
• Enterprise Content Management bit.ly/ECMCommunity
• IBM Champions
o Recognizing individuals who have made the most outstanding contributions to Information Management, Business Analytics, and Enterprise Content Management communities
• ibm.com/champion
Related IOD Sessions
Wed. 2-5 Modeling, Deploying and Optimizing New Features of IBM Cognos Dynamic Cubes v10.2.1
Session Number 1872
Wed. 3 - 5:45 IBM Cognos Dynamic Cubes Super Session
Session Number 1963
Thank You! Your feedback is important.
• Access the Conference Agenda Builder to complete your session surveys
o Any web or mobile browser at http://iod13surveys.com/surveys.html
o Any Agenda Builder kiosk onsite