Session content from IBM Information On Demand 2013 provides an overview of the IBM Business Intelligence Pattern with BLU Acceleration and explains the underlying technology employed to deliver high speed analysis more quickly and easily than ever before.
Analytics Patterns of Expertise -- the Fast Path to Amazing Solutions
Session Number BBI-3423
Rachel Bland, IBM
Trent Gray-Donald, IBM
Neeraj Sharma, IBM
© 2013 IBM Corporation
Please note
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Agenda
Market Problem Today
New Markets/Opportunities Possible
What is the “IBM Business Intelligence Pattern with BLU Acceleration”?
Performance Overview
Architecture
Evolving Business Requirements Challenge the Status Quo
• Big Data: exploding volumes, exponential demand
• Self Service: increasingly independent knowledge workers
• Business Analytics: recognizing the power of knowledge
• Integrated Systems: lead-times for hardware & software platforms
Interactive Exploration: Transform Information to Innovation
System response time is directly correlated with users’ propensity to experiment, explore and discover.
Interactive exploration: it’s all about getting more data faster!
[Chart: response time vs. request volume, complexity & concurrency, with zones ranging from Good through Satisfactory and Tolerable to Unacceptable relative to the interactive user expectation]
Data Volume & System Complexity Lead to Risk & Unpredictable TCO
• Environment: a variety of middleware and independent configurations; multi-terabyte data volumes
• Database: DBA database & hardware tuning
• Complexity: many query strategies, which may result in content rewrite
• Performance: complex custom infrastructure means unpredictable time to value; traditional deployment practices yield variable results; multiple approaches require multiple iterations to achieve performance
Expert Integrated Systems: Predictable Time to Value
• Simplified: in-memory columnar acceleration with Dynamic Cubes
• Streamlined: fit-for-purpose performance
• Pattern-encoded deployment: repeatable results
• Simple, streamlined approach: fast path to performance
In-Memory Acceleration & Patterns of Expertise Provide Agility and Predictability
Pattern deployment
Expert Integrated Systems
Pre-configured deployment for predictable, high performance analytics solution delivery
IBM Business Intelligence Pattern with BLU Acceleration
Fast on fast: tailored for volume, concurrency, and complexity
• Choose a system that learns, grows and keeps getting faster!
• Layers of in-memory acceleration:
• Results caching, at the speed of memory: more use means more results in memory
• Dynamic Cubes: prime the system for the workloads you can predict
• Memory-exploiting columnar database: acceleration for every combination and permutation
• Evolutionary innovation:
• Parallel vector processing: greater query and user concurrency
• Data skipping: less I/O
• Active compression: reduced time spent decompressing data
• Memory-exploiting, not memory-bound! Not all in-memory solutions are created equal: Dynamic Cubes and BLU leverage SSD and HDD to ensure stable, continuous operation.
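The "data skipping" idea above can be shown with a toy sketch. This is illustrative Python, not IBM code; in DB2 with BLU Acceleration the per-block min/max synopsis is maintained automatically, but the principle is the same: a block whose min/max cannot satisfy the predicate is never read.

```python
# Illustrative sketch of data skipping: a synopsis records (min, max)
# per fixed-size block; a range predicate skips blocks it cannot match.

BLOCK_SIZE = 4

def build_synopsis(column):
    """Record (min, max) for each fixed-size block of the column."""
    synopsis = []
    for start in range(0, len(column), BLOCK_SIZE):
        block = column[start:start + BLOCK_SIZE]
        synopsis.append((min(block), max(block)))
    return synopsis

def scan_greater_than(column, synopsis, threshold):
    """Return matching values, skipping blocks the synopsis rules out."""
    matches, blocks_read = [], 0
    for i, (lo, hi) in enumerate(synopsis):
        if hi <= threshold:          # whole block cannot match: no I/O
            continue
        blocks_read += 1
        start = i * BLOCK_SIZE
        matches.extend(v for v in column[start:start + BLOCK_SIZE] if v > threshold)
    return matches, blocks_read

sales = [10, 12, 11, 9,   55, 60, 58, 57,   8, 7, 12, 11]
syn = build_synopsis(sales)
found, read = scan_greater_than(sales, syn, 50)
print(found, read)   # only the middle block is touched
```

With selective predicates over large columnar tables, most blocks are skipped outright, which is where the "less I/O" claim comes from.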
[Chart: 38x average acceleration of database queries for reporting1, covering frequent, expected, and inevitable requests]
1. Based on internal testing comparing DB2 10.1 traditional row store vs. DB2 10.5 with BLU Acceleration. SQL queries for 20 different reports and dashboards were run in isolation against the database to measure database response time. Full report generation time would include data transfer and processing by the BI server. Performance gains will vary by workload and system specifications.
• Low-touch optimization with instrumented self-tuning
• Automated query performance tuning: create objects, schedule & load, auto-mapping to models
• Streamlined workflows
• Built-in data landing zone: import data from anywhere into the in-memory columnar repository
• Simplified administration: integration of data movement scheduling with Cognos Administration
• Built-in expertise
• Memory optimization: programmatic allocation of cores and memory
• Automated management of the data source and Business Intelligence
Rich Pattern-based Deployment for Agility
Request, select, go:
• Pattern-based deployment for agility
• Complete stack: OS, middleware, database, Business Intelligence
• Load data and go!
• Purpose-built integration
• Reduced skill thresholds
• Automated deployment
• Pattern-specific product extensions
• Expert Integrated System support
• Deploy to PureApplication System for fastest time to value

1 person + 1 hour = 1 fully deployed stack: simple economics and agility.
Industry-Specific Use Cases

Industry | Use Case | Solution Attributes
Retail | Household and market-basket analysis | Exploration analysis of billions of rows per month, with millions of customers and product SKUs
Insurance | Claims analysis | In-depth dimensional analysis of millions of customers, policies and itemized claims
Manufacturing & Logistics | Parts supply and location identification | Millions of parts, thousands of locations, hundreds of thousands of processes
Life Science | Large standardized data sets cross-referenced by patient and practitioners | Millions of rows of “aggregator” data cross-referenced by attribute sets
Cross-Industry Use Cases

Use Case | Solution Attributes
Self-service acceleration: pockets of advanced analysts impacting data warehouse performance | Self-contained data acceleration layer; agility of deployment; re-establish connection with Single-Trusted Data
Local telecom limitations require replica infrastructure; data privacy requirements necessitate isolated tenants | Agility and standardization of deployment; self-contained data acceleration layer; support for a hub & spoke approach to distributed IT or replication hosting
Replacement for aging MOLAP infrastructure | Robust OLAP functionality; faster cube load times, larger volumes; synchronized with Single-Trusted Data
New deployments: reduce risk and cost of deployment; reduce the skill and experience threshold to adopt BA | Prescriptive pattern-based deployment, available in general-purpose and specialized varieties; time to value
Cognos Dynamic Cubes: Goals
• Provide a high-performance OLAP solution accessing terabytes of data
• Provide an aggregate-aware solution: routing to database summary/aggregate tables, and routing to in-memory aggregate values
• Provide an aggregate advisor to assist with selection of database/memory aggregates
• Cache data and share it amongst all users
• Provide compelling features: parent/child (recursive) hierarchies, multiple hierarchies per dimension, hidden measures, virtual cubes, relative time, and dimensional (member) security
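The "aggregate aware" goal can be sketched as a routing decision (illustrative Python with hypothetical table names, not the Cognos implementation): a query is sent to a summary table when the query's grouping columns are covered by that aggregate's grain, otherwise it falls back to the detail fact table.

```python
# Illustrative aggregate-aware routing: pick a covering aggregate
# table for a query, preferring the coarsest (smallest) one.

AGGREGATES = {  # hypothetical aggregate tables and their grains
    "sales_by_month_region": {"month", "region"},
    "sales_by_month":        {"month"},
}

def route(group_by_columns):
    """Return the table a query should run against."""
    candidates = [name for name, grain in AGGREGATES.items()
                  if set(group_by_columns) <= grain]
    if candidates:
        # the coarser the grain, the fewer rows to scan
        return min(candidates, key=lambda n: len(AGGREGATES[n]))
    return "sales_fact"   # detail-level fallback

print(route(["month"]))            # sales_by_month
print(route(["month", "region"]))  # sales_by_month_region
print(route(["month", "sku"]))     # sales_fact
```

The aggregate advisor's job, in these terms, is recommending which grains are worth materializing in the database or in memory.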
[Diagram: initial query. The Query Processor (DQM) passes the request through the result set cache to the MDX Engine, which consults the expression, data, aggregate, and member caches of the dynamic cube. On a miss, SQL queries run against the secured data warehouse to obtain member information, fact and summary data, and aggregate data; the aggregate cache is searched for an exact match first.]
[Diagram: subsequent query. The member and aggregate caches are already populated, so the MDX Engine only issues SQL queries to obtain fact and summary data when the result set, data, and aggregate caches (searched for an exact match) cannot answer the request.]
What is BLU Acceleration?
• New, innovative technology for analytic queries:
• Columnar storage
• New run-time engine with vector (aka SIMD) processing, deep multi-core optimizations and cache-aware memory management
• “Active compression”: unique encoding for further storage reduction beyond DB2 10 levels, plus run-time processing without decompression
• “Revolution through evolution”:
• Built directly into the DB2 kernel
• BLU tables can coexist with traditional row tables in the same schema, tablespaces and bufferpools
• Query any combination of BLU and row data
• Memory-optimized (not “in-memory”)
• Value: order-of-magnitude benefits in performance, storage savings and time to value
Slide callouts:
• Deep multi-core optimization means it can run more work at the same time.
• Active compression means analytic queries with filters and calculations don’t wait for data to decompress.
• Memory-optimized is really important: the system keeps running even if it fills up memory, whereas other solutions in the market are “memory-bound”.
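The "run-time processing without decompression" point can be illustrated with a toy dictionary-encoding sketch (illustrative only, not DB2 internals): if codes are assigned in sorted value order, equality and range predicates can be evaluated directly on the small integer codes.

```python
import bisect

# Illustrative active-compression sketch: order-preserving dictionary
# encoding lets a range filter run on codes, never decoding values.

def encode(values):
    """Dictionary-encode a column; sorted dictionary => order-preserving codes."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def count_less_than(dictionary, codes, bound):
    """Evaluate `value < bound` directly on the encoded column."""
    bound_code = bisect.bisect_left(dictionary, bound)
    return sum(1 for c in codes if c < bound_code)   # no decode needed

dictionary, codes = encode(["DE", "US", "FR", "US", "BR", "DE"])
print(count_less_than(dictionary, codes, "FR"))  # counts the BR and DE rows
```

Because the codes are smaller than the original values, more of the column fits in cache, which compounds with the SIMD processing mentioned above.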
How fast is it? Current DB2 10.5 results: an 8x-25x improvement is common.

“It was amazing to see the faster query times compared to the performance results with our row-organized tables. The performance of four of our queries improved by over 100-fold! The best outcome was a query that finished 137x faster by using BLU Acceleration.”
- Kent Collins, Database Solutions Architect, BNSF Railway
Customer Workload | Speedup over DB2 10.1
Analytic ISV | 37.4x
Large European Bank | 21.8x
BI Vendor (Simple) | 124x
BI Vendor (Complex) | 6.1x
Manufacturer | 9.2x
Investment Bank | 36.9x
1. Based on internal testing comparing DB2 10.1 traditional row store vs. DB2 10.5 with BLU Acceleration. SQL queries for 20 different reports and dashboards were run in isolation against the database to measure database response time. Full report generation time would include data transfer and processing by the BI server. Performance gains will vary by workload and system specifications.
Significant Storage Savings with DB2 with BLU Acceleration
• ~2x-3x storage reduction vs. DB2 10.1 adaptive compression (comparing all objects: tables, indexes, etc.)
• New advanced compression techniques
• Fewer storage objects required
DB2 10.5 & Cognos BI Dynamic Cubes
Cube start-up:
• Member cache filled with queries to the data warehouse dimension tables
• Aggregate cache filled with queries to the data warehouse (or database aggregates, if defined)

Report processing: waterfall lookup for data, in descending order, until all data is provided:
1. Result set cache
2. Query data cache
3. Aggregate cache
4. Database aggregate
5. Data warehouse
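The waterfall lookup above can be sketched as a cascade of caches (hypothetical class and layer names, not the Cognos implementation): each layer is tried in order, and a miss falls through to the next, slower layer, with the answer cached on the way back.

```python
# Illustrative waterfall lookup: result set cache -> query data cache
# -> aggregate cache -> database aggregate -> data warehouse.

class WaterfallLookup:
    def __init__(self, warehouse):
        self.layers = [
            ("result set cache", {}),
            ("query data cache", {}),
            ("aggregate cache", {}),
            ("database aggregate", {}),
        ]
        self.warehouse = warehouse   # layer 5: always answers

    def get(self, key):
        for name, cache in self.layers:
            if key in cache:
                return cache[key], name      # answered by a cache layer
        value = self.warehouse[key]          # full warehouse query
        self.layers[0][1][key] = value       # populate the result set cache
        return value, "data warehouse"

lookup = WaterfallLookup({"q1": 42})
print(lookup.get("q1"))   # first ask reaches the data warehouse
print(lookup.get("q1"))   # served from the result set cache afterwards
```

This is why "more use = more results in-memory": repeated workloads settle into the fastest layers.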
Cognos BI 10.2 Dynamic Cubes Ad-hoc Reports with DB2 10.5 BLU Acceleration
Server: POWER7+ 780
CPU: 64 cores @ 4.4GHz , 1TB RAM
Cognos/DB2 client LPAR: 32 cores, 512GB
DB2 server LPAR: 32 cores, 512GB RAM
V7000 with 1.6TB SSD and 4TB HDD
Operating system: AIX 7.1 TL2 SP2
DB2 versions:
DB2 10.1 FP2 Enterprise Server Edition
DB2 10.5 Advanced Enterprise Server Edition
Cognos Business Intelligence 10.2.1
[Chart: report workload elapsed time, DB2 10.1 vs. DB2 10.5: 24x faster with DB2 10.5]
“Our BI solution at Taikang Life is built on a Cognos/DB2 solution. In order to ensure reports run fast and meet our service level commitments to the business, we have to perform pre-aggregation each night in the database. While our end users experience fast report times, this batch work has become a challenge because of limited and shrinking batch windows and an ever-increasing database size, because we want to analyze more data. With BLU Acceleration, we’ve been able to reduce the time spent on pre-aggregation by 30x, from one hour to two minutes! BLU Acceleration is truly amazing.” –Yong Zhou, BI Manager
DB2 with BLU Acceleration: Summary
• Breakthrough technology: combines and extends the leading technologies; over 25 patents filed and pending; leverages years of IBM R&D spanning 10 laboratories in 7 countries worldwide
• Typical experience: 8x-25x performance gains; 10x storage savings vs. uncompressed data with indexes; simple to implement and use
• Order-of-magnitude improvements in consumability, speed and storage savings

DB2 10.5 with BLU Acceleration: super analytics, super easy.
A Virtual Application represents a collection of application components, behavioral policies and their relationships.
• The definition is agnostic to middleware product or topology
• It lets customers focus on what’s important to them: applications and SLAs
• The system manages the end-to-end lifecycle: deploy, update, monitor, scale, undeploy

Virtual Application Pattern: the deployer defines the application, and the system deploys it; for example, a load balancer and a WAS cluster configured with session replication, with an initial instance count of 3.
Lifecycle of Business Intelligence Pattern with BLU Acceleration
Closed-loop automation creates and populates aggregates. A self-contained acceleration layer minimizes impact on the warehouse and provides a landing zone for operational data. Exploration and discovery are faster with layers of acceleration.
Fully functioning self-service environments can be deployed in minutes. Closed-loop automation maps aggregates to the model instantly.
IBM Business Intelligence Pattern with BLU Acceleration
Architecture
[Architecture diagram: sources 1..N feed data loading tools into the Data Accelerator (DB2 BLU), which serves the Analytics Engine (Cognos BI) and its virtual cubes; a metadata store, content store and LDAP support the stack; users connect through an HTTP server (ELB service), and an administrator manages the deployment through the PureApp console. The two main nodes are sized at roughly 500GB RAM / 30 cores and 200GB RAM / 30 cores, and data flows between all components, including ETL.]
Virtual Cube Design and Aggregate Advisor

[Diagram: design flow, data write and data read paths. ETL/DDL scripts and data tools build the core star schema and aggregate tables in the BLU in-memory warehouse; ETL jobs (ETL design for the core star, ETL for aggregates, in-database update jobs) populate them; the cube is published with in-memory aggregates and the model is updated for aggregates; cube/virtual-cube design feeds report & act activities.]
Deployment Characteristics

[Chart: complexity vs. data size]

• Space and CPU are both highly dependent on two main factors: report & model complexity, and data volumes.
• Both are hard to model ahead of time, so there are no hard-and-fast rules. However, based on real-world experiments, we suggest the following allocation sizes on an IBM PureApplication System box as a starting point:

Deployment | Cores | RAM | Uncompressed DB size
Small (eg: dev) | 12 | 100GB | 200GB
Medium | 32 | 512GB | 1TB
Large | 64 | 1024GB | 2TB
*Examples provided for education only in the context of IBM PureApplication System Power Mini 32 and 64. Pattern capable of leveraging more RAM.
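The suggested starting points above can be encoded as a small sizing helper (illustrative only, using exactly the table's numbers): pick the smallest allocation whose rated uncompressed DB size covers the workload.

```python
# Illustrative sizing helper for the deployment table above.

TIERS = [  # (name, cores, ram_gb, max_uncompressed_db_gb)
    ("Small (eg: dev)", 12, 100, 200),
    ("Medium",          32, 512, 1000),
    ("Large",           64, 1024, 2000),
]

def suggest_tier(uncompressed_db_gb):
    """Return (name, cores, ram_gb) for the smallest covering tier."""
    for name, cores, ram, limit in TIERS:
        if uncompressed_db_gb <= limit:
            return name, cores, ram
    raise ValueError("beyond the documented examples; size individually")

print(suggest_tier(150))   # a dev-sized workload
print(suggest_tier(800))   # fits the Medium allocation
```

As the slide notes, complexity and data volume are hard to model ahead of time, so treat any such lookup as a starting point, not a guarantee.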
Business Intelligence Across the Spectrum of Information Management Needs

[Diagram: an IBM PureApplication System / pattern-enabled environment hosts IBM BI with BLU Acceleration (DB2 BLU plus Cognos BI) alongside other patterns such as app servers, real-time analytics, middleware hosting and other consolidation scenarios. The pattern connects to the data warehouse, supports export and explore, and serves reporting, analysis and dashboards.]
Acknowledgements and Disclaimers
Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.
The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
© Copyright IBM Corporation 2013. All rights reserved.
•U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM, the IBM logo, ibm.com, Cognos and DB2 are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.
Performance Disclaimers
38X Average Acceleration of database queries for reporting- Based on internal testing comparing DB2 10.1 traditional row store vs. DB2 10.5 with BLU Acceleration. SQL queries for 20 different reports and dashboards were run in isolation against the database to measure database response time. Full report generation time would include data transfer and processing by the BI server. Performance gains will vary by workload and system specifications. *Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Communities
• On-line communities, User Groups, Technical Forums, Blogs, Social networks, and more
o Find the community that interests you …
• Information Management bit.ly/InfoMgmtCommunity
• Business Analytics bit.ly/AnalyticsCommunity
• Enterprise Content Management bit.ly/ECMCommunity
• IBM Champions
o Recognizing individuals who have made the most outstanding contributions to Information Management, Business Analytics, and Enterprise Content Management communities
• ibm.com/champion
Related IOD Sessions
Wed. 2-5 Modeling, Deploying and Optimizing New Features of IBM Cognos Dynamic Cubes v10.2.1
Session Number 1872
Wed. 3 - 5:45 IBM Cognos Dynamic Cubes Super Session
Session Number 1963
Thank You! Your feedback is important.
• Access the Conference Agenda Builder to complete your session surveys
o Any web or mobile browser at http://iod13surveys.com/surveys.html
o Any Agenda Builder kiosk onsite