
October 2012

MANAGEMENT BRIEF

Bottom-line Advantages of IBM InfoSphere Warehouse 10

Comparing Costs and Capabilities with Oracle Database 11g

International Technology Group 609 Pacific Avenue, Suite 102 Santa Cruz, California 95060-4406 Telephone: + 831-427-9260 Email: [email protected] Website: ITGforInfo.com

Copyright © 2012 by the International Technology Group. All rights reserved. Material, in whole or part, contained in this document may not be reproduced or distributed by any means or in any form, including original, without the prior written permission of the International Technology Group (ITG). Information has been obtained from sources assumed to be reliable and reflects conclusions at the time. This document was developed with International Business Machines Corporation (IBM) funding. Although the document may utilize publicly available material from various sources, including IBM, it does not necessarily reflect the positions of such sources on the issues addressed in this document. Material contained and conclusions presented in this document are subject to change without notice. All warranties as to the accuracy, completeness or adequacy of such material are disclaimed. There shall be no liability for errors, omissions or inadequacies in the material contained in this document or for interpretations thereof. Trademarks included in this document are the property of their respective owners.

International Technology Group i

TABLE OF CONTENTS

EXECUTIVE SUMMARY 1
  Bottom Line 1
  Cost Variables 2
  Conclusions 4

TRENDS 5
  Overview 5
  Growth 5
    General Experience 5
    Workload Complexity 6
  Real Time 8
  Implications 9

SOLUTIONS 10
  InfoSphere Warehouse 10
    Overview 10
    Editions 11
    Tools 13
  DB2 10 14
    Differentiators 14
    HA Clusters 19
    Oracle Compatibility 20
  Oracle Solutions 21

DETAILED DATA 22

List of Figures
1. Average Three-year IBM InfoSphere Warehouse 10 and Oracle Database 11g Software Costs – 10 TB to 50 TB Data Warehouse Installations 1
2. Average Three-year IBM InfoSphere Warehouse 10 and Oracle Database 11g Software Costs – 1 TB to 5 TB Data Warehouse Installations 1
3. IBM InfoSphere Warehouse 10 and Oracle Database 11g Packaging 3
4. Data Warehouse Growth – Manufacturing Company Example 6
5. Standard Customer Data Analysis Variables 7
6. Distribution of Data Warehouse Workloads – Financial Services Example 7
7. Data Warehouse Refresh Cycles in Fortune 500 Corporations 9
8. Principal InfoSphere Warehouse 10 Components 10
9. InfoSphere Warehouse 10 Editions 11
10. InfoSphere Warehouse 10 Industry Packages 12
11. IBM InfoSphere Warehouse 10 Advanced Enterprise Edition Compared to Oracle Database 11g Enterprise Edition Warehouse Packaging 12
12. InfoSphere Warehouse 10 Advanced Enterprise Edition Tools 13
13. DB2 10 Autonomic Features 18
14. DB2 pureScale Cluster Size Relative to Performance 20
15. Commonly Used Oracle Features Supported by DB2 10 20
16. Cost Calculations for InfoSphere Warehouse 10 Enterprise Edition and Oracle Database 11g Enterprise Edition – 10 TB to 50 TB Installations 22
17. Cost Calculations for InfoSphere Warehouse 10 Advanced Enterprise Edition and Oracle Database 11g Enterprise Edition – 10 TB to 50 TB Installations 23
18. Cost Calculations for InfoSphere Warehouse 10 Departmental Edition and Oracle Database 11g Enterprise Edition – 1 TB to 5 TB Installations 23


EXECUTIVE SUMMARY

Bottom Line

Data warehousing has become one of the fastest-growing segments of the IT world. The business value of data warehouse applications has been clearly demonstrated in thousands of cases, in organizations of all sizes, in all industries.

Success, however, brings new challenges. Once data warehouses are in place, end user demand expands rapidly. Businesses that start with small-scale, compact systems may find that, within a few years, they must deal with terabytes (TB) of data, increasingly sophisticated technologies, complex mixed workloads and hundreds or thousands of users.

As this occurs, costs may escalate rapidly. What can be done to control this process? The answer depends in large part on which data warehouse solution is employed. Vendor pricing and packaging, as well as technology content, may result in significantly different cost structures.

This report examines the cost implications of differences between two major software-based solutions: IBM InfoSphere Warehouse 10 and data warehouses built around Oracle Database 11g and related tools.

InfoSphere Warehouse software costs are significantly lower. For data warehouses with 10 TB to 50 TB of user data, license and three-year support costs for InfoSphere Warehouse 10 Enterprise Edition average 56 percent less than for use of Oracle Database 11g Enterprise Edition. Figure 1 illustrates this picture.

Figure 1: Average Three-year IBM InfoSphere Warehouse 10 and Oracle Database 11g Software Costs – 10 TB to 50 TB Data Warehouse Installations

For data warehouses with between 1 TB and 5 TB of user data, cost disparities are larger. As figure 2 illustrates, comparable costs for InfoSphere Warehouse 10 Departmental Edition average 85 percent less.

Figure 2: Average Three-year IBM InfoSphere Warehouse 10 and Oracle Database 11g Software Costs – 1 TB to 5 TB Data Warehouse Installations

These comparisons are based on six representative data warehouse installations in large and midsize organizations. Costs are for functionally equivalent configurations, and include license as well as support fees over a three-year period. For Oracle Database 11g Enterprise Edition, costs were calculated based on the lowest-cost option (per user or per processor) available for each installation.

[Figure 2 data ($ thousands): Oracle Database 11g Enterprise Edition 348.2; IBM InfoSphere Warehouse 10 Departmental Edition 50.8]

[Figure 1 data ($ thousands): Oracle Database 11g Enterprise Edition 1,553.1; IBM InfoSphere Warehouse 10 Enterprise Edition 686.1]
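The savings percentages quoted above can be reproduced directly from the figure values. A minimal Python sketch, using the three-year costs (in $ thousands) reported in Figures 1 and 2:

```python
# Three-year software costs in $ thousands, as reported in Figures 1 and 2.
FIG1 = {"Oracle Database 11g EE": 1553.1, "InfoSphere Warehouse 10 EE": 686.1}
FIG2 = {"Oracle Database 11g EE": 348.2, "InfoSphere Warehouse 10 Departmental": 50.8}

def percent_less(oracle_cost, ibm_cost):
    """Percentage by which the IBM cost undercuts the Oracle cost."""
    return round((1 - ibm_cost / oracle_cost) * 100)

print(percent_less(1553.1, 686.1))  # 56 percent less (10 TB to 50 TB case)
print(percent_less(348.2, 50.8))    # 85 percent less (1 TB to 5 TB case)
```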


InfoSphere Warehouse 10 prices are discounted by 40 percent, and those for Oracle Database 11g Enterprise Edition by 80 percent. Further information on installations, configurations and methodology, along with more granular cost breakdowns, may be found in the Detailed Data section of this report.

Cost Variables

Lower costs for InfoSphere Warehouse 10 are due to differences in three areas:

1. Pricing. While Oracle offers two pricing options – per named user and per processor (core) – IBM's main pricing metric for InfoSphere Warehouse 10 is based on terabytes of user data.

This approach offers a number of advantages. Data warehouse costs may be more closely aligned with growth rates, user populations may expand without corresponding increases in license fees, and users are not penalized for concurrent or complex query workloads that require high levels of processing power.

Two other factors affect InfoSphere Warehouse 10 pricing. First, the IBM definition of “user data” does not include indexes, logs, temporary spaces and other data structures, which typically represent 30 to 50 percent of overall warehouse data volumes. Second, pricing allows for use of data compression; e.g., a 2 TB data warehouse compressed to 1 TB requires only a 1 TB license.

This becomes particularly significant because of the high levels of data compression realized by DB2 10. In the comparisons presented here, IBM InfoSphere Warehouse 10 calculations assume an average of 70 percent compression. Oracle Database 11g also enables data compression – however, this does not affect pricing.
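To make the interaction of these factors concrete, the following sketch estimates licensable terabytes under a per-TB metric. It is illustrative only; the default 40 percent non-user-data share is an assumed midpoint of the 30 to 50 percent range cited above, and the function does not model actual IBM pricing rules:

```python
def licensable_tb(raw_tb, non_user_share=0.4, compression=0.7):
    """Estimate the terabytes that must be licensed under a per-TB metric.

    non_user_share: assumed fraction of raw volume taken by indexes,
        logs and temporary spaces (the text cites 30 to 50 percent),
        which is excluded from "user data"; 0.4 is an illustrative midpoint.
    compression: assumed average compression ratio (the comparisons in
        this report assume 70 percent for InfoSphere Warehouse 10).
    """
    user_data = raw_tb * (1 - non_user_share)
    return user_data * (1 - compression)

# The report's simple example: a 2 TB warehouse compressed to 1 TB
# requires only a 1 TB license (no non-user-data exclusion applied here).
print(licensable_tb(2, non_user_share=0.0, compression=0.5))  # -> 1.0
```

Under these assumptions, a 10 TB raw warehouse would need a license for only 1.8 TB, which is why compression and the user-data definition weigh so heavily in the comparison.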

There are further cost disparities for standby systems. In active-passive configurations, IBM charges only for 1 TB of data on the passive server, while Oracle requires full per processor or per user licensing. In active-active configurations, both vendors require full capacity licensing.

There are significant differences in pricing policies for x86-virtualized environments. While IBM offers sub-capacity (partition-based) pricing for VMware, Microsoft Hyper-V, KVM and Xen, Oracle offers this option only for its own VM hypervisor.

Oracle also takes the position that, if other x86 virtualization tools are employed to host Oracle software, the company will only provide support for issues that either are known to occur on the native operating system, or can be demonstrated not to be the result of running on VMware. In practice, it is necessary to recreate problems on non-virtualized servers.

For complex, rapidly evolving data warehouse environments, Oracle support constraints represent a high-risk proposition. Although VMware and other third-party hypervisors may be employed for development and test instances, many organizations are reluctant to virtualize x86 production systems. IBM supports InfoSphere Warehouse 10 deployed on all major x86 hypervisors.

2. Packaging. In this area, there are two major differences between IBM InfoSphere Warehouse 10 and Oracle offerings.

IBM InfoSphere Warehouse 10 Enterprise Edition includes the DB2 10 for Linux, UNIX and Windows (LUW) database; features for partitioning, data compression, online analytical processing (OLAP), data mining, real-time data warehousing, access control, high availability (HA) clustering and other functions; and an extensive suite of management, optimization and development tools.

Oracle charges separately for equivalent features. While Oracle Database 11g Enterprise Edition is list priced at $47,500 per processor or $950 per named user, inclusion of these features pushes overall list prices to $125,300 per processor or $2,510 per named user.
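Because the comparisons in this report take the lowest-cost Oracle option for each installation, the selection logic can be sketched as follows, using the list prices quoted above. The processor and user counts in the example are hypothetical, and Oracle core-factor rules, minimum-user requirements and discounts are not modeled:

```python
# List prices quoted in this report for Oracle Database 11g Enterprise
# Edition configured with equivalent warehouse features.
PER_PROCESSOR = 125_300   # dollars per licensable processor
PER_NAMED_USER = 2_510    # dollars per named user

def oracle_list_license(processors, named_users):
    """Return the cheaper of per-processor and per-user list licensing.

    'processors' means licensable processors; this sketch ignores
    core-factor rules, user minimums and negotiated discounts.
    """
    return min(processors * PER_PROCESSOR, named_users * PER_NAMED_USER)

# Hypothetical installation: 8 licensable processors, 600 named users.
print(oracle_list_license(8, 600))  # -> 1002400 (per-processor is cheaper)
```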


A second difference is that IBM offers a low-cost version, IBM InfoSphere Warehouse 10 Departmental Edition, with a full set of data warehouse capabilities for systems with up to 16 server cores, 64 gigabytes (GB) of main memory per instance and 15 TB of user data.

In comparison, Oracle’s low-end (up to four sockets) Database 11g Standard Edition does not support partitioning, advanced compression, OLAP or data mining. While it may be employed for basic business intelligence (BI) applications, Standard Edition is not a serious candidate for organizational data warehousing.

For this reason, costs for Oracle Database 11g solutions presented in this report, for 1 TB to 5 TB as well as 10 TB to 50 TB data warehouses, are for use of the company’s Enterprise Edition.

Figure 3 illustrates these differences.

INFOSPHERE WAREHOUSE 10 ENTERPRISE EDITION
Included:
• Partitioning
• Adaptive Compression
• Cubing Services
• Intelligent Miner
• Continuous Data Ingest
• Label & Row Based Access Control
• High Availability Disaster Recovery (HADR)
• InfoSphere & Optim tools
List price per terabyte: $82,900

ORACLE DATABASE 11g ENTERPRISE EDITION
Separate charge:
• Partitioning
• Advanced Compression
• OLAP
• Data Mining
• GoldenGate
• Label Security
• Active Data Guard
• Diagnostics & Tuning Packs, TopLink
List price per processor: $125,300
List price per user: $2,510

INFOSPHERE WAREHOUSE 10 DEPARTMENTAL EDITION
Included:
• Partitioning
• Adaptive Compression
• Cubing Services
• Intelligent Miner
• Label & Row Based Access Control
• High Availability Disaster Recovery (HADR)
• InfoSphere & Optim tools
List price per terabyte: $39,700

ORACLE DATABASE 11g STANDARD EDITION
Not supported:
• Partitioning
• Advanced Compression
• OLAP
• Data Mining
Separate charge:
• Label Security
• Active Data Guard
• Diagnostics & Tuning Packs
List price per processor: $49,000
List price per user: $980

Figure 3: IBM InfoSphere Warehouse 10 and Oracle Database 11g Packaging

It would be necessary to configure Oracle Database 11g Standard Edition with additional modules to provide functionality equivalent to InfoSphere Warehouse 10 Departmental Edition security, failover and management tools. However, this is a somewhat academic exercise: Standard Edition is not a serious competitor.

Calculations do not include IBM DB2 10 pureScale and the Oracle equivalent, Real Application Clusters (RAC), for database clustering. Both vendors charge separately for these offerings.

IBM also offers InfoSphere Warehouse 10 Advanced Enterprise and Advanced Departmental editions, which contain further tooling.


3. Technologies. Differences between IBM InfoSphere Warehouse 10 and Oracle Database 11g solutions extend beyond pricing and packaging. There are wide variations in the capabilities not only of data warehouse solutions, but also of underlying database and tooling technologies that directly affect comparative costs.

DB2 LUW, for example, employs a more efficient system design as well as integrated performance accelerators that have enabled it to outclass Oracle equivalents in a wide range of industry benchmarks. New code structures in DB2 10 deliver even higher performance for complex data warehouse applications and workloads.

Adaptive Compression in DB2 10 employs new algorithms that combine table, index and page compression. Additional reductions in processor, I/O, storage, network bandwidth and software costs may be realized.

Other new DB2 10 features include Continuous Data Ingest, which employs new IBM parallel loading technology for real-time data warehouse updates; Time Travel Query, which integrates new temporal database technology; Multi-Temperature Data Management, which enables database-level automatic storage tiering; and Row and Column-level Access Control for extremely fine-grained security.

In addition, DB2 10 reinforces established DB2 LUW strengths in such areas as automation, workload management, HA clustering and Oracle compatibility. Organizations that have migrated Oracle applications to DB2 10 have typically found that around 98 percent of code remains unchanged, while few or no changes to development skills are required.

A broader difference between DB2 10 and Oracle Database 11g is that, in adding new functionality, IBM has redesigned features and reengineered code. In contrast, Oracle has tended to add overlays to legacy software structures. This has reduced efficiency – system overhead is typically higher than for comparable DB2 environments – while complicating DBA tasks.

DBA productivity has not been a major focus for Oracle, while IBM has consistently streamlined administrator interfaces and implemented higher levels of automation with each new DB2 release. The widespread perception that Oracle is more “labor-intensive” than DB2 is correct.

A further differentiator is that InfoSphere Warehouse 10 solutions are designed for rapid deployment. Departmental Edition, for example, may be simply installed and configured using a downloadable virtual image. Larger Enterprise Edition deployments benefit from high levels of integration and testing. In both cases, delivery of business value and return on investment (ROI) may be materially accelerated.

Conclusions

From a business perspective, the evolution of organizational data warehousing represents a broad spectrum of new, high-impact application opportunities. The use of information in new ways throughout the entire enterprise – and in interactions with customers and partners – will enable transformations whose potential extends far beyond what has been achieved to date.

From an IT perspective, data warehousing trends represent a “perfect storm.” Growth in data volumes, numbers of applications and user populations is compounded by increasingly complex data structures, technology bases and workload mixes. Solutions that enable organizations to meet these challenges more effectively offer direct business value.

The choice of a data warehouse platform is a critical decision. It will determine, in no small measure, how well an organization uses information to meet business needs. The consequences of the right choice, or the wrong one, will be felt for a long time to come.

In cost-effectiveness, as well as in key established and emerging technologies, InfoSphere Warehouse 10 offers superior capabilities.


TRENDS

Overview

The competitive case for InfoSphere Warehouse 10 is not simply that IBM pricing and packaging offer greater flexibility and cost-effectiveness. It is that the distinctive characteristics of InfoSphere Warehouse 10 address the principal trends in organizational data warehousing more effectively. This is particularly the case in three areas – growth, workload complexity and data currency – which are addressed below.

The next section, Solutions, provides additional information on IBM InfoSphere Warehouse 10 and Oracle Database 11g offerings, including underlying databases and key tools. InfoSphere Warehouse 10 technology differentiators are outlined, and their implications examined.

The report concludes with a Basis of Calculations section presenting the methodology and assumptions employed for comparative cost calculations, along with granular breakdowns of three-year costs for use of InfoSphere Warehouse 10 and Oracle Database 11g solutions.

Growth

General Experience

Data warehouses typically experience higher rates of data growth than any other major type of system. High double-digit annual growth has become the norm in most industries, in organizations of all sizes.

A number of factors have contributed to this trend. One is that organizations are collecting more data on customers and transactions. In the early 2000s, for example, banks typically maintained one to two megabytes (MB) of data per customer. The average is now over 12 MB. In telecommunications, retail and other industries, increases of 10 times or more have routinely occurred over the last decade.

A similar shift has occurred among organizations that operate supply chains. It has become technically feasible to track millions of stock keeping units (SKUs) across their entire lifecycle, from determination of demand, through acquisition and distribution, to sales, deliveries and returns, with increasing levels of granularity.

Increases in numbers and sizes of records have been compounded by longer retention periods. Data is now routinely retained for seven years or more in many organizations for historical analysis as well as compliance purposes.

Although growth in data volumes has tended to dominate industry headlines, the overall trend has been both broader and more complex. Most organizations that deploy data warehouses also experience sustained multi-year growth in numbers of applications and users.

Experience has shown that demand for high-quality information and analysis expands rapidly once its value is understood. Once data warehouses are put in place, demand soon extends to a wide range of specialist users, and to larger populations of managers, professionals and front-line employees.

In some industries, extending data warehouse access to distributors, agents, brokers and other partners has further accelerated growth in user populations. These partners often find that access to high-quality information, and to tools to analyze and interpret it, represents a source of relationship value.


The manufacturing company whose experiences are summarized in figure 4 illustrates growth trends.

Figure 4: Data Warehouse Growth – Manufacturing Company Example

In this case, the company’s principal data warehouse expanded over a five-year period from less than 250 GB of data, a dozen applications and 200 users to more than 12 TB, 70 applications and 2,600 users – almost 10 percent of the company’s total number of employees.

Certain aspects of the company’s experiences are instructive. One is that growth was a non-linear process. Annual growth rates varied from year to year, and even short-term trends could be unpredictable. Certain applications experienced “explosions” in usage across large segments of the company, while others supported largely stable user populations over time.

Growth was accelerated by consolidation of data marts. As adoption of BI tools became widespread during the 2000s, many user departments initially put their own systems in place. However, problems soon became apparent. Inconsistencies in data content, structures and currency, as well as technical differences between departments meant that results were often incompatible.

The costs and difficulties of maintaining separate data management, extract, transformation and load (ETL) and replication infrastructures for dozens of separate systems escalated to unacceptable levels. As a matter of corporate policy, most data marts were consolidated to the main corporate system.

Although growth rates vary, the experiences of most organizations that have deployed data warehouses have been generally similar.

Workload Complexity

As data warehouse usage expands, not only high-end “number-crunching” applications, but also departmental and individual user tools grow increasingly sophisticated. Drill-down, OLAP and other capabilities that were once restricted to specialists become routine requirements across organizations.

[Figure 4 charts: Number of Users, Number of Applications and Terabytes of Data, by yearend]

Yearend 1: 200 users, 12 applications, 250 GB
Yearend 5: 2,600 users, 70 applications, 12 TB


There is a progressive escalation in analytical complexity. A department that starts by, say, identifying customers who fit a certain profile, may move on to analyzing cross-product purchasing patterns or predicting customer behavior. Financial analysis may evolve from quantifying operating unit profit contributions to developing strategies to maximize profitability across the business as a whole.

Even relatively standard customer data analysis exercises, for example, now include the variables shown in figure 5.

Figure 5: Standard Customer Data Analysis Variables

These are among the capabilities provided by the IBM Pack for Customer Insight, which is described in the following section.

Growing diversity of applications magnifies these effects. Organizational data warehouses now typically support a wide range of queries with varying characteristics, degrees of business criticality and time sensitivity for different user groups.

Figure 6, for example, shows a financial services company’s estimate of its overall data warehouse workloads during a 24-hour period. Percentages are shares of CPU cycles consumed.

Figure 6: Distribution of Data Warehouse Workloads – Financial Services Example

[Figure 5 content – standard customer data analysis variables:]
• Average customer profit amount, revenue, wallet, advances & debt recovered
• Average arrangement/credit balance
• Customer acquisition cost
• Average item price
• Average item profit
• Average market basket sales amount
• Number of items sold
• Number of transactions
• Profit margin
• Sales amount net of tax
• Transaction interest
• Total product arrangement balance, change in balance, beginning balance, ending balance
• Number & value of points earned, points redeemed, points cancelled, points balance
• Sales discount
• Sales value of goods returned
• Number of market baskets scanned, not scanned, sold, returned & voided
• Number of customers in household
• Average customer age & household income
• Average number of items sold per day
• Number of customer visits
• Days since last purchase
• Average number of arrangements
• Average purchase amount
• Customer lifetime value
• Total complaints, average number of complaints
• Number of nonperforming accounts
• Value, volume of market, percentage
• Customer market share, percentage
• Number of days delinquent, delinquent amount
• Number of channels used

[Figure 6 content – distribution of CPU cycles and CPU execution times:]
• Large queries: 19% of CPU cycles; 20–60 seconds
• Medium queries: 21%; 15–30 seconds
• Small queries: 9%; 0.5–10 seconds
• Call center: 10%; 0.2–3 seconds
• Data mining: 27%; minutes to hours
• Updates: 14%


In this, as in many other cases, usage had evolved from relatively simple standardized management reports and ad hoc queries to include large-scale data mining jobs; OLAP; a variety of BI applications for more than 20 business units, including call center and sales personnel; and others.

In such environments, effective mixed workload management is essential. Without highly granular job scheduling and monitoring of resource utilization, organizations may experience bottlenecks at peak periods, capacity utilization may be low at other times, or both may occur.

The ability to prioritize workloads, and to control the resources they may use, has also proved to be critical to realization of Service Level Agreement (SLA) goals.

These requirements are a close match with the capabilities of DB2 10 and InfoSphere Warehouse 10, whose workload management capabilities are among the most advanced in existence.

Real Time

“Real-time” data warehousing has been the subject of growing user interest since the mid-2000s.

Genuine real-time applications – the generally accepted definition is that these involve response within five seconds of a business event occurring – are still comparatively rare. They have typically been implemented in time-sensitive e-commerce businesses, as well as in such industries as telecommunications and financial services for fraud detection, network monitoring and trading analysis.

However, there has been pervasive growth in applications involving hourly to daily data updates. Examples of such applications include “operational BI,” meaning use of data warehouses by contact center, sales and other personnel interacting with customers; continuous analysis of sales, pricing and promotional data in marketing organizations; and continuous tracking of key performance indicators (KPIs) in a wide range of businesses.

Pressures for more current data, as well as for faster delivery of query results, have been reinforced by the global economic downturn. Since the onset of recession in 2008, organizations have tended to adopt shorter planning and decision-making cycles that react more closely to changing business conditions. The value of current information has increased accordingly.

In supply chain-intensive industries, competitive pressures have made “real-time” operating models increasingly common. In some sectors, suppliers now receive continuous demand signals from their customers; recalibrate plans and forecasts; and initiate procurement, production and logistics actions in a matter of minutes to hours.

Growing use of mobile devices, RFID and more sophisticated two-dimensional (2D) stacked and Quick Response (QR) Code bar code formats has important implications. It becomes possible to track the status of sales, inventories and operational processes in real time, with unprecedented granularity, across the entire supply chain. “Real-time” analysis has become an increasingly realistic proposition.

These trends are reflected in the results of 2008 and 2011 surveys of Fortune 500 data warehouses by ITG. Although monthly cycles continue to be widely employed, the fastest growth has been in update cycles that range from daily to hourly and, in some cases, “near real-time.”

Figure 7 illustrates these results. Cycles shown are for 2008 and projected for 2012.

These trends pose new challenges for data warehouse infrastructures. While certain requirements may be met most effectively using high-end data warehouse appliances, organizations must deal with pressures for faster informational updates across a broad range of management, professional and in many cases front-line applications supporting thousands of users.


Figure 7: Data Warehouse Refresh Cycles in Fortune 500 Corporations

These effects extend to back-end data collection and ETL processes. In response, organizations have increasingly adopted such techniques as “continuous batch” or “mini-batch updates” (e.g., batch ETL operations conducted multiple times per day, rather than overnight), as well as use of staging tables, ongoing trickle feeds and direct input via enterprise application integration (EAI) middleware.

There are, however, limitations to these techniques. Staging tables, trickle feeds and EAI input are often unable to handle high-volume data movements, while many organizations find it difficult to manage expanding daytime batch ETL workloads without affecting production. A more consistent, technologically more advanced approach is required.

The capabilities of DB2 10 Continuous Data Ingest, which include use of new, integrated parallel loading technology, represent such an approach.

As both query and update processes have become more time-sensitive, it has become increasingly important to avoid both unplanned and planned system outages. In a growing number of organizations, maintenance of high availability (HA) has become as critical for data warehouses as for key transactional systems. DB2 10 LUW and InfoSphere Warehouse 10 are recognized industry leaders in this area.

Implications

The combined impact of these trends will mean that most organizations deploying data warehouses will experience sustained growth in software license and support fees for the foreseeable future. Costs for servers, storage and networks supporting data warehouses, and personnel costs for administering all of these resources will tend to escalate.

This process, however, can be controlled. It can be controlled more effectively if IBM InfoSphere Warehouse 10 rather than Oracle Database 11g solutions are employed.

Not only do IBM pricing and packaging offer greater economies than Oracle Database 11g equivalents, but InfoSphere Warehouse 10 components also offer differentiated capabilities in key areas of technical functionality – including data compression, performance, automation and others – that affect costs.

[Figure 7 data – percentage of Fortune 500 data warehouses using each refresh cycle, 2008 and 2012 (projected): Multiple times/day 22% and 35%; Daily 79% and 89%; Weekly 34% and 39%; Monthly 42% and 43%; Other cycles 18% and 15%. Source: International Technology Group]


SOLUTIONS

InfoSphere Warehouse

Overview

InfoSphere Warehouse 10 is a single-price package of IBM offerings built around IBM DB2 10 LUW. It includes embedded analytics applications (IBM Intelligent Miner, Cognos 10 BI and others) as well as the SQL Warehousing Tool (SQW), which handles data movement and transformation flows in warehouses.

SQW incorporates the IBM WebSphere DataStage extract, transformation and load (ETL) solution, along with a run-time deployment environment based on WebSphere Application Server.

The InfoSphere Warehouse 10 package also includes design and development, administration and optimization, and other tools commonly employed in data warehousing. These components, which are illustrated in figure 8, are drawn from multiple IBM brands.

Figure 8: Principal InfoSphere Warehouse 10 Components

Components are closely integrated and optimized, yielding performance and functional benefits. Installation and modification are substantially easier, and personnel productivity higher, than would be the case if they were deployed separately.

DESIGN & DEVELOPMENT
  DB2 Design Advisor  •  InfoSphere Data Architect  •  InfoSphere Warehouse Design Studio  •  Optim Development Studio

ADMINISTRATION & OPTIMIZATION
  Optim Configuration Manager  •  Optim Database Administrator  •  Optim Performance Manager Extended Edition  •  Optim Query Workload Tuner

OTHER TOOLS
  InfoSphere Federation Server  •  DB2 Merge Backup  •  DB2 Recovery Expert  •  Optim High Performance Unload

EMBEDDED ANALYTICS
  Data Mining (DB2 Intelligent Miner)  •  In-line Analytics  •  Cubing Services (OLAP)  •  Unstructured Data Analytics

SQL WAREHOUSING TOOL (SQW)

IBM DB2 10
  Database Partitioning  •  Multi-Dimensional Clustering  •  Multi-core Parallelism  •  Index Prefetching  •  Optimized Joins & Star Schemas  •  Time Travel Query  •  Continuous Data Ingest  •  GraphStore  •  pureXML  •  Multi-Temperature Data Management  •  Adaptive Compression  •  Automation & Workload Management  •  Row & Column-level Access Control  •  Oracle Compatibility  •  High Availability Clusters

Editions

InfoSphere Warehouse 10 is offered in four main editions: Enterprise Edition, Departmental Edition, Advanced Enterprise Edition and Advanced Departmental Edition. Differences among the editions are summarized in figure 9.

                                            ADVANCED                 ADVANCED
COMPONENT                                   ENTERPRISE  ENTERPRISE   DEPARTMENTAL  DEPARTMENTAL

DB2 Enterprise Server Edition                   X           X             X             X

Database Partitioning                           X           X             X             X
Adaptive Compression                            X           X             X             X
Continuous Data Ingest                          X           X             X             –
Multi-Temperature Data Management               X           X             X             –
Label & Row Based Access Control                X           X             X             X
Workload Manager                                X           X             X             X

Cognos 10 BI                                    X           X             X             X
Cubing Services                                 X           X             X             X
Design Studio                                   X           X             X             X
SQL Warehousing Tool                            X           X             X             X
InfoSphere Federation Server                    X           X             X             X
InfoSphere Replication Server                   X           X             X             X
Intelligent Miner                               X           X             X             X
Text Analytics                                  X           X             X             X
Administration Console                          X           X             X             X

Customer Insight Pack                           X        Available        X          Available
Market & Campaign Insight Pack                  X        Available        X          Available
Supply Chain Insight Pack                       X        Available        X          Available

IBM Data Studio                                 X           X             X             X
InfoSphere Data Architect                       X           –             –             –
Optim Configuration Manager                     X           –             –             –
Optim Performance Manager Extended Edition      X           X             X             X
Optim Query Workload Tuner                      X           –             X             –

DB2 Merge Backup                                X           –             –             –
DB2 Recovery Expert                             X           –             –             –
Optim High Performance Unload                   X           –             –             –

Figure 9: InfoSphere Warehouse 10 Editions

To simplify and accelerate deployment, InfoSphere Warehouse 10 Departmental Edition may be installed as a virtual image on x86 servers. A SUSE Linux Enterprise Server (SLES) 11 license is included in the virtual image package, which is eligible for per terabyte pricing.

An expanded version of Departmental Edition, Advanced Departmental Edition, adds support for DB2 10 Continuous Data Ingest and Multi-Temperature Data Management, along with the Optim Query Workload Tuner optimization tool.

Like Departmental Edition, this version is limited to 15 TB of user data. It may be deployed on servers with up to four sockets, although there is no restriction on the number of cores supported – in principle, configurations of up to 32 POWER7 or 40 Intel E7 cores may be employed – or on main memory size.

Advanced Enterprise and Advanced Departmental editions include packages for three common sets of data warehouse applications. These packages – which address customer data, marketing and sales, and supply chain analysis – form part of a broader portfolio of IBM industry and process models for banking, financial markets, health care, insurance, retail, telecommunications and other lines of business.

These packages, summarized in figure 10, may be added to Enterprise and Departmental Editions.

WAREHOUSE  PACKS  

InfoSphere  Warehouse  Pack  for  Customer  Insight    

Enables  analysis  of  30+  types  of  customer-­‐related  data  including  sales,  returns,  acquisition  costs,  credit  &  payments,  profitability,  loyalty  program  activity,  customer  market  share,  lifetime  value,  demographics  &  market  basket.  

InfoSphere  Warehouse  Pack  for  Market  &  Campaign  Insight     Enables  data  analysis  for  marketing,  sales  &  campaign  management  applications.    

InfoSphere  Warehouse  Pack  for  Supply  Chain  Insight  

Enables  analysis  of  vendor,  inventory,  distribution,  forecasting,  cost  &  other  key  metrics  for  supply  chain  operations.    

Figure 10: InfoSphere Warehouse 10 Industry Packages

To compare InfoSphere Warehouse 10 Advanced Enterprise Edition with Oracle Database 11g Enterprise Edition, it would be necessary to add Oracle Database Lifecycle Management Pack to the Oracle stack. This tool provides capabilities that generally correspond to those of Optim Configuration Manager, although the latter is a great deal more sophisticated.

Resulting configurations and pricing are as shown in figure 11.

INFOSPHERE WAREHOUSE 10 ADVANCED ENTERPRISE EDITION

Included:
• Partitioning
• Adaptive Compression
• Cubing Services
• Intelligent Miner
• Continuous Data Ingest
• Label & Row Based Access Control
• High Availability Disaster Recovery (HADR)
• Optim Configuration Manager
• Other management & optimization tools

List price per terabyte: $118,000

ORACLE DATABASE 11g ENTERPRISE EDITION

Separate charge:
• Partitioning
• Advanced Compression
• OLAP
• Data Mining
• GoldenGate
• Label Security
• Active Data Guard
• Database Lifecycle Management Pack
• Diagnostics & Tuning Packs, TopLink

List price per processor: $137,300
List price per user: $2,750

Figure 11: IBM InfoSphere Warehouse 10 Advanced Enterprise Edition Compared to Oracle Database 11g Enterprise Edition Warehouse Packaging

InfoSphere Warehouse 10 Advanced Departmental Edition is list priced at $54,300 per terabyte of user data. The limitations of Oracle Database 11g Standard Edition mean, however, that this solution is not realistically competitive.
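
The two list-pricing models in figure 11 can be compared with simple arithmetic. The sketch below uses an invented warehouse size and an invented Oracle processor-license count; real Oracle licensing also involves core factors, user counts and the separately charged options listed in figure 11, which are excluded here.

```python
# Back-of-envelope sketch of the two list-pricing models in figure 11.
# Warehouse size & processor-license count are invented assumptions; the
# Oracle figure covers the base license only, before separately charged options.

user_data_tb = 20                # hypothetical warehouse size
oracle_processor_licenses = 16   # hypothetical licensed processor count

ibm_list = 118_000 * user_data_tb                  # priced per terabyte of user data
oracle_list = 137_300 * oracle_processor_licenses  # priced per processor

print(f"IBM InfoSphere Warehouse 10 AEE (list): ${ibm_list:,}")
print(f"Oracle Database 11g EE base (list):     ${oracle_list:,}")
```

Because the IBM figure is all-inclusive while the Oracle figure excludes the options in figure 11, the totals are not directly comparable; the sketch only illustrates how each model scales.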

In addition, IBM offers a Developer Edition, which includes most InfoSphere Warehouse 10 features and is licensed on a per user basis.

Tools

InfoSphere Warehouse 10 tools include the IBM products shown in figure 12.

COMPONENT   DESCRIPTION  

DEVELOPMENT  &  ADMINISTRATION  

IBM  Data  Studio   Eclipse-­‐based  multifunction  tool  for  collaborative  database  development  &  administration.  Includes  features  for  instance,  object,  data,  job  &  connection  management;  defining  &  implementing  database  schema  changes;  centralized  health  monitoring;  &  other  functions.  

Positioned  as  baseline  solution  integrating  with  more  sophisticated  Optim  &  Rational  tools.  Combines  features  of  three  previously  separate  IBM  products:  Optim  Development  Studio,  Optim  Database  Administrator  &  Data  Studio.    

InfoSphere  Data  Architect  

Principal  IBM  Eclipse-­‐based  database  design,  data  modeling  &  integration  tool  for  InfoSphere  Warehouse  &  Cognos  application  development.  Supports  use  of  logical,  physical  &  dimensional  models.  Integrates  with  other  IBM  InfoSphere  Warehouse  components  &  Optim  solutions.    

OPTIMIZATION  

Optim  Performance  Manager  Extended  Edition  

Enables  rapid  identification,  diagnosis  &  resolution  of  database  &  application  performance  bottlenecks.  Data  may  also  be  used  to  predict  future  problems,  &  to  prevent  recurrence  of  failures  in  new  applications.    

Guided  Problem  Solving  allows  DBAs  to  receive  online  expert  assistance  for  the  specific  problem  &  configuration  they  are  dealing  with.    

Optim  Configuration  Manager  

Enables  extremely  granular  tracking  of  configuration  changes  initiated  by  DBAs,  developers,  end  users  &  others,  &  enables  immediate  diagnosis  &  resolution  of  problems  caused  by  these.  Tracks  client-­‐side  parameters.  Provides  comprehensive  view  of  the  entire  database  client  &  server  topology.    

Optim  Query  Workload  Tuner  

Enables tuning of individual SQL statements & complete query workloads. Provides expert recommendations (indexes, statistics & query design) that help balance complex mixed query workloads & improve response times.

BACKUP  &  RECOVERY  

IBM DB2 Merge Backup

Reduces backup times & enables more rapid data recovery in the event of an outage. Enables DBAs to combine incremental & delta backups with latest full backup, making further full backups unnecessary. Processing occurs offline; i.e., production operations are unaffected.

IBM  DB2  Recovery  Expert  

Identifies  sources  of  failures  in  database  assets  including  table  spaces,  tables,  indexes  &  data.  Searches  DB2  logs  to  identify  where  error  occurred  &  recommends  scenario-­‐dependent  recovery  options.  Recovery  may  be  conducted  without  restoring  entire  database  or  taking  production  systems  offline.  

Optim  High  Performance  Unload  for  DB2  

Enables  unloading,  extracting  &  repartitioning  of  large  volumes  of  data  in  the  event  of  single  table  failures  or  accidental  table  drops.  Accelerates  recovery  process  by  extracting  data  directly  from  full,  incremental  &  delta  backups.  Unloads  from  multiple  database  partitions  &  provides  single-­‐step  repartitioning.  

Figure 12: InfoSphere Warehouse 10 Advanced Enterprise Edition Tools

Optim tools can provide particular value in data warehouse environments. Optim Performance Manager Extended Edition enables use of data warehouse-specific best practice templates for performance analysis and tuning, while Optim Query Workload Tuner can play a major role in balancing complex mixed query workloads.

Optim Configuration Manager may be employed to generate highly granular statistical estimates of actual and potential compression savings, and may leverage the capabilities of DB2 10 Multi-Temperature Data Management to provide policy-based management of data migration across storage tiers. All Optim tools integrate closely with the DB2 10 Workload Manager (WLM).

IBM DB2 Merge Backup and Recovery Expert have proved useful in minimizing backup and recovery windows for large and/or rapidly growing data warehouses.

DB2 10

Differentiators

Although Oracle Database 11g offers many functionally similar capabilities, the underlying DB2 10 database provides significant competitive differentiators that are exploited by InfoSphere Warehouse 10. This is particularly the case in the following areas:

• Core designs. DB2 LUW is the newest of the industry’s major relational databases. First introduced in 1996, it maintained compatibility and shared some features with the older IBM mainframe version of DB2, but the core design was significantly different from the latter.

The core Oracle database design originated in the 1970s – the first commercially available version was introduced in 1978 – and its popularity created a legacy installed base with which Oracle has been obliged to maintain compatibility ever since. Over time, additional capabilities have been added in the form of overlays and extensions.

The core DB2 LUW design differs from successive Oracle database designs in a number of key areas. Software structures are more lightweight and operate more efficiently; additional functionality in successive upgrades has been integrated into rather than overlaid on the DB2 kernel; and optimization and automation functions are more advanced.

• Performance optimization. While DB2 10 and Oracle 11g implement such capabilities as partitioning and clustering, DB2 10 implementations are generally recognized as more effective.

Users employing established DB2 features such as Database Partitioning and Multi-Dimensional Clustering (MDC) have experienced improvements of up to eight to nine times in throughput for processes involving repetitive queries, large tablespaces, or both.

MDC, which is optimized for use in large data warehouse environments, enables continuous, flexible and automatic clustering of data across multiple dimensions. Organizations have typically accelerated query performance by around three times, and improvements of ten times or more have been reported. Oracle Database 11g implements a table clustering feature that offers some comparable capabilities, but does not match overall MDC performance and functionality.

Major new performance optimization features have been introduced in DB2 10. These include multi-core parallelism, which automatically optimizes thread to core ratios on symmetric multiprocessing (SMP) servers; index prefetching; and functions that boost performance for hash joins, queries across star schemas, aggregation and other processes. A new zigzag join method, for example, may significantly reduce the time and I/O loading for complex multidimensional queries.

The impact of these features is cumulative. Organizations employing DB2 10 have experienced performance gains of between 50 percent and more than two times compared to use of DB2 9.7, depending on applications and workloads. Higher levels have been reported for high-volume complex query workloads, which particularly benefit from DB2 10 join enhancements.

• Data compression. DB2 10 incorporates new page-level compression technology that extends table and index compression capabilities in DB2 9.7. The implementation is not simply an overlay. The entire set of algorithms has been rewritten to maximize overall efficiency.

In comparison, Oracle 11g Advanced Compression remains essentially as implemented by the company in Oracle 11g R1 in 2007. Advanced Compression is built upon block-level techniques employed in legacy Oracle databases, and tends to be most effective in compressing indexes. Performance tends to degrade as compression levels increase.

Oracle 11g users typically realize 20 to 40 percent compression for production systems. Early DB2 10 users have routinely achieved 60 to 80 percent depending on database and workload characteristics, and levels of 90 percent and higher have been reported.

In both DB2 versions, compression effects extend to permanent as well as temporary tables, all types of index, log files, large objects (LOBs), values, XML Data Areas (XDAs) and other structures. The system automatically identifies opportunity areas.

Oracle compression technology tends to be most effective when databases are highly structured, and undergo few changes. In more dynamic environments, compression effects tend to be less pronounced and to erode over time unless frequent database reorganizations are conducted – which may not always be feasible.

Higher compression levels deliver a variety of benefits. These include savings in disk capacity and costs – early user experiences indicate DB2 10 enables up to eight times reductions in disk capacity compared to uncompressed environments – and in memory and I/O costs for storage systems and database servers.

Costs may be reduced for storage hardware and for software products priced on a per terabyte basis. Further economies may be generated in systems, media and administrator time for backups, replication, and other data management and movement processes. In addition, costs of wide area network bandwidth may be materially reduced.
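
The storage arithmetic implied by these figures is straightforward. The sketch below applies the cited compression ranges to a hypothetical 100 TB of uncompressed warehouse data; the midpoint values are illustrative, not measured results.

```python
# Arithmetic sketch of the storage savings implied by the compression figures
# above, applied to a hypothetical 100 TB of uncompressed warehouse data.

RAW_TB = 100.0

def compressed_size(raw_tb: float, savings_pct: float) -> float:
    """Size remaining after compression, given percent of space saved."""
    return round(raw_tb * (1 - savings_pct / 100), 1)

print(compressed_size(RAW_TB, 30))    # Oracle 11g midpoint of 20-40%: 70.0 TB
print(compressed_size(RAW_TB, 70))    # DB2 10 midpoint of 60-80%: 30.0 TB
# An "up to eight times" capacity reduction corresponds to 87.5% savings:
print(compressed_size(RAW_TB, 87.5))  # 12.5 TB
```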

• Time Travel Query. This new feature in DB2 10 implements what is generally referred to as temporal database capability.

In practice, it means that systems can distinguish between system time (when an event is logged) and business time (an alternative date and/or time associated with the event) in maintaining and querying records. This capability responds to key informational, legal and compliance needs in a variety of industries.

Although temporal database concepts date back to the 1990s, structures in which the ability to maintain multiple timelines is a core design parameter have only recently entered the commercial mainstream. In addition to DB2 10, full temporal capabilities are otherwise offered only by the latest Teradata Database Version 14 and specialized databases.

For most organizations, temporal analysis has meant developing customized application code. This approach tends to be clumsy and time-consuming, and may impair database performance. Significant modifications to applications may be required if temporal parameters change. Designed-in temporal structures are a great deal more efficient and productive.

Although there is no direct Oracle Database 11g equivalent, two Oracle offerings address temporal data. Flashback Data Archive (FDA), also referred to as “Total Recall,” allows historical records to be viewed “as is,” even if they have been subsequently written over. Originally offered as a separately charged product, FDA now forms part of Oracle Advanced Compression for Database 11g.

FDA is primarily an archival storage solution. Although in principle FDA can be used as a source of data for business analysis, it is typically necessary to write customized code in order to do so.

Oracle Workspace Manager is a Database 11g add-on that allows developers to work with multiple timelines in dedicated “workspaces” separate from the main database structure. However, workspaces are complex to administer, and require a great deal of DBA and developer effort over time. In addition, performance impacts may again be significant.

DB2 10 is the first major database to comply with the temporal features of the new ANSI/ISO SQL:2011 standard. According to Oracle, the company will support this standard in the future.
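
The system-time versus business-time distinction can be made concrete with a small bitemporal sketch. All names and data below are invented for illustration; temporal databases implement the same idea natively in SQL rather than in application code.

```python
from dataclasses import dataclass
from datetime import date

# Conceptual sketch of the system-time vs. business-time distinction behind
# temporal ("time travel") queries. All names and data are invented.

HIGH = date.max  # stand-in for "open-ended"

@dataclass
class Version:
    value: str           # e.g. a customer's address
    business_from: date  # when the fact became true in the real world
    business_to: date    # when it stopped being true
    system_from: date    # when this row was recorded in the database
    system_to: date      # when this row was superseded

# An address change effective 2012-03-01, but only entered on 2012-03-10.
history = [
    Version("12 Oak St", date(2000, 1, 1), HIGH, date(2000, 1, 1), date(2012, 3, 10)),
    Version("12 Oak St", date(2000, 1, 1), date(2012, 3, 1), date(2012, 3, 10), HIGH),
    Version("9 Elm Ave", date(2012, 3, 1), HIGH, date(2012, 3, 10), HIGH),
]

def as_of(rows, business, system):
    """Value true at `business` time, as the database knew it at `system` time."""
    for r in rows:
        if (r.business_from <= business < r.business_to
                and r.system_from <= system < r.system_to):
            return r.value
    return None

# Before the correction was entered, the database still reported the old address:
print(as_of(history, date(2012, 3, 4), date(2012, 3, 5)))   # 12 Oak St
# Queried later, the same business date reflects the corrected timeline:
print(as_of(history, date(2012, 3, 4), date(2012, 3, 15)))  # 9 Elm Ave
```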

• Multi-Temperature Data Management. This new DB2 10 feature implements automated storage tiering; i.e., it allows organizations to transfer data between multiple tiers of high-performance and lower-cost, higher-capacity drives for better overall performance and reduced media costs. Automated storage tiering for Oracle and other databases is implemented in disk array controllers.

Like array-based tiering approaches, Multi-Temperature Data Management moves "hot" (frequently accessed) and "cold" data between high-performance drives and slower, lower-cost, higher-capacity drives.

The DB2 10 feature provides full tiering capabilities, including support for solid state drives (SSDs), along with high-performance Fibre Channel (FC) and Serial Attached SCSI (SAS), and high-capacity Serial ATA (SATA) and Near-Line SAS (NL-SAS) drives in conventional and RAID configurations.

Key structures such as storage pools are built into the core database, rather than in disk controller software and microcode, while tiering functions are tightly integrated with DB2 10 Workload Manager (WLM) and automation facilities. Movement of data between tiers occurs as a background process with minimal performance impact, and may be set to occur automatically or as a result of administrator intervention.

These capabilities address key weaknesses of array-centric automated storage tiering approaches. Although performance gains and media savings may be realized, statistics collection and data movement processes often generate high levels of system overhead, and frequent intervention by administrators may be required as workloads change.

It can be expected that, in high-volume, exceptionally performance-sensitive environments, array-based tiering approaches offered by IBM (Easy Tier) and others remain appropriate. DB2 10 Multi-Temperature Data Management offers a comparatively efficient, low-cost alternative for organizations with less exacting workloads.
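
The policy idea behind temperature-based tiering can be sketched as follows. The age thresholds and tier names are invented for the example; in DB2 10 the equivalent policies are expressed through storage groups and WLM rather than application code.

```python
from datetime import date, timedelta

# Illustrative sketch (invented thresholds & tier names) of temperature-based
# tiering: data is assigned to a storage tier by how recently it was accessed.

POLICY = [
    (timedelta(days=30), "SSD"),        # "hot" data
    (timedelta(days=180), "SAS"),       # "warm" data
    (timedelta.max, "NL-SAS/SATA"),     # "cold" data
]

def tier_for(last_access: date, today: date) -> str:
    """Return the storage tier for data last accessed on `last_access`."""
    age = today - last_access
    for threshold, tier in POLICY:
        if age <= threshold:
            return tier
    return POLICY[-1][1]

today = date(2012, 10, 1)
print(tier_for(date(2012, 9, 25), today))  # SSD
print(tier_for(date(2012, 5, 1), today))   # SAS
print(tier_for(date(2011, 1, 1), today))   # NL-SAS/SATA
```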

• Continuous Data Ingest. This new DB2 10 feature represents a “real-time data warehousing” capability. The feature, which employs a new IBM technology for parallel loading of multiple data streams, enables extremely fast, low-overhead transfers of data into DB2-based systems. It offers a higher-performance alternative to conventional batch and “trickle feed” techniques.

A variety of data formats are supported, and ETL processes may be performed concurrently.

Oracle strategy for real-time data warehousing is built around the GoldenGate solution set, acquired by the company in 2009. Continuous data updates can be implemented for Oracle Database 11g using this solution set. As in other areas, however, throughput is lower, more administrator time is required and the capability is separately charged.

• Big Data. DB2 10 supports the SPARQL query language and Resource Definition Framework (RDF) data stores for development of new NoSQL applications for InfoSphere Warehouse 10. The implementation conforms to SPARQL and RDF standards defined by the World Wide Web Consortium (W3C).

SPARQL, RDF and NoSQL form part of the broader complex generally referred to as "Big Data." This complex includes solutions built around Hadoop and MapReduce; MapReduce was originally developed by Google, while Hadoop, an open source implementation of the same concepts, is controlled by the Apache Software Foundation.

Big Data technologies have attracted a great deal of industry attention. They were developed and deployed by popular search engine and social media companies to handle massive volumes of unstructured data (in some cases, tens or hundreds of petabytes) that exceed the architectural limits of relational databases.

Other types of business may also have petabyte-scale requirements. The general industry expectation is, however, that Big Data and relational models will coexist in most organizations. The capabilities implemented by IBM in DB2 10 enable unstructured and structured data to be integrated for analytical purposes.

A new DB2 10 feature, GraphStore, implements support for NoSQL triple-based graph stores, which are commonly employed in big data environments to establish and illustrate relationships between different data sets. Triple-based graph stores have, for example, been adopted by Facebook, LinkedIn and other social networks to establish connections between individuals on their sites.

Oracle supports use of SPARQL and RDF data stores on Oracle Database 11g Enterprise Edition, although the implementation is clumsier, and requires more DBA time and effort. In this as in other areas of functionality, DB2 10 capabilities are integrated into the core system structure rather than overlaid on it.
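
The triple-store model underlying RDF and SPARQL is simple to sketch: facts are (subject, predicate, object) triples, and queries are patterns with variables. The data and names below are invented; a real RDF store would use URIs and a SPARQL query engine.

```python
# Minimal sketch of the triple-store idea behind RDF/SPARQL. Facts are
# (subject, predicate, object) triples; a query is a pattern in which None
# acts as a wildcard, like a SPARQL variable. All data is invented.

triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
}

def match(pattern):
    """Return the triples matching a (s, p, o) pattern."""
    s, p, o = pattern
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# Roughly analogous to: SELECT ?who WHERE { :alice :knows ?who }
print(match(("alice", "knows", None)))  # {('alice', 'knows', 'bob')}
```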

• XML support. DB2 has implemented XML since the introduction of DB2 9.1 in 2006, and capabilities have been repeatedly enhanced since that time. Although IBM has moved aggressively to implement new big data technologies, the company continues to support the earlier Extensible Markup Language (XML) standard. DB2 10 users may employ either RDF and NoSQL, or native XML data models.

IBM pureXML provides full DB2 10 support for XML storage, indexing, queries, updates and data management. The full range of DB2 10 capabilities – including database and range partitioning, multidimensional clustering, data compression, performance optimization and automation – extend to XML data content.

Oracle Database 11g supports XML and has been adopted by a number of organizations for XML database applications. Users have reported, however, that earlier DB2 versions delivered significantly higher levels of performance and functionality than Oracle equivalents, and disparities appear to have widened with the introduction of DB2 10.

• Workload management. This longstanding DB2 strength has been enhanced in DB2 10. DB2 capabilities are derived from mainframe architecture, and draw upon mainframe strengths in managing diverse concurrent workloads. These capabilities allow organizations to realize higher levels of capacity utilization over time than with less well-optimized databases such as Oracle.

The DB2 10 Workload Manager enables highly granular, automated prioritization, along with resource allocation, queuing and real-time monitoring and management of workloads generated by hundreds to thousands of separate query jobs. Priorities may be set based on user group, query type, time of day and other variables.

A key benefit is that service level agreement (SLA) targets may be met in a highly cost-effective and reliable manner. Up to 64 primary classes and 3,904 subclasses of service may be defined.

DB2 10 workload management strengths provide particular value in environments characterized by diverse transactional and/or query workload mixes. Such environments increasingly include organizational data warehouses that must handle a wide range of query types and sizes with varying degrees of time-sensitivity.

Oracle does not address workload management in Database 11g from a service level perspective. Allocation of resources to user groups and workloads is handled through the Database Resource Manager and related components of Oracle Database 11g. As described by Oracle, “resources are allocated to users according to a resource plan specified by the DBA.”

In the Oracle approach, service level management is handled through separately charged solutions offered by third parties or customized through the Oracle professional services organization.
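
The service-class concept can be sketched in a few lines. The classes, CPU shares and routing rules below are invented for the example; DB2 WLM expresses similar ideas declaratively through service classes, workloads and thresholds rather than application code.

```python
# Illustrative sketch (invented classes & rules) of service-class based
# workload management: incoming queries are routed to classes with different
# resource shares & concurrency limits.

CLASSES = {
    "executive_dashboards": {"cpu_share": 40, "max_concurrent": 50},
    "analyst_queries":      {"cpu_share": 35, "max_concurrent": 200},
    "batch_reports":        {"cpu_share": 25, "max_concurrent": 20},
}

def classify(user_group: str, estimated_cost: int) -> str:
    """Route a query to a service class by submitter & estimated optimizer cost."""
    if user_group == "executives":
        return "executive_dashboards"
    if estimated_cost > 1_000_000:  # large scans go to the low-priority class
        return "batch_reports"
    return "analyst_queries"

for group, cost in [("executives", 10), ("analysts", 5_000_000), ("analysts", 500)]:
    cls = classify(group, cost)
    print(f"{group}/{cost} -> {cls} (cpu share {CLASSES[cls]['cpu_share']}%)")
```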

• Automation. This is a longstanding DB2 strength that contributes to higher levels of DBA productivity as well as performance (system parameters may be adjusted more rapidly and efficiently than with manual techniques) and availability (risks of human error are reduced).

DB2 automation builds upon core design features in a number of areas – including server and memory architecture, parameter-setting and storage management – that allow DBAs to perform tasks with fewer, simpler actions, in less time than their Oracle counterparts.

Core DB2 strengths are reinforced by implementation of IBM autonomic technologies. Autonomic computing, meaning the application of artificial intelligence to system administration and optimization tasks, has been a major IBM development focus for more than a decade. The company is the recognized industry leader in this area.

In DB2 10, autonomic features include those shown in figure 13.

FEATURE   FUNCTION  

Automatic  storage  management  

Monitors  &  automatically  creates,  extends  &  adds  storage  device  containers  to  support  database  growth  across  disk  &  file  systems.  Redefines  storage  paths  as  needed.  

Self  tuning  memory  manager  (STMM)  

Tunes database memory settings in real time during run time to optimize performance for one or multiple concurrent databases. Optimizes caching for current workloads up to 60 times per hour. Increases DBA productivity, improves performance up to 10 times & reduces risks of bottlenecks.

Automatic  maintenance  

Automatic/real-­‐time  statistics  collection  

Determines whether statistics need to be updated, & initiates collection for tables where this is the case. Real-time statistics may be generated in under five seconds if required.

Automatic  reorganization  

Evaluates  updated  tables  or  indexes,  determines  whether  reorganization  is  required,  &  schedules  such  operations  during  predefined  periods.  

Automatic  backup   Performs  online  &/or  offline  backups  according  to  predefined  schedules  &  criteria.  

Workload  Manager   Enables  fine-­‐grain  resource  allocation,  monitoring  &  management  of  workloads  based  on  service  classes,  workload  characteristics,  elapsed  time,  time  of  day  &  other  criteria.  Integrates  with  AIX,  Linux  &  other  external  workload  management  facilities.  

Silent  Installation   Allows  DB2  installation  based  on  application-­‐specific  setup  &  configuration  information;  i.e.,  no  user  input  is  required.  Enables  rapid  startup  &  minimizes  installation  footprint.  

Figure 13: DB2 10 Autonomic Features

Users have found the self-tuning memory manager (STMM) to be particularly valuable. STMM automatically adjusts memory configuration parameters and buffer pool sizes as workload characteristics change. This feature, which represents one of the industry’s most advanced self-tuning technologies, maintains continuous performance optimization without DBA intervention.

A key IBM focus for DB2 automation has been on administration tasks for databases that undergo frequent change in size, schemas and underlying data structures. DBA productivity gains may be significantly larger than for more stable environments.

• Security features. DB2 10 implements a new IBM technology, Row and Column-level Access Control (RCAC), which combines multiple security mechanisms for what is referred to as Fine-Grained Access Control (FGAC). User access may be restricted with higher granularity than with conventional techniques.

Oracle Database 11g also offers the ability to combine row and column-level security, although, again, the Oracle approach tends to require more administrative time and effort.

A broader DB2 10 differentiator should be cited. The impact of hundreds of superior IBM integration and optimization features is cumulative; i.e., in DB2 10, the whole is more than the sum of the parts.

This is not the case for Oracle. Numerous software overlays on, and add-ons to, legacy Oracle data structures tend both to reduce overall system efficiency and, in many cases, to impair rather than improve DBA productivity.

HA Clusters

In this area, the principal IBM high-end offering is DB2 pureScale. A more limited offering, DB2 High Availability Disaster Recovery (HADR), is widely employed for less business-critical systems. In practice, most user requirements can be met with HADR.

Both solutions have been enhanced in DB2 10. HADR, for example, now supports use of multiple “hot” standby servers, and allows delays to be set in the failover process to prevent replication of problems to standbys. pureScale enhancements include tighter integration with the core DB2 10 engine, closer alignment with WLM and support for range (table) partitions.

The equivalent Oracle offerings are Real Application Clusters (RAC) and Active Data Guard respectively. DB2 pureScale, Oracle RAC and Active Data Guard are separately charged. HADR is included in DB2 10.

Oracle RAC and DB2 pureScale are positioned in three main roles, as enabling: (1) Tier 1 database failover and recovery in the event of a disastrous unplanned outage; (2) transparent failover between systems in order to perform tasks that would otherwise require planned outages; and (3) realization of high levels of scalability by spreading database images across multiple physical systems.

There are, however, a number of differences in architecture and technology between the two solutions.

DB2 pureScale integrates a number of IBM technology components that are generally regarded as industry leading. The core architecture is derived from Parallel Sysplex Data Sharing, which has supported the world’s largest mainframe-based business-critical systems for close to 20 years.

Automation of failover and recovery processes is provided by Tivoli System Automation, another mainframe-derived capability that has been reinforced with autonomic technologies.

The DB2 pureScale cluster file system is IBM General Parallel File System (GPFS), which has been widely deployed for high-performance commercial, as well as scientific and technical, applications for more than a decade. The latest version of GPFS, 3.5, was introduced in April 2012.


GPFS has demonstrated near-linear scalability in extremely large configurations – installations with 1,000+ nodes are common, and the largest exceed 5,000 nodes. Storage volumes of hundreds of terabytes are routine, and petabyte-scale systems have been demonstrated. Installations of the RAC equivalent, Oracle Cluster File System, are typically a great deal smaller.

There are a number of implications. For example, Oracle RAC failover is slower and more complex than DB2 pureScale failover, and tends to require more manual intervention by administrators.

Cluster overhead is also significantly higher for RAC than for DB2 pureScale. RAC clusters can typically be expanded to six to eight systems, depending on workloads, before performance degradation becomes unacceptable. In comparison, IBM tests demonstrate significantly lower degradation as cluster size increases. Figure 14 shows these results.

Number of Systems    Degradation
2-64                 < 5%
88                   10%
112                  11%
128                  16%

Figure 14: DB2 pureScale Cluster Size Relative to Performance
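One way to read these degradation figures is as effective cluster capacity: capacity ≈ systems × (1 − degradation). The minimal calculation below uses the Figure 14 data, taking the "< 5%" entry at its 5 percent bound:

```python
# Effective cluster capacity implied by the Figure 14 degradation data:
# capacity ~= number of systems x (1 - degradation).
# The "< 5%" entry for 2-64 systems is taken at its 5% upper bound.
degradation = {64: 0.05, 88: 0.10, 112: 0.11, 128: 0.16}

def effective_capacity(systems):
    return systems * (1 - degradation[systems])

for n in sorted(degradation):
    print(n, round(effective_capacity(n), 1))
# A 128-system cluster at 16% degradation still delivers roughly
# 107.5 systems' worth of throughput.
```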

A further differentiator is that DB2 applications can be migrated “as is” to pureScale environments. In contrast, transitioning Oracle Database applications to RAC involves extensive modifications, and a great deal of testing is normally required.

Oracle Compatibility

A further DB2 10 characteristic should be highlighted. IBM has placed a major emphasis on Oracle compatibility in order to minimize migration costs and difficulties. DB2 9.7 incorporated technologies from EnterpriseDB, an industry leader in this area, and DB2 10 expands compatibility features.

Key DB2 features include native support for Oracle Procedural Language/Structured Query Language (PL/SQL) and the Oracle SQL dialect, along with a wide range of code, tools and functions commonly employed by Oracle developers. Examples are listed in Figure 15.

Concurrency control              JDBC client + extensions
PL/SQL, SQL dialect              OCI client applications
PL/SQL packages                  SQL*Plus scripts
Built-in packages                Oracle Forms (automated conversion)

Figure 15: Commonly Used Oracle Features Supported by DB2 10

Organizations that migrated Oracle applications to DB2 9.7 routinely found that more than 95 percent of code remained unchanged, and with DB2 10 the proportion appears to be closer to 98 percent. Few or no changes to existing Oracle development tools and skills are required, and transition periods are relatively short – in some cases, less than two weeks – and non-disruptive.

A further benefit is that, in DB2 10, Oracle compatibility functions are built into the core engine rather than implemented as a software overlay. Thus, organizations experience the same levels of performance as native DB2 10 users.


Oracle Solutions

In addition to Oracle Database 11g Enterprise Edition capabilities, the company offers other solutions that may be employed in a data warehouse role. These include the following:

• Oracle Exadata Database Machine. This has been aggressively promoted for data warehousing. According to the company, around two-thirds of the 1,000+ Exadata Database Machines installed to date are employed in this role. (The remainder mainly support Oracle database consolidations and Oracle E-Business Suite, Oracle Retail and other transaction processing applications.)

Exadata Database Machine is an overlay of new technology on legacy structures. The system combines a conventional RAC database cluster with Exadata Storage Servers designed to handle I/O-intensive processing. This approach is highly inefficient in its use of system resources.

The I/O-intensive components of Exadata Database Machine architecture are designed primarily to execute high-volume sequential table scans generated by applications that are structurally simple, but require a great deal of processing power; e.g., identifying and collating specific variables in large volumes of records.

This is the case for the three key technologies presented by Oracle as critical differentiators for the system: Smart Scan, Exadata Hybrid Columnar Compression (EHCC) and Smart Flash Cache. Performance is weaker, however, for other types of application, and the system does not support complex mixed workloads well.

EHCC, for example, is designed to compress large tables, and is most effective when these tables are processed sequentially. Oracle has claimed compression rates of up to 70 times. Among users, rates of two to three times have been reported. EHCC, however, does not have a similar effect for other data structures and types of workload.

Like other high-end data warehouse appliances, Oracle Exadata Database Machine is expensive. Organizations must invest in Exadata Storage Server and Database Machine hardware and software, as well as licenses and support for Oracle Database 11g Enterprise Edition, Partitioning, Advanced Compression, Real Application Clusters, and Diagnostics and Tuning Packs.
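A side note on the compression claims above: "percent compression" and "N-times compression" are different scales, and conflating them distorts comparisons. A 70 percent space saving corresponds to a factor of 1/(1 − 0.70) ≈ 3.3x, while the two-to-three-times rates reported by EHCC users correspond to savings of 50 to 67 percent. A minimal conversion:

```python
# Converting between "percent space savings" and "N-times compression".
def factor_from_savings(savings):
    # e.g. 70% savings -> stored data is 30% of original -> 3.33x
    return 1.0 / (1.0 - savings)

def savings_from_factor(factor):
    # e.g. 3x compression -> stored data is 1/3 of original -> 66.7% savings
    return 1.0 - 1.0 / factor

print(round(factor_from_savings(0.70), 2))                              # → 3.33
print(savings_from_factor(2), round(savings_from_factor(3), 2))         # → 0.5 0.67
```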

• Oracle Big Data solutions. These include the internally developed Oracle NoSQL Database as well as a set of software-based Big Data Connectors enabling transfer of Hadoop Distributed File System, MapReduce and R applications to Oracle Database 11g (R is an open source statistics language commonly employed in Big Data environments).

In addition, the company offers the Oracle Big Data Appliance, which bundles these software components and the Cloudera distribution of Hadoop with Exadata-like configurations of Sun x86-based server hardware, and dedicated storage servers. The Oracle Enterprise Linux operating system is employed.

The Oracle Big Data Appliance forms part of the company’s broader portfolio of Engineered Systems (appliances) and is promoted by the company for use alongside Exadata Database Machine and Exalogic Elastic Cloud, a similar appliance bundled with WebLogic Server and other Oracle middleware offerings.

Oracle offers a variety of other BI-related solutions, including Essbase OLAP; Hyperion Enterprise Performance Management (EPM); Real-Time Decisions (RTD); Oracle Exalytics In-Memory Machine, an appliance incorporating the company's TimesTen in-memory database technology; and Oracle Business Intelligence Suite Enterprise Edition Plus (OBIEE), a BI suite incorporating Siebel and Hyperion BI tools.


DETAILED DATA

Calculations presented here include initial license and three-year support costs, and are based on vendor U.S. list prices current when this report was prepared.

Per processor pricing for Oracle Database 11g Enterprise Edition stacks was calculated using Oracle core factors of 0.5 for Intel and 1.0 for IBM POWER7 processors. Per user calculations allow for the Oracle pricing minimum of 25 users per core.
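These pricing mechanics can be sketched in a few lines. The 22 percent annual support rate below is not stated in this report but is consistent with its figures — for example, the $2,004,800 per processor license for the 2/16 x POWER7 configuration grows to $3,327,968 over three years:

```python
import math

CORE_FACTOR = {"intel": 0.5, "power7": 1.0}   # Oracle processor core factors
USERS_PER_CORE_MINIMUM = 25                   # Oracle named-user minimum
SUPPORT_RATE = 0.22   # assumed annual support rate, consistent with the figures

def processor_licenses(cores, arch):
    # Licensable processors = physical cores x core factor, rounded up.
    return math.ceil(cores * CORE_FACTOR[arch])

def minimum_named_users(cores):
    return cores * USERS_PER_CORE_MINIMUM

def three_year_cost(license_cost):
    # Initial license plus three years of support.
    return license_cost * (1 + 3 * SUPPORT_RATE)

# 2/16 x POWER7 configuration from Figure 16:
print(processor_licenses(16, "power7"))   # → 16
print(minimum_named_users(16))            # → 400
print(round(three_year_cost(2_004_800)))  # → 3327968
```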

InfoSphere Warehouse 10 costs were calculated using IBM pricing per terabyte of user data. Where appropriate, calculations were rounded to the next largest 1 TB increment; e.g., a value of 0.3 TB of compressed user data was rounded to 1 TB, while 7.5 TB was rounded to 8 TB for pricing purposes.
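The IBM rule is simple enough to state in code. The sketch below uses an illustrative per-terabyte list price of $82,900 — the rate implied by the Figure 16 Enterprise Edition totals, not a price quoted in this report — and reproduces the rounding behavior described above (0.3 TB bills as 1 TB, 7.5 TB as 8 TB):

```python
import math

PRICE_PER_TB = 82_900   # illustrative: rate implied by Figure 16 EE totals
COMPRESSION = 0.70      # the 70% space savings assumed in the compressed rows

def billable_terabytes(user_data_tb, compressed=False):
    if compressed:
        # round to avoid float artifacts before taking the ceiling
        user_data_tb = round(user_data_tb * (1 - COMPRESSION), 6)
    # Pricing is per started terabyte of user data: round up, 1 TB minimum.
    return max(1, math.ceil(user_data_tb))

def license_cost(user_data_tb, compressed=False):
    return billable_terabytes(user_data_tb, compressed) * PRICE_PER_TB

print(billable_terabytes(0.3))                  # → 1 (0.3 TB bills as 1 TB)
print(billable_terabytes(25, compressed=True))  # → 8 (25 x 0.3 = 7.5, rounds up)
print(license_cost(25, compressed=True))        # → 663200, matching Figure 16
```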

Three sets of detailed cost data are presented below:

1. Figure 16 compares costs for InfoSphere Warehouse 10 Enterprise Edition and equivalent Oracle Database 11g Enterprise Edition stacks.

2. Figure 17 compares costs for InfoSphere Warehouse 10 Advanced Enterprise Edition and equivalent Oracle Database 11g Enterprise Edition stacks.

3. Figure 18 compares costs for InfoSphere Warehouse 10 Departmental Edition and equivalent Oracle Database 11g Enterprise Edition stacks.

Calculations are for vendor list prices current when this report was prepared, and do not include IBM DB2 pureScale or Oracle RAC.

DATA WAREHOUSE
User data (uncompressed)         10 TB           25 TB           50 TB
Number of users                  1,000           3,000           5,000
Number of processors             2/16 x POWER7   4/32 x POWER7   8/64 x POWER7

ORACLE DATABASE 11g ENTERPRISE EDITION
Minimum number of users          400             800             1,600
Per user license cost            $2,510,000      $7,530,000      $12,550,000
3-year cost including support    $4,166,600      $12,499,800     $20,833,000
Per processor license cost       $2,004,800      $4,009,600      $8,019,200
3-year cost including support    $3,327,968      $6,655,936      $13,311,872

IBM INFOSPHERE WAREHOUSE 10 ENTERPRISE EDITION
Uncompressed data:
Per terabyte license cost        $829,000        $2,072,500      $4,145,000
3-year cost including support    $1,326,400      $3,316,000      $6,632,000
70% data compression:
Per terabyte license cost        $248,700        $663,200        $1,243,500
3-year cost including support    $379,920        $1,061,120      $1,989,600

Figure 16: Cost Calculations for InfoSphere Warehouse 10 Enterprise Edition and Oracle Database 11g Enterprise Edition – 10 TB to 50 TB Installations


DATA WAREHOUSE
User data (uncompressed)         10 TB           25 TB           50 TB
Number of users                  1,000           3,000           5,000
Number of processors             2/16 x POWER7   4/32 x POWER7   8/64 x POWER7

ORACLE DATABASE 11g ENTERPRISE EDITION with Database Lifecycle Management Pack
Minimum number of users          400             800             1,600
Per user license cost            $2,750,000      $8,250,000      $13,750,000
3-year cost including support    $4,565,000      $13,695,000     $22,825,000
Per processor license cost       $2,196,800      $4,393,600      $8,787,200
3-year cost including support    $3,646,688      $7,293,376      $14,586,752

IBM INFOSPHERE WAREHOUSE 10 ADVANCED ENTERPRISE EDITION
Uncompressed data:
Per terabyte license cost        $1,180,000      $2,950,000      $5,900,000
3-year cost including support    $1,888,000      $4,720,000      $9,440,000
70% data compression:
Per terabyte license cost        $354,000        $944,000        $1,770,000
3-year cost including support    $566,400        $1,510,400      $2,832,000

Figure 17: Cost Calculations for InfoSphere Warehouse 10 Advanced Enterprise Edition and Oracle Database 11g Enterprise Edition – 10 TB to 50 TB Installations

DATA WAREHOUSE
User data (uncompressed)         1 TB             2 TB             5 TB
Number of users                  60               150              400
Number of processors             2/8 x Intel E5   2/16 x Intel E5  2/20 x Intel E7

ORACLE DATABASE 11g ENTERPRISE EDITION with Database Lifecycle Management Pack
Minimum number of users          200              400              500
Per user license cost            $572,000         $1,144,000       $1,430,000
3-year cost including support    $949,520         $1,899,040       $2,373,800
Per processor license cost       $619,200         $1,238,400       $1,548,000
3-year cost including support    $1,027,872       $2,055,744       $2,569,680

IBM INFOSPHERE WAREHOUSE 10 DEPARTMENTAL EDITION
Uncompressed data:
Per terabyte license cost        $39,700          $79,400          $198,500
3-year cost including support    $63,520          $127,040         $317,600
70% data compression:
Per terabyte license cost        $39,700          $39,700          $79,400
3-year cost including support    $63,520          $63,520          $127,040

Figure 18: Cost Calculations for InfoSphere Warehouse 10 Departmental Edition and Oracle Database 11g Enterprise Edition – 1 TB to 5 TB Installations

ABOUT THE INTERNATIONAL TECHNOLOGY GROUP

ITG sharpens your awareness of what’s happening and your competitive edge

. . . this could affect your future growth and profit prospects

International Technology Group (ITG), established in 1983, is an independent research and management consulting firm specializing in information technology (IT) investment strategy, cost/benefit metrics, infrastructure studies, deployment tactics, business alignment and financial analysis.

ITG was an early innovator and pioneer in developing total cost of ownership (TCO) and return on investment (ROI) processes and methodologies. In 2004, the firm received a Decade of Education Award from the Information Technology Financial Management Association (ITFMA), the leading professional association dedicated to education and advancement of financial management practices in end-user IT organizations.

The firm has undertaken more than 120 major consulting projects, released more than 250 management reports and white papers, and delivered more than 1,800 briefings and presentations to individual clients, user groups, industry conferences and seminars throughout the world.

Client services are designed to provide factual data and reliable documentation to assist in the decision-making process. Information provided establishes the basis for developing tactical and strategic plans. Important developments are analyzed and practical guidance is offered on the most effective ways to respond to changes that may impact complex IT deployment agendas.

A broad range of services is offered, furnishing clients with the information necessary to complement their internal capabilities and resources. Customized client programs involve various combinations of the following deliverables:

Status Reports – In-depth studies of important issues

Management Briefs – Detailed analysis of significant developments

Management Briefings – Periodic interactive meetings with management

Executive Presentations – Scheduled strategic presentations for decision-makers

Email Communications – Timely replies to informational requests

Telephone Consultation – Immediate response to informational needs

Clients include a cross section of IT end users in the private and public sectors representing multinational corporations, industrial companies, financial institutions, service organizations, educational institutions, federal and state government agencies as well as IT system suppliers, software vendors and service firms. Federal government clients have included agencies within the Department of Defense (e.g., DISA), Department of Transportation (e.g., FAA) and Department of Treasury (e.g., US Mint).

International Technology Group 609 Pacific Avenue, Suite 102 Santa Cruz, California 95060-4406 Telephone: + 831-427-9260 Email: [email protected] Website: ITGforInfo.com