Extreme Performance with Oracle Data Warehousing
Andreas Katsaris, Data Warehouse and BI Practice Manager, Arisant LLC
Oracle #1 for Data Warehousing
Source: IDC, July 2009, "Worldwide Data Warehouse Management Tools 2008 Vendor Shares"
Gartner Magic Quadrant for BI Platforms (2005, 2008, 2009)
Market position: recognized as a leader since OWB 10.2
• 10,000+ companies rely on OWB
• Optimized for Oracle environments
Oracle Data Warehousing: Complete, Open and Integrated
• Standard components
• Certified configurations
• Comprehensive security
• Higher availability
• Easier to manage
• Lower cost of ownership
The integrated stack: BI Applications, BI Tools, Data Models, Database, Operating System, Smart Storage
Distributed Data Marts and Servers: an Expensive Data Warehouse Architecture
Separate systems for Data Marts, Data Mining, Online Analytics and ETL
Consolidate onto a Data Warehouse: a Single Source of Truth on Low-Cost Servers and Storage
Oracle Database 11g runs the Data Marts, Data Mining, Online Analytics and ETL on one platform
Choice of Data Warehouse Solutions
• Database Machine: a complete system, including software, servers, networking and storage
• Custom Solutions: flexibility for the most demanding data warehouse
• Reference Configurations: best-practice configurations for data warehousing
Drastically Simplified Deployments
• Database Machine eliminates the complexity of deploying database systems
– No more months of configuration, troubleshooting and tuning
• Database Machine is ready on day one
– Pre-built, tested, standard, supportable configuration
– Runs existing applications unchanged
• Extreme performance out of the box: months reduced to days
http://www.oracle.com/technology/products/bi/db/dbmachine/ds_db_machine.pdf
Best Data Warehouse Machine
• Massively parallel, high-volume hardware to quickly process vast amounts of data
– Exadata runs data-intensive processing directly in storage
• Most complete analytic capabilities
– OLAP, statistics, spatial, data mining, real-time transactional ETL, efficient point queries
• Powerful warehouse-specific optimizations
– Flexible partitioning, bitmap indexing, join indexing, materialized views, result cache
• Dramatic new warehousing capabilities
Sun Oracle Database Machine Hardware Improvements
• Same architecture as the previous Database Machine
– Same number and type of servers, CPUs and disks
• Plus flash storage!
Latest technologies:
• 80% faster CPUs (Xeon 5500 "Nehalem")
• 33% more SAS disk capacity (600 GB SAS disks)
• 100% more SATA disk capacity (2 TB SATA disks)
• 50% faster disk throughput (6 Gb SAS links)
• 100% faster networking (40 Gb InfiniBand)
• 125% more memory (72 GB per DB node)
• 200% faster memory (DDR3 DRAM)
• 100% more Ethernet connectivity (4 Ethernet links per DB node)
Exadata Database Processing in Storage
• Exadata storage servers implement data-intensive processing in storage
– Row filtering based on "where" predicate
– Column filtering
– Join filtering
– Incremental backup filtering
– Storage Indexing
– Scans on encrypted data
– Data Mining model scoring
• 10x reduction in data sent to DB servers is common
• No application changes needed
– Processing is automatic and transparent
– Even if a cell or disk fails during a query
Simple Query Example: "What were my sales yesterday?"

select sum(sales) from sales where sales_date = '24-Sept';

• Oracle Database Grid: the optimizer chooses which partitions and indexes to access
• Exadata Storage Grid: scans the compressed blocks in those partitions/indexes and retrieves only the sales amounts for Sept 24
• 10 TB is scanned in storage, but only 1 GB is returned to the database servers, which compute the final SUM
Flash
• Flash storage more than doubles scan throughput
– 50 GB/sec
• Smart
– Knows when to avoid caching based on object reusability and size
– Accepts user directives at table, index and segment level
• Combined with columnar compression
– Up to 50 TB of data fits in flash
– Queries on compressed data run at up to 500 GB/sec
[Chart: query throughput on uncompressed data, comparing the Hitachi USP V, Teradata 2550 and Netezza TwinFin 12 against the Sun Oracle Database Machine with and without Flash: up to 50x higher throughput]
Exadata Hybrid Columnar Compression
• Data is stored by column and then compressed
– Decompression is offloaded to Exadata, eliminating CPU overhead on the database servers
• Query mode for data warehousing
– Optimized for speed
– 10x compression ratio is typical
– Scans improve proportionally
• Archival mode for infrequently accessed data
– Optimized to reduce space
– 15x compression is typical
– Up to 50x for some data
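As a sketch of how the two modes above are selected at table creation time (the table and column names are hypothetical, and HCC itself requires Exadata storage):

```sql
-- Hypothetical tables; the COMPRESS FOR clause picks the HCC mode (Exadata only)
CREATE TABLE sales_query_hcc
  COMPRESS FOR QUERY HIGH           -- query mode: optimized for warehouse scan speed
  AS SELECT * FROM sales;

CREATE TABLE sales_archive_hcc
  COMPRESS FOR ARCHIVE HIGH         -- archival mode: optimized to reduce space
  AS SELECT * FROM sales WHERE sales_date < DATE '2004-01-01';
```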
Exadata Storage Index: Transparent I/O Elimination with No Overhead
• Exadata Storage Indexes maintain summary information about table data in memory
– Store MIN and MAX values of columns
– Typically one index entry for every MB of disk
• Eliminates disk I/Os if MIN and MAX can never match the "where" clause of a query
• Completely automatic and transparent

Example: a table with columns A, B, C, D is stored in two regions. In the first region, column B holds 1, 3, 5 (Min B = 1, Max B = 5); in the second, 5, 8, 3 (Min B = 3, Max B = 8). For "select * from table where B < 2", only the first region can match, so the second is never read.
Benefits Multiply
• 10 TB of user data requires 10 TB of I/O
• 1 TB with compression
• 100 GB with partition pruning
• 20 GB with Storage Indexes
• 5 GB with Smart Scan on memory or flash
• Subsecond response on the Database Machine
Data is 10x smaller, scans are 2000x faster
Database Machine Success
• Database Machine is succeeding in all geographies and industries against every competitor
Oracle's Data Integration Strategy: Pervasive Data Integration
• Embed data integration within Oracle Database
– Integrated, optimized, and best for Oracle Database
– Easiest way to load external information into Oracle Database
• Provide comprehensive data integration
– Comprehensive heterogeneous technology foundation
– Integrated runtime, data management tools and administration
– Best of breed: significant architectural differentiators vs. competitors
– "Hot pluggable": broad support of sources & packaged applications
• Pre-integrate solutions for the Oracle portfolio
– Make data integration pervasive with lower cost & complexity
ODI Enterprise Edition: Bundling E-LT Forces
• E-LT transformation vs. E-T-L
• Real-time Change Data Capture
• Oracle Data Integrator & Oracle Warehouse Builder
• High-speed batch data movement
• Set-based data transformations
OLTP, application and legacy sources feed the data warehouse and planning system.
OWB 11g Release 2: Heterogeneous connectivity with high-performance native code
• Extensible heterogeneous connectivity
– Connect to any JDBC, ODBC or gateway-enabled data store
– Enhanced support for Change Data Capture and real-time extracts
– Advanced ERP/CRM/MDM connectors
– Call any web service as a data provider; publish extracts as web services
• High-performance native code for any platform
– Use extensible code templates to create native code
– Leverage the database engine for best performance
– Use native un-loaders/loaders for extremely fast bulk loading
• Enhanced developer productivity
– New declarative user interface based on the Fusion Client Platform
– Rich metadata support for end-to-end data lineage and impact analysis
– Oracle BI EE metadata generation
– Unified development team
High-Level Roadmap: a unified platform that is a superset of OWB and ODI
• OWB/ODI investments are fully protected
• No forced migrations
• Natural upgrade path
• No regressions in functionality
• Training and support continue for both product releases
Timeline: OWB 10gR1, 10gR2, 11gR1, 11gR2 and ODI 10gR3, 11gR1, 11gR2 converge on a unified release with common pricing ("CY 2012")
Oracle Runs the Largest Databases
• Meteorological research
– 220+ terabyte Oracle database
– World's largest database on Linux
• Website personalization
– 500,000,000 unique users
– 200 terabyte data warehouse
How are these companies successful using Oracle?

Q: What was the total number of umbrellas sold in Boston during the month of May 2008?

Star schema: a central Sales fact table joined to the Customers, Times, Products and Channel dimensions.

Star Query Optimization: data-warehouse-specific access methods
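A star query for the question above might look like the following sketch; the fact and dimension table and column names are assumed, not taken from the slides:

```sql
-- Hypothetical star schema: SALES fact joined to three of the four dimensions
SELECT SUM(s.quantity_sold)
FROM   sales     s
JOIN   products  p ON s.prod_id = p.prod_id
JOIN   customers c ON s.cust_id = c.cust_id
JOIN   times     t ON s.time_id = t.time_id
WHERE  p.prod_name = 'Umbrella'
AND    c.cust_city = 'Boston'
AND    t.calendar_month_desc = '2008-05';
```

With STAR_TRANSFORMATION_ENABLED = TRUE and bitmap indexes on the fact table's foreign-key columns, the optimizer can rewrite such a query as a star transformation, filtering the fact table by combining the bitmaps before joining back to the dimensions.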
Oracle Partitioning
• Optimized performance
– Comprehensive partitioning strategies to address business problems
• One consistent way to manage all your data
– Not just for data warehouses and high-end OLTP
– New partitioning features bring partitioning to every application
• Reduced total cost of ownership
– Place less-used data on lower-cost storage
• Proven functionality in its 8th generation
– Experience comes with age and customer usage
The Concept of Partitioning: Simple yet powerful
• Large table: difficult to manage
• Partitioned: divide and conquer; easier to manage, improved performance
• Composite partitioned: higher performance, more flexibility to match business needs
Example: a SALES table partitioned by month (Jan, Feb, ...), then composite-partitioned by month and region (Europe, USA)
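The month-by-region SALES example could be declared as a composite range-list table; the column names and partition bounds here are illustrative, not from the slides:

```sql
-- Hypothetical composite-partitioned SALES table: range by month, list by region
CREATE TABLE sales (
  sales_date   DATE,
  region       VARCHAR2(10),
  sales_amount NUMBER
)
PARTITION BY RANGE (sales_date)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE (
  SUBPARTITION europe VALUES ('EUROPE'),
  SUBPARTITION usa    VALUES ('USA')
)
( PARTITION jan VALUES LESS THAN (TO_DATE('01-02-2009','DD-MM-YYYY')),
  PARTITION feb VALUES LESS THAN (TO_DATE('01-03-2009','DD-MM-YYYY')) );
```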
Partition for Performance: Partition Pruning

Q: What was the total sales amount for May 20 and May 21, 2009?

SELECT SUM(sales_amount)
FROM   sales
WHERE  sales_date >= TO_DATE('05/20/2009','MM/DD/YYYY')
AND    sales_date <  TO_DATE('05/22/2009','MM/DD/YYYY');

The SALES table is partitioned by day (5/17, 5/18, 5/19, 5/20, 5/21, 5/22, ...). Only the two relevant partitions (5/20 and 5/21) are read. Note the half-open range: an inclusive BETWEEN ending at midnight on 5/22 would also touch the 5/22 partition.
Partition to Manage Data Growth
A SALES table holding 7 years of data (2003 to 2009): the 5% active data (2008 to 2009) stays on a high-end storage tier, while the 95% less active data sits on a low-end storage tier costing 2-3x less per terabyte
Partitioning: 11g Enhancements
• Partition Advisor
• New composite combinations
– list/range, range/range, list/hash, list/list
• Interval partitioning
– Automatic creation of range-based partitions
• REF partitioning
– Partition a detail table based on the master-table key
• Virtual-column based partitioning
– Partition based on an expression
Interval Partitioning
• Minimizes periodic partition maintenance (no need to pre-create new partitions)
• Partition segments are allocated as soon as new data arrives
• Local indexes are created and maintained as well
• Requires at least one range partition
• The range key value determines the range high point
• The partitioning key can only be a single column of DATE or NUMBER datatype

CREATE TABLE sales (...)
PARTITION BY RANGE (sales_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
( PARTITION p1 VALUES LESS THAN (TO_DATE('1-2-2006', 'DD-MM-YYYY')) );
ILM: Information Lifecycle Management
• Why bother?
– Compliance
– Performance
– Cost
– Data maintenance
• Typical breakdown: 5% most active, 35% less active, 60% historical
ILM: Information Lifecycle Management
• Implement different tiers of storage
• Consider the Oracle ILM Assistant (free!)
– Leverages Oracle Partitioning
– Uses lifecycle definitions
– Calculates storage costs & savings
– Simulates the impact of partitioning on a table
– Advises how to partition a table
– Generates scripts to move data when required
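Tiering with partitioning typically comes down to statements like the following sketch; the partition, index and tablespace names are assumptions for illustration:

```sql
-- Hypothetical: relocate a historical partition to a cheaper storage tier
ALTER TABLE sales MOVE PARTITION sales_2003
  TABLESPACE low_cost_ts
  COMPRESS;                        -- optionally compress the cold data as it moves

-- Moving a partition invalidates its local index partitions, so rebuild them
ALTER INDEX sales_date_ix REBUILD PARTITION sales_2003;
```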
In-Memory Parallel Execution

Challenge
• Traditionally, parallel execution takes advantage of the I/O capacity of a system
• Disk speeds are not keeping up with Moore's Law, while CPU and memory speeds are

Solution
• In-Memory Parallel Execution harnesses the memory capacity of the entire system
• An affinity algorithm places fragments of an object in memory on different RAC nodes
In-Memory Parallel Execution: how a SQL statement is handled
• Determine the size of the table being scanned
• If the table is extremely small, read it into the buffer cache on any node
• If the table is a good candidate for In-Memory Parallel Execution, fragments of the table are read into each node's buffer cache, and only parallel servers on the same RAC node access each fragment
• If the table is extremely large, always use direct reads from disk
In-Memory Parallel Execution
• A single Database Machine has over 400 GB of memory usable for caching
• Database release 11.2 introduces parallel query processing on memory-cached data
– Harnesses the memory capacity of the entire database cluster for queries
– Foundation for the world-record 1 TB TPC-H result
• Exadata Hybrid Columnar Compression enables multi-terabyte tables or partitions to be cached in memory

QphH, 1 TB TPC-H: ParAccel 315,842; Exasol 1,018,321; Oracle 1,166,976
Faster than specialized in-memory startups; memory has 100x more bandwidth than disk

Source: Transaction Processing Council, as of 9/14/2009: Oracle on HP Bladesystem c-Class 128P RAC, 1,166,976 QphH@1000GB, $5.42/QphH@1000GB, available 12/1/09. Exasol on PRIMERGY RX300 S4, 1,018,321 QphH@1000GB, $1.18/QphH@1000GB, available 08/01/08. ParAccel on SunFire X4100, 315,842 QphH@1000GB, $4.57/QphH@1000GB, available 10/29/07.
Automatic Degree of Parallelism (Auto DOP)
• The optimizer derives the DOP for a statement based on the resource requirements of all scan operations
• Applies to all types of statements: query, DML or DDL
• Explain plan has been enhanced to show the DOP selected
• SQL Tune now uses Auto DOP to recommend parallelism
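In 11.2, Auto DOP (together with statement queuing and In-Memory Parallel Execution) is controlled by a single initialization parameter; this is a sketch of switching it on at the session level:

```sql
-- AUTO: the optimizer picks the DOP per statement, and also enables
-- parallel statement queuing and In-Memory Parallel Execution
ALTER SESSION SET parallel_degree_policy = AUTO;

-- MANUAL (the default) reverts to pre-11.2 behavior;
-- LIMITED applies Auto DOP only to objects decorated with PARALLEL
ALTER SESSION SET parallel_degree_policy = MANUAL;
```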
Parallel Statement Queuing
• A statement is parsed and Oracle automatically determines its DOP
• If enough parallel servers are available, the statement executes immediately
• If not, the statement is placed on a FIFO queue (e.g. statements requesting DOPs of 8, 16, 32, 64 and 128)
• When the required number of parallel servers becomes available, the first statement on the queue is dequeued and executed
Summary Management: Materialized Views
• Separate database object
– Stores pre-calculated and aggregated results
• Database supports sophisticated transparent query rewrite
– Queries don't change: they run against the base tables
– Join-backs, additional aggregations etc. are supported
• Database supports incremental fast refresh
– Based on the query definition
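A minimal sketch of an aggregate MV with query rewrite and fast refresh enabled; the SALES table and its columns are hypothetical, and fast refresh additionally requires a materialized view log on the base table:

```sql
-- Materialized view log: required for incremental (fast) refresh
CREATE MATERIALIZED VIEW LOG ON sales
  WITH ROWID, SEQUENCE (cust_id, sales_amount)
  INCLUDING NEW VALUES;

-- Summary stored as a separate object, eligible for transparent query rewrite;
-- COUNT(*) and COUNT(col) alongside SUM keep the aggregate fast-refreshable
CREATE MATERIALIZED VIEW sales_by_cust_mv
  BUILD IMMEDIATE
  REFRESH FAST ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT cust_id,
       SUM(sales_amount)   AS total_sales,
       COUNT(sales_amount) AS cnt_sales,
       COUNT(*)            AS cnt
FROM   sales
GROUP BY cust_id;
```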
Enhanced MV Refresh

[Chart: refresh time for a conventional insert into an aggregate MV, 11.2 vs. pre-11.2, as a percentage, for 20,000 to 750,000 inserted rows]

• On average, 30-40% better refresh performance in 11g Release 2
Summary Management Today
A SQL query against the relational star schema (Region, Date, Product, Channel) is transparently redirected by query rewrite to summaries: sales by region, sales by date, sales by product and sales by channel.
Cube Organized Materialized Views
The same SQL query against the star schema (Region, Date, Product, Channel) is answered by query rewrite against an OLAP cube, which holds all the summaries and is kept current by automatic refresh.
Cubes and Cube Organized MViews
• Stored in special database areas called Analytic Workspaces (which are stored in BLOBs)
• Manipulated via the Analytic Workspace Manager tool
• 11g: best of both worlds
– Rewrite and refresh features of regular MVs
– Performance benefits of OLAP cubes
– Exposes an OLAP cube as a relational object accessed via SQL
– The CUBE_TABLE function queries a cube using SQL
Cubes and Cube Organized MViews
SQL> explain plan for
  2  select * from table(cube_table('GLOBAL.PRICE_CUBE'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
Plan hash value: 3184667476

--------------------------------------------------------------------------------------
| Id | Operation               | Name       | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT        |            | 2000 |  195K |    29  (0)| 00:00:01 |
|  1 | CUBE SCAN PARTIAL OUTER | PRICE_CUBE | 2000 |  195K |    29  (0)| 00:00:01 |
--------------------------------------------------------------------------------------

• Query the cube as if it were a relational object
• Refresh using DBMS_MVIEW.REFRESH
Customer Case Study: Reporting Application
• Cube-organized MVs replaced table-based MVs
• Time to build aggregate data reduced by 89%
• Longer-running queries reduced from 5 minutes to 12 seconds
• Transparent access to the cube MV
• No changes to reporting applications
Advanced Compression
• Table compression
– Table scan performance: 2x faster
– Storage savings: 2x smaller
– DML performance: ~5% slower

CREATE TABLE sales_fact (...) COMPRESS FOR ALL OPERATIONS;

• RMAN compression
– ~40% faster than compressed backups in 10g
– Slightly better compression ratio than in 10g

RMAN> CONFIGURE COMPRESSION ALGORITHM 'zlib';
RMAN> BACKUP AS COMPRESSED BACKUPSET DATABASE PLUS ARCHIVELOG;

• Data Pump compression

expdp hr FULL=y DUMPFILE=dpump_dir:full.dmp COMPRESSION=ALL
SQL Query Result Cache
• Caches query results or PL/SQL function results
• DML/DDL against dependent database objects invalidates the cache
• Candidate queries
– Access many, many rows
– Return few rows (small result set)
– Are executed many times
SQL Query Result Cache
• result_cache_mode init.ora parameter
– AUTO: the optimizer uses repetitive executions to determine if a query will be cached
– MANUAL: use the /*+ RESULT_CACHE */ hint in queries
– FORCE: all results are stored in the cache
• result_cache_max_size init.ora parameter
– Default depends on other memory settings (0.25% of memory_target, 0.5% of sga_target, or 1% of shared_pool_size)
– 0 disables the result cache
– Never more than 75% of the shared pool (built-in restriction)
Q&A
• Thank you for attending
• If you have follow-up questions, I will be here for the rest of the day, or you can contact me by email at [email protected]