Oracle Database 11g for Data Warehousing and Business Intelligence
Daniel T. Liu, Principal Solution Architect
UTOUG Training Day
Date: Wednesday, March 30, 2011
Time: 09:00 AM – 11:00 AM
Venue: 9750 South 300 West, Sandy, Utah 84070
Agenda
1. Introduction
2. Oracle DW/BI Overview
3. Oracle Database Features for DW/BI
4. Summary
5. Q & A
Why a Data Warehouse?
Business Intelligence Tools
E2E - End-to-end view of the business
Single Truth - Consolidated, cleansed, high-quality system of record
360° View - Complete view of global customers
Enterprise Data Warehouse
User Dilemma
"Looks like you've got all the data—what's the holdup?" (cartoon by Davis Harbaugh, HBR January 2006)
Data Warehouse: Essential to BI
Transaction Systems
Business Intelligence Suite
Financial Performance Management Applications
Operational BI Applications
Enterprise Data Warehouse
Staging Area
Staging area for supporting ETL as needed
Atomic Data Layer
Performance Data Layer
• Dimensional view of data
• Application-specific performance structures
• Summary data / materialized views
• Supports specific end-users, tools, and applications
Physical Layers of a Data Warehouse
• Base data warehouse schema
• Atomic-level data, 3NF design
• Supports general end-user queries
• Data feeds to all dependent systems
Different Perspectives: Information Technology & Line of Business
• Information access
• Purchased by LOB
• Rapid business benefits
• Heterogeneous & ubiquitous access
• Information consumption represents value
Business Intelligence
• Information management
• Purchased by IT
• Robust data architecture
• Single version of the truth
• Information management represents value
Data Warehouse
Oracle’s Goal for Data Warehousing and Business Intelligence
• A comprehensive platform for data warehousing and business intelligence:
  – Industry-leading scalability and performance
  – Deeply integrated analytics
  – Embedded data quality
• All in a single platform running on a low-cost, highly available enterprise grid
OLTP (Online Transaction Processing) - Characteristics
• Small data volumes
• Read/write intensive
• Lots of users
• High transaction rates
• Transactions are relatively small in terms of data processed
• Significant write activity (insert/update/delete)
• Fast access via indexes
• High buffer cache usage
• Limited scope for parallelism
OLAP (Online Analytical Processing) - Characteristics
• Deals with historical data
• Large data volumes not uncommon
• Operations often require lots of resources:
  – Access many tables
  – Perform expensive calculations
  – Complex SQL
  – Long running (may be hours)
OLAP (Online Analytical Processing) - Characteristics
• Filled through a controlled process
  – Extract, Transform, Load (ETL)
  – Often multiple sources, including flat files
• Supports different pre-defined workloads, e.g.
  – Scheduled ETL
  – Scheduled reporting
  – Ad-hoc end-user access during business hours
• Peak usage of different workload patterns at different times
  – Systems have to be sized appropriately
IT Dilemma
"I am in a hurry, can I have my steak in 2 minutes?"
OLTP OLAP
Oracle’s Approach to Data Warehousing
Data Warehouse
Oracle Optimized Warehouse
Oracle Database EE
Scalability: Partitioning, RAC
Analytics: OLAP, Data Mining
Integration: OWB Enterprise ETL, OWB Data Quality, OWB Connectors, ODI, GoldenGate, Database Gateways
Security: VPD
Manageability: Diagnostics Pack, Tuning Pack
Oracle’s Approach to Data Warehousing
Data Warehouse
Oracle Optimized Warehouse
Oracle Database EE
Availability: Data Guard, RAC
Agility: RAC, ASM
Performance: Partitioning, Materialized Views, Parallel Query, SQL Query Result Cache
Hardware: Exadata
Storage: ASM, Advanced Compression
Oracle Business Intelligence - Complete, Open, Integrated
Data Warehouse / Data Mart
SAP, Oracle, PeopleSoft, Siebel, Custom Apps
Files, Excel, XML
Business Process, Essbase
Common Enterprise Information Model
Data Integration
Ad-hoc Analysis
Interactive Dashboards
Essbase
Reporting & Publishing
Proactive Detection and Alerts
Disconnected & Mobile Analytics
MS Office & Outlook Integration
OLTP & ODS Systems
BI Server
• Integrated security, user management, personalization
• Intelligent caching
• Intelligent request generation and optimized data access services
Oracle 11g Key Features for Data Warehousing and Business Intelligence
Broaden Oracle’s Technology Lead
New Features for BI/DW
• Manageability
  – Partition Advisor
  – Interval Partitioning
  – SQL Plan Management
  – Automatic SQL Tuning with self-learning capabilities
  – Enhanced optimizer statistics maintenance
  – Multi-column optimizer statistics
  – Parallel Query enhancements with RAC
  – ASM Fast Resync, fast VLDB startup, and other enhancements
• VLDB
  – Composite Range-Range
  – Composite List-Range
  – Composite List-List
  – Composite List-Hash
  – REF Partitioning
  – Virtual Column Partitioning
  – Compression enhancements
• Performance
  – Result Cache
• Data loading
  – Change data capture enhancements
  – Materialized view refresh enhancements
• SQL
  – SQL Pivot and Unpivot
  – Materialized view rewrite enhancements
• OLAP
  – Simplified application development
    • Fully declarative cube calculations
    • Cost-based aggregation
    • Simpler calculation definitions
  – Continued database integration
    • Cube metadata in the data dictionary
    • SQL optimizer enhanced for cubes
    • Fine-grained data security on cubes
• Data Mining
  – Simplified development and deployment of models
    • Supermodels: data preparation combined with the mining model
    • Additional packaged predictive analytics
    • Integration in the database dictionary
  – New algorithm: Generalized Linear Models
    • Encapsulates several widely used analytic methods
    • Multivariate linear regression; logistic regression
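The SQL Pivot feature listed above rotates rows into columns in a single clause. A minimal sketch, assuming a hypothetical SALES table with PRODUCT, QUARTER, and AMOUNT columns:

```sql
-- Hypothetical example of the 11g PIVOT clause:
-- one output column per quarter, populated by SUM(amount).
SELECT *
FROM  (SELECT product, quarter, amount FROM sales)
PIVOT (SUM(amount)
       FOR quarter IN ('Q1' AS q1, 'Q2' AS q2, 'Q3' AS q3, 'Q4' AS q4));
```

UNPIVOT performs the inverse transformation, turning the quarter columns back into rows.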
Large-Scale Data Warehouses - Feature Usage
Source: Oracle ST Survey
Key DW/BI Features
1. Partitioning
2. Advanced Compression
3. Invisible Index
4. Bitmap Index
5. Materialized View
6. SQL Query Result Cache
7. DW Security
Key DW/BI Features
8. Real Application Clusters
9. RAC and PQ
10. RAC and ETL
11. Automatic Storage Management
12. Information Lifecycle Management
13. Integrating Unstructured Data
14. DW Architecture Approach
15. Exadata
Oracle Partitioning in Oracle Database 11g
Oracle Partitioning - Ten Years of Development
Oracle Database 11g: Interval Partitioning, Partition Advisor, more composite choices, REF Partitioning, Virtual Column Partitioning
Oracle 10g R2: fast drop table, "multi-dimensional" pruning, 1M partitions per table
Oracle 10g: local index maintenance, global hash indexes
Oracle 9i R2: fast partition split, composite range-list partitioning
Oracle 9i: global index maintenance, list partitioning
Oracle 8i: merge operation, partition-wise joins, "dynamic" pruning, hash and composite range-hash partitioning
Oracle 8: basic maintenance operations (add, drop, exchange), "static" partition pruning, range partitioning, global range indexes
(Each release advanced manageability, performance, and core functionality.)
Oracle Partitioning Enhancements
• Complete the basic partitioning strategies (defines HOW data is going to be partitioned)
  – new composite partitioning methods
• Introduce partitioning extensions (defines WHAT controls the data placement)
  – enhance manageability and automation
  – virtual column based partitioning
  – REF partitioning
  – interval partitioning
  – partition advisor
Composite Partitioning in Oracle Database 11g
• Concept of composite partitioning
  – Data is partitioned along two dimensions (A, B)
  – A distinct value pair for the two dimensions uniquely determines the target partition
• Composite partitioning is complementary to multi-column range partitioning
• Extended in Oracle Database 11g ...
Extended Composite Partitioning Strategies
New 11g Strategy    Use Case
List – Range        Geography – Time
Range – Range       ShipDate – OrderDate
List – Hash         Geography – OrderID
List – List         Geography – Product
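A range-range composite table like the SALES example on the following slides might be declared as below. This is a sketch: column lists are abbreviated and partition boundaries are illustrative.

```sql
-- Sketch of a RANGE-RANGE composite-partitioned table:
-- order_date drives the partitions, ship_date the subpartitions.
CREATE TABLE sales (
  order_date DATE NOT NULL,
  ship_date  DATE NOT NULL,
  amount     NUMBER
)
PARTITION BY RANGE (order_date)
SUBPARTITION BY RANGE (ship_date)
( PARTITION p_2006_01 VALUES LESS THAN (DATE '2006-02-01')
    ( SUBPARTITION p_2006_01_s1 VALUES LESS THAN (DATE '2006-02-01'),
      SUBPARTITION p_2006_01_s2 VALUES LESS THAN (MAXVALUE) ),
  PARTITION p_2006_02 VALUES LESS THAN (DATE '2006-03-01')
    ( SUBPARTITION p_2006_02_s1 VALUES LESS THAN (DATE '2006-03-01'),
      SUBPARTITION p_2006_02_s2 VALUES LESS THAN (MAXVALUE) )
);
```

A predicate on either column prunes along its own dimension; predicates on both columns prune to a single subpartition.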
Composite Partitioning - Concept
Table SALES: RANGE(order_date) - RANGE(ship_date)
[Diagram: a grid of partitions with order_date (Jan 2006 ... Jan 2007) on one axis and ship_date (Jan 2006 ... May 2006) on the other]
• All records with order_date in March 2006 fall in one range of partitions
• All records with ship_date in May 2006 fall in one range of subpartitions
• All records with order_date in March 2006 AND ship_date in May 2006 fall in exactly one partition
Virtual Column based Partitioning
Virtual Columns
Business Problem
• Extended schema attributes are fully derived from and dependent on existing common data
• Redundant storage or extended view definitions solve this problem today
  – requires additional maintenance and creates overhead

Solution
• Oracle Database 11g introduces virtual columns
  – purely virtual, metadata only
  – treated as real columns, except no DML
  – can have statistics
  – eligible as partitioning key
• Enhanced performance and manageability
Virtual Columns - Example
• Base table with all attributes ...
• ... is extended with the virtual (derived) column
• ... and the virtual column is used as partitioning key

CREATE TABLE accounts
( acc_no     NUMBER(10)   NOT NULL,
  acc_name   VARCHAR2(50) NOT NULL,
  ...
  acc_branch NUMBER(2) GENERATED ALWAYS AS
             (TO_NUMBER(SUBSTR(TO_CHAR(acc_no),1,2)))
)
PARTITION BY LIST (acc_branch) ...

[Sample rows: acc_no 12500, 12507, 12666, 12875 derive acc_branch 12;
 acc_no 32320, 32407, 32758, 32980 derive acc_branch 32]
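Queries that filter on the virtual column can then prune to a single list partition; a sketch against the ACCOUNTS table above:

```sql
-- Only the acc_branch = 12 partition needs to be scanned;
-- the derived value is computed at query time, not stored.
SELECT acc_no, acc_name
FROM   accounts
WHERE  acc_branch = 12;
```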
Interval Partitioning
Interval Partitioning
• Partitioning is key enabling functionality for managing large volumes of data
  – one logical object for application transparency
  – multiple physical segments for administration
• But: physical segmentation requires additional data management overhead
  – new partitions must be created on time for new data

[Diagram: an application inserting CDRs via SQL into monthly partitions Jan, Feb, Mar]
Automate the partition management
Interval Partitioning - How it works

CREATE TABLE sales (order_date DATE, ...)
PARTITION BY RANGE (order_date)
INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
( PARTITION p_first VALUES LESS THAN (TO_DATE('01-JAN-2006','DD-MON-YYYY')) );
[Diagram: Table SALES with monthly partitions Jan 2006 ... Nov 2009]
• First segment is created when the table is created
• Other partitions only exist in metadata until data arrives
INSERT INTO sales (order_date, ...) VALUES (TO_DATE('04-MAR-2006','DD-MON-YYYY'), ...);

• New segment is automatically allocated
INSERT INTO sales (order_date, ...) VALUES (TO_DATE('17-OCT-2009','DD-MON-YYYY'), ...);

• ... whenever data for a new partition arrives
• An interval-partitioned table can have a classical range section and an automated interval section
  – automated new-partition management plus full partition maintenance capabilities: "best of both worlds"
  – MERGE and move old partitions in the range section for ILM
  – inserting new data, e.g. INSERT INTO sales (order_date, ...) VALUES (TO_DATE('13-NOV-2009','DD-MON-YYYY'), ...); creates segments automatically in the interval section

[Diagram: Table SALES with a range partition section (2006 ... Jan 2007) and an interval partition section (Oct 2009, Nov 2009)]
REF Partitioning
REF Partitioning
Business Problem
• Related tables benefit from the same partitioning strategy
  – e.g. ORDERS – LINEITEMS
• Redundant storage of the same information solves this problem today
  – data overhead
  – maintenance overhead

Solution
• Oracle Database 11g introduces REF Partitioning
  – child table inherits the partitioning strategy of the parent table through the PK-FK relationship
  – intuitive modelling
• Enhanced performance and manageability
Before REF Partitioning
• Table ORDERS: RANGE(order_date), primary key order_id
• Table LINEITEMS: RANGE(order_date), foreign key order_id
• Redundant storage of order_date in the child table
• Redundant maintenance

REF Partitioning
• Table ORDERS: RANGE(order_date), primary key order_id
• Table LINEITEMS: PARTITION BY REFERENCE, foreign key order_id
• Partitioning key inherited through the PK-FK relationship
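A sketch of the corresponding DDL (column lists abbreviated; the foreign key must be NOT NULL and enforced for reference partitioning):

```sql
CREATE TABLE orders (
  order_id   NUMBER PRIMARY KEY,
  order_date DATE NOT NULL
)
PARTITION BY RANGE (order_date)
( PARTITION p_2006_01 VALUES LESS THAN (DATE '2006-02-01'),
  PARTITION p_2006_02 VALUES LESS THAN (DATE '2006-03-01') );

CREATE TABLE lineitems (
  line_id  NUMBER,
  order_id NUMBER NOT NULL,
  CONSTRAINT fk_order FOREIGN KEY (order_id) REFERENCES orders (order_id)
)
-- The child inherits the parent's range partitioning; order_date
-- does not need to be stored redundantly in LINEITEMS.
PARTITION BY REFERENCE (fk_order);
```

Partition maintenance on ORDERS (add, drop, merge) then cascades to the equi-partitioned LINEITEMS table.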
SELECT SUM(sales_amount)
FROM   sales s, customer c
WHERE  s.cust_id = c.cust_id;

Both tables have the same degree of parallelism and are partitioned the same way on the join column (cust_id).
Partition-Wise Join
[Diagram: SALES and CUSTOMER, each range-partitioned (May 18th 2008) and hash-subpartitioned into sub-partitions 1-4]
• A large join is divided into multiple smaller joins; each joins a pair of matching partitions in parallel
Partitioning Advisor (new in 11g)
• Considers the entire query workload to improve query performance
• Advises on partitioning methods
  – Range (equal-interval), range key, and interval
  – Hash, hash key
• Integrated, non-conflicting advice with indexes and MVs

[Diagram: SQL workload from packaged and custom apps feeds the SQL Advisor, which produces SQL plan tuning, SQL structure analysis, access analysis (SQL profile, SQL advice, indexes & MVs), and partition analysis (partition advice), yielding well-tuned SQL & schema]
Oracle Database 11g
Advanced Compression Option
Challenges
• Explosion in data volumes managed by enterprises
  – Government regulations (Sarbanes-Oxley, HIPAA, etc.)
  – User-generated content (Web 2.0)
• IT managers must support larger volumes of data with limited technology budgets
  – Need to optimize storage consumption
  – Also maintain acceptable application performance
• Intelligent and efficient compression technology can help address these challenges
Introducing Advanced Compression Option
• Oracle Database 11g introduces a comprehensive set of compression capabilities
  – Structured/relational data compression
  – Unstructured data compression
  – Compression for backup data
  – Network transport compression
• Reduces resource requirements and costs
  – Storage system
  – Network bandwidth
  – Memory usage
OLTP Table Compression
• Oracle Database 11g extends compression to OLTP data
  – Support for conventional DML operations (INSERT, UPDATE, DELETE)
• New algorithm significantly reduces write overhead
  – Batched compression ensures no impact for most OLTP transactions
• No impact on reads
  – Reads may actually see improved performance due to fewer I/Os and enhanced memory efficiency
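Enabling it is a table-level clause; a sketch (the clause is COMPRESS FOR ALL OPERATIONS in 11g Release 1 and was renamed COMPRESS FOR OLTP in Release 2 — table names here are illustrative):

```sql
-- New table compressed for conventional DML (11gR2 syntax).
CREATE TABLE orders_compressed (
  order_id   NUMBER,
  order_date DATE
) COMPRESS FOR OLTP;

-- Rebuild an existing table compressed; a plain ALTER would only
-- affect blocks written after the change.
ALTER TABLE orders MOVE COMPRESS FOR OLTP;
```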
OLTP Table Compression
[Diagram: a block fills with uncompressed inserts; when block usage reaches PCTFREE, compression is triggered, freeing space for further uncompressed inserts, and the cycle repeats]
• Adaptable, continuous compression
• Compression is automatically triggered when block usage reaches PCTFREE
• Compression eliminates holes created by deletions and maximizes contiguous free space in the block
OLTP Table Compression
[Illustration: an EMPLOYEE table (ID, FIRST_NAME, LAST_NAME) occupies an initially uncompressed block; after
 INSERT INTO employee VALUES (5, 'Jack', 'Smith'); COMMIT;
 the block is compressed using a local symbol table (John= | Doe= | Jane= | Smith=), so repeated values are replaced by short symbol references and more data fits per block]
Advanced Compression Option - Save Disk, Reduce I/O, Maximize Memory
• Compress large application tables
  – Transaction processing, data warehousing
• Compress all data types
  – Structured and unstructured data types
• Compress backup data
  – Faster RMAN compression
  – Data Pump compression
• Typical compression of 2-4X
  – Cascade storage savings throughout the data center
Real-World Compression Results - 10 Largest ERP Database Tables
[Charts: data storage shows a 3x saving; table scans run 2.5x faster; DML performance shows < 3% overhead]
Invisible Index
Invisible Index
• An invisible index is an index that is ignored by the optimizer unless you explicitly set the OPTIMIZER_USE_INVISIBLE_INDEXES initialization parameter to TRUE at the session or system level. The default value for this parameter is FALSE.
• Making an index invisible is an alternative to making it unusable or dropping it. Using invisible indexes, you can:
  – Test the removal of an index before dropping it.
  – Use temporary index structures for certain operations or modules of an application without affecting the overall application.
Invisible Index
• Here are a few examples:
SQL> alter index emp_id_idx invisible;
SQL> alter index emp_id_idx visible;
SQL> create index emp_id_idx on emp (emp_id) invisible;
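To let the optimizer consider invisible indexes for the current session only, the parameter can be set at session level; the dictionary view shows each index's status:

```sql
-- Only this session's optimizer will consider invisible indexes.
ALTER SESSION SET optimizer_use_invisible_indexes = TRUE;

-- Verify an index's visibility from the data dictionary.
SELECT index_name, visibility
FROM   user_indexes
WHERE  index_name = 'EMP_ID_IDX';
```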
Bitmap Index
Bitmap Index vs B-tree Index
• In bitmap structures, a two-dimensional array is created with one column for every row in the table being indexed.
• Bitmap indexes are good for low-cardinality columns.
• Tables with little insert/update activity are good candidates (static data in a warehouse).
• Their advantages: the highly compressed structure makes them fast to read, and it lets the system combine multiple indexes together for fast access to the underlying table.
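A minimal sketch, assuming a SALES fact table with a low-cardinality CHANNEL_ID column:

```sql
-- Bitmap index on a low-cardinality column; the optimizer can
-- AND/OR several such bitmaps before touching the table.
CREATE BITMAP INDEX sales_channel_bix
  ON sales (channel_id);
```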
Materialized View
Summary Management
• Summary management:
  – Improves query response time
  – Is key to data warehouse performance
• A summary is a table that:
  – Stores pre-aggregated and pre-joined data
  – Is based on user query requirements
• Materialized views:
  – Store pre-computed aggregates and joins
  – Results are stored in the database
  – Are used via query rewrite
  – Improve query performance
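A minimal sketch of a materialized view that query rewrite can use (table and column names are illustrative; fast refresh would additionally require materialized view logs and count columns):

```sql
-- Pre-joined, pre-aggregated summary; ENABLE QUERY REWRITE lets the
-- optimizer transparently redirect matching queries to it.
CREATE MATERIALIZED VIEW sales_by_region_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT c.region, SUM(s.amount_sold) AS revenue
FROM   sales s, customers c
WHERE  s.cust_id = c.cust_id
GROUP BY c.region;
```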
Materialized Views - Typical Architecture Today
[Diagram: SQL queries against a relational star schema (Region, Date, Product, Channel) are transparently redirected via query rewrite to materialized views such as Sales by Region, Sales by Date, Sales by Product, and Sales by Channel]
New in Oracle Database 11g - Cube-Organized Materialized Views
[Diagram: the same SQL queries are rewritten against materialized views backed by an OLAP cube, which is kept current by automatic refresh]
• Case study: automotive industry reporting application
  – Cube-organized MVs replaced table-based MVs
  – Time to build aggregate data reduced by 89%
  – Longer-running queries reduced from 5 minutes to 12 seconds
  – Transparent access to the cube MV: no changes to reporting applications
[Charts: time to build MVs (minutes) and long-running queries (seconds), 11g table MVs vs. 11g cube MV]
SQL Query Result Cache
Data Warehouse Workload
• Analyze data across large data sets
  – reporting
  – forecasting
  – trend analysis
  – data mining
• Use parallel execution for good performance
• Result:
  – very I/O-intensive workload
  – direct reads from disk
  – memory is less important (mostly execution memory)
Data Warehouse Query Example
• accesses very many rows, returns few rows

SELECT p.prod_category, SUM(s.amount_sold) revenue
FROM   products p, sales s
WHERE  s.prod_id = p.prod_id
AND    s.time_id BETWEEN to_date('01-JAN-2006','dd-MON-yyyy')
                     AND to_date('31-DEC-2006','dd-MON-yyyy')
GROUP BY ROLLUP (p.prod_category);
Data Warehouse Configuration - Sizing
• Critical success factors
  – I/O throughput: number of physical disks, number of channels to disks
  – CPU power
• Everything else follows
  – Storage capacity (500 GB - 1 TB common): use surplus for high availability and ILM
  – Memory capacity (4 GB/CPU is "standard"): use surplus for ... RESULT CACHE
SQL Query Result Cache - Benefits
• Caches results of queries, query blocks, or PL/SQL function calls
• Read consistency is enforced
  – DML/DDL against dependent database objects invalidates cached results
• Bind variables parameterize cached results with variable values
[Diagram: query 1 (joining Tables 1-3 with a GROUP BY) executes and its result is cached; query 2, which joins the cached result with Tables 4-6, uses the cached result transparently]
SQL Query Result Cache - Enabling
• result_cache_mode initialization parameter
  – MANUAL: use hints to populate and use the cache
  – FORCE: queries will use the cache without a hint
• result_cache_max_size initialization parameter
  – default depends on other memory settings (0.25% of memory_target, 0.5% of sga_target, or 1% of shared_pool_size)
  – 0 disables the result cache
  – never more than 75% of the shared pool (built-in restriction)
• /*+ RESULT_CACHE */ hint in queries
SQL Query Result Cache Example
• Use RESULT_CACHE hint
SELECT /*+ RESULT_CACHE */ p.prod_category, SUM(s.amount_sold) revenue
FROM   products p, sales s
WHERE  s.prod_id = p.prod_id
AND    s.time_id BETWEEN to_date('01-JAN-2006','dd-MON-yyyy')
                     AND to_date('31-DEC-2006','dd-MON-yyyy')
GROUP BY ROLLUP (p.prod_category);
SQL Query Result Cache Example
• Execution plan fragment
-----------------------------------------------------------------
| Id | Operation                   | Name                       |
-----------------------------------------------------------------
|  0 | SELECT STATEMENT            |                            |
|  1 |  RESULT CACHE               | fz6cm4jbpcwh48wcyk60m7qypu |
|  2 |   SORT GROUP BY ROLLUP      |                            |
|* 3 |    HASH JOIN                |                            |
|  4 |     PARTITION RANGE ITERATOR|                            |
|* 5 |      TABLE ACCESS FULL      | SALES                      |
|  6 |     VIEW                    | index$_join$_001           |
|* 7 |      HASH JOIN              |                            |
|  8 |       INDEX FAST FULL SCAN  | PRODUCTS_PK                |
|  9 |       INDEX FAST FULL SCAN  | PRODUCTS_PROD_CAT_IX       |
-----------------------------------------------------------------
Data Warehousing Security
Data Security: Oracle Key Products
Core Platform Security
Authentication / User Management
• Oracle Identity Management
• Enterprise User Security

Data Protection / Encryption
• Oracle Advanced Security
• Oracle Secure Backup
• EM Data Masking

Authorization / Access Control
• Oracle Database Vault
• Virtual Private Database
• Oracle Label Security

Auditing / Monitoring
• Database Auditing
• Oracle Audit Vault
• EM Configuration Pack
Virtual Private Database - Real-Time Fine-Grained Access Control

where account_mgr_id = sys_context('APP','CURRENT_MGR');

[Illustration: SELECT * FROM customers returns only the rows belonging to the current manager; the VPD policy appends the predicate automatically]

SYS_CONTEXT can be initialized via a database logon trigger or an application login module.
Virtual Private Database - Column-Relevant Fine-Grained Access Control
• Introduced in Oracle Database 10g
• Filter rows if a specific column is referenced
• Optionally return all rows but mask the column

where account_mgr_id = sys_context('APP','CURRENT_MGR');

[Illustration: with column masking, SELECT * FROM customers returns all rows, but the sensitive SSN column is blanked for rows the policy restricts]
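Attaching such a policy uses the DBMS_RLS package; a sketch, where the schema, policy function, and context names are illustrative:

```sql
-- Policy function returning the predicate VPD appends to queries.
CREATE OR REPLACE FUNCTION mgr_policy (
  p_schema VARCHAR2, p_table VARCHAR2) RETURN VARCHAR2 IS
BEGIN
  RETURN 'account_mgr_id = SYS_CONTEXT(''APP'',''CURRENT_MGR'')';
END;
/

BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => 'SALES_APP',
    object_name     => 'CUSTOMERS',
    policy_name     => 'customers_mgr_policy',
    function_schema => 'SALES_APP',
    policy_function => 'mgr_policy',
    statement_types => 'SELECT');
END;
/
```

Once the policy is added, every SELECT against CUSTOMERS is rewritten with the predicate, with no application change.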
Real Application Cluster
Real Application Clusters
Benefits
• Highest availability
• On-demand, flexible scalability
• Lower computing costs
• World-record performance
[Diagram: database instances on multiple servers sharing one database on shared storage]
RAC: The Cluster Database
[Diagram: users connect over the network to clustered database servers linked by a high-speed switch or interconnect; storage is a mirrored disk subsystem on a storage area network, reached via a hub or switch fabric; a centralized management console; no single point of failure]
• Drive and exploit industry advances in clustering
RAC and PQ
SQL Parallel Execution
[Diagram: the Query Coordinator (QC) connects to parallel servers; messages flow between them]
• The QC is the user session that initiates the parallel SQL statement; it distributes the work to the parallel servers
• Parallel servers are individual sessions that perform work in parallel; they are allocated from a pool of globally available parallel server processes and assigned to a given operation
• Parallel servers communicate among themselves and with the QC using messages passed via memory buffers in the shared pool
• Parallel servers do the majority of the work
SQL Parallel Execution Plan
ID  Operation                     Name       TQ     IN-OUT  PQ Distribution
 0  SELECT STATEMENT
 1   PX COORDINATOR
 2    PX SEND QC {RANDOM}                    Q1,01  P->S
 3     HASH JOIN                             Q1,01  PCWP
 4      PX RECEIVE                           Q1,01  PCWP
 5       PX SEND BROADCAST                   Q1,01  P->P    BROADCAST
 6        PX BLOCK ITERATOR                  Q1,01  PCWP
 7         TABLE ACCESS FULL      CUSTOMERS  Q1,01  PCWP
 8      PX BLOCK ITERATOR                    Q1,01  PCWP
 9       TABLE ACCESS FULL        SALES      Q1,01  PCWP
SELECT c.cust_name, s.purchase_date, s.amount
FROM sales s, customers c
WHERE s.cust_id = c.cust_id;
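Parallelism for such a statement can be requested per query with a hint, or set as a table default; a sketch (the degree of 8 is illustrative):

```sql
-- Request degree-of-parallelism 8 for this query via hints ...
SELECT /*+ PARALLEL(s, 8) PARALLEL(c, 8) */
       c.cust_name, s.purchase_date, s.amount
FROM   sales s, customers c
WHERE  s.cust_id = c.cust_id;

-- ... or set a default DOP on the table itself.
ALTER TABLE sales PARALLEL 8;
```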
SELECT SUM(revenue), store
FROM   line_items
WHERE  profit(price, units) > 0.2
GROUP BY store
ORDER BY store;
Parallel Execution in Action
[Diagram: producer parallel servers scan the data on disk; consumer parallel servers sort the ranges A-K, L-S, and T-Z; the Query Coordinator dispatches the work and assembles the results]
Producer and consumer PQ sets in the execution plan:

ID  Operation                     Name       TQ     IN-OUT  PQ Distribution
 0  SELECT STATEMENT
 1   PX COORDINATOR
 2    PX SEND QC {RANDOM}                    Q1,01  P->S
 3     HASH JOIN                             Q1,01  PCWP
 4      PX RECEIVE                           Q1,01  PCWP
 5       PX SEND HASH                        Q1,01  P->P
 6        PX BLOCK ITERATOR                  Q1,01  PCWP
 7         TABLE ACCESS FULL      CUSTOMERS  Q1,01  PCWP
 8      PX RECEIVE                           Q1,01  PCWP
 9       PX SEND HASH                        Q1,01  P->P
10        PX BLOCK ITERATOR                  Q1,01  PCWP
11         TABLE ACCESS FULL      SALES      Q1,01  PCWP
Oracle Parallel Query - Scanning
• Data is partitioned into granules (block range or partition)
• Each parallel server is assigned multiple granules
• No two parallel servers ever contend for the same granule
• Granules are assigned so that the load is balanced across all parallel scanners
• Dynamic granules are chosen by the optimizer
[Diagram: granules distributed across Parallel Servers #1-#3]
PQ Integration with Services (new in 11g)
• Parallel query slaves will only execute on nodes where the service of the query owner is active
• No longer have to code instance_groups
[Diagram: a six-node cluster with DW and Batch Reporting services on some nodes and OLTP 1-4 services on the others]
RAC and ETL
RAC and Parallel Execution
• Very large queries utilize all resources on the cluster
• Many large-scale DWs have many concurrent jobs
  – Multiple "small-to-medium" size queries
  – Degree of parallelism < CPUs per node
• With Oracle, such queries will automatically run on a single node, eliminating traffic over the interconnect
Controlling PQ on RAC Using Services

Create two services, one for ETL and one for ad-hoc queries:

srvctl add service -d database_name -s ETL -r sid1,sid2
srvctl add service -d database_name -s ADHOC -r sid3,sid4

Note: Prior to 11g, use the init.ora parameters instance_groups and parallel_instance_group to control PQ on RAC.
RAC and ETL
Typical Architecture
[Diagram: separate ETL and Reporting databases, each with its own SGA, kept in sync by disk copy]

Real Application Clusters
[Diagram: ETL and Reporting run as instances of one RAC database, each with its own SGA, sharing storage]

IO Issues with Real Application Clusters
[Diagram: ETL and Reporting instances driving I/O against the same shared disks - IO contention?]
Automatic Storage Management
The Ideal Storage Configuration
• S.A.M.E. - Stripe And Mirror Everything
  – Optimize throughput across as many physical disks as possible: stripe across all devices
  – Exception: storage tiers
• Automatic Storage Management (ASM)
  – Implements S.A.M.E. per disk group (mirroring optional)
  – Simplifies and automates database storage management
  – Automatic rebalancing
  – Separate disk groups for different storage tiers
ASM - Optimal Performance, No Space Wasted
• Balance I/O across disks and across disk arrays

Automatic Storage Management - Lowers the Cost of Storage Management
• Virtualize and share storage resources
• Advanced data striping for maximum I/O performance
• Online addition and migration of storage
[Diagram: HR, SALES, and ERP databases sharing ASM-managed storage]
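Creating a disk group is a single statement; a sketch, where the disk paths are system-specific placeholders:

```sql
-- NORMAL REDUNDANCY = ASM two-way mirroring; striping across all
-- disks is automatic. '/dev/raw/raw1' etc. are illustrative paths.
CREATE DISKGROUP data NORMAL REDUNDANCY
  DISK '/dev/raw/raw1', '/dev/raw/raw2', '/dev/raw/raw3';

-- Adding a disk triggers automatic online rebalancing.
ALTER DISKGROUP data ADD DISK '/dev/raw/raw4';
```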
© 2009 Oracle Corporation – Proprietary and Confidential
Information Lifecycle Management - Optimize Storage Cost and Performance
• Low cost
• Enforceable compliance policies
• Transparent to applications
[Diagram: data moves through Active, Less Active, and Historical storage tiers in Oracle Database 9i/10g/11g with the Partitioning option, VPD, and Advanced Compression; accessed by Oracle desktop apps, portals & browsers, and ISV apps]
Integrating Unstructured Data in Data Warehousing
Integrating Unstructured Data
New in Oracle Database 11g - Critical New Data Types
• Images
• RFID data types
• DICOM medical images
• 3D spatial images
Data Warehousing - An Architecture Approach
Data Volume Growth
Bytes                              Value   Name
1,000                              1.E+03  kilobyte (KB)
1,000,000                          1.E+06  megabyte (MB)
1,000,000,000                      1.E+09  gigabyte (GB)
1,000,000,000,000                  1.E+12  terabyte (TB)
1,000,000,000,000,000              1.E+15  petabyte (PB)
1,000,000,000,000,000,000          1.E+18  exabyte (EB)
1,000,000,000,000,000,000,000      1.E+21  zettabyte (ZB)
1,000,000,000,000,000,000,000,000  1.E+24  yottabyte (YB)
Data Volume Growth
• 2 KB - a typewritten page
• 5 MB - the complete works of Shakespeare
• 10 MB - one minute of high-fidelity sound
• 2 TB - information generated on YouTube in one day
• 10 TB - 530,000,000 miles of bookshelves at the Library of Congress
• 20 PB - all hard-disk drives in 1995 (or your database in 2010)
Data Volume Growth
• 700 PB - data of 700,000 companies with revenues less than $200M
• 1 EB - combined Fortune 1000 company databases (1 PB each)
• 1 EB - next 9,000 world company databases (average 100 TB each)
• 8 EB - capacity of ONE Oracle 10g/11g database (current)
• 12-16 EB - information generated before 1999 (memory-resident in 64-bit)
• 16 EB - addressable memory with 64-bit (current)
• 161 EB - new information in 2006 (most images not stored in a DB)
• 1 ZB - 1,000 EB (grains of sand on all beaches; 125 Oracle DBs)
• 100 TY - yottabytes - addressable memory with 128-bit (future)
8 Exabytes: Look What Fits in One 10g/11g Database!
• All databases of the largest 1,000,000 companies in the world (3 EB)
• All information generated in the world in 1999 (2 EB)
• All information generated in the world in 2006 (5 EB)
• All email generated in the world in 2006 (6 EB)
• 1 Mount Everest filled with documents (approx.)
DW Performance Tuning – An Architecture Approach
• End-to-end approach
  – Web tier
  – Application tier
  – Database tier
  – Storage
  – Network
• Design and configuration
  – Hardware
  – Logical model
  – Physical model
  – System management
DW Performance Tuning – A Mathematical Approach
• A Balanced Configuration
  – CPU throughput
  – HBA throughput
  – Network throughput
  – Disk throughput
  – Memory and CPU ratios
[Diagram: eight disk arrays (DiskArray 1–8) attached through two Fibre Channel switches (FC-Switch1, FC-Switch2) to servers each equipped with dual HBAs (HBA1, HBA2)]
Balanced Configuration – "the weakest link" defines the throughput:
• CPU quantity and speed dictate the number of HBAs and the capacity of the interconnect
• HBA quantity and speed dictate the number of disk controllers and the speed and quantity of switches
• Controller quantity and speed dictate the number of disks and the speed and quantity of switches
• Disk quantity and speed
Data Warehouse hardware configuration best practices
• Build a balanced hardware configuration
  – Total throughput = # cores × 100–200 MB/s (depends on chip set)
  – Total HBA throughput = total core throughput
    • Example: if total core throughput = 1.6 GB/s, you will need 4 × 4Gb HBAs
  – Use 1 disk controller per HBA port (throughput capacity must be equal)
  – Switches must match the capacity of the HBAs and disk controllers
  – Maximum of 10 physical disks per controller (use smaller drives: 146 or 300 GB)
• Minimum of 4 GB of memory per core (8 GB if using compression)
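The rules of thumb above can be expressed as a small sizing helper. This is a hypothetical sketch, not an Oracle tool; the function name, the single-port HBA assumption, and the defaults are all assumptions:

```python
import math

# Back-of-the-envelope sizing from the slide's rules of thumb.
# Assumptions: ~100 MB/s usable per Gbit of Fibre Channel, one disk
# controller per HBA port, at most 10 physical disks per controller.
def size_config(cores: int, mb_per_core: int = 200, hba_gbit: int = 4,
                compression: bool = False) -> dict:
    total_mb_s = cores * mb_per_core          # total core throughput
    hba_mb_s = hba_gbit * 100                 # per-HBA throughput
    hbas = math.ceil(total_mb_s / hba_mb_s)   # HBA throughput must match cores
    controllers = hbas                        # 1 disk controller per HBA port
    max_disks = controllers * 10              # max 10 physical disks each
    memory_gb = cores * (8 if compression else 4)
    return {"throughput_mb_s": total_mb_s, "hbas": hbas,
            "controllers": controllers, "max_disks": max_disks,
            "memory_gb": memory_gb}
```

This reproduces the slide's worked example: 8 cores × 200 MB/s = 1.6 GB/s, which needs 4 × 4Gb HBAs.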
Throughput in Real Systems (MB/sec)
• Graph shows throughput achieved in real-world deployments
  – InfiniBand is held back by the PCIe 1.0 x8 bus on typical host systems
[Chart: single-connection throughput in MB/sec – Gigabit Ethernet ≈ 120 MB/sec, 4Gb Fibre ≈ 400 MB/sec, 20Gb InfiniBand highest (y-axis up to 1,400 MB/sec)]
CPU Throughput
                  1 CPU   4 CPUs   8 CPUs   16 CPUs   20 CPUs
100 MB/s/core     100     400      800      1,600     2,000     MB/sec
200 MB/s/core     200     800      1,600    3,200     4,000     MB/sec

HBA Throughput
        1 HBA   2 HBAs   4 HBAs   8 HBAs   16 HBAs
2 Gb    200     400      800      1,600    3,200     MB/sec
4 Gb    400     800      1,600    3,200    6,400     MB/sec

15,000 RPM SAS Disk
        1 Disk   2 Disks   4 Disks   8 Disks   12 Disks
        90       180       360       720       1,080     MB/sec

CPU and Memory
                  1 CPU   4 CPUs   8 CPUs   16 CPUs   20 CPUs
4 GB/core         4       16       32       64        80        GB
8 GB/core         8       32       64       128       160       GB
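These tables imply that deliverable throughput is the minimum across the tiers. A minimal sketch (an illustration, not an Oracle utility) using the per-unit rates above, assuming 200 MB/s per CPU, 4Gb HBAs, and 15,000 RPM SAS disks:

```python
# "Weakest link" rule, with per-unit rates taken from the tables above:
# 200 MB/s per CPU, 400 MB/s per 4Gb HBA, 90 MB/s per 15k RPM SAS disk.
CPU_MB_S, HBA_MB_S, DISK_MB_S = 200, 400, 90

def system_throughput(cpus: int, hbas: int, disks: int) -> int:
    """Deliverable scan rate (MB/s) is capped by the slowest tier."""
    return min(cpus * CPU_MB_S, hbas * HBA_MB_S, disks * DISK_MB_S)
```

With 8 CPUs, 4 HBAs, and 12 disks, the disks cap throughput at 1,080 MB/s even though CPUs and HBAs could each sustain 1,600 MB/s.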
Sizing Data Warehouses
[Diagram: two configurations, each a chain of Database – CPUs – Memory – Actuators – LUNs – Disks – RAID. A balanced configuration achieves 100% of possible efficiency; an unbalanced configuration achieves less than 50% of possible efficiency.]
The New World: The Oracle Database Machine – Exadata
Exadata V2 Goals
• Ideal Database Platform
  – Best Machine for Data Warehousing
  – Best Machine for OLTP
  – Best Machine for Database Consolidation
• Unique Architecture Makes It
  – Fastest, Lowest Cost
© 2010 Oracle Corporation
Oracle – Customer Internal Use Only
The Performance Challenge: Storage Data Bandwidth Bottleneck
• Large data warehouses want to scan dozens, hundreds, or thousands of disks at full disk speed
• Pipes between disks and servers constrain bandwidth by 10x or more
• The result is that warehouses can become slower as they get bigger
Oracle – Sun Database Backgrounder
Solutions To Data Bandwidth Bottleneck
• Add more pipes – massively parallel architecture
• Make the pipes wider – 10X faster than conventional storage
• Ship less data through the pipes – process data in storage
Exadata is Smart Storage
• An Exadata cell is smart storage, not a database node
  – Storage remains an independent tier
• Database servers handle compute- and memory-intensive processing
  – Perform complex database processing such as joins, aggregation, etc.
• Exadata cells handle data-intensive processing
  – Search tables and indexes, filtering out data that is not relevant to a query
  – Cells serve data to multiple databases, enabling OLTP and consolidation
  – Simplicity and robustness of a storage appliance
Exadata Intelligent Storage Grid – Most Scalable Data Processing
• Data-intensive processing runs in the Exadata storage grid
  – Filter rows and columns as data streams from disks (112 Intel cores)
  – Scale-out storage removes bottlenecks
• Example: how much of product X sold in month Y
[Diagram: Traditional storage – 10 TB read, 10 TB sent to the DB servers, DB CPUs filter: hours! Exadata storage – 10 TB read, Exadata filters, only 100 GB sent: seconds!]
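A toy model of this contrast (not Exadata code; the row size and selectivity are made-up numbers) shows why filtering in storage shrinks what must cross the interconnect:

```python
ROW_BYTES = 100  # assumed average row size for this toy model

def bytes_shipped(total_rows: int, selectivity: float,
                  smart_scan: bool) -> int:
    """Storage reads every row either way; a smart scan ships only the
    rows matching the predicate, a traditional scan ships them all."""
    rows_sent = total_rows * selectivity if smart_scan else total_rows
    return int(rows_sent * ROW_BYTES)
```

At 1% selectivity a traditional scan ships 100x more bytes to the database tier, mirroring the 10 TB vs 100 GB contrast in the diagram above.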
Exadata Hardware Architecture
Scalable grid of industry-standard servers for compute and storage
• Eliminates the long-standing tradeoff between scalability, availability, and cost
Database Grid
• 8 compute servers (1U)
• 64 Intel cores
• 576 GB RAM
Storage Grid
• 14 storage servers (2U)
• 112 Intel cores in storage
• 100 TB SAS disk, or 336 TB SATA disk
• 5 TB PCI Flash
• Data mirrored across storage servers
InfiniBand Network
• 3 × 36-port 40 Gb/s switches
• Unified network for servers & storage
• Equivalent to 324 FC ports
Keys to Speed and Cost Advantage
• Exadata Hybrid Columnar Compression
• Exadata Intelligent Storage Grid
• Exadata Smart Flash Cache
Benefits Multiply
• 10 TB of user data requires 10 TB of IO
• 1 TB with compression
• 100 GB with partition pruning
• 20 GB with storage indexes
• 5 GB with Smart Scan on memory or flash
• Subsecond response on the Database Machine
Data is 10x smaller, scans are 2,000x faster
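The pipeline above is multiplicative, which a short sketch makes concrete (the stage names and factors are the slide's illustrative numbers, not guaranteed ratios):

```python
# Sketch of the multiplicative reduction shown above; factors are the
# slide's illustrative numbers, not guaranteed ratios.
TB = 10**12

def effective_io(user_data_bytes: float) -> float:
    """Apply each reduction stage in turn to the bytes actually scanned."""
    stages = {
        "hybrid columnar compression": 10,   # 10 TB -> 1 TB
        "partition pruning": 10,             # 1 TB -> 100 GB
        "storage indexes": 5,                # 100 GB -> 20 GB
        "smart scan on memory/flash": 4,     # 20 GB -> 5 GB
    }
    remaining = user_data_bytes
    for factor in stages.values():
        remaining /= factor
    return remaining
```

Starting from 10 TB, the combined 2,000x reduction leaves roughly 5 GB to scan, which is what makes subsecond response plausible.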
I/O Scheduling, the Traditional Way
• With traditional storage, I/O schedulers are black boxes
  – You cannot influence their behavior!
• I/O requests are processed in FIFO order
• Some reordering may be done to improve disk efficiency
[Diagram: RDBMS I/O requests from high-priority and low-priority workloads interleave (H L H L L L) in a single disk queue on a traditional storage server]
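The FIFO-versus-priority contrast can be sketched with a toy dispatcher. This is illustrative only: it is not Oracle's I/O Resource Manager algorithm, and the tuple layout is an assumption:

```python
import heapq

# Toy I/O dispatchers (illustrative only; not Oracle IORM's algorithm).
# A request is (priority, arrival_order, name); lower priority value
# means more important.

def fifo_dispatch(requests):
    """Traditional storage: serve strictly in arrival order."""
    return [name for _, _, name in sorted(requests, key=lambda r: r[1])]

def priority_dispatch(requests):
    """Priority-aware: high-priority I/O jumps the queue;
    FIFO order is preserved within a priority class."""
    heap = list(requests)
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# The interleaved queue from the diagram: H and L requests arrive mixed.
reqs = [(1, 0, "H1"), (2, 1, "L1"), (1, 2, "H2"), (2, 3, "L2")]
```

A FIFO dispatcher serves the queue as H1, L1, H2, L2, while the priority dispatcher serves both high-priority requests first: H1, H2, L1, L2.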
Best Data Warehouse Machine
• Massively parallel high-volume hardware to quickly process vast amounts of data
  – Exadata runs data-intensive processing directly in storage
• Most complete analytic capabilities
  – OLAP, statistics, spatial, data mining, real-time transactional ETL, efficient point queries
• Powerful warehouse-specific optimizations
  – Flexible partitioning, bitmap indexing, join indexing, materialized views, result cache
• Dramatic new warehousing capabilities (new)
  – Data mining, OLAP, ETL
Thanks For Coming !!
Daniel Liu Contact Information
Email: [email protected]
Email: [email protected]
Company Web Site: http://www.oracle.com