Oracle Database 11g for Data Warehousing and Business Intelligence
Daniel T. Liu, Principal Solution Architect
UTOUG Training Day
Date: Wednesday, March 30, 2011
Time: 09:00 AM – 11:00 AM
Venue: 9750 South 300 West, Sandy, Utah 84070
Agenda
1. Introduction
2. Oracle DW/BI Overview
3. Oracle Database Features for DW/BI
4. Summary
5. Q & A
Why a Data Warehouse?
Business Intelligence Tools
E2E - End-to-end view of the business
Single Truth - Consolidated, cleansed, high-quality system of record
360° View - Complete view of global customers
Enterprise Data Warehouse
User Dilemma
"Looks like you've got all the data—what's the holdup?" (cartoon by Davis Harbaugh, HBR January 2006)
Data Warehouse: Essential to BI
Transaction Systems
Business Intelligence Suite
Financial Performance Management Applications
Operational BI Applications
Enterprise Data Warehouse
Staging Area
Staging area for supporting ETL as needed
Atomic Data Layer
Performance Data Layer
• Dimensional view of data
• Application-specific performance structures
• Summary data / materialized views
• Supports specific end-users, tools, and applications
Physical Layers of a Data Warehouse
• Base data warehouse schema
• Atomic-level data, 3NF design
• Supports general end-user queries
• Data feeds to all dependent systems
Different Perspectives: Information Technology & Line of Business
• Information access
• Purchased by LOB
• Rapid business benefits
• Heterogeneous & ubiquitous access
• Information consumption represents value
Business Intelligence
• Information management
• Purchased by IT
• Robust data architecture
• Single version of the truth
• Information management represents value
Data Warehouse
Oracle’s Goal for Data Warehousing and Business Intelligence
• A comprehensive platform for data warehousing and business intelligence:
  – Industry-leading scalability and performance
  – Deeply integrated analytics
  – Embedded data quality
• All in a single platform running on a low-cost, highly available enterprise grid
OLTP (Online Transaction Processing) - Characteristics
• Small data volumes
• Read/write intensive
• Lots of users
• High transaction rates
• Transactions are relatively small in terms of data processed
• Significant write activity (insert/update/delete)
• Fast access via indexes
• High buffer cache usage
• Limited scope for parallelism
OLAP (Online Analytical Processing) - Characteristics
• Deals with historical data
• Large data volumes not uncommon
• Operations often require lots of resources:
  – Access many tables
  – Perform expensive calculations
  – Complex SQL
  – Long running (may be hours)
OLAP (Online Analytical Processing) - Characteristics
• Filled through a controlled process
  – Extract, Transform, Load (ETL)
  – Often multiple sources, including flat files
• Supports different pre-defined workloads, e.g.
  – Scheduled ETL
  – Scheduled reporting
  – Ad-hoc end-user access during business hours
• Peak usage of different workload patterns at different times
  – Systems have to be sized appropriately
IT Dilemma
"I am in a hurry, can I have my steak in 2 minutes?"
OLTP OLAP
Oracle’s Approach to Data Warehousing
Data Warehouse
Oracle Optimized Warehouse
Oracle Database EE
Scalability: Partitioning, RAC
Analytics: OLAP, Data Mining
Integration: OWB Enterprise ETL, OWB Data Quality, OWB Connectors, ODI, GoldenGate, Database Gateways
Security: VPD
Manageability: Diagnostics Pack, Tuning Pack
Oracle’s Approach to Data Warehousing
Data Warehouse
Oracle Optimized Warehouse
Oracle Database EE
Availability: Data Guard, RAC
Agility: RAC, ASM
Performance: Partitioning, Materialized Views, Parallel Query, SQL Query Result Cache
Hardware: Exadata
Storage: ASM, Advanced Compression
Oracle Business Intelligence - Complete, Open, Integrated
Data Warehouse / Data Mart
SAP, Oracle, PeopleSoft, Siebel, Custom Apps
Files, Excel, XML
Business Process, Essbase
Common Enterprise Information Model
Data Integration
Ad-hoc Analysis
Interactive Dashboards
Essbase
Reporting & Publishing
Proactive Detection and Alerts
Disconnected & Mobile Analytics
MS Office & Outlook Integration
OLTP & ODS Systems
BI Server
• Integrated security, user management, personalization
• Intelligent caching
• Intelligent request generation and optimized data access services
Oracle 11g Key Features for Data Warehousing and Business Intelligence
Broaden Oracle’s Technology Lead
New Features for BI/DW
• Manageability
  – Partition Advisor
  – Interval Partitioning
  – SQL Plan Management
  – Automatic SQL Tuning with self-learning capabilities
  – Enhanced optimizer statistics maintenance
  – Multi-column optimizer statistics
  – Parallel Query enhancements with RAC
  – ASM Fast Resync, fast VLDB startup, and other enhancements
• VLDB
  – Composite Range-Range
  – Composite List-Range
  – Composite List-List
  – Composite List-Hash
  – REF Partitioning
  – Virtual Column Partitioning
  – Compression enhancements
• Performance
  – Result Cache
• Data loading
  – Change data capture enhancements
  – Materialized view refresh enhancements
• SQL
  – SQL Pivot and Unpivot
  – Materialized view rewrite enhancements
• OLAP
  – Simplified application development
    • Fully declarative cube calculations
    • Cost-based aggregation
    • Simpler calculation definitions
  – Continued database integration
    • Cube metadata in the data dictionary
    • SQL optimizer enhanced for cubes
    • Fine-grained data security on cubes
• Data Mining
  – Simplified development and deployment of models
    • Supermodels: data preparation combined with the mining model
    • Additional packaged predictive analytics
    • Integration in the database dictionary
  – New algorithm: Generalized Linear Models
    • Encapsulates several widely used analytic methods
    • Multivariate linear regression; logistic regression
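The SQL Pivot feature listed above rotates rows into columns in a single clause. A minimal sketch, assuming a hypothetical SALES table with PRODUCT, QUARTER, and AMOUNT columns:

```sql
-- Hypothetical example of the 11g PIVOT clause:
-- one output column per quarter, populated by SUM(amount).
SELECT *
FROM  (SELECT product, quarter, amount FROM sales)
PIVOT (SUM(amount)
       FOR quarter IN ('Q1' AS q1, 'Q2' AS q2, 'Q3' AS q3, 'Q4' AS q4));
```

UNPIVOT performs the inverse transformation, turning the quarter columns back into rows.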
Large-Scale Data Warehouses - Feature Usage
Source: Oracle ST Survey
Key DW/BI Features
1. Partitioning
2. Advanced Compression
3. Invisible Index
4. Bitmap Index
5. Materialized View
6. SQL Query Result Cache
7. DW Security
Key DW/BI Features
8. Real Application Clusters
9. RAC and PQ
10. RAC and ETL
11. Automatic Storage Management
12. Information Lifecycle Management
13. Integrating Unstructured Data
14. DW Architecture Approach
15. Exadata
Oracle Partitioning in Oracle Database 11g
Oracle Partitioning - Ten Years of Development
Oracle Database 11g: Interval Partitioning, Partition Advisor, more composite choices, REF Partitioning, Virtual Column Partitioning
Oracle 10g R2: fast drop table, "multi-dimensional" pruning, 1M partitions per table
Oracle 10g: local index maintenance, global hash indexes
Oracle 9i R2: fast partition split, composite range-list partitioning
Oracle 9i: global index maintenance, list partitioning
Oracle 8i: merge operation, partition-wise joins, "dynamic" pruning, hash and composite range-hash partitioning
Oracle 8: basic maintenance operations (add, drop, exchange), "static" partition pruning, range partitioning, global range indexes
(Each release advanced manageability, performance, and core functionality.)
Oracle Partitioning Enhancements
• Complete the basic partitioning strategies (defines HOW data is going to be partitioned)
  – new composite partitioning methods
• Introduce partitioning extensions (defines WHAT controls the data placement)
  – enhance manageability and automation
  – virtual column based partitioning
  – REF partitioning
  – interval partitioning
  – partition advisor
Composite Partitioning in Oracle Database 11g
• Concept of composite partitioning
  – Data is partitioned along two dimensions (A, B)
  – A distinct value pair for the two dimensions uniquely determines the target partition
• Composite partitioning is complementary to multi-column range partitioning
• Extended in Oracle Database 11g ...
Extended Composite Partitioning Strategies
New 11g Strategy    Use Case
List – Range        Geography – Time
Range – Range       ShipDate – OrderDate
List – Hash         Geography – OrderID
List – List         Geography – Product
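A range-range composite table like the SALES example on the following slides might be declared as below. This is a sketch: column lists are abbreviated and partition boundaries are illustrative.

```sql
-- Sketch of a RANGE-RANGE composite-partitioned table:
-- order_date drives the partitions, ship_date the subpartitions.
CREATE TABLE sales (
  order_date DATE NOT NULL,
  ship_date  DATE NOT NULL,
  amount     NUMBER
)
PARTITION BY RANGE (order_date)
SUBPARTITION BY RANGE (ship_date)
( PARTITION p_2006_01 VALUES LESS THAN (DATE '2006-02-01')
    ( SUBPARTITION p_2006_01_s1 VALUES LESS THAN (DATE '2006-02-01'),
      SUBPARTITION p_2006_01_s2 VALUES LESS THAN (MAXVALUE) ),
  PARTITION p_2006_02 VALUES LESS THAN (DATE '2006-03-01')
    ( SUBPARTITION p_2006_02_s1 VALUES LESS THAN (DATE '2006-03-01'),
      SUBPARTITION p_2006_02_s2 VALUES LESS THAN (MAXVALUE) )
);
```

A predicate on either column prunes along its own dimension; predicates on both columns prune to a single subpartition.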
Composite Partitioning - Concept
Table SALES: RANGE(order_date) - RANGE(ship_date)
[Diagram: a grid of partitions with order_date (Jan 2006 ... Jan 2007) on one axis and ship_date (Jan 2006 ... May 2006) on the other]
• All records with order_date in March 2006 fall in one range of partitions
• All records with ship_date in May 2006 fall in one range of subpartitions
• All records with order_date in March 2006 AND ship_date in May 2006 fall in exactly one partition
Virtual Column based Partitioning
Virtual Columns
Business Problem
• Extended schema attributes are fully derived from and dependent on existing common data
• Redundant storage or extended view definitions solve this problem today
  – requires additional maintenance and creates overhead

Solution
• Oracle Database 11g introduces virtual columns
  – purely virtual, metadata only
  – treated as real columns, except no DML
  – can have statistics
  – eligible as partitioning key
• Enhanced performance and manageability
Virtual Columns - Example
• Base table with all attributes ...
• ... is extended with the virtual (derived) column
• ... and the virtual column is used as partitioning key

CREATE TABLE accounts
( acc_no     NUMBER(10)   NOT NULL,
  acc_name   VARCHAR2(50) NOT NULL,
  ...
  acc_branch NUMBER(2) GENERATED ALWAYS AS
             (TO_NUMBER(SUBSTR(TO_CHAR(acc_no),1,2)))
)
PARTITION BY LIST (acc_branch) ...

[Sample rows: acc_no 12500, 12507, 12666, 12875 derive acc_branch 12;
 acc_no 32320, 32407, 32758, 32980 derive acc_branch 32]
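Queries that filter on the virtual column can then prune to a single list partition; a sketch against the ACCOUNTS table above:

```sql
-- Only the acc_branch = 12 partition needs to be scanned;
-- the derived value is computed at query time, not stored.
SELECT acc_no, acc_name
FROM   accounts
WHERE  acc_branch = 12;
```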
Interval Partitioning
Interval Partitioning
• Partitioning is key enabling functionality for managing large volumes of data
  – one logical object for application transparency
  – multiple physical segments for administration
• But: physical segmentation requires additional data management overhead
  – new partitions must be created on time for new data

[Diagram: an application inserting CDRs via SQL into monthly partitions Jan, Feb, Mar]
Automate the partition management
Interval Partitioning - How it works

CREATE TABLE sales (order_date DATE, ...)
PARTITION BY RANGE (order_date)
INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
( PARTITION p_first VALUES LESS THAN (TO_DATE('01-JAN-2006','DD-MON-YYYY')) );
[Diagram: Table SALES with monthly partitions Jan 2006 ... Nov 2009]
• First segment is created when the table is created
• Other partitions only exist in metadata until data arrives
INSERT INTO sales (order_date, ...) VALUES (TO_DATE('04-MAR-2006','DD-MON-YYYY'), ...);

• New segment is automatically allocated
INSERT INTO sales (order_date, ...) VALUES (TO_DATE('17-OCT-2009','DD-MON-YYYY'), ...);

• ... whenever data for a new partition arrives
• An interval-partitioned table can have a classical range section and an automated interval section
  – automated new-partition management plus full partition maintenance capabilities: "best of both worlds"
  – MERGE and move old partitions in the range section for ILM
  – inserting new data, e.g. INSERT INTO sales (order_date, ...) VALUES (TO_DATE('13-NOV-2009','DD-MON-YYYY'), ...); creates segments automatically in the interval section

[Diagram: Table SALES with a range partition section (2006 ... Jan 2007) and an interval partition section (Oct 2009, Nov 2009)]
REF Partitioning
REF Partitioning
Business Problem
• Related tables benefit from the same partitioning strategy
  – e.g. ORDERS – LINEITEMS
• Redundant storage of the same information solves this problem today
  – data overhead
  – maintenance overhead

Solution
• Oracle Database 11g introduces REF Partitioning
  – child table inherits the partitioning strategy of the parent table through the PK-FK relationship
  – intuitive modelling
• Enhanced performance and manageability
Before REF Partitioning
• Table ORDERS: RANGE(order_date), primary key order_id
• Table LINEITEMS: RANGE(order_date), foreign key order_id
• Redundant storage of order_date in the child table
• Redundant maintenance

REF Partitioning
• Table ORDERS: RANGE(order_date), primary key order_id
• Table LINEITEMS: PARTITION BY REFERENCE, foreign key order_id
• Partitioning key inherited through the PK-FK relationship
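A sketch of the corresponding DDL (column lists abbreviated; the foreign key must be NOT NULL and enforced for reference partitioning):

```sql
CREATE TABLE orders (
  order_id   NUMBER PRIMARY KEY,
  order_date DATE NOT NULL
)
PARTITION BY RANGE (order_date)
( PARTITION p_2006_01 VALUES LESS THAN (DATE '2006-02-01'),
  PARTITION p_2006_02 VALUES LESS THAN (DATE '2006-03-01') );

CREATE TABLE lineitems (
  line_id  NUMBER,
  order_id NUMBER NOT NULL,
  CONSTRAINT fk_order FOREIGN KEY (order_id) REFERENCES orders (order_id)
)
-- The child inherits the parent's range partitioning; order_date
-- does not need to be stored redundantly in LINEITEMS.
PARTITION BY REFERENCE (fk_order);
```

Partition maintenance on ORDERS (add, drop, merge) then cascades to the equi-partitioned LINEITEMS table.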
SELECT SUM(sales_amount)
FROM   sales s, customer c
WHERE  s.cust_id = c.cust_id;

Both tables have the same degree of parallelism and are partitioned the same way on the join column (cust_id).
Partition-Wise Join
[Diagram: SALES and CUSTOMER, each range-partitioned (May 18th 2008) and hash-subpartitioned into sub-partitions 1-4]
• A large join is divided into multiple smaller joins; each joins a pair of matching partitions in parallel
Partitioning Advisor (new in 11g)
• Considers the entire query workload to improve query performance
• Advises on partitioning methods
  – Range (equal-interval), range key, and interval
  – Hash, hash key
• Integrated, non-conflicting advice with indexes and MVs

[Diagram: SQL workload from packaged and custom apps feeds the SQL Advisor, which produces SQL plan tuning, SQL structure analysis, access analysis (SQL profile, SQL advice, indexes & MVs), and partition analysis (partition advice), yielding well-tuned SQL & schema]
Oracle Database 11g
Advanced Compression Option
Challenges
• Explosion in data volumes managed by enterprises
  – Government regulations (Sarbanes-Oxley, HIPAA, etc.)
  – User-generated content (Web 2.0)
• IT managers must support larger volumes of data with limited technology budgets
  – Need to optimize storage consumption
  – Also maintain acceptable application performance
• Intelligent and efficient compression technology can help address these challenges
Introducing Advanced Compression Option
• Oracle Database 11g introduces a comprehensive set of compression capabilities
  – Structured/relational data compression
  – Unstructured data compression
  – Compression for backup data
  – Network transport compression
• Reduces resource requirements and costs
  – Storage system
  – Network bandwidth
  – Memory usage
OLTP Table Compression
• Oracle Database 11g extends compression to OLTP data
  – Support for conventional DML operations (INSERT, UPDATE, DELETE)
• New algorithm significantly reduces write overhead
  – Batched compression ensures no impact for most OLTP transactions
• No impact on reads
  – Reads may actually see improved performance due to fewer I/Os and enhanced memory efficiency
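Enabling it is a table-level clause; a sketch (the clause is COMPRESS FOR ALL OPERATIONS in 11g Release 1 and was renamed COMPRESS FOR OLTP in Release 2 — table names here are illustrative):

```sql
-- New table compressed for conventional DML (11gR2 syntax).
CREATE TABLE orders_compressed (
  order_id   NUMBER,
  order_date DATE
) COMPRESS FOR OLTP;

-- Rebuild an existing table compressed; a plain ALTER would only
-- affect blocks written after the change.
ALTER TABLE orders MOVE COMPRESS FOR OLTP;
```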
OLTP Table Compression
[Diagram: a block fills with uncompressed inserts; when block usage reaches PCTFREE, compression is triggered, freeing space for further uncompressed inserts, and the cycle repeats]
• Adaptable, continuous compression
• Compression is automatically triggered when block usage reaches PCTFREE
• Compression eliminates holes created by deletions and maximizes contiguous free space in the block
OLTP Table Compression
[Illustration: an EMPLOYEE table (ID, FIRST_NAME, LAST_NAME) occupies an initially uncompressed block; after
 INSERT INTO employee VALUES (5, 'Jack', 'Smith'); COMMIT;
 the block is compressed using a local symbol table (John= | Doe= | Jane= | Smith=), so repeated values are replaced by short symbol references and more data fits per block]
Advanced Compression Option - Save Disk, Reduce I/O, Maximize Memory
• Compress large application tables
  – Transaction processing, data warehousing
• Compress all data types
  – Structured and unstructured data types
• Compress backup data
  – Faster RMAN compression
  – Data Pump compression
• Typical compression of 2-4X
  – Cascade storage savings throughout the data center
Real-World Compression Results - 10 Largest ERP Database Tables
[Charts: data storage shows a 3x saving; table scans run 2.5x faster; DML performance shows < 3% overhead]
Invisible Index
Invisible Index
• An invisible index is an index that is ignored by the optimizer unless you explicitly set the OPTIMIZER_USE_INVISIBLE_INDEXES initialization parameter to TRUE at the session or system level. The default value for this parameter is FALSE.
• Making an index invisible is an alternative to making it unusable or dropping it. Using invisible indexes, you can:
  – Test the removal of an index before dropping it.
  – Use temporary index structures for certain operations or modules of an application without affecting the overall application.
Invisible Index
• Here are a few examples:
SQL> alter index emp_id_idx invisible;
SQL> alter index emp_id_idx visible;
SQL> create index emp_id_idx on emp (emp_id) invisible;
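To let the optimizer consider invisible indexes for the current session only, the parameter can be set at session level; the dictionary view shows each index's status:

```sql
-- Only this session's optimizer will consider invisible indexes.
ALTER SESSION SET optimizer_use_invisible_indexes = TRUE;

-- Verify an index's visibility from the data dictionary.
SELECT index_name, visibility
FROM   user_indexes
WHERE  index_name = 'EMP_ID_IDX';
```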
Bitmap Index
Bitmap Index vs B-tree Index
• In bitmap structures, a two-dimensional array is created with one column for every row in the table being indexed.
• Bitmap indexes are good for low-cardinality columns.
• Tables with little insert/update activity are good candidates (static data in a warehouse).
• Their advantages: the highly compressed structure makes them fast to read, and it lets the system combine multiple indexes together for fast access to the underlying table.
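A minimal sketch, assuming a SALES fact table with a low-cardinality CHANNEL_ID column:

```sql
-- Bitmap index on a low-cardinality column; the optimizer can
-- AND/OR several such bitmaps before touching the table.
CREATE BITMAP INDEX sales_channel_bix
  ON sales (channel_id);
```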
Materialized View
Summary Management
• Summary management:
  – Improves query response time
  – Is key to data warehouse performance
• A summary is a table that:
  – Stores pre-aggregated and pre-joined data
  – Is based on user query requirements
• Materialized views:
  – Store pre-computed aggregates and joins
  – Results are stored in the database
  – Are used via query rewrite
  – Improve query performance
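A minimal sketch of a materialized view that query rewrite can use (table and column names are illustrative; fast refresh would additionally require materialized view logs and count columns):

```sql
-- Pre-joined, pre-aggregated summary; ENABLE QUERY REWRITE lets the
-- optimizer transparently redirect matching queries to it.
CREATE MATERIALIZED VIEW sales_by_region_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT c.region, SUM(s.amount_sold) AS revenue
FROM   sales s, customers c
WHERE  s.cust_id = c.cust_id
GROUP BY c.region;
```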
Materialized Views - Typical Architecture Today
[Diagram: SQL queries against a relational star schema (Region, Date, Product, Channel) are transparently redirected via query rewrite to materialized views such as Sales by Region, Sales by Date, Sales by Product, and Sales by Channel]
New in Oracle Database 11g - Cube-Organized Materialized Views
[Diagram: the same SQL queries are rewritten against materialized views backed by an OLAP cube, which is kept current by automatic refresh]
• Case study: automotive industry reporting application
  – Cube-organized MVs replaced table-based MVs
  – Time to build aggregate data reduced by 89%
  – Longer-running queries reduced from 5 minutes to 12 seconds
  – Transparent access to the cube MV: no changes to reporting applications
[Charts: time to build MVs (minutes) and long-running queries (seconds), 11g table MVs vs. 11g cube MV]
SQL Query Result Cache
Data Warehouse Workload
• Analyze data across large data sets
  – reporting
  – forecasting
  – trend analysis
  – data mining
• Use parallel execution for good performance
• Result:
  – very I/O-intensive workload
  – direct reads from disk
  – memory is less important (mostly execution memory)
Data Warehouse Query Example
• accesses very many rows, returns few rows

SELECT p.prod_category, SUM(s.amount_sold) revenue
FROM   products p, sales s
WHERE  s.prod_id = p.prod_id
AND    s.time_id BETWEEN to_date('01-JAN-2006','dd-MON-yyyy')
                     AND to_date('31-DEC-2006','dd-MON-yyyy')
GROUP BY ROLLUP (p.prod_category);
Data Warehouse Configuration - Sizing
• Critical success factors
  – I/O throughput: number of physical disks, number of channels to disks
  – CPU power
• Everything else follows
  – Storage capacity (500 GB - 1 TB common): use surplus for high availability and ILM
  – Memory capacity (4 GB/CPU is "standard"): use surplus for ... RESULT CACHE
SQL Query Result Cache - Benefits
• Caches results of queries, query blocks, or PL/SQL function calls
• Read consistency is enforced
  – DML/DDL against dependent database objects invalidates cached results
• Bind variables parameterize cached results with variable values
[Diagram: query 1 (joining Tables 1-3 with a GROUP BY) executes and its result is cached; query 2, which joins the cached result with Tables 4-6, uses the cached result transparently]
SQL Query Result Cache - Enabling
• result_cache_mode initialization parameter
  – MANUAL: use hints to populate and use the cache
  – FORCE: queries will use the cache without a hint
• result_cache_max_size initialization parameter
  – default depends on other memory settings (0.25% of memory_target, 0.5% of sga_target, or 1% of shared_pool_size)
  – 0 disables the result cache
  – never more than 75% of the shared pool (built-in restriction)
• /*+ RESULT_CACHE */ hint in queries
SQL Query Result Cache Example
• Use RESULT_CACHE hint
SELECT /*+ RESULT_CACHE */ p.prod_category, SUM(s.amount_sold) revenue
FROM   products p, sales s
WHERE  s.prod_id = p.prod_id
AND    s.time_id BETWEEN to_date('01-JAN-2006','dd-MON-yyyy')
                     AND to_date('31-DEC-2006','dd-MON-yyyy')
GROUP BY ROLLUP (p.prod_category);
SQL Query Result Cache Example
• Execution plan fragment
-----------------------------------------------------------------
| Id | Operation                   | Name                       |
-----------------------------------------------------------------
|  0 | SELECT STATEMENT            |                            |
|  1 |  RESULT CACHE               | fz6cm4jbpcwh48wcyk60m7qypu |
|  2 |   SORT GROUP BY ROLLUP      |                            |
|* 3 |    HASH JOIN                |                            |
|  4 |     PARTITION RANGE ITERATOR|                            |
|* 5 |      TABLE ACCESS FULL      | SALES                      |
|  6 |     VIEW                    | index$_join$_001           |
|* 7 |      HASH JOIN              |                            |
|  8 |       INDEX FAST FULL SCAN  | PRODUCTS_PK                |
|  9 |       INDEX FAST FULL SCAN  | PRODUCTS_PROD_CAT_IX       |
-----------------------------------------------------------------
Data Warehousing Security
Data Security: Oracle Key Products
Core Platform Security
Authentication / User Management
• Oracle Identity Management
• Enterprise User Security

Data Protection / Encryption
• Oracle Advanced Security
• Oracle Secure Backup
• EM Data Masking

Authorization / Access Control
• Oracle Database Vault
• Virtual Private Database
• Oracle Label Security

Auditing / Monitoring
• Database Auditing
• Oracle Audit Vault
• EM Configuration Pack
Virtual Private Database - Real-Time Fine-Grained Access Control

where account_mgr_id = sys_context('APP','CURRENT_MGR');

[Illustration: SELECT * FROM customers returns only the rows belonging to the current manager; the VPD policy appends the predicate automatically]

SYS_CONTEXT can be initialized via a database logon trigger or an application login module.
Virtual Private Database - Column-Relevant Fine-Grained Access Control
• Introduced in Oracle Database 10g
• Filter rows if a specific column is referenced
• Optionally return all rows but mask the column

where account_mgr_id = sys_context('APP','CURRENT_MGR');

[Illustration: with column masking, SELECT * FROM customers returns all rows, but the sensitive SSN column is blanked for rows the policy restricts]
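Attaching such a policy uses the DBMS_RLS package; a sketch, where the schema, policy function, and context names are illustrative:

```sql
-- Policy function returning the predicate VPD appends to queries.
CREATE OR REPLACE FUNCTION mgr_policy (
  p_schema VARCHAR2, p_table VARCHAR2) RETURN VARCHAR2 IS
BEGIN
  RETURN 'account_mgr_id = SYS_CONTEXT(''APP'',''CURRENT_MGR'')';
END;
/

BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => 'SALES_APP',
    object_name     => 'CUSTOMERS',
    policy_name     => 'customers_mgr_policy',
    function_schema => 'SALES_APP',
    policy_function => 'mgr_policy',
    statement_types => 'SELECT');
END;
/
```

Once the policy is added, every SELECT against CUSTOMERS is rewritten with the predicate, with no application change.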
Real Application Cluster
Real Application Clusters
Benefits
• Highest availability
• On-demand, flexible scalability
• Lower computing costs
• World-record performance
[Diagram: database instances on multiple servers sharing one database on shared storage]
RAC: The Cluster Database
[Diagram: users connect over the network to clustered database servers linked by a high-speed switch or interconnect; storage is a mirrored disk subsystem on a storage area network, reached via a hub or switch fabric; a centralized management console; no single point of failure]
• Drive and exploit industry advances in clustering
RAC and PQ
SQL Parallel Execution
[Diagram: the Query Coordinator (QC) connects to parallel servers; messages flow between them]
• The QC is the user session that initiates the parallel SQL statement; it distributes the work to the parallel servers
• Parallel servers are individual sessions that perform work in parallel; they are allocated from a pool of globally available parallel server processes and assigned to a given operation
• Parallel servers communicate among themselves and with the QC using messages passed via memory buffers in the shared pool
• Parallel servers do the majority of the work
SQL Parallel Execution Plan
ID  Operation                     Name       TQ     IN-OUT  PQ Distribution
 0  SELECT STATEMENT
 1   PX COORDINATOR
 2    PX SEND QC {RANDOM}                    Q1,01  P->S
 3     HASH JOIN                             Q1,01  PCWP
 4      PX RECEIVE                           Q1,01  PCWP
 5       PX SEND BROADCAST                   Q1,01  P->P    BROADCAST
 6        PX BLOCK ITERATOR                  Q1,01  PCWP
 7         TABLE ACCESS FULL      CUSTOMERS  Q1,01  PCWP
 8      PX BLOCK ITERATOR                    Q1,01  PCWP
 9       TABLE ACCESS FULL        SALES      Q1,01  PCWP
SELECT c.cust_name, s.purchase_date, s.amount
FROM sales s, customers c
WHERE s.cust_id = c.cust_id;
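Parallelism for such a statement can be requested per query with a hint, or set as a table default; a sketch (the degree of 8 is illustrative):

```sql
-- Request degree-of-parallelism 8 for this query via hints ...
SELECT /*+ PARALLEL(s, 8) PARALLEL(c, 8) */
       c.cust_name, s.purchase_date, s.amount
FROM   sales s, customers c
WHERE  s.cust_id = c.cust_id;

-- ... or set a default DOP on the table itself.
ALTER TABLE sales PARALLEL 8;
```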
SELECT SUM(revenue), store
FROM   line_items
WHERE  profit(price, units) > 0.2
GROUP BY store
ORDER BY store;
Parallel Execution in Action
[Diagram: producer parallel servers scan the data on disk; consumer parallel servers sort the ranges A-K, L-S, and T-Z; the Query Coordinator dispatches the work and assembles the results]
Producer and consumer PQ sets in the execution plan:

ID  Operation                     Name       TQ     IN-OUT  PQ Distribution
 0  SELECT STATEMENT
 1   PX COORDINATOR
 2    PX SEND QC {RANDOM}                    Q1,01  P->S
 3     HASH JOIN                             Q1,01  PCWP
 4      PX RECEIVE                           Q1,01  PCWP
 5       PX SEND HASH                        Q1,01  P->P
 6        PX BLOCK ITERATOR                  Q1,01  PCWP
 7         TABLE ACCESS FULL      CUSTOMERS  Q1,01  PCWP
 8      PX RECEIVE                           Q1,01  PCWP
 9       PX SEND HASH                        Q1,01  P->P
10        PX BLOCK ITERATOR                  Q1,01  PCWP
11         TABLE ACCESS FULL      SALES      Q1,01  PCWP
Oracle Parallel Query - Scanning
• Data is partitioned into granules (block range or partition)
• Each parallel server is assigned multiple granules
• No two parallel servers ever contend for the same granule
• Granules are assigned so that the load is balanced across all parallel scanners
• Dynamic granules are chosen by the optimizer
[Diagram: granules distributed across Parallel Servers #1-#3]
PQ Integration with Services (new in 11g)
• Parallel query slaves will only execute on nodes where the service of the query owner is active
• No longer have to code instance_groups
[Diagram: a six-node cluster with DW and Batch Reporting services on some nodes and OLTP 1-4 services on the others]
RAC and ETL
RAC and Parallel Execution
• Very large queries utilize all resources on the cluster
• Many large-scale DWs have many concurrent jobs
  – Multiple "small-to-medium" size queries
  – Degree of parallelism < CPUs per node
• With Oracle, such queries will automatically run on a single node, eliminating traffic over the interconnect
Controlling PQ on RAC Using Services

Create two services, one for ETL and one for ad-hoc queries:

srvctl add service -d database_name -s ETL -r sid1,sid2
srvctl add service -d database_name -s ADHOC -r sid3,sid4

Note: Prior to 11g, use the init.ora parameters instance_groups and parallel_instance_group to control PQ on RAC.
RAC and ETL
Typical Architecture
[Diagram: separate ETL and Reporting databases, each with its own SGA, kept in sync by disk copy]

Real Application Clusters
[Diagram: ETL and Reporting run as instances of one RAC database, each with its own SGA, sharing storage]

IO Issues with Real Application Clusters
[Diagram: ETL and Reporting instances driving I/O against the same shared disks - IO contention?]
Automatic Storage Management
The Ideal Storage Configuration
• S.A.M.E. - Stripe And Mirror Everything
  – Optimize throughput across as many physical disks as possible: stripe across all devices
  – Exception: storage tiers
• Automatic Storage Management (ASM)
  – Implements S.A.M.E. per disk group (mirroring optional)
  – Simplifies and automates database storage management
  – Automatic rebalancing
  – Separate disk groups for different storage tiers
ASM - Optimal Performance, No Space Wasted
• Balance I/O across disks and across disk arrays

Automatic Storage Management - Lowers the Cost of Storage Management
• Virtualize and share storage resources
• Advanced data striping for maximum I/O performance
• Online addition and migration of storage
[Diagram: HR, SALES, and ERP databases sharing ASM-managed storage]
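Creating a disk group is a single statement; a sketch, where the disk paths are system-specific placeholders:

```sql
-- NORMAL REDUNDANCY = ASM two-way mirroring; striping across all
-- disks is automatic. '/dev/raw/raw1' etc. are illustrative paths.
CREATE DISKGROUP data NORMAL REDUNDANCY
  DISK '/dev/raw/raw1', '/dev/raw/raw2', '/dev/raw/raw3';

-- Adding a disk triggers automatic online rebalancing.
ALTER DISKGROUP data ADD DISK '/dev/raw/raw4';
```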
© 2009 Oracle Corporation – Proprietary and Confidential
Information Lifecycle Management - Optimize Storage Cost and Performance
• Low cost
• Enforceable compliance policies
• Transparent to applications
[Diagram: data moves through Active, Less Active, and Historical storage tiers in Oracle Database 9i/10g/11g with the Partitioning option, VPD, and Advanced Compression; accessed by Oracle desktop apps, portals & browsers, and ISV apps]
Integrating Unstructured Data in Data Warehousing
Integrating Unstructured Data
New in Oracle Database 11g - Critical New Data Types
• Images
• RFID data types
• DICOM medical images
• 3D spatial images
Data Warehousing - An Architecture Approach
Data Volume Growth
Bytes                              Value   Name
1,000                              1.E+03  kilobyte (KB)
1,000,000                          1.E+06  megabyte (MB)
1,000,000,000                      1.E+09  gigabyte (GB)
1,000,000,000,000                  1.E+12  terabyte (TB)
1,000,000,000,000,000              1.E+15  petabyte (PB)
1,000,000,000,000,000,000          1.E+18  exabyte (EB)
1,000,000,000,000,000,000,000      1.E+21  zettabyte (ZB)
1,000,000,000,000,000,000,000,000  1.E+24  yottabyte (YB)
Data Volume Growth
• 2 KB - a typewritten page
• 5 MB - the complete works of Shakespeare
• 10 MB - one minute of high-fidelity sound
• 2 TB - information generated on YouTube in one day
• 10 TB - 530,000,000 miles of bookshelves at the Library of Congress
• 20 PB - all hard-disk drives in 1995 (or your database in 2010)
Data Volume Growth
• 700 PB - data of 700,000 companies with revenues less than $200M
• 1 EB - combined Fortune 1000 company databases (1 PB each)
• 1 EB - next 9,000 world company databases (average 100 TB each)
• 8 EB - capacity of ONE Oracle 10g/11g database (current)
• 12-16 EB - information generated before 1999 (memory-resident in 64-bit)
• 16 EB - addressable memory with 64-bit (current)
• 161 EB - new information in 2006 (most images not stored in a DB)
• 1 ZB - 1,000 EB (grains of sand on all beaches; 125 Oracle DBs)
• 100 TY - yottabytes - addressable memory with 128-bit (future)
8 Exabytes: Look What Fits in One 10g/11g Database!
• All databases of the largest 1,000,000 companies in the world (3 EB)
• All information generated in the world in 1999 (2 EB)
• All information generated in the world in 2006 (5 EB)
• All email generated in the world in 2006 (6 EB)
• 1 Mount Everest filled with documents (approx.)
DW Performance Tuning – An Architecture Approach
• End-to-end approach
  – Web tier
  – Application tier
  – Database tier
  – Storage
  – Network
• Design and configuration
  – Hardware
  – Logical model
  – Physical model
  – System management
DW Performance Tuning – A Mathematical Approach
• A Balanced Configuration
  – CPU throughput
  – HBA throughput
  – Network throughput
  – Disk throughput
  – Memory and CPU ratios
[Diagram: eight disk arrays (DiskArray 1–8) attached through two Fibre Channel switches (FC-Switch1, FC-Switch2) to servers each equipped with dual HBAs (HBA1, HBA2)]
Balanced Configuration – "the weakest link" defines the throughput:
• CPU quantity and speed dictate the number of HBAs and the capacity of the interconnect
• HBA quantity and speed dictate the number of disk controllers and the speed and quantity of switches
• Controller quantity and speed dictate the number of disks and the speed and quantity of switches
• Disk quantity and speed
Data Warehouse hardware configuration best practices
• Build a balanced hardware configuration
  – Total throughput = # cores × 100–200 MB/s (depends on chip set)
  – Total HBA throughput = total core throughput
    • Example: if total core throughput = 1.6 GB/s, you will need 4 × 4Gb HBAs
  – Use 1 disk controller per HBA port (throughput capacity must be equal)
  – Switches must match the capacity of the HBAs and disk controllers
  – Maximum of 10 physical disks per controller (use smaller drives: 146 or 300 GB)
• Minimum of 4 GB of memory per core (8 GB if using compression)
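The rules of thumb above can be expressed as a small sizing helper. This is a hypothetical sketch, not an Oracle tool; the function name, the single-port HBA assumption, and the defaults are all assumptions:

```python
import math

# Back-of-the-envelope sizing from the slide's rules of thumb.
# Assumptions: ~100 MB/s usable per Gbit of Fibre Channel, one disk
# controller per HBA port, at most 10 physical disks per controller.
def size_config(cores: int, mb_per_core: int = 200, hba_gbit: int = 4,
                compression: bool = False) -> dict:
    total_mb_s = cores * mb_per_core          # total core throughput
    hba_mb_s = hba_gbit * 100                 # per-HBA throughput
    hbas = math.ceil(total_mb_s / hba_mb_s)   # HBA throughput must match cores
    controllers = hbas                        # 1 disk controller per HBA port
    max_disks = controllers * 10              # max 10 physical disks each
    memory_gb = cores * (8 if compression else 4)
    return {"throughput_mb_s": total_mb_s, "hbas": hbas,
            "controllers": controllers, "max_disks": max_disks,
            "memory_gb": memory_gb}
```

This reproduces the slide's worked example: 8 cores × 200 MB/s = 1.6 GB/s, which needs 4 × 4Gb HBAs.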
Throughput in Real Systems (MB/sec)
• Graph shows throughput achieved in real-world deployments
  – InfiniBand is held back by the PCIe 1.0 x8 bus on typical host systems
[Chart: single-connection throughput in MB/sec – Gigabit Ethernet ≈ 120 MB/sec, 4Gb Fibre ≈ 400 MB/sec, 20Gb InfiniBand highest (y-axis up to 1,400 MB/sec)]
CPU Throughput
                  1 CPU   4 CPUs   8 CPUs   16 CPUs   20 CPUs
100 MB/s/core     100     400      800      1,600     2,000     MB/sec
200 MB/s/core     200     800      1,600    3,200     4,000     MB/sec

HBA Throughput
        1 HBA   2 HBAs   4 HBAs   8 HBAs   16 HBAs
2 Gb    200     400      800      1,600    3,200     MB/sec
4 Gb    400     800      1,600    3,200    6,400     MB/sec

15,000 RPM SAS Disk
        1 Disk   2 Disks   4 Disks   8 Disks   12 Disks
        90       180       360       720       1,080     MB/sec

CPU and Memory
                  1 CPU   4 CPUs   8 CPUs   16 CPUs   20 CPUs
4 GB/core         4       16       32       64        80        GB
8 GB/core         8       32       64       128       160       GB
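These tables imply that deliverable throughput is the minimum across the tiers. A minimal sketch (an illustration, not an Oracle utility) using the per-unit rates above, assuming 200 MB/s per CPU, 4Gb HBAs, and 15,000 RPM SAS disks:

```python
# "Weakest link" rule, with per-unit rates taken from the tables above:
# 200 MB/s per CPU, 400 MB/s per 4Gb HBA, 90 MB/s per 15k RPM SAS disk.
CPU_MB_S, HBA_MB_S, DISK_MB_S = 200, 400, 90

def system_throughput(cpus: int, hbas: int, disks: int) -> int:
    """Deliverable scan rate (MB/s) is capped by the slowest tier."""
    return min(cpus * CPU_MB_S, hbas * HBA_MB_S, disks * DISK_MB_S)
```

With 8 CPUs, 4 HBAs, and 12 disks, the disks cap throughput at 1,080 MB/s even though CPUs and HBAs could each sustain 1,600 MB/s.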
Sizing Data Warehouses
[Diagram: two configurations, each a chain of Database – CPUs – Memory – Actuators – LUNs – Disks – RAID. A balanced configuration achieves 100% of possible efficiency; an unbalanced configuration achieves less than 50% of possible efficiency.]
The New World: The Oracle Database Machine – Exadata
Exadata V2 Goals
• Ideal Database Platform
  – Best Machine for Data Warehousing
  – Best Machine for OLTP
  – Best Machine for Database Consolidation
• Unique Architecture Makes It
  – Fastest, Lowest Cost
© 2010 Oracle Corporation
Oracle – Customer Internal Use Only
The Performance Challenge: Storage Data Bandwidth Bottleneck
• Large data warehouses want to scan dozens, hundreds, or thousands of disks at full disk speed
• Pipes between disks and servers constrain bandwidth by 10x or more
• The result is that warehouses can become slower as they get bigger
Oracle – Sun Database Backgrounder
Solutions To Data Bandwidth Bottleneck
• Add more pipes – massively parallel architecture
• Make the pipes wider – 10X faster than conventional storage
• Ship less data through the pipes – process data in storage
Exadata is Smart Storage
• An Exadata cell is smart storage, not a database node
  – Storage remains an independent tier
• Database servers handle compute- and memory-intensive processing
  – Perform complex database processing such as joins, aggregation, etc.
• Exadata cells handle data-intensive processing
  – Search tables and indexes, filtering out data that is not relevant to a query
  – Cells serve data to multiple databases, enabling OLTP and consolidation
  – Simplicity and robustness of a storage appliance
Exadata Intelligent Storage Grid – Most Scalable Data Processing
• Data-intensive processing runs in the Exadata storage grid
  – Filter rows and columns as data streams from disks (112 Intel cores)
  – Scale-out storage removes bottlenecks
• Example: how much of product X sold in month Y
[Diagram: Traditional storage – 10 TB read, 10 TB sent to the DB servers, DB CPUs filter: hours! Exadata storage – 10 TB read, Exadata filters, only 100 GB sent: seconds!]
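A toy model of this contrast (not Exadata code; the row size and selectivity are made-up numbers) shows why filtering in storage shrinks what must cross the interconnect:

```python
ROW_BYTES = 100  # assumed average row size for this toy model

def bytes_shipped(total_rows: int, selectivity: float,
                  smart_scan: bool) -> int:
    """Storage reads every row either way; a smart scan ships only the
    rows matching the predicate, a traditional scan ships them all."""
    rows_sent = total_rows * selectivity if smart_scan else total_rows
    return int(rows_sent * ROW_BYTES)
```

At 1% selectivity a traditional scan ships 100x more bytes to the database tier, mirroring the 10 TB vs 100 GB contrast in the diagram above.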
Exadata Hardware Architecture
Scalable grid of industry-standard servers for compute and storage
• Eliminates the long-standing tradeoff between scalability, availability, and cost
Database Grid
• 8 compute servers (1U)
• 64 Intel cores
• 576 GB RAM
Storage Grid
• 14 storage servers (2U)
• 112 Intel cores in storage
• 100 TB SAS disk, or 336 TB SATA disk
• 5 TB PCI Flash
• Data mirrored across storage servers
InfiniBand Network
• 3 × 36-port 40 Gb/s switches
• Unified network for servers & storage
• Equivalent to 324 FC ports
Keys to Speed and Cost Advantage
• Exadata Hybrid Columnar Compression
• Exadata Intelligent Storage Grid
• Exadata Smart Flash Cache
Benefits Multiply
• 10 TB of user data requires 10 TB of IO
• 1 TB with compression
• 100 GB with partition pruning
• 20 GB with storage indexes
• 5 GB with Smart Scan on memory or flash
• Subsecond response on the Database Machine
Data is 10x smaller, scans are 2,000x faster
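The pipeline above is multiplicative, which a short sketch makes concrete (the stage names and factors are the slide's illustrative numbers, not guaranteed ratios):

```python
# Sketch of the multiplicative reduction shown above; factors are the
# slide's illustrative numbers, not guaranteed ratios.
TB = 10**12

def effective_io(user_data_bytes: float) -> float:
    """Apply each reduction stage in turn to the bytes actually scanned."""
    stages = {
        "hybrid columnar compression": 10,   # 10 TB -> 1 TB
        "partition pruning": 10,             # 1 TB -> 100 GB
        "storage indexes": 5,                # 100 GB -> 20 GB
        "smart scan on memory/flash": 4,     # 20 GB -> 5 GB
    }
    remaining = user_data_bytes
    for factor in stages.values():
        remaining /= factor
    return remaining
```

Starting from 10 TB, the combined 2,000x reduction leaves roughly 5 GB to scan, which is what makes subsecond response plausible.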
I/O Scheduling, the Traditional Way
• With traditional storage, I/O schedulers are black boxes
  – You cannot influence their behavior!
• I/O requests are processed in FIFO order
• Some reordering may be done to improve disk efficiency
[Diagram: RDBMS I/O requests from high-priority and low-priority workloads interleave (H L H L L L) in a single disk queue on a traditional storage server]
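The FIFO-versus-priority contrast can be sketched with a toy dispatcher. This is illustrative only: it is not Oracle's I/O Resource Manager algorithm, and the tuple layout is an assumption:

```python
import heapq

# Toy I/O dispatchers (illustrative only; not Oracle IORM's algorithm).
# A request is (priority, arrival_order, name); lower priority value
# means more important.

def fifo_dispatch(requests):
    """Traditional storage: serve strictly in arrival order."""
    return [name for _, _, name in sorted(requests, key=lambda r: r[1])]

def priority_dispatch(requests):
    """Priority-aware: high-priority I/O jumps the queue;
    FIFO order is preserved within a priority class."""
    heap = list(requests)
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# The interleaved queue from the diagram: H and L requests arrive mixed.
reqs = [(1, 0, "H1"), (2, 1, "L1"), (1, 2, "H2"), (2, 3, "L2")]
```

A FIFO dispatcher serves the queue as H1, L1, H2, L2, while the priority dispatcher serves both high-priority requests first: H1, H2, L1, L2.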
Best Data Warehouse Machine
• Massively parallel high-volume hardware to quickly process vast amounts of data
  – Exadata runs data-intensive processing directly in storage
• Most complete analytic capabilities
  – OLAP, statistics, spatial, data mining, real-time transactional ETL, efficient point queries
• Powerful warehouse-specific optimizations
  – Flexible partitioning, bitmap indexing, join indexing, materialized views, result cache
• Dramatic new warehousing capabilities (new)
  – Data mining, OLAP, ETL
Thanks For Coming !!
Daniel Liu Contact Information
Email: [email protected]
Email: [email protected]
Company Web Site: http://www.oracle.com