Exadata MAA Best Practices Series
Session 5: Using Resource Manager on Exadata

Sue K. Lee
Senior Development Manager, Oracle Resource Manager
Exadata MAA Best Practices Series

1. E-Business Suite on Exadata
2. Siebel on Exadata
3. PeopleSoft on Exadata
4. Exadata and OLTP Applications
5. Using Resource Manager on Exadata
6. Migrating to Exadata
7. Using DBFS on Exadata
8. Exadata Monitoring
9. Exadata Backup & Recovery
10. Exadata MAA
11. Troubleshooting Exadata
12. Exadata Patching & Upgrades
13. Exadata Health Check
Resource Manager and Exadata
1. Manage multiple workloads in an Exadata database with Resource Manager
2. Consolidate multiple databases on Exadata using Resource Manager
Key point #1: Manage mixed workloads in an Exadata database with Resource Manager
By managing how workloads share critical resources, Resource Manager provides customers the key to optimizing resource usage while fulfilling performance objectives.
Scenario: Mixed Workloads in an Exadata Database
OLTP Applications
• Tuned workload
• Requires consistently good performance

Reports
• Long-running reports
• Large batch jobs
• Moderate performance requirements

Low-Priority
• Ad-hoc queries
• Data export
• Resource intensive and unpredictable
• Apt to disrupt system
Requirements
• Workloads should use critical system resources according to their priority
  • CPU, I/O, parallel servers
• Fully utilize critical resources
  • Avoid inefficient schemes that require dedicated resources, e.g.
    • Avoid servers dedicated to services
    • Avoid separate databases for reporting
• Manage runaway queries
  • OLTP should have no long-running operations. Any such operations should be identified and aborted.
  • Ad-hoc queries should not use resources excessively
Step 1: Identify Workloads

• Create Consumer Groups for each type of workload
• Create rules to dynamically map sessions to Consumer Groups
Session to Consumer Group Mapping Rules -> Consumer Groups:
  service = 'CRM'                               -> OLTP
  client program = 'OBIEE'                      -> Reports
  client program = 'OBIEE' && module = 'AdHoc'  -> Low-Priority
  query has been running > 1 hour               -> Low-Priority
  estimated execution time of query > 12 hours  -> Low-Priority
  service = 'ETL'                               -> Low-Priority
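A sketch of how the groups and the first two mapping rules could be set up with the DBMS_RESOURCE_MANAGER package. The 'CRM' service and 'OBIEE' program come from the table above; the comments are illustrative, and the compound and time-based rules require additional mapping attributes and plan directives not shown here.

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  -- One Consumer Group per type of workload
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('OLTP',         'Tuned OLTP sessions');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('REPORTS',      'Reports and batch jobs');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('LOW_PRIORITY', 'Ad-hoc queries and exports');

  -- Map sessions dynamically, based on session attributes
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    attribute      => DBMS_RESOURCE_MANAGER.SERVICE_NAME,
    value          => 'CRM',
    consumer_group => 'OLTP');
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    attribute      => DBMS_RESOURCE_MANAGER.CLIENT_PROGRAM,
    value          => 'OBIEE',
    consumer_group => 'REPORTS');

  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
```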
Step 2: Manage CPU
• CPU is a critical resource on Exadata
  • Exadata Smart Scan only returns useful data blocks
  • Exadata Flash Cache completes I/Os in microseconds
  • Result is heavy CPU loads
• Goal
  • Allocate sufficient CPU to OLTP to satisfy performance objectives
  • Allocate excess CPU to other workloads
• Solution
  • Configure CPU allocations in Database Resource Plan
  • Enable Database Resource Manager
Step 2: Manage CPU

Day Time Plan:
  Consumer Group   Level 1   Level 2
  OLTP             100%
  Reports                    80%
  Low-Priority               20%

Any CPU unused by OLTP is allocated to Reports and Low-Priority sessions.
The DBA can create a Night Time Plan that allocates more CPU to Batch.
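The Day Time Plan above could be created roughly as follows (a sketch: the plan name DAYTIME and the comments are made up, and every plan needs a directive for the built-in OTHER_GROUPS group):

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  DBMS_RESOURCE_MANAGER.CREATE_PLAN('DAYTIME', 'Favors OLTP during the day');

  -- Level 1: OLTP gets 100%; level 2 only sees CPU left over from level 1
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'DAYTIME', group_or_subplan => 'OLTP',
    comment => 'OLTP first', mgmt_p1 => 100);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'DAYTIME', group_or_subplan => 'REPORTS',
    comment => 'Most of the leftover CPU', mgmt_p2 => 80);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'DAYTIME', group_or_subplan => 'LOW_PRIORITY',
    comment => 'Remaining leftover CPU', mgmt_p2 => 20);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan => 'DAYTIME', group_or_subplan => 'OTHER_GROUPS',
    comment => 'Required catch-all directive', mgmt_p3 => 100);

  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/

-- Enable the plan (this also enables Database Resource Manager)
ALTER SYSTEM SET resource_manager_plan = 'DAYTIME';
```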
• Very fine-grained scheduling
  • Resource Manager schedules at a 100 ms quantum, like an OS scheduler
  • All sessions run, but some run more frequently than others
  • Low-priority session yields to a high-priority session within a quantum
• Background processes are not managed
  • Backgrounds are high-priority and not CPU-intensive
• Bonus: managing foregrounds results in
  • Stable OS loads
  • Backgrounds not starved
CPU Scheduling with Resource Manager
[Diagram: Sessions wait on the "resmgr:cpu quantum" event in Oracle's internal CPU queue. CPU Resource Manager schedules sessions every 100 ms according to the resource plan (OLTP 75%, Reports 25%), so OLTP is picked roughly 3 out of 4 times.]
Step 3: Manage I/O
• Disk bandwidth is a critical resource on Exadata
  • Key to exceptional query performance? One query can utilize a high percentage of each disk's bandwidth
  • Multiple concurrent parallel queries result in heavy disk loads and long disk latencies
• Goal
  • Shared ASM disk groups for efficient resource utilization
  • Allocate sufficient I/O bandwidth to OLTP to satisfy performance objectives
  • Allocate excess I/O bandwidth to Reports and Low-Priority workloads
• Solution
  • Configure I/O allocations in Database Resource Plan
  • Enable Exadata I/O Resource Manager
Exadata I/O Resource Manager

Issue enough I/Os to keep each disk busy. Queue the rest.
When an I/O completes:
1) Pick a Consumer Group queue
2) Issue the I/O request from the head of that queue

[Diagram: On the Exadata storage cell, I/O Resource Manager maintains per-Consumer-Group queues of outstanding I/O requests (OLTP I/Os, Reports I/Os, Low-Priority I/Os, Background I/Os). The Database Resource Plan determines which queue's head request is issued to the disk next.]
Exadata I/O Resource Manager
• Configure Exadata I/O Resource Manager using the Database Resource Plan
  • Same plan used to manage CPU
  • Specify resource allocations per Consumer Group
  • Resource allocation == disk utilization
• Background and ASM I/Os automatically managed
  • Critical I/Os prioritized: instance recovery, LGWR, control file, etc.
• Specify optimization objective
  • Use "low_latency" or "balanced" for OLTP-oriented databases
  • Use "high_throughput" for data warehouses
• Use IORM metrics to track
  • I/O load per Consumer Group (IOPS, MBPS, disk utilization %)
  • I/O throttling per Consumer Group
© 2010 Oracle Corporation
Step 4: Manage Parallel Execution
• Parallel servers are a limited resource
  • Limit specified by parallel_max_servers
  • Too many concurrent parallel statements cause thrashing
• When there are no more parallel servers
  • Critical statements may run serially
  • When parallel servers free up, no way to boost DOP of running statements
• Non-ideal solutions
  • Under-utilize the system
  • Manually schedule large queries during off hours
Parallel Statement Queuing
• Goals:
  1. Run enough parallel statements to fully utilize system resources
  2. Queue subsequent parallel statements
  3. Dequeue a parallel statement when it won't thrash the system
• Enable by setting parallel_degree_policy = "auto"
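Queuing is enabled with a single parameter. Note that AUTO also turns on automatic degree-of-parallelism and in-memory parallel execution, so test before enabling it in production:

```sql
-- Enables automatic DOP, parallel statement queuing,
-- and in-memory parallel execution
ALTER SYSTEM SET parallel_degree_policy = 'AUTO';
```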
Parallel Statement Queuing

[Diagram: A Queue Coordinator manages the Parallel Statement Queue. While parallel servers are available (128, 64, 32 free), parallel statements run immediately. Once no more parallel servers are available (0 free), subsequent parallel statements are queued behind the running statements.]
Ordering Parallel Statements

DBAs want to control the order that parallel queries are dequeued
• Prioritize tactical queries over batch and ad-hoc queries
• Impose a user-defined policy for ordering queued parallel statements

Solution
• Separate queues per Consumer Group
• Resource Plan specifies which queue's parallel statements are issued next
Ordering Parallel Statements

[Diagram: The Queue Coordinator maintains a Tactical Queue, a Batch Queue, and an Ad-Hoc Queue. Resource Plan: Priority 1: Tactical; Priority 2, 70%: Batch; Priority 2, 30%: Ad-Hoc. When parallel servers become available, the resource plan is used to select a queue, and the head parallel statement from that queue is run. Since Tactical is Priority 1, its parallel statements are always selected first. When there are no more Tactical parallel statements, either Batch or Ad-Hoc is picked; Batch is selected 70% of the time.]
Reserving Parallel Servers for Critical Workloads

Flood of batch queries can use up all parallel servers
• Tactical queries are forced to queue

Solution
• Limit the percentage of parallel servers a Consumer Group can use
  • For example, parallel queries from the Batch Consumer Group can only use 50% of the parallel servers
  • Reserves parallel servers for Tactical queries
• Limit the degree of parallelism of non-critical workloads
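A hedged sketch of such a directive. The BATCH group and DAYTIME plan names are illustrative, and the parameter for capping a group's share of parallel servers is release-dependent (this assumes the 11.2.0.2 name parallel_target_percentage), so check the DBMS_RESOURCE_MANAGER documentation for your release:

```sql
-- Inside a pending area, on the plan that is (or will be) active:
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
  plan             => 'DAYTIME',       -- illustrative plan name
  group_or_subplan => 'BATCH',         -- illustrative group name
  comment          => 'Batch may use at most half the parallel servers',
  parallel_target_percentage => 50,    -- 11.2.0.2 parameter name
  parallel_degree_limit_p1   => 16);   -- also cap per-statement DOP
```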
Reserving Parallel Servers for Critical Workloads

[Diagram: Of 64 parallel servers, Batch is limited to 50%. Because Batch cannot exhaust the pool, parallel servers remain available (64, 48, 32 free) and Tactical queries can be run immediately. Resource Plan: Priority 1: Tactical; Priority 2, 70%: Batch; Priority 2, 30%: Ad-Hoc.]
Step 5: Restrict Resource Usage
Requirement
• Consistent, predictable performance for workloads
• Useful for hosted environments and departmental apps

Solution
• Cap the CPU utilization for a Consumer Group
• Cap the disk utilization for a Consumer Group

Day Time Plan:
  Consumer Group      Allocation   Limit
  Tactical            60%
  Sales Reports       15%          30%
  Marketing Reports   15%          30%
  ETL                 10%
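A sketch of the Sales Reports row of the plan above. The plan and group names are illustrative, and the limit parameter's name varies by release (the MetaLink note cited at the end of this deck uses max_utilization_limit; later releases rename it utilization_limit):

```sql
-- Inside a pending area: guarantee Sales Reports a 15% share,
-- but never let the group exceed 30% utilization
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
  plan                  => 'DAYTIME',        -- illustrative plan name
  group_or_subplan      => 'SALES_REPORTS',  -- illustrative group name
  comment               => '15% share, capped at 30% utilization',
  mgmt_p1               => 15,
  max_utilization_limit => 30);  -- renamed utilization_limit in later releases
```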
Step 6: Manage Runaway Queries

• Runaway queries are caused by
  • Missing indexes
  • Unexpected inputs
  • Bad execution plans
• Severely impact performance of well-behaved queries
• Very hard to completely eradicate!

[Chart: Query times for Query 1 through Query 4, showing one runaway query dominating.]
Manage Runaway Queries
Define runaway queries by:
• Estimated execution time
• Actual execution time
• Actual number of I/Os (11.1)
• Actual bytes of I/O (11.1)

Manage runaway queries by:
• Switch to another consumer group
  • Lower-priority consumer group
  • Consumer group with CPU utilization limit (11.2)
• Abort call
• Kill session
Manage Runaway Queries
For the Tactical consumer group, runaway means:
• 30+ sec of execution time
Action: Switch to the "Low-Priority" consumer group!

For the Reports consumer group, runaway means:
• 32GB+ of I/O
Action: Abort query!

For the Ad-Hoc consumer group, runaway means:
• 24+ hour estimated execution time
Action: Don't execute!
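The three policies above map onto plan directives along these lines (a sketch; the plan name is illustrative, and in practice each switch attribute belongs on the group's single directive in the active plan, alongside its CPU and I/O settings):

```sql
-- Inside a pending area, on the active plan (illustrative name 'DAYTIME'):

-- Tactical: after 30 seconds of execution, demote to Low-Priority
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
  plan => 'DAYTIME', group_or_subplan => 'TACTICAL',
  comment      => 'Demote runaways',
  switch_time  => 30,
  switch_group => 'LOW_PRIORITY');

-- Reports: abort the call after 32 GB of I/O (11.1+)
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
  plan => 'DAYTIME', group_or_subplan => 'REPORTS',
  comment             => 'Abort I/O hogs',
  switch_io_megabytes => 32768,
  switch_group        => 'CANCEL_SQL');  -- built-in: aborts the call

-- Ad-Hoc: refuse to start queries estimated to run more than 24 hours
DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
  plan => 'DAYTIME', group_or_subplan => 'ADHOC',
  comment           => 'Reject huge queries up front',
  max_est_exec_time => 86400);  -- seconds
```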
Resource Manager - End to End
Test scenario:
• 2 workloads in a data warehouse
  • Tactical queries (short TPC-H queries)
  • Batch jobs (long TPC-H queries)
• Goal:
  • Run Batch jobs with Tactical queries
  • Don't impact response time of Tactical queries!
Key point #2: Consolidate multiple databases on Exadata using Resource Manager
By managing how databases share critical resources, Resource Manager provides customers the ability to consolidate multiple databases on Exadata.
Scenario: Consolidation

[Diagram: Databases A, B, C, and D consolidated across Exadata servers and Exadata storage cells.]

Server Consolidation
• Better server utilization - X2-8 has 128 cores!
• Some deployments not ready for database consolidation

Storage Consolidation
• Better storage cell utilization
• More cells => higher peak throughput
• ASM triple redundancy requires many disks
Step 1: Instance Caging

• Instance Caging is an Oracle feature for "caging" or limiting the amount of CPU that a database instance can use at any time
• Important tool for server consolidation
• Available in 11.2.0.1
• Just 2 steps:
  1. Set "cpu_count" parameter
     • Maximum number of CPUs the instance can use at any time
  2. Set "resource_manager_plan" parameter
     • Enables CPU Resource Manager
     • E.g. out-of-box plan "DEFAULT_PLAN"
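The two steps above, as a minimal sketch (the cage size of 4 is just an example):

```sql
-- Step 1: cage this instance to at most 4 CPU threads at a time
ALTER SYSTEM SET cpu_count = 4;

-- Step 2: enable CPU Resource Manager, e.g. with the out-of-box plan
ALTER SYSTEM SET resource_manager_plan = 'DEFAULT_PLAN';
```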
CPU Usage Without Instance Caging

[Diagram: Oracle processes from one database instance try to use all CPUs; excess processes wait for CPU on the O/S run queue.]
CPU Usage With Instance Caging

[Diagram: Instance Caging limits the number of Oracle processes running at any moment in time; excess processes wait for CPU on Resource Manager run queues instead of the O/S run queue.]
Partitioning Approach

CPU Allocations
• Provides maximum isolation
• For performance-critical databases
• If one database is idle, its CPU allocation is unused

[Chart: Cages sized so the allocations fit within the number of CPUs on the server - e.g. Instance A: 8 CPUs, Instance B: 4 CPUs, Instance C: 2 CPUs, Instance D: 2 CPUs.]
Over-Provisioning Approach

CPU Allocations
• For non-critical databases that are typically well-behaved
• Contention for CPU if databases are sufficiently loaded
• Not enough contention to destabilize OS or database instances
• Best approach if goal is to fully utilize CPUs

[Chart: Cages sized so the allocations sum to more than the number of CPUs on the server - e.g. Instance A: 8 CPUs, Instance B: 8 CPUs, Instance C: 4 CPUs, Instance D: 4 CPUs.]
Instance Caging Results
• 4 CPU server
• Workload is a mix of OLTP transactions, parallel queries, and DMLs from Oracle Financials
Instance Caging: Under the Covers
• If cpu_count is set to 4 on a 16 CPU server
  • All foreground processes make progress
  • But only 4 foregrounds are running at any time
  • Fine-grained scheduling!
• Most backgrounds not managed
  • Critical and use very little CPU
  • MMON, Job Scheduler slaves are managed
• No CPU affinity!
  • All CPUs may be used
  • CPU utilization averaged across all CPUs ≤ 25%
Best Practices for Instance Caging
• Cage size, cpu_count, is a dynamic parameter
  • Changes take place immediately!
  • Some overhead, so limit changes to once an hour
  • Changes to cpu_count also affect other settings, such as parallel execution
  • Avoid huge changes to cpu_count, particularly from a small initial value (e.g. 1 or 2)
• cpu_count controls the number of logical CPUs or threads used - not cores or sockets!
• Monitor Instance Caging throttling
  • AWR reports: "resmgr:cpu quantum" wait event
  • Indicates that this instance would benefit from a larger cage size
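Outside of AWR, the same wait event can be checked directly (a sketch; time spent throttled by Instance Caging or CPU Resource Manager accumulates under this event):

```sql
-- Cumulative time sessions have waited on the Resource Manager CPU quantum
SELECT event, total_waits,
       ROUND(time_waited_micro / 1e6) AS seconds_waited
FROM   v$system_event
WHERE  event = 'resmgr:cpu quantum';
```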
Step 2: Exadata I/O Resource Manager
Scenario
• Multiple databases share Exadata storage cells
• Should databases share disks (ASM disk groups)?

• No!
  • Load from one database doesn't affect another
  • Dedicated disks offer more predictable performance
• Yes!
  • Shared disks offer better bandwidth utilization
  • Shared disks offer better space utilization
  • But you need a way to manage how databases use disks
Exadata I/O Resource Manager Plans
Inter-Database Plan:
  Database                 Allocation   Limit
  Sales DB                 50%
  Finance DB               25%          50%
  Marketing DB (Primary)   25%          50%
  Marketing DB (Standby)   10%          50%

• You can guarantee each database a certain amount of the disk bandwidth.
• You can limit the disk bandwidth a database can use.
• You can specify different allocations for a database, depending on whether it's currently the primary or the standby.
• Unused allocations are redistributed to needy databases.

Exadata I/O Resource Manager gives you
• Predictability of dedicated disks
  • Tools for allocating disk bandwidth to a database
  • Tools for limiting disk bandwidth for a database - useful for hosted environments
• Efficient disk utilization of shared disks
  • Unused allocations are redistributed
Exadata I/O Resource Manager
1. Pick a database
2. Pick a Consumer Group
3. Issue the head I/O request

[Diagram: On the Exadata storage cell, I/O Resource Manager holds per-database, per-Consumer-Group queues of outstanding I/O requests: OLTP and Reports queues for the Sales Database, Tactical Queries and Batch Queries queues for the Finance Database. Each database's Resource Plan orders its own queues.]
Exadata I/O Resource Manager
• An Inter-Database Resource Plan manages databases sharing Exadata storage cells
• A Database Resource Plan manages workloads within a database

[Diagram: Databases A, B, and C - with OLTP, Reports, and Low-Priority workloads - share Exadata Storage.]
I/O Utilization Limit Results
[Chart: Disk utilization (0-100%) over time for runs with no I/O limit and with 75%, 50%, and 25% I/O utilization limits. Queries from the TPC-H benchmark suite; disk utilization measured via iostat.]
Business Value Take-Aways
1. For mixed workload databases, use Resource Manager to ensure sufficient resources for workloads that are performance critical.
• CPU Resource Manager
• I/O Resource Manager
• Parallel Statement Queuing
• Runaway Query Management
2. For server consolidation, use Instance Caging to distribute CPU among the databases.
3. For storage consolidation, use IORM to distribute disk bandwidth among the databases.
Best Practice Take-Aways

1) Resource Manager presentations
   https://stbeehive.oracle.com/content/dav/st/Database%20Resource%20Manager/Public%20Documents

2) Resource Manager white paper
   http://www.oracle.com/technetwork/database/features/performance/resource-manager-twp-133705.pdf

3) Instance Caging
   http://www.oracle.com/technetwork/database/features/performance/instance-caging-wp-166854.pdf

4) MetaLink Notes for known issues
   1207483.1 – CPU Resource Manager
   1208064.1 – Instance Caging
   1208104.1 – max_utilization_limit
   1208133.1 – Managing Runaway Queries