Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
Oracle Real Application Clusters (RAC)
Anil Nair / Markus Michalewicz
Oracle Real Application Clusters (RAC) Product Management, Database Development
November 22nd, 2015
@OracleRACpm
http://www.linkedin.com/in/markusmichalewicz
http://www.slideshare.net/MarkusMichalewicz
Best Practices for Upgrading to Oracle 12c and New Features not to Miss!
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Program Agenda
1. Why and How to Upgrade
2. Features You Don’t Want to Miss
Appendixes
A. Grid Infrastructure Upgrade
B. Enhanced OJVM Patching Steps
C. Hang Manager in Action
The Question is not Whether or Not to Upgrade!
The questions are why, when and how?
• The upgrade is the result of a long plan:
  – At OOW 2013 we presented “Oracle RAC 12c [Install] Best Practices” (**)
  – At OOW 2014 we presented “Oracle RAC 12c Operational Best Practices” (**)
  – In both years we presented “Oracle RAC Internals” with a different focus:
    • Oracle Grid Infrastructure and Configuration Internals (**)
    • Oracle RAC Internals – The Cache Fusion Edition (**)
• Bottom line: You are all prepared for using Oracle RAC 12c (after this session)!
(**) See http://www.slideshare.net/MarkusMichalewicz for previous years’ slides
Why Upgrade?
Why upgrade now?
• Oracle 12c has been out for more than two years
  – Current release: 12.1.0.2
• Adoption rate is exemplary
  – Oracle 12c options such as Oracle Multitenant and Oracle In-Memory Database facilitate upgrade
• Benefit from the latest features and avoid running a de-supported version of your software stack
[Figure: release timeline from 11.2.0.2/3 (PSUs 11.2.0.2/3.X) through the 11.2.0.4 patch set (PSUs 11.2.0.4.X) to an upgrade to 12.1.0.2 (PSUs 12.1.0.2.X); the older releases are labeled “De-Supported” and “Almost De-Supported”]
Why Upgrade?
You have a choice to upgrade step-by-step
• Using Oracle RAC, you have a choice
  – You can upgrade both the database and Grid Infrastructure (GI), or only GI.
    • More information in MOS note 756671.1
• Upgrading Oracle GI is always recommended:
  – Apply PSUs, patch sets and upgrades in a rolling fashion!
  – Oracle GI must be at the same or a higher version than the highest-version database you want to operate
  – Most applications only certify the database, not the Oracle GI version
  – Benefit from all new features in Oracle GI
    • Some features may require Oracle 12c databases
[Figure: Oracle Flex Cluster]
More Information in Appendix B
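The version rule above (GI at the same or a higher version than every database it hosts) can be checked mechanically. The following is a minimal sketch, assuming versions are plain dotted strings such as 12.1.0.2; the function names are hypothetical and not part of any Oracle tool.

```python
from typing import Iterable, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '12.1.0.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def gi_supports(gi_version: str, db_versions: Iterable[str]) -> bool:
    """Return True if GI is at least as new as every database version given."""
    gi = parse_version(gi_version)
    return all(parse_version(db) <= gi for db in db_versions)


if __name__ == "__main__":
    # GI upgraded first, databases still mixed: allowed.
    print(gi_supports("12.1.0.2", ["11.2.0.4", "12.1.0.2"]))
    # Database newer than GI: not allowed.
    print(gi_supports("11.2.0.4", ["12.1.0.2"]))
```

Comparing integer tuples rather than strings avoids ordering mistakes such as "9" sorting after "12".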
How to Upgrade?
With a plan and based on testing
• Use “rolling” as much as possible!
  – Oracle RAC and Oracle RAC One Node provide rolling upgrade features
  – Drain workload prior to upgrading
    • $ srvctl stop service
      – Stops sending new work to the node
      – Active sessions continue to work
    • Wait for sessions to drain
    • Execute dbms_service.disconnect_session
[Figure: a connection pool serving Gold_Svc on instances 1, 2 and 3 and a singleton service on instance 4, with numbered markers 1 to 3 on the instances]
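The drain sequence (stop the service, wait for sessions to finish, disconnect what remains) can be sketched as a small control loop. This is only an illustration: the three callables are hypothetical hooks that you would wire to srvctl and dbms_service.disconnect_session yourself, and the timeout values are assumptions rather than recommendations.

```python
import time
from typing import Callable


def drain_service(
    stop_service: Callable[[], None],
    count_sessions: Callable[[], int],
    disconnect_sessions: Callable[[], None],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Drain one service on one node.

    Returns True if all sessions finished on their own within the timeout,
    False if the remaining sessions had to be disconnected.
    """
    stop_service()  # hypothetical hook, e.g. wrapping "srvctl stop service"
    waited = 0.0
    while waited < timeout_s:
        if count_sessions() == 0:
            return True
        sleep(poll_s)
        waited += poll_s
    if count_sessions() == 0:
        return True
    # hypothetical hook, e.g. wrapping dbms_service.disconnect_session
    disconnect_sessions()
    return False
```

Passing the hooks in as parameters keeps the loop testable without a cluster; the sleep function is injectable for the same reason.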
Oracle GI 12.1 – Node Failure Handling During Upgrade 1
• Before Oracle RAC 12.1:
  – All nodes of the cluster had to be available during the upgrade
  – If any node crashed (e.g. HW/OS issues) and could not be restarted in time, the upgrade could not proceed
  – Result: downtime required to downgrade, deleteNode and then attempt an upgrade
• With Oracle RAC 12.1:
  – Completion of the upgrade can be enforced despite unavailable nodes
[Figure: MyCluster with Node 1, Node 2, Node 3 and Node 4]
1. Execute the following as root on all the available nodes except one:
   # rootupgrade.sh
2. Execute the following as root on the last node:
   # rootupgrade.sh -force
3. Click Continue in the install GUI screen, which will run cluster verification
4. Once node3 is fixed, execute the following as root:
   # rootcrs.pl -join -existingNode node3
Oracle GI 12.1 – Node Failure Handling During Upgrade 2
• Case: the node on which the upgrade was initiated crashes and remains unavailable
  – Before Oracle 12.1, the node on which the upgrade was initiated (first node) needed to be re-activated prior to proceeding
• Starting with Oracle RAC 12.1 you can:
  – Run “rootupgrade.sh -first -force” as root on any other node
  – After successful execution of the upgrade on all remaining nodes, execute “$GRID_HOME/cfgtoollogs/configToolAllCommands” as the grid owner on one node
  – If another node in the cluster is down at the time of upgrade, refer to the previous solution
[Figure: MyCluster with Node 1, Node 2, Node 3 and Node 4]
1. On any other node, execute as root:
   # rootupgrade.sh -first -force
2. Execute as root on the rest of the available nodes:
   # rootupgrade.sh
3. Execute as grid owner from the new first node:
   $GRID_HOME/cfgtoollogs/configToolAllCommands
4. Once node1 is fixed, execute as root:
   # rootcrs.pl -join -existingNode node1
More Information in Appendix A
Oracle GI 12.1 – BATCH Upgrade
• Oracle GI 12.1 BATCH upgrade is a new feature that allows nodes to be upgraded in batches
• BATCH upgrade improves upgrade performance in large clusters, as nodes can be upgraded in groups
• Downtime can be reduced by grouping nodes with different service availability requirements
[Figure: MyCluster with Node 1 to Node 4 and the services Gold_Svc (on two nodes), Bronce_Svc and Silver_Svc]
1. Nodes have different service availability requirements (silver, gold and bronze)
2. Gold_Svc availability can be improved by ensuring that rootupgrade execution on Node 2 and Node 3 is batched separately
3. For example, Node 1 can be batch 1, Node 2 and Node 4 can be batch 2, and Node 3 can be last
More Information in Appendix A
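A batch plan like the one in the example can be sanity-checked before the upgrade starts: a multi-node service stays available only if no single batch contains every node it runs on. The sketch below assumes the service placement is known as a mapping from service name to node names; the function name and the lowercase node names are hypothetical, and the check is not part of the installer.

```python
from typing import Dict, List, Set, Tuple


def services_fully_down(
    batches: List[Set[str]], placement: Dict[str, Set[str]]
) -> List[Tuple[int, str]]:
    """Return (batch number, service) pairs where one batch holds every node of a service."""
    problems = []
    for number, batch in enumerate(batches, start=1):
        for service, nodes in sorted(placement.items()):
            if nodes and nodes <= batch:
                problems.append((number, service))
    return problems


if __name__ == "__main__":
    # Only Gold_Svc placement is stated on the slide (Node 2 and Node 3).
    placement = {"Gold_Svc": {"node2", "node3"}}
    plan = [{"node1"}, {"node2", "node4"}, {"node3"}]
    print(services_fully_down(plan, placement))
```

An empty result means every listed service keeps at least one node outside each batch. Singleton services will always be reported, which is a reminder to relocate them between batch runs.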
Oracle RAC 12c Rel. 1 – The Standard Going Forward
• Flex Cluster
• Flex ASM
• Hang Manager
• Full Oracle Multitenant Support
[Figure: Hang Manager cycle with the labels Detect, Heuristics, Analyze and Verify, Decide and Resolve]
Overlooked and Underestimated – Hang Manager
Why is a Hang Manager required?
• Customers experience database hangs for a variety of reasons
  – High system load, workload contention, network congestion or errors
• Before Hang Manager was introduced with Oracle RAC 11.2.0.2
  – Oracle required information to troubleshoot a hang, e.g.:
    • System state dumps
    • For RAC: global system state dumps
  – Customers usually had to reproduce the hang with additional parameters
• With Oracle RAC 11.2.0.2:
  – Mechanism to automatically detect and resolve hangs
  – Note that deadlock detection has been available in the database for years
How does it work? - Detection
The main problem to solve: When does a wait represent a hang?
• Hang(s) in the context of Hang Manager are database process(es) that are not progressing
  – Cross-layer hangs are also managed
  – E.g.: resolving a hang that is caused by a blocked ASM resource
• Hang Manager only considers DB sessions holding local/global resources on which sessions are waiting
  – Local resource: local to the instance / Global resource: global to the database
• Session(s) may hang due to underlying OS resource issues
• Hung session(s) could be progressing but may be extremely slow
• Deadlocks and user locks are not managed by Hang Manager
• Deadlock detection has been available in the database for years
How does it work? - Resolution
After confirming hang, eliminate the holder
• Once the holder is found
  – Confirm whether the stack is progressing.
  – If the stack progresses, it is not deemed a hang.
• If Quality of Service Management (QoS) is used and service level definitions have been set:
  – The algorithm considers the service level definition as part of the hang resolution.
• Everything considered, eliminate the holder
  – Logs containing “ORA-32701 – Possible hangs up to hang ID=%s detected” can be found in the alert log and diag trace
• This functionality is available in Oracle RAC, RAC One Node & Single Instance (**)
[Figure: one holder (H) and multiple waiters (W1 to W7) waiting on a resource (Res)]
More Information in Appendix C
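The picture of one holder and several waiters can be modeled as a simple wait-for chain. The following is an illustrative sketch only, not Hang Manager's actual algorithm: it assumes a hypothetical mapping from each waiting session to the session that blocks it, follows the chain to the final holder, and treats a cycle as a deadlock (which, as noted earlier, Hang Manager does not manage).

```python
from typing import Dict, Optional


def root_holder(waits_on: Dict[str, str], session: str) -> Optional[str]:
    """Follow blocker links from a session; return the final holder, or None on a cycle."""
    seen = {session}
    current = session
    while current in waits_on:
        current = waits_on[current]
        if current in seen:
            return None  # cycle: a deadlock, not a hang with a single holder
        seen.add(current)
    return current


def waiters_per_holder(waits_on: Dict[str, str]) -> Dict[str, int]:
    """Count how many waiting sessions ultimately depend on each final holder."""
    counts: Dict[str, int] = {}
    for session in waits_on:
        holder = root_holder(waits_on, session)
        if holder is not None:
            counts[holder] = counts.get(holder, 0) + 1
    return counts


if __name__ == "__main__":
    # W1 and W2 wait directly on H; W3 waits on W1 and so indirectly on H.
    chain = {"W1": "H", "W2": "H", "W3": "W1"}
    print(waiters_per_holder(chain))
```

A session that is not waiting on anything is reported as its own holder, which matches the case where the holder itself is the one to examine.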
Appendix A: Grid Infrastructure Upgrade
Step 1: Upgrade to 12.1.0.2
Ensure “Upgrade” is automatically selected, as this indicates that the system has detected the current installation.
Step 2: Ensure Node List is complete
• All cluster nodes should be listed
  Hint: The information comes from the inventory
• A new feature allows the upgrade even if some nodes are down
Steps 3 and 4: Space requirements
3. Do not attempt to change groups during upgrade
4. Ensure space requirements are met in /u01
Step 5: Repeat Steps on all nodes of Cluster
• For each node i in 1 to max nodes {
  – # mkdir -p /u01/app/12.1.0/grid
  – # mkdir -p /u01/app/crsusr
  – # chown -R grid:oinstall /u01/app/12.1.0/grid
  – # chown -R grid:oinstall /u01/app/crsusr
}
• The installer will call Cluster Verification Utility (CVU) to check and confirm
Step 6: Use Batching (BATCH Upgrade) to Improve Service Uptime
• Use BATCH Upgrade to speed up the upgrade
• “Batching” can help provide improved availability of services
• The feature prompts between batch runs, which gives an opportunity to relocate services
• The local node will always be in “Batch 1”
Step 7: Review final Summary
• Double-check available disk space
• Ensure the install option is Upgrade, not “fresh install”
• Optionally configure root script execution via sudo
• Ensure “Upgrade ASM” is true
Step 8: Confirm Upgrade is complete
• Ensure the message “The upgrade of Oracle Grid Infrastructure for a cluster was successful” is displayed
• Success is reported after the installer executes CVU checks to ensure the newly upgraded Grid Infrastructure stack is healthy
• The checks include inventory checks
Step 9: De-Install the Old Home
• Execute deinstall
• This de-configures and de-installs the old Grid Infrastructure home
• Log files in: /tmp/deinstall`date`.log
• Check for complete removal
  – Remove leftover files using rm -rf <old_grid_home>
Appendix B: Enhanced OJVM Patching Steps
Enhanced steps to Patch OJVM Security Vulnerability
• The process described hereafter is an optimization of the current OJVM patching process described in MOS Note 1929745.1 to further reduce downtime
• This enhanced procedure is NOT fully RAC rolling and NOT applicable to Standby-First, but it increases the availability of non-Java services during the patching process
• This procedure still requires stopping Java usage for a period of time and restarting instances to use the new Oracle executables, but it allows increased availability of facilities other than Java by doing the restarts in a standard RAC rolling manner
When should I use these steps?
• These steps have not been tested enough, so we are not 100% sure there won't be side effects impacting a customer's system.
• Limited testing has shown that the procedure does help reduce the overall downtime
• Downtime is reduced only for applications not using OJVM or OJVM-dependent options like XDB, Text, Spatial etc.
Step by Step instructions
1) Stop OJVM services:
   – $ srvctl stop service -db <dbname> -service <java_srv>
   ** Do not omit -service, else all services for that DB will be stopped
2) Ensure no sessions are using OJVM using the following query
   – select s.sid, s.username u_name, n.name p_name, st.value
     from v$session s, v$sesstat st, v$statname n
     where s.sid = st.sid and n.statistic# = st.statistic#
     and n.name like 'java call heap total size' and st.value > 0
     order by s.sid;
     0 row(s) returned
3) $ srvctl disable service -db <dbname> -service <java_srv>
4) Terminate sessions that have not stopped using
   – dbms_service.disconnect_session('java_srv', DBMS_SERVICE.POST_TRANSACTION)
5) Kill long running sessions using
   alter system kill session 'XX,XXXX';
Continuation of Instructions
6) $ srvctl stop service -db <dbname> -service <java_srv>
   – Do not omit -service, else all services for that DB will be stopped
7) Ensure that no session is using OJVM using the following query
   – select s.sid, s.username u_name, n.name p_name, st.value
     from v$session s, v$sesstat st, v$statname n
     where s.sid = st.sid and n.statistic# = st.statistic#
     and n.name like 'java call heap total size' and st.value > 0
     order by s.sid;
     0 row(s) returned
8) $ srvctl disable service -db <dbname> -service <java_srv>
9) Terminate sessions that have not stopped with the above steps using
   dbms_service.disconnect_session('java_srv', DBMS_SERVICE.POST_TRANSACTION)
Continuation of Instructions
10) Disable JIT using “sqlplus / as sysdba”
    alter system set events '29560 trace name context forever, level 2';
11) Install the OJVM patch using the steps documented in Note 1929745.1, but do not stop the RAC instances
12) Relink the binaries ($ORACLE_HOME/bin/relink all) on each node and restart the instances, one at a time, as in:
    a) relink one instance while others remain up
    b) restart the instance and all NON-JAVA services
    c) proceed to the next instance and repeat (a) and (b)
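The rolling order in step 12 (relink, restart, bring back non-Java services, then move to the next instance) can be written down as an explicit plan before touching the cluster. This is a minimal sketch that only produces human-readable steps; the instance and service names are hypothetical, and it does not run any Oracle command.

```python
from typing import Dict, List


def rolling_relink_plan(
    instances: List[str], non_java_services: Dict[str, List[str]]
) -> List[str]:
    """Build the ordered steps for handling one instance at a time."""
    plan: List[str] = []
    for instance in instances:
        # (a) relink while all other instances remain up
        plan.append(f"relink binaries for {instance} (other instances remain up)")
        # (b) restart this instance, then its non-Java services
        plan.append(f"restart instance {instance}")
        for service in non_java_services.get(instance, []):
            plan.append(f"start non-Java service {service} on {instance}")
        # (c) the loop then proceeds to the next instance
    return plan


if __name__ == "__main__":
    steps = rolling_relink_plan(
        ["inst1", "inst2"],
        {"inst1": ["app_svc"], "inst2": ["app_svc"]},
    )
    for number, step in enumerate(steps, start=1):
        print(number, step)
```

Java services are deliberately absent from the plan; they stay disabled until the final enable and start step of the procedure.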
Continuation of Instructions
13) Replace Java classes
    – Perform the post-install steps of the OJVM patching process without stopping the instance
14) Re-enable JIT using “sqlplus / as sysdba”
    SQL> alter system set events '29560 trace name context forever, level 2';
15) Enable Java services
    – $ srvctl enable service -db <db_name> -service Javasvc
    – $ srvctl start service -db <db_name> -service Javasvc
Appendix C: Hang Manager in Action
Early Warning entries
Sample dia0 entries ** (Subject to change)
*** 2015-04-03T15:54:27.919500+17:00
HM: Early Warning - Session ID 32 serial# 39797 OS PID 15825 is in an involuntary wait 'enq: TX - contention' for 32 seconds blocking 5 sessions
p1=0x54580006, p2=0x5, p3=0x0
Blocked by Session ID 117 serial# 65210 on instance 1 which is waiting on 'latch free' for 33 seconds
p1=0x69535c98, p2=0xf0, p3=0x0
HM: Dumping Short Stack of pid[39.15825] (sid:32, ser#:39797)
...
-- Short Stack – EXCEPT on WINDOWS
...
*** 2015-04-03T15:54:27.919500+17:00
HM: Current SQL: BEGIN simulate_hang(7, 14, 6, 1, 0); END;
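Entries like the sample above can be picked out of a trace file with a regular expression. The sketch below is written against the sample wording only; since the slide marks the format as subject to change, the pattern and the field names are assumptions and may need adjusting for a real trace.

```python
import re
from typing import Dict, Optional, Union

# Pattern derived from the sample Early Warning line; not an official format.
EARLY_WARNING = re.compile(
    r"HM: Early Warning - Session ID (?P<sid>\d+) serial# (?P<serial>\d+)\s+"
    r"OS PID (?P<ospid>\d+)\s+is in an involuntary wait '(?P<event>[^']+)'\s+"
    r"for (?P<seconds>\d+) seconds\s+blocking (?P<blocked>\d+) sessions"
)


def parse_early_warning(text: str) -> Optional[Dict[str, Union[int, str]]]:
    """Extract session, wait event and blocked-session count, or None if no match."""
    match = EARLY_WARNING.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        name: (value if name == "event" else int(value))
        for name, value in fields.items()
    }


if __name__ == "__main__":
    sample = (
        "HM: Early Warning - Session ID 32 serial# 39797 OS PID 15825 "
        "is in an involuntary wait 'enq: TX - contention' for 32 seconds "
        "blocking 5 sessions"
    )
    print(parse_early_warning(sample))
```

The pattern tolerates line wraps between the fields, because trace files may break the entry across lines.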
Hang Manager In Action – Part 1
[Figure: session information]
Hang Manager In Action – Part 2
[Figure: continuation of dia0 trace]
Hang Manager Initiates Resolution
[Figure: dia0 trace]
Hang Manager Initiated Resolution
[Figure: sample dia0 and alert log entries on victim]
Hang Manager Initiated Resolution
[Figure: sample dia0 and alert log entries at master]