Exadata Maintenance tasks 101 Nelson Calero OTN Tour Latinoamérica August 2015


Page 1: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata Maintenance tasks 101

Nelson Calero

OTN Tour Latinoamérica

August 2015

Page 2: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

About me

• Database Consultant at Pythian

• Computer Engineer

• Oracle Certified Professional DBA 10g/11g

• Oracle ACE

• Working with Oracle tools and Linux environments since 1996

• DBA Oracle (since 2001) & MySQL (since 2005)

• Oracle University Instructor

• Co-founder and President of the Oracle User Group of Uruguay

• LAOUC Director of events

• Blogger and frequent speaker: Oracle Open World, Collaborate, OTN Tour, JIAP, MySQL/NoSQL conferences

http://www.linkedin.com/in/ncalero @ncalerouy

2 © 2014 Pythian Confidential

Page 3: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Pythian overview

• 18 Years of data infrastructure management consulting

• 200+ Top brands

• 11700+ Systems under management

• Over 387 DBAs in 30 countries

• Top 5% of DBA workforce, 10 Oracle ACEs, 4 ACE Directors, 3 OakTable members, 2 OCM, 5 Microsoft MVPs, 1 Cloudera Champion of Big Data

• Oracle, Microsoft, MySQL, Hadoop, Cassandra, MongoDB, and more

• Infrastructure, Cloud, DevOps, and application expertise


Page 4: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Today’s topics

1. Introduction to Exadata

2. Changes for the DBA

3. Monitoring

– Configuring ASR

4. Maintenance

– Common procedures

– Patching

– Replacing parts

– Some examples


Page 5: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Introduction to Exadata

• “The highest-performing platform for running Oracle Database” – X5-2 (ref: oracle.com)

• Best for all database workloads: DW/OLTP/In-Memory

• Part of the Engineered Systems family: https://www.oracle.com/engineered-systems/index.html

– SuperCluster

– Private Cloud Appliance

– Database Appliance

– Big Data Appliance

– Exalogic Elastic Cloud

– Exalytics In-Memory

– Zero Data Loss Recovery Appliance

– FS1 Flash Storage System

– ZFS Storage Appliance

Page 6: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata history

• V1: 2008 – HP Oracle Database machine

• V2: 2009 – Sun hardware

• X2: 2010 – X2-2/X2-8

– 2011 - Exadata Storage Expansion Rack

• X3: 2012 – X3-2/X3-8

• X4: 2013 – X4-2, 2014 – X4-8

• X5: 2015 – X5-2

• Great summary in http://flashdba.com/history-of-exadata/


Page 7: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata flavors

• 2 or 8 CPU sockets on database servers (Xn-2/Xn-8)

• Full Rack – 8 database servers on Xn-2, 2 on Xn-8

– 14 storage servers

– 86.9TB of flash (X5), 44.8TB (X4), 22.4TB (X3), 5.3TB (X2)

– 200TB of disk (X5/X4), 100TB (X3)

• Half Rack – only for Xn-2, half of a Full Rack

• Quarter Rack – only for Xn-2, half of a Half Rack – 3 storage servers

• Eighth Rack – since X3, only for Xn-2: same servers as a Quarter Rack, with half the disks and flash

• http://docs.oracle.com/cd/E50790_01/doc/doc.121/e51953/intro.htm#DBMSO109


Page 8: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata hardware

• Hardware (example from the latest model, X5-2)

– PCI flash storage – up to 230TB per rack (4 cards per storage server)

– InfiniBand internal connectivity (40Gb/s)

• 263 GB/s per rack from SQL

– 2 to 19 DB servers per rack (2x18 cores, 256GB RAM each)

• Up to 684 CPU cores for database

• Up to 14.6TB RAM per rack for database

– 3 to 18 storage servers per rack

• Up to 288 CPU cores for storage

• Software

– Oracle Database (11.2 / 12.1)

– Oracle Enterprise Linux (5.9 / 6.6)

– ZDP InfiniBand protocol, iDB for storage access


Page 9: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata Workload Optimized Configurations (X5)

http://www.oracle.com/us/corporate/events/datacenter/index.html


Page 10: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata architecture


http://www.oracle.com/technetwork/database/exadata/exadata-technical-whitepaper-134575.pdf

Page 11: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata disks

Physical Disk → LUN → Cell Disk → Grid Disks → ASM Diskgroup
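The layering is visible in the default grid disk names used throughout these slides, such as DATA_EXA1_CD_00_exa1cel03. The ASM diskgroup prefix comes first, followed by the cell disk, which itself ends with the cell name. The following is a minimal Python sketch that assumes that naming convention (it is a deployment default, not a rule). The function and type names are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed convention, inferred from sample names such as DATA_EXA1_CD_00_exa1cel03:
#   <diskgroup prefix>_<cell disk>, where the cell disk is CD_nn_<cell>
GRIDDISK_RE = re.compile(r"^(?P<prefix>.+)_(?P<celldisk>CD_\d{2}_(?P<cell>[^_]+))$")


class GridDiskName(NamedTuple):
    diskgroup_prefix: str  # usually matches the ASM diskgroup name
    cell_disk: str         # cell disk carved from the LUN of one physical disk
    cell: str              # storage server that owns the cell disk


def parse_griddisk_name(name: str) -> GridDiskName:
    """Split a grid disk name into the layers of the Exadata disk hierarchy."""
    match = GRIDDISK_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected grid disk name: {name!r}")
    return GridDiskName(match["prefix"], match["celldisk"], match["cell"])
```

For example, `parse_griddisk_name("RECO_EXA1_CD_01_exa1cel03").cell` returns `"exa1cel03"`.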

Page 12: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata licensing

• Exadata storage server licenses

• Oracle Database licenses

– plus additional options such as Real Application Clusters, Partitioning, Diagnostic and Tuning Packs, Multitenant

• Both vary depending on the model, as they are based on the number of cores.

• Exadata hardware has its own separate cost


Page 13: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata functionalities

• Smart flash cache

• Database cell offloading

– Queries processed at storage level (w/conditions)

– Uses smart scan and storage indexes (cell in memory)

• Hybrid columnar compression

• Optimized SQL protocol - iDB

– Exafusion in 12c – reimplementation of RAC Cache Fusion for direct calls from the database

• IO Resource Manager

• OVM support in 12c


Page 14: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Premier support and Platinum Services

Extra-cost support: Premier and Premier for Systems

http://www.oracle.com/us/support/library/platinum-services-policies-1652886.pdf

Platinum services:

• Remote fault monitoring, accelerated response and patch deployment (4 per year)

• Free for qualified customers who have:

– Certified platinum configuration – Matrix on oracle.com

– Support services contract for software and systems

– Oracle licenses

– Gateway, VPN, connectivity, etc.


Page 15: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Today’s topics

1. Introduction to Exadata

2. Changes for the DBA

3. Monitoring

– Configuring ASR

4. Maintenance

– Common procedures

– Patching

– Replacing parts

– Some examples


Page 16: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Changes for the DBA

• New components to manage

– Storage cells

– InfiniBand switches

– KVM, PDU

• New utilities

– cellcli

– dcli

– dbmcli (on 12c)

– ILOM access (DB, Cells, IB Switches) - Web / ssh / IPMI / remote console

• New troubleshooting tools

– Exachk / sundiag / ILOM snapshots

• Monitoring and alerting

– OEM Exadata plugin

– ASR


Page 17: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Monitoring cells using v$ views

• New views

– V$CELL

– V$CELL_CONFIG

– V$CELL_STATE

– V$CELL_THREAD_HISTORY

– V$CELL_REQUEST_TOTALS

• New stats recorded in V$SYSSTAT

• New wait events

– cell multiblock physical read

– cell smart index scan

– cell%

http://docs.oracle.com/cd/E50790_01/doc/doc.121/e50471/monitoring.htm#SAGUG20487

• Columns added to existing v$ views

– V$BACKUP_DATAFILE

– V$SQLFN_METADATA

– V$SQL

– V$SQLAREA

– V$SQLSTATS

– V$SQLAREA_PLAN_HASH

– V$SQLSTATS_PLAN_HASH

Page 18: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

cellcli sample output

[root@exa1cel03 ~]# cellcli

CellCLI: Release 11.2.3.3.0 - Production on Sun Jul 26 19:21:16 EDT 2015

Copyright (c) 2007, 2013, Oracle. All rights reserved.

Cell Efficiency Ratio: 945

CellCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

DATA_EXA1_CD_00_exa1cel03 ONLINE Yes

DATA_EXA1_CD_01_exa1cel03 ONLINE Yes

DATA_EXA1_CD_02_exa1cel03 ONLINE Yes

DBFS_DG_CD_02_exa1cel03 ONLINE Yes

DBFS_DG_CD_03_exa1cel03 ONLINE Yes

DBFS_DG_CD_04_exa1cel03 ONLINE Yes

RECO_EXA1_CD_00_exa1cel03 ONLINE Yes

RECO_EXA1_CD_01_exa1cel03 ONLINE Yes

RECO_EXA1_CD_02_exa1cel03 ONLINE Yes
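Before a cell is taken offline, every grid disk should report asmdeactivationoutcome=Yes. The Python sketch below scans output like the listing above and returns the grid disks that are not safe to deactivate. It assumes the three-column layout shown (name, asmmodestatus, asmdeactivationoutcome) and lets the outcome be a multi-word message. The function name is hypothetical.

```python
from typing import List


def unsafe_to_deactivate(cellcli_output: str) -> List[str]:
    """Return grid disks whose asmdeactivationoutcome is not 'Yes'.

    Expects the output of:
      list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
    Raises ValueError on a non-blank line with fewer than three fields.
    """
    unsafe = []
    for line in cellcli_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # maxsplit=2 keeps a multi-word outcome message in a single field
        name, _mode_status, outcome = line.split(maxsplit=2)
        if outcome.strip().lower() != "yes":
            unsafe.append(name)
    return unsafe
```

When this runs on output collected through dcli, each line starts with a "host: " prefix. That prefix must be stripped first.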


Page 19: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

dcli sample output

[root@exa1db01 ~]# cat /opt/oracle.SupportTools/onecommand/dbs_group

exa1db01

exa1db02

exa1db03

exa1db04

exa2db01

exa2db02

[root@exa1db01 ~]# cd /opt/oracle.SupportTools/onecommand

[root@exa1db01 onecommand]# dcli -l root -g dbs_group "who -b"

exa1db01: system boot 2015-03-12 10:38

exa1db02: system boot 2015-03-12 11:27

exa2db01: system boot 2015-01-19 01:28

exa2db02: system boot 2015-01-19 01:56

exa2db03: system boot 2015-02-10 14:38

exa2db04: system boot 2015-02-10 10:46
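dcli prefixes every output line with the host that produced it, which makes the output easy to post-process. The Python sketch below assumes the "host: text" format shown above. It groups lines by host and reports hosts from the group file that returned nothing. Both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_dcli_output(output: str) -> Dict[str, List[str]]:
    """Group dcli output lines of the form 'host: text' by host."""
    by_host: Dict[str, List[str]] = defaultdict(list)
    for line in output.splitlines():
        if not line.strip():
            continue
        host, sep, text = line.partition(": ")
        if not sep:
            raise ValueError(f"line without a host prefix: {line!r}")
        by_host[host].append(text)
    return dict(by_host)


def silent_hosts(expected: Iterable[str], output: str) -> List[str]:
    """Hosts listed in the group file that produced no output."""
    answered = group_dcli_output(output)
    return sorted(set(expected) - set(answered))
```

The expected host list could be read from the group file, for example with `open("dbs_group").read().split()`.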


Page 20: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

ILOM access – ssh example

[root@exa1db02 ~]# ssh exa1db03-ilom

Password:

Oracle(R) Integrated Lights Out Manager

Version 3.1.2.10.c r81825

Copyright (c) 2013, Oracle and/or its affiliates. All rights reserved.

-> show /SP/policy

/SP/policy

Targets:

Properties:

ENHANCED_PCIE_COOLING_MODE = disabled

HOST_AUTO_POWER_ON = disabled

HOST_LAST_POWER_STATE = enabled


Page 21: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Listing flash storage installed – from OS

[root@exa1cel02 ~]# lsscsi | grep -i ATA

[8:0:0:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdn

[8:0:1:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdo

[8:0:2:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdp

[8:0:3:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdq

[9:0:0:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdr

[9:0:1:0] disk ATA MARVELL SD88SA02 D21Y /dev/sds

[9:0:2:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdt

[9:0:3:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdu

[10:0:0:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdv

[10:0:1:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdw

[10:0:2:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdx

[10:0:3:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdy

[11:0:0:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdz

[11:0:1:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdaa

[11:0:2:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdab

[11:0:3:0] disk ATA MARVELL SD88SA02 D21Y /dev/sdac
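Each line above is one flash module. The cellcli listing on the next slide suggests that each SCSI host number corresponds to one PCIe flash card with four modules. Assuming that holds, the short Python sketch below counts modules per card from the lsscsi output, which points at any card with missing modules. The regular expression is written for the sample lines above and is an assumption about the lsscsi column layout.

```python
import re
from collections import Counter
from typing import Dict

# [host:channel:target:lun]  type  vendor  model...  revision  device
LSSCSI_RE = re.compile(
    r"^\[(?P<host>\d+):\d+:\d+:\d+\]\s+disk\s+ATA\s+(?P<model>.+?)\s+\S+\s+/dev/\S+$"
)


def flash_modules_per_host(lsscsi_output: str, model_prefix: str = "MARVELL") -> Dict[int, int]:
    """Count flash modules per SCSI host number."""
    counts: Counter = Counter()
    for line in lsscsi_output.splitlines():
        match = LSSCSI_RE.match(line.strip())
        if match and match["model"].startswith(model_prefix):
            counts[int(match["host"])] += 1
    return dict(counts)
```

On the listing above this should give four modules for each of hosts 8 to 11.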


Page 22: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Listing flash storage installed – from cellcli

CellCLI> list physicaldisk attributes name, makemodel, physicalrpm, physicalport, status where disktype=flashdisk

FLASH_1_0 "Sun Flash Accelerator F40 PCIe Card" normal

FLASH_1_1 "Sun Flash Accelerator F40 PCIe Card" normal

FLASH_1_2 "Sun Flash Accelerator F40 PCIe Card" normal

FLASH_1_3 "Sun Flash Accelerator F40 PCIe Card" normal

FLASH_2_0 "Sun Flash Accelerator F40 PCIe Card" normal

FLASH_2_1 "Sun Flash Accelerator F40 PCIe Card" normal

FLASH_2_2 "Sun Flash Accelerator F40 PCIe Card" normal

FLASH_2_3 "Sun Flash Accelerator F40 PCIe Card" normal

FLASH_4_0 "Sun Flash Accelerator F40 PCIe Card" normal

FLASH_4_1 "Sun Flash Accelerator F40 PCIe Card" normal

FLASH_4_2 "Sun Flash Accelerator F40 PCIe Card" normal

FLASH_4_3 "Sun Flash Accelerator F40 PCIe Card" normal

FLASH_5_0 "Sun Flash Accelerator F40 PCIe Card" failed

FLASH_5_1 "Sun Flash Accelerator F40 PCIe Card" failed

FLASH_5_2 "Sun Flash Accelerator F40 PCIe Card" failed

FLASH_5_3 "Sun Flash Accelerator F40 PCIe Card" failed
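In this listing, all four modules of the card in PCI slot 5 have failed, which points at the card rather than a single module. The Python sketch below groups non-normal flash modules by slot. It assumes the FLASH_<slot>_<module> naming and the name / "model" / status layout shown above. The function name is hypothetical.

```python
import shlex
from collections import defaultdict
from typing import Dict, List


def flash_problems_by_slot(cellcli_output: str) -> Dict[int, List[str]]:
    """Group flash modules whose status is not 'normal' by PCI slot."""
    problems: Dict[int, List[str]] = defaultdict(list)
    for line in cellcli_output.splitlines():
        if not line.strip():
            continue
        fields = shlex.split(line)  # keeps the quoted model name as one field
        name, status = fields[0], fields[-1]
        if status.lower() != "normal":
            slot = int(name.split("_")[1])  # FLASH_<slot>_<module>
            problems[slot].append(name)
    return dict(problems)
```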


Page 23: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Public documentation available

• Oracle® Exadata Storage Server Software User's Guide

http://docs.oracle.com/cd/E50790_01/doc/doc.121/e50471/toc.htm

• Oracle® Exadata Database Machine Maintenance Guide

http://docs.oracle.com/cd/E50790_01/doc/doc.121/e51951/toc.htm

• Several working examples on the Oracle Learning Library (search for “Database Machine”):

https://apexapps.oracle.com/pls/apex/f?p=44785:2::FORCE_QUERY::2%2CCIR%2CRIR:P2_TAGS:Database+Machine

• Arup Nanda series: Oracle Exadata Commands Reference, June 2011

http://www.oracle.com/technetwork/articles/oem/exadata-commands-intro-402431.html


Page 24: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Today’s topics

1. Introduction to Exadata

2. Changes for the DBA

3. Monitoring

– Configuring ASR

4. Maintenance

– Common procedures

– Patching

– Replacing parts

– Some examples


Page 25: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata monitoring

• OEM using the plugin

– New pages with all the information

– All Exadata components are monitored, and emails are sent when thresholds are crossed, as usual

– OEM 12c Exadata discovery cookbook for configuration: http://www.oracle.com/technetwork/oem/exa-mgmt/em12c-exadata-discovery-cookbook-1662643.pdf

• Auto Service Request (ASR) – automatically creates an SR on support.oracle.com when a failure is detected

– Well-known issues that require maintenance get a reply within seconds, with links to support notes

– After the initial configuration, we receive an email notification for each SR created; unlike with OEM, no user interaction is needed


Page 26: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

OEM Exadata plugin


Page 27: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

ASR – sample email from a detected failure

Oracle ASR: Service Request 3-19771251173 Created

[email protected]

May 26

to

Service Request: 3-19771251173

Oracle Auto Service Request (ASR) has created a Service Request (SR) for the following ASR asset

Hostname: exa1cel02

Serial#: 1234FNN0A0

Please login to My Oracle Support to see the details of this SR. My Oracle Support can also be used to make any changes to the SR or to provide additional information.

The Oracle Auto Service Request documentation can be accessed on http://oracle.com/asr.

Please use My Oracle Support https://support.oracle.com for assistance.


Page 28: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

ASR – sample SR on MOS


Page 29: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

ASR configuration

• Optionally done by Oracle under Platinum support

• ASR server external to Exadata

• MOS account must have Administrator role or Admin Assets Access privilege

• Each Exadata node/IB switch must be configured (SNMP Traps)

• Assets must be accepted on MOS under each CSI

• Notification based on SNMP messages generated by ILOMs

– If the ASR server is down, messages are lost

– On Solaris it can use another protocol to avoid loss


Page 30: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

ASR configuration and usage

• ASR documentation

http://www.oracle.com/technetwork/systems/asr/documentation/index.html

• Auto Service Request Installation and Operations Guide

https://docs.oracle.com/cd/E37710_01/install.41/e18475/ch1_asr_overview.htm#ASRUD108

• Oracle Auto Service Request (ASR) (Doc ID 1185493.1)

• How To Manage and Approve Pending ASR Assets In My Oracle Support (Doc ID 1329200.1)

• Engineered Systems ASR Configuration Check via ASREXACHK (Doc ID 1450112.1)


Page 31: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

ASR installation is easy

ASR Manager server requires:

• connectivity to the Internet using HTTPS

• network connectivity from the ASR Manager server to the Exadata assets, both ILOM and eth0

• JDK 7 (JDK 1.7.0_13) or later

• rpm-build package

Installation on Linux

export JAVA_HOME=/usr/java/jdk1.8.0_25/

export PATH=$JAVA_HOME/bin:$PATH:/opt/asrmanager/bin

export CLASSPATH=.

rpm -i asrmanager.5.0.2-20141215170108.rpm

/opt/asrmanager/bin/asr register

<enter MOS user and password>

/opt/asrmanager/bin/asr test_connection

/opt/asrmanager/bin/asr start
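The JDK requirement can be checked before installing. The Python sketch below assumes legacy version strings such as 1.8.0_25 (as in the JAVA_HOME above) and uses the slide's minimum of 1.7.0_13. The helper names are hypothetical.

```python
import re
from typing import Tuple

MIN_ASR_JDK = (1, 7, 0, 13)  # "JDK 7 (JDK 1.7.0_13) or later"


def parse_jdk_version(version: str) -> Tuple[int, int, int, int]:
    """Turn '1.8.0_25' into (1, 8, 0, 25) so versions compare as tuples."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:_(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, micro, update = match.groups()
    return int(major), int(minor), int(micro), int(update or 0)


def version_from_java_home(java_home: str) -> str:
    """Extract '1.8.0_25' from a path such as /usr/java/jdk1.8.0_25/."""
    match = re.search(r"jdk(\d+\.\d+\.\d+(?:_\d+)?)", java_home)
    if match is None:
        raise ValueError(f"no JDK version in path: {java_home!r}")
    return match.group(1)


def meets_asr_jdk_requirement(version: str) -> bool:
    """True if the JDK is at least the minimum listed on the slide."""
    return parse_jdk_version(version) >= MIN_ASR_JDK
```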


Page 32: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

ASR configuration

• Configure each Exadata component to report to the ASR Manager server

– Storage servers, database nodes and InfiniBand switches

– IB switch version and serial# should be validated, as they may need to be updated

• Activate nodes on the ASR Manager

– If ILOM auto-activation did not occur, activate them manually

• Verify all nodes are visible on the ASR Manager

• Complete the registration on MOS, approving the ASR activation

• Make sure packets from DB nodes to the ASR Manager use eth0


Page 33: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

ASR configuration - example

• ASR Manager server: 10.20.30.123

• Validate current configuration of storage server:

[root@exa1db02 ~]# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "cellcli -e list cell attributes snmpsubscriber"

exa1cel01: ((host=exa1db01.acme.com,port=1830,community=public),(host=exa1db02.acme.com,port=1830,community=public),(host=exa2db02.acme.com,port=3872,community=public),(host=exa2db03.acme.com,port=3872,community=public))


Page 34: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

ASR configuration - example

– port should be the agent listener port

– cells report to the OEM agent on each DB node

• Modify the previous output, adding the ASR Manager host

[root@exa1db01 ~]# dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "cellcli -e alter cell snmpsubscriber=\

\(\(host=\'exa1db01.acme.com\',port=3872,community=public\),\

\(host=\'exa1db02.acme.com\',port=3872,community=public\),\

\(host=\'exa2db01.acme.com\',port=3872,community=public\),\

\(host=\'exa2db02.acme.com\',port=3872,community=public\),\

\(host=\'exa2db03.acme.com\',port=3872,community=public\),\

\(host=\'exa2db04.acme.com\',port=3872,community=public\),\

\(host=\'10.20.30.123\',port=162,community=public,type=ASR\)\)"
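The command sets the complete snmpsubscriber list at once. That is why the existing OEM agent entries are repeated and the ASR Manager entry is appended at the end. The Python sketch below generates the unescaped value. It assumes agent port 3872 and community public, as in the example. It does not handle the extra escaping needed to pass the value through dcli, as done above. The helper name is hypothetical.

```python
from typing import List


def snmp_subscriber_value(
    oem_agent_hosts: List[str],
    asr_manager: str,
    agent_port: int = 3872,
    community: str = "public",
) -> str:
    """Build the (unescaped) value for 'alter cell snmpsubscriber=...'.

    Keeps one entry per OEM agent host and appends the ASR Manager entry,
    which receives SNMP traps on port 162 with type=ASR.
    """
    entries = [
        f"(host='{host}',port={agent_port},community={community})"
        for host in oem_agent_hosts
    ]
    entries.append(f"(host='{asr_manager}',port=162,community={community},type=ASR)")
    return "(" + ",".join(entries) + ")"
```

The result could be used, for example, as the argument of `cellcli -e "alter cell snmpsubscriber=..."` when running directly on one cell. Verify it with `list cell attributes snmpsubscriber` afterwards.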


Page 35: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

ASR configuration - example

[root@asrm]~# asr status

ASR Manager (pid 10794) is RUNNING.

[root@asrm]~#

[root@asrm]~# asr list_asset

IP_ADDRESS HOST_NAME SERIAL_NUMBER ASR PROTOCOL SOURCE PRODUCT_NAME

--------------- ---------------- ------------------- -------- --------- -------------- ------------------------------------

10.102.100.25 exa1sw-ib3 1234ABC-1234R114WY Enabled SNMP ILOM Sun Datacenter InfiniBand Switch 36

10.102.100.24 exa1sw-ib2 1234ABC-1234R114XY Enabled SNMP ILOM Sun Datacenter InfiniBand Switch 36

10.102.100.18 exa1cel01-ilom 2233BTT0C1 Enabled SNMP ILOM SUN FIRE X4270 M2 SERVER

10.102.100.20 exa1cel03-ilom 2233BTT0CB Enabled SNMP ILOM SUN FIRE X4270 M2 SERVER

10.102.100.16 exa1db01-ilom 2254QUI0RJ Enabled SNMP ILOM SUN FIRE X4170 M2 SERVER

10.102.100.17 exa1db02-ilom 2254QUI0U3 Enabled SNMP ILOM SUN FIRE X4170 M2 SERVER

10.102.100.49 exa2sw-ibs0 BT00123455 Enabled SNMP ILOM Sun Datacenter InfiniBand Switch 36

10.102.100.51 exa2sw-ibb0 BT00123458 Enabled SNMP ILOM Sun Datacenter InfiniBand Switch 36

10.102.100.50 exa2sw-iba0 BT00123459 Enabled SNMP ILOM Sun Datacenter InfiniBand Switch 36

10.102.100.11 exa1db01 2254QUI0RJ Enabled SNMP,HTTP EXADATA-SW,ADR SUN FIRE X4170 M2 SERVER

10.102.100.12 exa1db02 2254QUI0U3 Enabled SNMP,HTTP EXADATA-SW,ADR SUN FIRE X4170 M2 SERVER
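To verify that all nodes are visible on the ASR Manager, the `asr list_asset` output can be compared with the expected host names. The Python sketch below assumes the column order shown above and that only PRODUCT_NAME contains spaces. An empty column in real output would break this whitespace split. The helper names are hypothetical.

```python
from typing import Dict, Iterable, List

COLUMNS = ["ip_address", "host_name", "serial_number", "asr", "protocol", "source"]


def parse_list_asset(output: str) -> List[Dict[str, str]]:
    """Parse 'asr list_asset' rows into dictionaries."""
    assets = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith(("IP_ADDRESS", "---")):
            continue  # skip blank, header and separator lines
        parts = line.split(maxsplit=len(COLUMNS))  # product name keeps its spaces
        if len(parts) != len(COLUMNS) + 1:
            raise ValueError(f"unexpected row: {line!r}")
        asset = dict(zip(COLUMNS, parts))
        asset["product_name"] = parts[-1]
        assets.append(asset)
    return assets


def assets_needing_attention(expected_hosts: Iterable[str], output: str) -> List[str]:
    """Expected hosts that are missing or not Enabled on the ASR Manager."""
    status = {a["host_name"]: a["asr"] for a in parse_list_asset(output)}
    return sorted(h for h in expected_hosts if status.get(h) != "Enabled")
```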


Page 36: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

ASR configuration - example

• Activate nodes on the ASR Manager

asr activate_asset -i [Node ILOM IP]

asr activate_exadata -i [Node IP] -h exa1cel01 -l [Node ILOM IP]

[root@asrm] asr activate_asset -i 10.105.200.17

exa2sw-iba0.acme.com : 1 service tags

Successfully submitted activation for the asset

Host Name: exa2sw-iba0

IP Address: 10.105.200.17

Serial Number: BC0001234

The e-mail address associated with the registration id for this asset's ASR Manager will receive an e-mail highlighting the asset activation status and any additional instructions for completing activation.

Please use My Oracle Support http://support.oracle.com to complete the activation process.

The Oracle Auto Service Request documentation can be accessed on http://oracle.com/asr.

• For IB switches, an empty (unused) alert rule should be configured using the ILOM (spsh):

show /SP/alertmgmt/rules/[NUMBER]

set /SP/alertmgmt/rules/[NUMBER] type=snmptrap level=minor destination=10.20.30.123 snmp_version=2c community_or_username=public


Page 37: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

ASR configuration - example

-> show 4

/SP/alertmgmt/rules/4

Targets:

Properties:

community_or_username = public

destination = 0.0.0.0

destination_port = 0

email_custom_sender = (none)

email_message_prefix = (none)

event_class_filter = (none)

event_type_filter = (none)

level = disable

snmp_version = 1

testrule = (Cannot show property)

type = snmptrap


Page 38: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

View from MOS

• Assets are listed under the Systems tab

• A hardware serial# identifies each component

– One serial# for the whole Exadata machine groups them all

• Each CSI includes its assets

• An SR is created for a specific CSI


Page 39: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

View from MOS - assets


Page 40: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

View from MOS – user privileges over assets


Page 41: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Today’s topics

1. Introduction to Exadata

2. Changes for the DBA

3. Monitoring

– Configuring ASR

4. Maintenance

– Common procedures

– Patching

– Replacing parts

– Some examples


Page 42: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Maintenance

• Software updates

– OS / DB / Switches - Patching

– OS / DB / Switches - Upgrades

– OS / DB / Switches - Configuration change

• Hardware upgrade

• Preventive tasks

– On-site health checks (after the second year with Premier/Platinum support)

– End-of-life (EOL) parts are replaced: RAID HBA batteries and Energy Storage Modules (ESM) in flash cards

– They do not include patching or upgrading

• Failed components

– hard drives

– flash cards

– InfiniBand switch riser

– network cables


Page 43: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Maintenance – only planned?

• No SPOF, many redundant parts

• External issues can cause an unplanned outage

– Electricity – been there

– All the usual unplanned failures: flooding, earthquakes, etc.


Page 44: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Maintenance – all Xn have same failures?

• Different parts in newest models

• Different configuration options

• Example with Flash Cards

Model   Flash card        Energy storage
X2      F20 PCIe          Battery
X3      F40 PCIe          Capacitor
X4      F80 PCIe          Capacitor
X5      F160 NVMe PCIe    Capacitor

The battery must be replaced every 3 years
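Tracking the 3-year battery replacement is simple date arithmetic. The Python sketch below assumes the battery's installation date is known. A battery installed on 29 February falls back to 28 February when the target year is not a leap year. The function name is hypothetical.

```python
from datetime import date

BATTERY_LIFETIME_YEARS = 3  # per the slide: replace every 3 years


def next_battery_replacement(installed: date, years: int = BATTERY_LIFETIME_YEARS) -> date:
    """Date on which a battery installed on `installed` is due for replacement."""
    try:
        return installed.replace(year=installed.year + years)
    except ValueError:
        # installed on 29 February and the target year is not a leap year
        return installed.replace(year=installed.year + years, day=28)
```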


Page 45: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Maintenance procedures

• Rolling fashion

– no outage required

– One server at a time

– Cells need an ASM rebalance. The process for each cell is:

• Turn off grid disks

• Patch cell

• Turn on grid disks

• Wait for the ASM rebalance to finish – time depends on activity

– Total time is more than double that of the full-outage procedure
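The per-cell loop above is sketched in Python below. The four callables are hypothetical placeholders for the real actions, such as CellCLI commands run through dcli. The sketch shows only the ordering: the next cell is touched only after the previous one is back and has caught up. If any step raises an exception, the loop stops before the next cell.

```python
from typing import Callable, Iterable, List

Step = Callable[[str], None]


def rolling_cell_maintenance(
    cells: Iterable[str],
    griddisks_off: Step,
    patch_cell: Step,
    griddisks_on: Step,
    wait_for_rebalance: Step,
) -> List[str]:
    """Apply maintenance one cell at a time and return the cells completed."""
    completed = []
    for cell in cells:
        griddisks_off(cell)       # turn off grid disks (ASM keeps the mirror copies)
        patch_cell(cell)          # patch or repair the cell
        griddisks_on(cell)        # turn grid disks back on
        wait_for_rebalance(cell)  # block until ASM has caught up
        completed.append(cell)
    return completed
```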


Page 46: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Maintenance procedures

• Rolling fashion

– Watch out for bug 16788832 - ORA-27609: SMART I/O FAILED DUE TO A NETWORK ERROR TO THE CELL AFTER SHUTDOWN. Patchset available: fixed in 11.2.0.4

– Normal redundancy ASM: a disk failure during the maintenance will bring the system down, recoverable through backups

– High redundancy ASM: two disk failures will have the same effect
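The worst-case redundancy arithmetic behind the last two points is sketched in Python below. With one cell offline, one mirror copy is already unavailable. The diskgroup therefore survives only (copies - 1 - cells offline) further failures that hit the same data. The sketch assumes 2 copies for NORMAL and 3 copies for HIGH redundancy.

```python
MIRROR_COPIES = {"NORMAL": 2, "HIGH": 3}  # ASM redundancy level -> copies of each extent


def extra_failures_tolerated(redundancy: str, cells_offline: int = 1) -> int:
    """Worst case: further disk failures (holding the same extents) that can be
    survived while `cells_offline` cells are down for maintenance."""
    copies = MIRROR_COPIES[redundancy.upper()]
    return max(copies - 1 - cells_offline, 0)
```

With one cell in maintenance, NORMAL gives 0 and HIGH gives 1. One more failure brings a normal redundancy diskgroup down, and two bring down a high redundancy one.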


Page 47: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Maintenance procedures

• With complete outage

– services are shut down before starting

– cells are patched in parallel

• no need to rebalance disks

– Total time is less than half of the rolling procedure


Page 48: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata Maintenance - Patches

• Quarterly Full Stack Download Patch (QFSDP)

– single patch for OS + firmware + drivers

– storage and compute nodes

– the full outage option is faster than the rolling option

• Quarterly DB patch / Bundle Patch

– DB + GI + diskmon

– Rolling

– Includes latest PSU

– Platinum Services: 4 per year, done remotely by Oracle (w/restrictions)

• PSU – classical – DB requires an outage

– BP must be installed first


Page 49: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata Maintenance - Patches

Components:

– Node firmware

– Operating system

– GI and RDBMS binaries

– Infiniband Switches

– Others: KVM, PDU

InfiniBand switch patches are not cumulative: intermediate patches, if any, must be applied

Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)


Page 50: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata Maintenance - Patches

• Storage server patches

– applied with patchmgr, a binary included with the patch

– runs from a compute node (DB)

– uses the dcli utility

• Compute nodes are patched along with the cells

– Updating key software components on database hosts to match those on the cells (Doc ID 1284070.1)

– OS updated using a yum repository, which can be local

• More resources:

http://www.pythian.com/blog/upgrade-exadata-to-11-2-0-3/

http://www.pythian.com/blog/exadata-patching-overview/


Page 51: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata bundle patch - overview

1) Download and copy patch files to all servers

– dcli makes it easier

2) Prerequisites check

– Cell: ./patchmgr -cells /opt/oracle.SupportTools/onecommand/cell_group -patch_check_prereq -rolling

– Switch: ./patchmgr -ibswitches -upgrade -ibswitch_precheck

– Database: ./dbnodeupdate.sh -u -l 17809253_112330_Linux-x86-64.zip -v

3) Upgrade OPatch to the latest version (MOS patch 6880880)

4) Blackout involved targets on OEM

5) Apply the single one-off rolling patch to database homes prior to the Bundle Patch (example 17854520)

– Time consuming, depending on the number of Oracle Homes installed

– Example: 4 database homes on each database server, 6 database servers => database patch to be applied 24 times

6) Run rolling patch to Cell Servers - estimate 1:30h per cell

7) Run patch to InfiniBand switches - 1:30h per switch

8) Run rolling patch to Database Servers - 1:30h per server

9) Run rolling patch to GI instances

10) Run rolling patch to DB instances - for each server and Oracle home

Half rack = 4 DB servers, 7 cell servers, 2 InfiniBand switches => an insane amount of hours
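The estimate for steps 6 to 8 can be computed directly, using the slide's figure of 1:30h per cell, switch and database server. This Python sketch ignores the prerequisites, the Oracle home patching of steps 5, 9 and 10, and any waiting, so the real window is longer.

```python
from datetime import timedelta

PER_COMPONENT = timedelta(hours=1, minutes=30)  # slide estimate per cell, switch or DB server


def rolling_patch_estimate(db_servers: int, cells: int, ib_switches: int) -> timedelta:
    """Steps 6-8: components are patched one after another."""
    return PER_COMPONENT * (db_servers + cells + ib_switches)


def oracle_home_patch_runs(homes_per_server: int, db_servers: int) -> int:
    """Step 5: the one-off patch is applied once per Oracle home on each server."""
    return homes_per_server * db_servers
```

For the half rack above, `rolling_patch_estimate(4, 7, 2)` gives 19 hours 30 minutes. The slide's example of 4 homes on 6 servers gives 24 patch runs.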


Page 52: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Exadata Maintenance – Replacing parts

• Two types: customer replaceable units (CRU) and field replaceable units (FRU)

– CRUs are the customer's responsibility

• Oracle Support takes care of it and sends the bill

• List of replaceable parts on all Xn servers: http://docs.oracle.com/cd/E50790_01/doc/doc.121/e51951/app_fru.htm#DBMMN21100

• Examples to see in detail:

– RAID HBA Batteries

– Flash disks


Page 53: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Replacing parts - procedure

• Through an SR

– automatically created for failures if using ASR

– automatically created by Oracle for preventive maintenance

• We run checks and upload the results to the SR

– sundiag, exachk, ILOM snapshots

– Be careful to include only current files to avoid misunderstandings

• Oracle Support identifies the problem and creates a field task for the activity

• We propose a time

• A Field Engineer is assigned

• We communicate with the Oracle Field Engineer (OFE)

– Define details: rolling, servers and schedule

– Review the procedure – usually a MOS note

– Set expectations – we are responsible for the systems

• We get the OFE granted access to the datacenter

• Oracle Support gets the new parts delivered to the DC or to the OFE

• Communicate with the OFE on the scheduled date and work together


Page 54: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Replacing parts – checks - sundiag

[root@exa1cel02 ~]# /opt/oracle.SupportTools/sundiag.sh

Oracle Exadata Database Machine - Diagnostics Collection Tool

Gathering Linux information

Skipping ILOM collection. Use the ilom or snapshot options, or login to ILOM

over the network and run Snapshot separately if necessary.

driveTool Version 1.30

Library loaded for MegaRAID SAS Controller.

Generating diagnostics tarball and removing temp directory

==============================================================================

Done. The report files are bzip2 compressed in /tmp/sundiag_exa1cel02_1152FMM0C0_2015_05_25_07_32.tar.bz2

==============================================================================
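Only current files should be uploaded to the SR, so it helps to pick the newest sundiag tarball per host. The Python sketch below assumes the file name pattern of the sample above: sundiag_<host>_<serial>_<YYYY_MM_DD_HH_MM>.tar.bz2. The function name is hypothetical.

```python
import os
import re
from datetime import datetime
from typing import Dict, Iterable, Tuple

SUNDIAG_RE = re.compile(
    r"^sundiag_(?P<host>[^_]+)_(?P<serial>[^_]+)_(?P<ts>\d{4}_\d{2}_\d{2}_\d{2}_\d{2})\.tar\.bz2$"
)


def latest_sundiag_per_host(paths: Iterable[str]) -> Dict[str, str]:
    """Map each host to the path of its most recent sundiag tarball."""
    latest: Dict[str, Tuple[datetime, str]] = {}
    for path in paths:
        match = SUNDIAG_RE.match(os.path.basename(path))
        if match is None:
            continue  # not a sundiag tarball
        taken = datetime.strptime(match["ts"], "%Y_%m_%d_%H_%M")
        host = match["host"]
        if host not in latest or taken > latest[host][0]:
            latest[host] = (taken, path)
    return {host: path for host, (_, path) in latest.items()}
```

On a cell this could be fed with `glob.glob("/tmp/sundiag_*.tar.bz2")`.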


Page 55: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Replacing parts – checks - exachk

• Oracle Exadata Database Machine exachk or HealthCheck [ID 1070954.1]

• Original version installed on /opt/oracle.SupportTools/exachk

• Download latest version from MOS.

• From DB node:

./exachk -a -o verbose

./exachk -clusternodes exa2db01,exa2db02 -excludeprofile storage,switch

./exachk -clusternodes exa1db01,exa1db02 -cells exa1cel01,exa1cel02,exa1cel03,exa1cel04,exa1cel05,exa1cel06,exa1cel07 -ibswitches

export RAT_ORACLE_HOME=/u01/app/oracle/product/11.2.0.3

./exachk -localonly -excludeprofile storage,switch


Page 56: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Replacing parts – checks – ILOM Snapshot

• How to run an ILOM Snapshot on a Sun/Oracle X86 System (Doc ID 1448069.1)

[root@exa2db02 ~]# ssh exa2cel02-ilom

Password:

Oracle(R) Integrated Lights Out Manager

Version 3.1.2.12.c r81826

Copyright (c) 2013, Oracle and/or its affiliates. All rights reserved.

-> set /SP/diag/snapshot dataset=normal

Set 'dataset' to 'normal'

-> set /SP/diag/snapshot dump_uri=sftp://root:[email protected]/temp

Set 'dump_uri' to 'sftp://root:[email protected]/temp'

-> cd /SP/diag/snapshot

/SP/diag/snapshot


-> show

/SP/diag/snapshot

Targets:

Properties:

dataset = normal

dump_uri = (Cannot show property)

encrypt_output = false

result = Collecting data into sftp://root:*****@10.10.10.74/tmp/exa2cel02-ilom_2419EZ419H_2015-04-13T17-23-51.zip

Snapshot Complete.

Done.

Exa2db02 IP: 10.10.10.74

Page 57: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Replacing parts examples

1. failing Flash disks

2. failing hard disks

3. proactive RAID HBA Batteries

4. troubleshooting server not powering up


Page 58: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Example - replace failing Flash disks

Host=exa2db03.acme.com

Target type=Oracle Exadata Storage Server

Target name=exa1cel02.acme.com

Categories=Fault

Message=Flash disk failed. Status : FAILED Manufacturer : Sun Model Number : Flash Accelerator F20 PCIe Card Size : 23GB Serial Number : 1039M04E85 Firmware : D21Y Slot Number : PCI Slot: 1; FDOM: 2 Cell Disk : FD_02_exa1cel02 Grid Disk : Not configured Flash Cache : Present Flash Log : Present

Severity=Critical

Event reported time=Feb 12, 2015 3:48:21 AM PDT

Target Lifecycle Status=Production

Line of Business=ExaProd_Grp

Location=Production_DC

Operating System=Linux

Platform=x86_64

Associated Incident Id=12345

Associated Incident Status=New

Associated Incident Owner=

Associated Incident Acknowledged By Owner=No

Associated Incident Priority=None

Associated Incident Escalation Level=0

Event Type=Metric Alert

Event name=Cell_Generated_Alert:alerttype

Notification Count=1

Metric Group=Cell Generated Alert

Metric=Alert Type

Metric value=Stateful

Key Value=A3C12C4EF7E4FB97480CF3HBA1471EA4

Key Column 1=Alert Name

Key Column 1 Value=Hardware

Key Column 2=Alert Sequence

Key Column 2 Value=63
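The notification body is a list of Name=Value lines, and the Message value carries the location of the flash module. The Python sketch below extracts the fields needed for the SR and the field task. It assumes the layout of this sample, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional


def parse_oem_notification(text: str) -> Dict[str, str]:
    """Split 'Name=Value' lines of an OEM notification into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            fields[name.strip()] = value.strip()
    return fields


def flash_location(message: str) -> Dict[str, Optional[str]]:
    """Pull PCI slot, FDOM and cell disk out of a 'Flash disk failed' message."""
    slot = re.search(r"PCI Slot:\s*(\d+);\s*FDOM:\s*(\d+)", message)
    cell_disk = re.search(r"Cell Disk\s*:\s*(\S+)", message)
    return {
        "pci_slot": slot.group(1) if slot else None,
        "fdom": slot.group(2) if slot else None,
        "cell_disk": cell_disk.group(1) if cell_disk else None,
    }
```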


Page 59: Oracle Exadata Maintenance tasks 101 - OTN Tour 2015

Example - replace failing Flash disksPrevious OEM alert followed by many others, and ASR creates a SR.

On the scheduled date, when the Field Engineer brings the replacement part:

1) Put a blackout on OEM to avoid pages when powering off the affected cell.

Watch out for Bug 18297754 - "ALERTLOGADR ALERTS OCCURING IN BLACKOUT PERIOD GET REPORTED WHEN BLACKOUT ENDS": the alerts raised during the blackout all arrive together after it ends (seen with OMS 12.1.0.3 / DB 11.2.0.3).

2) Before shutting down the cell, turn on its LED so the engineer can identify it:

CellCLI> alter cell led on

Grid disks that will go offline must have a redundant copy online on other cells (asmdeactivationoutcome=YES):

CellCLI> list griddisk attributes name, asmmodestatus, asmdeactivationoutcome

3) Shut down the cell services, then the server:

CellCLI> alter cell shutdown services all

[root@exa1cel02 ~]# shutdown -h now
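The check in step 2 can be scripted so the cell is only shut down when every grid disk reports a redundant copy. Below is a minimal sketch. It assumes:

- `cellcli -e` is run as root on the cell;
- the columns come back in the order requested: name, asmmodestatus, asmdeactivationoutcome;
- the function name is hypothetical.

```shell
#!/bin/sh
# Read the output of
#   cellcli -e "list griddisk attributes name, asmmodestatus, asmdeactivationoutcome"
# on stdin. Print every grid disk whose asmdeactivationoutcome is not "Yes"
# (case-insensitive) and return 1 if there is any. Return 2 if no grid disk
# line was read, so an empty or failed listing is never treated as "safe".
check_safe_to_deactivate() {
  awk 'NF >= 3 {
         n++
         if (tolower($3) != "yes") { print $1; bad = 1 }
       }
       END { if (n == 0) exit 2; exit (bad ? 1 : 0) }'
}

# Hypothetical usage on the cell, before "alter cell shutdown services all":
#   cellcli -e "list griddisk attributes name, asmmodestatus, asmdeactivationoutcome" \
#     | check_safe_to_deactivate || echo "Do NOT shut down this cell yet"
```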


Example - replace failing Flash disks

4) Oracle Field Engineer replaces the flash disk

5) The engineer powers up the cell; cell services start automatically. Validate that the flash disks now have normal status:

CellCLI> list physicaldisk

20:0 E2WXF9 normal

...

FLASH_1_0 1219M0E48A normal

...

CellCLI> alter cell led off

6) The operation is finished once the ASM operations complete: all grid disks back to ONLINE and no rows left in gv$asm_operation

CellCLI> list griddisk attributes name, asmmodestatus

DATA_EXA1_CD_00_exa1cel02 SYNCING

DATA_EXA1_CD_01_exa1cel02 SYNCING

DATA_EXA1_CD_10_exa1cel02 ONLINE

…

From a DB node (exa1db02), connected to the ASM instance as the grid user:

SQL> select * from gv$asm_operation;

7) Remove OEM blackout
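Steps 5 and 6 both amount to "wait until every line of a CellCLI listing is in the expected state". Below is a minimal sketch of the two checks. It assumes:

- `list physicaldisk` prints name, serial and status, as in the output above;
- `list griddisk attributes name, asmmodestatus` prints name and mode;
- both function names are hypothetical.

gv$asm_operation still has to be queried from SQL*Plus.

```shell
#!/bin/sh
# Post-replacement checks. Each function reads CellCLI output on stdin,
# prints the names not yet in the expected state, and returns 1 if there
# are any (2 if no line was read at all).

# Output of: cellcli -e "list physicaldisk"   (name, serial, status)
check_physicaldisks_normal() {
  awk 'NF >= 3 { n++; if ($3 != "normal") { print $1; bad = 1 } }
       END { if (n == 0) exit 2; exit (bad ? 1 : 0) }'
}

# Output of: cellcli -e "list griddisk attributes name, asmmodestatus"
check_griddisks_online() {
  awk 'NF >= 2 { n++; if ($2 != "ONLINE") { print $1; bad = 1 } }
       END { if (n == 0) exit 2; exit (bad ? 1 : 0) }'
}

# Hypothetical polling loop for step 6 (run as root on the cell):
#   until cellcli -e "list griddisk attributes name, asmmodestatus" \
#         | check_griddisks_online >/dev/null; do sleep 60; done
```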


Example - replace failing hard disks

There are also detailed MOS notes for replacing failing hard drives:

– Note 1390836.1 for predictive failures

– Note 1386147.1 for hard failures

• The Oracle ASM disks associated with the grid disks on the failing physical drive are dropped automatically (Pro-Active Disk Quarantine):

– If the cell also goes offline, the disks are not dropped until DISK_REPAIR_TIME expires

– A hard failure drops the disk with the FORCE option, and an ASM rebalance starts to restore data redundancy

– A predictive failure triggers an ASM rebalance that relocates the data to other disks; wait for it to complete before replacing the disk.

• Identify the failed physical disk:

CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status=failed DETAIL
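The DETAIL listing is long, and the engineer mainly needs the disk name and slot. Below is a minimal sketch that extracts one attribute per disk from the saved DETAIL output. It assumes:

- DETAIL prints one indented `attribute:   value` pair per line;
- `slotNumber` is one of the attributes shown;
- the helper name is hypothetical.

```shell
#!/bin/sh
# Print the value of one attribute (e.g. name, slotNumber, status) for each
# disk in the output of a "LIST PHYSICALDISK ... DETAIL" command read on
# stdin. Values may contain colons (a disk name looks like "28:5"), so
# only the first colon is treated as the separator.
pd_detail_value() {
  awk -v key="$1" '{
    line = $0
    sub(/^[ \t]+/, "", line)               # drop the indentation
    if (index(line, key ":") == 1) {       # line starts with "key:"
      val = substr(line, length(key) + 2)  # text after that first colon
      sub(/^[ \t]+/, "", val)
      print val
    }
  }'
}

# Hypothetical usage: pipe the DETAIL command above into it, e.g.
#   cellcli -e "<LIST PHYSICALDISK ... DETAIL command>" | pd_detail_value slotNumber
```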


Replacing RAID HBA Batteries - overview

Example for an X2-2 half rack (4 DB nodes, 7 storage nodes):

• HBA backup battery units: 11, one per DB and storage node

– Protects all internal drives connected to each RAID HBA

– Must be replaced every two years

– The node needs to be restarted

– Since Exadata X3 with image 12.1.2.1.2: remote-mounted batteries, no node restart needed

• Energy Storage Modules (ESM) in the PCIe flash cards: 28 in the storage nodes, four per node

– Protects the DRAM cache

– F20 PCIe card: the battery must be replaced every four years (X2 and older models)

– Since X3, the new cards do not use batteries

• X3 uses the F40 PCIe card, X4 the F80, and X5 the F160

https://docs.oracle.com/cd/E19682-01/E21358/z40002bc1401289.html
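The two-year (HBA battery) and four-year (F20 ESM) intervals above are easy to track from the install date. Below is a minimal sketch.

- The function names are hypothetical.
- The install date is an input you would take from your own records; it is not read from the hardware.
- Plain year arithmetic is used instead of a date library, so a 29 February install date would need manual adjustment.

```shell
#!/bin/sh
# Due date for a part installed on INSTALL_DATE (YYYY-MM-DD) with a service
# life of YEARS: 2 for the HBA battery, 4 for the F20 ESM.
next_replacement_due() {
  local year rest
  year=${1%%-*}    # "2013" from "2013-02-15"
  rest=${1#*-}     # "02-15"
  printf '%d-%s\n' "$((year + $2))" "$rest"
}

# Return 0 if the part is due on or before TODAY (default: today's date).
is_replacement_due() {
  local due today
  due=$(next_replacement_due "$1" "$2" | sed 's/-//g')
  today=$(printf '%s\n' "${3:-$(date +%F)}" | sed 's/-//g')
  [ "$due" -le "$today" ]
}

# Hypothetical usage for an HBA battery installed on 2013-02-15:
#   next_replacement_due 2013-02-15 2
#   is_replacement_due 2013-02-15 2 && echo "schedule the replacement"
```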


Replacing RAID HBA Batteries - overview

This can be done as a rolling operation or with a full outage.

• Restart DB nodes

[root@exa1db02 ~]# crsctl stop crs

[root@exa1db02 ~]# shutdown -y -h now

• The field engineer starts up the server, and we validate:

[root@exa1db02 ~]# crsctl check crs

[root@exa1db02 ~]# crsctl stat res -t

• Restart cell nodes

– Similar to the previous procedure for replacing a failing flash disk, but the grid disks should be inactivated before the shutdown (and reactivated afterwards with "alter griddisk all active"):

CellCLI> alter griddisk all inactive

GridDisk DATA_EXA1_CD_00_exa2cel03 successfully altered

GridDisk DATA_EXA1_CD_01_exa2cel03 successfully altered

GridDisk DATA_EXA1_CD_02_exa2cel03 successfully altered
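In a rolling restart, you move to the next DB node only after `crsctl check crs` looks healthy on the restarted one. Below is a minimal sketch of that check. It assumes:

- each service line starts with a `CRS-nnnn:` message code;
- a healthy service line ends in "is online", as in 11.2/12.1 Grid Infrastructure;
- the function name is hypothetical.

The `crsctl stat res -t` output still needs to be reviewed by eye.

```shell
#!/bin/sh
# Read "crsctl check crs" output on stdin. Print every CRS-nnnn line that
# does not report "is online" and return 1 if there is any; return 2 if no
# CRS- line was read at all (for example, crsctl was not found).
check_crs_online() {
  awk '/^CRS-[0-9]+:/ {
         n++
         if ($0 !~ /is online *$/) { print; bad = 1 }
       }
       END { if (n == 0) exit 2; exit (bad ? 1 : 0) }'
}

# Hypothetical usage on the restarted DB node, as root:
#   crsctl check crs | check_crs_online && echo "CRS is up, continue with the next node"
```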


Troubleshooting server not powering up

• Log in to the server's ILOM console:

ssh root@exa1db01-ilom

show faulty

cd /SP/faultmgmt

start shell

Are you sure you want to start /SP/faultmgmt/shell (y/n)? y

faultmgmtsp> fmadm faulty

------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2014-02-04/18:45:49 c6ac83c9-bd36-c984-85ff-c940887f4925 SPX86-8001-VY  Major

Fault class : fault.security.enclosure-open
FRU         : /SYS/SP
              (Part Number: unknown)
              (Serial Number: unknown)
Description : A chassis intrusion failure has occurred.
Response    : The chassis-wide service required LED will be illuminated.
Impact      : Server is immediately powered off and the service processor will operate in a degraded mode.
Action      : The administrator should review the ILOM event log for additional information pertaining to this diagnosis. Please refer to the Details section of the Knowledge Article for additional information.


Troubleshooting server not powering up

• Clear the fault using its UUID:

faultmgmtsp> fmadm repair baab83b7-bd3a-b784-8aff-b740887f472a

• Fault should be cleared

faultmgmtsp> fmadm faulty

No faults found
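When `fmadm faulty` lists several faults, it is easy to mistype a UUID. Below is a minimal sketch that extracts the UUIDs from a copy of that output and prints the matching `fmadm repair` commands for review. It assumes:

- the output was copied from the ILOM session into a local file, since the `faultmgmtsp>` shell is not assumed to support pipes or scripts;
- both helper names are hypothetical.

```shell
#!/bin/sh
# Extract the fault UUIDs from saved "fmadm faulty" output (stdin),
# once each, in the order they appear.
fault_uuids() {
  grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
    | awk '!seen[$0]++'
}

# Print (not run) one "fmadm repair" command per fault, to be checked and
# pasted into the faultmgmtsp> shell once the cause has been addressed.
repair_commands() {
  fault_uuids | sed 's/^/fmadm repair /'
}

# Hypothetical usage on a workstation:
#   repair_commands < fmadm_faulty.txt
```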


Questions?


@ncalerouy

http://www.linkedin.com/in/ncalero

