Presentation virtualizing oracle unlocked enterprise wide benefits

© Amor Group 2012

Virtualizing Oracle Unlocked

Enterprise Wide Benefits

Dr Alastair Rennie

Principal IT Management Consultant

MBCS CITP CENG

May 2012


Introduction

• There’s Water (Flooding) in Scotland

• Where SEPA Started

• Criticality of Service Delivery

• Virtualisation

• Slowly but Surely we changed

• Business Change and DR Requirements

• First Design

• Second Design

• Final Design

• Where to now?


There’s water in Scotland, and a whole lot more

• SEPA

– helps regulate and protect the environment: Water, Air, Land, Noise

– provides a flood warning and flood warning dissemination service

– provides environmental information (www.environment.scotland.gov.uk, www.sepa.org.uk)

– provides advice and guidance on environmental management

– educates

• Formed from 65 bodies in 1996

• Centralised systems over 5 years

• Expanded from 600 to 1300 staff

• Turnover increased from £34m to £85m

• Developed complex licensing processes and systems

• Implemented complex science and monitoring systems



We started small, but

• In 2000, we had:

– 10 servers

– 800 desktops

– stand-alone laboratory equipment

– manually polled river monitoring systems

– no flood warning or management services

• By 2010, we had:

– 140 servers

– 1600 desktops

– fully integrated laboratory equipment

– automatically managed monitoring stations

– sophisticated flood monitoring and prediction systems

– active flood warning services

– flood warning dissemination services

– Server and application growth continuing

[Chart: number of servers and applications by year, 2000 to 2012]


Changes to the Criticality of the service?

• Increased business critical services

• Category 1 responder under the Civil Contingencies Act

• Flooding life risk notification services

• SEPA awake 24x7


Virtualisation – Utopia for SEPA

• How many engineers does it take to change a light bulb (OK, manage a server farm)?

– 1:100?

– 1:300?

– 1:1000?

– 1:10,000?

– All of the above?

– At least 2!

• How much power to make the light bulb work?

– Less, I hope, and less to throw away

• How much space does the light bulb need?

– Not much, I hope

• We were looking for reduced space, power & management

[Chart: cooling, power and space, 2000 to 2010]


ca’ canny but ca’ awa

• In 2007, we built a test virtualisation suite on two spare HP DL380s using VMware ESXi (free!) for the team to play with. Success!

• In 2008, we “acquired” an HP MSA1000 storage array and a couple of HP DL580s

• Installed VMware ESX Enterprise – we now needed management of the virtual servers

• Started building virtual machines in earnest

• By 2010, out of 140 production servers, 40 had been virtualised

• Delivering improved resilience and disaster recovery


Disaster Recovery and Resilience

[Diagram: Primary Site and Resilient Site, each with shared storage and dynamic management; replication process between the sites over 1 Gb fibre x 2; manual transfer]


Recovery Time Objectives

Business/Infrastructure Unit   ALARM   NETWORK   DRIVES   EMAIL   ARC

Procurement Facilities and Estates 2 2 2

HR 2 2 2

EPI 2 1

Finance

Environmental Science - Hydrology 4 1 1

SCC 1 3 1

Environmental Science - Non-Hydrology 3

Environmental & Organisational Strategy 3 1

National Administration 1 1

1 = within 2 hrs

2 = within 4 hrs

3 = within 24 hrs

4 = within 48 hrs

5 = within 1 week

Recovery Point Objective

Business/Infrastructure Unit   ALARM   NETWORK   DRIVES   EMAIL   ARC

Procurement Facilities and Estates 4 2 4

HR 2

EPI 1 1

Finance

Environmental Science - Hydrology 4 1 1

SCC 1 3 1

Environmental Science - Non-Hydrology

Environmental & Organisational Strategy 3 3

National Administration 1 1

1 = no loss of data

2 = up to 4 hrs

3 = up to 24 hrs

4 = up to 48 hrs

5 = up to 1 week

• External review determined required RTO & RPO

• Present solutions were not able to meet these

• Some were unattainable

• Realism was applied: RPO of 2 hours and RTO of 24 hours agreed with business operations
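The tier legends and the agreed targets can be compared mechanically. The following is a minimal sketch in Python, not part of the original slides: the tier limits are copied from the two legends, RPO tier 1 ("no loss of data") is modelled as zero tolerated loss, and the helper name `tiers_met` is hypothetical.

```python
from datetime import timedelta

# Tier codes copied from the RTO legend (1 = within 2 hrs ... 5 = within 1 week).
RTO_TIERS = {
    1: timedelta(hours=2),
    2: timedelta(hours=4),
    3: timedelta(hours=24),
    4: timedelta(hours=48),
    5: timedelta(weeks=1),
}

# Tier codes copied from the RPO legend; tier 1 ("no loss of data") is
# modelled here as zero tolerated loss, which is an assumption of this sketch.
RPO_TIERS = {
    1: timedelta(0),
    2: timedelta(hours=4),
    3: timedelta(hours=24),
    4: timedelta(hours=48),
    5: timedelta(weeks=1),
}


def tiers_met(target: timedelta, tiers: dict[int, timedelta]) -> list[int]:
    """Return the tier codes whose limit a single service-wide target satisfies."""
    return [code for code, limit in sorted(tiers.items()) if target <= limit]


# Targets agreed with business operations, as stated on the slide.
AGREED_RTO = timedelta(hours=24)
AGREED_RPO = timedelta(hours=2)

print("RTO tiers met:", tiers_met(AGREED_RTO, RTO_TIERS))
print("RPO tiers met:", tiers_met(AGREED_RPO, RPO_TIERS))
```

Under these assumptions, a single 24-hour RTO covers only units rated tier 3 or lower in urgency, and a 2-hour RPO does not cover units rated tier 1, which is consistent with some of the reviewed requirements being described as unattainable.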

All change for continuity, next stop disaster


How? Design, design, design to succeed!

• Specification created with expert help

• Solutions received

• Options appraised and tenders assessed

• To deliver DR, we were initially looking at only critical services (~60% service failover)

• Meanwhile, virtualisation needed to continue

– 40 business services to be virtualised

– 30 systems services to be retained as physical

– 12 TB of data, 4 TB of which is user files from other sites

– main virtualisation issues:

• Oracle database

• Citrix application delivery


From Detailed Analysis of Designs

• The options were a hybrid of virtual and physical technologies

• Not integrated and not fit for purpose

– Too complex, not a robust solution

– Too expensive to implement

– Too expensive to run, savings consumed

– Needed more people to manage

– Needed more network resources

[Diagram: hybrid design. Components: virtual servers, physical servers, storage, file storage. Replication links: virtual to virtual, physical to physical (x2), file to file]


Back to the Drawing Board

• Test Oracle virtualisation – core business systems

• Test Citrix – science system, quality system

• Re-design for an integrated fully virtualised solution

• Plan 100% virtual for business applications

• Plan for 90% virtual for systems – retain physical DC

• vSphere Site Recovery Manager to automate fail over

[Diagram: fully virtualised design. Labels: virtual servers, enterprise storage, file storage, physical servers, mirrored. Replication links: virtual to virtual (x2), file to file]


Should we virtualise Oracle, because ……

• Oracle was identified as the single biggest risk to full virtualisation

– No certification from Oracle support

– Business perceived it as a risk through unknowns:

• Critical systems

• Performance

• Availability

• Support

• Licensing

• Confidence

Major business lack of confidence


Oracle goes forth and has multiplied

• Solutions:

– Find other organisations using virtualised Oracle

• Pressure on Oracle to certify support

– Agreed licensing with Oracle support

– Discuss with SEPA business user groups, agree test plans

– Demonstrate performance with automated load testing

– Review architecture & design for supportability and availability

• Blade architecture with redundancy

• Disk arrays with redundancy

• vMotion for availability and performance

• Test, test, test …


Technical challenges in design

• Oracle – needs fast disks

– Had to replace standard SATA RAID 6 with SAS RAID 10

• Citrix – needs plenty of resources and management

– Lower server density, resource management servers are physical

• Synchronisation – fast, low latency communications

– Dedicated high speed link with acceleration

• Volume of storage – needed 30 TB

– Original calculations on VM size were low; final calculations gave a 50% increase
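Two of the points above lend themselves to a quick back-of-envelope check: why RAID 10 suits a write-heavy database better than RAID 6, and what a 50% sizing increase implies. The Python sketch below is illustrative only and not from the original slides. The write penalties (6 for RAID 6, 2 for RAID 10) are the usual rule-of-thumb values; the disk count, per-disk IOPS figures and write fraction are hypothetical, and reading 30 TB as the figure after the 50% increase is an assumption.

```python
# Rule-of-thumb write penalties: back-end I/Os generated per host write.
WRITE_PENALTY = {"raid6": 6, "raid10": 2}


def host_iops(disks: int, iops_per_disk: float, raid: str, write_fraction: float) -> float:
    """Approximate host-visible IOPS for an array under a given read/write mix."""
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[raid]
    # Each read costs one back-end I/O; each write costs `penalty` back-end I/Os.
    return raw / ((1 - write_fraction) + write_fraction * penalty)


def usable_disks(disks: int, raid: str) -> float:
    """Disks' worth of usable capacity: RAID 6 loses two to parity, RAID 10 half to mirrors."""
    return disks - 2 if raid == "raid6" else disks / 2


def estimate_before_increase(final_tb: float, increase: float) -> float:
    """Back out the earlier estimate, assuming final_tb already includes the increase."""
    return final_tb / (1 + increase)


# Hypothetical figures for illustration only (not from the slides):
# 12 disks, ~80 IOPS per SATA disk, ~180 IOPS per SAS disk, 30% writes.
sata_raid6 = host_iops(12, 80, "raid6", 0.3)
sas_raid10 = host_iops(12, 180, "raid10", 0.3)
print(f"SATA RAID 6: ~{sata_raid6:.0f} IOPS, {usable_disks(12, 'raid6')} disks usable")
print(f"SAS RAID 10: ~{sas_raid10:.0f} IOPS, {usable_disks(12, 'raid10')} disks usable")

# Assumes 30 TB is the post-increase requirement.
print(f"Implied original estimate: {estimate_before_increase(30, 0.5):.0f} TB")
```

The trade-off this sketch makes visible is that RAID 10 gives up usable capacity in exchange for a much lower write penalty, which is one reason faster disks and a larger storage volume can end up being needed together.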


This is where we arrived at

• Fully managed DR & Resilience service with vSphere

• Fully virtualised applications

• Fully virtualised storage

• Full backup of VMs

• Full backup of data


What else did we get?

• Increased scope of failover and recovery

• Improved RTO & RPO

• Additional secondary backup, tertiary DR


Here’s what we got out of this

• Costs

– Reduced implementation budget by c. £180,000 in £500,000

– Reduced annual revenue (running) costs by c. £80,000 in £260,000

• Space

– Reduced space by 5 cabinets (x2 for DR solution)

• Power

– Reduced power by c20kW x 2

• Reduced Support

– Reduced server engineers by 1FTE

• SLA – SRM has helped deliver

– RTO target was 24 hours, achieved ~2 hours

– RPO target was 2 hours, achieved ~5 minutes

– Improved resilience and reduced downtime of existing service

• Achieving better than 99.98% availability (<2 hours pa unplanned outage)
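The headline figures above can be sanity-checked with a few lines of arithmetic. This is a minimal Python sketch, not from the original slides; it assumes a 365-day year and reads the larger sum in each cost line (£500,000 and £260,000) as the baseline against which the saving is measured. The function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_hours(availability_pct: float) -> float:
    """Unplanned outage per year permitted at a given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def saving_pct(saving: float, baseline: float) -> float:
    """Saving expressed as a percentage of the baseline figure."""
    return 100 * saving / baseline


print(f"99.98% availability allows ~{downtime_budget_hours(99.98):.2f} hours/year")
print(f"Implementation saving: {saving_pct(180_000, 500_000):.1f}%")
print(f"Annual running saving: {saving_pct(80_000, 260_000):.1f}%")
```

At 99.98% the yearly outage budget comes to roughly 1.75 hours, which agrees with the "<2 hours pa" figure quoted on the slide.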


What did we get right?

• Reduced Power & Space

• Reduced Management

• Correct Virtualisation technology

• Correct storage technology

• Design, design, design

• Test, test, test

What was challenging?

• Storage management – speed and capacity

• Original design not bold enough

• Storage Integration and sizing

• Data flow to distant site

• No fail back


And where to now?

• Full DR testing is required

– No simple return from DR activation on most platforms

– Cannot be achieved easily with vSphere 4

• vSphere 5 is the answer

– Automated failover and failback

– Improved storage management and replication

– Improved management of environment


Thank you
