
Page 1: Texas Nodal Market Implementation Program  Infrastructure Update October 22, 2007

Lead from the front Texas Nodal http://nodal.ercot.com

Texas Nodal Market Implementation Program Infrastructure Update

October 22, 2007

Page 2

INF: Project Summary - Initial Charter with Approved Changes

Project area: Infrastructure

Description: Provision of development, testing, EDS and production environments across the Program

Vendor(s): IBM, EMC, Oracle

Project Manager: David Forfia

Key deliverables/short term deliverables:
– Hardware specifications
– Hardware procurement
– Data center capacity resolution
– IT Services Catalogue
– Service Level Agreements for all Nodal projects
– Project development & test (FAT) environments
– Integration testing (SAT) environments
– EDS environments
– Production environments
– Market Participant Identity Management (new)
– Release Management processes (from INT)
– Oracle 10g upgrade for EDW (new)
– High availability monitoring environment (new)

Key Assumptions:
– Infrastructure capacity can be incrementally added as the project progresses using IBM’s capacity upgrade on-demand model
– Data center capacity issues will be resolved in the next 90 days

Challenges/Risks:
– Existing Data Center capacity (power)

Comments:
– IT Operations will be the first ERCOT function to transition to Nodal operations, starting with setting up development environments

http://nodal.ercot.com TPTF Nodal Update 10/22/07

Page 3

INF: Project Summary - Initial Delivery Plan

[Figure: timeline chart of the initial delivery plan, with a quarterly axis running from Q3 2006 through Q3 2009.]

Program milestones shown on the chart:
– 3/31/08: Single Entry Model GO LIVE
– 6/30/08: LMP Market Readiness Criteria MET
– 12/01/08: Real Time Operations GO LIVE
– 12/08/08: Day Ahead Market/CRR GO LIVE, Zonal Shutdown GO/NO GO

Chart rows: EDS 3 Release, EDS 4 Release, NMMS, Computing Infrastructure, Storage, Software, Data Center Capacity.

Other labels on the chart: Requirements, Conceptual Design; Design, Build, Pre-FAT; Build; FAT; ITEST; MP SAT; EDS 3 Data Validation; EDS 3; EDS 4; Planning UI; Operations UI; Common model update process; Dev; Prod #1; Prod #2; Disaster Site #1; DR #2; p5; Capacity Planning; Capacity Upgrade on Demand Enabled; Capacity Upgrade; Initial Storage Upgrade; Design to Min Requirements; Database Licensing Strategy; Portal/Integration Licensing Strategy; Decommissioning plan; Data center virtualization; Colocation; Existing Data Center Upgrade; New Data Center Facility; Capacity needs met?; Sufficient Power Recovered?

Page 4

INF: Project Summary – Actual Delivery Plan is slightly later than planned

[Figure: timeline chart of the actual delivery plan, with a quarterly axis running from Q3 2006 through Q3 2009.]

Program milestones shown on the chart:
– 3/31/08: Single Entry Model GO LIVE
– 6/30/08: LMP Market Readiness Criteria MET
– 12/01/08: Real Time Operations GO LIVE
– 12/08/08: Day Ahead Market/CRR GO LIVE, Zonal Shutdown GO/NO GO

Chart rows: EDS 3 Release, EDS 4 Release, NMMS, Computing Infrastructure, Storage, Software, Data Center Capacity.

Other labels on the chart: Requirements, Conceptual Design; Design, Build, Pre-FAT; Build; FAT; ITEST; MP SAT; EDS 3 Data Validation; EDS 3; EDS 4; Planning UI; Operations UI; Common model update process; Dev; Prod #1; Prod #2; Disaster Site #1; DR #2; p5; Capacity Planning; Capacity Upgrade on Demand Enabled; Capacity Upgrade; Initial Storage Upgrade; Design to Min Requirements; Database Licensing Strategy; Portal/Integration Licensing Strategy; Decommissioning plan; Data center virtualization; Colocation; Existing Data Center Upgrade; New Data Center Facility; Capacity needs met?; Sufficient Power Recovered?

Page 5

Data Center virtualization was the only viable strategy to make the Nodal timeline

• Move to a Colocation Site
  – RFP issued in October 2006
    • Insufficient capacity available to support ERCOT specialized needs

• Expand Existing Data Centers
  – Taylor
    • Lead times for core equipment longer than Nodal program
  – Austin
    • Existing facilities already expanded to maximum capacity
    • Long term viability of the facility not determined

• Acquire a new Data Center Facility
  – Lead times for acquisition and relocation outside Nodal timelines
  – Currently being explored with the viability of the Austin facility

Page 6

Expanding existing data center capacity is an integrated process

• There are 3 components which are balanced to ensure a reliable data center
  – Standby Generator Capacity
  – Uninterruptible Power Supply (UPS) capacity
  – Data Center Air Conditioning (DCAC) capacity

• The maximum capacity for equipment is determined by the minimum carrying capacity of any one of the components.

• We have taken all possible steps to maximize the current capacity of the data centers, which are now running at the available capacity of the UPS systems in each site.

All possible near term facility upgrades have been completed.
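The limiting-component rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `max_equipment_load_kva` and the capacity figures are hypothetical, and it assumes all three capacities have already been converted to a common unit of supportable equipment load (kVA here).

```python
def max_equipment_load_kva(generator_kva, ups_kva, dcac_kva):
    """Return (limiting component, supportable load) for a data center.

    The supportable equipment load is the smallest of the three
    component capacities, as stated on the slide.
    """
    capacities = {
        "Standby generator": generator_kva,
        "UPS": ups_kva,
        "DCAC": dcac_kva,
    }
    limiting = min(capacities, key=capacities.get)
    return limiting, capacities[limiting]


# Hypothetical capacities, chosen only to illustrate the rule.
component, load = max_equipment_load_kva(
    generator_kva=500.0, ups_kva=225.0, dcac_kva=300.0
)
print(f"Limiting component: {component}, supportable load: {load} kVA")
```

With these made-up figures the UPS is the constraint, which mirrors the slide's statement that the sites are running at the available capacity of their UPS systems; adding generator or cooling capacity alone would not raise the usable load.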

Page 7

The growth in Nodal server deployments and capacity was correctly forecast

[Figure: data center power roadmap. The horizontal axis runs from Feb 2006 through 2011; the vertical axis is in kVA (ticks from 5 kVA to 80 kVA), with areas labelled Reclamation and Consumption. The chart marks the Current TCC Threshold at 202.5kVA and carries the cost figures $970K and $2.8M.]

Legend:
– Project Impact: Small (<200Hrs), Med (200-1000Hrs), Large (>1000Hrs)
– Status: Initiated, Not Defined, Below Line
– Risk/Reward: Low 1 to 5 High

Projects plotted, with the rating printed beside each: Dev Virtualization 2/5; Fastrak 1/3; Non-EMS Dell Refresh 5/5; QA Move to ACC 2/5; Retire HDS 1/5; Domain Restructuring 1/5 (also shown as 5/3); Test/Prod Virtualization 4/5; Market Redesign Dev Buildout 3/4; Database Hosting Environment Refresh 3/5; Market Redesign Test Buildout 2/4; Market Redesign Prod Buildout 2/4; EMS Dell Refresh 3/5; HW Decommission 1/5.

How Much is Too Much? TCC PDUs are rated for 225kVA with a not-to-exceed rating of 90%. Can we operate at 95%? Likely, but not smart.

What if we only focused on active projects? Domain Restructuring will free up ~8kVA and QA will move 37kVA from TCC to ACC. This will push the threshold out to early fall.
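The arithmetic behind these two callouts can be checked with a short Python sketch. The ratings and project figures come from the slide; the function name `threshold_kva` is an assumption of this sketch, and the ~8kVA figure is treated as exactly 8 for simplicity.

```python
# Figures quoted on the slide
PDU_RATING_KVA = 225.0   # rated capacity of the TCC PDUs
NOT_TO_EXCEED = 0.90     # not-to-exceed fraction of the rating


def threshold_kva(rating_kva: float, fraction: float) -> float:
    """Usable load for a PDU rating and a not-to-exceed fraction."""
    return rating_kva * fraction


current_threshold = threshold_kva(PDU_RATING_KVA, NOT_TO_EXCEED)
threshold_at_95 = threshold_kva(PDU_RATING_KVA, 0.95)

# Relief from the two active projects named on the slide
domain_restructuring_kva = 8.0   # approximate ("~8kVA" on the slide)
qa_move_kva = 37.0
relief_kva = domain_restructuring_kva + qa_move_kva

print(f"Threshold at 90%: {current_threshold:.2f} kVA")
print(f"Threshold at 95%: {threshold_at_95:.2f} kVA")
print(f"Margin gained by running at 95%: {threshold_at_95 - current_threshold:.2f} kVA")
print(f"Load relieved by active projects: {relief_kva:.1f} kVA")
```

The 90% limit reproduces the 202.5kVA threshold quoted for TCC. Under these figures the two active projects relieve roughly four times as much load as would be gained by pushing the PDUs to 95%.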

Calculating Power: Pulled from Aperture, these figures are nameplate values with a 70% adjustment for manufacturer conservatism.
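A minimal sketch of that adjustment, assuming the "70% adjustment" means the expected draw is taken as 70% of the nameplate rating (the slide does not spell out the direction of the adjustment). The nameplate values and the function name `estimated_draw_kva` are hypothetical.

```python
# Assumption: expected draw = 70% of the manufacturer's nameplate rating.
DERATE_FACTOR = 0.70


def estimated_draw_kva(nameplate_kva, factor=DERATE_FACTOR):
    """Estimate real power draw from a list of nameplate ratings (kVA)."""
    return sum(nameplate_kva) * factor


# Hypothetical nameplate ratings for three devices, in kVA.
rack = [1.2, 0.8, 2.0]
print(f"Nameplate total: {sum(rack):.2f} kVA")
print(f"Estimated draw:  {estimated_draw_kva(rack):.2f} kVA")
```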

When do we Buy? The server market is going through dramatic change over the next few years with virtualization, multi-cores, and reduced thermals driving demand for innovation. Short answer: tomorrow is always better.

X86 Failure Rates:
– Months 0-36: 6%
– Months 0-48: 50%
– Months 0-60: 100%
Includes failures that impact and do not impact service. Source: Gartner

Understanding the requirements in terms of service demands is critical in making technology decisions. Thinking in terms of 1:1 replacements will lead to overspend and undercommit.

1M TpM, today versus 2008:
– Today: Intel IA-64 (Madison), 64 Processors, 82 RU / 42 kVA
– 2008: Intel IA-64 (Tukwila), 4 Processors, 4 RU / 4 kVA

2006 Xeon Processor Lineup:
– Q1: Paxville DP
– Q2: Perf. Optimized Dempsey
– Q2: Rack Optimized Dempsey
– Q3: Woodcrest
– Q4: Ultra-Dense Woodcrest

There is light at the end of the tunnel!

Is Virtualization a Dream? Intel, AMD, Microsoft, HP, and others are heavily vested in virtualization. The technology is production ready today and will be ubiquitous in 24 months.

Benefits of Decommissioning and Tech Refresh: Aside from power reclamation, this will define technology lifecycle, create a process for technology refresh and validate IT as the owner of ERCOT technologies.

Why is tech refresh risky? Technologically, it is fairly straightforward, easily packaged, and readily outsourced. Coordination and cooperation are the real challenge.

Density is the Real Problem: In the datacenter, thermal demand is a product of power consumption. Space is an independent variable. ERCOT has usable space, but no usable power. Thus, we need to increase density.

EA Recommendations

• Priority 1: Initiate Decommission/Dell Refresh Project. Benefit: Relieves TCC congestion, Preps business for IT ownership of tech lifecycles
• Priority 1: Accelerate QA Buildout PR-40070. Benefit: Relieves TCC Congestion, Prep for DR
• Priority 2: Initiate DC Virtualization PR-60011. Benefit: Relieves TCC Congestion, Defines utility computing for ERCOT

ERCOT Enterprise Architecture, DC Capacity Plan Assessment, Brian A Cook, 02/28/2006

Page 8

Majority of the roadmap is completed, but not all assumptions were right

Achievements to date

Power Recovery

Executed the Enterprise Architecture Power Recovery Plan
– Development Storage Retired
– Development Virtualized
– Domain Restructured
– Quality Assurance Moved to ACC
– Retired unused equipment
– ½ Server refreshes EMS & Non-EMS
– Test/Prod Virtualization
– ¾ Database Hosting Refresh

Identified additional compression activities
– Relocated Development database servers to Blue Building
– Austin SAN Refresh
– Taylor SAN Refresh
– Database Server Clustering
– ACC to dedicated UPS
– Remote access server farm redesign
  o Application Server Refresh
  o Self cooled equipment racks (ordered)

Key assumptions which were invalid

Power consumption per server would drop exponentially
• Power consumption per CPU has declined almost as much as assumed
• Server memory power consumption has offset power savings in CPU power consumption

Nodal redundancy requirements would mirror Zonal
• The nodal systems require more active/passive and active/active deployments than the current Zonal market.
• The required level of redundancy and recoverability for the nodal systems was not fully understood when the projections were made in February 2006.

Nodal environment requirements would mirror Zonal
• Integrating a large number of best of breed solutions requires more environments to successfully develop the integration points.
• Market participants required structured and unstructured testing environments to complete their development activities.

Page 9

Server consolidation timeline was constrained to minimize risks

• We will do this with the minimum disruption
• We have a plan to:
  – Minimize the risk
  – Maximize the benefit
  – Lower overall costs
  – Safeguard Market Operations
  – Improve service levels
  – Not affect Texas Set 3.0

• However, there are always risks to be aware of in server migration
• We are working with all the project managers and ERCOT committees to reduce risk and to optimize the timing.

• In the process of server consolidation, one production migration was deferred and two were successfully rolled back.
  – The net effect was a delay in final migrations and power recovery by 5 weeks.

Page 10

Migration Metrics

Servers

Starting Total | Decom | Retired | % Remaining
74 | 16 | 5 | 72%

Databases

Starting Total | Migrated | Retired/Refresh | % Remaining
91 | 58 | 27 | 7%

Applications

Total Files | Migrated | Remaining | % Remaining
62 | 5 | 57 | 92%

Scripts

Scripts | Reviewed | Remaining | % Remaining
229 | 229 | 0 | 0%

Production Database Storage (in GB)

Pre-Migration Bytes Used | Post Migration Bytes Used | Reclaimed | Storage Savings
12,080 | 8,480 | 3,240 | 36%
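The "% Remaining" column in the count-based tables can be recomputed with a short Python sketch. It assumes that "% Remaining" means the items neither migrated nor retired, divided by the starting total and rounded to a whole percent; that reading is an assumption, and the function name `percent_remaining` is hypothetical. The storage table is left out because it reports GB rather than item counts.

```python
def percent_remaining(starting_total, *completed):
    """Whole-percent share of items still outstanding.

    `completed` holds the counts already handled (for example
    migrated and retired), which are subtracted from the total.
    """
    remaining = starting_total - sum(completed)
    return round(100 * remaining / starting_total)


# Counts taken from the migration metrics tables.
metrics = {
    "Servers": (74, 16, 5),      # starting total, decom, retired
    "Databases": (91, 58, 27),   # starting total, migrated, retired/refresh
    "Applications": (62, 5),     # total files, migrated
    "Scripts": (229, 229),       # scripts, reviewed
}

for name, (total, *done) in metrics.items():
    print(f"{name}: {percent_remaining(total, *done)}% remaining")
```

Under that assumption the sketch reproduces the four percentages shown in the tables.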

Servers
• Annual maintenance contracts will be cancelled on the retired database and application servers
• Servers will be made available on the secondary market to recoup their residual values

Databases
• Unused or underused databases were eliminated, resulting in additional licenses for use at Nodal vendor locations

Database Storage
• Properly sizing the databases and compressing the data files for data which has been removed is resulting in a 36% reduction in used storage in production.

Server consolidation will result in lower expenses during Nodal and after Nodal implementation

Page 11

System requirements are driven by project detailed design documents

Each project team has an assigned architect who is responsible for the architecture for the project’s deliverables, which are consolidated into a deployment diagram by IDA and then become individual work requests for operations to deploy the systems.

Page 12

Market trials and Integration testing will drive changes to the environments

The assumption in the infrastructure plan is that the initial deployment specifications will have to be changed.

The infrastructure technologies selected as the core for the Nodal system were picked because they adapt well to change.

Page 13

Critical Path Mitigation Strategies: Capacity issues – Scale Up Option

Scale up options are implemented on the largest computing systems in the data center. Extra capacity is available to enable when necessary inside the system or can be added without system down time.

• Provides the ability to scale up to the maximum capacity of the system to meet usage demands
• System can recover from multiple component failures (CPU/Memory/Power Supply/IO) with spare capacity within the system.
• System availability above the 99.9% threshold.
• Provides the ability to balance the load across all applications running on the system.
• Minimum power usage configuration.

Page 14

Critical Path Mitigation Strategies: Capacity issues – Scale Out Option

Scale out options are implemented on the smaller computing systems in the data center. Processing load is split across multiple systems to meet the business requirements.

• Requires duplicate computing resources on two separate physical servers.
• Provides the ability to scale up to the maximum capacity of all systems in the cluster to meet usage demands.
• Classic Windows / Linux capacity expansion strategy typically with a load balancing appliance.
• Provides the ability to do maintenance on a server without impacting the system.
• System availability above 99.99%

ERCOT will utilize both a scale up and scale out strategy to meet the business requirements of Nodal
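To put the two availability thresholds in concrete terms, the sketch below converts an availability percentage into the downtime it allows per year. It assumes a 365-day year and counts all downtime equally; the function name `allowed_downtime_hours` is hypothetical and the slides do not state how availability is measured.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_pct):
    """Hours of downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


for label, target in [("Scale up", 99.9), ("Scale out", 99.99)]:
    hours = allowed_downtime_hours(target)
    print(f"{label} ({target}%): {hours:.2f} hours, or {hours * 60:.1f} minutes, per year")
```

On these assumptions the 99.9% threshold allows a little under nine hours of downtime a year, while the 99.99% threshold allows under an hour.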

Page 15

Critical Path Mitigation Strategies: Procurement and Architectural Delays

Risk: Long procurement lead times
Mitigation Strategy:
– Define system requirements as soon as possible and place orders.
– Utilize existing systems or virtual machines where appropriate

Risk: Late Technical Architecture design documents
Mitigation Strategy:
– Preorder unassembled equipment and have ERCOT staff build to specification
– Authorize overtime to build systems
– Pre-build standard server configurations based upon service catalogue and assign to projects as requirements become known

Risk: Server consolidation decommissions behind schedule
Mitigation Strategy:
– Run data center in the safe “buffer” zone of capacity

Page 16

Questions
