Measuring Network Administration Complexity white paper

MAINFRAME Building a Resilient Integrated Replication Network for BC/DR

WHITE PAPER www.brocade.com


Multisite replication networks in IBM z Systems environments need an end-to-end extended distance architecture that enables fast, continuous, and easy access to mission-critical data from anywhere in the world. Brocade delivers a resilient integrated replication solution for mainframe data center connectivity to provide organizations with the best reliability, availability, scalability, and performance in a world where critical mainframe information and applications reside anywhere. This paper provides an overview of the capabilities and advantages included within a Business Continuance (BC) and Disaster Recovery (DR) solution for mainframe IBM z Systems data centers.

INTRODUCTION

Availability and predictability are key in the mainframe world, which means ensuring infrastructure uptime, while at the same time controlling operating costs. Robust BC/DR solutions are challenging for mainframe data center replication networks for these reasons:

• The ever-increasing need for High Availability (HA) for applications driven by e-commerce and customer demands to access applications anytime and anywhere

• Efficiency, portability, and hardware independence provided by Virtual Machines (VMs), which are easing and increasing BC/DR adoption

• Private cloud-based services driving new DR demand for “always on” services

• Data growth and prevalence of 10 gigabit Ethernet (GbE), which demands greater BC/DR throughput and port-speed scalability

• The problematic nature of Wide Area Network (WAN) connections, because corporate Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are hard to maintain

To address these problems, Brocade delivers a robust disaster recovery solution that delivers increased performance, continuous availability, and simplified monitoring and diagnostics for enterprise-class storage replication across data centers. This solution enables customers to build a mainframe extension infrastructure capable of exceeding requirements driven by nonstop data growth, higher Service Level Agreements (SLAs), and faster connections.


INSIGHTS INTO RECENT TRENDS AND DIRECTIONS

Current research confirms that even robust DR replication networks cannot meet increasingly demanding corporate goals for network performance and resilience with fewer unplanned outages.

Aberdeen Group Research Brief: “Data Center Downtime: How Much Does It Really Cost?”

In February 2012, the Aberdeen Group published an in-depth analysis of factors surrounding data center downtime. Data from the survey was used to identify three “maturity classes” (Best-in-Class, Industry Average, and Laggards) and to characterize the top performers in reducing downtime (see Table 1).

Table 1. Mean class performance for the three maturity classes in the survey [1].

Definition of Maturity Class | Mean Class Performance

Best-in-Class: Top 20% of aggregate performance scorers

• Recorded fewer than 1 business interruption over the last 12 months

• Averaged only 6 minutes of downtime per event

• Took less than 1 hour to restore 90% of business operational functionality after the last interruption

Industry Average: Middle 50% of aggregate performance scorers

• Recorded 2.3 business interruptions over the last 12 months

• Averaged 1 hour of downtime per event

• Took 2 hours to restore 90% of business operational functionality after the last interruption

Laggard: Bottom 30% of aggregate performance scorers

• Recorded 4.4 business interruptions over the last 12 months

• Averaged 9 hours of downtime per event

• Took 11 hours to restore 90% of business operational functionality after the last interruption

Source: Aberdeen Group, February 2012

Aberdeen examined the cost of an hour of downtime in June 2010, and then again in February 2012 [2], and reported that the average cost for all respondents increased by 38 percent (see Figure 1).


[1] Source: Datacenter Downtime: How Much Does It Really Cost?, Research Brief, Aberdeen Group, Inc., March 2012.

[2] Ibid.

Figure 1. Increase in the average cost per hour of data center downtime.


In the same study [3], Best-in-Class organizations increased their data center performance to the point where they lost virtually nothing to data center downtime, whereas the total cost for Laggards was more than 10 times the total cost for Best-in-Class respondents (see Table 2).

Table 2. Annual cost of downtime broken down for the three maturity classes.

Yearly Cost Metrics                             Best-in-Class   Industry Average   Laggards
Business interruption events                    0.3             2.3                4.4
Time per business interruption event (hours)    0.1             1                  9
Total disruption (hours)                        0.03            2.3                39.6
Average cost per hour of disruption             $101,600        $181,770           $99,150
Total cost of business interruption events      $3,048          $418,071           $3,926,340
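The rows of Table 2 are related by simple multiplication: total disruption is the number of events times the hours per event, and total cost is total disruption times the hourly cost. The short Python sketch below reproduces the derived rows from the input rows. The figures are copied from Table 2; the function and variable names are illustrative and do not come from the Aberdeen study.

```python
# Reproduce the derived rows of Table 2 from its input rows.
# Figures are copied from Table 2; names are illustrative only.

def annual_downtime_cost(events_per_year, hours_per_event, cost_per_hour):
    """Return (total disruption hours, total annual cost in dollars)."""
    total_hours = events_per_year * hours_per_event
    return total_hours, total_hours * cost_per_hour

# (events per year, hours per event, cost per hour of disruption)
CLASSES = {
    "Best-in-Class": (0.3, 0.1, 101_600),
    "Industry Average": (2.3, 1.0, 181_770),
    "Laggards": (4.4, 9.0, 99_150),
}

for name, inputs in CLASSES.items():
    hours, cost = annual_downtime_cost(*inputs)
    print(f"{name}: {hours:.2f} hours/year, ${cost:,.0f}/year")
```

Rounded to whole dollars, the computed totals match the last row of the table.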


Ponemon Institute Study: “Cost of Data Center Outages”

In 2013, the Ponemon Institute’s benchmark research [4] estimated costs of unplanned data center outages from 67 data centers nationwide. The study reported that 91 percent of data centers had experienced at least one unplanned data center outage in the past 24 months, and cost breakdowns confirmed the increases that were reported by the Aberdeen Group study.

The Ponemon study reported that the cost of a data center outage has increased since 2010, as follows:

• The cost of data center outages now ranges between $45 and $95 per square foot.

• Costs have increased from a maximum of $74,223 to a maximum of $1,734,433 per organization.

• The overall average cost is $627,418 per incident.

One of the key findings from the study was that the total cost of partial and complete outages can be a significant expense for organizations. The total cost of outages is systematically related to the duration of the outage and the size of the data center. The Ponemon study also found that certain causes of outages are more expensive than others. Specifically, IT equipment failure is the most expensive root cause.

Other Findings

Recent studies of BC/DR replication networks report that configuration and management complexity have a major impact. The IT Process Institute, in The Visible Ops Handbook: Implementing ITIL in 4 Practical and Auditable Steps [5], states that 80 percent of unplanned outages are due to configuration change and management errors made by staff and administrators.

From Gartner [6]: “Through 2015, 80 percent of outages impacting mission-critical services will be caused by people and process issues.” More than 50 percent of those outages will be caused by change configuration, integration, and management issues.

[3] Ibid.

[4] http://www.ponemon.org/local/upload/file/2013%20Cost%20of%20Data%20Center%20Outages%20FINAL%2012.pdf

[5] Behr, Kevin, et al., The Visible Ops Handbook: Implementing ITIL in 4 Practical and Auditable Steps, Information Technology Process Institute, Eugene, OR, 2005.

[6] Gartner RAS Core Research, Ronni J. Colville, RA6 05012011.



POINTING TO A NEW DIRECTION FOR BC/DR REPLICATION NETWORKS

Consulting with mainframe customers for over 28 years, Brocade often discovers that a customer’s z Systems environments are suboptimal. They span multiple, geographically dispersed data centers, and yet different groups and stakeholders implement different strategies and components, and these groups seldom meet to share information. Typically, no overall responsibility exists for complete end-to-end architecture solution planning, deployment, and monitoring.

Customers report that this situation results in these issues:

• Lower performance and higher costs

• Complex and difficult network changes and updates

• Less flexibility, resilience, and availability for an enterprise’s most important applications and data

These weaknesses in existing DR networks point to the need to build end-to-end connectivity architecture teams in z Systems environments, so that a true, coordinated extended distance architecture can be built. Such an architecture:

• Is managed by one team responsible for end-to-end connectivity

• Has an identified single point of control and ownership

• Has representation from all stakeholder groups working together

The reward for such an initiative is simple: a less fractured infrastructure that is less complex and thus easier to manage. Today, data center Storage Area Network (SAN) teams go to Local Area Network (LAN) teams for additional network bandwidth for their enterprise BC/DR solution. But a better scenario calls for the SAN team to become the focus for IP connectivity so that they can architect an integrated BC/DR network solution.

WHY MAINFRAME PROFESSIONALS SHOULD WORRY ABOUT IP

For mainframe SAN teams, stability is a key objective. Management requires that they provide the most robust, highly available, and predictable environment possible. Increasingly, that mandate is extended across and between data centers to provide the same robust, five-9s of availability, predictability, and scalability as the data centers themselves have achieved.
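To make the five-9s target concrete, the sketch below converts an availability percentage into the downtime it allows per year. It assumes a 365-day year, and the function name is illustrative.

```python
# Convert an availability target into allowed downtime per year.
# Assumes a 365-day year (525,600 minutes); names are illustrative.

MINUTES_PER_YEAR = 365 * 24 * 60

def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes/year")
```

Under this assumption, five-9s allows roughly 5.26 minutes of downtime per year, which is the budget the replication network between data centers would also have to meet.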

Extension between data centers requires the SAN teams to look beyond the storage infrastructure to protect mission-critical data and minimize risk and exposure. Historically, LAN/WAN and Fibre Channel over IP (FCIP) technologies have often not been highly available or predictable. To ensure that data is protected and available anytime and anywhere, mainframe SAN teams should take charge of all of the components, including the IP network.

To predict how resilient and performant the BC/DR replication network will be, these concerns should be raised and discussed:

• Will Director-class IP switching and redundant connections be deployed to ensure high availability?

• Is maximum WAN throughput always guaranteed via multiple and robust paths?

• Will multiple routes really be isolated from each other? (See Figure 2.)

• Will static routing be deployed so that an outage does not find FCIP and network switches fighting over recovery?

• Do network professionals always provision for five-9s availability? Is their SLA to respond in minutes to errors and issues?

• Are good change control practices always followed? Is everyone advised before changes are made?

THE MOST PROBLEMATIC COMPONENT TIED TO OUTAGES

WAN connection disruptions are frequent, and a wide range of errors can occur, some leading to a complete loss of connectivity, especially in networks where multiple hops and multiple vendors are involved. Problems inherent to WAN connections include latency, packet loss, and out-of-order delivery.
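Latency matters for replication because a TCP-based transport can keep at most one window of data in flight per round trip. The sketch below works through that bound and the corresponding bandwidth-delay product. It ignores packet loss and protocol overhead, and the window size, link speed, and round-trip time are illustrative assumptions rather than figures from this paper.

```python
# Upper bound on single-connection TCP throughput: window / round-trip time.
# Window size, link speed, and RTT values are illustrative assumptions.

def max_throughput_mbps(window_bytes, rtt_ms):
    """Throughput ceiling in megabits per second for one TCP connection."""
    return (window_bytes * 8) / (rtt_ms / 1000) / 1_000_000

def window_needed_bytes(link_mbps, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return link_mbps * 1_000_000 / 8 * (rtt_ms / 1000)

# Ceiling for a 64 KiB window over a 50 ms round trip.
print(f"{max_throughput_mbps(65_536, 50):.1f} Mbps ceiling")

# Window needed to fill a 10 Gbps link at the same 50 ms round trip.
print(f"{window_needed_bytes(10_000, 50) / 1_000_000:.1f} MB window")
```

With these example numbers, a 64 KiB window caps a single connection at about 10.5 Mbps, while filling a 10 Gbps link at 50 ms needs about 62.5 MB in flight.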


WAN EXAMPLE

“We have you covered. We have your back.”

In a WAN example of what not to do, the IP networking group told the mainframe team that multiple paths connected their two sites and displayed a diagram with two paths between data centers over a metro network. The mainframe group believed this, since they had never worried about WAN connectivity before and saw no need to worry.

In reality, the two WAN trunks ran along a railroad right-of-way—one trunk on each side of the tracks—with no more than 20 feet separating the trunks. Shortly thereafter, the railroad accidentally cut both trunks and severed site-to-site communications (see Figure 2).

The lesson learned was: If you are not in charge of your interests, no one is protecting your interests.
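The railroad example can be put in numbers. Two paths improve availability only to the extent that their failures are independent; a shared right-of-way is a common failure mode that redundancy does not remove. The sketch below uses invented availability figures purely for illustration.

```python
# Availability of two WAN paths, with and without a shared failure mode.
# All probabilities below are invented for illustration.

def parallel_availability(path_availabilities):
    """Availability of redundant paths that fail independently."""
    all_fail = 1.0
    for availability in path_availabilities:
        all_fail *= (1 - availability)
    return 1 - all_fail

def with_shared_risk(path_availabilities, shared_availability):
    """Same paths, but every path also depends on one shared element
    (for example, a common right-of-way)."""
    return shared_availability * parallel_availability(path_availabilities)

independent = parallel_availability([0.999, 0.999])
shared = with_shared_risk([0.999, 0.999], shared_availability=0.999)
print(f"independent paths:   {independent:.6f}")
print(f"shared right-of-way: {shared:.6f}")
```

In this model the pair of paths can never be more available than the element they share, which is why route isolation has to be verified rather than assumed from a diagram.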

WHAT IS AN INTEGRATED BC/DR NETWORK?

The optimal network architecture to support enterprise BC/DR goals must include integrated end-to-end planning, deployment, and monitoring of the network. A resilient IT architecture must be developed for a robust BC/DR network between data centers. A “single-pane-of-glass” approach must be implemented for simplified device and network management.

FICON, FCIP, and IP connectivity can all be managed with the same management software: Brocade® Network Advisor. Brocade Network Advisor initially focused on the IBM TS7700 grid network solution (IP-based replication) and then was expanded to bring in all replication done over FCIP or IP from IBM and other vendors. This expansion makes Brocade Network Advisor the perfect solution for mainframe data center connectivity.

Building IT Resilience into the Network Architecture

Resilience is the ability to rapidly adapt and respond to any internal or external disruption, demand, or threat and continue business operations without significant impact, resulting in near-continuous application availability or Continuous Availability (CA) during both planned and unplanned outages. Resilience is broader in scope than DR, which focuses only on recovering from unplanned events such as an earthquake, cut cables or power, or a security breach.

Building IT resilience into the network ensures that enterprises can meet the strictest requirements for business continuance under any circumstances.

Challenges of Designing a Resilient IT Architecture

While a multisourcing strategy has some advantages, it cannot provide a synergistic, integrated environment. Meeting uptime requirements, reducing costs, and ensuring security and efficiencies, while at the same time architecting for growth and scalability, is a very difficult proposition. Unless all of the data center components are themselves synergistic (designed to work together), it is almost impossible to meet all strategic goals while monitoring, managing, and troubleshooting a disparate collection of vendors, hardware, and services.

Figure 2. A railroad worker severed site-to-site communications while doing routine maintenance.


When the IT network architecture is developed cross-functionally, ideally led by the SAN team, the next step is to resolve the issue of configuration and management complexity. Customers usually do not have an integrated suite of monitoring or management tools for their systems that cross vendor lines, so there often is no single view of the performance and health of all of the links and devices on the FICON as well as the Ethernet LAN/WAN.

In many cases, customers do not have the internal strength to monitor, manage, and maintain their SAN/LAN/WAN in terms of people, skills, products, and tools, so they often look for a provider who can help them.

Requirements for BC/DR Solutions for z Systems

The following requirements for a BC/DR solution must be met by the provider. These requirements can be considered mandatory for regulatory compliance and, in some cases, for company survival. The solution must do the following:

• Eliminate single points of failure to increase system resiliency and maximize data availability

• Incorporate failover software to prevent or better tolerate both planned and unplanned system outages

• Streamline data backup and recovery processes to reduce time to recovery

• Enable high-performance remote backup, electronic vaulting, and mirroring at data centers separated by great distances

• Meet corporate and regulatory compliance requirements

• Provide no-impact encryption for data at rest and in flight

ARCHITECTING AN INTEGRATED BC/DR NETWORK

To address these requirements, customers need a purpose-built data center BC/DR solution for mainframe that maximizes performance and scalability for large, complex environments, allowing IT organizations to future-proof their storage extension infrastructure by adding more throughput capacity as their needs grow.

Brocade advocates a resilient BC/DR network solution to unify management in highly dynamic storage, Ethernet, and converged networks. Brocade Network Advisor helps organizations unify and simplify management, providing a single view of the performance and health of all of the links and devices on the FICON and Ethernet LAN/WAN.

The Brocade solution is ideal for building a high-performance data center infrastructure for replication and backup solutions. This solution leverages the Brocade DCX® 8510 Directors, Brocade 7840 Extension Switches, and Brocade MLXe® Core Routers to extend mainframe disk and tape storage applications over distances.

Brocade DCX 8510 Directors

The Brocade DCX Series of Directors provides the perfect Director for evolving data center fabrics:

• Provides a highly robust 16 gigabit per second (Gbps) platform in two modular form factors for enterprise data centers supporting z Systems

• Maximizes chassis bandwidth, performance, and density with Inter-Chassis Links (ICLs) between Directors without adding hops

• Provides over four times the performance and five times the energy efficiency of competitive offerings

• Secures and protects data with plug-in blades for data encryption and SAN extension

• Enables logical partitioning of platforms and fabrics into virtual data and management domains without sacrificing performance, scalability, security, or reliability


Brocade 7840 Extension Switch and Brocade FX8-24 Extension Blade

The Brocade 7840 is built for replication and backup workloads over distance that require near-zero downtime and unprecedented performance:

• Moves more storage data between data centers to meet BC/DR objectives with industry-leading performance and scalability

• Encrypts storage data flows over distance without a performance penalty

• Improves load balancing and network resilience with Extension Trunking

• Maintains always-on business operations with nondisruptive firmware upgrades

• Extends proactive monitoring to automatically detect WAN anomalies and avoid unplanned downtime

• Accelerates troubleshooting of end-to-end Input/Output (I/O) flows over distance

The Brocade 7840 maximizes replication and backup throughput over distance using data compression, disk and tape protocol acceleration, and FCIP networking technology. Advanced features and technologies include:

• Extension Hot Code Load (eHCL): Provides nondisruptive firmware updates for the Brocade 7840 Extension Switch through a Brocade patented process that is lossless to data in-flight. All data is guaranteed to be delivered in order. eHCL does not cause Interface Control Checks (IFCC).

• Lossless Link Loss (LLL): When using Extension Trunking, LLL is a Brocade patented technology that ensures all data in-flight is delivered in order, even when an operational link goes offline. When a link is detected as going down, the data in-flight is retransmitted across remaining links and placed back in order prior to delivery to the upper layer protocols. LLL prevents IFCC in FICON environments and prevents Small Computer Systems Interface (SCSI) errors in FC environments.

Figure 3. A Brocade end-to-end, integrated, synergistic BC/DR network. (Diagram labels: Data Center A, Data Center B, zEnterprise, Disk/DASD, Virtual Tape Array, Brocade DCX Directors, 2 x Brocade 7840, Brocade MLXe routers; links labeled FC, Ethernet, and 10 GbE or SONET.)


• IP Security (IPsec): Ensures secure transport over WAN links by encrypting data in-flight with AES 256, SHA512 HMAC, IKEv2 and Suite B algorithms without a performance penalty. Brocade encryption is implemented in hardware for ultralow latency and high-speed applications.

• Unparalleled, extremely efficient architecture: Uniquely permits the high-speed, low-latency processing of frames, making extension of synchronous applications possible. Brocade reduces overhead considerably with a highly optimized protocol.

• Advanced compression architecture: Provides multiple modes to optimize compression ratios for various throughput requirements.

• WAN-Optimized Transmission Control Protocol (TCP): Provides an aggressive TCP stack, optimizing TCP window size and flow control and accelerating TCP transport for high throughput storage applications.

• PerPriority-TCP-Quality of Service (PTQ): Provides high-, medium-, and low-priority handling of initiator-target flows within the same tunnel for transmission over the WAN, with individual TCP sessions per Quality of Service (QoS) class.

• FCIP FastWrite: Accelerates SCSI write processing, maximizing performance of synchronous and asynchronous replication applications across high-latency WAN connections over any distance.

• Open Systems Tape Pipelining: Accelerates read and write tape processing over distance, significantly reducing backup and recovery times over distance anywhere in the world.

• Brocade Advanced Accelerator for FICON: Uses advanced networking technologies, data management techniques, and protocol intelligence to accelerate IBM zGM, mainframe tape read and write operations, and z/OS host connection to Teradata warehousing systems over distance.

• Jumbo Frames: The Brocade 7840 Extension Switch leverages 9216-byte jumbo frames for increased efficiency, performance, and optimization.
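The Lossless Link Loss behavior described earlier in this list (retransmit in-flight data over the remaining links, then restore order before delivery to the upper layers) can be pictured with a toy model. The Python sketch below is not Brocade's implementation; the round-robin striping, the link names, and both function names are assumptions made only to illustrate retransmission followed by re-sequencing.

```python
# Toy model of in-order delivery across a trunk when one link fails.
# This is NOT Brocade's implementation; striping policy and names are
# assumptions used only to illustrate retransmit-then-resequence.

def send_over_trunk(frames, links, failed_link):
    """Stripe frames round-robin over links. Frames that were on the
    failed link are retransmitted and therefore arrive late.
    Returns a list of (sequence number, payload) in arrival order."""
    survivors = [link for link in links if link != failed_link]
    if not survivors:
        raise ValueError("no surviving link to retransmit on")
    arrivals, retransmit = [], []
    for seq, payload in enumerate(frames):
        if links[seq % len(links)] == failed_link:
            retransmit.append((seq, payload))  # lost in flight
        else:
            arrivals.append((seq, payload))
    arrivals.extend(retransmit)  # retransmissions arrive out of order
    return arrivals

def deliver_in_order(arrivals):
    """Hold frames until the next expected sequence number is present."""
    pending, next_seq, delivered = {}, 0, []
    for seq, payload in arrivals:
        pending[seq] = payload
        while next_seq in pending:
            delivered.append(pending.pop(next_seq))
            next_seq += 1
    return delivered

arrivals = send_over_trunk(list("ABCDEF"), ["link0", "link1", "link2"], "link1")
print([seq for seq, _ in arrivals])
print(deliver_in_order(arrivals))
```

In this example the frames arrive in the order 0, 2, 3, 5, 1, 4, yet the receiver hands them upward as A through F, so the upper-layer protocol never sees the gap.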

Brocade MLXe Core Routers

Brocade MLXe Core Routers are scalable, multiservice IP/Multiprotocol Label Switching (MPLS) routers that provide these features and technologies:

• Industry-leading port density, low latency, and high throughput with modular architecture for ease of adding interface ports to meet growing traffic requirements

• Wire-speed forwarding performance for VLAN, IPv4, IPv6, MPLS, VPLS, and IPsec traffic

• Support for 10 GbE, 40 GbE, and 100 GbE ports with extended and long-range optic support to enable flexible data transmission to and from any off-site BC/DR sites

• Advanced QoS control for traffic policing of disk and virtual tape replication traffic

• Hitless failover and software upgrades for nonstop transmission of traffic between data centers

• A rich set of Simple Network Management Protocol (SNMP) Management Information Bases (MIBs) enables integration with Brocade Network Advisor or third-party management applications to enable monitoring and reporting

• Scalable 256-bit Suite B IPsec encryption module for IP traffic residing outside of the Brocade 7840, in order to protect data over a third-party WAN from theft or snooping


Brocade Fabric Vision Technology

Extending Brocade Fabric Vision™ technology between data centers enables customers to simplify and automate management, monitoring, and diagnostic capabilities to avoid problems before they impact operations, helping organizations meet SLAs. A single click deploys a policy for monitoring extension resources to automatically detect anomalies, ensure performance, and avoid unplanned downtime. The policies, rules, and thresholds are based on more than fifteen years of Brocade design and support of data center extension customers all over the world.

Brocade Fabric Vision technology, an extension of Brocade Gen 5 Fibre Channel, delivers unprecedented performance, continuous availability, and simplified management to handle the unrelenting growth of data traffic between data centers. Brocade Fabric Vision technology provides:

• Unprecedented insight and visibility across the storage network with powerful built-in monitoring, management, and diagnostic tools to simplify monitoring, increase availability, and dramatically reduce costs

• Integration into Brocade Network Advisor (see Figure 4), providing instant visibility into hot spots at a switch level for rapid corrective actions

• Seamless integration with leading partner orchestration frameworks such as IBM Storage Productivity Center and Systems Director

• Monitoring and Alerting Policy Suite (MAPS), a new, easy-to-use solution for policy-based threshold monitoring and alerting (administrators can use prebuilt common rules/policies or customize)

• The Flow Vision diagnostic tool, which enables administrators to identify, monitor, and analyze specific application data flows

• Other capabilities such as health and performance dashboards and ClearLink Diagnostics
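Policy-based threshold monitoring of the kind attributed to MAPS above can be pictured as a set of rules evaluated against measured values. The sketch below is generic Python. The rule fields, metric names, thresholds, and action labels are hypothetical and do not reflect actual MAPS rule syntax or Brocade defaults.

```python
# Generic threshold-rule evaluation; NOT the MAPS rule syntax.
# Metric names, thresholds, and action labels are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    metric: str        # hypothetical metric name
    threshold: float   # trigger when the reading exceeds this value
    action: str        # hypothetical action label

def evaluate(policy, sample):
    """Return the actions triggered by one sample of metric readings."""
    return [
        f"{rule.action}: {rule.metric}={sample[rule.metric]} > {rule.threshold}"
        for rule in policy
        if rule.metric in sample and sample[rule.metric] > rule.threshold
    ]

policy = [
    Rule("packet_loss_percent", 0.1, "alert"),
    Rule("round_trip_ms", 80.0, "alert"),
]
sample = {"packet_loss_percent": 0.5, "round_trip_ms": 42.0}
for line in evaluate(policy, sample):
    print(line)
```

Keeping rules as data rather than code is what makes it possible to ship prebuilt policies and let administrators customize thresholds without touching the evaluation logic.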

Brocade Network Advisor

Deploying Brocade Network Advisor reduces the number of management software tools and associated servers needed. This simplifies troubleshooting and problem resolution, since there is no need to cross-reference and coordinate other management tools. Mainframe/storage administrators have control over their data and their data’s network for better insight end-to-end. Brocade Network Advisor provides dashboard-based management (see Figure 5), with customization for customer environments.

Figure 4. Brocade Fabric Vision technology provides unmatched performance, scalability, and availability across data centers. (Diagram callouts for Brocade Extension Technology between Site A and Site B: Move 6X more data, Zero downtime, Automated tasks, Proactive Monitoring, Automatic Error Recovery, Validate Infrastructure, Accelerate Troubleshooting.)


SUMMARY

Although customers are deploying DR solutions today, these solutions more often than not fail to meet the requirements created by the explosion of new data and the need for deeper insight into business impacts. Planning and architecting end-to-end solutions under the leadership of the mainframe SAN team, with a single-sourcing strategy, means unified and simplified network management. Using Brocade hardware and software integration with IBM Storage Productivity Center and Systems Director delivers best-in-class reliability, availability, serviceability, scalability, and performance in a world where critical mainframe information and applications reside anywhere.

For more information about Brocade solutions, visit www.brocade.com.

Figure 5. Brocade Fabric Vision technology is integrated into Brocade Network Advisor dashboards. The intelligent Brocade Network Advisor performance dashboard displays a customizable summary of all discovered Brocade devices and third-party IP devices.


© 2015 Brocade Communications Systems, Inc. All Rights Reserved. 04/15 GA-WP-1918-00

ADX, Brocade, Brocade Assurance, the B-wing symbol, DCX, Fabric OS, HyperEdge, ICX, MLX, MyBrocade, OpenScript, The Effortless Network, VCS, VDX, Vplane, and Vyatta are registered trademarks, and Fabric Vision and vADX are trademarks of Brocade Communications Systems, Inc., in the United States and/or in other countries. Other brands, products, or service names mentioned may be trademarks of others.

Notice: This document is for informational purposes only and does not set forth any warranty, expressed or implied, concerning any equipment, equipment feature, or service offered or to be offered by Brocade. Brocade reserves the right to make changes to this document at any time, without notice, and assumes no responsibility for its use. This informational document describes features that may not be currently available. Contact a Brocade sales office for information on feature and product availability. Export of technical data contained in this document may require an export license from the United States government.

Corporate Headquarters: San Jose, CA, USA

European Headquarters: Geneva, Switzerland, T: +41-22-799-56-40

Asia Pacific Headquarters: Singapore, T: +65-6538-4700
