
SteelFusion™ Deployment Guide

April 2015


© 2015 Riverbed Technology, Inc. All rights reserved.

Riverbed and any Riverbed product or service name or logo used herein are trademarks of Riverbed. All other trademarks used herein belong to their respective owners. The trademarks and logos displayed herein cannot be used without the prior written consent of Riverbed or their respective owners.

Akamai® and the Akamai wave logo are registered trademarks of Akamai Technologies, Inc. SureRoute is a service mark of Akamai. Apple and Mac are registered trademarks of Apple, Incorporated in the United States and in other countries. Cisco is a registered trademark of Cisco Systems, Inc. and its affiliates in the United States and in other countries. EMC, Symmetrix, and SRDF are registered trademarks of EMC Corporation and its affiliates in the United States and in other countries. IBM, iSeries, and AS/400 are registered trademarks of IBM Corporation and its affiliates in the United States and in other countries. Juniper Networks and Junos are registered trademarks of Juniper Networks, Incorporated in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States and in other countries. Microsoft, Windows, Vista, Outlook, and Internet Explorer are trademarks or registered trademarks of Microsoft Corporation in the United States and in other countries. Oracle and JInitiator are trademarks or registered trademarks of Oracle Corporation in the United States and in other countries. UNIX is a registered trademark in the United States and in other countries, exclusively licensed through X/Open Company, Ltd. VMware, ESX, ESXi are trademarks or registered trademarks of VMware, Inc. in the United States and in other countries.

This product includes Windows Azure Linux Agent developed by the Microsoft Corporation (http://www.microsoft.com/). Copyright 2012 Microsoft Corporation.

This product includes software developed by the University of California, Berkeley (and its contributors), EMC, and Comtech AHA Corporation. This product is derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm.

The SteelHead Mobile Controller (virtual edition) includes VMware Tools. Portions Copyright © 1998-2013 VMware, Inc. All Rights Reserved.

This product includes the NetApp Manageability Software Development Kit (NM SDK), including any third-party software available for review with such SDK, which can be found at http://communities.netapp.com/docs/DOC-1152 and is included in a NOTICES file within the downloaded files.

For a list of open source software (including libraries) used in the development of this software along with associated copyright and license agreements, see the Riverbed Support site at https://support.riverbed.com.

This documentation is furnished “AS IS” and is subject to change without notice and should not be construed as a commitment by Riverbed. This documentation may not be copied, modified or distributed without the express authorization of Riverbed and may be used only in connection with Riverbed products and services. Use, duplication, reproduction, release, modification, disclosure or transfer of this documentation is restricted in accordance with the Federal Acquisition Regulations as applied to civilian agencies and the Defense Federal Acquisition Regulation Supplement as applied to military agencies. This documentation qualifies as “commercial computer software documentation” and any use by the government shall be governed solely by these terms. All other use is prohibited. Riverbed assumes no responsibility or liability for any errors or inaccuracies that may appear in this documentation.

Riverbed Technology

680 Folsom Street

San Francisco, CA 94107

Fax: 415-247-8801

Web: http://www.riverbed.com

Phone: 415-247-8800

Part Number

712-00079-07


Contents

Preface.........................................................................................................................................................1

About This Guide ..........................................................................................................................................1
Audience ..................................................................................................................................................2
Document Conventions .........................................................................................................................2

Documentation and Release Notes .............................................................................................................2

Contacting Riverbed......................................................................................................................................3

What Is New ...................................................................................................................................................3

Chapter 1 - Overview of Core and Storage Edge as a System...............................................................5

Introducing Branch Converged Infrastructure..........................................................................................5

How the SteelFusion Product Family Works.............................................................................................6

System Components and Their Roles .........................................................................................................7

SteelFusion Edge Appliance Architecture..................................................................................................9

Chapter 2 - Deploying Core and Edge as a System .............................................................................. 11

The SteelFusion Family Deployment Process..........................................................................................11
Provisioning LUNs on the Storage Array .........................................................................................12
Installing the SteelFusion Appliances ...............................................................................................12
Pinning and Prepopulating LUNs in the Core .................................................................................13
Configuring Snapshot and Data Protection Functionality .............................................................14
Managing vSphere Datastores on LUNs Presented by Core .........................................................14

Single-Appliance Versus High-Availability Deployments ....................................................................14
Single-Appliance Deployment............................................................................................................15
High-Availability Deployment ...........................................................................................................15

Connecting Core with Storage Edge .........................................................................................................16
Prerequisites ..........................................................................................................................................16
Process Overview: Connecting the SteelFusion Product Family Components ...........................16
Adding Storage Edges to the Core Configuration...........................................................................18
Configuring Storage Edge ...................................................................................................................18
Mapping LUNs to Storage Edges.......................................................................................................18


Riverbed Turbo Boot....................................................................................................................................20

Chapter 3 - Deploying the Core...............................................................................................................23

Getting Started .............................................................................................................................................23

Interface and Port Configuration ..............................................................................................................24
Core Ports ..............................................................................................................................................24
Configuring Interface Routing ...........................................................................................................26
Configuring Core for Jumbo Frames .................................................................................................29

Configuring the iSCSI Initiator ..................................................................................................................30

Configuring LUNs.......................................................................................................................................30
Exposing LUNs .....................................................................................................................................30
Resizing LUNs ......................................................................................................................................31
Configuring Fibre Channel LUNs......................................................................................................31
Removing a LUN from a Core Configuration..................................................................................31

Configuring Redundant Connectivity with MPIO .................................................................................32
MPIO in Core ........................................................................................................................................32
Configuring Core MPIO Interfaces ....................................................................................................32

Core Pool Management...............................................................................................................................33
Overview of Core Pool Management ................................................................................................33
Pool Management Architecture..........................................................................................................34
Configuring Pool Management ..........................................................................................................34
Changing Pool Management Structure .............................................................................................37
High Availability in Pool Management.............................................................................................38

Chapter 4 - SteelFusion and Fibre Channel ...........................................................................................39

Overview of Fibre Channel ........................................................................................................................39
Fibre Channel LUN Considerations ..................................................................................................41
How VMware ESXi Virtualizes Fibre Channel LUNs.....................................................................41
How Core-v Connects to RDM Fibre Channel LUNs .....................................................................43
Requirements for Core-v and Fibre Channel SANs ........................................................................44
Specifics About Fibre Channel LUNs Versus iSCSI LUNs .............................................................44

Deploying Fibre Channel LUNs on Core-v Appliances.........................................................................45
Deployment Prerequisites ...................................................................................................................45
Configuring Fibre Channel LUNs......................................................................................................45

Configuring Fibre Channel LUNs in a Core-v HA Scenario .................................................................48
The ESXi Servers Hosting the Core-v Appliances Are Managed by vCenter .............................49
The ESXi Servers Hosting the Core-vs Are Not Managed by vCenter.........................................51

Populating Fibre Channel LUNs ...............................................................................................................51

Best Practices and Recommendations.......................................................................................................52
Best Practices .........................................................................................................................................53
Recommendations ................................................................................................................................53

Troubleshooting ...........................................................................................................................................54


Chapter 5 - Configuring Storage Edge ...................................................................................................57

Interface and Port Configurations.............................................................................................................57
Edge Appliances Ports .........................................................................................................................58
Moving Storage Edge to a New Location .........................................................................................59
Configuring Edge for Jumbo Frames.................................................................................................60
Configuring iSCSI Initiator Timeouts ................................................................................................61

Edge Appliances Storage Specifications...................................................................................................61

Configuring Disk Management .................................................................................................................61
Disk Management on BlockStream-Enabled SteelHead EX...........................................................61
Disk Management on the SteelFusion Edge Appliance..................................................................63

Configuring SteelFusion Storage...............................................................................................................64

MPIO in Storage Edge.................................................................................................................................65

Chapter 6 - SteelFusion Appliance High-Availability Deployment.......................................................67

Overview of Storage Availability ..............................................................................................................67

Core High Availability ................................................................................................................................68
Core with MPIO....................................................................................................................................69
Core HA Concepts................................................................................................................................69
Configuring HA for Core ....................................................................................................................70

Storage Edge High Availability .................................................................................................................80
BlockStream-Enabled SteelHead EX High Availability ..................................................................80
SteelFusion Edge Appliance High Availability................................................................................88

Recovering from Split-Brain Scenarios Involving Edge Appliance HA..............................................95

Testing HA Failover Deployments............................................................................................................95

Configuring WAN Redundancy................................................................................................................96
Configuring WAN Redundancy with No Core HA ........................................................................96
Configuring WAN Redundancy in an HA Environment ...............................................................97

Chapter 7 - SteelFusion Replication (FusionSync) ...............................................................................99

Overview of SteelFusion Replication........................................................................................................99

Architecture of SteelFusion Replication .................................................................................................100
SteelFusion Replication Components..............................................................................................100
SteelFusion Replication Design Overview......................................................................................100

Failover Scenarios ......................................................................................................................................102
Secondary Site Down .........................................................................................................................103
Replication Is Suspended at the Secondary Site ............................................................................103
Primary Site Is Down (Suspended)..................................................................................................104

FusionSync High-Availability Considerations ......................................................................................105

FusionSync Core HA .................................................................................................................................105
Replication HA Failover Scenarios ..................................................................................................106

SteelFusion Replication Metrics...............................................................................................................107


Chapter 8 - Snapshots and Data Protection.........................................................................................109

Setting Up Application-Consistent Snapshots ......................................................................................109

Configuring Snapshots for LUNs............................................................................................................110

Volume Snapshot Service Support .......................................................................................................... 111

Implementing Riverbed Host Tools for Snapshot Support .................................................................111
Overview of RHSP and VSS..............................................................................................................112
Riverbed Host Tools Operation and Configuration ......................................................................112

Configuring the Proxy Host for Backup.................................................................................................113

Configuring the Storage Array for Proxy Backup ................................................................................113

Data Protection...........................................................................................................................................114

Data Recovery ............................................................................................................................................115

Branch Recovery ........................................................................................................................................116
Overview of Branch Recovery ..........................................................................................................116
How Branch Recovery Works...........................................................................................................117
Branch Recovery Configuration .......................................................................................................117
Branch Recovery CLI Configuration Example...............................................................................118

Chapter 9 - Data Resilience and Security.............................................................................................121

Recovering a Single Core..........................................................................................................................121
Recovering a Single Physical Core ...................................................................................................122
Recovering a Single Core-v ...............................................................................................................122

Storage Edge Replacement .......................................................................................................................123

Disaster Recovery Scenarios.....................................................................................................................124
SteelFusion Appliance Failure—Failover........................................................................................124
SteelFusion Appliance Failure—Failback .......................................................................................125

Best Practice for LUN Snapshot Rollback ..............................................................................................127

Using CHAP to Secure iSCSI Connectivity............................................................................................128
One-Way CHAP..................................................................................................................................128
Mutual CHAP .....................................................................................................................................129

At-Rest and In-Flight Data Security........................................................................................................130
Enable Data At-Rest Blockstore Encryption ...................................................................................130
Enable Data In-Flight Secure Peering Encryption .........................................................................132

Clearing the Blockstore Contents ............................................................................................................132

Additional Security Best Practices ..........................................................................................................133

Chapter 10 - SteelFusion Appliance Upgrade......................................................................................135

Planning Software Upgrades ...................................................................................................................135

Upgrade Sequence .....................................................................................................................................136

Minimize Risk During Upgrading ..........................................................................................................137

Performing the Upgrade...........................................................................................................................137


Storage Edge Upgrade .......................................................................................................................137
Core Upgrade......................................................................................................................................138

Chapter 11 - Network Quality of Service ..............................................................................................139

Rdisk Protocol Overview..........................................................................................................................139

QoS for SteelFusion Replication Traffic ..................................................................................................141

QoS for LUNs .............................................................................................................................................141
QoS for Unpinned LUNs...................................................................................................................141
QoS for Pinned LUNs ........................................................................................................................141

QoS for Branch Offices ..............................................................................................................................141
QoS for Branch Offices That Mainly Read Data from the Data Center ......................................142
QoS for Branch Offices Booting Virtual Machines from the Data Center ..................................142

Time-Based QoS Rules Example..............................................................................................................142

Chapter 12 - Deployment Best Practices .............................................................................................143

Storage Edge Best Practices ......................................................................................................................143
Segregate Traffic..................................................................................................................................144
Pin the LUN and Prepopulate the Blockstore ................................................................................144
Segregate Data onto Multiple LUNs................................................................................................144
Ports and Type of Traffic....................................................................................................................145
Changing IP Addresses on the Storage Edge, ESXi Host, and Servers ......................................145
Disk Management...............................................................................................................................146
Rdisk Traffic Routing Options ..........................................................................................................146
Deploying SteelFusion with Third-Party Traffic Optimization ...................................................147
Windows and ESX Server Storage Layout—SteelFusion-Protected LUNs Vs. Local LUNs ...148
VMFS Datastores Deployment on SteelFusion LUNs...................................................................152
Enable Windows Persistent Bindings for Mounted iSCSI LUNs ................................................152
Set Up Memory Reservation for VMs Running on VMware ESXi in the VSP ..........................153
Boot from an Unpinned iSCSI LUN.................................................................................................154
Running Antivirus Software .............................................................................................................154
Running Disk Defragmentation Software.......................................................................................154
Running Backup Software.................................................................................................................154
Configure Jumbo Frames...................................................................................................................155

Core Best Practices.....................................................................................................................................155
Deploy on Gigabit Ethernet Networks............................................................................................155
Use CHAP............................................................................................................................................155
Configure Initiators and Storage Groups or LUN Masking.........................................................155
Core Hostname and IP Address .......................................................................................................156
Segregate Storage Traffic from Management Traffic .....................................................................156
When to PIN and Prepopulate the LUN .........................................................................................156
Core Configuration Export................................................................................................................157
Core in HA Configuration Replacement.........................................................................................157

iSCSI Initiators Timeouts ..........................................................................................................................157
Microsoft iSCSI Initiator Timeouts...................................................................................................157
ESX iSCSI Initiator Timeouts ............................................................................................................158


Operating System Patching......................................................................................................................158
Patching at the Branch Office for Virtual Servers Installed on iSCSI LUNs ..............................158
Patching at the Data Center for Virtual Servers Installed on iSCSI LUNs.................................158

Chapter 13 - SteelFusion Appliance Sizing..........................................................................................161

General Sizing Considerations.................................................................................................................161

Core Sizing Guidelines..............................................................................................................................161

Storage Edge Sizing Guidelines...............................................................................................................163

Appendix A - Edge Appliance Network Reference Architecture........................................................165

Converting In-Path Interfaces to Data Interfaces..................................................................................165

Multiple VLAN Branch with Four-Port Data NIC................................................................................167

Single VLAN Branch with Four-Port Data NIC....................................................................................169

Multiple VLAN Branch Without Four-Port Data NIC .........................................................................171


Preface

Welcome to the SteelFusion Deployment Guide. Read this preface for an overview of the information provided in this guide and the documentation conventions used throughout, hardware and software dependencies, additional reading, and contact information. This preface includes the following sections:

“About This Guide” on page 1

“Documentation and Release Notes” on page 2

“Contacting Riverbed” on page 3

“What Is New” on page 3

About This Guide

The SteelFusion Deployment Guide provides an overview of the SteelFusion Core and Edge appliances. It discusses how to configure them as a system.

Riverbed product names have changed. At the time of publication, the user interfaces for the BlockStream-enabled SteelHead EX continue to refer to SteelFusion as Granite. For the product naming key, see http://www.riverbed.com/products/#Product_List.

This guide includes information relevant to the following products:

Riverbed SteelFusion Core (Core)

Riverbed SteelFusion Core Virtual Edition (Core-v)

Riverbed SteelFusion Edge (Edge)

Riverbed Optimization System (RiOS)

Riverbed SteelHead EX (SteelHead EX or BlockStream-enabled SteelHead EX)

Riverbed SteelHead (SteelHead)

Riverbed SteelCentral Controller for SteelHead (SCC or Controller)

Riverbed Virtual Services Platform (VSP)

This guide is intended to be used together with all the documentation and technical notes available at https://support.riverbed.com.

For information on naming conventions in this manual, see “How the SteelFusion Product Family Works” on page 6.


Audience

This guide is written for storage, virtualization, and network administrators familiar with administering and managing storage arrays, snapshots, backups, virtual machines (VMs), Fibre Channel, and iSCSI.

This guide requires you to be familiar with virtualization technology, the SteelFusion Core Management Console User’s Guide, the SteelFusion Edge Management Console User’s Guide, the SteelFusion Edge Hardware Installation and Maintenance Guide, the Riverbed Command-Line Interface Reference Manual, the SteelFusion Command-Line Interface Reference Manual, the SteelFusion Edge Installation and Configuration Guide, and the SteelHead Management Console User’s Guide (xx60).

Document Conventions

This guide uses the following standard set of typographical conventions.

italics - Within text, new terms and emphasized words appear in italic typeface.

boldface - Within text, CLI commands, CLI parameters, and REST API properties appear in bold typeface.

Courier - Code examples appear in Courier font:

amnesiac > enable
amnesiac # configure terminal

< > - Values that you specify appear in angle brackets: interface <ip-address>

[ ] - Optional keywords or variables appear in brackets: ntp peer <ip-address> [version <number>]

{ } - Elements that are part of a required choice appear in braces: {<interface-name> | ascii <string> | hex <string>}

| - The pipe symbol represents a choice to select one keyword or variable to the left or right of the symbol. The keyword or variable can be either optional or required: {delete <filename> | upload <filename>}

Documentation and Release Notes

To obtain the most current version of all Riverbed documentation, go to the Riverbed Support site at https://support.riverbed.com.

If you need more information, see the Riverbed Knowledge Base for any known issues, how-to documents, system requirements, and common error messages. You can browse titles or search for keywords and strings. To access the Riverbed Knowledge Base, log in to the Riverbed Support site at https://support.riverbed.com.

Each software release includes release notes. The release notes identify new features in the software as well as known and fixed problems. To obtain the most current version of the release notes, go to the Software and Documentation section of the Riverbed Support site at https://support.riverbed.com.

Examine the release notes before you begin the installation and configuration process.


Contacting Riverbed

This section describes how to contact departments within Riverbed.

Technical support - If you have problems installing, using, or replacing Riverbed products, contact Riverbed Support or your channel partner who provides support. To contact Riverbed Support, open a trouble ticket by calling 1-888-RVBD-TAC (1-888-782-3822) in the United States and Canada or +1 415-247-7381 outside the United States. You can also go to https://support.riverbed.com.

Professional services - Riverbed has a staff of professionals who can help you with installation, provisioning, network redesign, project management, custom designs, consolidation project design, and custom coded solutions. To contact Riverbed Professional Services, email [email protected] or go to http://www.riverbed.com/services-training/Services-Training.html.

Documentation - The Riverbed Technical Publications team continually strives to improve the quality and usability of Riverbed documentation. Riverbed appreciates any suggestions you might have about its online documentation or printed materials. Send documentation comments to [email protected].

What Is New

Since the Granite Core and Edge Appliances Deployment Guide (December 2014), the guide has changed its name to the SteelFusion Deployment Guide, and the following information has been added or updated:

Entire guide updated to include the new SteelFusion Edge (SFED xx00)

Updated - “How the SteelFusion Product Family Works” on page 6

New - “SteelFusion Edge Appliance Architecture” on page 9

Updated - “Riverbed Turbo Boot” on page 20

New - “Resizing LUNs” on page 31

Updated - “Edge Appliances Ports” on page 58

New - “Moving Storage Edge to a New Location” on page 59

New - “SCSI Reservations Between Core and Storage Arrays” on page 77

Updated - “Failover States and Sequences” on page 77

New - “Recovering from Failure of Both Cores in HA Configuration” on page 78

Updated - “Storage Edge High Availability” on page 80

New - “Recovering from Split-Brain Scenarios Involving Edge Appliance HA” on page 95

New - “Testing HA Failover Deployments” on page 95

New - “SteelFusion Replication (FusionSync)” on page 99

Updated - “Storage Edge Replacement” on page 123

New - “Using CHAP to Secure iSCSI Connectivity” on page 128

New - “Clearing the Blockstore Contents” on page 132

New - “Additional Security Best Practices” on page 133


Updated - “Core Upgrade” on page 138

Updated - “Converting In-Path Interfaces to Data Interfaces” on page 165


CHAPTER 1 Overview of Core and Storage Edge as a System

This chapter describes the Core and Storage Edge components as a virtual storage system. It includes the following sections:

“Introducing Branch Converged Infrastructure” on page 5

“How the SteelFusion Product Family Works” on page 6

“System Components and Their Roles” on page 7

“SteelFusion Edge Appliance Architecture” on page 9

Introducing Branch Converged Infrastructure

Core and Storage Edge consolidate branch data and applications in the data center while delivering LAN performance at the branch office over the WAN. By functioning as a single converged platform in the branch office, the SteelFusion product family eliminates the need for dedicated server and storage infrastructure, including management and related backup resources, at the branch office.

With the SteelFusion product family, data center administrators can extend a data center storage array to a remote location, even over a low-bandwidth link. SteelFusion delivers business agility, enabling you to effectively deliver global storage infrastructure anywhere you need it.

The SteelFusion product family provides the following functionality:

Innovative block storage optimization ensures that you can centrally manage data storage while keeping that data available to business operations in the branch, even in the event of a WAN outage.

A local authoritative cache ensures LAN-speed reads and fast cold writes at the branch.

Integration with Microsoft Volume Shadow Copy Service enables consistent point-in-time data snapshots and seamless integration with backup applications.

Integration with the snapshot capabilities of the storage array enables you to configure application-consistent snapshots through the Core Management Console.

Integration with industry-standard Challenge-Handshake Authentication Protocol (CHAP) authenticates users and hosts.

A secure vault protects sensitive information using AES 256-bit encryption.

Solid-state disks (SSDs) that guarantee data durability and performance.


An active-active high-availability (HA) deployment option for SteelFusion ensures the availability of storage array logical unit numbers (LUNs) for remote sites.

Customizable reports provide visibility to key utilization, performance, and diagnostic information.

By consolidating all storage at the data center and creating diskless branches, SteelFusion eliminates data sprawl, costly data replication, and the risk of data loss at the branch office.

How the SteelFusion Product Family Works

The SteelFusion product family is designed to enable branch office server systems to efficiently access storage arrays over the WAN. The SteelFusion product family is typically deployed in conjunction with SteelHeads and comprises the following components:

Core - Core is a physical or virtual appliance deployed in the data center alongside SteelHeads and the centralized storage array. Core mounts iSCSI LUNs provisioned for the branch offices. Additionally, Core-v can mount LUNs through Fibre Channel.

Edge appliance - Edge refers to the branch component of the SteelFusion solution: the appliance at a customer's remote site, either a SteelHead EX appliance with a BlockStream license (xx60 series) or a SteelFusion Edge (SFED xx00) appliance. The Edge hosts three distinct components:

– WAN Optimization through the SteelHead

– Hypervisor platform with VMware vSphere

– Storage access and consolidation with BlockStream through Storage Edge

For more information about each appliance's architecture, see “SteelFusion Edge Appliance Architecture” on page 9 and “Edge Appliance Network Reference Architecture” on page 165.

Storage Edge - Storage Edge refers to the storage software component on the Edge appliance. Storage Edge presents itself to application servers in the branch as a storage portal. From the portal, the application server mounts the iSCSI LUNs that are projected across the WAN from the data center. Storage Edge can also host local LUNs that are not projected from the data center, for use as temporary storage: for example, temporary or local copies of software repositories.

Riverbed strongly recommends that you read the SteelFusion Appliance Interoperability Matrix at https://splash.riverbed.com/docs/DOC-4204.

The branch office server connects to Storage Edge, which implements handlers for the iSCSI protocol. The Storage Edge also connects to the blockstore, a persistent local cache of storage blocks.

When the branch office server requests blocks, those blocks are served locally from the blockstore if they are present; if they are not, Storage Edge retrieves them from the data center LUN through the Core. Similarly, newly written blocks are spooled to the local cache, acknowledged by the Storage Edge to the branch office server, and then asynchronously propagated to the data center. Because each Storage Edge implementation is linked to one or more dedicated LUNs at the data center, the blockstore is authoritative for both reads and writes and can tolerate WAN outages without affecting cache coherency.
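The following minimal sketch illustrates the read and write-back behavior just described. It is not Riverbed code: the core object and its fetch and commit calls are placeholders, and block identifiers are illustrative.

from collections import deque

class Blockstore:
    """Simplified model of the Storage Edge blockstore (illustration only)."""
    def __init__(self, core):
        self.cache = {}        # block_id -> data, persistent at the branch
        self.dirty = deque()   # written blocks waiting to cross the WAN
        self.core = core       # placeholder for the link to the Core

    def read(self, block_id):
        if block_id in self.cache:              # served at LAN speed
            return self.cache[block_id]
        data = self.core.fetch(block_id)        # cold read over the WAN
        self.cache[block_id] = data
        return data

    def write(self, block_id, data):
        self.cache[block_id] = data             # commit locally
        self.dirty.append(block_id)             # queue for asynchronous flush
        return "ack"                            # the server gets an immediate acknowledgment

    def flush_to_datacenter(self):
        """Background task: propagate queued writes to the data center LUN."""
        while self.dirty:
            block_id = self.dirty.popleft()
            self.core.commit(block_id, self.cache[block_id])

Because writes are acknowledged locally and drained in the background, the branch server sees LAN-speed storage even when the WAN link is slow or temporarily unavailable.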

Blocks are transferred between Storage Edges and Cores through an internal protocol. The Core then writes the updates to the data center LUNs through the iSCSI or Fibre Channel protocol. SteelFusion is designed to be coupled with the SteelHead WAN optimization. You can further optimize traffic between the branch offices and the data center by implementing SteelHeads.

For more information about Fibre Channel, see “SteelFusion and Fibre Channel” on page 39.


SteelFusion initially populates the blockstore using the following methods:

On-demand prefetch - The system observes block requests, applies heuristics based on these observations to predict the blocks most likely to be requested in the near future, and then requests those blocks from the data center LUN in advance (a simplified sketch of this method follows the list).

Policy-based prefetch - Configured policies identify the blocks that are likely to be requested at a given branch office site in advance; the Edge then requests those blocks from the data center LUN in advance.

First request - Blocks are added to the blockstore when first requested. Because the first request is cold, it is subject to standard WAN latency. Subsequent traffic is optimized.
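As a rough illustration of the on-demand method, the sketch below fetches a short run of sequential blocks whenever one block is requested. It is not Riverbed code: the read-ahead window, the blockstore dictionary, and the core.fetch call are assumptions for illustration, and the real heuristics are more sophisticated than simple sequential read-ahead.

READ_AHEAD = 8   # illustrative window size, not a product parameter

def serve_block(block_id, blockstore, core):
    """Serve one block and speculatively prefetch the blocks likely to follow."""
    if block_id not in blockstore:
        blockstore[block_id] = core.fetch(block_id)       # cold read, WAN latency
    for next_id in range(block_id + 1, block_id + 1 + READ_AHEAD):
        if next_id not in blockstore:
            blockstore[next_id] = core.fetch(next_id)     # warms likely future reads
    return blockstore[block_id]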

System Components and Their Roles

At the data center, Core integrates with existing storage systems, virtual infrastructure and SteelHead deployments. Core connects dedicated LUNs with each Edge appliance at the branch office.

The blockstore is the Storage Edge authoritative persistent cache of storage blocks. The blockstore is local from a branch perspective and holds blocks from all the LUNs available through a specific Edge. The blockstore is authoritative because it includes the latest-written blocks before they are sent through the Core to a storage array at the data center. When a server at the branch office requests data blocks, those blocks are served locally from the blockstore if they are currently present. If they are not present, the Storage Edge retrieves them through the Core from the data center LUN. Similarly, newly written blocks are committed to the blockstore cache, acknowledged by the branch Edge, and then asynchronously flushed to the data center LUN through the Core. The Storage Edge provides LAN-like performance in the branch office and is not affected by the latency to the data center storage array.

Blocks are communicated between the Storage Edges and the Core through an internal protocol. When the Core receives the data, it writes the updates to the LUN on the storage array through the iSCSI or Fibre Channel protocol.

The data cache in the blockstore is stored as is, and it is not deduplicated. Edge appliances include the SteelHead, and in the data center, the Cores are coupled with SteelHead products, which assist with data reduction and streamlining between the Storage Edge and the Core.

You can encrypt the blockstore cache at rest using AES 128/192/256-bit encryption. This encryption protects the data if your appliances are stolen. Similarly, because SteelFusion enables the removal of physical tape media and backup devices from the remote offices, it also eliminates the risk of data theft from those devices. As a result, the blockstore eliminates the need for separate block storage facilities at the branch office and all the associated maintenance, tools, backup services, hardware, service resources, and so on.
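As a generic illustration of this kind of at-rest protection (this is not the appliance's implementation; it only shows AES-256 authenticated encryption of a cached block using the Python cryptography package), a block could be sealed and opened as follows.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)     # in practice, held in a secure vault
aead = AESGCM(key)

def encrypt_block(plaintext_block: bytes) -> bytes:
    nonce = os.urandom(12)                    # unique nonce for every block write
    return nonce + aead.encrypt(nonce, plaintext_block, None)

def decrypt_block(stored_block: bytes) -> bytes:
    nonce, ciphertext = stored_block[:12], stored_block[12:]
    return aead.decrypt(nonce, ciphertext, None)

Without the key, a stolen disk yields only ciphertext, which is the property that the blockstore encryption option provides.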


For more information about blockstore encryption, see “At-Rest and In-Flight Data Security” on page 130.

Figure 1-1. Generic SteelFusion Deployment

The basic SteelFusion system components are:

Branch server - The branch-side server that accesses data from the SteelFusion system instead of a local storage device. This server can also run as a VSP VM on the local SteelHead EX.

Blockstore - A persistent local cache of storage blocks. Because each Edge is linked to a dedicated LUN at the data center, the blockstore is authoritative for both reads and writes. In Figure 1-1, the blockstore on the branch side synchronizes with one of the LUNs at the data center.

iSCSI initiator - The branch-side server that sends SCSI commands to its iSCSI target, which is the Storage Edge in the branch. At the data center, the Core is an iSCSI initiator that sends SCSI commands to access LUNs through an iSCSI target in the storage array (this pairing is sketched after the list).

Storage Edge - Also referred to as a BlockStream-enabled SteelHead EX or the SteelFusion Edge appliance, the Storage Edge is the branch-side component of the SteelFusion system; it links the blockstore through the Core to the LUN at the data center. The SteelHead provides WAN optimization services.

Data center SteelHead - The data center-side SteelHead peer for WAN optimization.

Core - The data center component of the SteelFusion product family. Core manages block transfers between the LUN and the Edge.

iSCSI target - Depending on what side you are on:

– In the branch office, the Storage Edge that communicates with the branch-side iSCSI initiator in the branch server.

– In the data center, the storage array that communicates with the Core.

LUN - A unit of block storage deployed from the storage array and projected through the Core to the Storage Edge.
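The sketch below shows how these iSCSI roles pair up on each side of the WAN. It is illustration only, not Riverbed code; the class names and LUN labels are placeholders.

class IscsiTarget:
    def __init__(self, name, luns):
        self.name, self.luns = name, luns

class IscsiInitiator:
    def __init__(self, name, target):
        self.name, self.target = name, target

# Data center side: the Core is the initiator; the storage array is the target.
array = IscsiTarget("storage-array", luns=["lun0", "lun1"])
core = IscsiInitiator("core", target=array)

# Branch side: the application server is the initiator; the Storage Edge is the
# target, presenting the same LUNs projected across the WAN.
edge = IscsiTarget("storage-edge", luns=array.luns)
server = IscsiInitiator("branch-server", target=edge)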


SteelFusion Edge Appliance Architecture

The SteelFusion Edge appliance differs from the BlockStream-enabled SteelHead EX in a number of ways, the details of which are beyond the scope of this guide. However, one key difference is that the SteelFusion Edge appliance contains two distinct computing nodes within the same hardware chassis (Figure 1-2).

Figure 1-2. SteelFusion Edge Appliance Architecture

The two nodes are as follows:

The RiOS node provides networking, WAN optimization, direct attached storage available for SteelFusion use, and VSP functionality.

The hypervisor node provides hypervisor-based hardware resources and software virtualization.

The two-node design provides hardware resource separation and isolation.

For details on the RiOS and hypervisor nodes, see the SteelFusion Edge Installation and Configuration Guide and the SteelFusion Edge Hardware Installation and Maintenance Guide.


CHAPTER 2 Deploying Core and Edge as a System

This chapter describes the process and procedures for deploying the SteelFusion product family at both the branch office and the data center. It is a general introduction to one possible scenario that forms a basic but typical SteelFusion deployment. Further details on specific stages of deployment, such as Core and Storage Edge configuration, high availability, configuration scenarios for snapshots, and so on, are covered in the following chapters of this guide.

This chapter includes the following sections:

“The SteelFusion Family Deployment Process” on page 11

“Single-Appliance Versus High-Availability Deployments” on page 14

“Connecting Core with Storage Edge” on page 16

“Riverbed Turbo Boot” on page 20

The SteelFusion Family Deployment Process

This section provides a broad outline of the process for deploying the SteelFusion product family. Depending on the type of deployment and the products involved (for example, with or without redundancy, iSCSI or Fibre Channel connected storage, and so on), the details of certain steps can vary. Use the outline below to create a deployment plan that is specific to your requirements. The steps are listed in approximate order; dependencies are listed when required.

The tasks are as follows:

“Provisioning LUNs on the Storage Array” on page 12

“Installing the SteelFusion Appliances” on page 12

“Pinning and Prepopulating LUNs in the Core” on page 13

“Configuring Snapshot and Data Protection Functionality” on page 14

“Managing vSphere Datastores on LUNs Presented by Core” on page 14


Provisioning LUNs on the Storage Array

This section describes how to provision LUNs on the storage array.

To provision LUNs on the storage array

1. Enable the connections for the type of LUNs you intend to expose to the branch: for example, iSCSI and Fibre Channel.

2. Determine the LUNs you want to dedicate to specific branches.

Note: Step 3 and Step 4 are optional. The LUNs to be exposed to the branch can be empty and populated later. Perform Step 3 and Step 4 only if you want to preload the LUNs with data: for example, if you require the LUNs to be preloaded with virtual machine images as part of the ESX datastore.

3. Connect to a temporary ESX server and deploy virtual machines (VMs) for branch services (including the branch Windows server) to the LUNs.

Riverbed recommends that you install the optional Windows Server plug-ins at this point. This installation is useful if you use the Boot Over WAN functionality available for Windows 2008 and Windows 2012. For details, see “Implementing Riverbed Host Tools for Snapshot Support” on page 111.

4. After you deploy the VMs, disconnect from the temporary ESX server.

5. Create the necessary initiator or target groups.

For more information, see the documentation for your storage array.

Installing the SteelFusion Appliances

This section describes at a high level how to install and configure Core and Edge appliances. For complete installation procedures, see the SteelFusion Core Installation and Configuration Guide, the SteelFusion Edge Installation and Configuration Guide, and the SteelHead EX Installation and Configuration Guide.

To install and configure Core

1. Install the Core or Core-v in the data center network.

2. Connect the Core appliance to the storage array.

3. Through the Core, discover and configure the desired LUNs on the storage array.

4. (Recommended) Enable and configure HA.

5. (Recommended) Enable and configure multipath I/O (MPIO) for iSCSI connected storage.

If you have decided to use MPIO, you must configure it at two separate and independent points:

iSCSI Initiator

iSCSI Target


If you are using Fibre Channel connected LUNs, make sure you enable multiple paths on the vSphere host on which the Core-v is deployed.
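The basic idea behind MPIO is sketched below: I/O is sent over an active path and fails over to an alternate path if the active one becomes unavailable. This is not Riverbed code; the portal objects and their submit method are placeholders for the actual iSCSI transport.

class MultipathSession:
    """Minimal failover-style path selection across redundant portals."""
    def __init__(self, portals):
        self.portals = list(portals)   # for example, two interfaces on the target
        self.active = 0                # index of the path currently in use

    def send(self, scsi_command):
        for _ in range(len(self.portals)):
            portal = self.portals[self.active]
            try:
                return portal.submit(scsi_command)   # placeholder transport call
            except ConnectionError:
                # The path failed: move to the next portal and retry.
                self.active = (self.active + 1) % len(self.portals)
        raise ConnectionError("all configured paths are down")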

Additional steps are required on the Edge to complete a typical installation. A high-level series of steps is shown in the following procedure.

To install and configure the Edge appliance

1. Install the SteelHead EX or the SteelFusion Edge in the branch office network.

2. On the appliance, configure disk management to enable SteelFusion storage mode.

3. Preconfigure the Edge for connection to Core.

4. Connect the Edge and the Core.

Pinning and Prepopulating LUNs in the Core

LUN pinning and prepopulation are two separate features configured through the Core that together determine how block data is kept in the blockstore on the Storage Edge.

When you pin a LUN in the Core configuration, you reserve space in the Storage Edge blockstore that is equal in size to the LUN at the storage array. Furthermore, when blocks are fetched by the Storage Edge, they remain in the blockstore in their entirety; by contrast, blocks in unpinned LUNs might be cleared on a first-in, first-out basis.

Pinning only reserves blockstore space; it does not populate that space with blocks. The blockstore is populated as the application server in the branch requests data not yet in the blockstore (causing the Storage Edge to issue a read through the Core), or through prepopulation.

The prepopulation functionality enables you to prefetch blocks to the blockstore. You can prepopulate a pinned LUN on the blockstore in one step; however, if the number of blocks is very large, you can configure a prepopulation schedule that prepopulates the blockstore only during specific intervals of your choice: for example, not during business hours. After the prepopulation process is completed, the schedule stops automatically.

Note: Prefetch does not optimize access for VMs that contain any SE SPARSE (Space Efficient Sparse) format snapshots.

For more information about pinning and prepopulation, see the SteelFusion Core Management Console User’s Guide.

To configure pinning and prepopulation

1. Choose Configure > Manage: LUNs to display the LUNs page.

2. Click the LUN configuration to display the configuration settings.

3. Select the Pin/Prepop tab.

4. To pin the LUN, select Pinned from the drop-down list and click Update.

When the LUN is pinned, the prepopulation settings are activated for configuration.


Configuring Snapshot and Data Protection Functionality

Core integrates with the snapshot capabilities of the storage array and enables you to configure application-consistent snapshots through the Core Management Console.

For details, see “Snapshots and Data Protection” on page 109.

Understanding Crash Consistency and Application Consistency

In the context of snapshots and backups and data protection in general, two types or states of data consistency are distinguished:

Crash consistency - A backup or snapshot is crash consistent if all of the interrelated data components are as they were (write-order consistent) at the instant of the crash. To better understand this type of consistency, imagine the status of the data on your PC’s hard drive after a power outage or similar event. A crash-consistent backup is usually sufficient for nondatabase operating systems and applications like file servers, DHCP Servers, print servers, and so on.

Application consistency - A backup or snapshot is application consistent if, in addition to being write-order consistent, running applications have completed all their operations and flushed their buffers to disk (application quiescing). Application-consistent backups are recommended for database operating systems and applications such as SQL, Oracle, and Exchange.

The SteelFusion product family ensures continuous crash consistency at the branch and at the data center by using journaling and by preserving the order of WRITEs across all the exposed LUNs. For application-consistent backups, administrators can directly configure and assign hourly, daily, or weekly snapshot policies on the Core. Edge interacts directly with both VMware ESXi and Microsoft Windows servers, through VMware Tools and volume snapshot service (VSS) to quiesce the applications and generate application-consistent snapshots of both Virtual Machine File System (VMFS) and New Technology File System (NTFS) data drives.

Managing vSphere Datastores on LUNs Presented by Core

Through the vSphere client, you can view inside the LUN to see the VMs previously loaded in the data center storage array. You can add a LUN that contains vSphere VMs as a datastore to the ESXi server in the branch. This can be a regular hardware platform hosting ESXi, the hypervisor node in the Edge appliance, or the VSP inside the SteelHead EX. You can then add the VMs to the ESXi inventory and run them as services through the SteelHead EX.

Similarly, you can use vSphere to provision LUNs with VMs from VSP on the SteelHead EX. For more information, see the SteelHead Management Console User’s Guide.

Single-Appliance Versus High-Availability Deployments

This section describes types of SteelFusion appliance deployments. It includes the following topics:

“Single-Appliance Deployment” on page 15

“High-Availability Deployment” on page 15

This section assumes that you understand the basics of how the SteelFusion product family works together, and are ready to deploy your appliances.


Single-Appliance Deployment

In a single-appliance deployment (basic deployment), the Core connects to the storage array through a data interface. Depending on the model of the Core, the data interface is named ethX_Y, in which X and Y are numbers: for example, eth0_0 or eth0_1. The primary (PRI) interface is dedicated to the traffic VLAN, and the auxiliary (AUX) interface is dedicated to the management VLAN. More complex designs generally use the additional network interfaces. For more information about Core interface names and their possible uses, see “Interface and Port Configuration” on page 24.

Figure 2-1. Single Appliance Deployment

High-Availability Deployment

In a high-availability (HA) deployment, two Cores operate as failover peers. Both appliances operate independently with their respective and distinct Edges until one fails; then the remaining operational Core handles the traffic for both appliances.

For more information about HA, see “SteelFusion Appliance High-Availability Deployment” on page 67.

Figure 2-2. HA Deployment


Connecting Core with Storage Edge

This section describes the prerequisites for configuring the data center and branch office components of the SteelFusion product family, and it provides an overview of the procedures required. It includes the following topics:

“Prerequisites” on page 16

“Process Overview: Connecting the SteelFusion Product Family Components” on page 16

“Adding Storage Edges to the Core Configuration” on page 18

“Configuring Storage Edge” on page 18

“Mapping LUNs to Storage Edges” on page 18

Prerequisites

Before you configure Core with Storage Edge, ensure that the following tasks have been completed:

Assign an IP address or hostname to the Core.

Determine the iSCSI Qualified Name (IQN) to be used for Core.

When you configure Core, you set this value in the initiator configuration.

Set up your storage array:

– Register the Core IQN.

– Configure iSCSI portal, targets, and LUNs, with the LUNs assigned to the Core IQN.

Assign an IP address or hostname to the Edge.
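For reference, IQNs follow the format defined in RFC 3720: iqn.<yyyy-mm>.<reversed-domain-name>:<unique-identifier>, where yyyy-mm is the year and month in which the naming authority acquired its domain name. The following is a purely illustrative example; the domain and suffix are placeholders, not values required by SteelFusion:

iqn.2015-04.com.example:steelfusion-core-01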

Process Overview: Connecting the SteelFusion Product Family Components

The following list summarizes, by component, the process for connecting and configuring Core and Storage Edge as a system. You can perform some of these steps in a different order, or even in parallel with each other in some cases. The sequence shown is intended to illustrate a method that enables you to complete one task so that the resources and settings are ready for the next task in the sequence.

Core

• Determine the network settings for Core. Prior to deployment, assign an IP address or hostname to Core and determine the IQN to be used for Core. When you configure Core, you set this value in the initiator configuration.

iSCSI-compliant storage array

• Register the Core IQN. SteelFusion uses the IQN name format for iSCSI Initiators. For details about IQN, see http://tools.ietf.org/html/rfc3720.

• Prepare the iSCSI portals, targets, and LUNs, with the LUNs assigned to the Core IQN. You must prepare these components prior to deploying Core.

Fibre Channel-compliant storage array

• Enable Fibre Channel connections. For details, see “SteelFusion and Fibre Channel” on page 39.

Core

• Install Core. For details, see the SteelFusion Core Installation and Configuration Guide.

Storage Edge

• Install the Edge or the BlockStream-enabled SteelHead EX. (A separate SteelFusion license might be required.) For details, see the SteelFusion Edge Installation and Configuration Guide or the SteelHead EX Installation and Configuration Guide.

• Configure disk management. You can configure the disk layout mode to allow space for the SteelFusion blockstore in the Disk Management page. Free disk space is divided between the Virtual Services Platform (VSP) and the SteelFusion blockstore. For details, see “Configuring Disk Management” on page 61.

• Configure SteelFusion storage settings. The SteelFusion storage settings are used by the Core to recognize and connect to the Storage Edge. For details, see “Configuring SteelFusion Storage” on page 64.

Core

• Run the Setup Wizard to perform the initial, minimal configuration of the Core, including network settings, iSCSI Initiator configuration, and mapping LUNs to the Edges. For details, see the SteelFusion Core Installation and Configuration Guide.

• Configure iSCSI Initiators and LUNs. Configure the iSCSI Initiator and specify an iSCSI portal; this portal discovers all the targets within that portal. Add and configure the discovered targets to the iSCSI Initiator configuration.

• Configure targets. After a target is added, all the LUNs on that target can be discovered, and you can add them to the running configuration.

• Map LUNs to the Edges. Using the previously defined Edge self-identifier, connect LUNs to the appropriate Edges. For details about these procedures, see the SteelFusion Core Management Console User’s Guide.

• Configure CHAP users and storage array snapshots (optional). For details, see the SteelFusion Core Management Console User’s Guide.

Storage Edge

• Confirm the connection with Core. After completing the Core configuration, confirm that the Storage Edge is connected to and communicating with the Core. For details, see “Mapping LUNs to Storage Edges” on page 18.

Adding Storage Edges to the Core Configuration

You can add and modify connectivity with Storage Edges in the Configure > Manage: SteelFusion Edges page in the Core Management Console.

This procedure requires you to provide the Edge Identifier for the Storage Edge. This value is defined by choosing:

• EX Features > Granite: Storage in the SteelHead EX Management Console, or specified through the CLI.

• Storage > Storage Edge in the SteelFusion Edge Management Console.

For more information, see the SteelFusion Core Management Console User’s Guide, the SteelFusion Command-Line Interface Reference Manual, and the Riverbed Command-Line Interface Reference Manual.

Configuring Storage Edge

For information about Storage Edge configuration for deployment, see “Configuring Storage Edge” on page 57.

Mapping LUNs to Storage Edges

This section describes how to configure LUNs and map them to Storage Edges. It includes the following topics:

“Configuring iSCSI Settings” on page 18

“Configuring LUNs” on page 19

“Configuring Storage Edges for Specific LUNs” on page 19

Configuring iSCSI Settings

You can view and configure the iSCSI Initiator, portals, and targets in the iSCSI Configuration page.

The iSCSI Initiator settings configure how the Core communicates with one or more storage arrays through the specified portal configuration.

After configuring the iSCSI portal, you can open the portal configuration to configure targets.

For more information and procedures, see the SteelFusion Core Management Console User’s Guide, the SteelFusion Command-Line Interface Reference Manual, and the Riverbed Command-Line Interface Reference Manual.



Configuring LUNs

You configure Block Disk (Fibre Channel), Edge Local, and iSCSI LUNs in the LUNs page.

Typically, Block Disk and iSCSI LUNs are used to store production data. They share the space in the blockstore cache of the associated Edges, and the data is continuously replicated and kept synchronized with the associated LUN in the data center. The Storage Edge blockstore caches only the working set of data blocks for these LUNs; additional data is retrieved from the data center when needed.

Block-disk LUN configuration pertains to Fibre Channel support. Fibre Channel is supported only in Core-v deployments. For more information, see “Configuring Fibre Channel LUNs” on page 31.

Storage Edge local LUNs are used to store transient and temporary data or local copies of software distribution repositories. Local LUNs also use dedicated space in the blockstore cache of the associated Storage Edges, but the data is not replicated back to the data center LUNs.

Configuring Storage Edges for Specific LUNs

After you configure the LUNs and Storage Edges for the Core, you can map the LUNs to the Storage Edges.

You complete this mapping through the Storage Edge configuration in the Core Management Console Configure > Manage: SteelFusion Edges.

When you select a specific Storage Edge, the following controls for additional configuration are displayed.

Status - This panel displays the following information about the selected Edge:

• IP Address - The IP address of the selected Edge.

• Connection Status - Connection status to the selected Edge.

• Connection Duration - Duration of the current connection.

• Total LUN Capacity - Total storage capacity of the LUNs dedicated to the selected Edge.

• Blockstore Encryption - Type of encryption selected, if any.

The panel also displays a small-scale version of the Edge Data I/O report.

Target Settings - This panel displays the following controls for configuring the target settings:

• Target Name - Displays the system name of the selected Edge.

• Require Secured Initiator Authentication - Requires CHAP authorization when the selected Edge is connecting to initiators. If this setting is selected, you must set authentication to CHAP in the adjacent Initiator tab.

• Enable Header Digest - Includes the header digest data from the iSCSI protocol data unit (PDU).

• Enable Data Digest - Includes the data digest data from the iSCSI PDU.

• Update Target - Applies any changes you make to the settings in this panel.

Initiators - This panel displays controls for adding and managing initiator configurations:

• Initiator Name - Specify the name of the initiator you are configuring.

• Add to Initiator Group - Select an initiator group from the drop-down list.

• Authentication - Select the authentication method from the drop-down list:

None - No authentication required.

CHAP - Only the target authenticates the initiator. The secret is set just for the target; all initiators that want to access that target must use the same secret to begin a session with the target.

Mutual CHAP - The target and the initiator authenticate each other. A separate secret is set for each target and for each initiator in the storage array.

If Require Secured Initiator Authentication is selected for the Edge in the Target Settings tab, authentication must be configured for a CHAP option.

• Add Initiator - Adds the new initiator to the running configuration.

Initiator Groups - This panel displays controls for adding and managing initiator group configurations:

• Group Name - Specifies a name for the group.

• Add Group - Adds the new group. The group name displays in the Initiator Group list.

After this initial configuration, click the new group name in the list to display additional controls, and then click Add or Remove to control the initiators included in the group.

LUNs - This panel displays controls for mapping available LUNs to the selected Storage Edge. After mapping, the LUN displays in the list in this panel. To manage group and initiator access, click the name of the LUN to access additional controls.

Prepopulation - This panel displays controls for configuring prepopulation tasks:

• Schedule Name - Specify a task name.

• Start Time - Select the start day and time from the respective drop-down lists.

• Stop Time - Select the stop day and time from the respective drop-down lists.

• Add Prepopulation Schedule - Adds the task to the Task list.

This prepopulation schedule is applied to all virtual LUNs mapped to this appliance if you do not configure any LUN-specific schedules. To delete an existing task, click the trash icon in the Task list. The LUN must be pinned to enable prepopulation. For more information, see “Pinning and Prepopulating LUNs in the Core” on page 13.

Riverbed Turbo Boot

Riverbed Turbo Boot uses the Windows Performance Toolkit to generate information that enables faster boot times for Windows VMs in the branch office, on either external ESXi hosts or VSP. Turbo Boot can improve boot times by two to ten times, depending on the customer scenario. Turbo Boot is a plug-in that records the disk I/O during boot-up of the host operating system on which it is installed. The disk I/O activity is logged to a file. During any subsequent boots of the host system, the Core uses the Turbo Boot log file to perform more accurate prefetch of data.

At the end of each boot made by the host, the log file is updated with changes and new information. This update ensures an enhanced prefetch on each successive boot.

Note: Turbo Boot only applies to Windows VMs using NTFS.

If you are booting a Windows server or client VM from an unpinned LUN, Riverbed recommends that you install the Riverbed Turbo Boot software on the Windows VM.

The supported operating systems for the Riverbed Turbo Boot software are as follows:

Windows Vista

Windows 7

Windows Server 2008

Windows Server 2008 R2

Windows Server 2012

Windows Server 2012 R2

For installation information, see the SteelFusion Core Installation and Configuration Guide.


CHAPTER 3 Deploying the Core

This chapter describes the deployment processes specific to the Core. It includes the following sections:

“Getting Started” on page 23

“Interface and Port Configuration” on page 24

“Configuring the iSCSI Initiator” on page 30

“Configuring LUNs” on page 30

“Configuring Redundant Connectivity with MPIO” on page 32

“Core Pool Management” on page 33

Getting Started

Complete the following tasks:

1. Install and connect the Core in the data center network.

Include both Cores if you are deploying a high-availability solution. For more information on installation, see the SteelFusion Core Installation and Configuration Guide.

2. Configure the iSCSI Initiators in the Core using the iSCSI Qualified Name (IQN) format.

Fibre Channel connections to the Core-v are also supported. For more information, see “Configuring Fibre Channel LUNs” on page 31.

3. Enable and provision LUNs on the storage array.

Make sure to include registering the Core IQN and configuring any required LUN masks. For details, see “Provisioning LUNs on the Storage Array” on page 12.

4. Define the Edge identifiers so you can later establish connections between the Core and the corresponding Storage Edges.

For details, see “Managing vSphere Datastores on LUNs Presented by Core” on page 14.


Interface and Port Configuration

This section describes a typical port configuration. You might require additional routing configuration depending on your deployment scenario.

This section includes the following topics:

“Core Ports” on page 24

“Configuring Interface Routing” on page 26

“Configuring Core for Jumbo Frames” on page 29

Core Ports

The following table summarizes the ports that connect the Core appliance to your network. Unless noted, the port and descriptions are for all Core models: 2000, 3000, and 3500.

Console - Connects the serial cable to a terminal device. You establish a serial connection to a terminal emulation program for console access to the Setup Wizard and the Core CLI.

Primary (PRI) - Connects Core to a VLAN switch through which you can connect to the Management Console and the Core CLI. You typically use this port for communication with Storage Edges.

Auxiliary (AUX) - Connects the Core to the management VLAN. You can connect a computer directly to the appliance with a crossover cable, enabling you to access the CLI or Management Console.

eth0_0 to eth0_3 (SFCR 2000 and 3000) - Connects the eth0_0, eth0_1, eth0_2, and eth0_3 ports of Core to a LAN switch using a straight-through cable. You can use the ports either for iSCSI SAN connectivity or failover interfaces when you configure Core for high availability (HA) with another Core. In an HA deployment, failover interfaces are usually connected directly between Core peers using crossover cables. If you deploy the Core between two switches, all ports must be connected with straight-through cables.

eth1_0 onwards (SFCR 2000 and 3000) - Cores have four gigabit Ethernet ports (eth0_0 to eth0_3) by default. For additional connectivity, you can install optional NICs in PCIe slots within the Core. These slots are numbered 1 to 5. Supported NICs can be either 1 Gb or 10 Gb depending on connectivity requirements. The NIC ports are automatically recognized by the Core following a reboot. The ports are identified by the system as ethX_Y, where X corresponds to the PCIe slot number and Y corresponds to the port on the NIC. For example, a two-port NIC in PCIe slot 1 is displayed as having ports eth1_0 and eth1_1. Connect the ports to LAN switches or other devices using the same principles as the other SteelFusion network ports. For more details about installing optional NICs, see the Network and Storage Card Installation Guide. For more information about the configuration of network ports, see the SteelFusion Core Management Console User’s Guide.


eth1_0 to eth1_3 (SFCR 3500) - Connects the eth1_0, eth1_1, eth1_2, and eth1_3 ports of Core to a LAN switch using a straight-through cable. You can use the ports either for iSCSI SAN connectivity or failover interfaces when you configure Core for high availability (HA) with another Core. In an HA deployment, failover interfaces are usually connected directly between Core peers using crossover cables. If you deploy the Core between two switches, all ports must be connected with straight-through cables.

eth2_0 onwards (SFCR 3500) - Cores have four gigabit Ethernet ports (eth1_0 to eth1_3) by default. For additional connectivity, you can install optional NICs in PCIe slots within the Core. These slots are numbered 2 to 6. Supported NICs can be either 1 Gb or 10 Gb depending on connectivity requirements. The NIC ports are automatically recognized by the Core following a reboot. The ports are identified by the system as ethX_Y, where X corresponds to the PCIe slot number and Y corresponds to the port on the NIC. For example, a two-port NIC in PCIe slot 2 is displayed as having ports eth2_0 and eth2_1. Connect the ports to LAN switches or other devices using the same principles as the other SteelFusion network ports. For more details about installing optional NICs, see the Network and Storage Card Installation Guide. For more information about the configuration of network ports, see the SteelFusion Core Management Console User’s Guide.

Figure 3-1 shows a basic HA deployment indicating some of the SFCR 2000 and 3000 ports and the use of straight-through or crossover cables. You can use the same deployment and interface connections for the 3500, but the interface names are different.

For more information about HA deployments, see “SteelFusion Appliance High-Availability Deployment” on page 67.

Figure 3-1. Core Ports for Core models 2000 and 3000


Configuring Interface Routing

You configure interface routing by choosing Configure > Networking: Management Interfaces of the Core Management Console.

Note: If all the interfaces have IP addresses on separate subnets, you do not need additional routes.

This section describes the following scenarios:

“All Interfaces Have Separate Subnet IP Addresses” on page 26

“All Interfaces Are on the Same Subnets” on page 26

“Some Interfaces, Except Primary, Share the Same Subnets” on page 27

“Some Interfaces, Including Primary, Share the Same Subnets” on page 28

All Interfaces Have Separate Subnet IP Addresses

In this scenario, you do not need additional routes.

The following table shows a sample configuration in which each interface has an IP address on a separate subnet.

Interface - Sample Configuration - Description

Auxiliary - 192.168.10.2/24 - Management (and default) interface.

Primary - 192.168.20.2/24 - Interface to WAN traffic.

eth0_0 - 10.12.5.12/16 - Interface for storage array traffic.

eth0_1 - Optional, additional interface for storage array traffic.

eth0_2 - 192.168.30.2/24 - HA failover peer interface, number 1.

eth0_3 - 192.168.40.2/24 - HA failover peer interface, number 2.

All Interfaces Are on the Same Subnets

If all interfaces are in the same subnet, only the primary interface has a route added by default. You must configure routing for the additional interfaces.

The following table shows a sample configuration.

Interface - Sample Configuration - Description

Auxiliary - 192.168.10.1/24 - Management (and default) interface.

Primary - 192.168.10.2/24 - Interface to WAN traffic.

eth0_0 - 192.168.10.3/24 - Interface for storage array traffic.


To configure additional routes

1. In the Core Management Console, choose Configure > Networking: Management Interfaces.

Figure 3-2. Routing Table on the Management Interfaces Page

2. Under Main IPv4 Routing Table, use the following controls to configure routing as necessary.

Add a New Route - Displays the controls for adding a new route.

Destination IPv4 Address - Specify the destination IP address for the out-of-path appliance or network management device.

IPv4 Subnet Mask - Specify the subnet mask. For example, 255.255.255.0.

Gateway IPv4 Address - Optionally, specify the IP address for the gateway.

Interface - From the drop-down list, select the interface.

Add - Adds the route to the table list.

3. Repeat for each interface that requires routing.

4. Click Save to save your changes permanently.

You can also perform this configuration using the ip route CLI command. For details, see the SteelFusion Command-Line Interface Reference Manual.

Some Interfaces, Except Primary, Share the Same Subnets

If a subset of interfaces, excluding primary, are in the same subnet, you must configure additional routes for those interfaces.

The following table shows a sample configuration.

Interface - Sample Configuration - Description

Auxiliary - 10.10.10.1/24 - Management (and default) interface.

Primary - 10.10.20.2/24 - Interface to WAN traffic.

eth0_0 - 192.168.10.3/24 - Interface for storage array traffic.

eth0_1 - 192.168.10.4/24 - Additional interface for storage array traffic.

To configure additional routes

1. In the Core Management Console, choose Configure > Networking: Management Interfaces.

2. Under Main IPv4 Routing Table, use the following controls to configure routing as necessary.

Add a New Route - Displays the controls for adding a new route.

Destination IPv4 Address - Specify the destination IP address for the out-of-path appliance or network management device.

IPv4 Subnet Mask - Specify the subnet mask. For example, 255.255.255.0.

Gateway IPv4 Address - Optionally, specify the IP address for the gateway.

Interface - From the drop-down list, select the interface.

Add - Adds the route to the table list.

3. Repeat for each interface that requires routing.

4. Click Save to save your changes permanently.

You can also perform this configuration using the ip route CLI command. For details, see the SteelFusion Command-Line Interface Reference Manual.

Some Interfaces, Including Primary, Share the Same Subnets

If some but not all interfaces, including primary, are in the same subnet, you must configure additional routes for those interfaces.

The following table shows a sample configuration.

Interface - Sample Configuration - Description

Auxiliary - 10.10.10.2/24 - Management (and default) interface.

Primary - 192.168.10.2/24 - Interface to WAN traffic.

eth0_0 - 192.168.10.3/24 - Interface for storage array traffic.

eth0_1 - 192.168.10.4/24 - Additional interface for storage array traffic.

eth0_2 - 20.20.20.2/24 - HA failover peer interface, number 1.

eth0_3 - 30.30.30.2/24 - HA failover peer interface, number 2.

To configure additional routes

1. In the Core Management Console, choose Configure > Networking: Management Interfaces.

2. Under Main IPv4 Routing Table, use the following controls to configure routing as necessary.

Add a New Route - Displays the controls for adding a new route.

Destination IPv4 Address - Specify the destination IP address for the out-of-path appliance or network management device.

IPv4 Subnet Mask - Specify the subnet mask. For example, 255.255.255.0.

Gateway IPv4 Address - Optionally, specify the IP address for the gateway.

Interface - From the drop-down list, select the interface.

Add - Adds the route to the table list.

3. Repeat for each interface that requires routing.

4. Click Save to save your changes permanently.

You can also perform this configuration using the ip route CLI command. For details, see the SteelFusion Command-Line Interface Reference Manual.
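As a minimal sketch only, adding such a route from the Core CLI configuration mode might look similar to the following. The addresses are placeholders, and the exact argument form of the ip route command can vary by release, so confirm the syntax in the SteelFusion Command-Line Interface Reference Manual:

ip route 192.168.50.0 255.255.255.0 192.168.10.1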

Configuring Core for Jumbo Frames

If your network infrastructure supports jumbo frames, Riverbed recommends that you configure the connection between the Core and the storage system as described in this section. Depending on how you configure Core, this can mean you configure the primary interface, or one or more data interfaces.

In addition to configuring Core for jumbo frames, you must configure the storage system and any switches, routers, or other network devices between Core and the storage system.

To configure Core for jumbo frames

1. From the Core Management Console, choose Configure > Networking and open the relevant page (Management Interfaces or Data Interfaces) for the interface used by the Core to connect to the storage network. For example, eth0_0.

2. On the interface on which you want to enable jumbo frames:

– Enable the interface.

– Select the Specify IPv4 Address Manually option and enter the correct value for your implementation.

– Riverbed recommends you specify 9000 bytes for the MTU setting.

3. Click Apply to apply the settings to the current configuration.

4. Click Save to save your changes permanently.

To configure jumbo frames on your storage array, see the documentation from your storage array vendor.
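If you prefer the CLI, a hypothetical equivalent for the data interface might resemble the following, assuming the Core CLI follows the usual RiOS-style interface command form; the interface name and syntax are illustrative only, so verify them in the SteelFusion Command-Line Interface Reference Manual:

interface eth0_0 mtu 9000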



Configuring the iSCSI Initiator

The iSCSI Initiator settings dictate how the Core communicates with one or more storage arrays through the specified portal configuration.

iSCSI configuration includes:

Initiator name

Enabling header or data digests (optional)

Enabling CHAP authorization (optional)

Enabling MPIO and standard routing for MPIO (optional)

CHAP functionality and MPIO functionality are described separately in this document. For more information, see “Using CHAP to Secure iSCSI Connectivity” on page 128 and “Use CHAP” on page 155.

In the Core Management Console, you can view and configure the iSCSI Initiator, local interfaces for MPIO, portals, and targets by choosing Configure > Storage: iSCSI, Initiators, MPIO. For more information, see the SteelFusion Core Management Console User’s Guide.

In the Core CLI, use the following commands to access and manage iSCSI Initiator settings:

storage lun modify auth-initiator to add or remove an authorized iSCSI Initiator to or from the LUN

storage iscsi data-digest to include or exclude the data digest in the iSCSI protocol data unit (PDU)

storage iscsi header-digest to include or exclude the header digest in the iSCSI PDU

storage iscsi initiator to access numerous iSCSI configuration settings
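As a hedged sketch only, enabling both digests from configuration mode might look like the following; the trailing keywords (such as enable) are assumptions, so check the SteelFusion Command-Line Interface Reference Manual for the exact forms:

storage iscsi data-digest enable
storage iscsi header-digest enable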

Configuring LUNs

This section includes the following topics:

“Exposing LUNs” on page 30

“Configuring Fibre Channel LUNs” on page 31

“Resizing LUNs” on page 31

“Removing a LUN from a Core Configuration” on page 31

Before you can configure LUNs in Core, you must provision the LUNs on the storage array and configure the iSCSI Initiator. For more information, see “Provisioning LUNs on the Storage Array” on page 12 and “Configuring the iSCSI Initiator” on page 30.

Exposing LUNs

You expose LUNs by scanning for LUNs on the storage array, and then mapping them to the Storage Edges. After exposing LUNs, you can further configure them for failover, MPIO, snapshots, and pinning and prepopulation.

In the Core Management Console, you can expose and configure LUNs by choosing Configure > Manage: LUNs. For more information, see the SteelFusion Core Management Console User’s Guide.


In the Core CLI, you can expose and configure LUNs with the following commands:

storage iscsi portal host rescan-luns to discover available LUNs on the storage array

storage lun add to add a specific LUN

storage lun modify to modify an existing LUN configuration

For more information, see the SteelFusion Command-Line Interface Reference Manual.
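A rough sketch of that sequence follows. The portal address, LUN identifier, and argument placement are placeholders and assumptions; each command takes additional arguments documented in the CLI reference:

storage iscsi portal host <portal-ip> rescan-luns
storage lun add <lun-identifier>
storage lun modify <lun-identifier> <setting>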

Resizing LUNs

Granite v2.6 introduced the LUN expansion feature. Prior to Granite v2.6, to resize a LUN you needed to unmap the LUN from a Storage Edge, remove the LUN from Core, change the size on the storage array, add it back to Core, and map it to the Storage Edge.

The LUN expansion feature generally detects LUN size increases made on a data center storage array automatically if there are active reads and writes, and then propagates the change to the Storage Edge. However, if there are no active reads and writes, you must perform a LUN rescan in the Configure > Manage: LUNs page on the Core for the Core to detect the new LUN size.

If the LUN is pinned, you need to make sure the blockstore on its Storage Edge can accommodate the new size of the LUN.

Note: If you have configured SteelFusion Replication, the new LUN size on the primary Core is updated only when the replica LUN size is the same or greater.

Configuring Fibre Channel LUNs

The process of configuring Fibre Channel LUNs for Core requires configuration in both the ESXi server and the Core.

For more information, see “SteelFusion and Fibre Channel” on page 39 and the Fibre Channel on SteelFusion Core Virtual Edition Solution Guide.

Removing a LUN from a Core Configuration

This section describes the process to remove a LUN from a Core configuration. This process requires actions on both the Core and the server running at the branch.

Note: In the following example procedure, the branch server is assumed to be a Windows server; however, similar steps are required for other types of servers.

To remove a LUN

1. At the branch where the LUN is exposed:

Power down the local Windows server.

If the Windows server runs on ESXi, you must also unmount and detach the LUN from ESXi.


2. At the data center, take the LUN offline in the Core configuration.

When you take a LUN offline, outstanding data is flushed to the storage array LUN and the blockstore cache is cleared. The offline procedure can take a few minutes.

Depending on the WAN bandwidth, latency, utilization, and the amount of data in the Edge blockstore that has not yet been synchronized back to the data center, this operation can take seconds to many minutes or even hours. Use the reports on the Edge to help understand just how much data is left to be written back. Until all the data is safely synchronized back to the LUN in the data center, the Core keeps the LUN in an offlining state. Only when the data is safe does the LUN status change to offline.

To take a LUN offline:

CLI - Use the storage lun modify offline command.

Management Console - Choose Configure > Manage: LUNs to open the LUNs page, select the LUN configuration in the list, and select the Details tab.

3. Remove the LUN configuration using one of the following methods:

CLI - Use the storage lun remove command.

Management Console - Choose Configure > Manage: LUNs to open the LUNs page, locate the LUN configuration in the list, and click the trash icon.

For details about CLI commands, see the SteelFusion Command-Line Interface Reference Manual. For details about using the Core Management Console, see the SteelFusion Core Management Console User’s Guide.
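For reference, a minimal CLI sketch of Step 2 and Step 3 might look like the following, where <lun-alias> is a placeholder for however the LUN is identified in your configuration, and the position of the identifier within each command is an assumption (see the CLI reference for the exact required arguments):

storage lun modify <lun-alias> offline
storage lun remove <lun-alias>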

Configuring Redundant Connectivity with MPIO

The MPIO feature enables you to configure multiple physical I/O paths (interfaces) for redundant connectivity with the local network, storage system, and iSCSI Initiator.

Both Core and Edge offer MPIO functionality. However, these features are independent of each other and do not affect each other.

MPIO in Core

The MPIO feature enables you to connect Core to the network and to the storage system through multiple physical I/O paths. Redundant connections help prevent loss of connectivity in the event of an interface, switch, cable, or other physical failure.

You can configure MPIO at the following separate and independent points:

iSCSI Initiator - This configuration allows you to enable and configure multiple I/O paths between the Core and the storage system. Optionally, you can enable standard routing if the iSCSI portal is not in the same subnet as the MPIO interfaces.

iSCSI Target - This configuration allows you to configure multiple portals on the Storage Edge. Using these portals, an initiator can establish multiple I/O paths to the Storage Edge.

Configuring Core MPIO Interfaces

You can configure MPIO interfaces through the Core Management Console or the Core CLI.


In the Core Management Console, choose Configure > Storage Array: iSCSI, Initiator, MPIO. Configure MPIO using the following controls:

Enable MPIO.

Enable standard routing for MPIO. This option is required if the backend iSCSI portal is not in the same subnet as at least two of the MPIO interfaces.

Add (or remove) local interfaces for the MPIO connections.

For details about configuring MPIO interfaces in the Core Management Console, see the SteelFusion Core Management Console User’s Guide.

In the Core CLI, open the configuration terminal mode and run the following commands:

storage iscsi session mpio enable to enable the MPIO feature.

storage iscsi session mpio standard-routes enable to enable standard routing for MPIO. This command is required if the backend iSCSI portal is not in the same subnet as at least two of the MPIO interfaces.

storage lun modify mpio path to specify a path.

These commands require additional parameters to identify the LUN. For details about configuring MPIO interfaces in the Core CLI, see the SteelFusion Command-Line Interface Reference Manual.
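Putting those commands together, a hedged configuration sketch might resemble the following; the LUN identifier and path specification are placeholders, and the exact parameters are documented in the SteelFusion Command-Line Interface Reference Manual:

storage iscsi session mpio enable
storage iscsi session mpio standard-routes enable
storage lun modify <lun-identifier> mpio path <path>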

Core Pool Management

This section describes Core pool management. It includes the following topics:

“Overview of Core Pool Management” on page 33

“Pool Management Architecture” on page 34

“Configuring Pool Management” on page 34

“Changing Pool Management Structure” on page 37

“High Availability in Pool Management” on page 38

Overview of Core Pool Management

Core Pool Management simplifies the administration of large installations in which you need to deploy several Cores. Pool management enables you to manage storage configuration and check storage-related reports on all the Cores from a single Management Console.

Pool management is especially relevant to Core-v deployments when LUNs are provided over Fibre Channel. VMware ESX has a limitation for raw device mapping (RDM) LUNs, which limits Core-v to 60 LUNs. In releases prior to SteelFusion v3.0, to manage 300 LUNs, you needed to deploy five separate Core-vs. To ease Core management, in SteelFusion v3.0 and later you can combine Cores into management pools.

In SteelFusion v3.0 and later, you can enable access to the SteelHead REST API framework. This access enables you to generate a REST API access code for use in SteelFusion Core pool management. You can access the REST API by choosing Configure > Pool Management: REST API Access.

For more information about pool management, see the SteelFusion Core Management Console User’s Guide.


Pool Management Architecture

Pool management is a two-tier architecture that allows each Core to become either a manager or a member of a pool. A Core can be part of only one pool. The pool is a single-level hierarchy with a flat structure, in which all members of the pool except the manager have equal priority and cannot themselves be managers of pools. The pool has a loose membership, in which pool members are not aware of one another, except for the manager. Any Core can be the manager of the pool, but the pool manager cannot be a member of any other pool. You can have up to 32 Cores in one pool, not including the manager.

The pool is dissolved when the manager is no longer available (unless the manager has an HA peer). Management of a pool can be taken over by a failover peer. However, a member's failover peer cannot be managed by that member's pool manager through the member, even if the failover peer is down.

For details about HA, see “High Availability in Pool Management” on page 38.

From a performance perspective, it does not matter which Core you choose as the manager. The resources required by the pool manager differ little, if at all, from those required for regular Core operations.

Figure 3-3. Core Two-Tier Pool Management

Configuring Pool Management

This section describes how to configure pool management.

These are the high-level steps:

1. “To create a pool” on page 35

2. “To generate a REST access code for a member” on page 35

3. “To add a member to the pool” on page 36

You can configure pool management only through the Management Console.


To create a pool

1. Choose the Core you want to become the pool manager.

2. From the Management Console of the pool manager, choose Configure > Pool Management: Edit Pool.

3. Specify a name for the pool in the Pool Name field.

4. Click Create Pool.

To generate a REST access code for a member

1. From the Management Console of the pool member, choose Configure > Pool Management: REST API Access.

Figure 3-4. REST API Access Page

2. Select Enable REST API Access and click Apply.

3. Select Add Access Code.

4. Specify a useful description, such as For Pool Management from <hostname>, in the Description of Use field.

5. Select Generate New Access Code and click Add.

A new code is generated.

6. Expand the new entry and copy the access code.


Continue to “To add a member to the pool” on page 36 to finish the process.

Figure 3-5. REST API Access

Note: You can revoke access of a pool manager by removing the access code or disabling REST API access on the member.

Before you begin the next procedure, you need the hostnames or the IP addresses of the Cores you want to add as members.

To add a member to the pool

1. From the Management Console of the pool manager, choose Configure > Pool Management: Edit Pool.

2. Select Add a Pool Member.

3. Add the member by specifying the hostname or the IP address of the member.

4. Paste the REST API access code that you generated in the API Access Code field on the Management Console of the pool member.


When a member is successfully added to the pool, the pool manager Pool Management page displays statistics about the members, such as health, number of LUNs, model, failover status, and so on.

Figure 3-6. Successful Pool Management Configuration

Changing Pool Management Structure

A pool manager can remove individual pool members or dissolve the whole pool. A pool member can release itself from the pool.

To remove a pool relationship for a single member or to dissolve the pool completely

1. From the Management Console of the pool manager, choose Configure > Pool Management: Edit Pool.

2. To remove an individual pool member, click the trash can icon in the Remove column of the desired member you want to remove.

To dissolve the entire pool, click Dissolve Pool.

Riverbed recommends that you release a member from a pool from the Management Console of the manager. Use the following procedure to release a member from the pool only if the manager is either gone or cannot contact the member.

To release a member from a pool

1. From the Management Console of the pool member, choose Configure > Pool Management: Edit Pool.

You see the message This appliance is currently a part of <Pool name> pool and is being managed by <Manager-hostname>.

2. Click Release me from the Pool.


This releases the member from the pool, but you continue to see the member in the pool table on the manager.

Figure 3-7. Releasing a Pool Member from the Member Management Console

3. Manually delete the released member from the manager pool table.

High Availability in Pool Management

When you use pool management in conjunction with an HA environment, Riverbed recommends that you configure both peers as members of the same pool. If you choose one of the peers to be a pool manager, its failover peer should join the pool as a member. Without pool management, Core cannot manage its failover peer's storage configuration unless failover is active (the failover peer is down). With pool management, the manager can manage the failover peer's storage configuration even while the failover peer is up. The manager's failover peer can manage the manager's storage configuration only when the manager is down.

The following scenarios show how you can use HA in pool management:

The manager is down and its failover peer is active.

In this scenario, when the manager is down the failover peer can take over the management of a pool. The manager failover peer can manage storage configuration for the members of the pool using the same configuration as the manager.

The member is down and its failover peer is active.

When a member of a pool is down and it has a failover peer configured (and the peer is not the manager of the member), the failover peer takes over servicing the LUNs of the member. The failover peer can access the storage configuration of the member when it is down. However, the pool manager cannot access the storage configuration of the failed member. To manage storage configuration of the down member, you need to log in to the Management Console of its failover peer directly.

Note: The pool is dissolved when the manager is no longer available, unless the manager has an HA peer.

For more details about HA deployments, see “SteelFusion Appliance High-Availability Deployment” on page 67.


CHAPTER 4 SteelFusion and Fibre Channel

This chapter includes general information about Fibre Channel LUNs and how they interact with SteelFusion. It includes the following sections:

“Overview of Fibre Channel” on page 39

“Deploying Fibre Channel LUNs on Core-v Appliances” on page 45

“Configuring Fibre Channel LUNs in a Core-v HA Scenario” on page 48

“Populating Fibre Channel LUNs” on page 51

“Best Practices and Recommendations” on page 52

“Troubleshooting” on page 54

Overview of Fibre Channel

Core-v can connect to Fibre Channel LUNs at the data center and export them to the branch office as iSCSI LUNs. The iSCSI LUNs can then be mounted by a VMware ESX or ESXi hypervisor (running internally on VSP or on external ESX or ESXi servers), or directly by Microsoft Windows virtual servers through the Microsoft iSCSI Initiator. A virtual Windows file server running on VSP (Figure 4-1) can then share the mounted drive with branch office client PCs through the CIFS protocol.

This section includes the following topics:

“Fibre Channel LUN Considerations” on page 41

“How VMware ESXi Virtualizes Fibre Channel LUNs” on page 41

“How Core-v Connects to RDM Fibre Channel LUNs” on page 43

“Requirements for Core-v and Fibre Channel SANs” on page 44

“Specifics About Fibre Channel LUNs Versus iSCSI LUNs” on page 44


Figure 4-1. SteelFusion Solution with Fibre Channel

Fibre Channel is the predominant storage networking technology for enterprise business. Fibre Channel connectivity is estimated to be at 78 percent versus iSCSI at 22 percent. IT administrators still rely on the known, trusted, and robust Fibre Channel technology.

Fibre Channel is a set of integrated standards developed to provide a mechanism for transporting data at the fastest rate possible with the least delay. In storage networking, Fibre Channel is used to interconnect host and application servers with storage systems. Typically, servers and storage systems communicate using the SCSI protocol. In a storage area network (SAN), the SCSI protocol is encapsulated and transported through Fibre Channel frames.

The Fibre Channel (FC) protocol processing on the host servers and the storage systems is mostly carried out in hardware. Figure 4-2 shows the various layers in the FC protocol stack and the portions implemented in hardware and software for an FC host bus adapter (HBA). FC HBA vendors are Qlogic, Emulex, and LSI.

Figure 4-2. HBA FC Protocol Stack


Special switches are also required to transport Fibre Channel traffic. Vendors in this market are Cisco and Brocade. Switches implement many of the FC protocol services, such as name server, domain server, zoning, and so on. Zoning is particularly important because, in collaboration with LUN masking on the storage systems, it implements storage access control by limiting access to LUNs to specific initiators and servers through specific targets and LUNs. An initiator and a target are visible to each other only if they belong to the same zone.

LUN masking is an access control mechanism implemented on the storage systems. NetApp implements LUN masking through initiator groups, which enable you to define a list of worldwide names (WWNs) that are allowed to access a specific LUN. EMC implements LUN masking using masking views that contain storage groups, initiator groups, and port groups.

LUN masking is important because Windows-based servers, for example, attempt to write volume labels to all available LUNs. This attempt can make the LUNs unusable by other operating systems and can result in data loss.

Fibre Channel LUN Considerations

Fibre Channel LUNs are distinct from iSCSI LUNs in several important ways:

No MPIO configuration - Multipathing support is performed by the ESXi system.

SCSI reservations - SCSI reservations are not taken on Fibre Channel LUNs.

Additional HA configuration required - Configuring HA for Core-v failover peers requires that each appliance be deployed on a separate ESXi system.

Maximum of 60 Fibre Channel LUNs per ESXi system - ESXi allows a maximum of 60 RDMs into a VM. Within a VM an RDM is represented by a virtual SCSI device. A VM can only have four virtual SCSI controllers with 15 virtual SCSI devices each.

How VMware ESXi Virtualizes Fibre Channel LUNs

The VMware ESXi hypervisor provides not only CPU and memory virtualization but also host-level storage virtualization, which logically abstracts the physical storage layer from virtual machines. Virtual machines do not access the physical storage or LUNs directly, but instead use virtual disks. To access virtual disks, a virtual machine uses virtual SCSI controllers.

Each virtual disk that a virtual machine can access through one of the virtual SCSI controllers resides on a VMware Virtual Machine File System (VMFS) datastore or a raw disk. From the standpoint of the virtual machine, each virtual disk appears as if it were a SCSI drive connected to a SCSI controller. Whether the actual physical disk device is being accessed through parallel SCSI, iSCSI, network, or Fibre Channel adapters on the host is transparent to the guest operating system.

Virtual Machine File System

In a simple configuration, the disks of virtual machines are stored as files on a Virtual Machine File System (VMFS). When guest operating systems issue SCSI commands to their virtual disks, the virtualization layer translates these commands to VMFS file operations.


Raw Device Mapping (RDM)

A raw device mapping (RDM) is a special file in a VMFS volume that acts as a proxy for a raw device, such as a Fibre Channel LUN. With the RDM, an entire Fibre Channel LUN can be directly allocated to a virtual machine.

Figure 4-3. ESXi Storage Virtualization


How Core-v Connects to RDM Fibre Channel LUNs

Core-v uses RDM to mount Fibre Channel LUNs and export them to the Storage Edge component running on the SteelHead EX at the branch office. Edge exposes those LUNs as iSCSI LUNs to the branch office clients.

Figure 4-4. Core-VM FC LUN to RDM Mapping

When Core-v interacts with an RDM Fibre Channel LUN, the following process takes place:

1. Core-v issues SCSI commands to the RDM disk.

2. The device driver in the Core-v operating system communicates with the virtual SCSI controller.

3. The virtual SCSI controller forwards the command to the ESXi virtualization layer or VMkernel.

4. The VMkernel performs the following tasks:

Locates the RDM file in the VMFS.

Maps the SCSI requests for the blocks on the RDM virtual disk to blocks on the appropriate Fibre Channel LUN.

Sends the modified I/O request from the device driver in the VMkernel to the HBA.

5. The HBA performs the following tasks:

Packages the I/O request according to the rules of the FC protocol.

Transmits the request to the storage system.

A Fibre Channel switch receives the request and forwards it to the storage system that the host wants to access.


Requirements for Core-v and Fibre Channel SANs

The following table describes the hardware and software requirements for deploying Core-v with Fibre Channel SANs.

SteelHead EX v3.0 or later, or SteelFusion Edge v4.0 or later

Core-v with SteelFusion v2.5 or later

VMware ESX/ESXi version 4.1 or later

Storage system, HBA, and firmware combination supported in conjunction with ESX/ESXi systems - For details, see the VMware Compatibility Guide.

Reserved CPUs and RAM on the ESX/ESXi system - Core model V1000U: 2 GB RAM, 2 CPU; Core model V1000L: 4 GB RAM, 4 CPU; Core model V1000H: 8 GB RAM, 8 CPU; Core model V1500L: 32 GB RAM, 8 CPU; Core model V1500H: 48 GB RAM, 12 CPU.

Fibre Channel license on the storage system - In some storage systems, Fibre Channel is a licensed feature.

Specifics About Fibre Channel LUNs Versus iSCSI LUNs

Using Fibre Channel LUNs on Core-v in conjunction with VMware ESX/ESXi differs from using iSCSI LUNs directly on the Core in a number of ways.

Feature Fibre Channel LUNs vs. iSCSI LUNs

Multipathing The ESX/ESXi system, and not the Core, performs multipathing for the Fibre Channel LUNs.

VSS snapshots Snapshots created using the Microsoft Windows diskshadow command are not supported on Fibre Channel LUNs.

SCSI reservations SCSI reservations are not taken on Fibre Channel LUNs.

Core HA deployment Active and failover Core-vs must be deployed in a separate ESX/ESXi system.

Max 60 Fibre Channel LUNs per ESX/ESXi system

ESX/ESXi systems enable a maximum of 4 SCSI controllers. Each controller supports a maximum of 15 SCSI devices. Hence, a maximum of 60 Fibre Channel LUNs are supported per ESX/ESXi system.

VMware vMotion not supported

Core-vs cannot be moved to a different ESXi server using VMware vMotion.

VMware HA not supported

A Core-v cannot be moved to another ESXi server through VMware HA mechanism. Riverbed recommends that to ensure that the Core-v stays on the specific ESXi server by creating an affinity rule as described in this knowledge base: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1005508


Deploying Fibre Channel LUNs on Core-v Appliances

This section describes the process and procedures for deploying Fibre Channel LUNs on Core-v appliances. It includes the following sections:

“Deployment Prerequisites” on page 45

“Configuring Fibre Channel LUNs” on page 45

Deployment Prerequisites

Before you can deploy Fibre Channel LUNs on Core-v appliances, the following conditions must be met:

The active Core-v must be deployed and powered up on the ESX/ESXi system.

The failover Core-v must be deployed and powered up on the second ESX/ESXi system.

A Fibre Channel LUN must be available on the storage system.

Preconfigured initiator and storage groups for LUN mapping to the ESX/ESXi systems must be available.

Preconfigured zoning on the Fibre Channel switch for LUN visibility to the ESX/ESXi systems across the SAN fabric must be available.

You must have administrator access to the storage system, the ESX/ESXi system, and SteelFusion appliances.

For more information about how to set up Fibre Channel LUNs with the ESX/ESXi system, see the VMware Fibre Channel SAN Configuration Guide and the VMware vSphere ESXi vCenter Server 5.0 Storage Guide.

Configuring Fibre Channel LUNs

Perform the procedures in the following sections to configure the Fibre Channel LUNs:

1. “Discovering and configuring Fibre Channel LUNs as Core RDM disks on an ESX/ESXi system” on page 46

2. “Discovering and configuring exposed Fibre Channel LUNs through an ESX/ESXi system on the Core-v” on page 47


Discovering and configuring Fibre Channel LUNs as Core RDM disks on an ESX/ESXi system

1. Navigate to the ESX system Configuration tab, click Storage Adapters, select the FC HBA, and click Rescan All to discover the Fibre Channel LUNs.

Figure 4-5. FC Disk Discovery

2. Right-click the name of the Core-v and select Edit Settings.

The virtual machine properties dialog box opens.

3. Click Add and select Hard Disk for device type.

4. Click Next and select Raw Device Mappings for the type of disk to use.

Figure 4-6. Select Raw Device Mappings

5. Select the LUNs to expose to the Core-v.

If you do not see the LUN, follow the steps described in “Troubleshooting” on page 54.

6. Select the datastore on which you want to store the LUN mapping.


7. Select Store with Virtual Machine.

Figure 4-7. Store mappings with VM

8. For compatibility mode, select Physical.

9. For advanced options, use the default virtual device node setting.

10. Review the final options and click Finish.

The Fibre Channel LUN is now set up as an RDM and ready to be used by the Core-v.
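As an alternative to the GUI rescan in step 1, you can rescan and inspect the host storage adapters from the ESXi shell. The following is a minimal sketch; the adapter and device names it returns depend on your environment.

esxcli storage core adapter list
esxcli storage core adapter rescan --all
esxcli storage core device list

The first command lists the HBAs (for example, vmhba2), the second rescans all adapters for newly presented LUNs, and the third lists the discovered devices so that you can note the naa identifier of the new Fibre Channel LUN before mapping it as an RDM.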

Discovering and configuring exposed Fibre Channel LUNs through an ESX/ESXi system on the Core-v

1. From the Core Management Console, choose Configure > Manage: LUNs and select Add a LUN.

2. Select Block Disk.

3. From the drop-down menu, select Rescan for new LUNs to discover the newly added RDM LUNs (Figure 4-8).

Figure 4-8. Rescan for new block disks

4. Select the LUN Serial Number.


5. Select Add Block Disk LUN to add it to the Core-v. Map the LUN to the desired Storage Edge and configure the access lists of the initiators.

Figure 4-9. Add New Block Disk

Configuring Fibre Channel LUNs in a Core-v HA Scenario

This section describes how to deploy Core-vs in HA environments. It includes the following topics:

“The ESXi Servers Hosting the Core-v Appliances Are Managed by vCenter” on page 49

“The ESXi Servers Hosting the Core-vs Are Not Managed by vCenter” on page 51

Riverbed recommends that when you deploy Core-v appliances in an HA environment, you install the two appliances on separate ESX servers so that there is no single point of failure. You can deploy the Core-v appliances differently depending on whether the ESX servers hosting the Core-v appliances are managed by a vCenter or not. The methods described in this section are only relevant when Core-v appliances manage FC LUNs (also called block disk LUNs).


For both deployment methods, modify the storage system Storage Group to expose the LUN to both ESXi systems. Figure 4-10 shows that LUN 0 is assigned to the worldwide names of the HBAs on both ESXi systems.

Figure 4-10. Core-v HA Deployment

The ESXi Servers Hosting the Core-v Appliances Are Managed by vCenter

Two Core-v appliances are deployed in HA and hosted on ESX servers managed by vCenter. After you add a LUN as an RDM to Core-v1, vCenter does not present the LUN in the list of LUNs available to add as an RDM to Core-v2. The LUN is not available because LUN filtering is turned on by default in vCenter to help prevent LUN corruption.

One way to solve the problem is to add the LUNs to the two Core-v appliances in HA, with the ESX servers in a vCenter, without turning off LUN filtering, by using the following procedures. You must also have a shared datastore on a SAN that the ESXi hosts can access and that can be used to store the RDM-mapping files.

To add LUNs to the first Core-v

1. In the vSphere Client inventory, select the first Core-v and select Edit Settings.

The Virtual Machine Properties dialog box opens.

2. Click Add, select Hard Disk, and click Next.

3. Select Raw Device Mappings and click Next.

4. Select the LUN to be added and click Next.

5. Select a datastore and click Next.


This datastore must be on a SAN because you need a single shared RDM file for each shared LUN on the SAN.

6. Select Physical as the compatibility mode and click Next. A SCSI controller is created when the virtual hard disk is created.

7. Select a new virtual device node. For example, select SCSI (1:0), and click Next.

This must be a new SCSI controller. You cannot use SCSI 0.

8. Click Finish to complete creating the disk.

9. In the Virtual Machine Properties dialog box, select the new SCSI controller and set SCSI Bus Sharing to Physical and click OK.
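To confirm that the new virtual disk really is a physical-mode RDM pointing at the intended Fibre Channel LUN, you can query the mapping file from the ESXi shell. This is an illustrative sketch; the datastore, folder, and file names are placeholders for your deployment.

vmkfstools -q /vmfs/volumes/shared-datastore/Core-v1/Core-v1_1.vmdk

For a physical-mode RDM, the output reports the disk as a passthrough raw device mapping and shows the vml identifier of the mapped device, which you can cross-check against the LUN on the storage array.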

To add LUNs to the second Core-v

1. In the vSphere Client inventory, select the HA Core-v and select Edit Settings.

The Virtual Machine Properties dialog box appears.

2. Click Add, select Hard Disk, and click Next.

3. Select Use an existing virtual disk and click Next.

4. In Disk File Path, browse to the location of the disk specified for the first node. Select Physical as the compatibility mode and click Next.

A SCSI controller is created when the virtual hard disk is created.

5. Select the same virtual device node you chose for the first Core-v's LUN (for example, SCSI [1:0]), and click Next.

The location of the virtual device node for this LUN must match the corresponding virtual device node for the first Core-v.

6. Click Finish.

7. In the Virtual Machine Properties dialog box, select the new SCSI controller and set SCSI Bus Sharing to Physical and click OK.

Keep in mind the following caveats:

You cannot use SCSI controller 0, so the number of RDM LUNs supported on a Core-v running on ESXi 5.x is reduced from 60 to 48.

You can change the SCSI bus sharing setting of a SCSI controller only when the Core-v is powered down, so you need to power down the Core-v each time you want to add a new controller. Each controller supports 16 disks.

vMotion is not supported with Core-v.

Another solution is to turn off LUN filtering (RDM filtering) on the vCenter. Be aware that you cannot disable LUN filtering per data center or per LUN; it can only be turned off for the entire vCenter.


If you turn off LUN filtering temporarily, you can do the following:

1. Turn off RDM filtering on vCenter. With the LUN filtering mechanism off, add the LUNs as RDMs to both Core-vs.

2. Turn RDM filtering back on.

You must repeat these steps every time new LUNs are added to the Core-v appliances. However, VMware does not recommend turning LUN filtering off unless there are other methods to prevent LUN corruption. This method should be used with caution.
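For reference, RDM filtering is controlled by a vCenter Server advanced setting. Assuming the standard vCenter storage filter keys, the relevant entry (added or edited under the vCenter Server advanced settings) looks like the following; verify the key name against current VMware documentation before changing it, and set it back to true when you are finished:

config.vpxd.filter.rdmFilter = false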

The ESXi Servers Hosting the Core-vs Are Not Managed by vCenter

When the ESX servers hosting the Core-v appliances in HA are not managed by the same vCenter, or are not managed by vCenter at all, you can add the LUNs as RDMs to both Core-vs without any issues or special configuration requirements.

Populating Fibre Channel LUNs

This section provides the basic steps you need to populate Fibre Channel LUNs prior to deploying them into the Core.

To populate a Fibre Channel LUN

1. Create a LUN (Volume) in the storage array and allow the ESXi host where the Core is installed to access it.

2. Go to the ESXi host, choose Configuration > Advanced Settings > RdmFilter, and deselect RdmFilter to disable it.

You must complete this step if you intend to deploy the Core in an HA configuration.

3. Navigate to the ESX system Configuration tab, click Storage Adapters, select the FC HBA, and click Rescan All… to discover the Fibre Channel LUNs (Figure 4-5 on page 46).

4. On the ESXi server, select Storage and click Add.

5. Select Disk/LUN for the storage type and click Next.

You might need to wait a few moments before the new Fibre Channel LUN appears in the list.

6. Select the Fibre Channel drive and click Next.

7. Select VMFS-5 for the file system version and click Next.

8. Click Next, enter a name for the datastore, and click Next.

9. For Capacity, use the default setting of Maximum available space and click Next.

10. Click Finish.


11. Copy files from an existing datastore to the datastore you just added.

12. Select the new datastore and unmount it.

You need to unmount the datastore, detach the device, rescan, and then reattach the device before you can proceed.

To unmount and detach a datastore

1. To unmount the device, right-click the device in the Devices list and choose Unmount.

2. To detach the device, right-click the device in the Devices list and choose Detach.

3. Rescan the device twice.

4. Reattach the device by right-clicking the device in the Devices list and choosing Attach.

Do not rescan the device.
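If you prefer to perform the unmount, detach, rescan, and reattach from the ESXi shell instead of the vSphere Client, the following sketch shows equivalent commands. The datastore label and naa identifier are placeholders for your environment.

esxcli storage filesystem list
esxcli storage filesystem unmount -l FC_Datastore01
esxcli storage core device set -d naa.6006016012345678 --state=off
esxcli storage core adapter rescan --all
esxcli storage core adapter rescan --all
esxcli storage core device set -d naa.6006016012345678 --state=on

The filesystem list shows the volume label to pass to the unmount command, the device set commands detach and reattach the device, and the two rescans mirror the double rescan in the procedure above.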

To add the LUN to the Core-v

1. Right-click the Core-v and select Edit Settings.

2. Click Add and select Hard Disk.

3. Click Next, and when prompted to select a disk to use, choose Raw Device Mappings.

4. Select the target LUN to use.

5. Select the datastore on which to store the LUN mapping and choose Store with Virtual Machine.

6. Select Physical for compatibility mode.

7. For advanced options, use the default setting.

8. Review the final options and click Finish.

The Fibre Channel LUN is now set up as RDM and ready to be used by the Core-v.

When the LUN is projected to the branch site and attached to the branch ESXi server (VSP or another device), you are prompted to select VMFS mount options. Select Keep the existing signature.

Best Practices and Recommendations

This section describes the best practices for deploying Fibre Channel on Core-v. Riverbed recommends that you follow these suggestions because they lead to designs that are easier to configure and troubleshoot, and they get the most out of your SteelFusion appliances.

This section includes the following topics:

“Best Practices” on page 53

“Recommendations” on page 53


Best Practices

The following table shows the Riverbed best practices for deploying Fibre Channel on Core-v.

Best Practice - Description

Keep iSCSI and Fibre Channel LUNs on separate Cores - Do not mix iSCSI and Fibre Channel LUNs in the same Core-v.

Use ESX/ESXi version 4.1 or later - Make sure that the ESX/ESXi system is running version 4.1 or later.

Use gigabit links - Make sure that you map the Core-v interfaces to gigabit links that are not shared with other traffic.

Dedicate physical NICs - Use one-to-one mapping between physical and virtual NICs for the Core data interfaces.

Reserve CPU(s) and RAM - Reserve CPU(s) and RAM for the Virtual Core appliance, following the guidelines listed in the following table.

The following table shows the CPU and RAM guidelines for deployment.

VGC-1000-U - 2 GB memory, 25 GB disk space, 2 CPUs @ 2.2 GHz, 2 TB data set size, 5 branches

VGC-1000-L - 4 GB memory, 25 GB disk space, 4 CPUs @ 2.2 GHz, 5 TB data set size, 10 branches

VGC-1000-M - 8 GB memory, 25 GB disk space, 8 CPUs @ 2.2 GHz, 10 TB data set size, 20 branches

VGC-1500-L - 32 GB memory, 350 GB disk space, 8 CPUs @ 2.2 GHz, 20 TB data set size, 30 branches

VGC-1500-M - 48 GB memory, 350 GB disk space, 12 CPUs @ 2.2 GHz, 35 TB data set size, 30 branches

Recommendations

The following table shows the Riverbed recommendations for deploying Fibre Channel on the Core-v.

Recommendation - Description

Deploy a dual-redundant FC HBA - The FC HBA connects the ESXi system to the SAN. Dual-redundant HBAs help keep an active path available at all times. ESXi multipath software is used for controlling and monitoring HBA failure. If a path or HBA fails, the workload fails over to the working path.

Use recommended practices for removing or deleting FC LUNs - Before deleting, offlining, or unmapping LUNs from the storage system, or removing LUNs from the zoning configuration, remove the LUNs (block disks) from the Core and unmount the LUNs from the ESXi system. ESXi might become unresponsive, and sometimes might need to be rebooted, if all paths to a LUN are lost.

Do not use block disks on the physical Core - Fibre Channel LUNs (also known as block disks) are not supported on the physical Core.


Troubleshooting

This section describes common deployment issues and solutions.

If the FC LUN is not detected on the ESXi system on which the Core-v is running, try performing these debugging steps:

1. Rescan the ESXi system storage adapters.

2. Make sure that you are looking at the right HBA on the ESXi system.

3. Make sure that the ESXi system has been allowed to access the FC LUN on the storage system, and check initiator and storage groups.

4. Make sure that the zoning configuration on the FC switch is correct.

5. Refer to VMware documentation and support for further assistance on troubleshooting FC connectivity issues.
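While working through steps 1 through 4, the ESXi shell can help confirm what the host actually sees. The following sketch lists the host HBAs and the storage paths they have discovered; the output naturally varies by environment.

esxcli storage core adapter list
esxcli storage core path list

If zoning and LUN masking are correct, a path to the missing LUN appears in the path list with the expected target WWPN; if no path appears, revisit the initiator groups on the storage system and the zoning configuration on the FC switch.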

If a VM was deployed on the LUN using the same ESXi host or ESXi cluster that you are using to deploy the Core-v, and the datastore is still mounted, the FC LUN might be detected on the ESXi system but not appear in the list of LUNs that can be presented as RDMs to the Core-v. If this is the case, perform the following procedure to unmount the datastore from the ESXi system.

To unmount the datastore from the ESXi system

1. To unmount the FC VMFS datastore, select the Configuration tab, view Datastores, right-click a datastore, and select Unmount.

Figure 4-11. Unmounting a Datastore

2. To detach the corresponding device from ESXi, view Devices, right-click a device, and select Detach.


3. Rescan twice.

Figure 4-12. Rescanning a Device

4. To attach the device, view Devices, right-click the device, and select Attach.

5. Do not rescan. View Devices and verify that the datastore has been removed from the datastore list.

6. Re-add the device as an RDM disk to the Core-v.

If the FC RDM LUN is not visible on the Core-v, try the following debugging procedures:

Run the Rescan for new LUNs process on the Core-v several times.

Check the Core-v logs for failures.


CHAPTER 5 Configuring Storage Edge

This chapter describes the process of configuring Storage Edge at the branch office. It includes the following:

“Interface and Port Configurations” on page 57

“Edge Appliances Storage Specifications” on page 61

“Configuring Disk Management” on page 61

“Configuring SteelFusion Storage” on page 64

“MPIO in Storage Edge” on page 65

Interface and Port Configurations

This section describes a typical port configuration for the Storage Edge. You might require additional routing configuration depending on your deployment scenario.

This section includes the following topics:

“Edge Appliances Ports” on page 58

“Moving Storage Edge to a New Location” on page 59

“Configuring Edge for Jumbo Frames” on page 60

“Configuring iSCSI Initiator Timeouts” on page 61


Edge Appliances Ports

The following table summarizes the interfaces that connect the BlockStream-enabled SteelHead EX and the SteelFusion Edge appliance to your network. For more information about the Edge appliances, see the SteelFusion Edge Hardware Installation and Maintenance Guide and the SteelHead EX Installation and Configuration Guide.

Port - Description

Console - The console port connects the serial cable to a terminal device. You establish a serial connection to a terminal emulation program for console access to the Setup Wizard and the SteelHead EX CLI.

Primary (PRI) - When Storage Edge is enabled on the SteelHead EX, this interface is typically used for iSCSI traffic. The iSCSI traffic is between external application servers in the branch office and the LUNs provided by the Storage Edge blockstore. This interface is also used to connect to Core through the SteelHead EX in-path interface. If Storage Edge is not enabled on the SteelHead EX, you can use this port for the management VLAN.

Auxiliary (AUX) - When Storage Edge is enabled on the SteelHead EX, use this port to connect the SteelHead EX to the management VLAN. You can connect a computer directly to the appliance with a crossover cable, enabling you to access the CLI or Management Console of the SteelHead EX.

lan0_0 - The SteelHead EX uses one or more in-path interfaces to provide Ethernet network connectivity for optimized traffic. Each in-path interface consists of two physical ports: the LAN port and the WAN port. Use the LAN port to connect the SteelHead EX to the internal network of the branch office. When Edge is enabled on the SteelHead EX, you can also use this port for a connection to the Primary port. This connection enables the blockstore traffic sent between Storage Edge and Core to transmit across the WAN link. The in-path interfaces and their corresponding LAN and WAN ports are individually identified as inpathX_Y, lanX_Y, and wanX_Y. The numbers increment with each additional in-path interface (for example, inpath0_0, lan0_0, wan0_0, and then inpath0_1, lan0_1, wan0_1, and then inpath1_0, and so on).

wan0_0 - The WAN port is the second of the two ports that comprise the SteelHead in-path interface. The WAN port is used to connect the SteelHead EX toward WAN-facing devices such as a router, firewall, or other equipment located at the WAN boundary.

eth1_0 to eth1_3 - These ports are available as part of an optional four-port NIC addition to the SteelHead EX. The SteelHead EX 560 and 760 models do not support the use of additional NICs. When configured for use by Storage Edge, the ports can provide additional iSCSI interfaces for storage traffic to external servers. These configured ports allow greater bandwidth and the ability to provide redundancy in the form of MPIO or Storage Edge clustering. The eth1_0, eth1_1, eth1_2, and eth1_3 ports of Storage Edge are connected to a LAN switch using a straight-through cable.

eth0_0 to eth0_1 - These ports are available as standard on the SteelFusion Edge appliance. When configured for use by the SteelFusion Edge RiOS node, the ports can provide additional iSCSI interfaces for storage traffic to external servers. These configured ports allow additional bandwidth and the ability to provide redundancy in the form of MPIO or SteelFusion Edge high availability (HA). In an HA design, the ports are recommended for the heartbeat between the SteelFusion Edge peers.

gbe0_0 to gbe0_3 - These ports are available as standard on the SteelFusion Edge appliance. When configured for use by the SteelFusion Edge hypervisor node, the ports provide LAN connectivity to external clients. The ports are connected to a LAN switch using a straight-through cable.


Note: All the above interfaces are gigabit capable. Where it is practical, use gigabit speeds on interface ports that are used for iSCSI traffic.

Figure 5-1 shows a typical combination of ports in use by the Storage Edge that is enabled on a BlockStream-enabled SteelHead EX. Notice the external application server in the branch can use both the primary port and the eth1_0 port of the Edge for iSCSI traffic to and from NIC-A and NIC-B.

Figure 5-1. BlockStream-Enabled SteelHead EX Ports

Moving Storage Edge to a New Location

If you began your SteelFusion deployment by initially configuring and loading the Storage Edge appliance in the data center, this probably means that you have to change the IP addresses of various network ports on the Edge after you move it to its final location in the remote office.

The Storage Edge configuration includes the IP address of the Core and it initiates the connection to the Core when it is active. Because Core does not track the Storage Edge by IP address, it is safe to change the IP addresses of the network ports on the Storage Edge when you move it to its final location.

The iSCSI adapter within the VSP of the Storage Edge needs to be reconfigured with the new IP address of the Edge.



Configuring Edge for Jumbo Frames

You can have one or more external application servers in the branch office that use the LUNs accessible from the Storage Edge iSCSI portal. If your network infrastructure supports jumbo frames, Riverbed recommends that you configure the connection between the Storage Edge and application servers as described below. If you are using VSP for hosting all your branch application servers, then you can ignore the following two procedures because the iSCSI traffic is internal to the Storage Edge.

Note: VSP VMs do not support jumbo frames.

In addition to configuring Storage Edge for jumbo frames, you must configure the external application servers and any switches, routers, or other network devices between Storage Edge and the application server for jumbo frame support.

To configure Storage Edge primary interface for jumbo frames

1. From the SteelFusion Edge or the BlockStream-enabled SteelHead EX Management Console, choose Networking > Networking: Base Interfaces.

2. In the Primary Interface box:

Select Enable Primary Interface.

Select the Specify IPv4 Address Manually option, and specify the correct values for your implementation.

For the MTU setting, specify 9000 bytes.

3. Click Apply to apply the settings to the current configuration.

4. Click Save to save your changes permanently.

For more details about interface settings, see the SteelHead Management Console User’s Guide and the SteelFusion Edge Management Console User’s Guide.

To configure Storage Edge Ethernet interfaces for jumbo frames

1. From the SteelFusion Edge or the BlockStream-enabled SteelHead EX Management Console, choose Networking > Networking: Data Interfaces.

2. In the Data Interface Settings box:

Select the required data interface (for example: eth1_0).

Select Enable Interface.

Select the Specify IPv4 Address Manually option and specify the correct values for your implementation.

For the MTU setting, specify 9000 bytes.

3. Click Apply to apply the settings to the current configuration.

4. Click Save to save your changes permanently.


For more details about interface settings, see the SteelHead Management Console User’s Guide and the SteelFusion Edge Management Console User’s Guide. For more information about jumbo frames, see “Configure Jumbo Frames” on page 155.
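After you configure the interfaces, it is worth verifying that jumbo frames actually pass end to end between an external application server and the Storage Edge iSCSI interface. A common check is a do-not-fragment ping sized just under the 9000-byte MTU (9000 bytes minus 28 bytes of IP and ICMP headers); the IP address shown is a placeholder for the Storage Edge interface.

From a Windows application server: ping -f -l 8972 10.1.1.10
From a Linux application server: ping -M do -s 8972 10.1.1.10

If these pings fail while smaller do-not-fragment pings succeed, a switch, router, or interface in the path is not configured for jumbo frames.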

Configuring iSCSI Initiator Timeouts

The Storage Edge acts as the iSCSI portal for any internally hosted (VSP) application servers as well as for any external application servers. In the case of external servers, consider adjusting the iSCSI Initiator timeout settings on the server. This adjustment could improve the ability of the initiator to survive minor outages involving MPIO or other HA configurations. For more details and guidance, see “Microsoft iSCSI Initiator Timeouts” on page 157 and documentation provided by the iSCSI Initiator supplier.

Edge Appliances Storage Specifications

The Edge branch storage features are available only on the SteelFusion Edge appliance xx00 models, and the SteelHead EX xx60 models. You can configure how the free disk space usage on the SteelHead EX is divided between the blockstore and VSP.

For details about the possible disk space allocations between VSP and Edge storage on xx60 models and installing and configuring xx60 model series appliances, see the SteelHead EX Installation and Configuration Guide.

Configuring Disk Management

This section describes how to configure disk management. It includes the following topics:

“Disk Management on BlockStream-Enabled SteelHead EX” on page 61

“Disk Management on the SteelFusion Edge Appliance” on page 63

For more information on best practices for disk management, see “Disk Management” on page 146.

Disk Management on BlockStream-Enabled SteelHead EX

You can configure the disk layout mode to allow space for the Storage Edge blockstore by choosing Administration > System Settings: Disk Management of the SteelHead EX Management Console. You can partition disk space in the SteelHead EX in different ways depending on how you use the appliance and which license you purchase.


The Disk Management page does not allow you to allot disk space; you can only select the desired mode.

Note: You cannot change the disk layout mode unless all VSP slots are currently uninstalled. For details, see the SteelHead Management Console User’s Guide.

Note: You cannot change the disk layout when Storage Edge is already connected to a Core. You must reboot after you change the disk layout. Riverbed recommends that you change the disk layout before connecting to a Core for the first time.

For SteelFusion deployments, choose one of the following modes:

Extended VSP and Granite Storage Mode - Select this mode to reclaim disk space that was reserved for upgrading legacy RSP virtual machines to ESXi format. This mode is available in SteelHead EX v2.1.0 and later and evenly divides the disk space between VSP functionality and Granite. Use this mode for all new deployments or if the virtual machines were already converted to the new format.

Granite Storage Mode - Select this mode to use the storage delivery capability of the Storage Edge. This mode dedicates most of the disk space to the Storage Edge blockstore while still allotting the required amount for VSP functionality and WAN optimization. This mode also allows you to consolidate at the data center the operating systems and production drives of the virtual servers running on the SteelHead EX.

VSP and Granite Storage Mode - Evenly divides the available disk space between VSP and SteelFusion functionality. Some space is reserved for upgrading legacy RSP virtual machines to ESXi format.

The remaining modes (not documented here) are for non-SteelFusion deployments. For details about these other modes and disk layout configuration in general, see the SteelHead Management Console User’s Guide.


Disk Management on the SteelFusion Edge Appliance

On the SteelFusion Edge appliance, you can specify the size of the local LUN during the hypervisor installation, before the appliance is connected to the Core. During the installation, choose Direct Attached Storage. You are prompted to choose the percentage of the available blockstore that you want to use as local storage. A single LUN is created, formatted as VMFS5, and mounted as a datastore to ESX as rvbd_vsp_datastore. Click Advanced Storage Settings to enter the exact size of the local LUN, change the file system type to VMFS3, or choose a different name for the datastore.

Figure 5-2. Hypervisor Installer

To install multiple local LUNs

1. Connect Storage Edge to Core.

2. Create the LUNs on the backend storage and map them to Storage Edge.

3. Pin the LUNs and finish synchronization to Storage Edge.

4. Offline the LUNs on the Core.


5. Remove the Core.

Select Preserve local iSCSI Storage Configuration (Figure 5-3).

Figure 5-3. Removing the Core

SteelFusion v4.0 and later can preserve a SteelFusion Edge configuration for local LUNs, initiators, and initiator groups after unpairing from the Core. When you then connect the SteelFusion Edge to a Core, the preserved LUNs remain as local LUNs, and the rest of the local space is used for the blockstore.

Configuring SteelFusion Storage

Complete the connection to the Core by choosing EX Features > Granite: Granite Storage on the SteelHead EX Management Console or Storage > Storage Edge Configuration on the SteelFusion Edge Management Console, and specifying the Core IP address and defining the Edge Identifier (among other settings).

You need the following information to configure Edge storage:

Hostname/IP address of the Core.

Edge Identifier, the value of which is used in the Core-side configuration for mapping LUNs to specific Edge appliances. The Edge Identifier is case sensitive.

If you configure failover, both appliances must use the same self-identifier. In this case, you can use a value that represents the group of appliances.

Port number of the Core. The default port is 7970.

The interface for the current Edge to use when connecting with the Core.

For details about this procedure, see the SteelHead Management Console User’s Guide and the SteelFusion Edge Management Console User’s Guide.
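Because the Edge initiates the connection to the Core on the configured port (7970 by default), a quick reachability check across the WAN can save troubleshooting time. This is a hedged example using netcat from a host on the branch network; the Core address is a placeholder.

nc -vz 198.51.100.20 7970

If the connection is refused or times out, check firewall rules and routing along the WAN path before troubleshooting the Edge storage configuration itself.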


MPIO in Storage Edge

In Storage Edge, you can enable multiple local interfaces through which the iSCSI Initiator can connect to the Edge. Redundant connections help prevent loss of connectivity in the event of an interface, switch, cable, or other physical failure.

In the Core Management Console, choose Configure > Storage Arrays: iSCSI, Initiators, MPIO to access controls to add or remove MPIO interfaces. Once specified, the interfaces are available for the iSCSI Initiator to connect with the Edge.

For details, see the SteelFusion Edge Management Console User’s Guide, and the SteelHead Management Console User’s Guide.


CHAPTER 6 SteelFusion Appliance High-Availability Deployment

This chapter describes high-availability (HA) deployments for Core and Storage Edge. It includes the following sections:

“Overview of Storage Availability” on page 67

“Core High Availability” on page 68

“Storage Edge High Availability” on page 80

“Recovering from Split-Brain Scenarios Involving Edge Appliance HA” on page 95

“Testing HA Failover Deployments” on page 95

“Configuring WAN Redundancy” on page 96

For information about setting up Core-v HA, see “Configuring Fibre Channel LUNs in a Core-v HA Scenario” on page 48.

Overview of Storage Availability

Applications of any type that read and write data to and from storage can suffer from two fundamental types of availability loss:

Loss of storage itself or access to the storage

Loss of the data residing on the storage

As with a typical storage deployment, you might consider data HA and redundancy as a mandatory requirement rather than an option. Applications accessing data are always expecting the data, and the storage that the data resides on, to be available at all times. If for some reason the storage is not available, then the application ceases to function.

Storage availability can be described as the requirement to protect against loss of access to stored data or loss of the storage in which the data resides. It is subtly different from data loss. In the case of data loss, whether due to accidental deletion, corruption, theft, or another event, it is a question of recovering the data from a snapshot, backup, or some other form of archive. Of course, in this case, if you can recover the lost data it implies that you previously had a process to copy data, either through snapshot, backup, replication, or another data management operation.

In general, the net effect of data loss or lack of storage availability is the same—loss of productivity. But the two types of data loss are distinct and addressed in different ways.


The subject of data availability in conjunction with the SteelFusion product family is documented in a number of white papers and other documents that describe the use of snapshot technology and data replication as well as backup and recovery tools.

To read the white papers, go to https://support.riverbed.com.

The following sections discuss how to make sure you have storage availability in both the Core and Storage Edge deployments.

Note: Core HA and Storage Edge HA are independent of each other. You can have Core HA with no Storage Edge HA, and vice versa.

Core High Availability

This section describes HA deployments for the Core. It contains the following topics:

“Core with MPIO” on page 69

“Core HA Concepts” on page 69

“Configuring HA for Core” on page 70

You can deploy a Core as a single, stand-alone implementation. However, Riverbed strongly recommends that you always deploy Cores in pairs in an HA cluster configuration. The storage arrays and the storage area network (SAN) that the Core attaches to are generally deployed in a redundant manner.

For more information about Core HA clusters, see “Core HA Concepts” on page 69. For more information about single-appliance implementation, see “Single-Appliance Deployment” on page 15.

In addition to the operational and hardware redundancy provided by the deployment of Core clusters, you can also cater to network redundancy. When connecting to a SAN using iSCSI, Cores support the use of multiple path input and output (multipath I/O or MPIO).

MPIO uses two separate network interfaces on the Core to connect to two separate iSCSI portals on the storage array. The storage array must support MPIO. Along with network redundancy, MPIO provides scalability by load-balancing storage traffic between the Core and the storage array.

Note: MPIO is also supported on Storage Edge deployments in which the LUNs available from Storage Edge are connected to servers operating in the branch office.

For more information about MPIO with Core, see “Core with MPIO” on page 69. For information about MPIO with Edge, see “BlockStream-Enabled SteelHead EX with MPIO” on page 85 and “SteelFusion Edge with MPIO” on page 92.

For information about setting up Core-v HA, see “Configuring Fibre Channel LUNs in a Core-v HA Scenario” on page 48.

For information about setting up Core HA with FusionSync (SteelFusion Replication), see “SteelFusion Replication (FusionSync)” on page 99.


Core with MPIO

MPIO ensures that a failure of any single component (such as a network interface card, switch, or cable) does not result in a communication problem between the Core and the storage array.

Figure 6-1 shows an example of a basic Core deployment using MPIO. The figure shows a single Core with two network interfaces connecting to the iSCSI SAN. The SAN has a simple full mesh network design enabling each Core interface to connect to each iSCSI portal on the storage array.

Figure 6-1. Basic Topology for Core MPIO

When you configure a Core for MPIO, by default it uses a round-robin policy for any read operations to the LUNs in the storage array. Write operations use a fixed-path policy, only switching to an alternative path in the event of a path or portal failure.

For more details about MPIO configuration for the Core, see the SteelFusion Core Management Console User’s Guide.

Core HA Concepts

A pair of Cores deployed in an HA-failover cluster configuration are active-active. In other words, each Core is the primary for itself and secondary for its peer. Both peers in the cluster are attached to storage in the data center, but each is individually responsible for projecting one or more LUNs to one or more Edge devices in branch locations.

Each Core is configured separately for the LUNs and Storage Edges it is responsible for. When you enable failover on the Core, you can choose which individual LUNs are part of the HA configuration. By default in a Core HA deployment, all LUNs are automatically configured for failover. You can selectively disable failover on an individual LUN basis in the Management Console by choosing Configure > Manage: LUNs. LUNs that are not included in the HA configuration are not available at the Storage Edge if the Core fails.

As part of the HA deployment, you configure each Core with the details of its failover peer. This configuration consists of two IP addresses of network interfaces called failover interfaces. These interfaces are used for heartbeat and synchronization of the peer configuration. After the failover interfaces are configured, the failover peers use their heartbeat connections (failover interfaces) to share the details of their storage configuration. This information includes the LUNs they are responsible for and the Storage Edges they are projecting the LUNs to.


If either peer fails, the surviving Core can take over control of the LUNs from the failed peer and continue projecting them to the Storage Edges.

Note: Make sure that you size both failover peers correctly so that they have enough capacity to support the other Core's storage in the event of a peer failure. If the surviving peer does not have enough resources (CPU and memory), then performance might degrade in a failure situation.

After a failed Core has recovered, the failback is automatic.

Configuring HA for Core

This section describes best practices and the general procedure for configuring high availability between two Cores.

Note: Core HA configuration is independent of Edge HA configuration.

This section contains the following topics:

“Cabling and Connectivity for Clustered Cores” on page 71

“Configuring Failover Peers” on page 72

“Accessing a Failover Peer from a Core” on page 76

“SCSI Reservations Between Core and Storage Arrays” on page 77

“Failover States and Sequences” on page 77

“Recovering from Failure of Both Cores in HA Configuration” on page 78

“Removing Cores from an HA Configuration” on page 78


Cabling and Connectivity for Clustered Cores

Figure 6-2 shows an example of a basic HA topology including details of the different network interfaces used.

Note: Riverbed strongly recommends that you use crossover cables for connecting ports in clustered Cores.

Figure 6-2. Basic Topology for Core HA

In the scenario shown in Figure 6-2, both Cores (Core A and Core B) connect to the storage array through their respective eth0_0 interfaces. Notice that the eth0_1 interfaces are not used in this example, but you could use them for MPIO or additional SAN connectivity. The Cores communicate with each other using the failover interfaces that are configured as eth0_2 and eth0_3. Their primary interfaces are dedicated to the traffic VLAN that carries data to and from Edge devices. The auxiliary interfaces are connected to the management VLAN and used to administer the Cores. You can administer a Core from any of its configured interfaces, assuming they are reachable. Riverbed strongly recommends that you use the AUX interface as a dedicated management interface rather than using one of the other interfaces that might be in use for storage data traffic.

When it is practical, Riverbed recommends that you use two dedicated failover interfaces for the heartbeat. Connect the interfaces through crossover cables and configure them using private IP addresses. This connection minimizes the risk of a split-brain scenario in which both Core peers consider the other to have failed. Directly connected, dedicated interfaces might not always be possible. If the dedicated connections need to go through a combination of switches or routers, they must use diverse paths and network equipment to avoid a single point of failure.

If you cannot configure two dedicated interfaces for the heartbeat, then an alternative is to specify the primary and auxiliary interfaces. Consider this option only if the traffic interfaces of both Core peers are connecting to the same switch or are wired so that a network failure means one of the Cores loses connection to all Edge appliances.

You can configure Cores with additional NICs to provide more network interfaces. These NICs are installed in PCIe slots within the Core. Depending on the type of NIC you install, the network ports could be 1-Gb Ethernet or 10-Gb Ethernet. In either case, you can use the ports for storage or heartbeat connectivity. The ports are identified as ethX_Y where X corresponds to the PCIe slot (from 1 to 5) and Y refers to the port on the NIC (from 0 to 3 for a four-port NIC and from 0 to 1 for a two-port NIC).

For more information about Core ports, see “Interface and Port Configuration” on page 24.


You can use these additional interfaces for iSCSI traffic or heartbeat. Use the same configuration guidance as already described above for the eth0_0 to eth0_3 ports.

Under normal circumstances, the heartbeat interfaces need only be 1 Gb; therefore, it is simpler to keep using eth0_2 and eth0_3 as already described. However, there can be a need for 10-Gb connectivity to the iSCSI SAN, in which case you can use an additional NIC with 10-Gb ports in place of eth0_0 and eth0_1. If you install the NIC in PCIe slot 1 of the Core, then the interfaces are identified as eth1_0 and eth1_1 in the Core Management Console.

When using multiple interfaces for storage connectivity in an HA deployment, Riverbed strongly recommends that all interfaces are matched in terms of their capabilities. Therefore, avoid mixing combinations of 1 Gb and 10 Gb for storage connectivity.

Configuring Failover Peers

You configure Core high availability by choosing Configure > Failover: Failover Configuration. To configure failover peers for Core, you need to provide the following information for each of the Core peers:

The IP address of the peer appliance

The local failover interface through which the peers exchange and monitor heartbeat messages

An additional IP address of the peer appliance

An additional local failover interface through which the peers exchange and monitor heartbeat messages

Figure 6-3 shows an example deployment with failover interface IP addresses. You can configure any interface as a failover interface, but to maintain some consistency Riverbed recommends that you configure and use eth0_2 and eth0_3 as dedicated failover interfaces.

Figure 6-3. Core HA Failover Interface Design


Figure 6-4 shows the Failover Configuration page for Core A, in which the peer is Core B. The failover interface IP addresses are 20.20.20.22 and 30.30.30.33 through interfaces eth0_2 and eth0_3 respectively. The page shows eth0_2 and eth0_3 selected from the Local Interface drop-down list, with the IP addresses of the Core B interfaces filled in. Notice that from the Configuration page you can select the interface that the Edge devices of the failover peer use for their connections. This example shows that the primary interface has been chosen.

Figure 6-4. Core Failover Configuration Page


After you click Enable Failover, the Core attempts to connect through the failover interfaces and sends the storage configuration to the peer. If successful, you see the Device Failover Settings as shown in Figure 6-5.

Figure 6-5. Core HA Failover Configuration Page 2


After the Core failover has been successfully configured, you can log in to the Management Console of the peer Core and view its Failover Configuration page. Figure 6-6 shows that the configuration page of the peer is automatically configured with the relevant failover interface settings from the other Core.

Figure 6-6. Core HA Peer Failover Configuration Page 3

Even though the relevant failover interfaces are automatically configured on the peer, you must configure the peer Preferred Interfaces for Edge Connections. By default, the primary interface is selected.

For more information about HA configuration settings, see the SteelFusion Core Management Console User’s Guide.

In the Core CLI, you can configure failover using the device-failover peer and device-failover peerip commands. To display the failover settings use the show device-failover command.

For more information, see the SteelFusion Command-Line Interface Reference Manual.

If the failover configuration is not successful, then details are available in the Core log files and a message is displayed in the user interface. The failure can be for any number of different reasons. Some examples, along with items to check, are as follows:

Unable to contact peer - Check the failover interface configurations (IP addresses, interface states and cables).

Peer is already configured as part of a failover pair - Check that you have selected the correct Core.

The peer configuration includes one or more LUNs that are already assigned to the other Core in the failover pair - Check the LUN assignments and correct the configuration.

The peer configuration includes one or more Edge devices that are already assigned to the other Core in the failover pair - Check the Edge assignments and correct the configuration.


After the failover configuration is complete and active, the configurations of the two peers in the cluster are periodically exchanged through a TCP connection using port 7971 on the failover interfaces. If you change or save either Core configuration, the modified configuration is sent to the failover peer. In this way, each peer always has the latest configuration details of the other.
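If the peers cannot exchange their configurations, a basic TCP reachability test of the failover interfaces on port 7971 can help isolate cabling, addressing, or firewall problems. This is a hedged example using netcat from a host on the failover network, with the failover interface address from Figure 6-3 as a placeholder.

nc -vz 20.20.20.22 7971

A refused or timed-out connection suggests a connectivity problem on the failover interfaces; if the test succeeds, focus instead on the failover configuration settings.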

You configure any Storage Edge that is connecting to a Core HA configuration with the primary Core details (hostname or IP). Once it is connected to the primary Core, the Storage Edge is automatically updated with the peer Core information. This information ensures that during a Core failover situation in which a Storage Edge loses its primary Core, the secondary Core can signal the Storage Edge that it is taking over. The automatic update also minimizes the configuration activities required at the Storage Edge regardless of whether you configure Core HA or not.

Accessing a Failover Peer from a Core

When you configure a Core for failover with a peer Core, all storage configuration pages include an additional feature that enables you to access and modify settings for both the current appliance you are logged in to and its failover peer.

You can use a drop-down list below the page title to select Self (the current appliance) or Peer. The page includes the message Device Failover is enabled, along with a link to the Failover Configuration page.

Figure 6-7 shows two sample iSCSI Configuration pages: one without HA enabled and one with HA enabled, showing the drop-down list.

Figure 6-7. Failover-Enabled Feature on Storage Configuration Pages

Note: Because you can change and save the storage configuration settings for the peer in a Core HA deployment, ensure that any configuration changes are made for the correct Core.

Additionally, the Core storage report pages include a message that indicates when device failover is enabled, along with a link to the Failover Configuration page. You must log in to the peer Core to view the storage report pages for the peer.


SCSI Reservations Between Core and Storage Arrays

When you deploy two Cores as failover peers, it is an active-active configuration. Each Core is primarily responsible for the LUNs that it has been configured with. As part of the HA configuration, some, if not all, of the LUNs are enabled for failover. During a failover scenario, the surviving peer takes over the LUNs of the failed peer that have been enabled for failover. To be able to take over the LUNs in a safe and secure way, the Core makes use of SCSI reservations to the back-end storage array.

SCSI reservations are similar in concept to client file-locking on a file server. The SCSI reservation is made by the initiator and provides a way to prevent other initiators from making changes to the LUN. Prior to making a reservation, the initiator must first make a Register request for the LUN. This request is in the form of a Reservation key. After the storage array acknowledges the reservation key, the reservation is made.

The Core sends Register requests to the storage array for each LUN it is responsible for. It then makes persistent reservations for each LUN. A persistent reservation is maintained across power failures and reboots of the initiator and target devices. It can be cleared only by the initiator releasing the reservation or by another initiator preempting the reservation.

In a Core HA deployment, each peer knows the LUNs that are enabled for failover on the other peer. Because of this, in a failover scenario, a surviving peer can send the storage array a request to read current reservations for each of the relevant LUNs. The storage array responds with the reservation keys of the failed Core. The surviving peer sends a preempt reservation request for each LUN that it needs to take control of from the failed peer. The preempt reservation request comprises the reservation key of the failed peer and the surviving peer's own registration key for each LUN.

Because of the requirement to transfer persistent reservations between peer Cores during a failover or failback scenario, your storage array might need to be explicitly configured to allow this. The actual configuration steps required depend on the storage array vendor but might involve some type of setting for simultaneous access. For details, consult relevant documentation of the storage array vendor.

Failover States and Sequences

While performing its primary functions associated with projecting LUNs, each Core in an HA deployment also uses its heartbeat interfaces to check whether its peer is still active. By default, every three seconds, the peers check each other through a heartbeat message. The heartbeat message is sent through TCP port 7972 and contains the current state of the peer that is sending the message.

The state is one of the following:

ActiveSelf - The Core is healthy, running its own configuration and serving its LUNs as normal. It has an active heartbeat with its peer.

ActiveSolo - The Core is healthy but the peer is down. It is running its own configuration and that of the failed peer. It is serving its LUNs and also the LUNs of the failed peer.

Inactive - The Core is healthy but has just started up and cannot automatically transition to ActiveSolo or ActiveSelf. Typically this would occur if both Cores fail at the same time. To complete the transition, you must manually activate the correct Core. For more information, see “Recovering from Failure of Both Cores in HA Configuration” on page 78.

Passive - The default state when Core starts up. Depending on the status of the peer, the Core state transitions to Inactive, ActiveSolo, or ActiveSelf.

If there is no response to three consecutive heartbeats, then the secondary Core declares the primary failed and initiates a failover. With the default three-second interval, this means a failed peer is typically detected within roughly nine seconds. Both Cores in an HA deployment are primary for their own functions and secondary for the peer. Therefore, whichever Core fails, it is the secondary that takes control of the LUNs from the failed peer.


After the failover is initiated, the following sequence of events occurs:

1. The secondary Core preempts a SCSI reservation to the storage array for all of the LUNs that the failed Core is responsible for in the HA configuration.

2. The secondary Core contacts all Edges that are being served by the failed (primary) Core.

3. The secondary Core begins serving LUNs to the Storage Edges.

The secondary Core continues to issue heartbeat messages. Failback is automatic after the failed Core comes back online and can send its own heartbeat messages again. The failback sequence is effectively a repeat of the failover sequence, with the recovered primary Core going through the three steps described above.
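
As a conceptual illustration of this behavior, the following Python sketch models a Core that declares its peer failed after three missed heartbeats at the default three-second interval and then runs the three failover steps above. The callables (preempt_reservations, contact_edges, serve_luns) are placeholders for illustration, not Riverbed APIs.

import time

HEARTBEAT_INTERVAL = 3   # seconds between heartbeat messages (default)
MISSED_LIMIT = 3         # consecutive missed heartbeats before failover

def run_failover(luns, preempt_reservations, contact_edges, serve_luns):
    preempt_reservations(luns)   # 1. preempt the SCSI reservations for the failed peer's LUNs
    contact_edges(luns)          # 2. contact the Edges served by the failed peer
    serve_luns(luns)             # 3. begin serving those LUNs to the Storage Edges

def monitor_peer(receive_heartbeat, on_failover):
    missed = 0
    while missed < MISSED_LIMIT:
        heartbeat = receive_heartbeat(timeout=HEARTBEAT_INTERVAL)   # None if nothing arrived
        missed = 0 if heartbeat is not None else missed + 1
    on_failover()

if __name__ == "__main__":
    def silent_peer(timeout):        # simulates a failed peer that never responds
        time.sleep(timeout)
        return None

    monitor_peer(
        silent_peer,
        on_failover=lambda: run_failover(
            ["lun1", "lun2"],
            preempt_reservations=lambda luns: print("preempting reservations for", luns),
            contact_edges=lambda luns: print("contacting Edges for", luns),
            serve_luns=lambda luns: print("serving", luns),
        ),
    )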

Recovering from Failure of Both Cores in HA Configuration

You can have a scenario in which both Cores in an HA configuration fail at the same time; for example, a major power outage. In this instance there is no opportunity for either Core to realize that its peer has failed.

When both Core devices reboot, each peer knows that it has failed but has no status information from the other peer to indicate that it had been in the ActiveSolo state. Therefore, both Cores remain in the Inactive state. This state ensures that neither Core projects LUNs until you manually activate the correct Core. To activate the correct Core, choose Configure > Failover: Failover Configuration and select Activate Config.

After you activate the correct Core, it transitions to ActiveSolo. After the other Core is brought back into service and the heartbeat between the peers is reestablished, both Core appliances transition to ActiveSelf.

Removing Cores from an HA Configuration

This section describes the procedure for removing two Cores from their failover configuration.

To remove Cores from an HA configuration (basic steps)

1. Force one of the Cores into a failed state by stopping its service.

2. Disable failover on the other Core.

3. Start the service on the first Core again.

4. Disable the failover on the second Core.


You can perform these steps using either the Management Console or the CLI. Figure 6-8 shows an example configuration.

Figure 6-8. Example Configuration of Core HA Deployment

To remove the Cores from an HA deployment using the Management Console (as shown in Figure 6-8)

1. From the Management Console of Core A, choose Settings > Maintenance: Service.

2. Stop the Core service.

3. From the Management Console of Core B, choose Configure > Failover: Failover Configuration.

4. Click Disable Failover.

5. Return to the Management Console of Core A, and choose Settings > Maintenance: Service.

6. Start the Core service.

7. From the Management Console of Core A, choose Configure > Failover: Failover Configuration.

8. Click Disable Failover.

9. Click Activate Local Configuration.

Core A and Core B are no longer operating in an HA configuration.

To remove the Cores from an HA deployment using the CLI (as shown in Figure 6-8)

1. Connect to the CLI of Core A and enter the following commands to stop the Core service:

enable
configure terminal
no service enable

2. Connect to the CLI of Core B and enter the following commands to clear the local failover configuration:

enable
configure terminal
device-failover peer clear
write memory


3. Return to the CLI of Core A and enter the following commands to start the Core service, clear the local failover configuration, and return to nonfailover mode:

enable
configure terminal
service enable
device-failover peer clear
device-failover self-config activate
write memory

Core A and Core B are no longer operating in an HA configuration.

Storage Edge High Availability

This section contains the following topics:

“BlockStream-Enabled SteelHead EX High Availability” on page 80

“SteelFusion Edge Appliance High Availability” on page 88

BlockStream-Enabled SteelHead EX High Availability

This section describes high-availability (HA) deployments for BlockStream-enabled SteelHead EX. It contains the following topics:

“Using the Correct Interfaces for BlockStream-Enabled SteelHead EX Deployment” on page 81

“Choosing the Correct Cables” on page 83

“Overview of BlockStream-Enabled SteelHead EX HA” on page 84

“BlockStream-Enabled SteelHead EX with MPIO” on page 85

“BlockStream-Enabled SteelHead EX HA Using Blockstore Synchronization” on page 86

“BlockStream-Enabled SteelHead EX HA Peer Communication” on page 87

Note: This section assumes that you understand the procedures for VSP HA, Edge HA, and SteelFusion storage, and have read the relevant sections in the SteelHead Management Console User’s Guide for SteelHead EX (xx60).

Storage Edge presents itself as a storage portal to application servers in the branch, whether those servers are located inside VSP on the BlockStream-enabled SteelHead EX (SteelHead EX) or run externally on physical or virtual server platforms. Depending on the model of SteelHead EX, the Storage Edge function is either a licensed option or a no-cost option.

From a Storage Edge perspective, the SteelHead EX can be deployed with all three functions enabled or in Granite-only mode, in which VSP and Edge are the only functions available on the appliance. In Granite-only mode, both Data Streamlining and Transport Streamlining are still applied to the Rdisk connections for each projected LUN. This process requires the presence of a SteelHead in the data center in which the Core is located.


Depending on the requirements in the branch, Storage Edge can offer both projected LUNs and local LUNs. Projected LUNs are hosted within storage arrays in the data center and projected by a Core device across the WAN to the Edge. Both read and write operations are serviced by the Edge, and any written data is asynchronously sent back across the WAN link to the data center. Local LUNs are hosted internally by the Edge, and any read and write operations are performed only within the storage of the Edge itself. Whether the LUNs are pinned, unpinned, or local, they all occupy disk capacity in the blockstore of the Edge.
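
The following illustrative Python sketch summarizes that behavior from the Edge's point of view: every write lands in the blockstore and is acknowledged locally, but only writes to projected LUNs are queued for asynchronous replay to the Core across the WAN. The class and method names are conceptual placeholders, not product APIs.

from collections import deque

class EdgeBlockstore:
    """Conceptual model of Edge write handling for projected and local LUNs."""

    def __init__(self):
        self.blocks = {}          # (lun, lba) -> data; all LUN types consume blockstore space
        self.wan_queue = deque()  # writes awaiting asynchronous replay to the Core

    def write(self, lun, lba, data, lun_type):
        self.blocks[(lun, lba)] = data               # written and acknowledged locally
        if lun_type == "projected":                  # pinned or unpinned projected LUN
            self.wan_queue.append((lun, lba, data))  # sent back to the data center asynchronously
        # lun_type == "local": the data stays within the Edge blockstore only
        return "ack"

edge = EdgeBlockstore()
edge.write("datastore1", 100, b"vm data", lun_type="projected")
edge.write("scratch", 7, b"temp data", lun_type="local")
print(len(edge.wan_queue))   # 1: only the projected-LUN write goes back across the WAN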

The following sections describe configuring SteelHead EX in HA with iSCSI access to the blockstore by the application servers and the contents of the blockstore (the LUNs).

Using the Correct Interfaces for BlockStream-Enabled SteelHead EX Deployment

This section reviews the network interfaces on SteelHead EXs and how you can configure them for Storage Edge. For more information about Edge ports, see “Edge Appliances Ports” on page 58.

By default, all SteelHead EXs are equipped with the following physical interfaces:

Primary

Auxiliary

lan0_0

wan0_0

lan0_1

wan0_1

Traditionally, the LAN and WAN interface pairs are used by the SteelHead as an in-path interface for WAN optimization. The primary and auxiliary interfaces are generally used for management and other services, like RiOS data store synchronization between SteelHead pairs.

A SteelHead EX configured with Storage Edge can use these interfaces in different ways. For details about port usage for both Edge and VSP, see the SteelHead Management Console User’s Guide for SteelHead EX (xx60).

While there are many combinations of port usage, you can generally expect that iSCSI traffic to and from external servers in the branch uses the primary interface. Likewise, the Rdisk traffic to and from the Core uses the primary interface by default and is routed through the SteelHead inpath0_0 interface. The Rdisk traffic gains some benefit from WAN optimization. Management traffic for the SteelHead and Storage Edge typically uses the auxiliary interface.


Figure 6-9 shows a basic configuration example for SteelHead EX deployment. The Storage Edge traffic flows for Rdisk and iSCSI traffic are shown.

Figure 6-9. Basic Interface Configuration for SteelHead EX with External Servers

Figure 6-10 shows no visible sign of iSCSI traffic because the servers that are using the LUNs projected from the data center are hosted within the VSP resident on the SteelHead EX. Therefore, all iSCSI traffic is internal to the appliance. If there is no other need for the SteelHead or Storage Edge functions to be connected for general branch office WAN optimization purposes (in the case of a Granite-only deployment), then the primary interface can be connected directly to the lan0_0 interface using a crossover cable, enabling the Rdisk traffic to flow in and out of the primary interface. In this case, management of the appliance is performed through the auxiliary interface.

Figure 6-10. Basic Interface Configuration for SteelHead EX with Servers Hosted in VSP

Figure 6-11 shows a minimal interface configuration. The iSCSI traffic is internal to the appliance in which the servers are hosted within VSP. Because you can configure Storage Edge to use the SteelHead in-path interface for Rdisk traffic, this makes for a very simple and nondisruptive deployment. The primary interface is still connected and can be used for management.


Riverbed does not recommend this type of deployment for permanent production use, but it can be suitable for a proof of concept in lieu of a complicated design.

Figure 6-11. Alternative Interface Configuration for Edge with Servers Hosted in VSP

Riverbed recommends that you make full use of all the connectivity options available in the SteelHead EX for production deployments of Storage Edge. Careful planning can ensure that important traffic types, such as iSCSI traffic to external servers, Rdisk traffic to and from the Core, and blockstore synchronization for high availability, are kept apart from each other. This separation helps with ease of deployment, creates a more defined management framework, and simplifies any potential troubleshooting activity.

Depending on the model, SteelHead EX can be shipped or configured in the field with one or more additional four-port network interface cards (NICs). By default, when the additional NIC is installed, the SteelHead recognizes it as a four-port bypass NIC that you can use for WAN optimization. You must reconfigure the NIC, using the hardware nic slot command, if you want Edge to use it for iSCSI traffic and high availability.

The command requires the number of the slot where the NIC is located and the mode of operation. For example:

amnesiac (config) # hardware nic slot 1 mode data

This command configures the four-port NIC in slot one of the SteelHead EX into data mode so that it can be used exclusively by the Storage Edge.

For more details about this command, consult the latest version of the Riverbed Command-Line Interface Reference Manual. For additional details about four-port network interface cards, see “Edge Appliance Network Reference Architecture” on page 165.

Storage Edge requires an additional four-port NIC in Edge HA deployments. If you do not install an additional NIC, the primary and auxiliary interfaces easily become a bottleneck.

Choosing the Correct Cables

The LAN and WAN ports on the SteelHead bypass cards act like host interfaces during normal operation. During fail-to-wire mode, the LAN and WAN ports act as the ends of a crossover cable. Using the correct cable to connect these ports to other network equipment ensures proper operation during fail-to-wire mode and normal operating conditions. Correct cabling is especially important when you are configuring two SteelHead EXs in a serial in-path deployment for HA.

Riverbed recommends that you do not rely on automatic MDI/MDI-X to automatically sense the cable type. The installation might be successful when the SteelHead is optimizing traffic, but it might not be successful if the in-path bypass card transitions to fail-to-wire mode.


One way to help ensure that you use the correct cables during an installation is to connect the LAN and WAN interfaces of the SteelHead while the SteelHead is powered off. This proves that the devices on either side of the SteelHead can communicate correctly without any errors or other problems.

In the most common in-path configuration, a SteelHead LAN port is connected to a switch and the SteelHead WAN port is connected to a router. In this configuration, a straight-through Ethernet cable can connect the SteelHead LAN to the switch, and you must use a crossover cable to connect the SteelHead WAN port to the router.

When you configure Storage Edge in HA, it is likely that you have one or more additional NICs installed into the SteelHead EX to provide extra interfaces. You can use the interfaces for MPIO and blockstore synchronization. In this scenario, configure the NIC for data mode. For details about configuring the NIC, see “Using the Correct Interfaces for BlockStream-Enabled SteelHead EX Deployment” on page 81.

When you configure a NIC for data mode, the individual interfaces do not behave in the same way as LAN and WAN ports described previously. There is no fail-to-wire capability, but instead, each interface (data port) behaves like any standard network interface port and you can choose cables accordingly.

This table summarizes the correct cable usage in the SteelHead when you are connecting LAN and WAN ports or when you are connecting data ports.

Devices                   Cable
SteelHead to SteelHead    Crossover
SteelHead to router       Crossover
SteelHead to switch       Straight-through
SteelHead to host         Crossover

Overview of BlockStream-Enabled SteelHead EX HA

This section describes HA features, design, and deployment of Storage Edge on SteelHead EX. You can assign the LUNs provided by Storage Edge (which are projected from the Core in the data center) in a variety of ways. Whether used as a datastore for VMware ESXi in the VSP of the SteelHead EX, or for other hypervisors and discrete servers hosted externally in the branch office, the LUNs are always served from the Edge using the iSCSI protocol.

Because of this, you can achieve HA with Edge by using one or both of the following two options:

“BlockStream-Enabled SteelHead EX with MPIO” on page 85

“BlockStream-Enabled SteelHead EX HA Using Blockstore Synchronization” on page 86

Both of these options are independent of any HA Core configuration in the data center that is projecting one or more LUNs to the Storage Edge.

However, because of different SteelHead EX and Storage Edge deployment options and configurations, there are several scenarios for HA.

For example, you can consider hardware redundancy inside the SteelHead EX, such as multiple power supplies or RAID, to be a form of HA. For more information, see the product specification documents.

Alternatively, when you deploy two SteelHead EXs in the branch, you can configure the VSP on both devices to provide an active-passive capability for any VMs that can be hosted on VSP. In this context, HA is purely from the point of view of the VMs themselves, and there is a separate SteelHead EX providing a failover instance of the VSP configuration.

For more details about how to configure Storage Edge HA and VSP HA, see the SteelHead Management Console User’s Guide for SteelHead EX (Series xx60).



BlockStream-Enabled SteelHead EX with MPIO

In a similar way to how you use MPIO with Core and data center storage arrays, you can use Edge with MPIO at the branch. Using Edge with MPIO ensures that a failure of any single component (such as a network interface card, switch, or cable) does not result in a communication problem between Edge and the iSCSI Initiator in the host device at the branch.

Figure 6-12 shows a basic MPIO architecture for the Edge. In this example, the primary and eth1_0 interfaces of the Edge are configured as the iSCSI portals and the server interfaces (NIC-A and NIC-B) are configured as iSCSI Initiators. Combined with the two switches in the storage network, this basic configuration allows for failure of any of the components in the data path while continuing to enable the server to access the iSCSI LUNs presented by the Storage Edge.

Figure 6-12. Basic Topology for the Edge MPIO

While you can use other interfaces on the SteelHead EX as part of an MPIO configuration, Riverbed recommends that you use the primary interface and one other interface that you are not using for another purpose. The SteelHead EX can have an additional four-port NIC installed to provide extra interfaces. The NIC is especially useful in HA deployments. The eth1_0 interface in this example is provided by the add-on four-port NIC.

For more information about four-port NICs, see “Using the Correct Interfaces for BlockStream-Enabled SteelHead EX Deployment” on page 81.

When using MPIO with SteelHead EX, Riverbed recommends that you verify and adjust certain timeout variables of the iSCSI Initiator in the server to make sure that you have correct failover behavior.

By default, the Microsoft iSCSI Initiator LinkDownTime timeout value is set to 15 seconds. This timeout value determines how much time the initiator holds a request before reporting an iSCSI connection error.

If you are using Storage Edge in an HA configuration, and MPIO is configured in the Microsoft iSCSI Initiator of the branch server, change the LinkDownTime timeout value to 60 seconds to allow the failover to finish.
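
If you prefer to script the change rather than edit the initiator settings by hand, the following Python sketch (using the standard winreg module) shows one possible approach. The registry path below is the location commonly documented for the Microsoft iSCSI Initiator timeout parameters; treat it as an assumption, verify it on your own system, run the script as Administrator, and note that a reboot or initiator restart might be required for the new value to take effect.

# Illustrative sketch: raise the Microsoft iSCSI Initiator LinkDownTime to 60 seconds.
# Assumption: the initiator's timeout parameters live under the SCSI adapter class key
# shown below, which is the commonly documented location; verify before use.
import winreg

CLASS_KEY = r"SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}"

def set_link_down_time(seconds=60):
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CLASS_KEY) as class_key:
        index = 0
        while True:
            try:
                instance = winreg.EnumKey(class_key, index)   # "0000", "0001", ...
            except OSError:
                break                                         # no more instances
            index += 1
            params_path = CLASS_KEY + "\\" + instance + "\\Parameters"
            try:
                with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, params_path, 0,
                                    winreg.KEY_READ | winreg.KEY_SET_VALUE) as params:
                    winreg.QueryValueEx(params, "LinkDownTime")   # only touch keys that have it
                    winreg.SetValueEx(params, "LinkDownTime", 0, winreg.REG_DWORD, seconds)
                    print("LinkDownTime set to", seconds, "seconds under", params_path)
            except OSError:
                continue                                      # instance without the parameter

if __name__ == "__main__":
    set_link_down_time(60)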

Note: When you view the iSCSI MPIO configuration from the ESXi vSphere management interface, even though both iSCSI portals are configured and available, only iSCSI connections to the active Storage Edge are displayed.

For more details about the specific configuration of MPIO, see the SteelHead Management Console User’s Guide for SteelHead EX (Series xx60).


BlockStream-Enabled SteelHead EX HA Using Blockstore Synchronization

While MPIO can cater to HA requirements involving network redundancy in the branch office, it still relies on the Storage Edge itself being available to serve LUNs. To survive a failure of the Storage Edge without downtime, you must deploy a second appliance. If you configure two SteelHead EX appliances as an HA pair, the Storage Edge can continue serving LUNs without disruption to the servers in the branch, even if one of the SteelHead EX devices fails. The LUNs served in a SteelHead EX HA deployment can be used by the VSP and by external servers within the branch office.

The scenario described in this section has two SteelHead EXs operating in an active-standby role. This scenario is irrespective of whether the Core is configured for HA in the data center.

The active SteelHead EX is connected to the Core in the data center and is responding to the read and write requests for the LUNs it is serving in the branch. This process is effectively the same method of operation as with a single Storage Edge; however, there are some additional pieces that make up the complete picture for HA.

The standby SteelHead EX does not service any of the read and write requests but is ready to take over from the active peer.

As the server writes new data to LUNs through the blockstore of the active SteelHead EX, the data is replicated synchronously to the standby peer blockstore. When the standby peer has acknowledged to the active peer that it has written the data to its own blockstore, the active peer then acknowledges the server. In this way, the blockstores of the two Storage Edges are kept in lock step.
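
The ordering of acknowledgments is the key point, and the following illustrative Python sketch makes it explicit: the active peer acknowledges the branch server only after the standby peer has acknowledged its synchronous copy of the write. The classes are a conceptual model, not the product implementation.

class Blockstore:
    def __init__(self):
        self.blocks = {}

    def write(self, lun, lba, data):
        self.blocks[(lun, lba)] = data
        return True   # acknowledgment


class StandbyEdge:
    def __init__(self):
        self.blockstore = Blockstore()

    def replicate(self, lun, lba, data):
        # The standby writes to its own blockstore, then acknowledges the active peer.
        return self.blockstore.write(lun, lba, data)


class ActiveEdge:
    def __init__(self, standby):
        self.blockstore = Blockstore()
        self.standby = standby

    def handle_server_write(self, lun, lba, data):
        self.blockstore.write(lun, lba, data)            # write to the local blockstore
        if not self.standby.replicate(lun, lba, data):   # synchronous copy to the standby peer
            raise RuntimeError("standby did not acknowledge; blockstores are not in lock step")
        return "ack to server"                           # the server is acknowledged last


active = ActiveEdge(StandbyEdge())
print(active.handle_server_write("datastore1", 42, b"new block"))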

Figure 6-13 illustrates a basic HA configuration for Storage Edge on SteelHead EX. While this is a very simplistic deployment diagram, it highlights the importance of the best practice of using two dedicated interfaces between the Storage Edge peers to keep their blockstores synchronized. With an additional four-port NIC installed in the SteelHead EXs, you can configure the Storage Edges to use eth1_2 and eth1_3 as their interfaces for synchronization and failover status. Using dedicated interfaces connected through crossover cables minimizes the possibility of a split-brain scenario (in which both peer devices think the other has failed and start serving independently). If dedicated, directly connected interfaces are not possible for some reason, then connections that go through some combination of switches and routers must use diverse paths and network equipment to avoid a single point of failure.

For more information about split brain, see “Recovering from Split-Brain Scenarios Involving Edge Appliance HA” on page 95.

Figure 6-13. Basic Topology for SteelHead EX High Availability


BlockStream-Enabled SteelHead EX HA Peer Communication

When you configure two SteelHead EXs as active-standby peers for HA, they communicate with each other at regular intervals. The communication is required to ensure that the peers have their blockstores synchronized and that they are operating correctly based on their status (active or standby).

The blockstore synchronization happens through two network interfaces that you configure for this purpose on the SteelHead EX. Ideally, these are dedicated interfaces, preferably connected through crossover cables. Although not the preferred method, you can send blockstore synchronization traffic through other interfaces that are already being used for another purpose. If interfaces must be shared, then avoid using the same interfaces for both iSCSI traffic and blockstore synchronization traffic. These two types of traffic are likely to be quite intensive. Instead, use an interface that is more lightly loaded: for example, management traffic.

The interfaces used for the actual blockstore synchronization traffic are also used by each peer to check the status of one another through the heartbeat messages. The heartbeat messages provide each peer with the status of the other peer and can include peer configuration details.

A heartbeat message is sent by default every three seconds through TCP port 7972. If the peer fails to receive three successive heartbeat messages, then a failover event can be triggered. Because heartbeat messages are sent in both directions between Storage Edge peers, there is a worst-case scenario in which failover can take up to 18 (3 x 3 x 2) seconds.
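
The arithmetic behind that worst case is simply the heartbeat interval multiplied by the number of missed heartbeats and by the two directions of heartbeat traffic, as this trivial Python snippet shows:

HEARTBEAT_INTERVAL = 3    # seconds between heartbeat messages
MISSED_HEARTBEATS = 3     # consecutive misses before a failover is triggered
DIRECTIONS = 2            # heartbeats flow in both directions between the peers

worst_case_seconds = HEARTBEAT_INTERVAL * MISSED_HEARTBEATS * DIRECTIONS
print(worst_case_seconds)   # 18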

Failovers can also occur due to administrative intervention: for example, rebooting or powering off a SteelHead EX.

The blockstore synchronization traffic is sent between the peers using TCP port 7973. By default, the traffic uses the first of the two interfaces you configure. If the interface is not responding for some reason, the second interface is automatically used.

If neither interface is operational, then the Edge peers enter a predetermined failover state based on the failure conditions.

The failover state on an Edge peer can be one of the following:

Discover - Attempting to establish contact with the other peer.

Active Sync - Actively serving client requests; the standby peer is in sync with the current state of the system.

Standby Sync - Passively accepting updates from the active peer; in sync with the current state of the system.

Active Degraded - Actively serving client requests; cannot contact the standby peer.

Active Rebuild - Actively serving client requests; sending the standby peer updates that were missed during an outage.

Standby Rebuild - Passively accepting updates from the active peer; not yet in sync with the state of the system.

Note: In the scenario in which a failed Storage Edge is recovering or replaced such that the entire blockstore contents needs to be resynchronized from the active Storage Edge, the time for resynchronization is not fixed. The time varies depending on how much data there is, how much data is changing during resynchronization, and how busy the active Storage Edge is with other tasks. For a guideline, assume that the resynchronization proceeds at the rate of 5 to 6 hours per terabyte (TB) of blockstore data.
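
As a quick planning aid, the guideline above can be turned into a rough estimate. The helper below is illustrative only; the real duration depends on the change rate during resynchronization and on how busy the active Storage Edge is.

def resync_estimate_hours(blockstore_tb, hours_per_tb=(5, 6)):
    """Rough resynchronization window based on the 5-6 hours per TB guideline."""
    low, high = hours_per_tb
    return blockstore_tb * low, blockstore_tb * high

low, high = resync_estimate_hours(2.5)   # for example, 2.5 TB of blockstore data
print("Expect roughly %.0f to %.0f hours to resynchronize" % (low, high))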


For detailed information about how to configure two SteelHead EXs as active-standby failover peers, the various failover states that each peer can assume while in an HA deployment, and the procedure required to remove an active-standby pair from that state, see the SteelHead Management Console User’s Guide for SteelHead EX (Series xx60).

SteelFusion Edge Appliance High Availability

The SteelFusion Edge appliance presents itself as an iSCSI target for storage to application servers in one or both of the following modes:

Storage is resident in the RiOS node and is accessed through iSCSI internally to the appliance by the hypervisor node. In this mode, the hypervisor running in the hypervisor node is acting as the iSCSI initiator.

Storage is accessed through an iSCSI initiator on a separate server or hypervisor host that is external to the SteelFusion Edge.

In either deployment mode, in the unlikely event of failure or a scheduled loss of service due to planned maintenance, you might need an alternative way to access the storage and ensure continued availability of services in the branch. Deploying two SteelFusion Edges in a high availability (HA) configuration enables this access.

This section describes HA deployments for the SteelFusion Edge appliance. It contains the following topics:

“Using the Correct Interfaces for SteelFusion Edge Deployment” on page 88

“Choosing the Correct Cables” on page 91

“Overview of SteelFusion Edge HA” on page 91

“SteelFusion Edge with MPIO” on page 92

“SteelFusion Edge HA Using Blockstore Synchronization” on page 93

“SteelFusion Edge HA Peer Communication” on page 94

Note: This guide requires you to be familiar with the SteelFusion Edge Management Console User’s Guide.

Using the Correct Interfaces for SteelFusion Edge Deployment

This section reviews the network interfaces on SteelFusion Edge and how you can configure them. For more information about SteelFusion Edge network interface ports, see “Edge Appliances Ports” on page 58.

By default, all SteelFusion Edge appliances are equipped with the following physical interfaces:

Primary, Auxiliary, eth0_0, eth0_1, lan1_0, wan1_0, lan1_1, wan1_1 - These interfaces are owned and used by the RiOS node in the SteelFusion Edge.

gbe0_0, gbe0_1, gbe0_2, gbe0_3 - These interfaces are owned and used by the hypervisor node in SteelFusion Edge.

Traditionally in a SteelFusion Edge appliance, the LAN and WAN interface pairs are used by the RiOS node as an in-path interface for WAN optimization. The primary and auxiliary interfaces are generally used for management and other services.


In a SteelFusion Edge HA deployment, the eth0_0 and eth0_1 interfaces are used for the heartbeat interconnect between the two SteelFusion Edge HA peers. If there is only a single SteelFusion Edge deployed in the branch, then you can use eth0_0 and eth0_1 as data interfaces for iSCSI traffic to and from servers in the branch that are external to the SteelFusion Edge.

While there are many additional combinations of port usage, you can generally expect that iSCSI traffic to and from external servers in the branch uses the primary interface. Likewise, the Rdisk traffic to and from the Core uses the primary interface by default and is routed through the SteelFusion Edge in-path interface. The Rdisk traffic gains some benefit from WAN optimization. Management traffic for the SteelFusion Edge typically uses the auxiliary interface.

For the hypervisor node, you can use the gbe0_0 to gbe0_3 interfaces for general purpose LAN connectivity within the branch location. These interfaces enable clients to access services running in virtual machines installed on the hypervisor host.

Figure 6-14 shows a basic configuration example for SteelFusion Edge deployment. The SteelFusion Edge traffic flows for Rdisk and iSCSI traffic are shown.

Figure 6-14. Basic Interface Configuration for SteelFusion Edge with External Servers


Figure 6-15 shows no visible sign of iSCSI traffic because the servers that are using the LUNs projected from the data center are hosted within the hypervisor node resident on the SteelFusion Edge. Therefore, all iSCSI traffic is internal to the appliance. If a SteelFusion Edge deployment has no WAN optimization requirement, then you can connect the primary interface directly to the lan1_0 interface using a crossover cable, enabling the Rdisk traffic to flow in and out of the primary interface. In this case, management of the appliance is performed through the auxiliary interface. Clients in the branch access the servers in the hypervisor node through the network interfaces gbe0_0 to gbe0_3 (not shown in Figure 6-15).

Figure 6-15. Basic Interface Configuration for SteelFusion Edge with Servers Hosted in Hypervisor Node

Figure 6-16 shows a minimal interface configuration. The iSCSI traffic is internal to the appliance in which the servers are hosted within the hypervisor node. Because you can configure SteelFusion Edge to use the in-path interface for Rdisk traffic, this makes for a very simple and nondisruptive deployment. The primary interface is still connected and can be used for management. Client access to the servers in the hypervisor node is through the gbe0_0 to gbe0_3 network interfaces (Figure 6-16).

Riverbed does not recommend this type of deployment for permanent production use, but it can be suitable for a proof of concept in lieu of a complicated design.

Figure 6-16. Alternative Interface Configuration for SteelFusion Edge with Servers Hosted in Hypervisor Node

Riverbed recommends that you make full use of all the connectivity options available in the appliance for production deployments of SteelFusion Edge. Careful planning can ensure that important traffic types, such as iSCSI traffic to external servers, Rdisk traffic to and from the Core, and blockstore synchronization for high availability, are kept apart from each other. This separation helps with ease of deployment, creates a more defined management framework, and simplifies any potential troubleshooting activity.

Depending on the model, SteelFusion Edge can be shipped or configured in the field with one or more additional multiple-port network interface cards (NICs). There is a selection of copper and optical 1-GbE and 10-GbE NICs that fall into one of two categories: bypass cards, which are suitable for use as in-path interfaces for WAN optimization, and data cards, which are suitable for LAN connectivity.

In the case of LAN connectivity, the data cards might be used for any of the following:


iSCSI traffic to and from servers in the branch that are external to SteelFusion Edge.

Application traffic from clients in the branch connecting to application servers hosted in the SteelFusion Edge hypervisor node.

Additional LAN connectivity for redundancy purposes (for example, MPIO, SteelFusion Edge HA, and so on).

Unlike the NICs available for SteelHead EX, these NICs cannot be changed from data mode to in-path mode or vice versa.

For a complete list of available NICs, their part numbers, and installation details, see the SteelFusion Edge Hardware Installation and Maintenance Guide.

Choosing the Correct Cables

The LAN and WAN ports on the SteelFusion Edge bypass cards act like host interfaces during normal operation. During fail-to-wire mode, the LAN and WAN ports act as the ends of a crossover cable. Using the correct cable to connect these ports to other network equipment ensures proper operation during fail-to-wire mode and normal operating conditions. This cabling is especially important when you are configuring two SteelFusion Edge appliances in a serial in-path deployment for HA.

Riverbed recommends that you do not rely on automatic MDI/MDI-X to automatically sense the cable type. The installation might be successful when the SteelFusion Edge is optimizing traffic, but it might not be successful if the in-path bypass card transitions to fail-to-wire mode.

One way to help ensure that you use the correct cables during an installation is to connect the LAN and WAN interfaces of the bypass card while the SteelFusion Edge is powered off. This proves that the devices on either side of the appliance can communicate correctly without any errors or other problems.

In the most common in-path configuration, a SteelFusion Edge LAN port is connected to a switch and the SteelFusion Edge WAN port is connected to a router. In this configuration, a straight-through Ethernet cable can connect the LAN port to the switch, and you must use a crossover cable to connect the WAN port to the router.

When you configure SteelFusion Edge in HA, it is likely that you have one or more additional data NICs installed into the appliance to provide extra interfaces. You can use the interfaces for MPIO and blockstore synchronization.

This table summarizes the correct cable usage in the SteelFusion Edge when you are connecting LAN and WAN ports or when you are connecting data ports.

Devices                                 Cable
SteelFusion Edge to SteelFusion Edge    Crossover
SteelFusion Edge to router              Crossover
SteelFusion Edge to switch              Straight-through
SteelFusion Edge to host                Crossover

Overview of SteelFusion Edge HA

This section describes HA features, design, and deployment of SteelFusion Edge. You can assign the LUNs provided by SteelFusion Edge (which are projected from the Core in the data center) in a variety of ways. Whether used as a datastore for VMware ESXi in the hypervisor node, or for other hypervisors and discrete servers hosted externally in the branch office, the LUNs are always served from the SteelFusion Edge using the iSCSI protocol.



Because of this, you can achieve HA with SteelFusion Edge by using one or both of the following two options:

“SteelFusion Edge with MPIO” on page 92

“SteelFusion Edge HA Using Blockstore Synchronization” on page 93

These options are independent of any HA Core configuration in the data center that is projecting one or more LUNs to the SteelFusion Edge. However, because of different SteelFusion Edge deployment options and configurations, there are several scenarios for HA. For example, you can consider hardware redundancy consisting of multiple power supplies or RAID inside the SteelFusion Edge appliance as a form of HA. For more information about hardware, see the product specification documents.

Alternatively, when you deploy two SteelFusion Edge appliances in the branch, you can configure the VSP on both devices to provide an active-passive capability for any VMs that are hosted on the hypervisor node. In this context, HA is purely from the point of view of the VMs themselves, and there is a separate SteelFusion Edge providing a failover instance of the hypervisor node.

For more details about how to configure SteelFusion Edge HA, see the SteelFusion Edge Management Console User’s Guide.

SteelFusion Edge with MPIO

In a similar way to how you use MPIO with Core and data center storage arrays, you can use SteelFusion Edge with MPIO at the branch. Using SteelFusion Edge with MPIO ensures that a failure of any single component (such as a network interface card, switch, or cable) does not result in a communication problem between SteelFusion Edge and the iSCSI Initiator in the host device at the branch.

Figure 6-17 shows a basic MPIO architecture for SteelFusion Edge. In this example, the primary and eth2_0 interfaces of the SteelFusion Edge are configured as the iSCSI portals, and the server interfaces (NIC-A and NIC-B) are configured as iSCSI Initiators. Combined with the two switches in the storage network, this basic configuration allows for failure of any of the components in the data path while continuing to enable the server to access the iSCSI LUNs presented by the SteelFusion Edge.

Figure 6-17. Basic Topology for SteelFusion Edge MPIO

While you can use other interfaces on the SteelFusion Edge as part of an MPIO configuration, Riverbed recommends that you use the primary interface and one other interface that you are not using for another purpose. The SteelFusion Edge can have an additional multiport NIC installed to provide extra interfaces. This additional card is especially useful in HA deployments. The eth2_0 interface in this example is provided by an optional add-on four-port NIC that has been installed in slot 2 of the appliance.

For more information about multiport NICs, see “Edge Appliances Ports” on page 58.


When using MPIO with SteelFusion Edge, Riverbed recommends that you verify and adjust certain timeout variables of the iSCSI Initiator in the server to make sure that you have correct failover behavior.

By default, the Microsoft iSCSI Initiator LinkDownTime timeout value is 15 seconds. This timeout value determines how much time the initiator holds a request before reporting an iSCSI connection error.

If you are using SteelFusion Edge in an HA configuration, and MPIO is configured in the Microsoft iSCSI Initiator of the branch server, change the LinkDownTime timeout value to 60 seconds to allow the failover to finish.

Note: When you view the iSCSI MPIO configuration from the ESXi vSphere management interface, even though both iSCSI portals are configured and available, only iSCSI connections to the active SteelFusion Edge are displayed.

For more details about the specific configuration of MPIO, see the SteelFusion Edge Management Console User’s Guide.

SteelFusion Edge HA Using Blockstore Synchronization

While MPIO can cater to HA requirements involving network redundancy in the branch office, it still relies on the SteelFusion Edge itself being available to serve LUNs. To survive a failure of the SteelFusion Edge without downtime, you must deploy a second appliance. If you configure two appliances as an HA pair, the SteelFusion Edge can continue serving LUNs without disruption to the servers in the branch, even if one of the SteelFusion Edge devices fails. The LUNs served in a SteelFusion Edge HA deployment can be used by the VSP of the second SteelFusion Edge and by external servers within the branch office.

The scenario described in this section has two SteelFusion Edges operating in an active-standby role. This scenario is irrespective of whether the Core is configured for HA in the data center.

The active SteelFusion Edge is connected to the Core in the data center and is responding to the read and write requests for the LUNs it is serving in the branch. This method of operation is effectively the same as with a single SteelFusion Edge; however, there are some additional pieces that make up a complete HA deployment.

Note: If you plan to configure two SteelFusion Edge appliances into an HA configuration at a branch, Riverbed strongly recommends you do this at the time of installation. Adding a second SteelFusion Edge to form an HA pair at a later date is possible but is likely to result in disruption to the branch services while the reconfiguration is performed.

The standby SteelFusion Edge does not service any of the read and write requests but is ready to take over from the active peer.

As the server writes new data to LUNs through the blockstore of the active SteelFusion Edge, the data is reflected synchronously to the standby peer blockstore. When the standby peer has acknowledged to the active peer that it has written the data to its own blockstore, the active peer then acknowledges the server. In this way, the blockstores of the two SteelFusion Edges are kept in lock step.

Figure 6-18 illustrates a basic HA configuration for SteelFusion Edge. While this is a very simplistic deployment diagram, it highlights the importance of the best practice to use two dedicated interfaces between the SteelFusion Edge peers to keep their blockstores synchronized. Riverbed strongly recommends you configure the SteelFusion Edges to use eth0_0 and eth0_1 as their interfaces for synchronization and failover status. Using dedicated interfaces through crossover cables ensures that a split-brain scenario (in which both peer devices think the other has failed and start serving independently) is minimized.


For more information about split-brain scenario, see “Recovering from Split-Brain Scenarios Involving Edge Appliance HA” on page 95.

Figure 6-18. Basic Topology for SteelFusion Edge High Availability

SteelFusion Edge HA Peer Communication

When you configure two SteelFusion Edges as active-standby peers for HA, they communicate with each other at regular intervals. The communication is required to ensure that the peers have their blockstores synchronized and that they are operating correctly based on their status (active or standby).

The blockstore synchronization happens through two network interfaces that you configure for this purpose on the SteelFusion Edge. Ideally, these are dedicated interfaces, preferably connected through crossover cables. Although not the preferred method, you can send blockstore synchronization traffic through other interfaces that are already being used for another purpose. If interfaces must be shared, then avoid using the same interfaces for both iSCSI traffic and blockstore synchronization traffic. These two types of traffic are likely to be quite intensive. Instead, use an interface that is more lightly loaded: for example, management traffic.

The interfaces used for the actual blockstore synchronization traffic are also used by each peer to check the status of one another through the heartbeat messages. The heartbeat messages provide each peer with the status of the other peer and can include peer configuration details.

A heartbeat message is sent by default every three seconds through TCP port 7972. If the peer fails to receive three successive heartbeat messages, then a failover event can be triggered. Because heartbeat messages are sent in both directions between SteelFusion Edge peers, there is a worst-case scenario in which failover can take up to 18 (3 x 3 x 2) seconds.

Failovers can also occur due to administrative intervention: for example, rebooting or powering off a SteelFusion Edge.

The blockstore synchronization traffic is sent between the peers using TCP port 7973. By default, the traffic uses the first of the two interfaces you configure. If the interface is not responding for some reason, the second interface is automatically used.

If neither interface is operational, then the SteelFusion Edge peers enter a predetermined failover state based on the failure conditions.

The failover state on a SteelFusion Edge peer can be one of the following:

Discover - Attempting to establish contact with the other peer.

Active Sync - Actively serving client requests; the standby peer is in sync with the current state of the system.


Standby Sync - Passively accepting updates from the active peer; in sync with the current state of the system.

Active Degraded - Actively serving client requests; cannot contact the standby peer.

Active Rebuild - Actively serving client requests; sending the standby peer updates that were missed during an outage.

Standby Rebuild - Passively accepting updates from the active peer; not yet in sync with the state of the system.

For detailed information about how to configure two SteelFusion Edges as active-standby failover peers, the various failover states that each peer can assume while in an HA deployment, and the procedure required to remove an active-standby pair from that state, see the SteelFusion Edge Management Console User’s Guide.

Recovering from Split-Brain Scenarios Involving Edge Appliance HA

Even though the communication between the peers of an Edge HA deployment is designed for maximum resiliency, there is a remote possibility of a failure scenario known as split brain. Split brain can occur if the heartbeat communication between the peers fails in all aspects simultaneously; in other words, both heartbeat interfaces fail at the same time. If these interfaces are directly connected through crossover cables, then the possibility is extremely remote. But if the heartbeat interfaces are connected through network switches, then depending on the design and topology, split brain might occur.

In a split-brain condition, both Storage Edge devices act as if the peer has failed. This can result in both peers being Active Degraded. Because both peers can simultaneously be trying to serve data and also synchronizing data back through the Core, this condition can lead to data integrity issues.

There are ways to recover from this scenario, but the best initial course of action is to contact Riverbed Support and open a support case. Each recovery process is likely to differ, so the actual procedure can vary depending on the failure sequence.

Note: The recovery process can involve accessing a previous snapshot of the affected LUNs.

Testing HA Failover Deployments

There are many ways that you can test a failover configuration of Core HA or Storage Edge HA. These tests may include power-cycling, SteelFusion HA peer device reboot, or any number of network connection failure scenarios (routers, switches, cables).

Your failover test should at least satisfy the basic requirements that ensure the SteelFusion HA deployment recovers as expected.

The simplest test is to perform an orderly reboot of a SteelFusion peer device (Core or Storage Edge) that is one half of an HA configuration.


Configuring WAN Redundancy

This section describes how to configure WAN redundancy. It includes the following topics:

“Configuring WAN Redundancy with No Core HA” on page 96

“Configuring WAN Redundancy in an HA Environment” on page 97

You can configure the Core and Storage Edge with multiple interfaces to use with MPIO. You can consider this a form of local network redundancy. Similarly, you can configure multiple interfaces to provide a degree of redundancy across the WAN between Core and Storage Edge. This redundancy ensures that any failure along the WAN path can be tolerated by Core and Storage Edge, and is called WAN redundancy.

WAN redundancy provides multiple paths for connection in case the main Core to Storage Edge link fails.

To configure WAN redundancy, you perform a series of steps on both the data center and branch side.

You can use either the in-path interfaces (inpathX_Y) or the Ethernet interfaces (ethX_Y) for redundant WAN link configuration. In the examples below, the term intf refers to either in-path or Ethernet network interfaces.

Configuring WAN Redundancy with No Core HA

This section describes how to configure WAN redundancy when you do not have a Core HA deployment.

To configure WAN redundancy

1. Configure local interfaces on the Storage Edge. The interfaces are used to connect to the Core:

From the SteelHead EX Management Console, choose EX Features > Granite: Granite Storage. From the Edge Management Console, choose Storage > Storage Edge Configuration.

Click Add Interface.

Figure 6-19. Storage Edge Interfaces


2. Configure preferred interfaces for connecting to Storage Edge on Core:

From the Core Management Console, choose Configure > Manage: SteelFusion Edges.

Select Show Preferred Interfaces for SteelFusion Edge Connections.

Click Add Interface.

Figure 6-20. Adding Core Interfaces

On first connection, the Core sends all the preferred interface information to the Storage Edge. The Storage Edge uses this information, along with the configured local interfaces, to attempt a connection on each link (local-interface and preferred-interface pair) until a successful connection is formed. The Storage Edge tries each connection three times (waiting three seconds before the next try) before it moves on to the next link. In other words, if the first link fails, the next link is tried in nine seconds.
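
The retry behavior can be pictured with the following illustrative Python sketch, in which connect_once stands in for the actual connection attempt, the interface names are placeholders, and the exact ordering of the pairs is simplified for illustration.

import itertools
import time

ATTEMPTS_PER_LINK = 3
RETRY_WAIT = 3   # seconds between attempts on the same link

def establish_connection(local_interfaces, preferred_interfaces, connect_once):
    """Try each (local interface, preferred interface) pair until one connects."""
    for local, preferred in itertools.product(local_interfaces, preferred_interfaces):
        for _attempt in range(ATTEMPTS_PER_LINK):
            if connect_once(local, preferred):
                return (local, preferred)   # this link carries the connection
            time.sleep(RETRY_WAIT)          # after three failures (nine seconds), move on
    return None                             # no link could be established

# Example with a stub that only succeeds on the second preferred interface:
link = establish_connection(
    ["eth0_0"],
    ["intf_dc_a", "intf_dc_b"],
    connect_once=lambda local, preferred: preferred == "intf_dc_b",
)
print(link)   # ('eth0_0', 'intf_dc_b') after roughly nine seconds of retries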

After the Core and the Storage Edge have established a successful alternative link, the Storage Edge updates its Rdisk configuration with the change, so that the configuration is on the same link as the management channel between Core and Storage Edge.

3. Remove the local interface for WAN redundancy on Storage Edge:

From the SteelHead EX Management Console, choose EX Features > Granite: Granite Storage. From the Edge Management Console, choose Storage > Storage Edge Configuration.

Open the interface you want to remove.

Click Remove Interface.

4. Remove preferred interfaces for WAN redundancy on the Core:

From the Core Management Console, choose Configure > Manage: SteelFusion Edges.

Select Show Preferred Interfaces for SteelFusion Edge Connections.

Open the interface you want to remove.

Delete the interface.

Any change in the preferred interfaces on the Core is communicated to the Edge and the connection is updated as needed.

Configuring WAN Redundancy in an HA Environment

In a Core HA environment, the preferred interface information of the failover Core is sent to the Storage Edge by the primary Core. The connection between the Edge and the failover Core follows the same logic, in which a connection is tried on each link (local-interface and preferred-interface pair) until a connection is formed.



CHAPTER 7 SteelFusion Replication (FusionSync)

This section describes FusionSync, a replication-based technology between Cores that enables seamless branch continuity between two data centers. It includes the following topics:

“Overview of SteelFusion Replication” on page 99

“Architecture of SteelFusion Replication” on page 100

“Failover Scenarios” on page 102

“FusionSync High-Availability Considerations” on page 105

“SteelFusion Replication Metrics” on page 107

As of the publication of this manual, your Cores and SteelFusion Edges must run SteelFusion v4.0 or later to use replication. SteelFusion Replication does not run on BlockStream-enabled SteelHead EX v3.6 or earlier.

Note: FusionSync is accessible through the Replication menu in the Core Management Console and through the coredr set of commands in the CLI. References to replication in this chapter refer to the FusionSync feature.

For more information about SteelFusion Replication, see the SteelFusion Core Management Console User's Guide.

Overview of SteelFusion Replication

A single data center is susceptible to large-scale failures (power loss, natural disasters, hardware failures) that can bring down your network infrastructure. To mitigate such scenarios, SteelFusion Replication enables you to connect branch offices to data centers across geographic boundaries and replicate data between them.

In a SteelFusion setup, typical storage area network (SAN) replication does not protect you from data loss in case of a data center failure, nor from network downtime that can affect many branch offices at the same time. FusionSync enables Cores in two data centers to remain synchronized and enables the Storage Edges to switch to the other Core in case of disaster, preventing data loss and downtime.


Architecture of SteelFusion Replication

This section describes the architecture of SteelFusion Replication. It contains the following topics:

“SteelFusion Replication Components” on page 100

“SteelFusion Replication Design Overview” on page 100

SteelFusion Replication Components

SteelFusion Replication is comprised of the following components:

Primary Core - The Core role that actively serves the storage LUNs to the Edges and replicates new writes to the secondary Core. During normal operations, the primary Core is located at the preferred data center. When a disaster, a failure, or maintenance affects the preferred data center, the primary role fails over to the secondary Core at the disaster recovery site. After the failover, the secondary Core becomes primary.

Secondary Core - The Core role that receives replicated data from the primary Core. The secondary Core does not serve the storage LUNs to the Storage Edges. On failover, the secondary Core changes its role to primary Core.

Journal LUN - A dedicated LUN that is attached to the Cores from the local backend storage array and logs the write operations from Storage Edges when replication is suspended.

Witness - A role assigned to one of the Storage Edges. The Witness registers requests from the Cores to suspend replication and makes sure that only one Core at a time suspends its replication. This prevents the Storage Edge write operations from being logged to the Journal LUNs on both sides.

You must meet the following requirements to set up replication:

You must configure the backend storage array for each Core that is included in the replication configuration.

The primary data center can reach the secondary data center through the chosen interfaces.

The secondary Core cannot have any active Storage Edges or LUNs.

You must use symmetric HA configurations on each Core. That is, if you have HA configured on the primary Cores of the data center, you must also configure it on the secondary Cores.

The Storage Edges can reach the secondary data center.

SteelFusion Replication Design Overview

This section describes the communication and general data flow between Cores that are deployed as part of a SteelFusion Replication (FusionSync) design.

In a deployment with just a single data center and a Core without FusionSync, the Storage Edge acknowledges the write operations to local hosts in the branch and then asynchronously sends the write operations to the Core. The Core writes the data to the LUN in the data center storage array. After the Core has received an acknowledgment from the storage array, the Core then acknowledges the Storage Edge. In this way, the Storage Edge always maintains data consistency.

To maintain data consistency between the Storage Edge and the two data centers (with a Core in each data center and FusionSync configured), the data flow is somewhat different.


In the steady state, the Storage Edge acknowledges the write operations to local hosts and asynchronously sends them to its Core. When you configure FusionSync, the primary Core applies the write operations to the backend storage and replicates them to the secondary Core. The data is replicated between the Cores synchronously, meaning that a write operation is acknowledged to the Storage Edge only when both the local storage array and the secondary Core (together with its own storage array) have acknowledged the data.

If the primary Core loses its connection to the secondary Core, it pauses FusionSync. When FusionSync is paused, writes from the Storage Edge to the Core are not acknowledged by the Core. The Storage Edge continues to acknowledge the local hosts in the branch and buffers the writes, similar to its behavior when WAN connectivity to the Core goes down in a deployment without FusionSync. Although write operations between the Storage Edge and the Core are not acknowledged, read operations are not affected, and read requests from the Edges continue to be serviced by the same Core as normal.

When the connectivity comes back up, FusionSync continues automatically. If, for any reason, the connectivity between the Cores takes a long time to recover, the uncommitted data in the Storage Edges might continue to increase. If the blockstore write reserve is in danger of reaching capacity, you can suspend FusionSync. When FusionSync is suspended, the primary Core accepts writes from the Storage Edges and keeps a log of the write operations on its Journal LUN.

When a primary Core is down, a secondary Core can take over the primary role. You must manually initiate the failover on the secondary Core. Because the Storage Edges maintain connectivity to both Cores, primary and secondary, when the failover occurs, all Storage Edge data connections move to the secondary Core. At this point, the secondary Core becomes primary with its replication suspended. The new primary Core acknowledges writes from the Edges, applies them to the storage array, and logs the operations to the Journal LUN. When the connectivity between the Cores is restored, the new primary Core starts resynchronizing the writes logged in the Journal LUN to the Core in the original data center (the old primary Core). In this recovery scenario, the old primary Core becomes the secondary Core.

As with any high-availability scenario, there is a possibility of a split-brain condition. In the case of SteelFusion Replication, this occurs when both the primary and secondary Cores are up and visible to the Edges but cannot communicate with each other. FusionSync could become suspended on both sides, and the Edges would send writes to both Cores. Writing to both sides leads to a condition in which both Cores are journaling and neither Core has a consistent copy of the data. More than likely, split brain results in data loss. To prevent this issue, you must define one of the Storage Edges as a Witness.

The Witness must approve a request to suspend replication. When the request is approved, the Core can start logging the writes to the Journal LUN. The Witness makes sure that the primary and secondary Cores do not both get approval for suspension at the same time.

When the connectivity between data centers is restored, you can fail back to the original data center by initiating a failover at the original data center. Because a failover is only possible if the primary Core is not reachable by the secondary Core and the Witness, you must manually bring down the primary Core prior to the failover. You can accomplish this by stopping the service on the primary Core, as shown in the example that follows.
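
The following is a minimal sketch of stopping the Core service from the CLI, using the same commands shown later in this guide for importing a configuration; confirm the exact procedure for your release:

enable
configure terminal
no service enable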


Figure 7-1 shows the general design of the FusionSync feature.

Figure 7-1. Replication Design Overview

The data flow in the diagram is as follows:

1. A write operation is asynchronously propagated from the Storage Edge to the primary Core (Core X).

2. The Core applies the write operation to the backend storage and synchronously replicates the write to the secondary Core (Core XX).

3. The backend storage acknowledges the write.

4. The secondary Core (Core XX) applies the write to its backend storage and acknowledges the write.

5. The Core acknowledges the write to the Storage Edge.

Failover Scenarios

This section describes the following failover scenarios:

“Secondary Site Down” on page 103

“Replication Is Suspended at the Secondary Site” on page 103

“Primary Site Is Down (Suspended)” on page 104


Secondary Site Down

Figure 7-2 shows the traffic flow if the secondary site goes down and FusionSync is paused.

Figure 7-2. Secondary Site Down

Writes from the Storage Edges are not acknowledged by the primary Core, and the Storage Edges start to buffer the writes locally. Read operations from the Core, and read and write operations in the branch office between the Storage Edge and hosts, are performed as usual.

Replication Is Suspended at the Secondary Site

Figure 7-3 shows the traffic flow if FusionSync is suspended at the secondary site.

Figure 7-3. Replication Is Suspended at the Secondary Site


1. A write operation is asynchronously propagated to the primary Core from the Storage Edge.

2. The Core applies the write operation to backend storage and logs the write to Journal LUN.

3. The backend storage acknowledges the write.

4. The Core waits for the write to be journaled.

5. The Core acknowledges the write to the Storage Edge.

Primary Site Is Down (Suspended)

Figure 7-4 shows the traffic flow when the primary site is down and a failover is initiated: the secondary Core assumes the primary role, and the connected Storage Edges fail over to the secondary site.

Figure 7-4. Primary Site Is Down

1. A write operation is asynchronously propagated from the Storage Edge to the secondary Core.

2. The Core applies the write operation to backend storage and logs the write to the Journal LUN.

3. The backend storage acknowledges the write.

4. The Core waits for the write to be journaled.

5. The Core acknowledges the write to the Storage Edge.


FusionSync High-Availability Considerations

This section describes design considerations when FusionSync is implemented in conjunction with high availability (HA). It contains the following topics:

“FusionSync Core HA” on page 105

“Replication HA Failover Scenarios” on page 106

For information about HA, see “SteelFusion Appliance High-Availability Deployment” on page 67.

FusionSync Core HA

Riverbed strongly recommends that you set up your Cores for HA. Consider FusionSync an enhancement to HA that protects against data center failures, not a replacement for HA within the data center. Core HA is active-active by design, with each Core separately serving a different set of LUNs to its Storage Edges. Therefore, each Core needs FusionSync configured for its LUNs to its replication peer.

Note: When you have configured Core HA, one Core has the role of the leader and the other has the role of the follower.

Consider the following factors when configuring replication for Cores set up for HA:

Some configuration changes are only possible on the leader Core. If the leader is down, the follower Core assumes the role of leader. For example, suspend and failover operations are only allowed on the leader.

You need to configure the same Journal LUN on both the leader and the follower Cores.

If the primary Core is set up for HA, the secondary Core must be configured for HA too.

HA must be set up before you configure replication.

Cores configured for HA must have the same replication role and data center name.

After you have configured replication, you cannot clear the Core HA.

When setting up the replication pair, both primary Cores in an HA configuration need to be paired with their peers in the secondary data center. For example, Core X is paired with Core X', and Core Y is paired with Core Y'.

Terminating replication from the primary Core that is set up for HA terminates FusionSync for all four nodes.


Replication HA Failover Scenarios

This section describes SteelFusion Core HA failover scenarios with FusionSync.

Figure 7-5 shows Cores configured for HA at Site A replicating to Site B. Both nodes of the active-active HA cluster are operational and replicating different LUNs to their peers.

Figure 7-5. HA Replication in Normal State

Figure 7-6 shows how HA failover at Site A affects replication. Core Y has failed, and Core X has taken over responsibility for serving Core Y's LUNs to the Storage Edges. Core X replicates those LUNs to Core YY while continuing to replicate its own LUNs to Core XX.

Figure 7-6. Core HA Failover on Primary Data Center


Figure 7-7 shows how HA failover at Site B affects replication. Core YY has failed, and Core XX takes over the role of replication target, accepting replication from Core Y while continuing to accept replication traffic from Core X.

Figure 7-7. Core HA Failover on Secondary Data Center

Figure 7-8 shows how HA failover at both sites affects replication. After Core Y and Core YY fail, Core X assumes the role of replication source and Core XX assumes the role of replication target.

Figure 7-8. Core HA Failover on Primary and Secondary Data Centers

SteelFusion Replication Metrics

When you deploy FusionSync between Cores in two data centers, you should understand the traffic workload and other metrics on the link between the Cores to help with sizing and troubleshooting.


When FusionSync is running, the packets sent between the Cores consist of a very small header and payload. During the initial synchronization of a LUN from primary to secondary data center, the payload is fixed at 128 KB.

After the initial synchronization is complete and the LUNs are active, the payload size is exactly the same as the iSCSI write that occurred at the remote site. This write is the write between the iSCSI initiator of the server and the iSCSI target of the Edge. The actual size depends on the initiator and server operating system, but the payload size can be as large as 512 KB.

Whatever the payload size is between the primary and secondary, the Core honors the MTU setting of the network between data centers.

The maximum replication bandwidth consumed between the two data centers is the sum of the write throughput across all locations in which there are active SteelFusion Edge appliances installed. This is because all active branch locations are sending committed data to the Core in the primary data center, which is then sent on to the secondary data center. To reduce the quantity of traffic between data centers, Riverbed recommends that you use SteelHeads on the link between the two locations.

By default, each Core uses a single 1-Gbps network interface. A Core in the primary data center maintains two TCP replication connections for replication to the Core in the secondary data center. If you use multiple network interfaces on each Core, then multiple TCP connections share the available bandwidth on the link between data centers.

In general, the number of connections is calculated by using the following formula:

Total replication connections = ((2 x number of replication interfaces) x number of Cores in primary) + ((2 x number of replication interfaces) x number of Cores in secondary)

For example, consider the following deployments:

A single Core configured with one replication interface in the primary data center, and a single Core in the secondary data center, also with a single replication interface. In this scenario there would be two replication connections for the primary and two for the secondary, resulting in a total of four connections.

Single Cores that each have two network interfaces would mean a total of (2 x 2) + (2 x 2) = 8 replication connections.

Two Cores per data center, each with a single replication interface, means a total of (2 x 1) + (2 x 1) + (2 x 1) + (2 x 1) = 8 connections.

A deployment with two Cores and two replication interfaces per data center has a total of (2 x 2) + (2 x 2) + (2 x 2) + (2 x 2) = 16 connections.


CHAPTER 8 Snapshots and Data Protection

This chapter describes how Core integrates with the snapshot capabilities of the storage array, enabling you to configure application-consistent snapshots through the Core Management Console. It includes the following sections:

“Setting Up Application-Consistent Snapshots” on page 109

“Configuring Snapshots for LUNs” on page 110

“Volume Snapshot Service Support” on page 111

“Implementing Riverbed Host Tools for Snapshot Support” on page 111

“Configuring the Proxy Host for Backup” on page 113

“Configuring the Storage Array for Proxy Backup” on page 113

“Data Protection” on page 114

“Data Recovery” on page 115

“Branch Recovery” on page 116

For details about storage qualified for native snapshot and backup support, see the SteelFusion Core Installation and Configuration Guide.

Setting Up Application-Consistent Snapshots

This section describes the general process for setting up snapshots.

Core integrates with the snapshot capabilities of the storage array. You can configure snapshot settings, schedules, and hosts directly through the Core Management Console or CLI.

For a description of application consistency and crash consistency, see “Understanding Crash Consistency and Application Consistency” on page 14.


To set up snapshots

1. Define the storage array details for the snapshot configuration.

Before you can configure snapshot schedules, application-consistent snapshots, or proxy backup servers for specific LUNs, you must specify for Core the details of the storage array, such as IP address, type, protocol, and so on. The storage driver does not remap any blocks; the remapping takes place within the array.

To access snapshot schedule policy configuration settings:

In the Core Management Console, choose Configure > Backups: Snapshots to open the Snapshots page.

In the Core CLI, use the storage snapshot policy modify commands.

2. Define snapshot schedule policies.

You define snapshot schedules as policies that you can apply later to snapshot configurations for specific LUNs. After a policy is applied, snapshots are taken automatically based on the parameters set by the snapshot schedule policy.

Snapshot schedule policies can specify weekly, daily, or day-specific schedules. Additionally, you can specify the total number of snapshots to retain.

To access snapshot schedule policy configuration settings:

In the Core Management Console, choose Configure > Backups: Snapshots to open the Snapshots page.

In the Core CLI, use the storage snapshot policy modify commands.

3. Define snapshot host credentials.

You define snapshot host settings as storage host credentials that you can apply later to snapshot configurations for specific LUNs.

To access snapshot host credential configuration settings:

In the Core Management Console, choose Configure > Backups: Handoff Host to open the Storage Snapshots page.

In the Core CLI, use the storage host-info commands.

For details about CLI commands, see the SteelFusion Command-Line Interface Reference Manual. For details about using the Core Management Console, see the SteelFusion Core Management Console User’s Guide.

Configuring Snapshots for LUNs

This section describes the general steps for applying specific snapshot configurations to LUNs through Core. For information about configuring LUNs, see “Configuring LUNs” on page 19.

To apply snapshot configurations to a LUN

1. Select the LUN for the snapshot and access the snapshot settings.


You can access the snapshot settings for a specific LUN by choosing Configure > Manage: LUNs. Select the desired LUN to display controls that include the Snapshots tab. The Snapshots tab itself has three tabs: Configuration, Scheduler, and History.

2. Apply a snapshot schedule policy to the current LUN.

The controls in the Scheduler tab enable you to apply a previously configured policy to the current LUN. You can create a new schedule directly in this panel.

3. Specify the storage array where the LUN resides.

The controls in the Configuration tab enable you to specify the storage array where the current LUN resides and to apply a static name that is prepended to the names of snapshots.

4. Specify the client type.

The controls in the Configuration tab enable you to specify the client type. To configure application-consistent snapshots and a proxy backup, you must set this value to Windows or VMware.

5. Enable and configure application-consistent snapshots.

The controls in the Configuration tab enable you to enable and configure application-consistent snapshots. The settings vary depending on which client type is selected.

Volume Snapshot Service Support

Riverbed supports Volume Snapshot Service (VSS) through the Riverbed Hardware Snapshot Provider (RHSP) and Riverbed Snapshot Agent. For details, see “Implementing Riverbed Host Tools for Snapshot Support” on page 111.

Implementing Riverbed Host Tools for Snapshot Support

Riverbed Host Tools are installed and implemented separately on the branch office Windows Server. The toolkit provides the following services:

RHSP - Functions as a snapshot provider for the VSS by exposing Core snapshot capabilities to the Windows Server.

RHSP ensures that users get an application-consistent snapshot.

Riverbed Snapshot Agent - A service that enables the Storage Edge to drive snapshots on a schedule. This schedule is set through the Core snapshot configuration. For details, see the SteelFusion Core Management Console User’s Guide.

Riverbed Host Tools support 64-bit editions of Microsoft Windows Server 2008 R2 or later.

This section includes the following topics:

“Overview of RHSP and VSS” on page 112

“Riverbed Host Tools Operation and Configuration” on page 112


Overview of RHSP and VSS

RHSP exposes the Storage Edge to the Windows Server as a snapshot provider through iSCSI. Use RHSP only when an iSCSI LUN is mounted on Windows through the Windows iSCSI initiator.

The process begins when you (or a backup script) request a snapshot through the VSS on the Windows Server:

1. VSS directs all backup-aware applications to flush their I/O operations and to freeze.

2. VSS directs RHSP to take a snapshot.

3. RHSP forwards the command to the Storage Edge.

4. The Storage Edge exposes a snapshot to the Windows Server.

5. The Storage Edge and Core commit all pending write operations to the storage array.

6. The Storage Edge takes the snapshot against the storage array.
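
For example, you can request a snapshot through VSS with the Windows DiskShadow utility on the branch server. The following is a minimal sketch; the drive letter E: is a hypothetical iSCSI data drive, and a production backup script typically sets additional options:

diskshadow
DISKSHADOW> set context persistent
DISKSHADOW> add volume E:
DISKSHADOW> create
DISKSHADOW> exit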

Note: The default port through which the Windows Server communicates with the Edge appliance is 4000.

Riverbed Host Tools Operation and Configuration

Riverbed Host Tools require configuration on both the Windows Server and the Core.

To configure Core

1. In the Core Management Console, choose Configure > Backups: Snapshots to configure snapshots.

2. In the Core Management Console, choose Configure > Storage Array: iSCSI, Initiators, MPIO to configure the iSCSI settings with the necessary storage array credentials.

The credentials must reflect a user account on the storage array appliance that has permissions to take and expose snapshots.

For details about both steps, see the SteelFusion Core Management Console User’s Guide.

To install and configure Riverbed Host Tools

1. Obtain the installer package from Riverbed.

2. Run the installer on your Windows Server.

3. Confirm the installation as follows:

From the Start menu, choose Run....

At the command prompt, enter diskshadow to access the Windows DiskShadow interactive shell.

In the DiskShadow shell, enter list providers.

Confirm that RHSP is among the providers returned.
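
The list providers output looks similar to the following sketch; the ProviderID, version, and exact provider name string are placeholders that depend on the RHSP release installed:

DISKSHADOW> list providers
* ProviderID: {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
     Type: [3] VSS_PROV_HARDWARE
     Name: Riverbed Hardware Snapshot Provider
     Version: x.x.x.x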


Configuring the Proxy Host for Backup

This section describes the general procedures for configuring the proxy host for backup in both ESXi and Windows environments.

To configure an ESXi proxy host

Configure the ESXi proxy host to connect to the storage array using iSCSI or Fibre Channel.

To configure a Windows proxy host

1. Configure the Windows proxy host to connect to the storage array using iSCSI or Fibre Channel.

2. Configure a local user that has administrator privileges on the Windows proxy host.

For the account to retain its administrator privileges when connecting remotely, create the following registry setting:

– HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System

– For Key Type, specify DWORD.

– For Key Name, specify LocalAccountTokenFilterPolicy.

– Set Key Value to 1.
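
For example, a minimal sketch of creating this value from an elevated command prompt with the built-in reg utility (verify the setting against your security policies before applying it):

reg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" /v LocalAccountTokenFilterPolicy /t REG_DWORD /d 1 /f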

3. Disable the Automount feature through the DiskPart command interpreter:

automount disable
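
For example, an interactive DiskPart session that disables Automount looks like the following sketch:

diskpart
DISKPART> automount disable
DISKPART> exit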

4. Add the storage array target to the favorites list on the proxy host to ensure that the iSCSI connection is reestablished when the proxy host is rebooted.

Configuring the Storage Array for Proxy Backup

This section describes the general processes for configuring Dell EqualLogic, EMC CLARiiON, EMC VNX, and NetApp storage arrays for backup.

For details, see also the SteelFusion Solution Guide - Granite with NetApp Storage Systems, the SteelFusion Solution Guide - Granite with EqualLogic Storage Arrays, and the SteelFusion Solution Guide - Granite with EMC CLARiiON Storage Systems.

To configure Dell EqualLogic

1. Go to the LUN and select the Access Tab.

2. Add permissions to Core initiator/IP address and assign volume-only level access.

3. Add permissions to proxy host for the LUN and assign snapshots-only access.

For details, see also the SteelFusion Solution Guide - Granite with EqualLogic Storage Arrays.

To configure EMC CLARiiON and EMC VNX

1. Create a storage group.


2. Assign the proxy servers to the group.

Riverbed recommends that you provide the storage group information to the proxy host storage group.

For details, see also the SteelFusion Solution Guide - Granite with EMC CLARiiON Storage Systems.

To configure NetApp

1. Create an initiator group.

2. Assign the proxy servers to the group.

Riverbed recommends that you provide the storage group information to the proxy host storage group.

For details, see also the SteelFusion Solution Guide - Granite with NetApp Storage Systems.

Data Protection

The SteelFusion system provides tools to preserve or enhance your existing data protection strategies. If you are currently using host-based backup agents or host-based consolidated backups at the branch, you can continue to do so within the SteelFusion context.

However, Core also enables a wider range of strategies, including:

Backing up from a crash-consistent LUN snapshot at the data center

The SteelFusion product family continuously synchronizes the data created at the branch with the data center LUN. As a result, you can use the storage array at the data center to take snapshots of the LUN and thereby avoid unnecessary data transfers across the WAN. These snapshots can be protected either through the storage array replication software or by mounting the snapshot into a backup server.

Such backups are only crash consistent because the storage array at the data center does not instruct the applications running on the branch server to quiesce their I/Os and flush their buffers before taking the snapshot. As a result, such a snapshot might not contain all the data written by the branch server up to the time of the snapshot.

Backing up from an application-consistent LUN snapshot at the data center

This option uses the SteelFusion Microsoft VSS integration in conjunction with Core storage array snapshot support. You trigger VSS snapshots on the iSCSI data drives of your branch Windows servers, and the Storage Edge ensures that all data is flushed to the data center LUN and triggers application-consistent snapshots on the data center storage array.

As a result, backups are application consistent because the Microsoft VSS infrastructure has instructed the applications to quiesce their I/Os before the snapshot is taken.

This option requires the installation of the Riverbed Host Tools on the branch Windows server. For details about Riverbed Host Tools, see “Implementing Riverbed Host Tools for Snapshot Support” on page 111.

Backing up from SteelFusion-assisted consolidated snapshots at the data center

This option relieves backup load on virtual servers, prevents the unnecessary transfer of backup data across the WAN, produces application-consistent backups, and backs up multiple virtual servers simultaneously over VMFS or NTFS.


In this option, the ESX server, and subsequently Core, takes the snapshot, which is stored on a separately configured proxy server. The ESX server flushes the virtual machine buffers to the data stores and the Edge appliance flushes the data to the data center LUN, resulting in application-consistent snapshots on the data center storage array.

You must separately configure the proxy server and storage array for backup. For details, see the “Configuring the Proxy Host for Backup” on page 113.

This option does not require the installation of the Riverbed Host Tools on the branch Windows server.

For details about data protection, backing up strategies, as well as a detailed discussion of crash-consistent and application-consistent snapshots, see the SteelFusion Data Protection and Recovery Guide.

For a discussion of application consistency and crash consistency in general, see “Understanding Crash Consistency and Application Consistency” on page 14.

Data Recovery

In the event your data protection strategy fails, the SteelFusion product family enables several strategies for file-level recovery. The recovery approach depends on the protection strategy you used.

This section describes the following strategies:

File recovery from Storage Edge snapshots at the branch

When snapshots are taken at the branch using Windows VSS in conjunction with RHSP, each snapshot is available to the Windows host as a separate drive. In order to recover a file, browse to the drive associated with the desired snapshot, locate the file, and restore it. For more information about RHSP, see “Implementing Riverbed Host Tools for Snapshot Support” on page 111.

File recovery from the backup application catalog file at the branch

When backups are taken at the branch using a backup application such as Symantec® NetBackup™ or IBM Tivoli® Storage Manager, you access and restore files directly from the backup server. Riverbed recommends that you restore the files to a different location in case you still need the current versions of the files.

Recover individual files from a data center snapshot (VMDK files)

To recover individual files from a storage array snapshot of a LUN containing virtual disk (VMDK) files, present the snapshot to a VMware ESX server and attach the VMDK to an existing VM running the same operating system (or an operating system that can read the file system used inside the VMDKs in question). You can then browse the file system to retrieve the files stored inside the VM.

Recover individual files from a data center snapshot (individual files)

To recover individual files from a storage array snapshot of a LUN containing individual files, present the snapshot to a server running the same operating system (or an operating system that reads the file system used on the LUN). You can then browse the file system to retrieve the files.

File recovery from a backup application at the data center

You can back up snapshots taken at the data center with a backup application or through Network Data Management Protocol (NDMP) dumps. In this case, file recovery remains unchanged from the existing workflow. Use the backup application to restore the file. You can send the file to the branch office location.


Alternatively, you can take the LUN offline from the branch office and inject the file directly into the LUN at the data center. However, Riverbed does not recommend this procedure because it requires taking the entire LUN down for the duration of the procedure.

File recovery from Windows VSS at the branch

You can enable Windows VSS and Previous Versions at the branch on a SteelFusion LUN, no matter which main backup option you implement. With Windows VSS enabled, you can access the drive directly, go to the Previous Versions tab, and recover deleted, damaged, or overwritten files.

Windows uses its default VSS software provider to retain up to 64 previous versions of each file. In addition to restoring individual files to a previous version, VSS also provides the ability to restore an entire volume.
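
To verify which shadow copies currently exist for a volume, you can use the vssadmin utility on the Windows host; E: is a hypothetical drive letter:

vssadmin list shadows /for=E: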

Setting up this recovery strategy requires considerations too numerous to detail here. For more details about recovery strategies, see the SteelFusion Data Protection and Recovery Guide.

Branch Recovery

SteelFusion v3.0 and later includes the branch recovery feature that allows you to define the working data set of LUNs projected by Core. During a catastrophic and irrecoverable failure, you can lose access to the working set of LUNs. Branch recovery enables proactive prepopulation of the working set when the LUNs are restored. This section includes the following topics:

“Overview of Branch Recovery” on page 116

“How Branch Recovery Works” on page 117

“Branch Recovery Configuration” on page 117

“Branch Recovery CLI Configuration Example” on page 118

Overview of Branch Recovery

The branch recovery feature in SteelFusion v3.0 and later enables you to track disk-accesses for both Windows LUNs and VMFS LUNs (hosting Windows VMs) and quickly recover from a catastrophic failure. During normal operations, the Storage Edge caches the relevant and recently accessed user data on a working set of projected LUNs.

In the event of a catastrophic failure in which you cannot recover the Storage Edge, the working set of projected LUNs is also lost. With branch recovery enabled, the working set is proactively streamed to the branch when a new Storage Edge is installed and the LUNs are mapped. Branch recovery ensures that after the branch is recovered, the user experience at the branch does not change.

Do not confuse either regular prefetch or intelligent prepopulation with branch recovery. Branch recovery prepopulates the working set proactively, as opposed to pulling related data on access, as in the case of regular prefetch. Branch recovery is different from intelligent prepopulation because intelligent prepopulation pushes all the used blocks in a volume with no regard to the actual working set.

Branch recovery is based on Event Tracing for Windows (ETW), a kernel-level facility. Riverbed supports only Windows 7, Windows 2008 R2, Windows 2012, and Windows 2012 R2. Branch recovery is supported for both physical Windows hosts and Windows VMs. You must format physical Windows host LUNs with NTFS. For VMs, NTFS-formatted VMDKs hosted on VMware VMFS LUNs are supported.


How Branch Recovery Works

The following are the major components of branch recovery:

Branch recovery agent

Branch recovery support on Core

The branch recovery agent is a service that runs in the branch on a Windows host or VM. The agent uses Windows ETW-provided statistics to collect and log all disk access I/O. Periodically, the collected statistics are written to a file that is stored on the same disk for which the statistics are collected. The file is called lru*.log and is located in the \Riverbed\BranchRecovery\ directory.

Note: The SteelFusion Turbo Boot plugin is not compatible with the branch recovery agent. For more information about the branch recovery agent, see the SteelFusion Core Management Console User’s Guide.

You must enable branch recovery support for the LUN prior to mapping LUNs to the new Storage Edge. When a VMFS LUN or a snapshot is mapped to a new Storage Edge, the Core crawls the LUN and parses all the lru*.log files. If files previously created by a branch recovery agent are found, the Core pushes the referenced blocks to the new Storage Edge. The Core sends the most recently accessed blocks first, sequentially for each VM. When data for a certain time frame (hour, day, week, or month) is recovered for one VM, the process moves on to the next VM in round-robin fashion, providing equal recovery resources to all VMs.

Branch Recovery Configuration

You must install the branch recovery agent on each VM for which you want the benefit of the branch recovery feature. You must have administrative privileges to perform the installation. After the installation, the agent starts to monitor I/O operations and records the activity into designated files.

When recovering a branch from a disaster, you must enable branch recovery for the LUN. For VMFS LUNs, you can enable branch recovery for all the VMs on the LUN, or pick and choose specific VMs on which you want the feature enabled.

You must disable the branch recovery feature prior to configuration changes. If you want to add or remove any VMs from the configurations, follow these steps:

1. Disable branch recovery.

2. Make changes.

3. Enable branch recovery.

You can choose a start time for branch recovery. This option enables you to control bandwidth usage and to choose the best time to start the recovery when you restore branches. For example, you can choose to schedule the recovery during the night, when the least amount of bandwidth is being used. In addition, you can set a cap (that is a percentage of the total disk size) for the amount of data that is pushed per (virtual) disk.

You can configure branch recovery in the Management Console and in the CLI. In the Management Console, choose Configure > Manage: LUNs, and select the Branch Recovery tab on the desired LUN.


For more information about configuring branch recovery, see the SteelFusion Core Management Console User’s Guide and the SteelFusion Command-Line Interface Reference Manual.

Figure 8-1. Branch Recovery Tab on the LUNs Page

Branch Recovery CLI Configuration Example

The following example shows how to use the CLI to configure branch recovery on Core. The first output shows a LUN that is not configured with branch recovery. The example then shows how to start a schedule (with output), how to configure specific VMs, how to enable branch recovery, and output for a successfully recovered LUN.

The following output is from a new VMFS LUN, with branch recovery not enabled:

# show storage lun alias 200GB branch-recovery
Branch Recovery configuration :
   Enabled    : no
   Status     : Not Enabled
   Phase      : Not Enabled
   Progress   : Not Enabled
   Start date : Not Configured
   Start time : Not Configured
#

Use the following command to start a branch recovery schedule:

# conf terminal
(config)# storage lun modify alias 200GB branch-recovery schedule start-now

The output from the VMFS LUN now has a started schedule, but branch recovery remains disabled:

# show storage lun alias alias-vmfs_lun branch-recovery
Branch Recovery configuration :
   Enabled    : no
   Status     : not_started
   Progress   : 0 Bytes pushed
   Start date : 2014/03/14
   Start time : 15:01:16
#


The output does not list any VMs. If you do not define specific VMs, all VMs are added by default. If you want to enable branch recovery for specific VMs on a specific LUN, use the following command:

(config) # storage lun modify alias 200GB branch-recovery add-vm oak-sh486-vm1

Note: VM names are discovered by prefetch and are available for automatic completion. The default cap is 10 percent. You can change the default with the storage lun modify alias 200GB branch-recovery add-vm oak-sh486 cap 50 command.

Use the following command to show the result of configuring specific VMs:

# show storage lun alias 200GB branch-recovery
Branch Recovery configuration :
   Enabled    : no
   Status     : not_started
   Phase      : not_started
   Progress   : 0 Bytes pushed
   Start date : 2014/02/20
   Start time : 10:32:59
VMs :
   Name   : oak-sh486-vm1
   Status : Not Started
   Cap    : 10 %
   Percent Complete     : Branch recovery not started or not enabled on VM
   Recovering data from : Branch recovery not started or not enabled on VM
#

When you are done configuring, scheduling, and adding the VMs, you can enable branch recovery for the LUNs by using the following command:

(config) # storage lun modify alias 200GB branch-recovery enable

Notice that with branch recovery enabled, data blocks are actively being restored to the LUN. Use the following command to check the status of the recovery on a LUN:

# show storage lun alias 200GB branch-recovery
Branch Recovery configuration :
   Enabled    : yes
   Status     : started
   Phase      : day
   Progress   : 3729920 Bytes pushed
   Start date : 2014/02/20
   Start time : 10:32:59
VMs :
   Name   : oak-sh486
   Status : In-progress
   Cap    : 10 %
   Percent Complete     : 9 %
   Recovering data from : Mon Feb 19 15:25 2014
#

When the recovery of the LUN is complete, you see the following output:

# show storage lun alias 200GB branch-recovery
Branch Recovery configuration :
   Enabled    : yes
   Status     : complete
   Progress   : complete
   Start date : 2014/02/20
   Start time : 10:32:59
VMs :
   Name   : oak-sh486-vm1
   Status : Complete
   Cap    : 10 %
   Percent Complete     : 100 %
   Recovering data from : N/A


CHAPTER 9 Data Resilience and Security

This chapter describes security and data resilience deployment procedures and design considerations. It contains the following sections:

“Recovering a Single Core” on page 121

“Storage Edge Replacement” on page 123

“Disaster Recovery Scenarios” on page 124

“Best Practice for LUN Snapshot Rollback” on page 127

“Using CHAP to Secure iSCSI Connectivity” on page 128

“At-Rest and In-Flight Data Security” on page 130

“Clearing the Blockstore Contents” on page 132

“Additional Security Best Practices” on page 133

Recovering a Single Core

If you decide you want to deploy only a single Core, read this section to minimize downtime and data loss when recovering from a Core failure. This section includes the following topics:

“Recovering a Single Physical Core” on page 122

“Recovering a Single Core-v” on page 122

Caution: Riverbed strongly recommends that you deploy Core as an HA pair so that in the event of a failure, you can seamlessly continue operations. Both physical and virtual Core HA deployments provide a fully automated failover without end-user impact. For more information about HA and SteelFusion Replication, see “SteelFusion Appliance High-Availability Deployment” on page 67 and “SteelFusion Replication (FusionSync)” on page 99.


Recovering a Single Physical Core

The Core internal configuration file is crucial to rebuilding your environment in the event of a failure. The possible configuration file recovery scenarios are as follows:

Up-to-date Core configuration file is available on an external server - When you replace the failed Core with a new Core, you can import the latest configuration file to resume operations. The Storage Edges reconnect to the Core and start replicating the new writes that were created after the Core failed.

In this scenario, you do not need to perform any additional configuration and there is no data loss on the Core and the Storage Edge.

Riverbed recommends that you frequently back up the Core configuration file. For details about the backup and restore procedures for device configurations, see the SteelCentral Controller for SteelHead User’s Guide.

Use the following CLI commands to export the configuration file:

enable
configure terminal
configuration bulk export scp://username:password@server/path/to/config

Use the following CLI commands to replace the configuration file:

enable
configure terminal
no service enable
configuration bulk import scp://username:password@server/path/to/config
service enable

Core configuration file is available but it is not up to date - If you do not regularly back up the Core configuration file, the backup can be missing the latest information. When you import the configuration file, you restore the configuration only as of the last export. In other words, Storage Edges and LUNs that were added after that export are missing from the configuration, and you must manually add the components of the environment that were added after the configuration file was exported.

No Core configuration file is available - This is the worst-case scenario. In this case, you need to build a new Core and reconfigure all Storage Edges as if they were new. All data in the Storage Edges is invalidated, and new writes to Storage Edge LUNs after the Core failure are lost. There is no data loss at the Core itself. If applications running at the Storage Edge cannot handle the loss of the most recent data, recover them from an application-consistent snapshot and backup from the data center.

For more instruction on how to export and import the configuration file, see “Core Configuration Export” on page 157 and “Core in HA Configuration Replacement” on page 157. For general information about the configuration file, see the SteelFusion Core Management Console User’s Guide.

Recovering a Single Core-v

The following recommendations help you recover from potential failures and disasters and minimize data loss for a Core-v and its Storage Edges.

Configure the Core-v with iSCSI and use VMware HA - VMware HA is a component of the vSphere platform that provides high availability for applications running in virtual machines. In the event of a physical server failure, affected virtual machines are automatically restarted on other production servers. If you configure VMware HA for the Core-v, you have an automated failover for the single Core-v. You must be using iSCSI, not Fibre Channel RDM disks.


Continually back up the Core-v configuration file to an external shared storage - See the scenarios described in “Recovering a Single Physical Core” on page 122.

Note: Core-v is not compatible with VMware Fault Tolerance (FT).

Storage Edge Replacement

In the event of a catastrophic failure, you might need to replace the Edge appliance and remap the LUNs. It is usually impossible to shut down an Edge LUN cleanly and bring it offline in this situation, because the Storage Edge must first commit all its pending writes for the LUN to the Core. If the Edge has failed and you cannot successfully bring the LUN offline, you need to manually remove the LUN.

The blockstore is a part of Storage Edge, and if you replace the Edge, the cached data on the failed blockstore is discarded. To protect the Storage Edge against a single point of failure, consider an HA deployment of Edge. For more information, see “Storage Edge High Availability” on page 80.

Use the following procedure for an Edge disaster recovery scenario in which there is an unexpected Edge or remote site failure. This procedure does not include Storage Edge HA.

Note: Riverbed recommends that you contact Riverbed Support before performing the following procedure.

To replace Edge

1. Schedule time that is convenient to be offline (if possible).

2. On the Core, force unmap LUNs from the failed Edge.

3. From the Core Management Console, remove the failed Edge.

4. Add replacement Edge.

You can use the same Edge Identifier.

5. Map LUNs back to the Edge.

Note: When the LUNs are remapped to the replacement Edge, the iSCSI LUN IDs might change. You must rescan or rediscover the LUNs on the ESXi host.
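
For example, you can trigger a rescan of all storage adapters from the ESXi host shell (a sketch; you can also rescan from the vSphere Client):

esxcli storage core adapter rescan --all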

You can lose data on the LUNs if writes to the Storage Edge were not committed to the Core. If the data loss is minimal, you might be able to recover the LUNs from a crash-consistent state, for example with a file system check; however, this depends on the type of applications that were using the LUNs. If you have concerns about data consistency, Riverbed recommends that you roll back the LUN to the latest application-consistent snapshot. For details, see “Best Practice for LUN Snapshot Rollback” on page 127.


Disaster Recovery Scenarios

This section describes basic SteelFusion appliance disaster scenarios, and includes general recovery recommendations. It includes the following topics:

“SteelFusion Appliance Failure—Failover” on page 124

“SteelFusion Appliance Failure—Failback” on page 125

Keep in mind the following definitions:

Failover - The process of switching to a redundant server, storage system, or network upon the failure or abnormal termination of the production server, storage, hardware component, or network.

Failback - The process of restoring a system, component, or service previously in a state of failure back to its original, working state.

Production site - The site in which applications, systems, and storage are originally designed and configured. Also known as the primary site.

Disaster recovery site - The site that is set up in preparation for a disaster. Also known as the secondary site.

SteelFusion Appliance Failure—Failover

In the case of a failure or a disaster affecting the entire site, Riverbed recommends you take the following considerations into account. The exact process depends on the storage array and other environment specifics. You must create thorough documentation of the disaster recovery plan for successful recovery implementation. Riverbed recommends that you perform regular testing so that the information in the plan is maintained and up to date.

This section includes the following topics:

“Data Center Failover” on page 124

“Branch Office Failover” on page 125

Data Center Failover

If an entire data center experiences a failure or a disaster, you can restore Core operations provided that you have met the following prerequisites:

The disaster recovery site has the storage array replicated from the production site.

The network infrastructure is configured on the disaster recovery site similarly to the production site, enabling the Edges to communicate with Core.

Core and SteelHeads (or their virtual editions) at the disaster recovery site are installed, licensed, and configured similarly to the production site.

Ideally, the Core at the disaster recovery site is configured identically to the Core on the production site. You can import the configuration file from Core at the production site to ensure that you have configured both Cores the same way.

Unless the disaster recovery site is designed to be an exact replica of the production site, minor differences are inevitable: for example, the IP addresses of the Core, the storage array, and so on. Riverbed recommends that you regularly replicate the Core configuration file to the disaster recovery site and import it into the disaster recovery instance. You can script the necessary adjustments to the configuration to automate the configuration adoption process.


Likewise, the configuration of SteelHeads in the disaster recovery site should reflect the latest changes to the configuration in the production site. All the relevant in-path rules must be maintained and kept up to date.

There are some limitations:

If you have different LUN IDs in the disaster recovery site than in the production site, you need to reconfigure the Core and all the Edges and deploy them as new. You must know which LUNs belong to which Edge and map them correspondingly. Riverbed recommends that you implement a naming convention.

Even if the data from the production storage array is replicated in synchronous mode, you must assume that there is data already committed to the Storage Edge that has not yet been sent to the production storage array or has not yet been replicated to the disaster recovery site. This means that a gap in data consistency can occur if, after the failover, the Edges immediately start writing to the disaster recovery Core. To prevent data corruption, you need to configure all the LUNs at the Edges as new. When you configure Storage Edges as new, their blockstore is emptied, causing the loss of all writes that occurred after the disaster at the production site. To prevent this data loss, Riverbed recommends that you configure FusionSync. For more information, see “SteelFusion Replication (FusionSync)” on page 99.

If you want data consistency on the application level, Riverbed recommends that you perform a rollback to one of the previous snapshots. For details, see “Best Practice for LUN Snapshot Rollback” on page 127.

Keep in mind that initially after the recovery, the blockstore on Storage Edges does not have any data in the cache.

Branch Office Failover

When the Edge in a branch becomes inaccessible from outside the branch office due to a network outage, operations in the branch office can continue. SteelFusion products are designed with disconnected-operations resiliency in mind. If your workflow enables branch office users to operate independently for a period of time (which is defined during the network planning stage and implemented with a correctly sized appliance), the branch office continues to operate and synchronizes with the data center later.

If the branch office is completely lost, or it is imperative for the business to bring a branch office service online sooner, you can choose to deploy the Edge in another branch or in the data center.

If you choose to deploy a Storage Edge in the data center, Riverbed recommends that you remove the LUNs from the Core so that multiple write access to the LUNs cannot cause data corruption. Riverbed also recommends that you roll back to the latest application-consistent snapshot. If mostly read access is required to the data projected to the branch office, a good alternative is to temporarily mount a snapshot to a local host. This snapshot makes the data accessible at the data center while the branch office operates in disconnected-operations mode. Avoiding the failover also simplifies failback to the production site.

If you choose to deploy the Storage Edge in another branch office, follow the steps in “Storage Edge Replacement” on page 123. Understand that in this scenario, all uncommitted writes at the original branch are lost. Riverbed recommends that you roll back the LUNs to the latest application-consistent snapshot.

SteelFusion Appliance Failure—Failback

After a disaster is over, or a failure is fixed, you might need to revert the changes and move the data and computing resources to where they were located before the disaster, while ensuring that the data integrity is not compromised. This process is called failback. Unlike the failover process that can occur in a rush, you can thoroughly plan and test the failback process.


This section includes the following topics:

“Data Center Failback” on page 126

“Branch Office Failback” on page 126

Data Center Failback

Because SteelFusion relies on primary storage to keep the data intact, Core failback can only follow a successful storage array replication from the disaster recovery site back to the production site. There are multiple ways to perform the recovery; however, Riverbed recommends that you use the following method. The process most likely requires downtime, which you can schedule in advance. Riverbed also recommends that you create an application-consistent snapshot and backup prior to performing the following procedure. Perform these steps on one appliance at a time.

To perform the Core failback process

1. Shut down hosts and unmount LUNs.

2. Export the configuration file from the Core at the disaster recovery site.

3. From the Core, initiate taking the Storage Edge offline. This process forces the Storage Edge to replicate all the committed writes to the Core.

4. Remove iSCSI Initiator access from the LUN at the Core.

Data cannot be written to the LUN until the failback completes and the data becomes available again.

5. Make sure that you replicate the LUN with the storage array from the disaster recovery site back to the production site.

6. On a storage array at the production site, make the replicated LUN the primary LUN.

Depending on the storage array, you might need to create a snapshot, create a clone, or promote the clone to a LUN, or all of the above.

For more information, see the user guide for your storage array. The preferred method is the one that preserves the LUN ID, which might not work for all arrays. If the LUN ID is going to change, you need to add the LUN as new, first on the Core and then on the Storage Edge.

7. If you had to make changes on the disaster recovery site due to a LUN ID change, import the Core configuration file from the disaster recovery site and make the necessary adjustments to IP addresses and so on.

8. Add access to the LUN for the Core. If the LUN ID remained the same, the Core at the production site begins servicing the LUN immediately.

9. At the branch office, check to see if you need to change the Core IP address.

Branch Office Failback

The branch office failback process is similar to the Storage Edge replacement process. The procedure requires downtime that you can schedule in advance.

If the production LUNs were mapped to another Storage Edge, use the following procedure.


To perform the Storage Edge failback process

1. Shut down hosts and unmount LUNs.

2. Take the LUNs offline from the disaster recovery Storage Edge. This process forces the Storage Edge to replicate all the committed writes to Core.

3. If any changes were made to the LUN mapping configuration, you need to merge the changes during the failback process. If you need assistance with this process, contact Riverbed Support.

4. Shut down the Storage Edge at the disaster recovery site.

5. Bring up the Storage Edge at the production site.

6. Follow the steps described in “Storage Edge Replacement” on page 123.

Keep in mind that after the failback process is complete, the blockstore on the Edges does not have any data in the cache.

If you removed the production LUNs from the Core and used them locally in the data center, shut down the hosts, unmount the LUNs, and then continue the setup process as described in the SteelFusion Core Installation and Configuration Guide.

Best Practice for LUN Snapshot Rollback

When a single file restore is impossible or impractical, you can roll back the entire LUN snapshot on the storage array at the data center and have it projected out to the branch. Riverbed recommends the following procedure for a LUN snapshot rollback.

Note: A single file restore recovers a deleted file from a backup or a snapshot without rolling back the entire file system to a point in time at which the file still existed. When you use LUN rollback, everything that was written to (and deleted from) the file system after the snapshot was taken is lost.

To roll back the LUN snapshot

1. Set the LUN offline at the server running at the Storage Edge.

2. Remove iSCSI Initiator access from the LUN at the Core.

3. Remove the LUN from the Core.

4. Restore the LUN on the storage array from a snapshot.

5. Add the LUN to the Core.

6. Add iSCSI Initiator access for the LUN at Core.

You can now access the LUN snapshot from a server on the Storage Edge.

Keep in mind that after this process is completed, the blockstore on Storage Edges does not have any data in the cache.


Using CHAP to Secure iSCSI Connectivity

Challenge-Handshake Authentication Protocol (CHAP) is a convenient and well-known security mechanism that can be used with iSCSI configurations. This section provides an overview with an example configuration. It contains the following topics:

“One-Way CHAP” on page 128

“Mutual CHAP” on page 129

Both types of CHAP are supported on Core and Storage Edge.

For more details about configuring CHAP on either Core or Storage Edge, see the corresponding Management Console user’s guide.

Within an iSCSI deployment both initiator and target have their own passwords. In CHAP terminology these are called secrets. These passwords are shared between initiator and target in order for them to authenticate with each other.

One-Way CHAP

With one-way CHAP, the iSCSI target (server) authenticates the iSCSI initiator (client).

This process is analogous to logging in to a Web site. The Initiator needs to provide a username and secret when logging in to the target. The username is usually the IQN (but can be any free-form string) and the password is the target secret.

To configure one-way CHAP in a Core deployment

1. Configure a target secret on the backend storage array portal.

2. Log in to the Core Management Console.

3. Add a CHAP User on the Core.

The username is something descriptive or even the IQN of the Core. For example, username=cuser2. The password is the target secret configured on the backend array.

4. Select the CHAP User (Figure 9-1).

When the iSCSI initiator on the Core connects to the backend storage array, it uses the credentials from the CHAP user that was created.

Figure 9-1. iSCSI Portal Configuration for One-Way CHAP

CHAP credentials are created and stored separately. They are then used when the Core initiates an iSCSI session and logs in to the storage array portal.


Mutual CHAP

The difference between one-way CHAP and mutual CHAP is that with mutual CHAP, in addition to the iSCSI target authenticating the iSCSI initiator, the iSCSI initiator also authenticates the iSCSI target.

Mutual CHAP incorporates two separate sequences. The first sequence is the iSCSI target authenticating the iSCSI initiator and is the exact same procedure as for one-way CHAP. The second sequence is the initiator authenticating the target, which is the reverse of the previous authentication procedure.

To configure mutual CHAP in a Core deployment

1. Configure an initiator CHAP User on the Core Management Console.

For example: username=cuser1 and password=abcd1234

2. Select the Enable Mutual CHAP Authentication setting on the Core and choose cuser1 from the drop-down menu (Figure 9-2).

The Core now requires all iSCSI targets to specify the password (or secret) abcd1234 before the target is trusted by the Core.

Figure 9-2. iSCSI Initiator Configuration for Mutual CHAP

3. On the backend storage array, add the CHAP user details from the Core.

In this example, the storage array CHAP user has username=cuser1 and password=abcd1234.

The target now knows the secret (username and password) of the initiator.

4. On the backend storage array, configure a target CHAP user.

For example: username=cuser2 and password=wxyz5678

5. Log in to the Core Management Console and add the target CHAP User on the Core.

In this example: username=cuser2 and password=wxyz5678

Mutual CHAP configuration is now complete.

When adding the portal of the backend storage array to the Core configuration, select the target CHAP user (cuser2).

When the iSCSI initiator of the Core connects to the iSCSI target of the backend storage array, it uses the credentials from the CHAP user (cuser2) that you created.

Because of mutual CHAP, the iSCSI target uses the credentials cuser1/abcd1234 to connect to the iSCSI initiator of the Core.


At-Rest and In-Flight Data Security

For organizations that require high levels of security or face stringent compliance requirements, Edge provides data at-rest and in-flight encryption capabilities for the data blocks written on the blockstore cache. This section includes the following topics:

“Enable Data At-Rest Blockstore Encryption” on page 130

“Enable Data In-Flight Secure Peering Encryption” on page 132

Supported encryption standards include AES-128, AES-192, and AES-256. The keys are maintained in an encrypted secure vault. In 2003, the United States government announced that all three AES key lengths are sufficient to protect classified information up to the Secret level, while Top Secret information requires 192-bit or 256-bit keys.

The vault is encrypted by AES with a 256-bit key and a 16-byte cipher, and you must unlock it before the blockstore is available. The secure vault password is verified upon every power up of the appliance, assuring that the data is confidential in case the Edge is lost or stolen.

Initially, the secure vault has a default password known only to the RiOS software so the Edge can automatically unlock the vault during system startup. You can change the password so that the Edge does not automatically unlock the secure vault during system startup and the blockstore is not available until you enter the password.

When the system boots, the contents of the vault are read into memory, decrypted, and mounted (through EncFS, a FUSE-based cryptographic file system). Because this information is only in memory, when an appliance is rebooted or powered off, the information is no longer available and the in-memory object disappears. Decrypted vault contents are never persisted on disk storage.

Riverbed recommends that you keep your secure vault password safe. To ensure that your private keys cannot be compromised, there is no password recovery. In the event of a lost password, you can reset the secure vault only after erasing all the information within it.

To reset a lost password

From either Edge appliance, enter the following CLI commands:

> enable
# config term
(conf) # secure-vault clear

When you use the secure-vault clear command, you lose the data in the blockstore if it was encrypted. You then need to reload or regenerate the certificates and private keys.

Note: The Edge blockstore encryption is the same mechanism that is used in the RiOS data store encryption. For more information, see the security information in the SteelHead Deployment Guide.

Configuring data encryption requires extra CPU resources and might affect performance; hence, Riverbed recommends that you enable blockstore encryption only if you require a high level of security or if compliance requirements dictate it.

Enable Data At-Rest Blockstore Encryption

The following example shows how to configure blockstore encryption on a Storage Edge. The commands are entered on the Core at the data center.


To configure blockstore encryption on the Storage Edge

1. From the Core, enter the following commands:

> enable
# configure
(config) # edge id <edge-identifier> blockstore enc-type <AES_128 | AES_192 | AES_256 | NONE>

2. To verify whether encryption has been enabled on the Storage Edge, enter the following commands:

> enable
# show edge id <edge-identifier> blockstore
Write Reserve    : 10%
Encryption type  : AES_256

You can do the same procedure in the Core Management Console by choosing Configure > Manage: SteelFusion Edges.

Figure 9-3. Adding Blockstore Encryption


To verify whether encryption is enabled on your Edge device, look at the Blockstore Encryption field on your Edge status window as shown in Figure 9-4.

Figure 9-4. Verify Blockstore Encryption

Enable Data In-Flight Secure Peering Encryption

The SteelFusion Rdisk protocol operates in clear text, so there is a possibility that remote branch data can be exposed to hackers during transfer over the WAN. To counter this, the Edge provides data in-flight encryption capabilities when the data blocks are asynchronously propagated to the data center LUN.

You can use secure peering between the Storage Edge and the data center SteelHead to create a secure SSL channel and protect the data in-flight over the WAN. For more information about security and SSL, see the SteelHead Deployment Guide and the SteelHead Deployment Guide - Protocols.

Clearing the Blockstore Contents

Under normal conditions, if you select Offline on the Core for a particular LUN, the contents of the blockstore on the corresponding Storage Edge are synchronized and then cleared.


However, there can be situations in which it is necessary to make sure the entire contents of the blockstore on a Storage Edge are erased to a military-grade level. While you can achieve this level of deletion, it involves the use of commands not normally available for general use. To ensure that the correct procedures are followed, open a support case with Riverbed Support.

Additional Security Best Practices

For additional advice and guidance on appliance security in general, see the SteelHead Deployment Guide. The guide includes suggestions on restricting Web-based access, the use of role-based accounts, creation of login banners, alarm settings, and so on, which you can apply in principle to SteelFusion Edge appliances just as you can for the SteelHead EX appliances.


CHAPTER 10 SteelFusion Appliance Upgrade

This chapter provides some general guidance when upgrading your SteelFusion appliances. It includes the following topics:

“Planning Software Upgrades” on page 135

“Upgrade Sequence” on page 136

“Minimize Risk During Upgrading” on page 137

“Performing the Upgrade” on page 137

Planning Software Upgrades

Before you perform a software upgrade to a SteelFusion deployment, there are a few steps to consider. This section describes best practices that you can incorporate into your own upgrade and change control procedures.

For detailed instructions and guidance on upgrading each of the products, see the SteelHead EX Installation and Configuration Guide, the SteelFusion Edge Installation and Configuration Guide, and the SteelFusion Core Installation and Configuration Guide.

Prior to upgrading, complete the following prerequisites:

Alert users - Depending on your deployment you might have a full-HA SteelFusion configuration at the data center and at the branch office. This configuration allows you to perform software upgrades with minimal or zero disruption to your production environment. Whether this is your case or not, Riverbed recommends that you schedule either downtime or an at risk period so that your users are aware of any possible interruption to service.

Alert IT staff - Because you might also be using your Edge appliances (including SteelHead EX) simultaneously for WAN optimization services, Riverbed recommends that you alert other IT departments within your organization: for example, networking, monitoring, applications, and so on.

Software - Gather all the relevant software images from the Riverbed Support site and consider using the SCC to store the images and assist with the upgrade.

When downloading the software images, make sure to download the release notes so that you are aware of any warnings, special instructions, or known problems that can affect your upgrade plans.


Configuration data - Ensure all your Cores and Storage Edges have their current running configurations saved to a suitable location external to the appliances themselves. You can use the SCC to assist with this task with the Core and BlockStream-enabled SteelHead EX.
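As an illustration of the configuration data step above, the following minimal sketch uses the configuration bulk export command described in “Core Configuration Export” on page 157; the username, password, server, and path are placeholders for your own backup location:

enable
configure terminal
configuration bulk export scp://username:password@backup-server/path/to/config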

Upgrade Sequence

If you are planning to upgrade both Core and Storage Edge as part of the same procedure, then the normal sequence—in which there is no HA configuration at the Core—is to upgrade the Edge appliances and then upgrade the Cores.

Note: If you are only upgrading Core or Storage Edge, but not both, this section does not apply.

If there is HA at the Storage Edge and no HA at the Core, the sequence is the same: Storage Edge first, followed by Core, with standby Storage Edges upgraded before active Storage Edges.

However, if there is HA at the Core, regardless of whether or not there is HA in the branch office with Storage Edges, upgrade the Core first, followed by the Storage Edge.

The following table summarizes the sequence.

If you have an HA deployment, it is possible you can have mixed software version between HA peers for a short period of time. You can also run with mismatched software versions between Core and Storage Edge for short periods of time; however, Riverbed does not recommend this practice.

If there are any doubts about these procedures, contact Riverbed Support.

Deployment                  Upgrade Phases
                            First                                            Second
Core - Storage Edge         All Edges owned by the Core                      Core
Core HA - Storage Edge      Core HA                                          All Storage Edges owned by the Core HA
Core - Storage Edge HA      All Storage Edges are owned by the Core.         Core
                            Upgrade the standby Storage Edge first, and
                            wait for it to be synchronized with the active
                            Storage Edge. Next, upgrade the active
                            Storage Edge.
Core HA - Storage Edge HA   Core HA                                          All Storage Edges are owned by the Core HA.
                                                                             Upgrade standby Storage Edges before
                                                                             upgrading active Storage Edges.


Minimize Risk During Upgrading

Although it is expected that the upgrade procedure will progress and complete without any problems, Riverbed recommends that you have a contingency plan to back out or restore the previous state of operations.

Both Core and Storage Edge upgrade procedures automatically install the new software image into a backup partition on the appliance. The existing (running) software image is in a separate (active) partition. During the reboot, which is part of the upgrade procedure, the roles of the backup and active partitions are reversed. This ensures that if you require a downgrade to restore the previous software version, a partition swap and reboot are all that should be required.
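For reference, a CLI-driven image upgrade typically follows the backup/active partition behavior described above. The following is a minimal sketch only; the URL, image filename, and partition number are placeholders (install into the current backup partition), the exact syntax can vary by software release, and the installation guides remain the authoritative procedure:

enable
configure terminal
image fetch http://server/path/to/image.img
image install image.img 2
image boot 2
reload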

If you have a lab or development environment in which some nonproduction SteelFusion appliances are in use, consider doing a trial upgrade. This ensures you have some exposure to the upgrade processes, enables you to measure the time taken to perform the tasks, and gain other useful experience. You can complete the trial upgrade well ahead of the production upgrade to confirm the new version of software operates as expected.

Performing the Upgrade

This section describes the tasks involved to upgrade your SteelFusion appliances. It contains the following sections:

“Storage Edge Upgrade” on page 137

“Core Upgrade” on page 138

Once you are ready (and if there is no HA configuration for the Core) start by upgrading the Edge appliances first. After these appliances are successfully upgraded, proceed to upgrade the Cores.

If you have Core deployed in an HA deployment, then upgrade the Cores first, followed by the Edge appliances.

For the proper sequence, see “Upgrade Sequence” on page 136.

Storage Edge Upgrade

Depending on your hardware platform, Storage Edge software and functionality is either incorporated into the BlockStream-enabled SteelHead EX software image or into the SteelFusion Edge appliance software image. When performing the upgrade there is a reboot of the appliance, which causes an interruption or degradation of service both to Edge and WAN optimization (if there is no HA).

While you do not need to disconnect the Edge from the Core, Riverbed recommends that you stop all read and write operations for any VSP-hosted services and any external application servers that are using the Edge for storage connectivity. Preferably, shut down the servers, and in the case of VSP, place the ESXi instance into maintenance mode.

In the case of Storage Edge HA deployments, upgrade one of the Storage Edge peers first, leaving the other Storage Edge in a normal operating state. During the upgrade process the surviving Storage Edge enters a degraded state. This is expected behavior. After the upgrade of the first Edge in the HA configuration is complete, check that the two Storage Edge HA peers rediscover each other before proceeding with the upgrade of the second Storage Edge.


Core Upgrade

Before upgrading the Core, Riverbed strongly recommends that you ensure that any data written by Storage Edge to LUNs projected by the Core is synchronized to the LUNs in the data center storage array. In addition, take a snapshot of any LUNs prior to the upgrade.

If a Core is part of an HA configuration with a second Core, then you must upgrade both before the Storage Edges that the Cores are responsible for are also upgraded. You can choose which Core you begin with because the HA configuration is active-active. In either case, the upgrade triggers a failover when the first Core is rebooted with the new software version, followed by a failback after the reboot is complete.

The same process occurs with the second Core. Therefore, during the Core HA upgrade there are two separate instances of failover followed by failback. Whichever Core is upgraded first, continue to upgrade the second Core of the HA pair before upgrading the Storage Edges.

When upgrading a Core that is not part of an HA configuration, there is an interruption to service for the projected LUNs to the Edges. You do not need to disconnect the Edge appliances from the Core, nor do you need to unmap any required LUNs managed by the Core from the storage array.

When upgrading a Core that is part of an HA configuration, the peer Core appliance triggers an HA failover event. This is expected behavior. After the upgrade of the first Core is complete, check to ensure that the two Core HA peers have rediscovered each other and that both are in ActiveSelf state before upgrading the second Core.


CHAPTER 11 Network Quality of Service

SteelFusion technology enables remote branch offices to use storage provisioned at the data center over unreliable, low-bandwidth, high-latency WAN links. Adding this new type of traffic to the WAN links creates new considerations for guaranteeing quality of service (QoS) both to existing WAN applications and to the SteelFusion Rdisk protocol so that each can function at the best possible level.

This chapter contains the following topics:

“Rdisk Protocol Overview” on page 139

“QoS for SteelFusion Replication Traffic” on page 141

“QoS for LUNs” on page 141

“QoS for Branch Offices” on page 141

“Time-Based QoS Rules Example” on page 142

For general information about QoS, see the SteelHead Deployment Guide.

Rdisk Protocol Overview

To understand the QoS requirements for the SteelFusion Rdisk protocol, you must understand how it works. The Rdisk protocol defines how the Edge and Cores communicate and how they transfer data blocks over the WAN. Rdisk uses five TCP ports for data transfers and one TCP port for management.


The following table summarizes the TCP ports used by the Rdisk protocol. It maps the different Rdisk operations to each TCP port.

Note: Rdisk Protocol creates five TCP connections per exported LUN.

Different Rdisk operations use different TCP ports. The following table summarizes the Rdisk QoS requirements for each Rdisk operation and its respective TCP port.

For more information about Rdisk, see “Rdisk Traffic Routing Options” on page 146.

TCP Port  Operation   Description
7970      Management  Manages information exchange between Edge and Core. The majority of the data flows from the Core to the Edge.
7950      Read        Transfers data requests for data blocks absent in Edge from the data center. The majority of the data flows from the Edge to the Core.
7951      Write       Transfers new data created at the Edge to the data center and snapshot operations. The majority of the data flows from the Edge to the Core.
7952      Prefetch0   Prefetches data for which SteelFusion has highest confidence (for example, file Read Ahead). The majority of the data flows from the Core to the Edge.
7953      Prefetch1   Prefetches data for which SteelFusion has medium confidence (for example, Boot). The majority of the data flows from the Core to the Edge.
7954      Prefetch2   Prefetches data for which SteelFusion has lowest confidence (for example, Prepop). The majority of the data flows from the Core to the Edge.

TCP Port  Operation   Outgoing Branch Office     Outgoing Branch Office  Outgoing Data Center  Outgoing Data Center
                      Bandwidth                  Priority                Bandwidth             Priority
7970      Management  Low                        Normal                  Low                   Normal
7950      Read        Low                        Business critical       High                  Business critical
7951      Write       High (off-peak hours),     Low priority            Low                   Normal
                      Low (during peak hours)
7952      Prefetch0   Low                        Business critical       High                  Business critical
7953      Prefetch1   Low                        Business critical       Medium                Business critical
7954      Prefetch2   Low                        Business critical       High                  Best effort


QoS for SteelFusion Replication Traffic

To prevent SteelFusion replication traffic from consuming bandwidth required for other applications during business hours, Riverbed recommends that you allow more bandwidth for Rdisk write traffic (port 7951) during off-peak hours and less bandwidth during peak hours. Also consider carefully your recovery point objectives (RPO) and recovery time objectives (RTO) when configuring QoS for SteelFusion Rdisk traffic.

Depending on which SteelFusion features you use, you might need to consider different priorities and bandwidth requirements.

QoS for LUNs

This section contains the following topics:

“QoS for Unpinned LUNs” on page 141

“QoS for Pinned LUNs” on page 141

For more information about pinned LUNs, see “Pin the LUN and Prepopulate the Blockstore” on page 144 and “When to PIN and Prepopulate the LUN” on page 156.

QoS for Unpinned LUNs

In an unpinned LUN scenario, Riverbed recommends that you prioritize traffic on port 7950 so that the SCSI read requests for data blocks not present in the Edge blockstore cache can arrive from the data center LUN in a timely manner. Riverbed also recommends that you prioritize traffic on ports 7952, 7953, and 7954 so that the prefetched data can arrive at the blockstore when needed.

QoS for Pinned LUNs

In a pinned, prepopulated LUN scenario, all the data is present at the Edge. Riverbed recommends that you prioritize only port 7951 so that the Rdisk protocol can transfer newly written data blocks from the Edge blockstore to the data center LUN through Core.

QoS for Branch Offices

This section contains the following topics:

“QoS for Branch Offices That Mainly Read Data from the Data Center” on page 142

“QoS for Branch Offices Booting Virtual Machines from the Data Center” on page 142


QoS for Branch Offices That Mainly Read Data from the Data Center

In the case of branch office users who are not producing new data but instead are mainly reading data from the data center, and the LUNs are not pinned, Riverbed recommends that you prioritize traffic on ports 7950 and 7952 so that the iSCSI read requests for data blocks not present in the Storage Edge blockstore cache can arrive from the data center LUN in a timely manner.

QoS for Branch Offices Booting Virtual Machines from the Data Center

In the case of branch office users who are booting virtual machines from the data center and the LUNs are not pinned, Riverbed recommends that port 7950 is the top priority for nonpinned LUNs and that you prioritize traffic on port 7953 so that boot data is prefetched on this port in a timely manner.

Time-Based QoS Rules Example

This example illustrates how to configure time-based QoS rules on a SteelHead.

You want to create two recurring jobs, each undoing the other, using the standard job CLI command. One sets the daytime cap on throughput or a low minimum guarantee, and the other then removes that cap or sets a higher minimum guarantee.

steelhead (config) # job 1 date-time hh:mm:ss year/month/day "Start time"
steelhead (config) # job 1 recurring 86400 "Occurs once a day"
steelhead (config) # job 1 command 1 <command>
steelhead (config) # job 1 command 2 <command2> "Commands to set daytime cap"
steelhead (config) # job 1 enable

steelhead (config) # job 2 date-time hh:mm:ss year/month/day "Start time"
steelhead (config) # job 2 recurring 86400 "Occurs once a day"
steelhead (config) # job 2 command 1 <command>
steelhead (config) # job 2 command 2 <command2> "Commands to remove daytime cap"
steelhead (config) # job 2 enable
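To confirm that the scheduled jobs were created as intended, you can list them from the CLI; a brief sketch, assuming the standard job show commands are available in your release:

steelhead (config) # show jobs
steelhead (config) # show job 1
steelhead (config) # show job 2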


CHAPTER 12 Deployment Best Practices

Every deployment of the SteelFusion product family differs due to variations in specific customer needs and types and sizes of IT infrastructure.

The following recommendations and best practices are intended to guide you to achieving optimal performance while reducing configuration and maintenance requirements. However, these guidelines are general; for detailed worksheets for proper sizing, contact your Riverbed account team.

This chapter includes the following sections:

“Storage Edge Best Practices” on page 143

“Core Best Practices” on page 155

“iSCSI Initiators Timeouts” on page 157

“Operating System Patching” on page 158

Storage Edge Best Practices

This section describes best practices for deploying Storage Edge. It includes the following topics:

“Segregate Traffic” on page 144

“Pin the LUN and Prepopulate the Blockstore” on page 144

“Segregate Data onto Multiple LUNs” on page 144

“Ports and Type of Traffic” on page 145

“Changing IP Addresses on the Storage Edge, ESXi Host, and Servers” on page 145

“Disk Management” on page 146

“Rdisk Traffic Routing Options” on page 146

“Deploying SteelFusion with Third-Party Traffic Optimization” on page 147

“Windows and ESX Server Storage Layout—SteelFusion-Protected LUNs Vs. Local LUNs” on page 148

“VMFS Datastores Deployment on SteelFusion LUNs” on page 152

“Enable Windows Persistent Bindings for Mounted iSCSI LUNs” on page 152

“Set Up Memory Reservation for VMs Running on VMware ESXi in the VSP” on page 153


“Boot from an Unpinned iSCSI LUN” on page 154

“Running Antivirus Software” on page 154

“Running Disk Defragmentation Software” on page 154

“Running Backup Software” on page 154

“Configure Jumbo Frames” on page 155

Segregate Traffic

At the remote branch office, Riverbed recommends that you separate storage iSCSI traffic and WAN/Rdisk traffic from LAN traffic. This practice helps to increase overall security, minimize congestion, minimize latency, and simplify the overall configuration of your storage infrastructure.

Pin the LUN and Prepopulate the Blockstore

In specific circumstances, Riverbed recommends that you pin the LUN and prepopulate the blockstore. Additionally, you can have the write-reserve space resized accordingly; by default, the Storage Edge has a write-reserve space that is 10 percent of the blockstore.

To resize the write-reserve space, contact your Riverbed representative.

Riverbed recommends that you pin the LUN in the following circumstances:

Unoptimized file systems - Core supports intelligent prefetch optimization on NTFS and VMFS file systems. For unoptimized file systems such as FAT, FAT32, ext3, and others, Core cannot perform optimization techniques such as prediction and prefetch in the same way as it does for NTFS and VMFS. For best results, Riverbed recommends that you pin the LUN and prepopulate the blockstore.

Database applications - If the LUN contains database applications that use raw disk file formats or proprietary file systems, Riverbed recommends that you pin the LUN and prepopulate the blockstore.

WAN outages are likely or common - Ordinary operation of SteelFusion depends on WAN connectivity between the branch office and the data center. If WAN outages are likely or common, Riverbed recommends that you pin the LUN and prepopulate the blockstore.

Segregate Data onto Multiple LUNs

Riverbed recommends that you separate storage into three LUNs, as follows:

Operating system - In case of recovery, the operating system LUN can be quickly restored from the Windows installation disk or ESX datastore, depending on the type of server used in the deployment.

Production data - The production data LUN is hosted on the Storage Edge and therefore safely backed up at the data center.

Swap space - Data on the swap space LUN is transient and therefore not required in disaster recovery. Riverbed recommends that you use this LUN as a Storage Edge local LUN.


Ports and Type of Traffic

You should allow iSCSI traffic only on the primary and auxiliary interfaces. Riverbed does not recommend that you configure your external iSCSI initiators to use the IP address configured on the in-path interface. Some appliance models can optionally support an additional NIC to provide extra network interfaces. You can also configure these interfaces to provide iSCSI connectivity.

Changing IP Addresses on the Storage Edge, ESXi Host, and Servers

When you have a Storage Edge and ESXi running on the same converged platform, you must change IP addresses in a specific order to keep the task simple and fast. You can use this procedure when staging Edges in the data center or moving them from one site to another.

This procedure assumes that the Edges are configured with IP addresses in a staged or production environment. You must test and verify all ESXi, servers, and interfaces before making these changes.

To change the IP addresses on the Storage Edge, ESXi host, and servers

1. Starting with the Windows server, use your vSphere client to connect to the console, log in, change the IP address to DHCP or the new destination IP address, and finally shut down the Windows server from the console.

2. Use a virtual network computing (VNC) client to connect to the ESXi console, change the IP address to the new destination IP address, and shut down ESXi from the console.

If you did not configure VNC during the ESXi installation wizard, you may also use vSphere Client and change it from Configuration > Networking > rvbd_vswitch_pri > Properties.

Some devices perform better with TightVNC versus RealVNC.

3. On either Edge appliance Management Console, choose Networking > Networking: In-Path Interfaces, and then change the IP address for inpath0_0 to the new destination IP address.

4. Use the included console cable to connect to the console port on the back of the Edge appliance and log in as the administrator.

5. Enter the following commands to change the IP address to your new destination IP address.

enable
config terminal
interface primary ip address 1.7.7.7 /24
ip default-gateway 1.7.7.1
write memory

6. Enter the following command to shut down the appliance:

reload halt

7. Move the Edge appliance to the new location.

8. Start your Windows server at the new location and open the iSCSI Initiator.

Select the Discovery tab and remove the old portal.

Click OK.

Open the tab again and select Discover Portal.


Add the new Edge appliance primary IP address.

This process brings the original data disk back into operation.

Disk Management

Disk management configuration is different for the BlockStream-enabled SteelHead EX versus the SteelFusion Edge appliance.

You can partition the disk space in the SteelHead EX in different ways based on how you want to use the appliance and which license you purchased. VSP and SteelFusion Storage Mode is the default disk layout configured on the SteelHead EX during the manufacturing process. This mode evenly divides the disk space between VSP and SteelFusion functionalities.

However, if you plan on using the storage delivery capabilities (Storage Edge feature) of the SteelHead EX, Riverbed recommends that you select the SteelFusion Storage Mode disk layout. In the SteelFusion storage mode, most of the disk space is dedicated to Storage Edge blockstore cache, while leaving the required amount for VSP and WAN optimization functionalities. VSP can then use SteelFusion-delivered storage for its datastores—instead of local unprotected storage. This mode allows you to centralize your data center storage for both the operating system and the production data drives of the virtual servers running at the remote branch.

On SteelHead EX v2.1.0 and later you can use the Extended VSP and SteelFusion Storage Mode. This mode reclaims disk space that is reserved for updating legacy RSP virtual machines to ESXi format.

The extended VSP stand-alone storage mode and the legacy VSP stand-alone storage mode are designed for SteelHead EX appliances that do not have the SteelFusion feature.

Use the extended VSP stand-alone storage mode in cases in which you do not want to consolidate the operating system drive of the virtual servers into the data center storage, but instead want to keep it locally on the SteelHead EX.

For SteelFusion Edge appliances, you can specify the size of the local LUN during the hypervisor installation. The installation wizard allows more flexible disk partitioning, in which you can specify either a percentage or the exact amount in gigabytes that you want to use for the local LUN. The rest of the disk space is allocated to the Storage Edge blockstore. Riverbed recommends that you run the hypervisor installer before connecting the SteelFusion Edge appliance to the Core to set up local storage; this streamlines the ESXi configuration. If local storage is configured during the hypervisor installation, all LUNs provisioned by the Core to the SteelFusion Edge are automatically made available to the ESXi of the SteelFusion Edge.

For more information on disk management, see “Configuring Disk Management” on page 61.

Rdisk Traffic Routing Options

You can route Rdisk traffic out of the primary or the in-path interfaces. This section contains the following topics:

“In-Path Interface” on page 147

“Primary Interface” on page 147

For more information about Rdisk, see “Network Quality of Service” on page 139. For information about WAN redundancy, see “Configuring WAN Redundancy” on page 96.


In-Path Interface

Riverbed recommends that you select the in-path interface when you deploy the SteelFusion Edge W0 appliance or the BlockStream-enabled SteelHead EX (also known as Granite-only mode). When you configure the SteelHead EX to use the in-path interface, the Rdisk traffic is intercepted, optimized, and sent directly out of the WAN interface toward the Core deployed at the data center.

Use this option during proof of concepts (POC) installations or if the primary interface is dedicated to management.

The drawback of this mode is the lack of redundancy in the event of WAN interface failure. In this configuration, only the WAN interface needs to be connected. Disable link state propagation.

Primary Interface

Riverbed recommends that you select the primary interface when you deploy the SteelFusion Edge W1-W3 appliance or the BlockStream-enabled SteelHead EX. When you configure SteelHead EX to use the primary interface, the Rdisk traffic is sent unoptimized out of the primary interface to a switch or a router that in turn redirects the traffic back into the LAN interface of the SteelHead EX to get optimized. The traffic is then sent out of the WAN interface toward the Core deployed at the data center.

This configuration offers more redundancy because you can have both in-path interfaces connected to different switches.

Deploying SteelFusion with Third-Party Traffic Optimization

The Storage Edges and Cores communicate with each other and transfer data-blocks over the WAN using six different TCP port numbers: 7950, 7951, 7952, 7953, 7954, and 7970.


Figure 12-1 shows a deployment in which the remote branch and data center third-party optimization appliances are configured through WCCP. You can optionally configure WCCP redirect lists on the router to redirect traffic belonging to the six different TCP ports of SteelFusion to the SteelHeads. Configure a fixed-target rule for the six different TCP ports of SteelFusion to the in-path interface of the data center SteelHead.

Figure 12-1. SteelFusion Behind a Third-Party Deployment Scenario
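As an illustration of the optional WCCP redirect list mentioned above, the following Cisco IOS-style sketch matches the six SteelFusion TCP ports in both directions; the access-list name, WCCP service groups, and addresses are placeholders and must be adapted to your environment:

ip access-list extended STEELFUSION-RDISK
 permit tcp any any range 7950 7954
 permit tcp any any eq 7970
 permit tcp any range 7950 7954 any
 permit tcp any eq 7970 any
!
ip wccp 61 redirect-list STEELFUSION-RDISK
ip wccp 62 redirect-list STEELFUSION-RDISK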

Windows and ESX Server Storage Layout—SteelFusion-Protected LUNs Vs. Local LUNs

This section describes different LUNs and storage layouts. It includes the following topics:

“Physical Windows Server Storage Layout” on page 150

“Virtualized Windows Server on SteelHead EX and SteelFusion Storage Layout” on page 150

“Virtualized Windows Server on ESX Infrastructure with Production Data LUN on ESX Datastore Storage Layout” on page 151

Note: SteelFusion-Protected LUNs are also known as iSCSI LUNs. This section refers to iSCSI LUNs as SteelFusion-Protected LUNs.

Transient and temporary server data is not required in the case of disaster recovery and therefore does not need to be replicated back to the data center. For this reason, Riverbed recommends that you separate transient and temporary data from the production data by implementing a layout that separates the two into multiple LUNs.


In general, Riverbed recommends that you plan to configure one LUN for the operating system, one LUN for the production data, and one LUN for the temporary swap or paging space. Configuring LUNs in this manner greatly enhances data protection and operations recovery in case of a disaster. This extra configuration also facilitates migration to server virtualization if you are using physical servers.

For more information about disaster recovery, see “Data Resilience and Security” on page 121.

In order to achieve these goals SteelFusion implements two types of LUNs: SteelFusion-Protected (iSCSI) LUNs and local LUNs. You can add LUNs by choosing Configure > Manage: LUNs.

Use SteelFusion-Protected LUNs to store production data. They share the space of the blockstore cache. The data is continuously replicated and kept in sync with the associated LUN back at the data center. The Storage Edge cache only keeps the working set of data blocks for these LUNs. The remaining data is kept at the data center and predictably retrieved at the edge when needed. During WAN outages, edge servers are not guaranteed to operate and function at 100 percent because some of the data that is needed can be at the data center and not locally present in the Storage Edge blockstore cache.

One particular type of SteelFusion-Protected LUN is the pinned LUN. Pinned LUNs are used to store production data but they use dedicated space in the Storage Edge. The space required and dedicated in the blockstore cache is equal to the size of the LUN provisioned at the data center. The pinned LUN enables the edge servers to continue to operate and function during WAN outages because 100 percent of data is kept in blockstore cache. Like regular SteelFusion LUNs the data is replicated and kept in sync with the associated LUN at the data center.

For more information about pinned LUNs, see “When to PIN and Prepopulate the LUN” on page 156.

Use local LUNs to store transient and temporary data. Local LUNs also use dedicated space in the blockstore cache. The data is never replicated back to the data center because it is not required in the case of disaster recovery.


Physical Windows Server Storage Layout

When deploying a physical Windows server, Riverbed recommends that you separate its storage into three different LUNs: the operating system and swap space (or page file) can reside in two partitions on the server internal hard drive (or two separate drives), while production data should reside on the SteelFusion-Protected LUN (Figure 12-2).

Figure 12-2. Physical Server Layout

This layout facilitates future server virtualization and service recovery in the case of hardware failure at the remote branch. The production data is hosted on a SteelFusion-Protected LUN, which is safely stored and backed up at the data center. In case of a disaster, you can stream this data with little notice to a newly deployed Windows server without having to restore the entire dataset from backup.

Virtualized Windows Server on SteelHead EX and SteelFusion Storage Layout

When you deploy a virtual Windows server into the VSP SteelHead EX infrastructure, Riverbed recommends that you separate its storage in three different LUNs (Figure 12-3) as follows:

You can virtualize the operating system disk (OS LUN) on a VMDK file and hosted on the SteelFusion-Protected LUN, allowing for data center backup and instant recovery in the event of SteelHead EX hardware failure.

You can store Swap and vSwap space containing transient data on to local LUNs because this data does not need to be recovered after a disaster.


Production data continues to reside on a SteelFusion-Protected LUN, allowing for data center backup and instant recovery in the event of SteelHead EX hardware failure.

Figure 12-3. Virtual Server Layout 1

Virtualized Windows Server on ESX Infrastructure with Production Data LUN on ESX Datastore Storage Layout

When you deploy a virtual Windows server into an ESX infrastructure, you can also store the production data on an ESX datastore mapped to a SteelFusion-Protected LUN (Figure 12-4). This deployment facilitates service recovery in the event of hardware failure at the remote branch because SteelFusion appliances optimize not only LUNs formatted directly with NTFS file system but also optimize LUNs that are first virtualized with VMFS and are later formatted with NTFS.

Figure 12-4. Virtual Server Layout 2


VMFS Datastores Deployment on SteelFusion LUNs

When you deploy VMFS datastores on SteelFusion-Protected LUNs, for best performance, Riverbed recommends that you choose the Thick Provision Lazy Zeroed disk format (VMware default). Because of the way we use blockstore in the Storage Edge, this disk format is the most efficient option.

Thin provisioning is when you assign a LUN to be used by a device (in this case, a VMFS datastore for an ESXi host) and you tell the host how big the LUN is (for example, 10 GB). However, as an administrator you can choose to present the LUN as 10 GB while actually allocating only 2 GB of physical storage to it. This is useful if you know that the host needs only 2 GB to begin with. As time goes by (days or months) and the host writes more data and needs more space, the storage array automatically grows the LUN until eventually it really is 10 GB in size.

Thick provisioning means there is no pretending. You allocate all 10 GB from the beginning whether the host needs it from day one or not.

Whether you choose thick or thin provisioning, you need to initialize (format) the LUN like any other new disk. The formatting is essentially a process of writing a pattern to the disk sectors (in this case zeros). You cannot write to a disk before you format it. Normally, you have to wait for the entire disk to be formatted before you can use it—for large disks, this can take hours. Lazy Zeroed means the process works away slowly in the background and as soon as the first few sectors have been formatted the host can start using it. This means the host does not have to wait until the entire disk (LUN) is formatted.
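For example, if you create a virtual disk from the ESXi command line rather than through the vSphere client, the zeroedthick format corresponds to Thick Provision Lazy Zeroed; a brief sketch, with the datastore path, virtual machine folder, and disk size as placeholders:

# Create a 10-GB lazy-zeroed thick virtual disk on an existing VMFS datastore
vmkfstools -c 10G -d zeroedthick /vmfs/volumes/datastore1/myvm/myvm_data.vmdk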

Enable Windows Persistent Bindings for Mounted iSCSI LUNs

Riverbed recommends that you make iSCSI LUNs persistent across Windows server reboots; otherwise, you must manually reconnect them. To configure Windows servers to automatically connect to the iSCSI LUNs after system reboots, select the Add this connection to the list of Favorite Targets check box (Figure 12-5) when you connect to the Storage Edge iSCSI target.

Figure 12-5. Favorite Targets

To make iSCSI LUNs persistent and ensure that Windows does not consider the iSCSI service fully started until connections are restored to all the SteelFusion volumes on the binding list, remember to add the Edge iSCSI target to the binding list of the iSCSI service. This addition is important particularly if you have data on an iSCSI LUN that other services depend on: for example, a Windows file server that is using the iSCSI LUN as a share.


The best option to do this is to select the Volumes and Devices tab from the iSCSI Initiator's control panel and click Auto Configure (Figure 12-6). This binds all available iSCSI targets to the iSCSI startup process. If you want to choose individual targets to bind, click Add. To add individual targets, you must know the target drive letter or mount point.

Figure 12-6. Target Binding
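On Windows Server 2012 and later, you can make the connection persistent from PowerShell instead of the iSCSI Initiator control panel; a minimal sketch, with the portal address as a placeholder (binding the volume to the iSCSI service, as described above, is still configured separately):

# Discover the Storage Edge portal and log in with a persistent (favorite) session
New-IscsiTargetPortal -TargetPortalAddress 192.168.1.10
Get-IscsiTarget | Connect-IscsiTarget -IsPersistent $true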

Set Up Memory Reservation for VMs Running on VMware ESXi in the VSP

By default, VMware ESXi dynamically tries to reclaim unused memory from guest virtual machines, while the Windows operating system uses free memory to perform caching and avoid swapping to disk.

To significantly improve performance of Windows virtual machines, Riverbed recommends that you configure memory reservation to the highest possible value of the ESXi memory available to the VM. This advice applies whether the VMs are hosted within the VSP of the SteelHead EX or on an external ESXi server in the branch that is using LUNs from SteelFusion.

Setting the memory reservation to the configured size of the virtual machine results in a per virtual machine vmkernel swap file of zero bytes, which consumes less storage and helps to increase performance by eliminating ESXi host-level swapping. The guest operating system within the virtual machine maintains its own separate swap and page file.


Boot from an Unpinned iSCSI LUN

If you are booting a Windows server or client from an unpinned iSCSI LUN, Riverbed recommends that you install the Riverbed Turbo Boot software on the Windows machine. The Riverbed Turbo Boot software greatly improves boot over the WAN performance because it allows Core to send to Edge only the files needed for the boot process.

Note: The SteelFusion Turbo Boot plugin is not compatible with the branch recovery agent. For more information about the branch recovery agent, see “How Branch Recovery Works” on page 117 and the SteelFusion Core Management Console User’s Guide.

Running Antivirus Software

There are two types of antivirus scanning mode:

Ondemand - Scans the entire LUN data files for viruses at scheduled intervals.

Onaccess - Scans the data files dynamically as they are read or written to disk.

There are two common locations to perform the scanning:

Onhost - Antivirus software is installed on the application server.

Offhost - Antivirus software is installed on dedicated servers that can access directly the application server data.

In typical SteelFusion deployments, in which the LUNs at the data center contain the full amount of data and the remote branch cache contains the working set, Riverbed recommends that you run ondemand scan mode at the data center and onaccess scan mode at the remote branch. Running ondemand full file system scans at the remote branch causes the blockstore to wrap and evict the working set of data, leading to slow performance.

However, if the LUNs are pinned, ondemand full file system scan mode can also be performed at the remote branch.

Whether scanning onhost or offhost, the SteelFusion solution does not dictate one way versus another, but in order to minimize the server load, Riverbed recommends offhost virus scans.

Running Disk Defragmentation Software

Disk defragmentation software is another category of software that can possibly cause the SteelFusion blockstore cache to wrap and evict the working set of data. Riverbed does not recommend that you run disk defragmentation software.

Riverbed recommends that you disable default-enabled disk defragmentation on Windows 7 or later.

Running Backup Software

Backup software is another category of software that can possibly cause the Storage Edge blockstore cache to wrap and evict the working set of data, especially during the execution of full backups. In a SteelFusion deployment, Riverbed recommends that you run differential, incremental, and synthetic full backups, and that you run full backups at the data center.


Configure Jumbo Frames

If jumbo frames are supported by your network infrastructure, Riverbed recommends that you use jumbo frames between Core and storage arrays. Riverbed has the same recommendation for Storage Edge and any external application servers (not hosted within VSP) that are using LUNs from the Storage Edge. The application server interfaces must support jumbo frames. For details, see “Configuring Edge for Jumbo Frames” on page 60.
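After enabling jumbo frames end to end, it is worth verifying that large frames pass without fragmentation; a quick sketch using standard ping tools, with placeholder IP addresses (8972 bytes of ICMP payload plus headers corresponds to a 9000-byte MTU):

On a Windows application server, ping the Storage Edge iSCSI interface:
ping -f -l 8972 192.168.1.10

On an ESXi host, test the vmkernel interface:
vmkping -d -s 8972 192.168.1.10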

Core Best Practices

This section describes best practices for deploying the Core. It includes the following topics:

“Deploy on Gigabit Ethernet Networks” on page 155

“Use CHAP” on page 155

“Configure Initiators and Storage Groups or LUN Masking” on page 155

“Core Hostname and IP Address” on page 156

“Segregate Storage Traffic from Management Traffic” on page 156

“When to PIN and Prepopulate the LUN” on page 156

“Core Configuration Export” on page 157

“Core in HA Configuration Replacement” on page 157

Deploy on Gigabit Ethernet Networks

The iSCSI protocol enables block-level traffic over IP networks. However, iSCSI is both latency and bandwidth sensitive. To optimize performance and reliability, Riverbed recommends that you deploy Core and the storage array on Gigabit Ethernet networks.

Use CHAP

For additional security, Riverbed recommends that you use CHAP between Core and the storage array, and between Storage Edge and the server. One-way CHAP is also supported.

For more information, see “Using CHAP to Secure iSCSI Connectivity” on page 128.

Configure Initiators and Storage Groups or LUN Masking

To prevent unwanted hosts from accessing LUNs mapped to the Core, Riverbed recommends that you configure initiator and storage groups between the Core and the storage system. This practice is also known as LUN masking or storage access control.

When mapping Fibre Channel LUNs to the Core-vs, ensure the ESXi servers in the cluster that are hosting the Core-v appliances have access to these LUNs. Configure the ESXi servers in the cluster that are not hosting the Core-v appliances to not have access to these LUNs.


Core Hostname and IP Address

If the branch DNS server runs on VSP and its DNS datastore is deployed on a LUN used with SteelFusion, Riverbed recommends that you use the Core IP address instead of the hostname when you specify the Core hostname and IP address.

If you must use the hostname, deploy the DNS server on the VSP internal storage, or configure host DNS entries for the Core hostname on the SteelHead.

Segregate Storage Traffic from Management Traffic

To increase overall security, minimize congestion, minimize latency, and simplify the overall configuration of your storage infrastructure, Riverbed recommends that you segregate storage traffic from regular LAN traffic using VLANs (Figure 12-7).

Figure 12-7. Traffic Segregation
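As a simple illustration of this segregation on a Cisco IOS-style access switch, the following sketch places the storage (iSCSI and Rdisk) ports and the user LAN ports into separate VLANs; the VLAN IDs and interface names are placeholders:

vlan 100
 name STORAGE-ISCSI
vlan 200
 name BRANCH-LAN
!
interface GigabitEthernet0/1
 description Storage Edge primary interface (iSCSI/Rdisk)
 switchport mode access
 switchport access vlan 100
!
interface GigabitEthernet0/2
 description Branch user LAN
 switchport mode access
 switchport access vlan 200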

When to PIN and Prepopulate the LUN

SteelFusion technology has built-in file system awareness for NTFS and VMFS file systems. There are two likely circumstances when you need to pin and prepopulate the LUN:

“LUNs Containing File Systems Other Than NTFS and VMFS and LUNs Containing Unstructured Data” on page 156

“Data Availability at the Branch During a WAN Link Outage” on page 157

LUNs Containing File Systems Other Than NTFS and VMFS and LUNs Containing Unstructured Data

Riverbed recommends that you pin and prepopulate the LUN for unoptimized file systems such as FAT, FAT32, ext3, and so on. You can also pin the LUN for applications like databases that use raw disk file format or proprietary file systems.


Data Availability at the Branch During a WAN Link Outage

When the WAN link between the remote branch office and the data center is down, data no longer travels across the WAN. Hence, SteelFusion technology and its intelligent prefetch mechanisms no longer function. Riverbed recommends that you pin and prepopulate the LUN if frequent or prolonged WAN outages are expected.

By default, the Storage Edge keeps a write reserve that is 10 percent of the blockstore size. If prolonged WAN outages are expected, Riverbed recommends that you increase the write reserve space accordingly.

Core Configuration Export

Riverbed recommends that you store and back up the configuration on an external server in case of system failure. Enter the following CLI commands to export the configuration:

enable
configure terminal
configuration bulk export scp://username:password@server/path/to/config

Riverbed recommends that you repeat this export each time you perform a configuration operation or make any other change to your configuration.

Core in HA Configuration Replacement

If the configuration has been saved to an external server, you can replace a failed Core seamlessly. Enter the following CLI commands to restore the saved configuration:

enable
configure terminal
no service enable
configuration bulk import scp://username:password@server/path/to/config
service enable

iSCSI Initiators Timeouts

This section contains the following topics:

“Microsoft iSCSI Initiator Timeouts” on page 157

“ESX iSCSI Initiator Timeouts” on page 158

Microsoft iSCSI Initiator Timeouts

By default, the Microsoft iSCSI Initiator LinkDownTime timeout value is set to 15 seconds and the MaxRequestHoldTime timeout value is also 15 seconds. These timeout values determine how much time the initiator holds a request before reporting an iSCSI connection error. You can increase these values to accommodate longer outages, such as a Storage Edge failover event or a power cycle in the case of a single appliance.

If MPIO is installed in the Microsoft iSCSI Initiator, the LinkDownTime value is used. If MPIO is not installed, MaxRequestHoldTime is used instead.


If you are using Storage Edge in an HA configuration and MPIO is configured in the Microsoft iSCSI Initiator, change the LinkDownTime timeout value to 60 seconds to allow the failover to complete.
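
Both values are registry parameters of the Microsoft iSCSI Initiator. The following sketch queries and then raises LinkDownTime to 60 seconds with reg.exe; the instance key 0001 is a placeholder that varies by host, and the change typically takes effect only after the initiator host (or the iSCSI service) restarts, so verify the path and behavior for your Windows release.

reg query "HKLM\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\0001\Parameters"
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\0001\Parameters" /v LinkDownTime /t REG_DWORD /d 60 /f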

ESX iSCSI Initiator Timeouts

By default, the VMware ESX iSCSI Initiator DefaultTimeToWait timeout is set to 2 seconds. This is the minimum time to wait before attempting an explicit or implicit logout or active iSCSI task reassignment after an unexpected connection termination or a connection reset. You can increase this value to accommodate longer outages, such as a Storage Edge failover event or a power cycle in case of a single appliance.

If you are using Storage Edge in an HA configuration, change the DefaultTimeToWait timeout value to 60 seconds to allow the failover to complete.
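
As a sketch, assuming ESXi 5.x esxcli syntax and a software iSCSI adapter named vmhba33 (list your adapters with esxcli iscsi adapter list, and verify the parameter name against your VMware release):

# set the wait time to 60 seconds on the software iSCSI adapter (placeholder name vmhba33)
esxcli iscsi adapter param set --adapter=vmhba33 --key=DefaultTimeToWait --value=60
# confirm the current parameter values
esxcli iscsi adapter param get --adapter=vmhba33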

For more information about iSCSI Initiator timeouts, see “Configuring iSCSI Initiator Timeouts” on page 61.

Operating System Patching

This section contains the following topics:

“Patching at the Branch Office for Virtual Servers Installed on iSCSI LUNs” on page 158

“Patching at the Data Center for Virtual Servers Installed on iSCSI LUNs” on page 158

Patching at the Branch Office for Virtual Servers Installed on iSCSI LUNs

You can continue to use your existing methodologies and tools to perform patch management on physical or virtual branch servers that boot over the WAN through SteelFusion appliances.

Patching at the Data Center for Virtual Servers Installed on iSCSI LUNs

If you want to perform virtual server patching at the data center and avoid sending patch software across the WAN to the branch office, use the following procedure. A command-line sketch of the branch-office steps follows the procedure.

To perform virtual server patching at the data center

1. At the branch office:

Power down the virtual machine.

Take the VMFS datastore offline.


2. At the data center:

Take the LUN on the Core offline.

Mount the LUN to a temporary ESX server.

Power up the virtual machine, and apply patches and file system updates.

Power down the virtual machine.

Take the VMFS datastore offline.

Bring the LUN on the Core online.

3. At the branch office:

Bring the VMFS datastore online.

Boot up the virtual machine.

If the LUN was previously pinned at the edge, patching at the data center can invalidate the cache. In that case, you might need to prepopulate the LUN again.
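
The following is a minimal sketch of the branch-office half of step 1, assuming the virtual machine runs on ESXi (for example, under VSP) and that you have ESXi shell access; the VM ID and the datastore label shown are placeholders.

# list registered VMs and note the VM ID of the server to patch
vim-cmd vmsvc/getallvms
# shut down the guest OS (VM ID 12 is a placeholder; requires VMware Tools)
vim-cmd vmsvc/power.shutdown 12
# unmount the VMFS datastore so the LUN can be taken offline at the Core
esxcli storage filesystem unmount --volume-label=branch_datastore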


CHAPTER 13 SteelFusion Appliance Sizing

Every deployment of the SteelFusion product family differs due to variations in customer needs and in the types and sizes of IT infrastructure. The following information is intended to guide you toward optimal performance. However, these guidelines are general; for detailed sizing worksheets, contact your Riverbed representative.

This chapter includes the following sections:

“General Sizing Considerations” on page 161

“Core Sizing Guidelines” on page 161

“Storage Edge Sizing Guidelines” on page 163

General Sizing Considerations

Accurate sizing typically requires a discussion between Riverbed representatives and your server, storage, and application administrators.

General considerations include, but are not limited to, the following:

Storage capacity used by branch offices - How much capacity is currently used, or expected to be used, by the branch office? The total should account for both used and free space.

Input/output operations per second (IOPS) - What number and types of drives are currently in use? Determine this value early so that the SteelFusion-enabled SteelHead can provide the same or a higher level of performance.

Daily rate of change - How much data is the Edge expected to write back to the storage array through the Core? You can determine this value by studying backup logs.

Branch applications - Which applications, and how many, must continue running during a WAN outage? The answer can affect disk capacity calculations.

Core Sizing Guidelines

The main considerations for sizing your Core deployment are as follows:

Total data set size - The total space used across LUNs (not the size of LUNs).


Total number of LUNs - Each LUN adds five optimized connections to the SteelHead. Also, each branch office in which you have deployed Storage Edge represents at least one LUN in the storage array.

RAM requirements - Riverbed recommends at least 700 MB of RAM per TB of used space in the data set; for example, a 10 TB data set should have roughly 7 GB of RAM available to the Core. There is no specific setting on the Core that allocates memory on this basis; this is simply how much memory the Core typically uses under normal circumstances when it is available. Each Core model ships with a fixed amount of memory (see the SteelFusion and SteelHead specification sheets for details). If available memory falls below the recommended value, Core performance can suffer, as can its ability to efficiently perform prediction and prefetch operations.

Other potentially decisive factors include:

Number of files and directories

Type of file system, such as NTFS or VMFS

File fragmentation

Active working set of LUNs

Number of misses seen from Edge

Response time of the storage array

The following table summarizes sizing recommendations for Core appliances based on the number of branches and data set sizes.

Model    Number of LUNs    Number of Branches    Data Set Size    RAM
1000U    10                5                     2 TB             VM guidelines
1000L    20                10                    5 TB             VM guidelines
1000M    40                20                    10 TB            VM guidelines
1500L    60                30                    20 TB            VM guidelines
1500M    60                30                    35 TB            VM guidelines
2000L    20                10                    5 TB             24 GB
2000M    40                20                    10 TB            24 GB
2000H    80                40                    20 TB            24 GB
2000VH   160               80                    35 TB            24 GB
3000L    200               100                   50 TB            128 GB
3000M    250               125                   75 TB            128 GB
3000H    300               150                   100 TB           128 GB
3500C1   150               75                    25 TB            256 GB
3500C2   200               100                   50 TB            256 GB
3500C3   300               150                   100 TB           256 GB


The table assumes two LUNs per branch; however, there is no enforced limit on the number of LUNs per branch or on the number of branches, as long as the totals stay within the recommended number of LUNs and data set size.

Note: Core models 1000 and 1500 are virtual appliances. For minimum memory requirements, see the SteelFusion Core Installation and Configuration Guide.

Storage Edge Sizing Guidelines

The main considerations for sizing your Storage Edge deployment are as follows:

Disk size - What is the expected capacity of the Storage Edge blockstore?

– Your calculations can be affected by whether LUNs are pinned, unpinned, or local.

– During WAN outages, when the Storage Edge cannot synchronize write operations back through the Core to the LUNs in the data center, the Edge stores the data in a write reserve area on the blockstore. As described in “Pin the LUN and Prepopulate the Blockstore” on page 144, this area is 10 percent of the blockstore capacity; for example, a 4 TB blockstore reserves roughly 400 GB for writes.

Input/output operations per second (IOPS) - If you are replacing existing storage in the branch office, you can calculate this value from the number and types of drives in the devices you want to replace. Remember that the drives might not have been operating at their full performance capacity, so if an accurate figure is required, consider using performance monitoring tools included with the server operating system (for example, perfmon.exe in Windows), as sketched below.
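
For example, on a Windows server you might sample the disk transfer rate (a reasonable proxy for IOPS) with typeperf; this sketch takes a sample every 5 seconds for 10 minutes and writes the results to a CSV file (adjust the counter, interval, and sample count for your environment):

typeperf "\PhysicalDisk(_Total)\Disk Transfers/sec" -si 5 -sc 120 -f CSV -o branch_iops.csv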

Other potentially decisive factors:

HA requirements (PSU, disk, network redundancy)

VSP CPU and memory requirements

SteelHead EX optimization requirements (bandwidth and connection count)

See the SteelFusion Edge and BlockStream-enabled SteelHead EX specification sheets for capacity and capabilities of each model.

When sizing the SteelFusion Edge appliance, you have the additional flexibility of choosing the optimized WAN bandwidth level (W0-W3) you need after you have completed sizing for disk capacity, IOPS, and CPU. The W0 model provides no optimized WAN bandwidth for external traffic, but it still optimizes SteelFusion traffic. For more information, see “Rdisk Traffic Routing Options” on page 146.


APPENDIX A Edge Appliance Network Reference Architecture

This appendix provides detailed diagrams for SteelHead EXs that run VSP and Edge. It includes the following topics:

“Converting In-Path Interfaces to Data Interfaces” on page 165

“Multiple VLAN Branch with Four-Port Data NIC” on page 167

– “SteelHead EX Setup for Multiple VLAN Branch with Four-Port Data NIC” on page 167

– “Edge Setup Multiple VLAN Branch with Four-Port Data NIC” on page 167

– “Virtual Services Platform Setup Multiple VLAN Branch with Four-Port Data NIC” on page 168

“Single VLAN Branch with Four-Port Data NIC” on page 169

– “SteelHead EX Setup for Single VLAN Branch with Four-Port Data NIC” on page 169

– “Edge Setup Single VLAN Branch with Four-Port Data NIC” on page 169

– “Virtual Services Platform Setup Single VLAN Branch with Four-Port Data NIC” on page 170

“Multiple VLAN Branch Without Four-Port Data NIC” on page 171

– “SteelHead EX Setup for Multiple VLAN Branch Without Four-Port Data NIC” on page 171

– “Edge Setup Multiple VLAN Branch Without Four-Port Data NIC” on page 171

– “Virtual Services Platform Setup Multiple VLAN Branch Without Four-Port Data NIC” on page 172

For additional information about four-port NIC data cards, see “Using the Correct Interfaces for BlockStream-Enabled SteelHead EX Deployment” on page 81.

Converting In-Path Interfaces to Data Interfaces

For deployments with SteelFusion Edge appliances, this feature is not required because any additional multi-port NICs that you purchase and install must be specified for bypass or data at the time of ordering.

For BlockStream-Enabled SteelHead EX deployments in which you want to use additional interfaces for iSCSI traffic, you can convert the in-path interfaces on an additional, installed bypass NIC. To convert to data interfaces, use the hardware nic slot <slot-number> mode data CLI command or the Networking > Networking: Data Interfaces page of the SteelHead EX Management Console. After you enter the command, you need to reboot the appliance and bring up the data interfaces. You can convert data interfaces back to in-path interfaces using the hardware nic slot <slot-number> mode inpath CLI command.


This feature is not supported on the SteelHead EX560 and EX760 models.

For details, see the Riverbed SteelFusion Command-Line Interface Reference Manual and the Riverbed Command-Line Interface Reference Manual.
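
For example, to convert the NIC in slot 1 to data mode and reboot (a sketch using the commands named above; slot 1 is a placeholder, and write memory and reload follow standard RiOS CLI conventions, so confirm against the CLI reference for your release):

enable
configure terminal
hardware nic slot 1 mode data
write memory
reload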

Some SteelHead EX models have expansion slots and can support up to two additional NICs (slot 1 and slot 2). A NIC in slot 1 can be used for SteelHead optimization (in-path mode), for VSP traffic (data mode), or for SteelFusion traffic (data mode). A NIC in slot 2 can be used only for SteelHead optimization (in-path mode) or for SteelFusion traffic (data mode); VSP traffic is not supported on a NIC in slot 2.

Note: SteelHead EX software releases prior to v3.5 only support a single NIC in slot 1. To support two NICs, or a single NIC in slot 2, you must use EX release v3.5 or later.

The following table summarizes the deployment options for SteelHead EX models.

In which:

in-path = in-path mode for SteelHead EX WAN optimization traffic

VSP = data mode access for VSP traffic

storage = data mode access for SteelFusion traffic

EX Model   Number of PCIe Expansion Slots   Slot 1                     Slot 2 (EX v3.5 or later)
EX560      0                                —                          —
EX760      0                                —                          —
EX1160     2                                in-path, VSP, or storage   in-path or storage
EX1260     2                                in-path, VSP, or storage   in-path or storage
EX1360     2                                in-path, VSP, or storage   in-path or storage

The following table summarizes the deployment options for SteelFusion Edge models.

In which:

in-path = in-path mode for WAN optimization traffic

storage = data mode access for SteelFusion traffic

data = LAN access for hypervisor node traffic

SteelFusion Edge Model   Number of PCIe Expansion Slots   Slot 1 (always populated prior to shipment)   Slot 2 (RiOS node)   Slot 4 and 5 (RiOS node)   Slot 5 and 6 (Hypervisor node)
SFED2100                 2                                in-path                                       in-path or storage   —                          —
SFED2200                 2                                in-path                                       in-path or storage   —                          —
SFED3100                 6                                in-path                                       in-path or storage   in-path or storage         data
SFED3200                 6                                in-path                                       in-path or storage   in-path or storage         data
SFED5100                 6                                in-path                                       in-path or storage   in-path or storage         data


Multiple VLAN Branch with Four-Port Data NIC

The following diagrams apply only to SteelHead EX models 1160/1260.

Figure A-1. SteelHead EX Setup for Multiple VLAN Branch with Four-Port Data NIC

Figure A-2. Edge Setup Multiple VLAN Branch with Four-Port Data NIC


Figure A-3. Virtual Services Platform Setup Multiple VLAN Branch with Four-Port Data NIC


Single VLAN Branch with Four-Port Data NIC

The following network diagrams apply only to SteelHead EX models 1160/1260.

Figure A-4. SteelHead EX Setup for Single VLAN Branch with Four-Port Data NIC

Figure A-5. Edge Setup Single VLAN Branch with Four-Port Data NIC


Figure A-6. Virtual Services Platform Setup Single VLAN Branch with Four-Port Data NIC


Multiple VLAN Branch Without Four-Port Data NIC

Figure A-7. SteelHead EX Setup for Multiple VLAN Branch Without Four-Port Data NIC

Figure A-8. Edge Setup Multiple VLAN Branch Without Four-Port Data NIC


Figure A-9. Virtual Services Platform Setup Multiple VLAN Branch Without Four-Port Data NIC


Index

A
Antivirus software 154
Appliance ports, definitions 24
Application-consistent snapshots 109
At-rest encryption 130
Auxiliary port, definition 24

B
Backup proxy host configuration 113
Backup software 154
Best practices, Edge 143
Block Disk LUNs, configuring 31
Blockstore
   clearing contents 132
   definition 6
   overview 7
   prepopulate 144
   synchronization 86
Boot up, unpinned LUN 154
Branch recovery 116
Branch services
   configuring branch storage 64
   configuring disk management 61

C
Cabling
   Core 71
   Edge 83
   high availability 71
CHAP 155
Configuring
   disk management 146
   interfaces 57
   jumbo frames 155
   ports 57
   storage on an Edge 64
Console port, definition 24
Core
   cabling 71
   configuring failover peers 72
   configuring high availability 30
   configuring jumbo frames 29
   deploying 23, 67, 121, 135
   high availability 68
   hostname and IP address 156
   MPIO 32, 69
   pool management 33
   port definitions 24
   upgrade 138
Core-v, Fibre Channel LUNs 48

D
Data protection 114
Data recovery 115
Deployment process
   configuring Edge 18
   network scenarios 14
   overview 11, 16
Deployment scenarios, high availability 15
Disaster recovery 124
Disaster recovery scenarios, failback 125
Disaster recovery scenarios, failover 124
Disk defragmentation software 154
Disk management, configuring 61
Document conventions, overview of 2

E
Edge
   adding to configuration 32
   cabling 83
   configuring 18
   configuring storage 64
   confirming connection to 18
   high availability 80, 88
   initiator groups 20
   initiators, configuring 20
   jumbo frames 60
   LUN mapping 20
   MPIO 85
   peer communication 87
   ports 58
   prepopulating 20
   target settings 19
   upgrade 137
Ethernet network compatibility ii

F
Failback
   branch 126
   data center 126
   disaster recovery scenarios 125
Failover
   branch office 125
   data center 124
Failover, disaster recovery scenarios 124
Fibre Channel
   best practices 52
   VMware 41
Fibre Channel LUNs
   configuring 19, 31
   Core-v 48
   populating 51
   support 19
Fibre Channel, overview 39
FusionSync, see SteelFusion Replication

G
Gigabit Ethernet networks 155

H
Heartbeat 72
High availability
   cables 71
   Core 68
   Core, concepts 69
   heartbeat 72
   overview 15, 32, 67
   pool management 38
   SteelFusion Edge 88
   SteelHead EX 80
   testing deployments 95
   with SteelFusion Replication 105

I
In-flight encryption 130
Interface configuration 24, 57, 68
Interface routing 26
Interfaces
   SteelHead EX 81
Interoperability matrix 6
IP addresses, changing 145
iSCSI
   Initiator 30
   Initiator timeouts 158
   Initiator, configuring 18
   portal, configuring 18
iSCSI LUNs, configuring 19

J
Jumbo frames
   configuring 29
   Edge 60
   support 29

K
Known issues 2

L
Local LUN, definition 149
Lost password 130
LUN filtering 49
LUN masking 41, 155
LUN snapshot rollback 127
LUNs
   configuring 19, 30
   configuring snapshots 110
   discovering 18
   exposing 30
   iSCSI 19
   mapping to an Edge 18
   offlining 31
   removing 31
   removing from configuration 31

M
metrics 107
MPIO
   configuring for Core 32
   configuring in Edge 65
   Core 69
   Edge 85
   HA 68
   overview 32
Multi-path I/O, see MPIO 32

N
Network deployment scenarios 14

O
Online documentation 2
Overview, blockstore 7

P
Persistent bindings 152
Pinned LUN, definition 149
Pool management
   high availability 38
   overview 33
   REST API 35
Port configuration 24, 68
Ports
   auxiliary 24
   configuration 57
   definitions of 24
   Edge 58
   primary, definition of 24
   type of traffic 145
Product overview
   branch converged infrastructure 5
   high availability deployment 15
Proxy host
   configuration 113

Q
QoS
   branch office 141
   pinned LUNs 141
   replication traffic 141
   time-based rules 142
   unpinned LUNs 141

R
Raw device mapping 42
Rdisk protocol 139, 146
Recovery
   single Core 121
   single Core-v 122
Related reading 2
Replacement, Edge 123
Requirements, Core VE and Fibre Channel SANs 44
REST API access 35
Riverbed Hardware Snapshot Provider 112
Riverbed Host Tools
   operation and configuration 112
   Riverbed Snapshot Agent 111
Riverbed Snapshot Agent 112
Riverbed, contacting 3

S
Separate storage 144
Snapshots
   application consistent 109
   configuring for LUNs 110
   Riverbed Hardware Snapshot Provider 112
   Riverbed Host Tools 111
   Volume Snapshot Service (VSS) support 111
Software upgrade 135
Split brain
   Edge HA, split brain 95
SteelFusion Replication 107
   architecture 100
   failover scenarios 102
   overview 99
   with HA 105
SteelHead EX
   HA 80
   interfaces 81
Storage Access Control 155
Storage array
   data protection 114
   data recovery 115
   proxy backup 113
System overview, virtual storage 5

T
Target, configuring 18
Thick provisioning 152
Thin provisioning 152
Turbo Boot 20

U
Upgrade
   Core 138
   Edge 137
   minimize risk 137
   sequence 136
   software 135

V
Virtual Machine File System 41
Virtual storage, overview 5
VMware
   configure memory 153
   ESX server storage layout 148
   Fibre Channel 41
Volume Snapshot Service (VSS) support 111

W
WAN redundancy 96
WCCP 148
