SAN INTRODUCTION
Problems of DAS:
Throughout the 1980s, the standard way of connecting hosts to storage devices was point-
to-point, direct-attach storage through interfaces such as Integrated Drive Electronics (IDE)
and parallel SCSI. Parallel SCSI offered relatively fast (5 or 10 Mbit/sec) access to SCSI-
enabled disks, and several disks could be connected at once to the same computer through
the same interface. This worked well for the time, with fairly reliable, fast-speed connections
allowing administrators to connect internal and external storage through just simple ribbon
cabling or multiconductor external cables. However, as storage subsystems became larger
and larger and computers faster and faster, a new problem emerged–external storage
(which at one time was just a simple disk drive on the desk next to a machine) started to get
bigger. Tape libraries, Redundant Array of Inexpensive Disks (RAID) arrays, and other SCSI
devices began to require more and more space–requiring the parallel SCSI connection to be
stretched farther and farther away from the host. Input/Output (I/O) rates also increased,
pushing on the physics of keeping signal integrity in a large bundle of wires (32- and 64-bit
data bus widths). Simple parallel SCSI variants were devised to enable longer distances and
to address the signal integrity issues. However, they all eventually ran up against the
difficulties of high-speed signals across the parallel SCSI bus architecture.
Figure: Direct-Attached Storage (DAS)
Solutions from SAN:
The solution to all of this was slow in coming, but eventually the storage industry settled on
using a serial protocol with high-speed transceivers–offering good noise immunity, ease of
cabling, and plentiful bandwidth. Different specifications (Serial Storage Architecture [SSA]
and Fibre Channel as well as more advanced parallel SCSI technologies) competed for
adoption, and companies began experimenting with different serial communications media.
New high-speed circuits made serial transfers (using a simple pair of wires to transmit bits
serially, in order, rather than a large number of wires to transfer several bytes or words of
data at a time) the most practical solution to the signal problems.
The high speed of the circuits enabled Fibre Channel to offer data rates of up to 100
MB/sec, versus the slower 10 to 20 MB/sec limitations of parallel SCSI. (At present, FC
provides 1/2/4 Gbit/sec link rates.)
When Fibre Channel was first applied to the area of storage connections, the primary
attraction was the extended distances and simplified cabling that the technology
offered. This extension of direct-attach operation basically replaced the old parallel SCSI
attachments with a high-speed serial line (Figure 1.2). The new Fibre Channel connections
offered a much faster interface and simplified cabling (four copper wire connections through
DB-9 connectors, as well as optical cabling), and could be used to distribute storage as far
as 10 km away from a host computer, or 30 km away with optical extenders.
Figure 1.2 Using Fibre Channel to Extend Distances from Storage
The connections to disks at this time began using the Fibre Channel Arbitrated Loop (FC-AL)
protocol, which enabled disks to negotiate their addresses and traffic on a loop topology
with a host (Figure 1.3). Because of the combined ability to easily cable and distribute
storage, users were now able to add separate racks of equipment to attach to hosts. A new
component, the Fibre Channel hub, began to be used to make it easier to plug in devices.
The hub, a purely electrical piece of equipment that simply connected pieces of a Fibre
Channel loop together, made it possible to dynamically add and remove storage from the
network without requiring a complete reconfiguration. As these components began to be
used in increasingly complex environments, manufacturers began to add "intelligence"
to these Fibre Channel hubs, enabling them to independently deal with such issues as
failures in the network and noise in the network from loops being added and removed. An
alternative to the hub came in the form of the Fibre Channel switch, which, unlike a hub, was
not just connecting pieces of a loop, but instead offered the packet-switching ability of
traditional switches.
Figure 1.3 Arbitrated Loop Disk Configurations Attached to a Single Host
Because there was now a Fibre Channel network available, other hosts (not storage) were
added to take advantage of the same network. With the addition of SAN-aware software, it
was suddenly possible to share storage between two different devices on the network.
Storage sharing was the first realization of the modern SAN, with companies in the
multimedia and video production areas paving the way by using the Fibre Channel network
to share enormous data files between workstations, distribute jobs for rendering, and make
fully digital production possible (Figure 1.4).
Figure 1.4 Multiple Host Arbitrated Loop for Storage Sharing
The next big step in Fibre Channel evolution came with the increased reliability and
manageability of a Fibre Channel switched fabric. Early implementations of FC-AL were
sometimes difficult to manage, unstable, and prone to interoperability problems between
components. Because the FC-AL protocol was quite complex, the occasional result was
that nothing on a loop could communicate or stay operational. The
solution to this was a move to a switched fabric architecture, which not only enhanced the
manageability and reliability of the connection, but provided switched, high-speed
connections between all nodes of a network instead of a shared loop. As a result, each port
on a switch now provides a full 1 Gbit/sec of available bandwidth rather than just a portion of
the total 1 Gbit/sec of bandwidth shared between all the devices connected to the loop.
Fabrics now make up the majority of Fibre Channel installations. A typical Fibre Channel
switched fabric installation (Figure 1.5) has multiple hosts and storage units all connected
into the same Fibre Channel network cloud through one or more Fibre Channel switches.
Figure 1.5 Switched Fabric, Multiple Host, and Storage Unit Configuration
Today, the modern SAN looks much like any other modern computer network. Network
infrastructures such as switches, hubs, bridges, and routers help transport frame-level
information across the network. Network interface cards interface computer systems to the
same network (called HBAs in the SAN world, as they replaced SCSI Host Bus Adapters).
Figure 1.6 shows an example of how these components could be used in conjunction with
Fibre Channel switches.
Figure 1.6 Typical Deployed SAN Configuration with Multiple Hosts, Storage, and Tape
Devices
FC Technology:
1. What is Fibre Channel?
Fibre Channel (FC) is a serial, high-speed data transfer technology that can be
utilized by networks and mass storage.
Fibre Channel is an open standard, defined by ANSI (the T11 committee), and supports
the most important higher-level protocols, such as Internet Protocol (IP), ATM
(Asynchronous Transfer Mode), IEEE 802, HIPPI (High Performance Parallel Interface),
SCSI (Small Computer System Interface), and others.
Fibre Channel is fast (data transfer), flexible (supports many topologies), simple, and
scalable.
FC storage devices available in the current storage industry include FC controllers
and HBAs, FC hard disks, FC HD enclosures, FC storage arrays, FC hubs and switches,
FC connectors and cables, and other devices.
2. FC Layers:
• The FC-0 layer defines the specification for media types, distance, and signal
electrical and optical characteristics.
• The FC-1 layer defines the mechanism for encoding/decoding data for transmission
over the intended media and the command structure for accessing the media.
• FC-2 layer defines how data blocks are segmented into frames, how the frames are
handled according to the class of service, and the mechanisms for flow control and
ensuring frame data integrity.
• The FC-3 layer defines facilities for data encryption and compression.
• The FC-4 layer is responsible for mapping SCSI-3 protocols (FCP) and other higher
layer protocols/services into Fibre Channel commands.
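The FC-2 segmentation step can be sketched in a few lines (Python; helper names are hypothetical, and 2112 bytes is the maximum size of a Fibre Channel frame's data field):

```python
# Illustrative sketch of FC-2 style segmentation: a data block is split
# into frame payloads no larger than the FC maximum of 2112 bytes.
# Helper names are hypothetical, not part of any FC API.

MAX_PAYLOAD = 2112  # maximum Fibre Channel frame data-field size in bytes

def segment_into_frames(data: bytes, max_payload: int = MAX_PAYLOAD):
    """Split a data block into an ordered list of frame payloads."""
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]

def reassemble(frames):
    """The receiving FC-2 layer reassembles the frames in order."""
    return b"".join(frames)

if __name__ == "__main__":
    block = bytes(5000)                  # a 5000-byte data block
    frames = segment_into_frames(block)
    print(len(frames))                   # 3 frames: 2112 + 2112 + 776 bytes
    assert reassemble(frames) == block
```

In a real fabric, FC-2 also attaches a frame header carrying the source and destination addresses, sequence and exchange identifiers, and a CRC, which this sketch omits.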
FC Layers & OSI Layers Comparison
FC Topologies
1. Point-to-Point Topology
Up to two devices (ports) can be connected point-to-point.
Each device is called a node, and each node port is designated as an N_Port.
Point-to-point connections can use the entire 100 MB/sec of bandwidth available with
FC for communication between the two nodes.
2. Arbitrated Loop Topology
Arbitrated Loop (AL) allows up to 127 ports to be connected in a circular daisy chain.
The ports in an AL are designated as NL_Ports, and only two ports can be actively
communicating at any one time.
The other ports function as repeaters and simply pass the signal along. This means,
of course, that the 100 MB/sec of bandwidth is shared among all devices.
Arbitrated Loop uses 8-bit AL_PA addressing.
3. Switched Fabric Topology:
Switched Fabric allows up to 16 million devices to be connected together.
FC devices connect to the network via F_Ports or FL_Ports.
The connection between individual ports on the network functions similarly to a
telephone system.
Switched Fabric uses dynamic 24-bit addressing.
FC Addressing & Ports
WWN (World Wide Name)
– Static 64-bit address for each port
– Assigned by the IEEE in blocks to manufacturers
AL_PA (Arbitrated Loop Physical Address)
– Dynamic 8-bit address when connected to an arbitrated loop
S_ID (Native Address Identifier)
– Dynamic 24-bit address assigned to a node in a fabric
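As an illustration of the 24-bit fabric address, here is a small sketch that splits an S_ID into the commonly used Domain/Area/Port fields (Python; the field layout follows the common switched-fabric convention, and the helper name is hypothetical):

```python
# Decode a 24-bit Fibre Channel fabric address (S_ID / D_ID) into its
# three 8-bit fields. The Domain/Area/Port split follows the common
# switched-fabric convention; treat this as an illustrative sketch.

def decode_port_id(port_id: int):
    assert 0 <= port_id <= 0xFFFFFF, "fabric addresses are 24 bits"
    domain = (port_id >> 16) & 0xFF   # identifies the switch
    area = (port_id >> 8) & 0xFF      # identifies a port group on the switch
    port = port_id & 0xFF             # identifies the device (AL_PA on a loop)
    return domain, area, port

if __name__ == "__main__":
    print(decode_port_id(0x010F00))   # (1, 15, 0)
```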
Basic Port Types
– N_Port (Node) → end-point; typically an HBA or disk; connects to other N_Ports or
F_Ports
– F_Port (Fabric) → found only in a switch; connects directly to N_Ports
– E_Port (Expansion) → found only in a switch; connects directly to E_Ports in
other switches to expand the SAN
Used in Arbitrated Loops
– NL_Port (N_Port with Arbitrated Loop capabilities)
• Connects directly to an N_Port, F_Port, NL_Port, or FL_Port
– FL_Port (F_Port with Arbitrated Loop capabilities)
• Switch port that connects to N_Ports and NL_Ports
– G_Port (Generic)
• Switch port that can become an F_Port, FL_Port, or E_Port
FC Flow Control
Buffer-to-Buffer Credit: This type of flow control deals only with the link between
an N_Port and an F_Port, or between two N_Ports. Both ports on the link exchange
values of how many frames each is willing to receive at a time from the other port.
This value becomes the other port's BB_Credit value and remains in effect as long as
the ports are logged in.
End-to-End Credit: End-to-end flow control is not concerned with individual links,
but rather with the source and destination N_Ports. The concept is very similar to
buffer-to-buffer flow control. When the two N_Ports log into each other, they report
how many receive buffers are available for the other port. This value becomes EE_Credit.
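The buffer-to-buffer credit mechanism amounts to a simple counter, as this sketch shows (Python; class and method names are hypothetical, not part of any real FC stack):

```python
# Minimal sketch of buffer-to-buffer credit flow control. A transmitter
# starts with the BB_Credit value the other port advertised at login,
# spends one credit per frame sent, and regains one per R_RDY received.
# Names are hypothetical; this is not a real FC implementation.

class BBCreditPort:
    def __init__(self, bb_credit: int):
        self.credits = bb_credit      # value advertised by the peer at login

    def can_send(self) -> bool:
        return self.credits > 0

    def send_frame(self):
        if not self.can_send():
            raise RuntimeError("no BB_Credit left: transmitter must wait")
        self.credits -= 1             # one receive buffer consumed at the peer

    def receive_r_rdy(self):
        self.credits += 1             # the peer freed one receive buffer

if __name__ == "__main__":
    tx = BBCreditPort(bb_credit=2)
    tx.send_frame(); tx.send_frame()
    print(tx.can_send())              # False: both credits spent
    tx.receive_r_rdy()
    print(tx.can_send())              # True: one buffer freed
```

End-to-end credit works the same way, except the counter is maintained between the two N_Ports and replenished by ACKs rather than R_RDYs.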
FC Class of Service
Figure: Flow-control credits on a fabric. End-to-End credit (ACK) runs between N_Port 1
on Node A and N_Port 2 on Node B, while Buffer-to-Buffer credit (R_RDY) runs separately
on each link between an N_Port and its F_Port.
Class 1
→ Guaranteed bandwidth and delivery
→ Dedicated connection
→ End-to-End flow control (R_RDY and ACK)
Class 2
→ Guaranteed delivery (ACK required)
→ Connectionless service
→ Buffer-to-Buffer and End-to-End flow control (R_RDY and ACK)
→ Out-of-order delivery of frames allowed
Class 3
→ Delivery managed exclusively by Buffer-to-Buffer flow control (R_RDY)
→ Connectionless service
→ Out-of-order delivery of frames allowed
Intermix
Enhanced Class 1; allows Class 2 or Class 3 frames to be sent between Class 1 frames.
Class 4 → Class 4 can be used only with the pure Fabric topology. One N_Port will
set up a Virtual Circuit (VC) by sending a request to the Fabric indicating the remote
N_Port as well as quality-of-service parameters. The resulting Class 4 circuit will
consist of two unidirectional VCs between the two N_Ports.
Class 5 → The idea for Class 5 involved isochronous, just-in-time service. However, it
is still undefined.
Class 6 → Class 6 provides support for multicast service through a Fabric at the
well-known address hex 'FFFFF5'.
FC AL Initialization
Loop initialization happens mainly for two reasons:
→ when a loop is newly formed with all FC devices and is about to come up on the
network;
→ when any loop failure occurs.
Three main functions happen during loop initialization:
→ LIP Primitive Sequences.
A LIP is transmitted by an L_Port after it powers on, or when it detects loop failure
(loss of synchronization at its receiver). The LIP propagates around the loop,
triggering all other L_Ports to transmit LIP as well. At this point, the loop is
not usable.
→ Selection of a Loop Master.
This is done by the L_Ports continually transmitting Loop Initialization Select Master
(LISM) frames.
→ Selection of an AL_PA by every device on the loop.
The concept of an AL_PA bitmap is used: each L_Port selects (and sets) a single bit in
the bitmap of a frame originated by the Loop Master and repeats the frame back onto the
loop. There are 127 available bits, corresponding to the 127 valid AL_PAs.
This process is carried out using the following four frames:
LIFA → a certain AL_PA was assigned by the Fabric.
LIPA → before this initialization, the L_Port had a valid AL_PA.
LIHA → the L_Port has a certain AL_PA it tries to claim.
LISA → the L_Port claims the first available AL_PA that is left.
Two additional frames, LIRP and LILP, may be sent by the Loop Master, but only if all
L_Ports on the loop support them.
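The bitmap-driven AL_PA selection can be approximated with a single simplified pass (Python; hypothetical names, a single pass standing in for the LIFA/LIPA/LIHA/LISA frames, and real AL_PAs are specific 8-bit code values rather than plain indexes):

```python
# Simplified sketch of AL_PA selection during loop initialization.
# The Loop Master circulates a bitmap of 127 positions; each port that
# still needs an address claims the first free bit as the frame passes,
# roughly what happens across the LIFA/LIPA/LIHA/LISA frames.

NUM_ALPAS = 127

def run_selection_pass(ports):
    """ports: port names in loop order. Returns {port: alpa_index}."""
    bitmap = [False] * NUM_ALPAS
    assignments = {}
    for port in ports:                    # the frame travels around the loop
        for idx, taken in enumerate(bitmap):
            if not taken:
                bitmap[idx] = True        # claim the first available bit
                assignments[port] = idx
                break
    return assignments

if __name__ == "__main__":
    print(run_selection_pass(["disk0", "disk1", "hba0"]))
```

The key property is that every port ends up with a unique address, because each one sets its chosen bit before repeating the frame onward.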
FC AL Arbitration
Arbitrated Loop is not a token-passing scheme. When a device is ready to
transmit data, it first must arbitrate and gain control of the Loop. It does this by
transmitting the Arbitrate (ARBx) Primitive Signal, where x = the Arbitrated Loop
Physical Address (AL_PA) of the device.
Once a device receives its own ARBx Primitive Signal, it has gained control of the
Loop and can now communicate with other devices by transmitting an Open (OPN)
Primitive Signal to a destination device.
Once this happens, there essentially exists point-to-point communication between
the two devices. All other devices in between simply repeat the data.
If more than one device on the loop is arbitrating at the same time, the x values
of the ARB Primitive Signals are compared. When an arbitrating device receives
another device's ARBx, the ARBx with the numerically lower AL_PA is forwarded,
while the ARBx with the numerically higher AL_PA is blocked.
Unlike token-passing schemes, there is no limit on how long a device may retain
control of the Loop. This demonstrates the “Channel" aspect of Fibre Channel.
There is, however, an Access Fairness Algorithm, which prohibits a device from
arbitrating again until all other devices have had a chance to arbitrate. The catch is
that the Access Fairness Algorithm is optional.
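The arbitration rule (the numerically lower AL_PA wins) can be sketched as follows (Python; hypothetical names, simplified to a single pass around the loop):

```python
# Sketch of FC-AL arbitration. Each arbitrating port forwards an incoming
# ARBx only if its AL_PA is numerically lower than the port's own;
# otherwise the port substitutes its own ARB. A port has won when its own
# ARB returns to it. Names are hypothetical; this is a simplification.

def arbitrate(arbitrating_alpas):
    """Return the AL_PA that wins control of the loop."""
    # Start with the first port's ARB and let it pass every arbitrating
    # port: the lower (higher-priority) AL_PA survives each comparison.
    current = arbitrating_alpas[0]
    for alpa in arbitrating_alpas[1:]:
        current = min(current, alpa)      # lower AL_PA forwarded, higher blocked
    return current

if __name__ == "__main__":
    print(hex(arbitrate([0x23, 0x01, 0xE8])))   # 0x1: the lowest AL_PA wins
```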
Fibre Channel over IP
Two popular solutions for extending the Fibre Channel over the IP network are FCIP and
iFCP.
The main reasons for running Fibre Channel over IP networks are the following:
Leverage existing storage devices (SCSI and Fibre Channel) and networking
infrastructures (Gigabit Ethernet);
Maximize storage resources to be available to more applications;
Extend the geographical limitations of DAS and SAN access;
Use existing storage applications (backup, disaster recovery, and mirroring) without
modification; and
Manage IP-based storage networks with existing tools and IT expertise.
FCIP Protocol:
FCIP is currently the most widely supported IP based extension protocol. This is probably due
to the fact that it is simple and easy to implement. The basic concept of FCIP is a
tunnel that connects two or more Fibre Channel SAN islands through an IP network. Once
connected, the two SAN islands logically merge into a single fabric across the IP tunnel.
An FCIP gateway is required to encapsulate Fibre Channel frames into TCP/IP packets,
which are then sent through the IP network. On the remote side, another FCIP gateway
receives the incoming FCIP traffic and strips off the TCP/IP headers before forwarding
the native Fibre Channel frames into the SAN (Figure).
The FCIP gateway can be a separate device, or its functionality can be integrated into the
Fibre Channel switch.
The obvious advantage of using FCIP is that existing IP infrastructure can be used to provide
the distance extension.
How FCIP works
FCIP solutions encapsulate Fibre Channel packets and transport them via TCP/IP, which
enables applications that were developed to run over Fibre Channel SANs to be supported
under FCIP. It also enables organizations to leverage their current IP infrastructure and
management resources to interconnect and extend Fibre Channel SANs.
FCIP is a tunneling protocol that uses TCP/IP as the transport
while keeping Fibre Channel services intact. FCIP relies on IP-based network services and on
TCP/IP for congestion control and management. It also relies on both TCP/IP and Fibre
Channel for data-error and data-loss recovery.
In FCIP, gateways are used to interconnect Fibre Channel SANs to the IP network and to
set up connections between SANs, or between Fibre Channel devices and SANs. As with
iSCSI, there are a number of "pre-standard" FCIP products on the market.
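The tunneling idea can be illustrated with a toy encapsulation: prefix each Fibre Channel frame with a length header and carry the result over a TCP byte stream. (Python; the 4-byte header here is purely illustrative and is not the actual FCIP encapsulation format defined in RFC 3821.)

```python
# Toy FCIP-style encapsulation: each FC frame becomes a length-prefixed
# record on a TCP byte stream, and the remote gateway recovers the native
# frames. The 4-byte length header is an illustrative stand-in for the
# real FCIP encapsulation header.
import struct

def encapsulate(fc_frame: bytes) -> bytes:
    """Gateway A: wrap a native FC frame for transport over TCP/IP."""
    return struct.pack("!I", len(fc_frame)) + fc_frame

def decapsulate(stream: bytes):
    """Gateway B: strip the headers and recover the native FC frames."""
    frames, offset = [], 0
    while offset < len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        offset += 4
        frames.append(stream[offset:offset + length])
        offset += length
    return frames

if __name__ == "__main__":
    tunnel = encapsulate(b"FC-frame-1") + encapsulate(b"FC-frame-2")
    print(decapsulate(tunnel))   # [b'FC-frame-1', b'FC-frame-2']
```

Because the frames arrive at the remote gateway unchanged, the two SAN islands behave as one fabric, which is exactly why a tunnel failure can trigger fabric rebuilds on both sides.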
iFCP Protocol:
How iFCP works
Fibre Channel devices (e.g., switches, disk arrays, and HBAs) connect to
an iFCP gateway or switch. Each Fibre Channel session is terminated at the local gateway
and converted to a TCP/IP session via iFCP. A second gateway or switch receives the iFCP
session and initiates a Fibre Channel session. In iFCP, TCP/IP switching and routing elements
complement and enhance, or replace, Fibre Channel SAN fabric components. The protocol
enables existing Fibre Channel storage devices or SANs to attach to an IP network. Sessions
include device-to-device, device-to-SAN, and SAN-to-SAN communications.
From the IP side, each of the Fibre Channel devices connected to the iFCP gateway is
given a unique IP address, which is advertised in the IP network. This allows
individual Fibre Channel devices to be reached through the IP network via the iFCP
gateway. The ability to individually address devices gives iFCP some advantages
compared to the FCIP protocol.
The biggest advantage is stability. Using FCIP between two Fibre Channel SAN islands
will cause the islands to merge into one. This means that if there are perturbations in
the IP network, they can potentially cause the fabric to rebuild on both sides of the IP
tunnel. Using iFCP, the connectivity is between individual devices, and the fabrics stay
separate. If perturbations occur in the network, they may affect individual connections
but will not cause fabric rebuilds, thus leading to more stable fabrics on both sides of
the IP network. The disadvantage compared to FCIP is the limited availability of iFCP
solutions in the marketplace. This could be because FCIP is very simple to implement,
so FCIP solutions are widely available and provided by a number of different
manufacturers. In contrast, iFCP is supported by only a limited number of vendors.
Fibre Channel Switch
I. FC Switch Configuration
1. Open a HyperTerminal session
2. Log in as admin
3. Enter password for the password
4. Type configure
a. Configures the entire switch
5. Type help
a. Lists the available commands
6. Type ipAddrSet
7. Enter the Ethernet IP address
a. Get from Tom York
8. Enter the common subnet mask
a. 255.255.252.0
9. Hit Enter twice after the subnet mask
a. Uses the default values
10. Enter the gateway address, which is the same throughout the lab
a. 147.145.175.254
11. When asked to set the values, respond by entering <y>
12. Type ipAddrShow
a. Verify that the IP address was saved
13. Type reboot
a. This will take several minutes
II. Enable the Switches
1. Open a web browser
2. Enter the address http://<ip address of the switch>
3. Click Zone Admin
4. Enter admin for the user name
5. Enter password for the password
6. For zone selection, select switch/port level zoning
7. Click <OK>
8. Click the Port Zone tab
9. Click Create Zone
10. Name the zone
11. Go to the switch port domain and select ports 0 through 7
12. Click Add Mem =>
13. Create another zone
14. Select ports 8 through 15
15. Select the Port Config tab
16. Highlight both new zones and add them
a. Under file zones
17. Click Add Mem =>
18. Click Enable Config, then Apply and OK
a. All located at the bottom of the screen
Switch Behavior:
Switch Initialization:
At power-on, the boot PROM diagnostics:
Verify the CPU DRAM memory.
Initialize the base Fabric Operating System (FOS).
The initialized FOS then does the following:
Executes the Power-On Self Test (POST) on the switch.
Initializes the ASICs and the front panel.
Initializes the link for all ports (puts them online).
Explores the fabric and determines the Principal Switch.
Assigns addresses to ports.
Builds the unicast routing table.
Enables N_Port operations.
Fabric Port Initialization Process (from the switch's perspective):
Transition 1: At the start, verify whether anything is plugged into the switch port.
Transition 2: At FL_Port, check whether any loop connections are present on the switch.
Transition 3: At G_Port, verify whether other devices (switches or hubs) are connected.
Transition 4: After G_Port, verify whether a switch or point-to-point device is connected.
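The four transitions above can be summarized as a small decision function (Python; a loose illustrative sketch with hypothetical names, not how switch firmware is actually structured):

```python
# Loose sketch of the fabric port initialization transitions: given what
# the switch discovers on a port, decide the port's operating mode.
# Names are hypothetical; real switches run this as a firmware/hardware
# state machine with many more intermediate states.

def classify_port(device_present: bool, loop_detected: bool,
                  peer_is_switch: bool) -> str:
    if not device_present:
        return "no link"          # Transition 1: nothing plugged in
    if loop_detected:
        return "FL_Port"          # Transition 2: loop devices attached
    if peer_is_switch:
        return "E_Port"           # Transitions 3-4: another switch found
    return "F_Port"               # a point-to-point N_Port device

if __name__ == "__main__":
    print(classify_port(True, False, False))   # F_Port
    print(classify_port(True, True, False))    # FL_Port
```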
Communication Protocols:
Fabric devices typically:
FLOGI → PLOGI to the Name Server → SCR to the Fabric Controller → Register &
Query [using the FC Common Transport (FC_CT) protocol] → LOGON.
Loop devices typically:
PRIVATE NL: LIP (PLOGI & PRLI will enable private storage devices that accept
PRLI and thus "appear" Fabric capable)
PUBLIC NL: LIP → FLOGI → PLOGI → SCR → Register & Query → LOGO, and then
PLOGI → communicate with other end nodes in the fabric.
The LIP process includes: LIP, LISM, LIFA, LIPA, LIHA, LISA, and LIRP & LILP.
Switch Commands (general, from Brocade switches):
help
switchShow
fabricShow
switchEnable / switchDisable
nsShow
nsAllShow
zoneShow / aliShow / cfgShow
cfgEnable
cfgDisable
cfgCreate
zoneCreate
errDump
licenseShow
portCfgDefault
portEnable / portDisable
wwn
uRouteShow
Switch or Fabric Zoning:
SAN implementations make data highly accessible; as a result, there is a need for data-
transfer optimization and finely tuned network security. Fabric zoning sets up the way
devices in the SAN interact, establishing a certain level of management and
security.
What is zoning?
Zoning is a fabric-centric enforced way of creating barriers on the SAN fabric to prevent set
groups of devices from interacting with other devices. SAN architectures provide port-to-port
connections between servers and storage subsystems through bridges, switches, and hubs.
Zoning sets up efficient methods of managing, partitioning, and controlling pathways to and
from storage subsystems on the SAN fabric, which improves storage subsystem utilization,
data access, and security on the SAN. In addition, zoning enables heterogeneous devices
to be grouped by operating system, with further demarcation based on application,
function, or department.
Types of zoning
There are two types of zoning: soft zoning and hard zoning.
• Soft zoning uses software to enforce zoning. The zoning process uses the name server
database located in the FC switch. The name server database stores port numbers and
World Wide Names (WWNs) used to identify devices during the zoning process.
When a zone change takes place, the devices in the database receive a Registered State
Change Notification (RSCN). Each device must correctly process the RSCN to change the
related communication paths. Any device that does not correctly process the RSCN, yet
continues to transfer data to a specific device after a zoning change, will be blocked
from communicating with its target device.
• Hard zoning uses only WWNs to specify each device for a specific zone. Hard zoning
requires each device to pass through the switch’s route table so that the switch can regulate
data transfers by verified zone.
For example, if two ports are not authorized to communicate with each other, the route
table for those ports is disabled, and the communication between those ports is blocked.
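The enforcement rule, in both cases, is that two devices may communicate only if some zone contains them both; a minimal sketch (Python; the WWNs and zone names are made up):

```python
# Sketch of zone enforcement: two devices may communicate only if at
# least one zone contains both of them. Zone and WWN values are made up
# for illustration.

zones = {
    "zone1": {"10:00:00:00:00:00:00:01", "50:00:00:00:00:00:00:0a"},
    "zone2": {"10:00:00:00:00:00:00:02", "50:00:00:00:00:00:00:0a"},
}

def can_communicate(wwn_a: str, wwn_b: str, zones: dict) -> bool:
    """True if any zone contains both devices."""
    return any(wwn_a in members and wwn_b in members
               for members in zones.values())

if __name__ == "__main__":
    # Both hosts share the array port via their own zones, but the two
    # hosts cannot see each other.
    print(can_communicate("10:00:00:00:00:00:00:01",
                          "50:00:00:00:00:00:00:0a", zones))  # True
    print(can_communicate("10:00:00:00:00:00:00:01",
                          "10:00:00:00:00:00:00:02", zones))  # False
```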
Zoning components
Zone configurations are based on either the physical port that devices plug into, or the WWN
of the device. There are three zoning components:
• Zones
• Zone members
• Zone sets
What is a zone?
A zone is composed of servers and storage subsystems on a SAN that access each other
through managed port-to-port connections. Devices in the same zone recognize and
communicate with each other, but not with devices in other zones unless a device in
that zone is configured to be a member of multiple zones.
Figure 1 shows a three-zone SAN with zones 1 and 3 sharing the tape library in zone 2.
Figure 1: Three-Zone SAN Fabric
Zone types
• Port zoning (all zone members are ports)
• WWN zoning (all zone members are WWNs)
• Session-based zoning (zone members are a mixture of WWNs and ports)
Zone database
• Zone database consists of zone objects.
• A zone object can be an alias, a zone, or a configuration
• Configurations contain zones which contain aliases
• For any object, the commands available allow you to create, delete, add, remove, or
show
– cfgcreate/delete/add/remove/show
– zonecreate/delete/add/remove/show
– alicreate/delete/add/remove/show
• Every switch in the fabric has the same copy of the entire database.
• To clear the zone database from a switch, use cfgclear
Alias
• An alias is a name for a device in the fabric
• An alias contains a name for the device, and either the WWN of the device or the
domain and port the device is attached to
• WWN alias: alicreate "alias1","10:00:00:00:01:01:02:02"
• Port alias: alicreate "alias2","100,15"
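Putting the alias, zone, and configuration commands together, a typical Brocade-style zoning session might look like the following (the prompt, names, and WWN are illustrative):

```
switch:admin> alicreate "host1","10:00:00:00:01:01:02:02"
switch:admin> alicreate "array1","100,15"
switch:admin> zonecreate "zone1","host1; array1"
switch:admin> cfgcreate "cfg1","zone1"
switch:admin> cfgenable "cfg1"
switch:admin> cfgshow
```

This follows the object hierarchy described above: aliases name devices, zones group aliases, and a configuration groups zones, with only one configuration active (effective) at a time.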
What is a zone member?
Zone members are the devices within the same assigned zone. See Figure 2. Zone member
devices are restricted to intra-zone communications, meaning that these devices can only
interact with members within their assigned zone. A zone member
cannot interact with devices outside its assigned zone unless it is configured in other zones.
Figure 2: Zone Members
How is a zone member identified?
Each zone member is identified by a WWN or port number. Each device has a unique WWN:
a 64-bit number that uniquely identifies the device, and thus the zone member.
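A WWN is conventionally written as eight colon-separated hex bytes; here is a small parser sketch (Python; the helper name is hypothetical):

```python
# Parse a World Wide Name written as eight colon-separated hex bytes
# (e.g. "10:00:00:00:01:01:02:02") into its 64-bit integer value.

def wwn_to_int(wwn: str) -> int:
    parts = wwn.split(":")
    if len(parts) != 8:
        raise ValueError("a WWN is 8 bytes (64 bits)")
    value = 0
    for byte_str in parts:
        value = (value << 8) | int(byte_str, 16)   # append each byte
    return value

if __name__ == "__main__":
    print(hex(wwn_to_int("10:00:00:00:01:01:02:02")))  # 0x1000000001010202
```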
What is a zone set?
A zone set is a group of zones that function together on the SAN. Each zone set can
accommodate up to 256 zones. All devices in a zone see only devices assigned to their zone,
but any device in that zone can be a member of other zones. In Figure 3,
all 4 zones see Member A.
Figure 3: Zone Set
Configurations
• A configuration is a set of zones.
• You can have multiple defined configurations, but only one active configuration in a
fabric at any time.
• cfgcreate "cfg1","zone1"
• To enable a configuration, use cfgenable "cfg1". This is now called the effective
configuration.
• To disable the effective configuration, use the cfgdisable command. Note that when
you disable zoning, all devices can see each other!
Zone Commit
• A zone commit is the process of updating all switches in the fabric when making a
zone change.
• A zone commit is executed for the cfgdisable, cfgenable, and cfgsave commands.
• Zone commit uses the RCS protocol. The switch making the commit communicates with
each switch individually to ensure the commit took place.
• When a zone commit takes place, the entire zoning database is sent to all switches,
even if only a small change has been made.
RCS [Reliable Commit Service]
• RCS is used for zoning, security, and some other features.
• For zoning, RCS ensures a zone commit happens on every switch in the fabric, or
not at all.
• RCS has four phases: ACA, SFC, UFC, RCA.
Zoning limitation
Currently, fabric zoning cannot mask individual tape or disk storage LUNs that sit behind a
storage-subsystem port. LUN masking and persistent binding are used to isolate devices
behind storage-subsystem ports.
Components of FC-SAN
While SAN configurations can become very complex, a SAN can be reduced to three basic
entities: the host system or systems, the network, and the storage device.
1. Host System(s)
• Application Software (SAN Management Software, CLI Interface and others)
• Middleware (e.g., Volume Manager or Host RAID)
• Operating System/File System
• Host Bus Adapter (HBA) Driver
• Host Bus Adapter (HBA)
• Host Bus Adapter Firmware
2. Storage Network/Communications Infrastructure
• Physical links (FC, iSCSI, Ethernet)
• Transceivers (GBIC, SFP, or any other transceiver)
• Switches and Switch Firmware (Switches & Directors)
• Routers and Router Firmware
• Bridges or Extenders and their Firmware
3. Storage Device(s)
• Interface Adapter
• Interface Adapter Driver/Firmware
• Storage Controller Firmware
• Storage Device (e.g., disk, JBOD, Storage Arrays, Tape or Tape Library)
• Storage Media
Storage Area Network Management
1. Storage Management Software
2. SAN Protection and Security
3. Storage Backup, Disaster Recovery & Data Replication.
1. SAN Management Software
Though typically spoken of in terms of hardware, SANs very often include (or require)
specialized software for their operation. In fact, configuring, optimizing, monitoring,
and securing a contemporary
SAN will almost certainly involve advanced software, particularly centralized management
tools. When considering more complex options, such as High Availability configurations,
selecting the proper management software can be just as critical as choosing the
equipment. Though somewhat recent in its development, SAN management software
borrows heavily from the mature ideas, benefits, and functionality that have been available
for traditional LANs and WANs. Ideally, this new category of software would be universal and
work with any SAN equipment. But in today's multi-vendor and hardware diverse SAN
environments, this software is very often proprietary and/or tied to certain products and
vendors. While this situation is beginning to change, SAN management software today must
be selected with great care. Much consideration has to be given to the SAN equipment
manufacturers, OS platforms, firmware revisions, HBA drivers, client applications, and even
other software that may be running on the SAN. Until SAN management software becomes
very universal, it will continue to be quite important, and even vital, to work closely with
product (and total solution) providers in order to successfully implement, and realize, the
best features that SANs have to offer.
Using management software, the following actions can be performed:
Drive Management (Fail drive, Rebuild drive, initialize drive, online/offline drive,
Hotspare drive, Drive FW Upgrade/Downgrade).
Controller Management (Volumes ownership, active/passive mode, online/offline,
Controller Firmware & NVSRAM Upgrade/Downgrade)
Storage Array Management
(Array management: array profile, add/remove array, rename or modify array, reset
array configuration, monitor performance, event database, collect logs, connectivity
status.
Storage management: create logical drive / LD group, delete LD / LD group, modify LD
capacity (DCE & DVE), DRM, modify LD settings, LUN mapping, LUN masking.)
Switch management can be done using the switch vendor's management software.
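The LUN mapping/masking mentioned above can be sketched as a per-host visibility table (Python; the WWNs and LUN numbers are made up):

```python
# Sketch of LUN masking: the storage array keeps a table of which LUNs
# each host WWN is allowed to see; everything else stays hidden. The
# WWNs and LUN numbers here are made up for illustration.

lun_masks = {
    "10:00:00:00:00:00:00:01": {0, 1},   # host A sees LUNs 0 and 1
    "10:00:00:00:00:00:00:02": {2},      # host B sees only LUN 2
}

def visible_luns(host_wwn: str):
    """Return the set of LUNs the array exposes to this host."""
    return lun_masks.get(host_wwn, set())  # unknown hosts see nothing

if __name__ == "__main__":
    print(sorted(visible_luns("10:00:00:00:00:00:00:01")))  # [0, 1]
    print(sorted(visible_luns("unknown-host")))              # []
```

This complements fabric zoning: zoning controls which ports can reach the storage-subsystem port, while LUN masking controls which logical drives behind that port each host may access.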
Fig. SAN Storage Array
Storage Array
In data storage, an array is a hardware system consisting of multiple storage devices
designed to fulfill the need for data storage.
It consists of one or more controller modules (boxes) and one or more drive modules
(boxes).
Command Module
The command module is the housing for the controllers. It allows the user to “hot swap”
controllers and components. This is made possible due to the redundant nature of the
controller canister. The components that the user can hot swap include redundant power
supplies, batteries, fans, and communications. Additionally, it houses two controllers that
are hot swappable. Below is the front and back of a command module.
Controller
The controller is the “brains” of the array. It can be loaded with different controller
firmware that enables different features on the array. The
controllers are redundant (there are two housed in the command module) and they are hot
swappable. One controller can fail and the other will control the array until the failed
controller is replaced. The Heartbeat light should be blinking during normal operation.
Below is the front view of a controller.
Gigabit Interface Converter (GBIC) and Small Form-factor Pluggable (SFP) - The
GBICs and SFPs allow the host and drive trays to be connected to the controller canister
through Fibre Channel cables. Fibre Channel mini-hubs, GBICs, and cables come in two
sizes: the LC small form-factor connector cable corresponds to 2 Gb/s Fibre Channel,
while the larger SC connector cable corresponds to 1 Gb/s Fibre Channel. Below are the
1 Gb/s GBIC and the 2 Gb/s SFP.
1 Gb/s GBIC 2 Gb/s SFP
FIRMWARE
A type of software on controllers, drives, and other storage components that contains
instructions for their operation. It includes the RAID algorithms and other implemented
features, the real-time kernel, the Diagnostics Manager, the firmware that initializes the
hardware, and the firmware that uploads and initializes the other parts of the
downloadable firmware.
NVSRAM
NVSRAM stands for Non-Volatile Static Random Access Memory.
It is a controller file that specifies default settings for the controller. The file relies on
either a permanently connected battery or the non-volatile cache to retain data
indefinitely in the event of a power failure.
VOLUME or Logical Drive
A volume is a region of storage that is provided by the controller and is visible to external
I/O hosts for data access. Each volume has a RAID level associated with it. A given volume
resides in exactly one volume group.
VOLUME GROUP or Logical Drive Group
A volume group is a collection of volumes whose storage areas reside on the same set of
physical drives in the array. A volume group contains one or more volumes and consists of
one or more physical drives. A volume group comes into existence when the first volume is
created on it.
HOTSPARING
Hot spare drives serve as "immediate" replacements for failed drives that were configured
as part of a storage volume. When a configured drive fails, the firmware automatically
recognizes the failure and selects one of the hot spare drives to replace it; data
reconstruction to the selected spare begins immediately once it has been integrated into
the volume containing the failed drive. Once reconstruction has completed and the user
has replaced the original failed drive, data from the spare is copied back to the
replacement drive. When the copy-back operation is complete, the hot spare is returned
to the spare pool. All reconstruction and copy-back operations happen without significant
interruption to I/O processing for the affected storage volume.
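The hot-spare sequence described above (fail, reconstruct onto the spare, copy back, return the spare to the pool) can be sketched as a small state model. This is an illustrative simulation under assumed semantics, not any vendor's firmware; all class and method names are hypothetical.

```python
# Illustrative simulation of hot-spare failover and copy-back.
# All names are hypothetical; real controller firmware drives this logic.

class VolumeGroup:
    def __init__(self, drives, spares):
        self.drives = set(drives)        # drives currently in the volume group
        self.spares = list(spares)       # global hot-spare pool
        self.active_spare = None         # spare standing in for a failed drive
        self.failed = None               # the failed drive awaiting replacement

    def fail_drive(self, drive):
        """Firmware detects a failure and reconstructs onto a spare."""
        if drive not in self.drives or not self.spares:
            return False
        self.drives.remove(drive)
        self.failed = drive
        self.active_spare = self.spares.pop(0)
        self.drives.add(self.active_spare)     # reconstruction target
        return True

    def replace_failed_drive(self, new_drive):
        """User replaces the drive; data is copied back and the spare freed."""
        self.drives.remove(self.active_spare)
        self.drives.add(new_drive)             # copy-back completes here
        self.spares.append(self.active_spare)  # spare returns to the pool
        self.active_spare = None
        self.failed = None

vg = VolumeGroup(drives=["d0", "d1", "d2"], spares=["hs0"])
vg.fail_drive("d1")
print(sorted(vg.drives))          # ['d0', 'd2', 'hs0']: spare stands in for d1
vg.replace_failed_drive("d1_new")
print(vg.spares)                  # ['hs0']: spare back in the pool
```

Note that I/O to the volume continues throughout; only the membership of the group changes.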
Dynamic RAID Migration
The Dynamic RAID Migration feature provides the ability to change the RAID level for the
volumes of a drive group. Changing the RAID level causes the volumes in the drive group to
be reconfigured such that the data is mapped according to the definition of the new RAID
level.
Dynamic Capacity Expansion
The Dynamic Capacity Expansion feature provides the ability to add drives to a drive group.
Adding drives to a drive group causes the volumes to be reconfigured such that the data is
spread over the drives in the newly expanded drive group. After reconfiguration, all unused
capacity is evenly distributed across all drives following the last volume. This unused
capacity may be used to create additional volumes on the drive group.
Dynamic Volume Expansion
The Dynamic Volume Expansion feature provides the ability to increase the size of a volume
if there is a sufficient amount of free capacity on the drive group. If there is not enough
free capacity, DVE can be coupled with Dynamic Capacity Expansion to add the additional
capacity.
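The relationship between DVE and DCE described above reduces to simple capacity arithmetic: grow the volume from free space when possible, otherwise add drives first. The sketch below is hypothetical and ignores RAID overhead and restriping; real arrays perform the reconfiguration in controller firmware.

```python
# Hypothetical capacity model for DCE (add drives) and DVE (grow a volume).

class DriveGroup:
    def __init__(self, drive_gb, num_drives):
        self.drive_gb = drive_gb
        self.num_drives = num_drives
        self.used_gb = 0

    @property
    def free_gb(self):
        return self.drive_gb * self.num_drives - self.used_gb

    def dynamic_capacity_expansion(self, extra_drives):
        """DCE: add drives; data is restriped across the wider group."""
        self.num_drives += extra_drives

    def dynamic_volume_expansion(self, volume, extra_gb):
        """DVE: grow a volume from free capacity; fails if capacity is short."""
        if extra_gb > self.free_gb:
            raise ValueError("insufficient free capacity; run DCE first")
        volume["size_gb"] += extra_gb
        self.used_gb += extra_gb

group = DriveGroup(drive_gb=100, num_drives=4)     # 400 GB raw
vol = {"name": "LD0", "size_gb": 350}
group.used_gb = 350

try:
    group.dynamic_volume_expansion(vol, 100)           # only 50 GB free
except ValueError:
    group.dynamic_capacity_expansion(extra_drives=1)   # now 150 GB free
    group.dynamic_volume_expansion(vol, 100)

print(vol["size_gb"], group.free_gb)   # 450 50
```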
Active-Active Controller Setup Mode
In a traditional Active-Active configuration, both controllers are working concurrently to
serve host I/O requests and transfer data. In this mode, when both controllers are operating
normally, the system is theoretically able to handle twice the workload and traffic, doubling
the speed of the system compared to the Active-Passive configuration.
However, in practice, performance increases are much less significant. In the event of a
controller failure in traditional Active-Active configurations, the remaining controller
automatically assumes responsibility for handling all I/O requests and data transfer. Once
the failed controller is replaced, the controllers will automatically read the configuration of
drives and LUNs in the system, and return to normal operation.
Active-Passive Controller Setup Mode
Active-Passive is a dual controller configuration where two controllers provide full
redundancy to all disks, disk enclosures, and Fibre Channel host connections. In an Active-
Passive configuration, the primary (active) controller services all host I/O requests and
performs all data transfers, while the passive controller remains alert to the active
controller’s status using bi-directional heartbeat communications. Typically, the available
space in the RAID array is divided up into an arbitrary number of logical units (LUNs). The
capacity of each LUN can be spread across multiple controller Fibre Channels and disk
drives. In this configuration, both the active and passive controller know the logical volume
configuration. In the
event of a primary controller failure, the passive controller automatically and seamlessly
assumes I/O and data transfer activities without interrupting system performance or
operation. It is important to note that one advantage of Active-Passive is that there is
no degradation of performance when one controller fails or is taken offline for maintenance.
SAN Failover Mechanisms
Storage Array or Controller side Failover Mechanisms
RAID controllers generally have two different characteristics for access to the LUNs:
1. Active/Active, and 2. Active/Passive.
Higher-end and enterprise controllers are always Active/Active. Mid-range and lower-end
controllers can be either. How the controller manages internal failover and your server side,
software and hardware will have a great deal to do with your choices for accomplishing HBA
and switch failover. Before developing a failover or multipathing architecture, you need to
fully understand the issues with the RAID controller.
With Active/Active
controllers, all LUNs are seen and can be written to by any controller within the RAID.
Generally, with these types of RAID controllers, failover is not a problem, since the host can
write or read to any path. Basically, all LUN access is equal, and load balancing I/O requests
and access to the LUNs in case of switch or HBA failover is simple. All you have to do is write
to the LUN from a different HBA path.
Active/Passive Increases Complexity
If your RAID controller is active/passive, the complexity for systems that require HBA failover
can increase greatly. With active/passive controllers, generally the RAID system is arranged
in a controller pair where both controllers see both LUNs, but LUNs have a primary path for
access to a LUN and a secondary path. If the LUN is accessed via the secondary path, the
ownership of the LUN changes from the primary path to the secondary path.
This is not a problem if the controller has failed, but it is if only the controller path has
failed (the HBA or the switch) while other hosts are still accessing that LUN via its primary
path. Now each time one of the other hosts accesses the LUN on the primary path, the LUN
ownership moves from the secondary path back to the primary path. Then when the LUN is
again accessed on the secondary path, the LUN fails over again to the secondary path. This
ping-pong effect will eventually cause the performance of the LUN to drop dramatically.
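The ownership ping-pong just described can be made concrete with a small simulation: two hosts alternately access the same LUN over different controller paths on an active/passive array, and every access over the non-owning path triggers an ownership transfer (often called a "trespass"). Illustrative only; names are invented.

```python
# Illustrative active/passive LUN ownership thrash ("ping-pong") counter.

class ActivePassiveLun:
    def __init__(self, owner="A"):
        self.owner = owner       # controller currently owning the LUN
        self.trespasses = 0      # ownership transfers (each one is costly)

    def io(self, via_controller):
        if via_controller != self.owner:
            self.owner = via_controller   # LUN fails over to the other path
            self.trespasses += 1

lun = ActivePassiveLun(owner="A")

# Host 1 lost its path to controller A and now uses B; host 2 still uses A.
for _ in range(100):
    lun.io("B")   # host 1 on the secondary path
    lun.io("A")   # host 2 on the primary path

print(lun.trespasses)   # 200: every single I/O forced an ownership move
```

With an active/active controller pair the `io()` method would simply serve the request on either path and `trespasses` would stay at zero.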
Host-Side Failover Options
On the host side, there are three options for HBA and switch failover, and in some cases,
depending on the vendor, load balancing of I/O requests across the HBAs. Here they are in
order of hierarchy in the operating system:
1. Volume manager and/or file system failover
2. A failover and/or load balancing driver failover
3. HBA driver failover
Each of these has some advantages and disadvantages — what they are
depends on your situation and the hardware and software you have in the configuration.
In the drawing below, we have an example of a mid-range RAID controller connected
in an HA configuration with dual switches and HBAs, and with a dual-port RAID controller for
both Active/Active and Active/Passive.
Fig. Active/Active Controller Setup Active/Passive Controller Setup
With an Active/Active RAID controller configuration, the failover software knows the path to
each of the LUNs and ensures that it will be able to get to the LUN through the appropriate
path. With this Active/Active configuration, you could access any of the LUNs via any of the
HBAs with no impact on the host or another host, and both controllers can equally access
any LUN. If this were an Active/Passive RAID controller, it
would be critical to access LUNs 0, 2 and 4 with primary controller A if a switch or HBA
failed. You would only want to access LUNs 0, 2, and 4 from controller B if controller A failed.
If a port on controller A failed, you would want to access the LUNs via the other switch and
port and not via controller B. If you did access via controller B, and another host accessed
the LUNs via controller A, the ownership of the LUNs would ping-pong and the performance
would plummet.
Volume Manager and File System Failover Options
Volume managers such as Veritas VxVM and file systems such as ADIC
StorNext and a number of Linux cluster file system vendors understand and are able to
maintain multiple potential paths to a LUN. These types of products are able to determine
what the appropriate path to the LUN should be, but oftentimes for Active/Passive
controllers, it is up to the administrator to determine the correct path(s) to access the LUNs
without failing over the LUNs to the other controller unnecessarily.
Failover at this layer was the initial type of HBA and storage failover available for Unix
systems. Failover at the file system layer allows the file system itself to understand the
storage topology and load balance it. On the other hand, you could be doing a great deal
more work in the file system that might belong at lower layers that have more information
about the LUNs and the paths. Volume managers and file system multipathing also support
HBA load balancing.
Loadable Drivers
Loadable drivers from vendors such as EMC (PowerPath) and Sun (Traffic Manager) are
examples of loadable drivers that manage HBA and switch failover. You need to make sure
that the hardware you plan to use with these types of drivers is supported.
For example, according to the EMC Web site, EMC PowerPath currently supports only EMC
Symmetrix, EMC CLARiiON, Hitachi Data Systems (HDS) Lightning, HP XP (Hitachi OEM) and
IBM Enterprise Storage Server (Shark). According to Sun's Web site, Sun Traffic Manager
currently supports Sun Storage and Hitachi Data System HDS Lightning.
Other vendors are developing products that will provide similar functionality. As with the
volume manager and file system method for failover, loadable drivers also support HBA load
balancing as well as failover.
HBA Driver Failover
HBA drivers on some systems provide the capability for the driver to maintain and
understand the various paths to the LUNs. In some cases, this failover works only for
Active/Active RAIDs, and in other cases, depending on the vendor and the system type, it
works for both types of RAID (Active/Active and Active/Passive). Since HBA drivers often
recognize link failures and link logins faster than other methods, using this failover
mechanism generally allows the fastest resumption of I/O, since the lowest level has the
greatest knowledge of the paths.
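All three host-side options reduce to the same core behavior: keep a list of paths to each LUN and redirect I/O when the current path fails. A minimal sketch of that mechanism follows; the names are hypothetical, and real drivers (such as PowerPath) implement far more, including load balancing and automatic failback.

```python
# Minimal host-side multipath failover sketch: try paths in priority order.

class MultipathDevice:
    def __init__(self, lun, paths):
        self.lun = lun
        self.paths = list(paths)   # e.g. "hba -> switch -> controller" strings
        self.failed = set()

    def mark_failed(self, path):
        """Record a link failure reported by the HBA driver."""
        self.failed.add(path)

    def active_path(self):
        """Return the first healthy path, as a failover driver would."""
        for p in self.paths:
            if p not in self.failed:
                return p
        raise IOError(f"all paths to LUN {self.lun} are down")

dev = MultipathDevice(lun=0, paths=["hba0->sw0->ctlA", "hba1->sw1->ctlB"])
print(dev.active_path())            # hba0->sw0->ctlA
dev.mark_failed("hba0->sw0->ctlA")
print(dev.active_path())            # failover: hba1->sw1->ctlB
```

For an active/passive array, a real driver would additionally order `paths` so that the owning controller's paths are always tried first, avoiding the ping-pong effect described earlier.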
SAN Features for DATA Backup, Replication and High Availability
Snapshots for Backup
Snapshots can be of two types: a point-in-time image of the actual volume, and a full
image of the actual volume.
Point-in-time Image of actual Volume:
A logical point-in-time image of the actual volume, commonly just called a snapshot.
A snapshot is the logical equivalent of a complete physical copy, but you create it much
more quickly than a physical copy and it requires less disk space.
A repository volume is an additional volume associated with a snapshot; it saves blocks of
the actual (base) volume before they are overwritten. The repository volume contains the
original image of any modified data along with metadata describing where it is stored in
the repository. The repository volume is not accessible to external hosts.
A point-in-time image (snapshot) is created using a "copy-on-write" scheme.
Note:
• Exactly one repository volume is created per snapshot.
• Increasing the capacity of the base volume does not change existing snapshots.
• When a base volume is deleted, all associated snapshot volumes and their repositories
are also deleted.
• When a snapshot is deleted, its associated repository volume is also deleted.
Snapshots allow the end user to quickly create a single point-in-time image or
"snapshot" of a volume. The primary benefit of this feature is data backup: online backup
images can be created periodically during the course of the day without disrupting normal
operations.
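The copy-on-write scheme described above can be sketched directly: before a base-volume block is overwritten, its original contents are saved once to the repository; a read of the snapshot checks the repository first and falls back to the base volume. This is a deliberately simplified block model with invented names.

```python
# Simplified copy-on-write snapshot: repository saves pre-overwrite blocks.

class BaseVolume:
    def __init__(self, blocks):
        self.blocks = dict(blocks)     # block number -> data

class Snapshot:
    def __init__(self, base):
        self.base = base
        self.repository = {}           # block -> original data (host-inaccessible)

    def write_base(self, block, data):
        """Host write to the base volume; copy the old block on first touch."""
        if block not in self.repository:
            self.repository[block] = self.base.blocks[block]
        self.base.blocks[block] = data

    def read(self, block):
        """Snapshot read: repository first, else the unmodified base block."""
        return self.repository.get(block, self.base.blocks[block])

base = BaseVolume({0: "aaa", 1: "bbb"})
snap = Snapshot(base)
snap.write_base(0, "AAA")              # base changes after the snapshot
print(base.blocks[0], snap.read(0))    # AAA aaa: snapshot still sees old data
```

Because only modified blocks are copied, the repository is typically far smaller than the base volume, which is why a snapshot needs less disk space than a physical copy.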
Full image of the Actual Volume (Cloning):
The Volume Full Copy or Clone is used to copy data from one volume (the source) to
another volume (the target) on a single storage array. This feature can be used to back
up data, to copy data from volume groups that use smaller capacity drives to volume
groups using greater capacity drives, or to restore snapshot volume data to the
associated base volume.
When you create a volume copy, a copy pair is created, which consists of a source
volume and a target volume that are located on the same storage array. The source
volume is the volume that accepts host I/O and stores data. The source volume can be a
standard volume, a snapshot volume, or the base volume of a snapshot volume.
When a volume copy is started, data from the source volume is copied in its entirety to
the target volume. The source volume is available for read I/O activity only while a
volume copy has a status of In Progress, Pending, or Failed. After the volume copy is
completed, the source volume becomes available to host applications for write requests.
A target volume contains a copy of the data from the source volume. The target volume
can be a standard volume or the base volume of a failed or disabled snapshot volume. While
the volume copy has a status of In Progress, Pending, or Failed, read and write requests
to the target volume will be rejected by the controllers. After the volume copy is
completed, the target volume automatically becomes read-only to hosts, and write
requests to the target volume will be rejected. The Read-Only attribute can be changed
after the volume copy has completed or has been stopped.
Additionally, volume copy can be used to redistribute data — moving volumes from
older, slower disk drives to newer, faster, or higher capacity drives — to optimize
application performance and/or capacity utilization.
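The access rules for a volume copy (source read-only while the copy is In Progress, Pending, or Failed; target rejecting all I/O until completion, then read-only) can be summarized as a small state check. This is illustrative: the status names come from the text above, everything else is hypothetical.

```python
# Sketch of volume-copy access rules keyed on copy status.

IN_FLIGHT = {"In Progress", "Pending", "Failed"}

def source_allows(op, status):
    """Source accepts reads always; writes only after the copy completes."""
    return op == "read" or status == "Completed"

def target_allows(op, status, read_only=True):
    """Target rejects all I/O in flight; afterwards it is read-only."""
    if status in IN_FLIGHT:
        return False
    return op == "read" or not read_only

assert source_allows("read", "In Progress")
assert not source_allows("write", "Pending")
assert not target_allows("read", "In Progress")
assert target_allows("read", "Completed")
assert not target_allows("write", "Completed")               # read-only by default
assert target_allows("write", "Completed", read_only=False)  # attribute cleared
print("access rules hold")
```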
Remote Volume Mirroring:
Remote Volume Mirroring allows for protection against and recovery from disasters or
catastrophic failures of systems or data centers. When a disaster occurs at one site the
secondary (or backup) site takes over responsibility for computer services. RVM maintains a
fully synchronized image of key data at the secondary site so that no data is lost and
overall computing services suffer minimal interruption if a disaster or failure occurs.
It is a controller-level, firmware-based mechanism for ensuring fully synchronized data
replication between the primary and secondary sites. A mirroring relationship comprises
exactly two volumes, each residing on a separate array. One volume acts in the primary
role, servicing host I/O; the other acts as the backup secondary volume. Replication is
managed on a per-volume basis. This allows the storage administrator to associate a
distinct remote mirror volume with any/every primary volume of a given storage array. A
given array's primary volumes can be mirrored to secondary volumes that reside on multiple
distinct remote storage arrays. The following figure shows one possible configuration with a
primary and backup data center.
Mirroring relationships are established at the volume level between two
storage arrays. Our terminology refers to the primary volume as the one receiving host I/O,
the secondary volume is the stand-by mirrored image of the primary. The array controllers
manage synchronization activities, both in the initial image synchronization from primary to
secondary and in replicating host write data. One host channel of each array is dedicated
to inter-array data movement. The dedicated host port of each array must be connected
via fibre channel fabric with name service. The name service function allows the two arrays
to locate each other in the fabric network and perform the required login initialization.
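Fully synchronized mirroring as described above means a host write is not acknowledged until the secondary array has also committed it. A minimal sketch of that write path follows; the names are invented, and real RVM runs in controller firmware over the dedicated Fibre Channel port.

```python
# Sketch of a synchronous remote-mirror write: ack only after both commits.

class MirroredVolume:
    def __init__(self):
        self.primary = {}      # block -> data at the primary site
        self.secondary = {}    # fully synchronized image at the backup site

    def host_write(self, block, data):
        """Commit locally, replicate over the inter-array link, then ack."""
        self.primary[block] = data
        self.secondary[block] = data   # synchronous replication step
        return "ack"                   # host sees success only after both

    def failover_read(self, block):
        """After a site disaster, the secondary serves the data with no loss."""
        return self.secondary[block]

vol = MirroredVolume()
vol.host_write(7, "payroll")
print(vol.failover_read(7))   # payroll (nothing lost at the backup site)
```

The cost of this guarantee is that every host write incurs the inter-site round trip, which is why synchronous mirroring is typically limited to metropolitan distances.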
2. SAN Protection and security
When the word “security” is used in association with a SAN, thoughts can easily lead to
computer hackers infiltrating the network and causing havoc. Although hacker invasions are
a concern, there is another security issue associated with a SAN that must be addressed,
and that is the issue of technology containment. For example, Windows NT servers would
naturally claim every available Logical Unit Number (LUN) visible to them. In brief,
technology containment keeps servers from gaining unauthorized or accidental access to
undesignated areas within the SAN. The two major areas of concern with SAN
implementations are data access and fabric
management security.
Security at Different Stages
Open systems offer many different file systems, volume and disk management formats,
and software, requiring that security issues be considered and then implemented during
the SAN design and development phase, in the following areas:
A. Data access and security
B. Fabric management and security (protection from outside threats)
C. Higher levels of availability to data and the applications that use the data
2.A. Data access and security
2.A.I. Questions concerning data access and security
Concerning data access and security on a SAN, consider the following questions:
1. How can we segregate operating systems at the port level on the SAN fabric?
• It is not advisable to have Windows NT and Sun Solaris systems accessing the same RAID-
array port on the SAN fabric because Windows NT will attempt to write disk signatures to all
new disk LUNs it finds attached to the SAN fabric. That creates the need for a network fabric-
enforced way of segregating ports into logical groups of visibility.
2. How can we segregate different application types on the SAN fabric?
• For example, it may be necessary to ensure that finance systems on the SAN fabric cannot
access the data owned by engineering systems, or web systems. That creates the need for a
fabric-enforced way of grouping ports on the SAN fabric into zones of visibility based on
application, function, or departmental rules.
3. How can we isolate any single LUN on an array, permitting only a certain host(s) access to
that LUN and no others?
• A basic advantage of a SAN is that a large number of hosts can share expensive storage
resources. For RAID storage subsystems, this demands that multiple hosts have access to
the subsystem's disk storage LUNs through a single shared port on the array. Therefore, it is
necessary to employ security methods to ensure that LUNs behind a port are accessible only
by the intended hosts. Without special
software and architectures to manage multi-host block-level read/write access (when
multiple systems access the same LUN concurrently), data corruption or data loss could
occur.
4. How can we, from the host side, ensure that hosts see their storage ports and storage
LUNs consistently when adding new storage LUNs, and after each reboot?
• In the world of SANs, the assignment of Small Computer System Interface (SCSI) target IDs
is moved from the storage side to the host/Fibre Channel (FC) Host Bus Adapter (HBA) side.
Thus, SCSI target IDs can be dynamically reassigned as new storage LUNs are added to an
individual host via the SAN. Since this feature is a fundamental advantage of SAN
architectures, the assignment of SCSI target IDs requires management to ensure their
consistency across storage subsystems, SAN fabrics, and after host configuration changes.
2. A.II. Data access and security methodologies
The following are data access and security methodologies:
• Fabric zoning is fabric-centric enforcement: It provides a port- and host/storage-level
point of logical partitioning and can help ensure that different OS types or applications
are partitioned on the SAN. Fabric zoning is managed and enforced on the SAN fabric.
Fabric zoning cannot mask individual LUNs that sit behind a port: all hosts connected
to the same port will see all the LUNs addressed through that port.
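Fabric zoning as described (port-level visibility groups, no per-LUN granularity) can be modeled as set membership: two ports can talk only if they share a zone. Illustrative only; the zone and port names are invented.

```python
# Port-level fabric zoning model: visibility requires a shared zone.

zones = {
    "finance_zone": {"host_fin", "array_port1"},
    "engineering_zone": {"host_eng", "array_port2"},
}

def can_communicate(port_a, port_b):
    """Fabric-enforced check: any zone containing both ports allows traffic."""
    return any(port_a in z and port_b in z for z in zones.values())

assert can_communicate("host_fin", "array_port1")
assert not can_communicate("host_fin", "array_port2")   # cross-zone blocked
print("zoning enforced")
```

Note that the check operates on ports, not LUNs: once `host_fin` can reach `array_port1`, it sees every LUN behind that port, which is exactly why LUN masking (below) is needed as a complementary control.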
• LUN Masking is RAID storage subsystem-centric enforcement:
LUN Masking is configured at the RAID storage subsystem level; this helps ensure that only
designated hosts assigned to that single storage port can access the specified RAID LUN.
LUN masking is a RAID system-centric enforced method of masking multiple LUNs behind a
single port. LUN masking configuration occurs at the RAID-array level, using World Wide Port
Names (WWNs) of server FC HBAs. See Figure 4. LUN masking allows disk storage resource
sharing across multiple independent servers.
With LUN masking, a single large RAID subsystem can be subdivided to serve a number of
different hosts that attach to it through the SAN fabric. Each LUN (disk slice, portion, unit)
inside the RAID subsystem can be limited so that only one or a limited number of servers
can see that LUN.
LUN masking can occur either at the server FC HBA or at the RAID subsystem (behind the
RAID port). It is more secure to mask LUNs at the RAID subsystem, but not all RAID
subsystems have LUN masking capability; therefore, some FC HBA vendors allow persistent
binding at the driver level to mask LUNs.
Figure 4: LUN Masking
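LUN masking as described above is enforced per LUN behind a shared array port, keyed on the WWN of the server's FC HBA. A sketch of the lookup (all WWNs and LUN numbers are invented):

```python
# RAID-array-side LUN masking: each LUN lists the host WWNs allowed to see it.

lun_masks = {
    0: {"10:00:00:00:c9:aa:aa:aa"},                             # NT server only
    1: {"10:00:00:00:c9:bb:bb:bb", "10:00:00:00:c9:cc:cc:cc"},  # Solaris pair
}

def visible_luns(host_wwn):
    """LUNs the array presents to this WWN through the shared port."""
    return sorted(lun for lun, allowed in lun_masks.items() if host_wwn in allowed)

print(visible_luns("10:00:00:00:c9:aa:aa:aa"))   # [0]
print(visible_luns("10:00:00:00:c9:bb:bb:bb"))   # [1]
print(visible_luns("10:00:00:00:c9:dd:dd:dd"))   # []: unknown host sees nothing
```

This is the behavior that keeps a Windows NT host from claiming (and writing disk signatures to) LUNs intended for other operating systems on the same port.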
• Persistent Binding is host-centric enforcement:
This consistently forces a host to see a specific storage-subsystem port as a particular SCSI
target. Persistent binding also helps ensure that a specific storage-subsystem port on the
SAN is always seen as the same SCSI Target ID on the host, across the host and fabric, and
throughout storage configuration changes. OS and upper-level applications (such as LAN-
free backup software) typically require a static or predictable SCSI Target ID for storage and
reliability purposes.
Persistent binding is a host-centric enforced way of directing an operating system to assign
certain SCSI target IDs and LUNs. For example, a specific host can be configured to always
assign SCSI ID 3 to the first router it finds, and LUNs 0, 1, and 2 behind that port to the
three tape drives attached to the router, as shown in Figure 5. Operating systems and upper-level applications
(such as backup software) typically require a static or predictable SCSI target ID for their
storage reliability—persistent binding makes that possible.
Figure 5: Persistent Binding
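Persistent binding pins a storage port (identified by its WWPN) to a fixed SCSI target ID on the host, so the mapping survives reboots and fabric changes. A sketch of that binding table, with invented WWPNs:

```python
# Host-side persistent binding: WWPN -> fixed SCSI target ID, reboot-stable.

binding_table = {
    "50:06:0b:00:00:aa:00:01": 3,   # FC-SCSI router always becomes target 3
    "50:06:0b:00:00:bb:00:01": 4,   # RAID controller port always target 4
}

def assign_target_id(discovered_wwpn, next_free_id):
    """Use the bound ID if present; otherwise fall back to dynamic assignment."""
    return binding_table.get(discovered_wwpn, next_free_id)

# Whether the router is discovered first or last, it is still SCSI ID 3,
# which is what backup software depending on stable device paths needs.
assert assign_target_id("50:06:0b:00:00:aa:00:01", next_free_id=0) == 3
assert assign_target_id("50:06:0b:00:00:ee:00:01", next_free_id=7) == 7
print("bindings stable across discovery order")
```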
• LUN Mapping, in addition to persistent binding, is another host-centric method of storage
visibility management. LUN Mapping allows a system administrator to selectively scan for
specified SCSI targets and LUNs at storage-driver boot time and to selectively ignore
non-specified SCSI targets and LUNs.
The advantage of LUN Mapping is that it provides a level of security management in SANs
where LUN Masking is not an option, perhaps because it is not supported on the storage
hardware. The disadvantage is that LUN Mapping is configured and enabled on a host-by-host
basis. It requires good coordination among the administrators of the systems sharing
the storage, ensuring that only one host sees a given storage unit unless sharing is
planned, as in a clustered server configuration.
2.B. Fabric management and security (protection from outside threats)
2.B.I. Questions concerning SAN fabric-level security
Concerning SAN fabric-level security, consider the following questions:
1. How can we manage switch-to-switch security on the SAN fabric; also, how can we
enforce security policies that prohibit unauthorized switches or hosts from attaching to the
SAN fabric?
• In early SAN infrastructures, additional switches (configured with a default password and
login) could easily attach to an existing operating SAN fabric, and that new non-secure
switch could be used as a single point of configuration administration for the entire SAN
fabric. There is a need for technologies that enforce access control at the fabric-level, and
ensure only authorized and
authenticated switches can be added to the fabric.
2. How can we centrally manage security and configuration changes on a SAN fabric?
• In the initial phases of SAN evolution and even today, large SAN fabrics are frequently
composed of many 8- or 16-port FC switch-building blocks. Each switch features both in-
band and out-of-band management components (Simple Network Management Protocol
(SNMP), telnet, etc.), and a switch-centric security control model. As large SANs evolve, so
does the need for technologies to centrally control security with regard to SAN data access
and fabric management, and to minimize the number of administrative access and security
control points on the SAN fabric.
3. How can we ensure that only authorized hosts connect to the SAN fabric and to a specific
port designated by an administrator?
• Initially, in SAN configurations, a host FC HBA could attach to any point in a SAN fabric and
if the FC HBA was capable of basic SAN fabric login, that FC HBA became a participating
member of the SAN fabric. There is a need for technologies that allow a fabric-centric
method of access control for determining which hosts can attach to a specific port or switch
on the SAN fabric. This would prevent a rogue attacker with a Windows NT system and an
FC HBA from attaching to a non-secure SAN fabric for the purpose of configuration changes
or data access.
4. How can we ensure that the tools used to manage the SAN fabric, and SAN management
requests are coming from an authorized source?
• Multiple in-band and out-of-band methods are used to manage SAN fabric configurations. A
tunnel of communication must exist between SAN management consoles and frameworks,
and the targeted SAN fabric being managed. That tunnel of communication must be secure
and confirmed as authentic to prevent an attacker from using a management tool to access
the nonsecure SAN fabric.
5. How can we ensure that configuration changes on the SAN fabric are valid when there are
multiple points of configuration management?
• In early SAN configurations, multiple administrators could log into different switches on the
same SAN fabric and perform fabric-configuration changes concurrently. After those
configuration changes were enabled and propagated fabric-wide, corruption could occur
due to configuration conflicts. Corruption of the SAN fabric usually occurs when configuration
changes are made through multiple points on the SAN fabric. There is a need for
technologies that ensure SAN fabric configuration changes only occur through a central and
secure point on the SAN fabric, and that those configuration changes do not cause
configuration conflicts.
2.B.II. Fabric Management and security Technologies
The following technologies protect and manage the fabric:
• Fabric-to-Fabric Security technologies use Access Control Lists (ACLs) to allow or
deny the addition of new switches to the fabric. Public Key Infrastructure (PKI) technology
may be applied as a mechanism for validating the identity of the new switch. Also, fabric-wide
security databases help ensure that all new authorized switches added to the fabric
inherit fabric-wide security policies, so that a new out-of-the-box switch does not become
a non-secured access point.
• Host-to-Fabric Security technologies can apply ACLs at the port level on the fabric, and
allow or deny a particular host's FC HBA to attach to that port. This would prevent an
unauthorized intruder host from attaching to the fabric via any port. The host's ability to
log into the fabric is clearly defined and allowed within this model.
• Management-to-Fabric technologies can use PKI and other encryption (such as MD5)
technologies to ensure a trusted and secure management console-to-fabric communication
layer exists. This will help ensure that the management console or
framework used to control the SAN fabric is valid and authorized.
• Configuration Integrity technologies ensure that propagated fabric configuration
changes only come from one location at a time, and are correctly propagated to all switches
on the SAN fabric with integrity. Distributed lock managers can ensure
that only serial and valid configuration changes are enabled on the fabric.
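The distributed-lock idea behind configuration integrity can be sketched as a single fabric-wide lock: a configuration change is accepted only from the current lock holder, so concurrent edits from two switches cannot conflict. Illustrative only; real fabrics distribute this state across switches.

```python
# Fabric configuration lock: only one admin point may change config at a time.

class FabricConfigLock:
    def __init__(self):
        self.holder = None
        self.config_version = 0

    def acquire(self, admin):
        """Grant the lock only if no one else is mid-change."""
        if self.holder is None:
            self.holder = admin
            return True
        return False

    def commit_change(self, admin):
        """Propagate a change fabric-wide only if this admin holds the lock."""
        if self.holder != admin:
            raise PermissionError("configuration change rejected: lock not held")
        self.config_version += 1
        self.holder = None                # release after a serialized commit
        return self.config_version

lock = FabricConfigLock()
assert lock.acquire("switch1_admin")
assert not lock.acquire("switch9_admin")     # concurrent change blocked
print(lock.commit_change("switch1_admin"))   # 1: change applied serially
```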
3.A - Backup Solutions
Data backup methods
There are three effective methods used to backup data:
A. Distributed
B. Centralized (conventional)
C. SAN
3.A.1. Backups in distributed environments
In distributed environments, storage subsystems are directly attached to servers. See Figure
2. Distributed backups require IT personnel to touch each system physically (i.e., handling
tapes) to perform backup operations. If the server data exceeds the tape capacity (which is
usually the case), the IT person must monitor the operation and reload new tapes at the
proper time.
Distributed environments are fragmented in the following circumstances:
• When storage is isolated on individual servers (storage islands)
• When there are point-to-point SCSI connections only
• When there is a one-to-one relationship between servers and storage subsystems, creating
storage islands, which scale poorly and are difficult to centrally manage.
Figure 2: Distributed Backup Environment
3.A.2. Backups in conventional centralized environments
In conventional-centralized environments, a storage subsystem is attached to one server,
and all other systems are backed up to that storage subsystem through the server and over
the Local Area Network (LAN). See Figure 3. Conventional centralized backups limit
management overhead to a single storage subsystem. The challenge is not managing the
storage subsystem, but getting the data to it. Conventional-centralized backup solutions rely
on an Internet Protocol (IP) network as the data path. The problem with this is that the
Transmission Control Protocol/Internet Protocol (TCP/IP) processing associated with
transporting the sheer volume of data can adversely impact server CPU cycles. This results
in long backup cycles that exceed the scheduled backup window. Therefore,
conventional-centralized backups often overflow into user uptime, resulting in poor network
response and generally unacceptable server performance.
This method is an improvement over the distributed method, but it still has inefficiencies:
Pros:
• Centralizes the storage in fewer locations and on fewer platforms
• Requires fewer backup servers and software packages
• Uses centralized administration
• Results in fewer human errors
Cons:
• Backup bottlenecks develop on the LAN
• Bottlenecks become more frequent as storage needs grow
• Still managing multiple separate backup servers
• Typically uses the same LAN for production and data backups
• Many-to-one relationship between servers and the storage subsystem
Figure 3: Conventional-Centralized Backup Environment
3.A.3. Backups in SAN environments
In SAN environments, storage subsystems are attached to the SAN fabric where all servers
potentially have equal access to them. See Figure 4. SANs offer the following efficiencies
and advantages over conventional-centralized and distributed backup methods:
• The entire storage-network infrastructure can be off-loaded from the LAN, promoting LAN-
free backups—20% or more of LAN traffic can be due to backups
• Significant improvements in backup times, since data is moved at Fibre Channel (FC)
speeds over dedicated storage networks, rather than at Ethernet speeds over a shared
network
• Fewer network interruptions when adding incremental storage hardware
• Reduces or eliminates backup windows
• Promotes on-the-fly scaling (non-disruptive) rather than set-planned downtime windows
• Extends the life expectancy of servers
• Enables off-host backups where data transfers directly from storage disks to tape libraries,
bypassing the server, and reducing server loads
Figure 4: Storage Area Network
One of the most valuable time- and cost-saving features of SAN architecture is its ability to
offload backup operations from LANs and servers. This capability can significantly increase
the available bandwidth on a LAN to network clients and end users during backup
operations. When traditional backup servers are relieved from "handling" backup data, they
can be repurposed and made available for other tasks.
Traditional Tape Drive Backup
SAN (LAN-free) backup
SAN technology provides an alternative path for data movement between the Storage
Manager client and the server. Shared storage resources (such as disk and tape) are
accessible to both the client and the server through the SAN.
Data is off-loaded from the LAN and from the server processor, which can create greater
scalability.
LAN-free backups decrease the load on the LAN by introducing a storage agent. The storage
agent can be perceived as a small Storage Manager server (without a database or recovery
log) that is installed and run on the Storage Manager client machine. The storage agent
handles the communication with the Storage Manager server over the LAN but sends the
data directly to SAN attached tape devices, relieving the Storage Manager server from the
actual I/O transfer.
A LAN-free backup environment is shown in Figure
LAN-free backup solutions can optimize backup operations by offloading backup traffic
from a LAN to a SAN, thereby increasing the LAN bandwidth available to other traffic.
SAN (Server-less) backup
Serverless backup, on the other hand, extends these performance gains even further by
offloading more than 90 percent of the administrative burden that is usually placed upon a
dedicated backup server as backups are performed. This is typically achieved by embedding
some of the backup intelligence into the data storage devices themselves (RAID systems and
tape drives) or into SAN connectivity peripherals (switches, hubs, or bridges). This can free
up traditional backup servers significantly by releasing them from data-moving duties and
from large portions of a backup operation's administration. When implemented properly,
these SAN-based backup solutions let administrators optimize network and server
utilization, dramatically shorten backup times, and regain processor and network resources.
Server-free backups are made possible by a SAN's flexible architecture and can improve
overall performance significantly. Even storage reliability can be greatly enhanced by special
features made possible within a SAN. Options like redundant I/O paths, server clustering,
and run-time data replication (local and/or remote) can ensure data and application
availability.
Adding storage capacity and other storage resources can be accomplished easily within a
SAN, often without the need to shut down or even quiesce the server(s) or their client
networks. These, and other, features can quickly add up to big cost savings, painless
expansion, reduced network loading, and fewer network outages.
Figure. Server-Free Backup
SAN data backup and access benefits
SANs promote the following benefits:
• Improved data availability and performance speed
• Number of connections to storage subsystems can be easily scaled for both availability
and performance
• Access to data is faster, easier, and more reliable
3.B - Disaster Recovery
Planning a backup and restoration of files for disaster recovery
Planning a backup and restoration of files is the most important step to protect data from
accidental loss in the event of data deletion or a hard disk failure. The backup copy can be
used to restore lost or damaged data. For taking backups and restoring files, Microsoft has
provided a utility called Backup.
The Backup utility creates a copy of data on a hard disk of a computer and archives data on
another storage media. Any storage media such as removable disks, tapes, and logical
drives can be used as a backup storage.
While taking a backup of files, the Backup utility creates a volume shadow copy of the data
to create an accurate copy of the contents. It includes any open files or files that are being
used by the system. Users can continue to access the system while the Backup utility is
running without the risk of losing data.
Volume Shadow Copy
Backup provides a feature of taking a backup of files that are opened by a user or system.
This feature is known as volume shadow copy. Volume shadow copy makes a duplicate copy
of all files at the start of the backup process. In this way, files that have changed during the
backup process are copied correctly. Due to this feature, applications can continue writing
data to the volume during a backup operation, and backups can be scheduled at any time
without locking out users.
Types of Backups
The Windows Backup utility provides various types of backups. While planning for a backup
strategy, it is important to choose an appropriate type or combination of different types of
backups. The backup type determines which files are transferred to the destination media.
Each backup type relates to an attribute, known as the archive (A) attribute, that is
maintained for every file. The archive attribute is set when a file is created or changed.
When the archive attribute is set, it means that the file has not yet been backed up, or that
a backup of it is due.
Note: When it is said that "the file is marked as backed up", it means that the
archive attribute of the file has been cleared.
Normal Backups
When an administrator chooses to use a normal backup, all selected files and folders are
backed up, and the archive attribute of every file is cleared. A normal backup does not use
the archive attribute to determine which files to back up. A normal backup is used as the
first step of any backup plan and is combined with other backup types when planning an
organization's backup strategy. Normal backups are the most time-consuming and
resource-intensive type, but restoration from a normal backup is more efficient than from
any other type.
Incremental backups
An incremental backup backs up files that have been created or changed since the last
normal or incremental backup; that is, it backs up only those files whose archive attribute is
set, and it clears the archive attribute after the backup. An incremental backup is the
fastest backup process. Restoring data from an incremental backup requires the last
normal backup and all subsequent incremental backups, and the incremental backups must
be restored in the same order in which they were created.
Note: If any media in the incremental backup set is damaged or its data becomes
corrupt, the data backed up after that point cannot be restored.
Differential Backups
A differential backup backs up files that have been created or changed since the last normal
backup. It does not clear the archive attribute of files after the backup. Restoring files from
a differential backup is more efficient than restoring from incremental backups, because
only the last normal backup and the most recent differential backup are needed.
Copy Backups
A copy backup copies all selected files and folders. It neither uses nor clears the archive
attribute of the files. It is generally not a part of a planned scheduled backup.
Daily Backups
A daily backup backs up all selected files and folders that have changed during the day. It
backs up data by using the modified date of the files. It neither uses nor clears the archive
attribute of the files.
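The archive-attribute semantics described above can be sketched in a short simulation. This is illustrative Python only, not the Windows Backup utility's actual behavior or code; the `File` class and function names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class File:
    name: str
    archive: bool = True  # set on create/change; cleared when "marked as backed up"

def normal_backup(files):
    """Back up every selected file and clear the archive attribute of each."""
    backed_up = [f.name for f in files]
    for f in files:
        f.archive = False
    return backed_up

def incremental_backup(files):
    """Back up only files whose archive attribute is set, then clear it."""
    backed_up = [f.name for f in files if f.archive]
    for f in files:
        if f.archive:
            f.archive = False
    return backed_up

def differential_backup(files):
    """Back up files whose archive attribute is set, but do NOT clear it."""
    return [f.name for f in files if f.archive]

files = [File("a.doc"), File("b.doc")]
print(normal_backup(files))        # ['a.doc', 'b.doc'] — attributes cleared
files[0].archive = True            # a.doc changed after the normal backup
print(differential_backup(files))  # ['a.doc'] — attribute left set
print(incremental_backup(files))   # ['a.doc'] — attribute now cleared
print(incremental_backup(files))   # [] — nothing changed since last incremental
```

Running the demo shows why two consecutive differentials of unchanged data copy the same files, while the second of two consecutive incrementals copies nothing.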
Combining backup types
The easiest backup plan is to take a normal backup every night. A normal backup every
night ensures that the data is restored from a single job the next day. Although the
restoration of data from a normal backup is easy, taking a backup is time consuming. Hence,
an administrator is required to make an optimal backup plan. An administrator must
consider the following points before creating a backup plan:
The time involved in taking the backup.
The size of the backup job.
The time required to restore a system in the event of a system failure.
The most common solutions for the needs of different organizations include the combination
of normal, differential, and incremental backups.
Combination of Normal and Differential Backups
An administrator can use a combination of normal and differential backups to save time
both when taking backups and when restoring data. In this plan, a normal backup is taken
on Sunday, and differential backups are taken every night from Monday through Friday. If
data becomes corrupt at any time, only the normal backup and the last differential backup
need to be restored. Although this combination is easier and takes less time to restore, it
takes more time to back up if the data changes frequently.
Combination of Normal and Incremental Backups
A combination of normal and incremental backups can be used to save still more time when
taking backups. In this plan, a normal backup is taken on Sunday and incremental backups
are taken every night from Monday through Friday. If data becomes corrupt at any time,
the normal backup and all subsequent incremental backups must be restored.
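The two weekly plans differ only in which jobs must be restored after a failure. A minimal Python sketch (the helper and its day numbering are invented for this example) that computes the restore chain for each strategy:

```python
WEEK = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri"]

def restore_chain(strategy, failed_after):
    """Return the backup jobs needed to restore after a failure following
    day `failed_after`. Day 0 (Sun) holds the weekly normal backup; days
    1..5 hold nightly backups of the given strategy, either
    'differential' or 'incremental'."""
    chain = ["Sun (normal)"]
    if failed_after == 0:
        return chain
    if strategy == "differential":
        # only the most recent differential is needed
        chain.append(f"{WEEK[failed_after]} (differential)")
    else:
        # every incremental since the normal backup, in creation order
        chain += [f"{WEEK[d]} (incremental)" for d in range(1, failed_after + 1)]
    return chain

print(restore_chain("differential", 4))
# ['Sun (normal)', 'Thu (differential)']
print(restore_chain("incremental", 4))
# ['Sun (normal)', 'Mon (incremental)', 'Tue (incremental)',
#  'Wed (incremental)', 'Thu (incremental)']
```

The output makes the trade-off concrete: the differential plan always restores exactly two jobs, while the incremental plan's restore chain grows with each day since the last normal backup.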
Backing up System State data
System State data contains critical elements of the Windows 2000 and Windows Server
2003 operating systems. Following are the files included in the System State data:
Boot files, including the system files and all files protected by Windows File Protection
(WFP).
Active Directory (on domain controller only).
SYSVOL (on domain controller only).
Certificate Services (on certification authority only).
Cluster database (on cluster node only).
Registry.
IIS metabase.
Performance counter configuration information.
Component Services Class registration database.
For backing up the System State of a computer, the System State node is included as a part
of the backup selection in the Backup utility.
Note: On domain controllers, System State can be restored only by restarting the
domain controller in Directory Services Restore Mode. NTDSUTIL is used to
recover deleted objects in Active Directory.
System Recovery
In the event of a system failure, the recovery of the system is difficult and tedious for
administrators. Recovery involves reinstallation of the operating system, mounting and
cataloging the backup tape, and then performing the full restore. To make this process
easier, Windows provides a feature called Automated System Recovery (ASR). ASR is used to
perform a restore of the System State data and services in the event of a major system
failure. An ASR restore includes the configuration information for devices. ASR backs up the
system data and local system partition.
How to create an ASR set?
Take the following steps to create an Automated System Recovery (ASR) set by using the
Backup or Restore Wizard:
1. Run Backup from Start Menu > Programs > Accessories > System Tools > Backup.
2. In the welcome screen of the Backup or Restore Wizard, click the Advanced Mode
link.
3. On the welcome page of the Advanced Mode of the Backup utility, choose the ASR
Wizard option from the Tools menu.
4. In the welcome screen of the ASR Wizard, click the Next button.
5. On the Backup Destination page, specify the location of the backup, and click the
Next button.
6. Click the Finish button.
Note: An ASR backup does not include folders and files.
Best practices for Backup
According to Microsoft, administrators should take the following steps to ensure the recovery
in case of a system failure:
Develop backup and restore strategies and test them.
Train appropriate personnel.
In a high-security network, ensure that only administrators are able to restore files.
Back up all data on the system and boot volumes and the System State.
Back up the data on all volumes and the System State data at the same time.
Create an Automated System Recovery backup set.
Create a backup log.
Keep at least three copies of the media. Keep at least one copy off-site in a properly
controlled environment.
Perform trial restorations.
Secure devices and media.
Do not disable the default volume shadow copy backup method and revert to the
pre-Windows Server 2003 backup method.
Back up your server cluster effectively.
Back up the cluster disks from each node.
3.C - Data Replication
Data Replication provides many benefits in today's IT environments. For
example, it can allow system administrators to create and manage multiple copies of vital
information across a global enterprise. This enables disaster recovery solutions, maximizes
business continuity, and permits file server content to be distributed over the Internet.
Replication options can even improve host processing efficiency by moving data sets onto
secondary (often remote) servers for backup operations. In some cases, these data
replication capabilities are required by the "high availability" and "server clustering"
features provided by many of today's SAN architectures. Remote data replication is typically
achieved with one of two basic strategies:
Storage replication is focused on the bulk transfer of files, or block data, from one server
to one or more other servers. This type of replication generally allows applications to keep
running on a server while they, and/or their data, are being replicated to another off-site
server.
Application-level replication is specific to a particular application, such as a database or
web server, and is typically performed at the transaction level (field, row, table, etc.) by the
application itself.
Many replication products include the ability to transfer data synchronously or
asynchronously.
With synchronous transfers, each packet of transmitted data is acknowledged by the
receiving server before more data is sent to it. This can be a slower form of replication, but
is very reliable. Asynchronous data transfers allow data packets to be sent ahead of
acknowledgements from the receiving server for previously sent packets. This method is
usually faster but allows more data to be lost if links fail. Eventually, in either case, all
transmitted packets must be acknowledged by the receiving system.
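The difference between the two transfer modes is the acknowledgement discipline. A minimal sketch, assuming a hypothetical `Link` object that simply counts sends and acknowledgements (real replication products do not expose an API like this):

```python
class Link:
    """Trivial stand-in for a replication link (hypothetical)."""
    def __init__(self):
        self.sent = 0
        self.acked = 0
    def send(self, packet):
        self.sent += 1
    def wait_ack(self):
        self.acked += 1

def replicate_sync(packets, link):
    """Synchronous: each packet is acknowledged before the next is sent."""
    acked = 0
    for p in packets:
        link.send(p)
        link.wait_ack()   # blocks until the receiver confirms this packet
        acked += 1
    return acked

def replicate_async(packets, link, window=8):
    """Asynchronous: up to `window` unacknowledged packets stay in flight."""
    in_flight = 0
    acked = 0
    for p in packets:
        if in_flight == window:
            link.wait_ack()   # drain one acknowledgement before sending more
            acked += 1
            in_flight -= 1
        link.send(p)
        in_flight += 1
    while in_flight:          # eventually every packet must be acknowledged
        link.wait_ack()
        acked += 1
        in_flight -= 1
    return acked
```

The sketch mirrors the trade-off in the text: the synchronous sender waits on every packet (slow but at most one packet is ever unconfirmed), while the asynchronous sender keeps a window of unconfirmed packets in flight, so a link failure can lose up to `window` packets' worth of data.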