SAN INTRODUCTION
Problems of DAS:
Throughout the 1980s, the standard way of connecting hosts to storage devices was point-
to-point, direct-attach storage through interfaces such as Integrated Drive Electronics (IDE)
and parallel SCSI. Parallel SCSI offered relatively fast (5 or 10 Mbit/sec) access to SCSI-
enabled disks, and several disks could be connected at once to the same computer through
the same interface. This worked well for the time, with fairly reliable, fast-speed connections
allowing administrators to connect internal and external storage through just simple ribbon
cabling or multiconductor external cables. However, as storage subsystems became larger
and larger and computers faster and faster, a new problem emerged–external storage
(which at one time was just a simple disk drive on the desk next to a machine) started to get
bigger. Tape libraries, Redundant Array of Inexpensive Disks (RAID) arrays, and other SCSI
devices began to require more and more space–requiring the parallel SCSI connection to be
stretched farther and farther away from the host. Input/Output (I/O) rates also increased,
pushing on the physics of keeping signal integrity in a large bundle of wires (32- and 64-bit
data bus widths). Simple parallel SCSI variants were devised to enable longer distances and
to address the signal integrity issues. However, they all eventually ran up against the
difficulties of high-speed signals across the parallel SCSI bus architecture.
Figure: Direct-Attached Storage (DAS)
Solutions from SAN:
The solution to all of this was slow in coming, but eventually the storage industry settled on
using a serial protocol with high-speed transceivers–offering good noise immunity, ease of
cabling, and plentiful bandwidth. Different specifications (Serial Storage Architecture [SSA]
and Fibre Channel as well as more advanced parallel SCSI technologies) competed for
adoption, and companies began experimenting with different serial communications media.
New high-speed circuits made serial transfers (using a simple pair of wires to transmit bits
serially, in order, rather than a large number of wires to transfer several bytes or words of
data at a time) the most practical solution to the signal problems.
The high speed of the circuits enabled Fibre Channel to offer data rates of up to 100
MB/sec, versus the slower 10 to 20 MB/sec limitations of parallel SCSI. (At present, FC
provides 1/2/4 Gbit/sec link rates.)
When Fibre Channel was first applied to the area of storage connections, the primary
attraction was the extended distances and simplified cabling that the technology
offered. This extension of direct-attach operation basically replaced the old parallel SCSI
attachments with a high-speed serial line (Figure 1.2). The new Fibre Channel connections
offered a much faster interface and simplified cabling (four copper wire connections through
DB-9 connectors, as well as optical cabling), and could be used to distribute storage as far
as 10 km away from a host computer, or 30 km away with optical extenders.
Figure 1.2 Using Fibre Channel to Extend Distances from Storage
The connections to disks at this time began using the Fibre Channel Arbitrated Loop (FC-AL)
protocol, which enabled disks to negotiate their addresses and traffic on a loop topology
with a host (Figure 1.3). Because of the combined ability to easily cable and distribute
storage, users were now able to add separate racks of equipment to attach to hosts. A new
component, the Fibre Channel hub, began to be used to make it easier to plug in devices.
The hub, a purely electrical piece of equipment that simply connected pieces of a Fibre
Channel loop together, made it possible to dynamically add and remove storage from the
network without requiring a complete reconfiguration. As these components began to be
used in increasingly complex environments, manufacturers began to add "intelligence"
to these Fibre Channel hubs, enabling them to independently deal with such issues as
failures in the network and noise in the network from loops being added and removed. An
alternative to the hub came in the form of the Fibre Channel switch, which, unlike a hub, was
not just connecting pieces of a loop, but instead offered the packet-switching ability of
traditional switches.
Figure 1.3 Arbitrated Loop Disk Configurations Attached to a Single Host
Because there was now a Fibre Channel network available, other hosts (not storage) were
added to take advantage of the same network. With the addition of SAN-aware software, it
was suddenly possible to share storage between two different devices on the network.
Storage sharing was the first realization of the modern SAN, with companies in the
multimedia and video production areas paving the way by using the Fibre Channel network
to share enormous data files between workstations, distribute jobs for rendering, and make
fully digital production possible (Figure 1.4).
Figure 1.4 Multiple Host Arbitrated Loop for Storage Sharing
The next big step in Fibre Channel evolution came with the increased reliability and
manageability of a Fibre Channel switched fabric. Early implementations of FC-AL were
sometimes difficult to manage, unstable, and prone to interoperability problems between
components. Because the FC-AL protocol was quite complex, the occasional result was
that nothing on a loop could communicate or stay operational. The
solution to this was a move to a switched fabric architecture, which not only enhanced the
manageability and reliability of the connection, but provided switched, high-speed
connections between all nodes of a network instead of a shared loop. As a result, each port
on a switch now provides a full 1 Gbit/sec of available bandwidth rather than just a portion of
the total 1 Gbit/sec of bandwidth shared between all the devices connected to the loop.
Fabrics now make up the majority of Fibre Channel installations. A typical Fibre Channel
switched fabric installation (Figure 1.5) has multiple hosts and storage units all connected
into the same Fibre Channel network cloud through one or more Fibre Channel switches.
Figure 1.5 Switched Fabric, Multiple Host, and Storage Unit Configuration
Today, the modern SAN looks much like any other modern computer network. Network
infrastructures such as switches, hubs, bridges, and routers help transport frame-level
information across the network. Network interface cards interface computer systems to the
same network (called HBAs in the SAN world, as they replaced SCSI Host Bus Adapters).
Figure 1.6 shows an example of how these components could be used in conjunction with
Fibre Channel switches.
Figure 1.6 Typical Deployed SAN Configuration with Multiple Hosts, Storage, and Tape
Devices
FC Technology:
1. What is Fibre Channel?
Fibre Channel (FC) is a serial, high-speed data transfer technology that can be
utilized by networks and mass storage.
Fibre Channel is an open standard, defined by ANSI (the T11 committee), and supports
the most important higher-level protocols, such as Internet Protocol (IP), ATM
(Asynchronous Transfer Mode), IEEE 802, HIPPI (High Performance Parallel Interface),
SCSI (Small Computer System Interface), and others.
Fibre Channel is fast (data transfer), flexible (supports many topologies), simple, and
scalable.
FC storage devices available in the current storage industry include FC controllers
and HBAs, FC hard disks, FC HD enclosures, FC storage arrays, FC hubs and switches,
FC connectors and cables, and other devices.
2. FC Layers:
• The FC-0 layer defines the specification for media types, distance, and signal
electrical and optical characteristics.
• The FC-1 layer defines the mechanism for encoding/decoding data for transmission
over the intended media and the command structure for accessing the media.
• FC-2 layer defines how data blocks are segmented into frames, how the frames are
handled according to the class of service, and the mechanisms for flow control and
ensuring frame data integrity.
• The FC-3 layer defines facilities for data encryption and compression.
• The FC-4 layer is responsible for mapping SCSI-3 protocols (FCP) and other higher
layer protocols/services into Fibre Channel commands.
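The FC-2 segmentation step can be sketched in a few lines (Python; helper names are hypothetical, and 2112 bytes is the maximum size of a Fibre Channel frame's data field):

```python
# Illustrative sketch of FC-2 style segmentation: a data block is split
# into frame payloads no larger than the FC maximum of 2112 bytes.
# Helper names are hypothetical, not part of any FC API.

MAX_PAYLOAD = 2112  # maximum Fibre Channel frame data-field size in bytes

def segment_into_frames(data: bytes, max_payload: int = MAX_PAYLOAD):
    """Split a data block into an ordered list of frame payloads."""
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]

def reassemble(frames):
    """The receiving FC-2 layer reassembles the frames in order."""
    return b"".join(frames)

if __name__ == "__main__":
    block = bytes(5000)                  # a 5000-byte data block
    frames = segment_into_frames(block)
    print(len(frames))                   # 3 frames: 2112 + 2112 + 776 bytes
    assert reassemble(frames) == block
```

In a real fabric, FC-2 also attaches a frame header carrying the source and destination addresses, sequence and exchange identifiers, and a CRC, which this sketch omits.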
FC Layers & OSI Layers Comparison
FC Topologies
1. Point-to-Point Topology
Up to two devices (ports) can be connected point-to-point.
Each device is called a node, and each node port is designated as an N_Port.
Point-to-point connections can use the entire 100 MB/sec of bandwidth available with
FC for communication between the two nodes.
2. Arbitrated Loop Topology
Arbitrated Loop (AL) allows up to 127 ports to be connected in a circular daisy chain.
The ports in an AL are designated as NL_Ports, and only two ports can be actively
communicating at any one time.
The other ports function as repeaters and simply pass the signal along. This means,
of course, that the 100 MB/sec of bandwidth is shared among all devices.
Arbitrated Loop uses 8-bit AL_PA addressing.
3. Switched Fabric Topology:
Switched Fabric allows up to 16 million devices to be connected together.
FC devices connect to the network via F_Ports or FL_Ports.
The connection between individual ports on the network functions similarly to a
telephone system.
Switched Fabric uses dynamic 24-bit addressing.
FC Addressing & Ports
WWN (World Wide Name)
– Static 64-bit address for each port
– Assigned by the IEEE in blocks to manufacturers
AL_PA (Arbitrated Loop Physical Address)
– Dynamic 8-bit address when connected to an arbitrated loop
S_ID (Native Address Identifier)
– Dynamic 24-bit address assigned to a node in a fabric
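As an illustration of the 24-bit fabric address, here is a small sketch that splits an S_ID into the commonly used Domain/Area/Port fields (Python; the field layout follows the common switched-fabric convention, and the helper name is hypothetical):

```python
# Decode a 24-bit Fibre Channel fabric address (S_ID / D_ID) into its
# three 8-bit fields. The Domain/Area/Port split follows the common
# switched-fabric convention; treat this as an illustrative sketch.

def decode_port_id(port_id: int):
    assert 0 <= port_id <= 0xFFFFFF, "fabric addresses are 24 bits"
    domain = (port_id >> 16) & 0xFF   # identifies the switch
    area = (port_id >> 8) & 0xFF      # identifies a port group on the switch
    port = port_id & 0xFF             # identifies the device (AL_PA on a loop)
    return domain, area, port

if __name__ == "__main__":
    print(decode_port_id(0x010F00))   # (1, 15, 0)
```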
Basic Port Types
– N_Port (Node) → end-point; typically an HBA or disk; connects to other N_Ports or
F_Ports
– F_Port (Fabric) → found only in a switch; connects directly to N_Ports
– E_Port (Expansion) → found only in a switch; connects directly to E_Ports in
other switches to expand the SAN
Used in Arbitrated Loops
– NL_Port (N_Port with Arbitrated Loop capabilities)
• Connects directly to an N_Port, F_Port, NL_Port, or FL_Port
– FL_Port (F_Port with Arbitrated Loop capabilities)
• Switch port that connects to N_Ports and NL_Ports
– G_Port (Generic)
• Switch port that can become an F_Port, FL_Port, or E_Port
FC Flow Control
Buffer-to-Buffer Credit: This type of flow control deals only with the link between
an N_Port and an F_Port, or between two N_Ports. Both ports on the link exchange
values of how many frames each is willing to receive at a time from the other port.
This value becomes the other port's BB_Credit value and remains in effect as long as
the ports are logged in.
End-to-End Credit: End-to-end flow control is not concerned with individual links,
but rather with the source and destination N_Ports. The concept is very similar to
buffer-to-buffer flow control. When the two N_Ports log into each other, they report
how many receive buffers are available for the other port. This value becomes EE_Credit.
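The buffer-to-buffer credit mechanism amounts to a simple counter, as this sketch shows (Python; class and method names are hypothetical, not part of any real FC stack):

```python
# Minimal sketch of buffer-to-buffer credit flow control. A transmitter
# starts with the BB_Credit value the other port advertised at login,
# spends one credit per frame sent, and regains one per R_RDY received.
# Names are hypothetical; this is not a real FC implementation.

class BBCreditPort:
    def __init__(self, bb_credit: int):
        self.credits = bb_credit      # value advertised by the peer at login

    def can_send(self) -> bool:
        return self.credits > 0

    def send_frame(self):
        if not self.can_send():
            raise RuntimeError("no BB_Credit left: transmitter must wait")
        self.credits -= 1             # one receive buffer consumed at the peer

    def receive_r_rdy(self):
        self.credits += 1             # the peer freed one receive buffer

if __name__ == "__main__":
    tx = BBCreditPort(bb_credit=2)
    tx.send_frame(); tx.send_frame()
    print(tx.can_send())              # False: both credits spent
    tx.receive_r_rdy()
    print(tx.can_send())              # True: one buffer freed
```

End-to-end credit works the same way, except the counter is maintained between the two N_Ports and replenished by ACKs rather than R_RDYs.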
FC Class of Service
Figure: Flow-control credits on a fabric. End-to-End credit (ACK) runs between N_Port 1
on Node A and N_Port 2 on Node B, while Buffer-to-Buffer credit (R_RDY) runs separately
on each link between an N_Port and its F_Port.
Class 1
→ Guaranteed bandwidth and delivery
→ Dedicated connection
→ End-to-End flow control (R_RDY and ACK)
Class 2
→ Guaranteed delivery (ACK required)
→ Connectionless service
→ Buffer-to-Buffer and End-to-End flow control (R_RDY and ACK)
→ Out-of-order delivery of frames allowed
Class 3
→ Delivery managed exclusively by Buffer-to-Buffer flow control (R_RDY)
→ Connectionless service
→ Out-of-order delivery of frames allowed
Intermix
Enhanced Class 1; allows Class 2 or Class 3 frames to be sent between Class 1 frames.
Class 4 → Class 4 can be used only with the pure Fabric topology. One N_Port will
set up a Virtual Circuit (VC) by sending a request to the Fabric indicating the remote
N_Port as well as quality-of-service parameters. The resulting Class 4 circuit will
consist of two unidirectional VCs between the two N_Ports.
Class 5 → The idea for Class 5 involved isochronous, just-in-time service. However, it
is still undefined.
Class 6 → Class 6 provides support for multicast service through a Fabric at the
well-known address hex 'FFFFF5'.
FC AL Initialization
Loop initialization happens mainly for two reasons:
→ when a loop is newly formed with all FC devices and is about to come up on the
network;
→ when any loop failure occurs.
Three main functions happen during loop initialization:
→ LIP Primitive Sequences.
A LIP is transmitted by an L_Port after it powers on, or when it detects loop failure
(loss of synchronization at its receiver). The LIP propagates around the loop,
triggering all other L_Ports to transmit LIP as well. At this point, the loop is
not usable.
→ Selection of a Loop Master.
This is done by the L_Ports continually transmitting Loop Initialization Select Master
(LISM) frames.
→ Selection of an AL_PA by every device on the loop.
The concept of an AL_PA bitmap is used: each L_Port selects (and sets) a single bit in
the bitmap of a frame originated by the Loop Master and repeats the frame back onto the
loop. There are 127 available bits, corresponding to the 127 valid AL_PAs.
This process is carried out using the following four frames:
LIFA → a certain AL_PA was assigned by the Fabric.
LIPA → before this initialization, the L_Port had a valid AL_PA.
LIHA → the L_Port has a certain AL_PA it tries to claim.
LISA → the L_Port claims the first available AL_PA that is left.
Two additional frames, LIRP and LILP, may be sent by the Loop Master, but only if all
L_Ports on the loop support them.
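The bitmap-driven AL_PA selection can be approximated with a single simplified pass (Python; hypothetical names, a single pass standing in for the LIFA/LIPA/LIHA/LISA frames, and real AL_PAs are specific 8-bit code values rather than plain indexes):

```python
# Simplified sketch of AL_PA selection during loop initialization.
# The Loop Master circulates a bitmap of 127 positions; each port that
# still needs an address claims the first free bit as the frame passes,
# roughly what happens across the LIFA/LIPA/LIHA/LISA frames.

NUM_ALPAS = 127

def run_selection_pass(ports):
    """ports: port names in loop order. Returns {port: alpa_index}."""
    bitmap = [False] * NUM_ALPAS
    assignments = {}
    for port in ports:                    # the frame travels around the loop
        for idx, taken in enumerate(bitmap):
            if not taken:
                bitmap[idx] = True        # claim the first available bit
                assignments[port] = idx
                break
    return assignments

if __name__ == "__main__":
    print(run_selection_pass(["disk0", "disk1", "hba0"]))
```

The key property is that every port ends up with a unique address, because each one sets its chosen bit before repeating the frame onward.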
FC AL Arbitration
Arbitrated Loop is not a token-passing scheme. When a device is ready to
transmit data, it first must arbitrate and gain control of the Loop. It does this by
transmitting the Arbitrate (ARBx) Primitive Signal, where x = the Arbitrated Loop
Physical Address (AL_PA) of the device.
Once a device receives its own ARBx Primitive Signal, it has gained control of the
Loop and can now communicate with other devices by transmitting an Open (OPN)
Primitive Signal to a destination device.
Once this happens, there essentially exists point-to-point communication between
the two devices. All other devices in between simply repeat the data.
If more than one device on the loop is arbitrating at the same time, the x values
of the ARB Primitive Signals are compared. When an arbitrating device receives
another device's ARBx, the ARBx with the numerically lower AL_PA is forwarded,
while the ARBx with the numerically higher AL_PA is blocked.
Unlike token-passing schemes, there is no limit on how long a device may retain
control of the Loop. This demonstrates the “Channel" aspect of Fibre Channel.
There is, however, an Access Fairness Algorithm, which prohibits a device from
arbitrating again until all other devices have had a chance to arbitrate. The catch is
that the Access Fairness Algorithm is optional.
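The arbitration rule (the numerically lower AL_PA wins) can be sketched as follows (Python; hypothetical names, simplified to a single pass around the loop):

```python
# Sketch of FC-AL arbitration. Each arbitrating port forwards an incoming
# ARBx only if its AL_PA is numerically lower than the port's own;
# otherwise the port substitutes its own ARB. A port has won when its own
# ARB returns to it. Names are hypothetical; this is a simplification.

def arbitrate(arbitrating_alpas):
    """Return the AL_PA that wins control of the loop."""
    # Start with the first port's ARB and let it pass every arbitrating
    # port: the lower (higher-priority) AL_PA survives each comparison.
    current = arbitrating_alpas[0]
    for alpa in arbitrating_alpas[1:]:
        current = min(current, alpa)      # lower AL_PA forwarded, higher blocked
    return current

if __name__ == "__main__":
    print(hex(arbitrate([0x23, 0x01, 0xE8])))   # 0x1: the lowest AL_PA wins
```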
Fibre Channel over IP
Two popular solutions for extending the Fibre Channel over the IP network are FCIP and
iFCP.
The main reasons for running Fibre Channel over IP networks are the following:
Leverage existing storage devices (SCSI and Fibre Channel) and networking
infrastructures (Gigabit Ethernet);
Maximize storage resources to be available to more applications;
Extend the geographical limitations of DAS and SAN access;
Use existing storage applications (backup, disaster recovery, and mirroring) without
modification; and
Manage IP-based storage networks with existing tools and IT expertise.
FCIP Protocol:
FCIP is currently the most widely supported IP based extension protocol. This is probably due
to the fact that it is simple and easy to implement. The basic concept of FCIP is a
tunnel that connects two or more Fibre Channel SAN islands through an IP network. Once
connected, the two SAN islands logically merge into a single fabric across the IP tunnel.
An FCIP gateway is required to encapsulate Fibre Channel frames into TCP/IP packets,
which are then sent through the IP network. On the remote side, another FCIP gateway
receives the incoming FCIP traffic and strips off the TCP/IP headers before forwarding
the native Fibre Channel frames into the SAN (Figure).
The FCIP gateway can be a separate device, or its functionality can be integrated into the
Fibre Channel switch.
The obvious advantage of using FCIP is that existing IP infrastructure can be used to provide
the distance extension.
How FCIP works
FCIP solutions encapsulate Fibre Channel packets and transport them via TCP/IP, which
enables applications that were developed to run over Fibre Channel SANs to be supported
under FCIP. It also enables organizations to leverage their current IP infrastructure and
management resources to interconnect and extend Fibre Channel SANs.
FCIP is a tunneling protocol that uses TCP/IP as the transport
while keeping Fibre Channel services intact. FCIP relies on IP-based network services and on
TCP/IP for congestion control and management. It also relies on both TCP/IP and Fibre
Channel for data-error and data-loss recovery.
In FCIP, gateways are used to interconnect Fibre Channel SANs to the IP network and to
set up connections between SANs, or between Fibre Channel devices and SANs. As with
iSCSI, there are a number of "pre-standard" FCIP products on the market.
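The tunneling idea can be illustrated with a toy encapsulation: prefix each Fibre Channel frame with a length header and carry the result over a TCP byte stream. (Python; the 4-byte header here is purely illustrative and is not the actual FCIP encapsulation format defined in RFC 3821.)

```python
# Toy FCIP-style encapsulation: each FC frame becomes a length-prefixed
# record on a TCP byte stream, and the remote gateway recovers the native
# frames. The 4-byte length header is an illustrative stand-in for the
# real FCIP encapsulation header.
import struct

def encapsulate(fc_frame: bytes) -> bytes:
    """Gateway A: wrap a native FC frame for transport over TCP/IP."""
    return struct.pack("!I", len(fc_frame)) + fc_frame

def decapsulate(stream: bytes):
    """Gateway B: strip the headers and recover the native FC frames."""
    frames, offset = [], 0
    while offset < len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        offset += 4
        frames.append(stream[offset:offset + length])
        offset += length
    return frames

if __name__ == "__main__":
    tunnel = encapsulate(b"FC-frame-1") + encapsulate(b"FC-frame-2")
    print(decapsulate(tunnel))   # [b'FC-frame-1', b'FC-frame-2']
```

Because the frames arrive at the remote gateway unchanged, the two SAN islands behave as one fabric, which is exactly why a tunnel failure can trigger fabric rebuilds on both sides.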
iFCP Protocol:
How iFCP works
Fibre Channel devices (e.g., switches, disk arrays, and HBAs) connect to
an iFCP gateway or switch. Each Fibre Channel session is terminated at the local gateway
and converted to a TCP/IP session via iFCP. A second gateway or switch receives the iFCP
session and initiates a Fibre Channel session. In iFCP, TCP/IP switching and routing elements
complement and enhance, or replace, Fibre Channel SAN fabric components. The protocol
enables existing Fibre Channel storage devices or SANs to attach to an IP network. Sessions
include device-to-device, device-to-SAN, and SAN-to-SAN communications.
From the IP side, each of the Fibre Channel devices connected to the iFCP gateway is
given a unique IP address, which is advertised in the IP network. This allows
individual Fibre Channel devices to be reached through the IP network via the iFCP
gateway. The ability to individually address devices gives iFCP some advantages
compared to the FCIP protocol.
The biggest advantage is stability. Using FCIP between two Fibre Channel SAN islands
will cause the islands to merge into one. This means that if there are perturbations in
the IP network, they can potentially cause the fabric to rebuild on both sides of the IP
tunnel. Using iFCP, the connectivity is between individual devices, and the fabrics stay
separate. If perturbations occur in the network, they may affect individual connections
but will not cause fabric rebuilds, thus leading to more stable fabrics on both sides of
the IP network. The disadvantage compared to FCIP is the limited availability of iFCP
solutions in the marketplace. This could be because FCIP is very simple to implement,
so FCIP solutions are widely available and provided by a number of different
manufacturers. In contrast, iFCP is supported by only a limited number of vendors.
Fibre Channel Switch
I. FC Switch Configuration
1. Open a HyperTerminal session
2. Log in as admin
3. Enter password for the password
4. Type configure
a. Configures the entire switch
5. Type help
a. Lists the available commands
6. Type ipAddrSet
7. Enter the Ethernet IP address
a. Get from Tom York
8. Enter the common subnet mask
a. 255.255.252.0
9. Hit Enter twice after the subnet mask
a. Uses the default values
10. Enter the gateway address, which is the same throughout the lab
a. 147.145.175.254
11. When asked to set the values, respond by entering <y>
12. Type ipAddrShow
a. Verify that the IP address was saved
13. Type reboot
a. This will take several minutes
II. Enable the Switches
1. Open a web browser
2. Enter the address http://<ip address of the switch>
3. Click Zone Admin
4. Enter admin for the user name
5. Enter password for the password
6. For zone selection, select switch/port level zoning
7. Click <OK>
8. Click the Port Zone tab
9. Click Create Zone
10. Name the zone
11. Go to the switch port domain and select ports 0 through 7
12. Click Add Mem =>
13. Create another zone
14. Select ports 8 through 15
15. Select the Port Config tab
16. Highlight both new zones and add them
a. Under file zones
17. Click Add Mem =>
18. Click Enable Config, then Apply and OK
a. All located at the bottom of the screen
Switch Behavior:
Switch Initialization:
At power-on, the boot PROM diagnostics:
Verify the CPU DRAM memory.
Initialize the base Fabric Operating System (FOS).
The initialized FOS then does the following:
Executes the Power-On Self Test (POST) on the switch.
Initializes the ASICs and the front panel.
Initializes the link for all ports (puts them online).
Explores the fabric and determines the Principal Switch.
Assigns addresses to ports.
Builds the unicast routing table.
Enables N_Port operations.
Fabric Port Initialization Process (from the switch's perspective):
Transition 1: At the start, verify whether anything is plugged into the switch port.
Transition 2: At FL_Port, check whether any loop connections are present on the switch.
Transition 3: At G_Port, verify whether other devices (switches or hubs) are connected.
Transition 4: After G_Port, verify whether a switch or point-to-point device is connected.
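The four transitions above can be summarized as a small decision function (Python; a loose illustrative sketch with hypothetical names, not how switch firmware is actually structured):

```python
# Loose sketch of the fabric port initialization transitions: given what
# the switch discovers on a port, decide the port's operating mode.
# Names are hypothetical; real switches run this as a firmware/hardware
# state machine with many more intermediate states.

def classify_port(device_present: bool, loop_detected: bool,
                  peer_is_switch: bool) -> str:
    if not device_present:
        return "no link"          # Transition 1: nothing plugged in
    if loop_detected:
        return "FL_Port"          # Transition 2: loop devices attached
    if peer_is_switch:
        return "E_Port"           # Transitions 3-4: another switch found
    return "F_Port"               # a point-to-point N_Port device

if __name__ == "__main__":
    print(classify_port(True, False, False))   # F_Port
    print(classify_port(True, True, False))    # FL_Port
```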
Communication Protocols:
Fabric devices typically:
FLOGI → PLOGI to the Name Server → SCR to the Fabric Controller → Register &
Query [using the FC Common Transport (FC_CT) protocol] → LOGON.
Loop devices typically:
PRIVATE NL: LIP (PLOGI & PRLI will enable private storage devices that accept
PRLI and thus "appear" Fabric capable)
PUBLIC NL: LIP → FLOGI → PLOGI → SCR → Register & Query → LOGO, and then
PLOGI → communicate with other end nodes in the fabric.
The LIP process includes: LIP, LISM, LIFA, LIPA, LIHA, LISA, and LIRP & LILP.
Switch Commands (general, from Brocade switches):
help
switchShow
fabricShow
switchEnable / switchDisable
nsShow
nsAllShow
zoneShow / aliShow / cfgShow
cfgEnable
cfgDisable
cfgCreate
zoneCreate
errDump
licenseShow
portCfgDefault
portEnable / portDisable
wwn
uRouteShow
Switch or Fabric Zoning:
SAN implementations make data highly accessible; as a result, there is a need for data-
transfer optimization and finely tuned network security. Fabric zoning sets up the way
devices in the SAN interact, establishing a certain level of management and
security.
What is zoning?
Zoning is a fabric-centric enforced way of creating barriers on the SAN fabric to prevent set
groups of devices from interacting with other devices. SAN architectures provide port-to-port
connections between servers and storage subsystems through bridges, switches, and hubs.
Zoning sets up efficient methods of managing, partitioning, and controlling pathways to and
from storage subsystems on the SAN fabric, which improves storage subsystem utilization,
data access, and security on the SAN. In addition, zoning enables heterogeneous devices
to be grouped by operating system, with further demarcation based on application,
function, or department.
Types of zoning
There are two types of zoning: soft zoning and hard zoning.
• Soft zoning uses software to enforce zoning. The zoning process uses the name server
database located in the FC switch. The name server database stores port numbers and
World Wide Names (WWNs) used to identify devices during the zoning process.
When a zone change takes place, the devices in the database receive a Registered State
Change Notification (RSCN). Each device must correctly process the RSCN to change the
related communication paths. Any device that does not correctly process the RSCN, yet
continues to transfer data to a specific device after a zoning change, will be blocked
from communicating with its target device.
• Hard zoning uses only WWNs to specify each device for a specific zone. Hard zoning
requires each device to pass through the switch’s route table so that the switch can regulate
data transfers by verified zone.
For example, if two ports are not authorized to communicate with each other, the route
table for those ports is disabled, and the communication between those ports is blocked.
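The enforcement rule, in both cases, is that two devices may communicate only if some zone contains them both; a minimal sketch (Python; the WWNs and zone names are made up):

```python
# Sketch of zone enforcement: two devices may communicate only if at
# least one zone contains both of them. Zone and WWN values are made up
# for illustration.

zones = {
    "zone1": {"10:00:00:00:00:00:00:01", "50:00:00:00:00:00:00:0a"},
    "zone2": {"10:00:00:00:00:00:00:02", "50:00:00:00:00:00:00:0a"},
}

def can_communicate(wwn_a: str, wwn_b: str, zones: dict) -> bool:
    """True if any zone contains both devices."""
    return any(wwn_a in members and wwn_b in members
               for members in zones.values())

if __name__ == "__main__":
    # Both hosts share the array port via their own zones, but the two
    # hosts cannot see each other.
    print(can_communicate("10:00:00:00:00:00:00:01",
                          "50:00:00:00:00:00:00:0a", zones))  # True
    print(can_communicate("10:00:00:00:00:00:00:01",
                          "10:00:00:00:00:00:00:02", zones))  # False
```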
Zoning components
Zone configurations are based on either the physical port that devices plug into, or the WWN
of the device. There are three zoning components:
• Zones
• Zone members
• Zone sets
What is a zone?
A zone is composed of servers and storage subsystems on a SAN that access each other
through managed port-to-port connections. Devices in the same zone recognize and
communicate with each other, but not with devices in other zones unless a device in
that zone is configured to be a member of multiple zones.
Figure 1 shows a three-zone SAN with zones 1 and 3 sharing the tape library in zone 2.
Figure 1: Three-Zone SAN Fabric
Zone types
• Port zoning (all zone members are ports)
• WWN zoning (all zone members are WWNs)
• Session-based zoning (zone members are a mixture of WWNs and ports)
Zone database
• Zone database consists of zone objects.
• A zone object can be an alias, a zone, or a configuration
• Configurations contain zones which contain aliases
• For any object, the commands available allow you to create, delete, add, remove, or
show
– cfgcreate/delete/add/remove/show
– zonecreate/delete/add/remove/show
– alicreate/delete/add/remove/show
• Every switch in the fabric has the same copy of the entire database.
• To clear the zone database from a switch, use cfgclear
Alias
• An alias is a name for a device in the fabric
• An alias contains a name for the device, and either the WWN of the device or the
domain and port the device is attached to
• WWN alias: alicreate "alias1","10:00:00:00:01:01:02:02"
• Port alias: alicreate "alias2","100,15"
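Putting the alias, zone, and configuration commands together, a typical Brocade-style zoning session might look like the following (the prompt, names, and WWN are illustrative):

```
switch:admin> alicreate "host1","10:00:00:00:01:01:02:02"
switch:admin> alicreate "array1","100,15"
switch:admin> zonecreate "zone1","host1; array1"
switch:admin> cfgcreate "cfg1","zone1"
switch:admin> cfgenable "cfg1"
switch:admin> cfgshow
```

This follows the object hierarchy described above: aliases name devices, zones group aliases, and a configuration groups zones, with only one configuration active (effective) at a time.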
What is a zone member?
Zone members are the devices within the same assigned zone. See Figure 2. Zone member
devices are restricted to intra-zone communications, meaning that these devices can only
interact with members within their assigned zone. A zone member
cannot interact with devices outside its assigned zone unless it is configured in other zones.
Figure 2: Zone Members
How is a zone member identified?
Each zone member is identified by a WWN or port number. Each device has a unique WWN:
a 64-bit number that uniquely identifies the device, and thus the zone member.
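A WWN is conventionally written as eight colon-separated hex bytes; here is a small parser sketch (Python; the helper name is hypothetical):

```python
# Parse a World Wide Name written as eight colon-separated hex bytes
# (e.g. "10:00:00:00:01:01:02:02") into its 64-bit integer value.

def wwn_to_int(wwn: str) -> int:
    parts = wwn.split(":")
    if len(parts) != 8:
        raise ValueError("a WWN is 8 bytes (64 bits)")
    value = 0
    for byte_str in parts:
        value = (value << 8) | int(byte_str, 16)   # append each byte
    return value

if __name__ == "__main__":
    print(hex(wwn_to_int("10:00:00:00:01:01:02:02")))  # 0x1000000001010202
```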
What is a zone set?
A zone set is a group of zones that function together on the SAN. Each zone set can
accommodate up to 256 zones. All devices in a zone see only devices assigned to their zone,
but any device in that zone can be a member of other zones. In Figure 3,
all 4 zones see Member A.
Figure 3: Zone Set
Configurations
• A configuration is a set of zones.
• You can have multiple defined configurations, but only one active configuration in a
fabric at any time.
• cfgcreate "cfg1","zone1"
• To enable a configuration, use cfgenable "cfg1". This is now called the effective
configuration.
• To disable the effective configuration, use the cfgdisable command. Note that when
you disable zoning, all devices can see each other!
Zone Commit
• A zone commit is the process of updating all switches in the fabric when making a
zone change.
• A zone commit is executed for the cfgdisable, cfgenable, and cfgsave commands.
• Zone commit uses the RCS protocol. The switch making the commit communicates with
each switch individually to ensure the commit took place.
• When a zone commit takes place, the entire zoning database is sent to all switches,
even if only a small change has been made.
RCS [Reliable Commit Service]
• RCS is used for zoning, security, and some other features.
• For zoning, RCS ensures a zone commit happens on every switch in the fabric, or
not at all.
• RCS has four phases: ACA, SFC, UFC, RCA.
Zoning limitation
Currently, fabric zoning cannot mask individual tape or disk storage LUNs that sit behind a
storage-subsystem port. LUN masking and persistent binding are used to isolate devices
behind storage-subsystem ports.
Components of FC-SAN
While SAN configurations can become very complex, a SAN can be reduced to three basic
entities: the host system or systems, the network, and the storage device.
1. Host System(s)
• Application Software (SAN Management Software, CLI Interface and others)
• Middleware (e.g., Volume Manager or Host RAID)
• Operating System/File System
• Host Bus Adapter (HBA) Driver
• Host Bus Adapter (HBA)
• Host Bus Adapter Firmware
2. Storage Network/Communications Infrastructure
• Physical links (FC, iSCSI, Ethernet)
• Transceivers (GBIC, SFP, or any other transceiver)
• Switches and Switch Firmware (Switches & Directors)
• Routers and Router Firmware
• Bridges or Extenders and their Firmware
3. Storage Device(s)
• Interface Adapter
• Interface Adapter Driver/Firmware
• Storage Controller Firmware
• Storage Device (e.g., disk, JBOD, Storage Arrays, Tape or Tape Library)
• Storage Media
Storage Area Network Management
1. Storage Management Software
2. SAN Protection and Security
3. Storage Backup, Disaster Recovery & Data Replication.
1. SAN Management Software
Though typically spoken of in terms of hardware, SANs very often include (or require)
specialized software for their operation. In fact, configuring, optimizing, monitoring,
and securing a contemporary
SAN will almost certainly involve advanced software, particularly centralized management
tools. When considering more complex options, such as High Availability configurations,
selecting the proper management software can be just as critical as choosing the
equipment. Though somewhat recent in its development, SAN management software
borrows heavily from the mature ideas, benefits, and functionality that have been available
for traditional LANs and WANs. Ideally, this new category of software would be universal and
work with any SAN equipment. But in today's multi-vendor and hardware diverse SAN
environments, this software is very often proprietary and/or tied to certain products and
vendors. While this situation is beginning to change, SAN management software today must
be selected with great care. Much consideration has to be given to the SAN equipment
manufacturers, OS platforms, firmware revisions, HBA drivers, client applications, and even
other software that may be running on the SAN. Until SAN management software becomes
very universal, it will continue to be quite important, and even vital, to work closely with
product (and total solution) providers in order to successfully implement, and realize, the
best features that SANs have to offer.
Using management software, the following actions can be performed:
Drive Management (Fail drive, Rebuild drive, initialize drive, online/offline drive,
Hotspare drive, Drive FW Upgrade/Downgrade).
Controller Management (Volumes ownership, active/passive mode, online/offline,
Controller Firmware & NVSRAM Upgrade/Downgrade)
Storage Array Management
(Array management: array profile, add/remove array, rename or modify array, reset
array configuration, monitor performance, event database, collect logs, connectivity
status.
Storage management: create logical drive / LD group, delete LD / LD group, modify LD
capacity (DCE & DVE), DRM, modify LD settings, LUN mapping, LUN masking.)
Switch management can be done using the switch vendor's management software.
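The LUN mapping/masking mentioned above can be sketched as a per-host visibility table (Python; the WWNs and LUN numbers are made up):

```python
# Sketch of LUN masking: the storage array keeps a table of which LUNs
# each host WWN is allowed to see; everything else stays hidden. The
# WWNs and LUN numbers here are made up for illustration.

lun_masks = {
    "10:00:00:00:00:00:00:01": {0, 1},   # host A sees LUNs 0 and 1
    "10:00:00:00:00:00:00:02": {2},      # host B sees only LUN 2
}

def visible_luns(host_wwn: str):
    """Return the set of LUNs the array exposes to this host."""
    return lun_masks.get(host_wwn, set())  # unknown hosts see nothing

if __name__ == "__main__":
    print(sorted(visible_luns("10:00:00:00:00:00:00:01")))  # [0, 1]
    print(sorted(visible_luns("unknown-host")))              # []
```

This complements fabric zoning: zoning controls which ports can reach the storage-subsystem port, while LUN masking controls which logical drives behind that port each host may access.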
Fig. SAN Storage Array
Storage Array
In data storage, an array is a hardware system consisting of multiple storage devices
designed to fulfill the need for data storage.
It consists of one or more controller modules (boxes) and one or more drive modules
(boxes).
Command Module
The command module is the housing for the controllers. It allows the user to “hot swap”
controllers and components. This is made possible due to the redundant nature of the
controller canister. The components that the user can hot swap include redundant power
supplies, batteries, fans, and communications. Additionally, it houses two controllers that
are hot swappable. Below is the front and back of a command module.
Controller
The controller is the “brains” of the array. It can be loaded with different controller
firmware that enables different features on the array. The
controllers are redundant (there are two housed in the command module) and they are hot
swappable. One controller can fail and the other will control the array until the failed
controller is replaced. The Heartbeat light should be blinking during normal operation.
Below is the front view of a controller.
Gigabit Interface Converter (GBIC) and Small Form-factor Pluggable (SFP) - The
GBICs and SFPs allow the host and drive trays to be connected to the controller canister
through Fibre Channel cables. Fibre Channel mini-hubs, GBICs, and cables come in two
sizes: the LC small form-factor connector cable corresponds to 2 Gb/s Fibre Channel,
while the larger SC connector cable corresponds to 1 Gb/s Fibre Channel. Below are the
1 Gb/s GBIC and the 2 Gb/s SFP.
1 Gb/s GBIC 2 Gb/s SFP
FIRMWARE
A type of software on controllers, drives, and other storage components that contains
instructions for their operation. It includes the RAID algorithms and other implemented
features, the real-time kernel, the Diagnostics Manager, the firmware that initializes the
hardware, and the firmware that uploads and initializes the other parts of the
downloadable firmware.
NVSRAM
NVSRAM stands for Non-Volatile Static Random Access Memory.
It is a controller file that specifies default settings for the controller. The file relies on
either a permanently connected battery or the non-volatile cache to retain data
indefinitely in the event of a power failure.
VOLUME or Logical Drive
A volume is a region of storage that is provided by the controller and is visible to external
I/O hosts for data access. Each volume has a RAID level associated with it. A given volume
resides in exactly one volume group.
VOLUME GROUP or Logical Drive Group
A volume group is a collection of volumes whose storage areas reside on the same set of
physical drives in the array. A volume group contains one or more volumes and consists of
one or more physical drives. A volume group comes into existence when the first volume is
created on it.
HOTSPARING
Hot spare drives serve as "immediate" replacements for failed drives that were configured
as part of a storage volume. When a configured drive fails, the firmware automatically
recognizes the failure and selects one of the hot spare drives to replace it; data
reconstruction to the selected spare begins immediately once it has been integrated into
the volume containing the failed drive. Once reconstruction has completed and the user
has replaced the original failed drive, data from the spare is copied back to the
replacement drive. When the copy-back operation is complete, the hot spare is returned
to the spare pool. All reconstruction and copy-back operations happen without significant
interruption to I/O processing for the affected storage volume.
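The hot-spare sequence described above (fail, reconstruct onto the spare, copy back, return the spare to the pool) can be sketched as a small state model. This is an illustrative simulation under assumed semantics, not any vendor's firmware; all class and method names are hypothetical.

```python
# Illustrative simulation of hot-spare failover and copy-back.
# All names are hypothetical; real controller firmware drives this logic.

class VolumeGroup:
    def __init__(self, drives, spares):
        self.drives = set(drives)        # drives currently in the volume group
        self.spares = list(spares)       # global hot-spare pool
        self.active_spare = None         # spare standing in for a failed drive
        self.failed = None               # the failed drive awaiting replacement

    def fail_drive(self, drive):
        """Firmware detects a failure and reconstructs onto a spare."""
        if drive not in self.drives or not self.spares:
            return False
        self.drives.remove(drive)
        self.failed = drive
        self.active_spare = self.spares.pop(0)
        self.drives.add(self.active_spare)     # reconstruction target
        return True

    def replace_failed_drive(self, new_drive):
        """User replaces the drive; data is copied back and the spare freed."""
        self.drives.remove(self.active_spare)
        self.drives.add(new_drive)             # copy-back completes here
        self.spares.append(self.active_spare)  # spare returns to the pool
        self.active_spare = None
        self.failed = None

vg = VolumeGroup(drives=["d0", "d1", "d2"], spares=["hs0"])
vg.fail_drive("d1")
print(sorted(vg.drives))          # ['d0', 'd2', 'hs0']: spare stands in for d1
vg.replace_failed_drive("d1_new")
print(vg.spares)                  # ['hs0']: spare back in the pool
```

Note that I/O to the volume continues throughout; only the membership of the group changes.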
Dynamic RAID Migration
The Dynamic RAID Migration feature provides the ability to change the RAID level for the
volumes of a drive group. Changing the RAID level causes the volumes in the drive group to
be reconfigured such that the data is mapped according to the definition of the new RAID
level.
Dynamic Capacity Expansion
The Dynamic Capacity Expansion feature provides the ability to add drives to a drive group.
Adding drives to a drive group causes the volumes to be reconfigured such that the data is
spread over the drives in the newly expanded drive group. After reconfiguration, all unused
capacity is evenly distributed across all drives following the last volume. This unused
capacity may be used to create additional volumes on the drive group.
Dynamic Volume Expansion
The Dynamic Volume Expansion feature provides the ability to increase the size of a volume
if there is a sufficient amount of free capacity on the drive group. If there is not enough
free capacity, DVE can be coupled with Dynamic Capacity Expansion to add the additional
capacity.
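The relationship between DVE and DCE described above reduces to simple capacity arithmetic: grow the volume from free space when possible, otherwise add drives first. The sketch below is hypothetical and ignores RAID overhead and restriping; real arrays perform the reconfiguration in controller firmware.

```python
# Hypothetical capacity model for DCE (add drives) and DVE (grow a volume).

class DriveGroup:
    def __init__(self, drive_gb, num_drives):
        self.drive_gb = drive_gb
        self.num_drives = num_drives
        self.used_gb = 0

    @property
    def free_gb(self):
        return self.drive_gb * self.num_drives - self.used_gb

    def dynamic_capacity_expansion(self, extra_drives):
        """DCE: add drives; data is restriped across the wider group."""
        self.num_drives += extra_drives

    def dynamic_volume_expansion(self, volume, extra_gb):
        """DVE: grow a volume from free capacity; fails if capacity is short."""
        if extra_gb > self.free_gb:
            raise ValueError("insufficient free capacity; run DCE first")
        volume["size_gb"] += extra_gb
        self.used_gb += extra_gb

group = DriveGroup(drive_gb=100, num_drives=4)     # 400 GB raw
vol = {"name": "LD0", "size_gb": 350}
group.used_gb = 350

try:
    group.dynamic_volume_expansion(vol, 100)           # only 50 GB free
except ValueError:
    group.dynamic_capacity_expansion(extra_drives=1)   # now 150 GB free
    group.dynamic_volume_expansion(vol, 100)

print(vol["size_gb"], group.free_gb)   # 450 50
```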
Active-Active Controller Setup Mode
In a traditional Active-Active configuration, both controllers are working concurrently to
serve host I/O requests and transfer data. In this mode, when both controllers are operating
normally, the system is theoretically able to handle twice the workload and traffic, doubling
the speed of the system compared to the Active-Passive configuration.
However, in practice, performance increases are much less significant. In the event of a
controller failure in traditional Active-Active configurations, the remaining controller
automatically assumes responsibility for handling all I/O requests and data transfer. Once
the failed controller is replaced, the controllers will automatically read the configuration of
drives and LUNs in the system, and return to normal operation.
Active-Passive Controller Setup Mode
Active-Passive is a dual controller configuration where two controllers provide full
redundancy to all disks, disk enclosures, and Fibre Channel host connections. In an Active-
Passive configuration, the primary (active) controller services all host I/O requests and
performs all data transfers, while the passive controller remains alert to the active
controller’s status using bi-directional heartbeat communications. Typically, the available
space in the RAID array is divided up into an arbitrary number of logical units (LUNs). The
capacity of each LUN can be spread across multiple controller Fibre Channels and disk
drives. In this configuration, both the active and passive controller know the logical volume
configuration. In the
event of a primary controller failure, the passive controller automatically and seamlessly
assumes I/O and data transfer activities without interrupting system performance or
operation. It is important to note that one advantage of Active-Passive is that there is
no degradation of performance when one controller fails or is taken offline for maintenance.
SAN Failover Mechanisms
Storage Array or Controller side Failover Mechanisms
RAID controllers generally have two different characteristics for access to the LUNs:
1. Active/Active, and 2. Active/Passive.
Higher-end and enterprise controllers are always Active/Active. Mid-range and lower-end
controllers can be either. How the controller manages internal failover and your server side,
software and hardware will have a great deal to do with your choices for accomplishing HBA
and switch failover. Before developing a failover or multipathing architecture, you need to
fully understand the issues with the RAID controller.
With Active/Active
controllers, all LUNs are seen and can be written to by any controller within the RAID.
Generally, with these types of RAID controllers, failover is not a problem, since the host can
write or read to any path. Basically, all LUN access is equal, and load balancing I/O requests
and access to the LUNs in case of switch or HBA failover is simple. All you have to do is write
to the LUN from a different HBA path.
Active/Passive Increases Complexity
If your RAID controller is active/passive, the complexity for systems that require HBA failover
can increase greatly. With active/passive controllers, generally the RAID system is arranged
in a controller pair where both controllers see both LUNs, but LUNs have a primary path for
access to a LUN and a secondary path. If the LUN is accessed via the secondary path, the
ownership of the LUN changes from the primary path to the secondary path.
This is not a problem if the controller has failed, but it is if only the controller path has
failed (the HBA or the switch) while other hosts are still accessing that LUN via its primary
path. Now each time one of the other hosts accesses the LUN on the primary path, the LUN
ownership moves from the secondary path back to the primary path. Then when the LUN is
again accessed on the secondary path, the LUN fails over again to the secondary path. This
ping-pong effect will eventually cause the performance of the LUN to drop dramatically.
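The ownership ping-pong just described can be made concrete with a small simulation: two hosts alternately access the same LUN over different controller paths on an active/passive array, and every access over the non-owning path triggers an ownership transfer (often called a "trespass"). Illustrative only; names are invented.

```python
# Illustrative active/passive LUN ownership thrash ("ping-pong") counter.

class ActivePassiveLun:
    def __init__(self, owner="A"):
        self.owner = owner       # controller currently owning the LUN
        self.trespasses = 0      # ownership transfers (each one is costly)

    def io(self, via_controller):
        if via_controller != self.owner:
            self.owner = via_controller   # LUN fails over to the other path
            self.trespasses += 1

lun = ActivePassiveLun(owner="A")

# Host 1 lost its path to controller A and now uses B; host 2 still uses A.
for _ in range(100):
    lun.io("B")   # host 1 on the secondary path
    lun.io("A")   # host 2 on the primary path

print(lun.trespasses)   # 200: every single I/O forced an ownership move
```

With an active/active controller pair the `io()` method would simply serve the request on either path and `trespasses` would stay at zero.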
Host-Side Failover Options
On the host side, there are three options for HBA and switch failover, and in some cases,
depending on the vendor, load balancing of I/O requests across the HBAs. Here they are in
order of hierarchy in the operating system:
1. Volume manager and/or file system failover
2. A failover and/or load balancing driver failover
3. HBA driver failover
Each of these has some advantages and disadvantages — what they are
depends on your situation and the hardware and software you have in the configuration.
In the drawing below, we have an example of a mid-range RAID controller connected
in an HA configuration with dual switches and HBAs, and with a dual-port RAID controller for
both Active/Active and Active/Passive.
Fig. Active/Active Controller Setup Active/Passive Controller Setup
With an Active/Active RAID controller configuration, the failover software knows the path to
each of the LUNs and ensures that it will be able to get to the LUN through the appropriate
path. With this Active/Active configuration, you could access any of the LUNs via any of the
HBAs with no impact on the host or another host, and both controllers can equally access
any LUN. If this were an Active/Passive RAID controller, it
would be critical to access LUNs 0, 2 and 4 with primary controller A if a switch or HBA
failed. You would only want to access LUNs 0, 2, and 4 from controller B if controller A failed.
If a port on controller A failed, you would want to access the LUNs via the other switch and
port and not via controller B. If you did access via controller B, and another host accessed
the LUNs via controller A, the ownership of the LUNs would ping-pong and the performance
would plummet.
Volume Manager and File System Failover Options
Volume managers such as Veritas VxVM and file systems such as ADIC
StorNext and a number of Linux cluster file system vendors understand and are able to
maintain multiple potential paths to a LUN. These types of products are able to determine
what the appropriate path to the LUN should be, but oftentimes for Active/Passive
controllers, it is up to the administrator to determine the correct path(s) to access the LUNs
without failing over the LUNs to the other controller unnecessarily.
Failover at this layer was the initial type of HBA and storage failover available for Unix
systems. Failover at the file system layer allows the file system itself to understand the
storage topology and load balance it. On the other hand, you could be doing a great deal
more work in the file system that might belong at lower layers that have more information
about the LUNs and the paths. Volume managers and file system multipathing also support
HBA load balancing.
Loadable Drivers
Loadable drivers from vendors such as EMC (PowerPath) and Sun (Traffic Manager) are
examples of loadable drivers that manage HBA and switch failover. You need to make sure
that the hardware you plan to use with these types of drivers is supported.
For example, according to the EMC Web site, EMC PowerPath currently supports only EMC
Symmetrix, EMC CLARiiON, Hitachi Data Systems (HDS) Lightning, HP XP (Hitachi OEM) and
IBM Enterprise Storage Server (Shark). According to Sun's Web site, Sun Traffic Manager
currently supports Sun Storage and Hitachi Data System HDS Lightning.
Other vendors are developing products that will provide similar functionality. As with the
volume manager and file system method for failover, loadable drivers also support HBA load
balancing as well as failover.
HBA Driver Failover
HBA drivers on some systems provide the capability for the driver to maintain and
understand the various paths to the LUNs. In some cases, this failover works only for
Active/Active RAIDs, and in other cases, depending on the vendor and the system type, it
works for both types of RAID (Active/Active and Active/Passive). Since HBA drivers often
recognize link failures and link logins faster than other methods, using this failover
mechanism generally allows the fastest resumption of I/O, since the lowest level has the
greatest knowledge of the paths.
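All three host-side options reduce to the same core behavior: keep a list of paths to each LUN and redirect I/O when the current path fails. A minimal sketch of that mechanism follows; the names are hypothetical, and real drivers (such as PowerPath) implement far more, including load balancing and automatic failback.

```python
# Minimal host-side multipath failover sketch: try paths in priority order.

class MultipathDevice:
    def __init__(self, lun, paths):
        self.lun = lun
        self.paths = list(paths)   # e.g. "hba -> switch -> controller" strings
        self.failed = set()

    def mark_failed(self, path):
        """Record a link failure reported by the HBA driver."""
        self.failed.add(path)

    def active_path(self):
        """Return the first healthy path, as a failover driver would."""
        for p in self.paths:
            if p not in self.failed:
                return p
        raise IOError(f"all paths to LUN {self.lun} are down")

dev = MultipathDevice(lun=0, paths=["hba0->sw0->ctlA", "hba1->sw1->ctlB"])
print(dev.active_path())            # hba0->sw0->ctlA
dev.mark_failed("hba0->sw0->ctlA")
print(dev.active_path())            # failover: hba1->sw1->ctlB
```

For an active/passive array, a real driver would additionally order `paths` so that the owning controller's paths are always tried first, avoiding the ping-pong effect described earlier.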
SAN Features for DATA Backup, Replication and High Availability
Snapshots for Backup
Snapshots can be of two types: a point-in-time image of the actual volume, and a full
image of the actual volume.
Point-in-time Image of actual Volume:
A logical point-in-time image of the actual volume, commonly just called a snapshot.
A snapshot is the logical equivalent of a complete physical copy, but you create it much
more quickly than a physical copy and it requires less disk space.
A repository volume is an additional volume associated with a snapshot; it saves blocks of
the actual (base) volume before they are overwritten. The repository volume contains the
original image of any modified data along with metadata describing where it is stored in
the repository. The repository volume is not accessible to external hosts.
A point-in-time image (snapshot) is created using a "copy-on-write" scheme.
Note:
• Exactly one repository volume is created per snapshot.
• Increasing the capacity of the base volume does not change existing snapshots.
• When a base volume is deleted, all associated snapshot volumes and their repositories
are also deleted.
• When a snapshot is deleted, its associated repository volume is also deleted.
Snapshots allow the end user to quickly create a single point-in-time image or
"snapshot" of a volume. The primary benefit of this feature is data backup: online backup
images can be created periodically during the course of the day without disrupting normal
operations.
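The copy-on-write scheme described above can be sketched directly: before a base-volume block is overwritten, its original contents are saved once to the repository; a read of the snapshot checks the repository first and falls back to the base volume. This is a deliberately simplified block model with invented names.

```python
# Simplified copy-on-write snapshot: repository saves pre-overwrite blocks.

class BaseVolume:
    def __init__(self, blocks):
        self.blocks = dict(blocks)     # block number -> data

class Snapshot:
    def __init__(self, base):
        self.base = base
        self.repository = {}           # block -> original data (host-inaccessible)

    def write_base(self, block, data):
        """Host write to the base volume; copy the old block on first touch."""
        if block not in self.repository:
            self.repository[block] = self.base.blocks[block]
        self.base.blocks[block] = data

    def read(self, block):
        """Snapshot read: repository first, else the unmodified base block."""
        return self.repository.get(block, self.base.blocks[block])

base = BaseVolume({0: "aaa", 1: "bbb"})
snap = Snapshot(base)
snap.write_base(0, "AAA")              # base changes after the snapshot
print(base.blocks[0], snap.read(0))    # AAA aaa: snapshot still sees old data
```

Because only modified blocks are copied, the repository is typically far smaller than the base volume, which is why a snapshot needs less disk space than a physical copy.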
Full image of the Actual Volume (Cloning):
The Volume Full Copy or Clone is used to copy data from one volume (the source) to
another volume (the target) on a single storage array. This feature can be used to back
up data, to copy data from volume groups that use smaller capacity drives to volume
groups using greater capacity drives, or to restore snapshot volume data to the
associated base volume.
When you create a volume copy, a copy pair is created, which consists of a source
volume and a target volume that are located on the same storage array. The source
volume is the volume that accepts host I/O and stores data. The source volume can be a
standard volume, a snapshot volume, or the base volume of a snapshot volume.
When a volume copy is started, data from the source volume is copied in its entirety to
the target volume. The source volume is available for read I/O activity only while a
volume copy has a status of In Progress, Pending, or Failed. After the volume copy is
completed, the source volume becomes available to host applications for write requests.
A target volume contains a copy of the data from the source volume. The target volume
can be a standard volume or the base volume of a failed or disabled snapshot volume. While
the volume copy has a status of In Progress, Pending, or Failed, read and write requests
to the target volume will be rejected by the controllers. After the volume copy is
completed, the target volume automatically becomes read-only to hosts, and write
requests to the target volume will be rejected. The Read-Only attribute can be changed
after the volume copy has completed or has been stopped.
Additionally, volume copy can be used to redistribute data — moving volumes from
older, slower disk drives to newer, faster, or higher capacity drives — to optimize
application performance and/or capacity utilization.
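The access rules for a volume copy (source read-only while the copy is In Progress, Pending, or Failed; target rejecting all I/O until completion, then read-only) can be summarized as a small state check. This is illustrative: the status names come from the text above, everything else is hypothetical.

```python
# Sketch of volume-copy access rules keyed on copy status.

IN_FLIGHT = {"In Progress", "Pending", "Failed"}

def source_allows(op, status):
    """Source accepts reads always; writes only after the copy completes."""
    return op == "read" or status == "Completed"

def target_allows(op, status, read_only=True):
    """Target rejects all I/O in flight; afterwards it is read-only."""
    if status in IN_FLIGHT:
        return False
    return op == "read" or not read_only

assert source_allows("read", "In Progress")
assert not source_allows("write", "Pending")
assert not target_allows("read", "In Progress")
assert target_allows("read", "Completed")
assert not target_allows("write", "Completed")               # read-only by default
assert target_allows("write", "Completed", read_only=False)  # attribute cleared
print("access rules hold")
```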
Remote Volume Mirroring:
Remote Volume Mirroring allows for protection against and recovery from disasters or
catastrophic failures of systems or data centers. When a disaster occurs at one site the
secondary (or backup) site takes over responsibility for computer services. RVM maintains a
fully synchronized image of key data at the secondary site so that no data is lost and
overall computing services suffer minimal interruption if a disaster or failure occurs.
It is a controller-level, firmware-based mechanism for ensuring fully synchronized data
replication between the primary and secondary sites. A mirroring relationship comprises
exactly two volumes, each residing on a separate array. One volume acts in the primary
role, servicing host I/O; the other acts as the backup secondary volume. Replication is
managed on a per-volume basis. This allows the storage administrator to associate a
distinct remote mirror volume with any/every primary volume of a given storage array. A
given array's primary volumes can be mirrored to secondary volumes that reside on multiple
distinct remote storage arrays. The following figure shows one possible configuration with a
primary and backup data center.
Mirroring relationships are established at the volume level between two
storage arrays. Our terminology refers to the primary volume as the one receiving host I/O,
the secondary volume is the stand-by mirrored image of the primary. The array controllers
manage synchronization activities, both in the initial image synchronization from primary to
secondary and in replicating host write data. One host channel of each array is dedicated
to inter-array data movement. The dedicated host port of each array must be connected
via fibre channel fabric with name service. The name service function allows the two arrays
to locate each other in the fabric network and perform the required login initialization.
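Fully synchronized mirroring as described above means a host write is not acknowledged until the secondary array has also committed it. A minimal sketch of that write path follows; the names are invented, and real RVM runs in controller firmware over the dedicated Fibre Channel port.

```python
# Sketch of a synchronous remote-mirror write: ack only after both commits.

class MirroredVolume:
    def __init__(self):
        self.primary = {}      # block -> data at the primary site
        self.secondary = {}    # fully synchronized image at the backup site

    def host_write(self, block, data):
        """Commit locally, replicate over the inter-array link, then ack."""
        self.primary[block] = data
        self.secondary[block] = data   # synchronous replication step
        return "ack"                   # host sees success only after both

    def failover_read(self, block):
        """After a site disaster, the secondary serves the data with no loss."""
        return self.secondary[block]

vol = MirroredVolume()
vol.host_write(7, "payroll")
print(vol.failover_read(7))   # payroll (nothing lost at the backup site)
```

The cost of this guarantee is that every host write incurs the inter-site round trip, which is why synchronous mirroring is typically limited to metropolitan distances.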
2. SAN Protection and security
When the word “security” is used in association with a SAN, thoughts can easily lead to
computer hackers infiltrating the network and causing havoc. Although hacker invasions are
a concern, there is another security issue associated with a SAN that must be addressed,
and that is the issue of technology containment. For example, Windows NT servers would
naturally claim every available Logical Unit Number (LUN) visible to them. In brief,
technology containment keeps servers from gaining unauthorized or accidental access to
undesignated areas within the SAN. The two major areas of concern with SAN
implementations are data access and fabric
management security.
Security at Different Stages
Open systems offer many different file systems, volume and disk management formats,
and software, requiring that security issues be considered and then implemented during
the SAN design and development phase, in the following areas:
A. Data access and security
B. Fabric management and security (protection from outside threats)
C. Higher levels of availability to data and the applications that use the data
2.A. Data access and security
2.A.I. Questions concerning data access and security
Concerning data access and security on a SAN, consider the following questions:
1. How can we segregate operating systems at the port level on the SAN fabric?
• It is not advisable to have Windows NT and Sun Solaris systems accessing the same RAID-
array port on the SAN fabric because Windows NT will attempt to write disk signatures to all
new disk LUNs it finds attached to the SAN fabric. That creates the need for a network fabric-
enforced way of segregating ports into logical groups of visibility.
2. How can we segregate different application types on the SAN fabric?
• For example, it may be necessary to ensure that finance systems on the SAN fabric cannot
access the data owned by engineering systems, or web systems. That creates the need for a
fabric-enforced way of grouping ports on the SAN fabric into zones of visibility based on
application, function, or departmental rules.
3. How can we isolate any single LUN on an array, permitting only a certain host(s) access to
that LUN and no others?
• A basic advantage of a SAN is that a large number of hosts can share expensive storage
resources. For RAID storage subsystems, this demands that multiple hosts have access to
the subsystem's disk storage LUNs through a single shared port on the array. Therefore, it is
necessary to employ security methods to ensure that LUNs behind a port are accessible only
by the intended hosts. Without special
software and architectures to manage multi-host block-level read/write access (when
multiple systems access the same LUN concurrently), data corruption or data loss could
occur.
4. How can we, from the host side, ensure that hosts see their storage ports and storage
LUNs consistently when adding new storage LUNs, and after each reboot?
• In the world of SANs, the assignment of Small Computer System Interface (SCSI) target IDs
is moved from the storage side to the host/Fibre Channel (FC) Host Bus Adapter (HBA) side.
Thus, SCSI target IDs can be dynamically reassigned as new storage LUNs are added to an
individual host via the SAN. Since this feature is a fundamental advantage of SAN
architectures, the assignment of SCSI target IDs requires management to ensure their
consistency across storage subsystems, SAN fabrics, and after host configuration changes.
2. A.II. Data access and security methodologies
The following are data access and security methodologies:
• Fabric zoning is fabric-centric enforcement: It provides a port- and host/storage-level
point of logical partitioning and can help ensure that different OS types or applications
are partitioned on the SAN. Fabric zoning is managed and enforced on the SAN fabric.
Fabric zoning cannot mask individual LUNs that sit behind a port: all hosts connected
to the same port will see all the LUNs addressed through that port.
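Fabric zoning as described (port-level visibility groups, no per-LUN granularity) can be modeled as set membership: two ports can talk only if they share a zone. Illustrative only; the zone and port names are invented.

```python
# Port-level fabric zoning model: visibility requires a shared zone.

zones = {
    "finance_zone": {"host_fin", "array_port1"},
    "engineering_zone": {"host_eng", "array_port2"},
}

def can_communicate(port_a, port_b):
    """Fabric-enforced check: any zone containing both ports allows traffic."""
    return any(port_a in z and port_b in z for z in zones.values())

assert can_communicate("host_fin", "array_port1")
assert not can_communicate("host_fin", "array_port2")   # cross-zone blocked
print("zoning enforced")
```

Note that the check operates on ports, not LUNs: once `host_fin` can reach `array_port1`, it sees every LUN behind that port, which is exactly why LUN masking (below) is needed as a complementary control.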
• LUN Masking is RAID storage subsystem-centric enforcement:
LUN Masking is configured at the RAID storage subsystem level; this helps ensure that only
designated hosts assigned to that single storage port can access the specified RAID LUN.
LUN masking is a RAID system-centric enforced method of masking multiple LUNs behind a
single port. LUN masking configuration occurs at the RAID-array level, using World Wide Port
Names (WWNs) of server FC HBAs. See Figure 4. LUN masking allows disk storage resource
sharing across multiple independent servers.
With LUN masking, a single large RAID subsystem can be subdivided to serve a number of
different hosts that attach to it through the SAN fabric. Each LUN (disk slice, portion, unit)
inside the RAID subsystem can be limited so that only one or a limited number of servers
can see that LUN.
LUN masking can occur either at the server FC HBA or at the RAID subsystem (behind the
RAID port). It is more secure to mask LUNs at the RAID subsystem, but not all RAID
subsystems have LUN masking capability; therefore, some FC HBA vendors allow persistent
binding at the driver level to mask LUNs.
Figure 4: LUN Masking
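LUN masking as described above is enforced per LUN behind a shared array port, keyed on the WWN of the server's FC HBA. A sketch of the lookup (all WWNs and LUN numbers are invented):

```python
# RAID-array-side LUN masking: each LUN lists the host WWNs allowed to see it.

lun_masks = {
    0: {"10:00:00:00:c9:aa:aa:aa"},                             # NT server only
    1: {"10:00:00:00:c9:bb:bb:bb", "10:00:00:00:c9:cc:cc:cc"},  # Solaris pair
}

def visible_luns(host_wwn):
    """LUNs the array presents to this WWN through the shared port."""
    return sorted(lun for lun, allowed in lun_masks.items() if host_wwn in allowed)

print(visible_luns("10:00:00:00:c9:aa:aa:aa"))   # [0]
print(visible_luns("10:00:00:00:c9:bb:bb:bb"))   # [1]
print(visible_luns("10:00:00:00:c9:dd:dd:dd"))   # []: unknown host sees nothing
```

This is the behavior that keeps a Windows NT host from claiming (and writing disk signatures to) LUNs intended for other operating systems on the same port.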
• Persistent Binding is host-centric enforcement:
This consistently forces a host to see a specific storage-subsystem port as a particular SCSI
target. Persistent binding also helps ensure that a specific storage-subsystem port on the
SAN is always seen as the same SCSI Target ID on the host, across the host and fabric, and
throughout storage configuration changes. OS and upper-level applications (such as LAN-
free backup software) typically require a static or predictable SCSI Target ID for storage and
reliability purposes.
Persistent binding is a host-centric enforced way of directing an operating system to assign
certain SCSI target IDs and LUNs. For example, a specific host can be configured to always
assign SCSI ID 3 to the first router it finds, and LUNs 0, 1, and 2 behind that port to the
three tape drives attached to the router, as shown in Figure 5. Operating systems and upper-level applications
(such as backup software) typically require a static or predictable SCSI target ID for their
storage reliability—persistent binding makes that possible.
Figure 5: Persistent Binding
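Persistent binding pins a storage port (identified by its WWPN) to a fixed SCSI target ID on the host, so the mapping survives reboots and fabric changes. A sketch of that binding table, with invented WWPNs:

```python
# Host-side persistent binding: WWPN -> fixed SCSI target ID, reboot-stable.

binding_table = {
    "50:06:0b:00:00:aa:00:01": 3,   # FC-SCSI router always becomes target 3
    "50:06:0b:00:00:bb:00:01": 4,   # RAID controller port always target 4
}

def assign_target_id(discovered_wwpn, next_free_id):
    """Use the bound ID if present; otherwise fall back to dynamic assignment."""
    return binding_table.get(discovered_wwpn, next_free_id)

# Whether the router is discovered first or last, it is still SCSI ID 3,
# which is what backup software depending on stable device paths needs.
assert assign_target_id("50:06:0b:00:00:aa:00:01", next_free_id=0) == 3
assert assign_target_id("50:06:0b:00:00:ee:00:01", next_free_id=7) == 7
print("bindings stable across discovery order")
```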
• LUN Mapping, in addition to persistent binding, is another host-centric method of storage
visibility management. LUN Mapping allows a system administrator to selectively scan for
specified SCSI targets and LUNs at storage-driver boot time and to selectively ignore
non-specified SCSI targets and LUNs.
The advantage of LUN Mapping is that it provides a level of security management in SANs
where LUN Masking is not an option, perhaps because it is not supported on the storage
hardware. The disadvantage is that LUN Mapping is configured and enabled on a host-by-host
basis. It requires good coordination among the administrators of the systems sharing
the storage, ensuring that only one host sees a given storage unit unless sharing is
planned, as in a clustered server configuration.
2.B. Fabric management and security (protection from outside threats)
2.B.I. Questions concerning SAN fabric-level security
Concerning SAN fabric-level security, consider the following questions:
1. How can we manage switch-to-switch security on the SAN fabric; also, how can we
enforce security policies that prohibit unauthorized switches or hosts from attaching to the
SAN fabric?
• In early SAN infrastructures, additional switches (configured with a default password and
login) could easily attach to an existing operating SAN fabric, and that new non-secure
switch could be used as a single point of configuration administration for the entire SAN
fabric. There is a need for technologies that enforce access control at the fabric-level, and
ensure only authorized and
authenticated switches can be added to the fabric.
2. How can we centrally manage security and configuration changes on a SAN fabric?
• In the initial phases of SAN evolution and even today, large SAN fabrics are frequently
composed of many 8- or 16-port FC switch-building blocks. Each switch features both in-
band and out-of-band management components (Simple Network Management Protocol
(SNMP), telnet, etc.), and a switch-centric security control model. As large SANs evolve, so
does the need for technologies to centrally control security with regard to SAN data access
and fabric management, and to minimize the number of administrative access and security
control points on the SAN fabric.
3. How can we ensure that only authorized hosts connect to the SAN fabric and to a specific
port designated by an administrator?
• Initially, in SAN configurations, a host FC HBA could attach to any point in a SAN fabric and
if the FC HBA was capable of basic SAN fabric login, that FC HBA became a participating
member of the SAN fabric. There is a need for technologies that allow a fabric-centric
method of access control for determining which hosts can attach to a specific port or switch
on the SAN fabric. This would prevent a rogue attacker with a Windows NT system and an
FC HBA from attaching to a non-secure SAN fabric for the purpose of configuration changes
or data access.
4. How can we ensure that the tools used to manage the SAN fabric, and SAN management
requests are coming from an authorized source?
• Multiple in-band and out-of-band methods are used to manage SAN fabric configurations. A
tunnel of communication must exist between SAN management consoles and frameworks,
and the targeted SAN fabric being managed. That tunnel of communication must be secure
and confirmed as authentic to prevent an attacker from using a management tool to access
the nonsecure SAN fabric.
5. How can we ensure that configuration changes on the SAN fabric are valid when there are
multiple points of configuration management?
• In early SAN configurations, multiple administrators could log into different switches on the
same SAN fabric and perform fabric-configuration changes concurrently. After those
configuration changes were enabled and propagated fabric-wide, corruption could occur
due to configuration conflicts. Corruption of the SAN fabric usually occurs when configuration
changes are made through multiple points on the SAN fabric. There is a need for
technologies that ensure SAN fabric configuration changes only occur through a central and
secure point on the SAN fabric, and that those configuration changes do not cause
configuration conflicts.
2.B.II. Fabric Management and security Technologies
The following technologies protect and manage the fabric:
• Fabric-to-Fabric Security technologies use Access Control Lists (ACLs) to allow or
deny the addition of new switches to the fabric. Public Key Infrastructure (PKI) technology
may be applied as a mechanism for validating the identity of the new switch. Also, fabric-wide
security databases help ensure that all new authorized switches added to the fabric
inherit fabric-wide security policies, so that a new out-of-the-box switch does not become
a non-secured access point.
• Host-to-Fabric Security technologies can apply ACLs at the port level on the fabric, and
allow or deny a particular host's FC HBA to attach to that port. This would prevent an
unauthorized intruder host from attaching to the fabric via any port. The host's ability to
log into the fabric is clearly defined and allowed within this model.
• Management-to-Fabric technologies can use PKI and other encryption (such as MD5)
technologies to ensure a trusted and secure management console-to-fabric communication
layer exists. This will help ensure that the management console or
framework used to control the SAN fabric is valid and authorized.
• Configuration Integrity technologies ensure that propagated fabric configuration
changes only come from one location at a time, and are correctly propagated to all switches
on the SAN fabric with integrity. Distributed lock managers can ensure
that only serial and valid configuration changes are enabled on the fabric.
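The distributed-lock idea behind configuration integrity can be sketched as a single fabric-wide lock: a configuration change is accepted only from the current lock holder, so concurrent edits from two switches cannot conflict. Illustrative only; real fabrics distribute this state across switches.

```python
# Fabric configuration lock: only one admin point may change config at a time.

class FabricConfigLock:
    def __init__(self):
        self.holder = None
        self.config_version = 0

    def acquire(self, admin):
        """Grant the lock only if no one else is mid-change."""
        if self.holder is None:
            self.holder = admin
            return True
        return False

    def commit_change(self, admin):
        """Propagate a change fabric-wide only if this admin holds the lock."""
        if self.holder != admin:
            raise PermissionError("configuration change rejected: lock not held")
        self.config_version += 1
        self.holder = None                # release after a serialized commit
        return self.config_version

lock = FabricConfigLock()
assert lock.acquire("switch1_admin")
assert not lock.acquire("switch9_admin")     # concurrent change blocked
print(lock.commit_change("switch1_admin"))   # 1: change applied serially
```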
3.A - Backup Solutions
Data backup methods
There are three effective methods used to backup data:
A. Distributed
B. Centralized (conventional)
C. SAN
3.A.1. Backups in distributed environments
In distributed environments, storage subsystems are directly attached to servers. See Figure
2. Distributed backups require IT personnel to touch each system physically (i.e., handling
tapes) to perform backup operations. If the server data exceeds the tape capacity (which is
usually the case), the IT person must monitor the operation and reload new tapes at the
proper time.
Distributed environments are fragmented in the following circumstances:
• When storage is isolated on individual servers (storage islands)
• When there are point-to-point SCSI connections only
• When there is a one-to-one relationship between servers and storage subsystems, creating
storage islands, which scale poorly and are difficult to centrally manage.
Figure 2: Distributed Backup Environment
3.A.2. Backups in conventional centralized environments
In conventional-centralized environments, a storage subsystem is attached to one server,
and all other systems are backed up to that storage subsystem through the server and over
the Local Area Network (LAN). See Figure 3. Conventional centralized backups limit
management overhead to a single storage subsystem. The challenge is not managing the
storage subsystem, but getting the data to it. Conventional-centralized backup solutions rely
on an Internet Protocol (IP) network as the data path. The problem with this is that the
Transmission Control Protocol/Internet Protocol (TCP/IP) processing associated with
transporting the sheer volume of data can adversely impact server CPU cycles. This results
in long backup cycles that exceed the scheduled backup window. Therefore,
conventional-centralized backups often overflow into user uptime, resulting in poor network
response and generally unacceptable server performance.
This method is an improvement over the distributed method, but it still has inefficiencies:
Pros:
• Centralizes the storage in fewer locations and on fewer platforms
• Requires fewer backup servers and software packages
• Uses centralized administration
• Results in fewer human errors
Cons:
• Backup bottlenecks develop on the LAN
• Bottlenecks become more frequent as storage needs grow
• Still managing multiple separate backup servers
• Typically uses the same LAN for production and data backups
• Many-to-one relationship between servers and the storage subsystem
Figure 3: Conventional-Centralized Backup Environment
3.A.3. Backups in SAN environments
In SAN environments, storage subsystems are attached to the SAN fabric where all servers
potentially have equal access to them. See Figure 4. SANs offer the following efficiencies
and advantages over conventional-centralized and distributed backup methods:
• The entire storage-network infrastructure can be off-loaded from the LAN, promoting LAN-
free backups—20% or more of LAN traffic can be due to backups
• Significant improvements in backup times, since data is moved at Fibre Channel (FC)
speeds over dedicated storage networks, rather than at Ethernet speeds over a shared
network
• Fewer network interruptions when adding incremental storage hardware
• Reduces or eliminates backup windows
• Promotes on-the-fly scaling (non-disruptive) rather than set-planned downtime windows
• Extends the life expectancy of servers
• Enables off-host backups where data transfers directly from storage disks to tape libraries,
bypassing the server, and reducing server loads
Figure 4: Storage Area Network
One of the most valuable time- and cost-saving features of SAN architecture is its ability to
offload backup operations from LANs and servers. This capability can significantly increase
the available bandwidth on a LAN to network clients and end users during backup
operations. When traditional backup servers are relieved from "handling" backup data, they
can be repurposed and made available for other tasks.
Traditional Tape Drive Backup
SAN (LAN-free) backup
SAN technology provides an alternative path for data movement between the Storage
Manager client and the server. Shared storage resources (such as disk and tape) are
accessible to both the client and the server through the SAN.
Data is off-loaded from the LAN and from the server processor, which can create greater
scalability.
LAN-free backups decrease the load on the LAN by introducing a storage agent. The storage
agent can be perceived as a small Storage Manager server (without a database or recovery
log) that is installed and run on the Storage Manager client machine. The storage agent
handles the communication with the Storage Manager server over the LAN but sends the
data directly to SAN attached tape devices, relieving the Storage Manager server from the
actual I/O transfer.
A LAN-free backup environment is shown in Figure
LAN-free backup solutions can optimize backup operations by offloading backup traffic
from a LAN to a SAN, thereby increasing the LAN bandwidth available to other traffic.
SAN (Server-less) backup
Serverless backup, on the other hand, extends these performance gains even further by
offloading more than 90 percent of the administrative burden that is usually placed upon a
dedicated backup server as backups are performed. This is typically achieved by embedding
some of the backup intelligence into the data storage devices themselves (RAID systems and
tape drives) or into SAN connectivity peripherals (switches, hubs, or bridges). This can free
up traditional backup servers significantly by releasing them from data-moving duties and
from large portions of a backup operation's administration. When implemented properly,
these SAN-based backup solutions let administrators optimize network and server
utilization, dramatically shorten backup times, and regain processor and network resources.
Server-free backups are made possible by a SAN's flexible architecture and can improve
overall performance significantly. Even storage reliability can be greatly enhanced by special
features made possible within a SAN. Options like redundant I/O paths, server clustering,
and run-time data replication (local and/or remote) can ensure data and application
availability.
Adding storage capacity and other storage resources can be accomplished easily within a
SAN, often without the need to shut down or even quiesce the server(s) or their client
networks. These, and other, features can quickly add up to big cost savings, painless
expansion, reduced network loading, and fewer network outages.
Figure. Server-Free Backup
SAN data backup and access benefits
SANs promote the following benefits:
• Improved data availability and performance speed
• Number of connections to storage subsystems can be easily scaled for both availability
and performance
• Access to data is faster, easier, and more reliable
3.B - Disaster Recovery
Planning a backup and restoration of files for disaster recovery
Planning a backup and restoration of files is the most important step to protect data from
accidental loss in the event of data deletion or a hard disk failure. The backup copy can be
used to restore lost or damaged data. For taking backups and restoring files, Microsoft has
provided a utility called Backup.
The Backup utility creates a copy of data on a hard disk of a computer and archives data on
another storage media. Any storage media such as removable disks, tapes, and logical
drives can be used as a backup storage.
While taking a backup of files, the Backup utility creates a volume shadow copy of the data
to create an accurate copy of the contents. It includes any open files or files that are being
used by the system. Users can continue to access the system while the Backup utility is
running without the risk of losing data.
Volume Shadow Copy
Backup provides a feature of taking a backup of files that are opened by a user or system.
This feature is known as volume shadow copy. Volume shadow copy makes a duplicate copy
of all files at the start of the backup process. In this way, files that have changed during the
backup process are copied correctly. Due to this feature, applications can continue writing
data to the volume during a backup operation, and backups can be scheduled at any time
without locking out users.
Types of Backups
The Windows Backup utility provides various types of backups. While planning for a backup
strategy, it is important to choose an appropriate type or combination of different types of
backups. The backup type determines which files are transferred to the destination media.
Each backup type relates to an attribute, known as the archive (A) attribute, that is
maintained for every file. The archive attribute is set when a file is created or changed.
When the archive attribute is set, it means that the file has not yet been backed up, or that
a backup of it is due.
Note: When it is said that "the file is marked as backed up", it means that the
archive attribute of the file has been cleared.
Normal Backups
When an administrator chooses to use a normal backup, all selected files and folders are
backed up, and the archive attribute of every file is cleared. A normal backup does not use
the archive attribute to determine which files to back up. A normal backup is used as the
first step of any backup plan and is combined with other backup types when planning an
organization's backup strategy. Normal backups are the most time-consuming and
resource-intensive type, but restoration from a normal backup is more efficient than from
any other type.
Incremental backups
An incremental backup backs up files that have been created or changed since the last
normal or incremental backup; that is, it backs up only those files whose archive attribute is
set, and it clears the archive attribute after the backup. An incremental backup is the
fastest backup process. Restoring data from an incremental backup requires the last
normal backup and all subsequent incremental backups, and the incremental backups must
be restored in the same order in which they were created.
Note: If any media in the incremental backup set is damaged or its data becomes
corrupt, the data backed up after that point cannot be restored.
Differential Backups
A differential backup backs up files that have been created or changed since the last normal
backup. It does not clear the archive attribute of files after the backup. Restoring files from
a differential backup is more efficient than restoring from incremental backups, because
only the last normal backup and the most recent differential backup are needed.
Copy Backups
A copy backup copies all selected files and folders. It neither uses nor clears the archive
attribute of the files. It is generally not a part of a planned scheduled backup.
Daily Backups
A daily backup backs up all selected files and folders that have changed during the day. It
backs up data by using the modified date of the files. It neither uses nor clears the archive
attribute of the files.
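The archive-attribute semantics described above can be sketched in a short simulation. This is illustrative Python only, not the Windows Backup utility's actual behavior or code; the `File` class and function names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class File:
    name: str
    archive: bool = True  # set on create/change; cleared when "marked as backed up"

def normal_backup(files):
    """Back up every selected file and clear the archive attribute of each."""
    backed_up = [f.name for f in files]
    for f in files:
        f.archive = False
    return backed_up

def incremental_backup(files):
    """Back up only files whose archive attribute is set, then clear it."""
    backed_up = [f.name for f in files if f.archive]
    for f in files:
        if f.archive:
            f.archive = False
    return backed_up

def differential_backup(files):
    """Back up files whose archive attribute is set, but do NOT clear it."""
    return [f.name for f in files if f.archive]

files = [File("a.doc"), File("b.doc")]
print(normal_backup(files))        # ['a.doc', 'b.doc'] — attributes cleared
files[0].archive = True            # a.doc changed after the normal backup
print(differential_backup(files))  # ['a.doc'] — attribute left set
print(incremental_backup(files))   # ['a.doc'] — attribute now cleared
print(incremental_backup(files))   # [] — nothing changed since last incremental
```

Running the demo shows why two consecutive differentials of unchanged data copy the same files, while the second of two consecutive incrementals copies nothing.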
Combining backup types
The easiest backup plan is to take a normal backup every night. A normal backup every
night ensures that the data is restored from a single job the next day. Although the
restoration of data from a normal backup is easy, taking a backup is time consuming. Hence,
an administrator is required to make an optimal backup plan. An administrator must
consider the following points before creating a backup plan:
The time involved in taking the backup.
The size of the backup job.
The time required to restore a system in the event of a system failure.
The most common solutions for the needs of different organizations include the combination
of normal, differential, and incremental backups.
Combination of Normal and Differential Backups
An administrator can use a combination of normal and differential backups to save time
both when taking backups and when restoring data. In this plan, a normal backup is taken
on Sunday, and differential backups are taken every night from Monday through Friday. If
data becomes corrupt at any time, only the normal backup and the last differential backup
need to be restored. Although this combination is easier and takes less time to restore, it
takes more time to back up if the data changes frequently.
Combination of Normal and Incremental Backups
A combination of normal and incremental backups can be used to save still more time when
taking backups. In this plan, a normal backup is taken on Sunday and incremental backups
are taken every night from Monday through Friday. If data becomes corrupt at any time,
the normal backup and all subsequent incremental backups must be restored.
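The two weekly plans differ only in which jobs must be restored after a failure. A minimal Python sketch (the helper and its day numbering are invented for this example) that computes the restore chain for each strategy:

```python
WEEK = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri"]

def restore_chain(strategy, failed_after):
    """Return the backup jobs needed to restore after a failure following
    day `failed_after`. Day 0 (Sun) holds the weekly normal backup; days
    1..5 hold nightly backups of the given strategy, either
    'differential' or 'incremental'."""
    chain = ["Sun (normal)"]
    if failed_after == 0:
        return chain
    if strategy == "differential":
        # only the most recent differential is needed
        chain.append(f"{WEEK[failed_after]} (differential)")
    else:
        # every incremental since the normal backup, in creation order
        chain += [f"{WEEK[d]} (incremental)" for d in range(1, failed_after + 1)]
    return chain

print(restore_chain("differential", 4))
# ['Sun (normal)', 'Thu (differential)']
print(restore_chain("incremental", 4))
# ['Sun (normal)', 'Mon (incremental)', 'Tue (incremental)',
#  'Wed (incremental)', 'Thu (incremental)']
```

The output makes the trade-off concrete: the differential plan always restores exactly two jobs, while the incremental plan's restore chain grows with each day since the last normal backup.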
Backing up System State data
System State data contains critical elements of the Windows 2000 and Windows Server
2003 operating systems. Following are the files included in the System State data:
Boot files, including the system files and all files protected by Windows File Protection
(WFP).
Active Directory (on domain controller only).
SYSVOL (on domain controller only).
Certificate Services (on certification authority only).
Cluster database (on cluster node only).
Registry.
IIS metabase.
Performance counter configuration information.
Component Services Class registration database.
For backing up the System State of a computer, the System State node is included as a part
of the backup selection in the Backup utility.
Note: On domain controllers, System State can be restored only by restarting the
domain controller in Directory Services Restore Mode. NTDSUTIL is used to
recover deleted objects in Active Directory.
System Recovery
In the event of a system failure, the recovery of the system is difficult and tedious for
administrators. Recovery involves reinstallation of the operating system, mounting and
cataloging the backup tape, and then performing the full restore. To make this process
easier, Windows provides a feature called Automated System Recovery (ASR). ASR is used to
perform a restore of the System State data and services in the event of a major system
failure. An ASR restore includes the configuration information for devices. ASR backs up the
system data and local system partition.
How to create an ASR set?
Take the following steps to create an Automated System Recovery (ASR) set by using the
Backup or Restore Wizard:
1. Run Backup from Start Menu > Programs > Accessories > System Tools > Backup.
2. In the welcome screen of the Backup or Restore Wizard, click the Advanced Mode
link.
3. On the welcome page of the Advanced Mode of the Backup utility, choose the ASR
Wizard option from the Tools menu.
4. In the welcome screen of the ASR Wizard, click the Next button.
5. On the Backup Destination page, specify the location of the backup, and click the
Next button.
6. Click the Finish button.
Note: An ASR backup does not include folders and files.
Best practices for Backup
According to Microsoft, administrators should take the following steps to ensure the recovery
in case of a system failure:
Develop backup and restore strategies and test them.
Train appropriate personnel.
In a high-security network, ensure that only administrators are able to restore files.
Back up all data on the system and boot volumes and the System State.
Back up the data on all volumes and the System State data at the same time.
Create an Automated System Recovery backup set.
Create a backup log.
Keep at least three copies of the media. Keep at least one copy off-site in a properly
controlled environment.
Perform trial restorations.
Secure devices and media.
Do not disable the default volume shadow copy backup method and revert to the
pre-Windows Server 2003 backup method.
Back up your server cluster effectively.
Back up the cluster disks from each node.
3.C - Data Replication
Data Replication provides many benefits in today's IT environments. For
example, it can allow system administrators to create and manage multiple copies of vital
information across a global enterprise. This enables disaster recovery solutions, maximizes
business continuity, and permits file server content to be distributed over the Internet.
Replication options can even improve host processing efficiency by moving data sets onto
secondary (often remote) servers for backup operations. In some cases, these data
replication capabilities are required by the "high availability" and "server clustering"
features provided by many of today's SAN architectures. Remote data replication is typically
achieved with one of two basic strategies:
Storage replication is focused on the bulk transfer of files, or block data, from one server
to one or more other servers. This type of replication generally allows applications to keep
running on a server while they, and/or their data, are being replicated to another off-site
server.
Application-level replication is specific to a particular application, such as a database or
web server, and is typically performed at the transaction level (field, row, table, etc.) by the
application itself.
Many replication products include the ability to transfer data synchronously or
asynchronously.
With synchronous transfers, each packet of transmitted data is acknowledged by the
receiving server before more data is sent to it. This can be a slower form of replication, but
is very reliable. Asynchronous data transfers allow data packets to be sent ahead of
acknowledgements from the receiving server for previously sent packets. This method is
usually faster but allows more data to be lost if links fail. Eventually, in either case, all
transmitted packets must be acknowledged by the receiving system.
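The difference between the two transfer modes is the acknowledgement discipline. A minimal sketch, assuming a hypothetical `Link` object that simply counts sends and acknowledgements (real replication products do not expose an API like this):

```python
class Link:
    """Trivial stand-in for a replication link (hypothetical)."""
    def __init__(self):
        self.sent = 0
        self.acked = 0
    def send(self, packet):
        self.sent += 1
    def wait_ack(self):
        self.acked += 1

def replicate_sync(packets, link):
    """Synchronous: each packet is acknowledged before the next is sent."""
    acked = 0
    for p in packets:
        link.send(p)
        link.wait_ack()   # blocks until the receiver confirms this packet
        acked += 1
    return acked

def replicate_async(packets, link, window=8):
    """Asynchronous: up to `window` unacknowledged packets stay in flight."""
    in_flight = 0
    acked = 0
    for p in packets:
        if in_flight == window:
            link.wait_ack()   # drain one acknowledgement before sending more
            acked += 1
            in_flight -= 1
        link.send(p)
        in_flight += 1
    while in_flight:          # eventually every packet must be acknowledged
        link.wait_ack()
        acked += 1
        in_flight -= 1
    return acked
```

The sketch mirrors the trade-off in the text: the synchronous sender waits on every packet (slow but at most one packet is ever unconfirmed), while the asynchronous sender keeps a window of unconfirmed packets in flight, so a link failure can lose up to `window` packets' worth of data.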