7/28/2019 Clustering - GSL
Microsoft Clustering GSL
Produced by: Kingsley Bell
Distributed Operations Windows
Windows Server Support Contact Information
Windows Regional Services EMEA:
Hotline *448 6868
Email Address # IT TIS RDO EMEA DO Windows
Manager: Aleet Kavia *448 7753
Back Office team lead: Edwin Broersma *443 9606
Front Office team lead: Barry Roberts *448 5483
Windows Production Services (Global):
Hotline *650 8888
Email Address # IT TIS RDO Windows Prod Svcs
Manager: Tejendra Dhiman *650 8860
Remedy GIM / RFC Queue: TIS_RDO_DO_WIN_PROD_SVCS
Remedy GIM / RFC Queues (EMEA):
EMEA Asset Management TIS_RDO_EMEA_DO_WIN_ASSET_MGT
EMEA Equities & PrimeServices TIS_RDO_EMEA_DO_WIN_EQ_PS
EMEA Fixed Income & Deriv. TIS_RDO_EMEA_DO_WIN_FID_DRV
EMEA Back Office TIS_RDO_EMEA_DO_WIN_IBO_BO
Contents
What is MSCS?
Cluster Overview
Cluster groups
Resources
Credit Suisse Naming Standards
Failover
Disaster Recovery
Load Balancing
Questions & Answers
What is MSCS?
A cluster consists of two or more computers working together to provide a higher level of availability, reliability, and
scalability than can be obtained by using a single computer. Microsoft cluster technologies guard against three specific
types of failure:
Application and service failures, which affect application software and essential services.
System and hardware failures, which affect hardware components such as CPUs, drives, memory, network
adapters, and power supplies.
Site failures in multisite organizations, which can be caused by natural disasters, power outages, or
connectivity outages.
The ability to handle failure allows server clusters to meet requirements for high availability, which is the ability to
provide users with access to a service for a high percentage of time while reducing unscheduled outages.
In a server cluster, each server owns and manages its local devices and has a copy of the operating system and the
applications or services that the cluster is managing. Devices common to the cluster, such as disks in common disk
arrays and the connection media for accessing those disks, are owned and managed by only one server at a time. For
most server clusters, the application data is stored on disks in one of the common disk arrays, and this data is
accessible only to the server that currently owns the corresponding application or service.
Server clusters are designed so that the servers in the cluster work together to protect data, keep applications and
services running after failure on one of the servers, and maintain consistency of the cluster configuration over time.
Cluster Overview
Cluster Groups
Cluster groups are used to group together all resources required to run an application or instance.
A cluster group can run on only one physical node at a time. No other node can access its resources (e.g. disks) in the meantime.
Multiple cluster groups can run simultaneously on the same node.
When a cluster group is moved to another node, all resources in that group are taken offline and brought online on the other node.
A cluster is Active/Active when two cluster groups are running on two physical machines, so both nodes carry workload at the same time.
In case of a node failure, the cluster service automatically starts the whole cluster group on another available node.
The first cluster group is used to operate the cluster; no other resources should be placed in this group.
Cluster groups have some configuration options: preferred owners, failover (threshold and period) and failback options.
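The group behaviour described above can be illustrated with a small model. This is only a sketch and not MSCS code: the class, the function and the node and group names used with it are hypothetical, and it assumes that a surviving preferred owner is tried before any other node.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClusterGroup:
    """A cluster group: resources that always run together on one node."""
    name: str
    resources: list[str]
    preferred_owners: list[str] = field(default_factory=list)
    owner: str | None = None                      # node currently hosting the group
    online: set[str] = field(default_factory=set)

    def move_to(self, node: str) -> None:
        """Take every resource offline, change owner, bring every resource online."""
        self.online.clear()                       # all resources go offline first
        self.owner = node                         # only one node owns the group
        self.online.update(self.resources)        # then come up on the new node


def fail_over(groups: list[ClusterGroup], failed_node: str, nodes: list[str]) -> None:
    """Restart every group owned by the failed node on another available node."""
    survivors = [n for n in nodes if n != failed_node]
    for group in groups:
        if group.owner != failed_node:
            continue                              # groups on healthy nodes stay put
        # Assumption: preferred owners that survived are tried first.
        candidates = [n for n in group.preferred_owners if n in survivors] or survivors
        if not candidates:
            raise RuntimeError(f"no node available to host {group.name}")
        group.move_to(candidates[0])
```

With two groups on two nodes (an Active/Active layout), calling fail_over for one node leaves both groups on the survivor, which is why spare capacity matters.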
Resources
Resources reside in cluster groups
All resources required for a specific function should be grouped together
When a resource is in a cluster, it should only be administered through the cluster
Typically each cluster group has a network name and an associated IP address through which its resources can be accessed
A large number of resource types can be created, which can be used to provide a complete clustered application ecosystem, e.g.:
IP Address
Network Name
File share
Generic Service
Physical Disk
Some resources have required dependencies, e.g. the Network Name requires an IP Address
You can create your own dependencies, for example a service cannot start until a file share is online
Each resource has a number of configuration options
Some applications create new cluster resources, e.g. MS SQL Server
If the application is not cluster aware, the Generic Service or Generic Application resource types can be used for a roll-your-own solution
Resources are required to be available on each node that may own the resource
Cluster-aware applications install the required binaries on all nodes at install time
Generic applications need the required binaries installed on each node manually
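Resource dependencies determine the order in which resources come online. The sketch below derives such an order with a topological sort from the Python standard library (Python 3.9 or later). The Network Name to IP Address and service to file share dependencies come from the slide; the file share's own dependencies are an illustrative assumption.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on.
DEPENDENCIES = {
    "IP Address": set(),
    "Physical Disk": set(),
    "Network Name": {"IP Address"},                    # required dependency (from the slide)
    "File Share": {"Physical Disk", "Network Name"},   # assumption, for illustration only
    "Generic Service": {"File Share"},                 # custom dependency (from the slide)
}


def online_order(dependencies):
    """Return an order in which every resource starts after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def offline_order(dependencies):
    """Dependents are stopped before the resources they rely on."""
    return list(reversed(online_order(dependencies)))
```

TopologicalSorter raises graphlib.CycleError if the dependencies form a loop, which is a convenient check that a custom dependency has not made the group impossible to start.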
Failover
MSCS does not provide a seamless failover solution: in case of failure, resources are shut down on one node and then brought up on another node
Careful consideration should be given when configuring resource parameters, e.g. "Affect the group"
Cluster resources should not be overcommitted, so that capacity remains for node failure. For example, if one cluster group requires 80% of a node's computing power, that amount of spare capacity should always be available in the cluster in case of node failure; if 2 cluster groups each need 55% of a node's compute power, 3 nodes should be in the cluster
Keep all nodes in a cluster at the same specification
Individual resource failures can initiate a cluster failover
Node failure will initiate a cluster failover
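The sizing rule above can be checked with a few lines of arithmetic. This is a sketch that assumes identical nodes, group loads expressed as a percentage of one node, and the loss of a single node. It places groups with a first-fit-decreasing heuristic, so for awkward load mixes it may report one node more than is strictly needed. The function names are hypothetical.

```python
def fits(loads, node_count, capacity=100):
    """First-fit decreasing: can the group loads be placed on node_count nodes?"""
    free = [capacity] * node_count
    for load in sorted(loads, reverse=True):
        for i, room in enumerate(free):
            if load <= room:
                free[i] -= load       # place the group on the first node with room
                break
        else:
            return False              # no node can take this group
    return True


def min_nodes_for_single_failure(loads, capacity=100, max_nodes=16):
    """Smallest cluster that still hosts every group with one node down."""
    for nodes in range(2, max_nodes + 1):
        if fits(loads, nodes - 1, capacity):
            return nodes
    raise ValueError("loads do not fit within max_nodes")
```

For the two examples on the slide, min_nodes_for_single_failure([80]) gives 2 and min_nodes_for_single_failure([55, 55]) gives 3.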
Disaster Recovery
DR nodes should be installed with just enough resources to run the cluster, e.g. 3 production nodes require 2 DR nodes:
2+1
3+2
4+3
DR nodes typically have the cluster service disabled or running just the default cluster group with
all other cluster groups offline
DR nodes will need to have the application installed
The DR configuration needs to be updated whenever the production configuration is changed
Credit Suisse utilises the following third-party vendor technologies to aid DR failover:
EMC SRDF (Symmetrix Remote Data Facility)
Cisco LAM (Local Area Mobility)
When using SRDF the cluster disk resources will be unable to be brought online without the disks
being in a split state
The IP address and Network Name will be unable to be brought online if they are in use in the
estate
The DR node naming standard mirrors the production node names:
XNYC19P11013A -> XNYC19B11013A
CNYC19P11013 -> CNYC19B11013
With the use of LAM however, the Virtual server names will be able to be brought online in a DR
scenario (CNYC19P11013A)
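The naming and sizing conventions on this slide can be expressed as two small helpers. This is a sketch only: the name pattern (four letters, two digits, an environment letter P or B, digits, optional node letter) is inferred from the two examples shown and is an assumption, not the full naming standard. The function names are hypothetical.

```python
import re

# Assumed layout, inferred only from the two examples on the slide.
_NAME = re.compile(r"^([A-Z]{4}\d{2})([PB])(\d+)([A-Z]?)$")


def dr_name(prod_name):
    """Map a production name to its DR counterpart by swapping P for B."""
    match = _NAME.match(prod_name)
    if match is None or match.group(2) != "P":
        raise ValueError(f"not a recognised production name: {prod_name!r}")
    prefix, _, number, suffix = match.groups()
    return f"{prefix}B{number}{suffix}"


def dr_node_count(prod_nodes):
    """N production nodes pair with N-1 DR nodes (2+1, 3+2, 4+3)."""
    if prod_nodes < 2:
        raise ValueError("a cluster needs at least two production nodes")
    return prod_nodes - 1
```

Rejecting names that do not match, rather than guessing, keeps a mistyped node name from silently producing a wrong DR target.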
MSCS Clusters DR
Collection of clustered Windows servers with shared disks, IP addresses, network names and SQL resources. 2+1 or 3+2.
[Diagram labels: Shared Storage (PROD), Shared Storage (DR), LAM, SRDF, Heartbeat VLAN (non-routed), Corp, Prod A, Prod B, DR, Slough Global Switch]
Enable LAM in DR
Stop production resources
Split storage
Import disk groups in DR
Start Network and storage cluster resources in DR
Start SQL resources in DR
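The six steps above are strictly ordered, and a later step should not run if an earlier one fails. A minimal sketch of that control flow follows. The step names are taken from the slide; the execute callback is hypothetical and stands in for whatever tooling or manual action actually performs each step.

```python
DR_STEPS = [
    "Enable LAM in DR",
    "Stop production resources",
    "Split storage",
    "Import disk groups in DR",
    "Start Network and storage cluster resources in DR",
    "Start SQL resources in DR",
]


def run_failover(execute, steps=DR_STEPS):
    """Run the steps in order and stop at the first one that fails.

    execute(step) must return True on success. Returns a tuple of
    (completed steps, failed step or None).
    """
    completed = []
    for step in steps:
        if not execute(step):
            return completed, step    # later steps depend on this one
        completed.append(step)
    return completed, None
```

Returning the completed steps together with the failing one gives the operator a clear point from which to resume or roll back.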
Load Balancing
Service is provided by the NOC
Cisco GSS (Global Site Selector)
Round robin or weighted balancing
Session aware
End point node checking (ping)
Node port end point checking, e.g. ports 80, 21, 443
Can also query website connectivity, e.g. detect 404 Page Not Found errors
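Weighted balancing combined with end point checks can be sketched in a few lines. This illustrates the general technique only and does not represent how Cisco GSS is implemented or configured; session awareness is not modelled. The node names used with it are hypothetical, and the health check is passed in as a callback so that a ping, port or HTTP check can be substituted.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """TCP end point check: True if a connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def weighted_round_robin(weights, is_healthy):
    """Yield node names in proportion to their weights, skipping unhealthy nodes.

    weights maps node name to a positive integer; equal weights give
    plain round robin. The generator ends if a full pass finds no healthy node.
    """
    schedule = [node for node, weight in weights.items() for _ in range(weight)]
    while True:
        served = False
        for node in schedule:
            if is_healthy(node):
                served = True
                yield node
        if not served:
            return
```

A port-based check could be supplied as is_healthy=lambda node: port_is_open(node, 443).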
Questions & Answers
Clustering solutions would be expected to support:
Automatic failover of application processes/services in the event of node failure
Automatic restart of application processes/services in the event of process failure
Automatic load balancing of peer processes/services in a cluster
Automatic reallocation of processes/services to ensure best utilisation of the cluster
Management and software deployment and provisioning at the cluster level rather than individual node level
Given these reasonable expectations of a clustering solution a number of facets of the GSL / SlatePlus clustering
arrangements don't seem to quite match these expectations.
Why is it necessary to install services explicitly on every machine, rather than installing a service to a cluster and letting the clustering solution manage the deployment of services to the cluster nodes?
What role does GSL play in providing the clustering solution, rather than relying upon the MS product? For instance, is GSL Gateway necessary, or could it be replaced in whole or in part by MS components?
What 'templates' exist for stateless services, where we can run multiple instances of services concurrently in the cluster?
What 'templates' exist for stateful services, where we would want only one instance of a particular service to run at one time, but where we do want the benefits of the cluster, i.e. automatic failover to another node and automatic restart?