SOFE 2013 - FNAO for your infrastructure - session presentation
SWIFT Operations Forum - EMEA
FNAO for your Infrastructure
Johan Limborgh / Dries Watteyne
27-29 November 2013
Amsterdam
1. What is the top reason for extended downtime of a SWIFT
infrastructure?
2. In case of blocking issues, what is the best way to contact SWIFT
support?
3. What are the 3 main issues SWIFT support sees with DR sites?
4. Can you contact SWIFT Support 24x7, 24x5, or only during business
hours?
5. Where does SWIFT spend most time when faced with a failure?
6. When IT people talk to other IT people, they often forget to talk about
something critical! What is it all about?
7. Who remembers some of these questions?
FNAO for your SWIFT infrastructure – 27-29 November 2013 2
Quiz
1. What is the top reason for extended downtime of a SWIFT
infrastructure? Recovery procedures/systems not documented or
tested
2. In case of blocking issues, what is the best way to contact SWIFT
support? Call
3. What are the 3 main issues SWIFT support sees with DR sites?
certificates, not tested, outdated contact list
4. Can you contact SWIFT Support 24x7, 24x5, or only during business
hours? 24x7
5. Where does SWIFT spend most time when faced with a failure?
Preventive measures to avoid re-occurrence
6. When IT people talk to other IT people, they often forget to talk about
something critical! What is it all about? Business impact
Quiz
Did you listen in the past sessions / quiz?
Blocking cases?
• How should blocking cases be reported to SWIFT?
• Which piece of the pie represents the blocking cases raised by
phone (other types are e-mail, web ticket or manually by SWIFT
staff)?
Agenda
Introduction
Resiliency
Resilient Lite2 scenario
Resilient SWIFT interface scenario
Generic preventive actions
Services - Best practices
What’s next ?
Increased Resilience, Reduced Operational Impact
Introduction
• Failure is Not An Option - advice based on:
– SWIFT Customer service (case) interactions
– Health check findings
– Practical experiences in the field
• Failure does happen
– How to reduce operational impact?
– How to increase availability, resiliency?
– How can we learn from it?
Advice from SWIFT Customer services
Statistics:
• 91,000 cases per year
• 500,000 customer interactions per year
Support centres:
• 3 regional support centres: EMEA, Americas & Asia Pacific
• 6 locations: US, NL, BE, UK, HK & RU
Awards:
• Industry awards for mission-critical support from TSIA, ASP
Customer Satisfaction evolution (1999-2012)
Case analysis – products/severity
Top 5 products with blocking issues
• 29% Alliance Access & Entry
• 22% Alliance Lite and Lite2
• 11% Alliance Gateway
• 8% SWIFTNet Link
• 4% Alliance Connect (network)
• Special attention required for these products to:
– Prevent known issues and advise on best practices
– Improve resiliency to reduce downtime in case of failure
Health check
SWIFT Health Checks
identify security risks
identify configuration risks
identify performance risks
alignment to SWIFT recommendations
Top 10 health-check findings
1. Schedule regular testing of the environment
2. Schedule a regular reboot of your equipment
3. Know your environment: know roles and responsibilities
4. Re-check that your systems match the system requirements
5. Do you use true Network Partner diversification?
6. Is spare equipment easily available?
7. Security: password management
8. Monitoring log files & connections
9. Backup: frequency, location
10. Spare accounts
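The backup finding (frequency, location) lends itself to an automated check. The following is a minimal sketch, not a SWIFT tool: the system names, timestamps and the 24-hour threshold are hypothetical values chosen for illustration, and how you obtain the time of the last successful backup depends on your own environment.

```python
from datetime import datetime, timedelta


def stale_backups(last_backup_times, now, max_age=timedelta(hours=24)):
    """Return the names of systems whose latest backup is older than max_age."""
    return sorted(
        name
        for name, taken_at in last_backup_times.items()
        if now - taken_at > max_age
    )


# Hypothetical inventory: system name -> time of last successful backup.
inventory = {
    "alliance-access": datetime(2013, 11, 26, 23, 0),
    "alliance-gateway": datetime(2013, 11, 20, 23, 0),
    "swiftnet-link": datetime(2013, 11, 27, 1, 30),
}
now = datetime(2013, 11, 27, 9, 0)

print(stale_backups(inventory, now))  # ['alliance-gateway']
```

Running such a check on a schedule turns "backup frequency" from a health-check finding into something that is monitored continuously.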
Learn: Troubleshooting course
www.swift.com > Training > training topics > Connectivity
Troubleshoot the most common problems
Recommendations
Provide you with diagnostic info
Guidelines
Resiliency
• “To minimise the impact of a technical failure on your business,
you should be prepared for any disaster that might arise.”
• Failover procedures
• HSM Resilience
• PKI resilience
– Certificate renewal
– SO access to secondary SNL
• Failover routes
• Application data to restore
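Certificate renewal is easy to overlook on a secondary or DR system that is rarely started. The sketch below flags certificates that have expired or will expire soon. It is an illustration only: the certificate names, expiry dates and the 30-day warning window are assumptions, and reading the real expiry dates is left to whatever tooling you already use.

```python
from datetime import date


def expiring_soon(certificates, today, warn_days=30):
    """Return (name, days_left) for certificates expiring within warn_days.

    Already expired certificates are included with a negative days_left.
    The result is sorted with the most urgent certificate first.
    """
    flagged = []
    for name, expires_on in certificates.items():
        days_left = (expires_on - today).days
        if days_left <= warn_days:
            flagged.append((name, days_left))
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical inventory: certificate name -> expiry date.
certificates = {
    "primary-snl": date(2014, 6, 1),
    "dr-snl": date(2013, 12, 10),
    "backup-token": date(2013, 11, 20),
}

print(expiring_soon(certificates, today=date(2013, 11, 27)))
# [('backup-token', -7), ('dr-snl', 13)]
```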
[Diagram labels: Prepare, Test, Learn; VPN, Messaging application, Communication interface, Hardware Security Module (HSM), SWIFTNet connectivity]
Lite2 scenario
[Diagram: Production site and DR site, each showing User, AutoClient, LSO and RSO, with ISP 1 and ISP 2 links to the Lite2 Server on SWIFTNet]
https://www2.swift.com/support/knowledgebase/tip/5017169
Lite2 AutoClient Resiliency Multiple instances - locally or over distant sites
• Two or more separate Lite2 AutoClient instances, with unique instance names and using separate Lite2 tokens. Only 1 AutoClient instance is active.
• Both tokens must contain equivalent DNs.
• Once started, the Disaster AutoClient instance will retrieve by default all files from the last 30 days.
Support for Active/Cold Standby configurations
[Diagram: Production AutoClient and Disaster AutoClient, each connected over the Internet to the Lite2 Server on SWIFTNet]
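Because a newly started Disaster AutoClient retrieves by default all files from the last 30 days, the back office may be handed files that Production already processed. One possible way to handle this is to filter on file name, as sketched below. This is an assumption about your back-office integration, not Lite2 behaviour: it presumes file names are unique and that you keep a record of the names already processed. The file names are hypothetical.

```python
def files_to_process(downloaded, already_processed):
    """Keep the download order, dropping files the back office already handled."""
    seen = set(already_processed)
    return [name for name in downloaded if name not in seen]


# Hypothetical example: the Disaster instance re-downloads recent files.
downloaded = ["pay_001.xml", "pay_002.xml", "stmt_003.xml"]
already_processed = {"pay_001.xml"}

print(files_to_process(downloaded, already_processed))
# ['pay_002.xml', 'stmt_003.xml']
```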
Lite2 AutoClient Resiliency Multiple instances - locally or over distant sites
• Two or more separate Lite2 AutoClient instances, with unique instance names, can connect to the Lite2 server using separate Lite2 tokens.
• Both tokens must contain equivalent DNs.
• Both AutoClients (Production & Disaster) are active at the same time; the Production instance is connected to the back-office application.
• The same files are downloaded on all AutoClients. Routing to a specific AutoClient is not possible.
Support for Active/Hot Standby configurations
[Diagram: Production AutoClient and Disaster AutoClient, each connected over the Internet to the Lite2 Server on SWIFTNet]
Single host infrastructure
Prime/Backup site
PRIMARY host
• Alliance Access 7.0.71
• Alliance Gateway 7.0.21
VPN: Bronze or Silver
Contingency site
DR host
• Alliance Access 7.0.50
• Alliance Gateway 7.0.20
VPN: Bronze or Silver
Copy backups
• Keep your Backup or Contingency systems on the same patch level and license (here: Alliance Access 7.0.71, Alliance Gateway 7.0.21)
• Implement backups on Alliance Gateway and SWIFTNet Link
• Multiple ISPs
• Backup tokens or HSM cluster
• Is there a procedure to recover lost messages on DR?
• Is the DR site working? Connectivity? Certificates? Applications?
In case of system failure, the back office and users need to switch to the
contingency SAA
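The slide shows the PRIMARY host on Alliance Access 7.0.71 and Alliance Gateway 7.0.21 while the DR host is still on 7.0.50 and 7.0.20. A drift like this can be detected with a simple comparison, sketched below. How the installed versions are collected is not covered here; the dictionaries are filled in by hand for illustration.

```python
def patch_level_drift(primary, standby):
    """Return {product: (primary_version, standby_version)} for every mismatch.

    A product missing on one side is reported with None for that side.
    """
    drift = {}
    for product in sorted(set(primary) | set(standby)):
        primary_version = primary.get(product)
        standby_version = standby.get(product)
        if primary_version != standby_version:
            drift[product] = (primary_version, standby_version)
    return drift


# Versions as shown on the slide.
primary = {"Alliance Access": "7.0.71", "Alliance Gateway": "7.0.21"}
dr = {"Alliance Access": "7.0.50", "Alliance Gateway": "7.0.20"}

for product, (on_primary, on_dr) in patch_level_drift(primary, dr).items():
    print(f"{product}: primary {on_primary}, DR {on_dr}")
```

An empty result means both hosts are on the same patch level for every product listed.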
Extra resiliency locally
Prime/Backup site
PRIMARY host
• Alliance Entry/Access
• Alliance Gateway
BACKUP host
• Alliance Entry/Access
• Alliance Gateway
VPN: Bronze (ISP, ISP) or Silver (LL, ISP)
Contingency site
DR host
• Alliance Entry/Access
• Alliance Gateway
VPN: Bronze or Silver
Extra resilience for emergencies
Prime/Backup site
PRIMARY host
• Alliance Entry/Access
• Alliance Gateway
VPN: Bronze (ISP, ISP) or Silver (LL, ISP)
Contingency site
DR host
• Alliance Entry/Access
• Alliance Gateway
VPN: Bronze or Silver
Alliance Lifeline
Worst case
Prime/Backup site
PRIMARY host
• Alliance Entry/Access
• Alliance Gateway
BACKUP host
• Alliance Entry/Access
• Alliance Gateway
VPN: Bronze (ISP, ISP) or Silver (LL, ISP)
Contingency site
DR host
• Alliance Entry/Access
• Alliance Gateway
VPN: Bronze or Silver
Secure Channel, offline request:
• Who can do it?
• Do they have an activated Secure Code Card?
Multi host with local resiliency
RAHA license
Prime/Backup site
PRIMARY host
• Alliance Entry/Access
• Alliance Gateway
BACKUP host
• Alliance Entry/Access
• Alliance Gateway
VPN: Bronze (ISP, ISP) or Silver (LL, ISP)
Contingency site
DR host
• Alliance Entry/Access
• Alliance Gateway
VPN: Bronze or Silver
[Diagram labels: DMZ (x2), RAHA (x3), FIN, RMA/FileAct]
Split traffic over two sites:
• Advantage: all sites are used constantly
• Disadvantage: extra SNL license
Real-time FileAct or InterAct services need a reroute to receive on a different SAG.
FIN and SnF: automatic failover to the backup Gateway.
The back office only needs to switch in case of an SAA failure.
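The failover behaviour described above can be written down as a small lookup table for a runbook, so that operators see at a glance which services need manual action when a Gateway fails. This is only a sketch of the rules stated on the slide; the service labels used as keys are informal names chosen for this example.

```python
from enum import Enum


class Action(Enum):
    AUTOMATIC = "automatic failover to the backup Gateway"
    REROUTE = "reroute needed to receive on a different SAG"


# Rules as stated on the slide; the key names are informal labels.
FAILOVER = {
    "FIN": Action.AUTOMATIC,
    "SnF": Action.AUTOMATIC,
    "real-time FileAct": Action.REROUTE,
    "real-time InterAct": Action.REROUTE,
}


def manual_steps(services):
    """Return the services that need a manual reroute after a Gateway failure."""
    return [name for name in services if FAILOVER[name] is Action.REROUTE]


in_use = ["FIN", "real-time FileAct", "SnF", "real-time InterAct"]
for name in manual_steps(in_use):
    print(f"{name}: {FAILOVER[name].value}")
```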
What about the messaging?
[Diagram: Primary, Backup and Disaster hosts, each with Software + DB + REDO log. Labels: Synch; DB Backup (on-line); Daily Copy; Restore; Cold Backup (no messages & events)]
• Message loss
• ISN/OSN gap check
• Message Retrieval & Resend with PDE
• Archive restored

• No message loss
• Automatic redo log replay at start-up
• Operations transparently resumed
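The ISN/OSN gap check mentioned above comes down to finding the sequence numbers that are missing from the restored database. A minimal sketch of that idea follows. It is not the Alliance implementation: the sequence numbers are made up, and where the expected first and last numbers come from is left open.

```python
def sequence_gaps(received, first, last):
    """Return inclusive (start, end) ranges of numbers missing in first..last."""
    present = set(received)
    gaps = []
    gap_start = None
    for number in range(first, last + 1):
        if number not in present:
            if gap_start is None:
                gap_start = number  # a new gap opens here
        elif gap_start is not None:
            gaps.append((gap_start, number - 1))  # the gap closed just before
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, last))  # gap runs to the end of the range
    return gaps


# Hypothetical sequence numbers found after restoring a cold backup.
restored = [101, 102, 105, 106, 109]

print(sequence_gaps(restored, first=101, last=110))
# [(103, 104), (107, 108), (110, 110)]
```

Each reported range is a candidate for message retrieval and resend.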
[Diagram: Primary, Backup and Disaster hosts, each with Software + DB + REDO log, mirror disk, backup disk and the Database Recovery option. Labels: Recovery; Synch; Asynch (optional)]
Resiliency with DB Recovery
Site Recovery – Partial DB Recovery
[Diagram: Primary, Backup and Disaster hosts, each with Software + DB + REDO log, mirror disk, backup disk and the Database Recovery option. Labels: Synch; Synch (optional); Asynch disk replication (SAN); Database Recovery in partial mode, 'up to last valid data']
Preventive actions
• Disaster recovery plans
– Have an external company run a system health check
– It is good to create a disaster plan … it is better to test it
– One test is not enough … build a book of scenarios
– Review your plan on a regular basis and adapt it
– Think the unthinkable
• Roles & Responsibilities
– One person should be responsible for maintaining the roles and responsibilities across the organization; a backup should be assigned for each role
– Ensure everybody involved is fully up to date on his/her role in a potential disaster scenario
– Regularly verify each role and update it over time
Prepare for the worst:
SWIFT incident management
Awareness at all levels of the company (including certification)
Call out key customer contacts to verify role and awareness
One daily checkpoint (14:15) / one daily report to senior management (even if NTR)
Rehearse - one drop of rain every day is better than one shower every year
No dedicated crisis management roles; they are engrained within the company
All roles are round robin (per week) by line & senior (operational) management
Clear roles & responsibilities - no time to argue during a crisis/incident
Recovery of services comes first – investigation & correction second!
Don’t be complacent – DO IT (DR - hot stand-by systems)!
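A weekly round robin is easy to compute, so nobody has to argue during an incident about whose turn it is. The sketch below picks the duty holder from a roster based on the calendar week. The names are hypothetical and the rotation rule (one step per Monday-to-Sunday week) is an assumption about how such a rota could be run, not a description of SWIFT's internal scheme.

```python
from datetime import date


def on_duty(roster, day):
    """Return the roster entry on duty in the Monday-to-Sunday week of `day`.

    date.toordinal() counts days from 0001-01-01, which is a Monday in
    Python's calendar, so (ordinal - 1) // 7 is a week index that advances
    every Monday and does not jump at year boundaries.
    """
    week_index = (day.toordinal() - 1) // 7
    return roster[week_index % len(roster)]


# Hypothetical roster of managers taking the incident manager role.
roster = ["Alice", "Bob", "Carol"]

for day in (date(2013, 11, 25), date(2013, 12, 2), date(2013, 12, 9)):
    print(day.isoformat(), on_duty(roster, day))
```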
SWIFT Support
We can offer you
• Standard Support 24x7
– Highly qualified staff
– Excellent reachability
• Specific Support
– Premium, Premium Plus
• Remote Support
• Consulting Services
– Installations
– In depth analysis
– Health-checks
• …
We can’t offer you
• Immediate on site visit
• Conclusions without evidence
• Handholding for “easy” steps
• Managing your third parties
• Support to your back-office application
• Out of band config…
SWIFT tools and documentation
• Check www.swift.com for:
– Self-help guide (swift.com > Support > Self-help guide)
– Operational status (swift.com > Support > Operational Status)
– Guidelines to protect and to maintain HSM boxes
• Tip 2097727 – HSM Box Best Practice Guidelines
– Schedule for Troubleshooting training
Support sources
• Subscribe to:
– Operational status notifications
– Connectivity Email Contact role(s)
– Support newsletter
• Consult our Knowledge base
• Consult Self-help guide on www.swift.com
• Contact SWIFT Customer Support
– Use Case Manager
– Use Phone for blocking situations
• Regularly check www.swift.com
– “Ordering” and “Support” pages for latest news
Troubleshooting Tips & Tricks
What to do and how to prepare?
• State the impact to your business - do you have to meet a cut-off?
• Clarify which system is impacted (production, test, DR, …)
• Provide an accurate description of the problem
• List the actions that you have tried to troubleshoot the problem
• Did you experience similar problems in the past?
• Did you make recent network or system changes?
• Prepare to supply relevant evidence to allow a faster investigation and a faster resolution of the problem
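The preparation points above can be captured as a simple checklist that is filled in before calling support. The sketch below is one way to do that. The class and field names are invented for this example and do not correspond to any SWIFT case form.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CaseReport:
    """Information to have at hand before contacting support."""

    business_impact: str = ""        # e.g. a cut-off time that must be met
    impacted_system: str = ""        # production, test, DR, ...
    description: str = ""            # accurate description of the problem
    actions_tried: str = ""          # troubleshooting already attempted
    similar_past_problems: str = ""  # earlier occurrences, if any
    recent_changes: str = ""         # recent network or system changes
    evidence: list = field(default_factory=list)  # logs, screenshots, traces

    def missing(self):
        """Return the names of the fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical, partly completed report.
report = CaseReport(
    business_impact="Payment cut-off at 16:00 CET",
    impacted_system="production",
    description="Cannot log in to the messaging interface since 09:30",
)

print(report.missing())
# ['actions_tried', 'similar_past_problems', 'recent_changes', 'evidence']
```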
Network issues: specific items to remember
• Arranging an onsite visit of a network partner
– Needs alignment and clear agreements
– SWIFT is responsible for arranging the onsite visit
– Customer is responsible for providing access to buildings and equipment – keep it simple!
Agenda
Introduction
Roles and responsibilities
Technical environment
Documented processes
Generic preventive actions
Services - Best practices
What’s next ?
What’s next ?
SWIFT:
– Continuous improvement to keep up with best practices and new technology
– Business impact assessment
– Matching KPIs: it is not about closing cases fast!
– Understand our customers! Managing a relationship is key!
New proposed service features
Pre-empts possible system outages by finding potential problems
Operational check-up
Dedicated personal sessions around specific operational topics
What’s next ?
You:
– Continuous improvement to keep up with best practices and new technology
– Business impact assessment
– Managing a relationship is key! Updated records, procedures and testing will help you recover quickly from a problem.
Q&A