SWIFT Operations Forum - EMEA FNAO for your Infrastructure Johan Limborgh / Dries Watteyne 27-29 November 2013 Amsterdam

Sofe fnao 2013_v1


SOFE 2013 - FNAO for your infrastructure - session presentation




Quiz

1. What is the top reason for extended downtime of a SWIFT infrastructure?
2. In case of blocking issues, what is the best way to contact SWIFT Support?
3. What are the three main issues SWIFT Support sees with DR sites?
4. Can you contact SWIFT Support 24x7, 24x5, or only during business hours?
5. Where does SWIFT spend most time when faced with a failure?
6. When IT people talk to other IT people, they often forget to talk about something critical. What is it?
7. Who remembers some of these questions?

FNAO for your SWIFT infrastructure – 27-29 November 2013 2

Quiz: answers

1. What is the top reason for extended downtime of a SWIFT infrastructure? Recovery procedures and systems that are not documented or not tested.
2. In case of blocking issues, what is the best way to contact SWIFT Support? Call.
3. What are the three main issues SWIFT Support sees with DR sites? Certificates, DR site not tested, outdated contact list.
4. Can you contact SWIFT Support 24x7, 24x5, or only during business hours? 24x7.
5. Where does SWIFT spend most time when faced with a failure? Preventive measures to avoid recurrence.
6. When IT people talk to other IT people, they often forget to talk about something critical. What is it? Business impact.

Did you listen in the past sessions / quiz?

Blocking cases

• How should blocking cases be reported to SWIFT?

• Which piece of the pie represents the blocking cases raised by phone (other types are e-mail, web ticket or manually by SWIFT staff)?



Agenda

Introduction

Resiliency

Resilient Lite2 scenario

Resilient SWIFT interface scenario

Generic preventive actions

Services - Best practices

What’s next?

Increased Resilience, Reduced Operational Impact


Introduction

• Failure is Not An Option - advice based on:

– SWIFT Customer service (case) interactions

– Health check findings

– Practical experiences in the field

• Failure does happen

– How to reduce operational impact?

– How to increase availability and resiliency?

– How can we learn from it?

Advice from SWIFT Customer services

• Statistics: 91,000 cases per year; 500,000 customer interactions per year
• Support centres: 3 regional support centres (EMEA, Americas & Asia Pacific) in 6 locations (US, NL, BE, UK, HK & RU)
• Awards: industry awards for mission-critical support from TSIA and ASP


Customer Satisfaction evolution (1999-2012)


Case analysis – products/severity


Top 5 products with blocking issues

• 29% Alliance Access & Entry

• 22% Alliance Lite and Lite2

• 11% Alliance Gateway

• 8% SWIFTNet Link

• 4% Alliance Connect (network)

• Special attention required for these products to:

– Prevent known issues and advise on best practices

– Improve resiliency to reduce downtime in case of failure


Health check

SWIFT Health Checks

• Identify security risks
• Identify configuration risks
• Identify performance risks
• Check alignment with SWIFT recommendations

Top 10 health-check findings

1. Schedule regular testing of the environment
2. Schedule a regular reboot of your equipment
3. Know your environment: know roles and responsibilities
4. Make sure your systems still meet the system requirements
5. Do you use true Network Partner diversification?
6. Is spare equipment easily available?
7. Security: password management
8. Monitoring of log files and connections
9. Backup: frequency, location
10. Spare accounts

Learn: Troubleshooting course (www.swift.com > Training > training topics > Connectivity)

• Troubleshooting the most common problems
• Recommendations
• Providing you with diagnostic information
• Guidelines


Resiliency

• “To minimise the impact of a technical failure on your business, you should be prepared for any disaster that might arise.”
• Failover procedures
• HSM resilience
• PKI resilience
– Certificate renewal
– SO access to secondary SNL
• Failover routes
• Application data to restore

[Diagram: Prepare, Test, Learn cycle across the VPN, messaging application, communication interface, Hardware Security Module (HSM) and SWIFTNet connectivity]
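Certificate renewal is one of the PKI items above, and certificates are also one of the main DR-site issues named in the quiz. As a minimal sketch, assuming you keep your own inventory of certificate expiry dates per site (the `Certificate` record, the example DNs and dates, and the 30-day warning threshold are hypothetical, not a SWIFT tool or file format), a periodic check could look like this:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Certificate:
    dn: str        # distinguished name (hypothetical example values)
    site: str      # "primary" or "DR" in your own inventory
    expires: date  # expiry date as recorded by your own tooling


def expiring_soon(certs, today, warn_days=30):
    """Return certificates that expire within warn_days of today, or have already expired."""
    limit = today + timedelta(days=warn_days)
    return [c for c in certs if c.expires <= limit]


if __name__ == "__main__":
    # Hypothetical inventory in which the DR certificate expires first.
    inventory = [
        Certificate("cn=app1,o=bankxx,o=swift", "primary", date(2014, 6, 1)),
        Certificate("cn=app1,o=bankxx,o=swift", "DR", date(2013, 12, 10)),
    ]
    for cert in expiring_soon(inventory, today=date(2013, 11, 27)):
        print(f"Renew {cert.dn} on the {cert.site} site (expires {cert.expires})")
```

With these example dates only the DR certificate is reported, because it is the only one expiring within 30 days of the chosen day.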


Lite2 scenario

[Diagram: Production and DR sites, each with a User, a Lite2 AutoClient, LSO and RSO, connecting to the SWIFTNet Lite2 Server through two ISPs (ISP 1, ISP 2)]

https://www2.swift.com/support/knowledgebase/tip/5017169

Lite2 AutoClient resiliency: multiple instances, locally or over distant sites

Support for Active/Cold Standby configurations

• Two or more separate Lite2 AutoClient instances, each with a unique instance name and its own Lite2 token. Only one AutoClient instance is active.
• Both tokens must contain equivalent DNs.
• Once started, the Disaster AutoClient instance retrieves, by default, all files from the last 30 days.

[Diagram: Production and Disaster AutoClients, each connecting over the Internet to the SWIFTNet Lite2 Server]
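Because a freshly started Disaster AutoClient retrieves, by default, all files from the last 30 days, the back office may receive files it already processed through the Production instance. One way to handle this, sketched under the assumption that the back office keeps its own log of processed file names with timestamps (the log format and both function names are hypothetical, not part of AutoClient), is to keep that log for at least the retrieval window and skip anything already in it:

```python
from datetime import datetime, timedelta

# Default retrieval window of a freshly started Disaster AutoClient (from the slide).
RETRIEVAL_WINDOW = timedelta(days=30)


def prune_processed_log(log, now):
    """Drop log entries older than the retrieval window.

    Entries inside the window must be kept: the Disaster AutoClient may deliver
    those files again. log maps file name -> time it was processed.
    """
    cutoff = now - RETRIEVAL_WINDOW
    return {name: ts for name, ts in log.items() if ts >= cutoff}


def new_files(retrieved_names, log):
    """Return the retrieved file names that the back office has not processed yet."""
    return [name for name in retrieved_names if name not in log]


if __name__ == "__main__":
    now = datetime(2013, 11, 27)
    log = {"a.xml": datetime(2013, 11, 20), "b.xml": datetime(2013, 9, 1)}
    log = prune_processed_log(log, now)        # b.xml falls outside the window
    print(new_files(["a.xml", "c.xml"], log))  # prints ['c.xml']
```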

Lite2 AutoClient resiliency: multiple instances, locally or over distant sites

Support for Active/Hot Standby configurations

• Two or more separate Lite2 AutoClient instances, each with a unique instance name, can connect to the Lite2 server using separate Lite2 tokens.
• Both tokens must contain equivalent DNs.
• Both AutoClients (Production and Disaster) are active at the same time; the Production instance is connected to the back-office application.
• The same files are downloaded on all AutoClients. Routing to a specific AutoClient is not possible.

[Diagram: Production 1 and Disaster AutoClients, both connecting over the Internet to the SWIFTNet Lite2 Server]


Single host infrastructure

[Diagram: Prime/Backup site with a PRIMARY host (Alliance Access 7.0.71, Alliance Gateway 7.0.21) and Contingency site with a DR host (Alliance Access 7.0.50, Alliance Gateway 7.0.20), each connected through two VPN boxes with Bronze or Silver connectivity; backups (Alliance Access 7.0.71, Alliance Gateway 7.0.21) are copied to the contingency site]

• Keep your backup or contingency systems on the same patch level and license (in this example, the DR host runs older versions than the PRIMARY host).
• Implement backups on Alliance Gateway and SWIFTNet Link.
• Use multiple ISPs.
• Use backup tokens or an HSM cluster.
• How do you recover lost messages with the DR procedure?
• Is the DR site working? Connectivity? Certificates? Applications?

In case of system failure, the back office and users need to switch to the contingency SAA.
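The example above shows the DR host running older Alliance Access and Alliance Gateway versions than the primary host. A small comparison like the following can make such drift visible before a failover. It is a sketch that assumes you maintain a per-host inventory of product versions yourself (the dictionaries and function names are hypothetical, and licenses are not covered):

```python
def parse_version(version):
    """Turn a dotted version such as '7.0.71' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def patch_level_mismatches(primary, contingency):
    """List products whose version on the contingency host differs from the primary host."""
    issues = []
    for product, prim in primary.items():
        cont = contingency.get(product)
        if cont is None:
            issues.append(f"{product}: missing on contingency host")
        elif parse_version(cont) < parse_version(prim):
            issues.append(f"{product}: contingency {cont} is behind primary {prim}")
        elif parse_version(cont) > parse_version(prim):
            issues.append(f"{product}: contingency {cont} is ahead of primary {prim}")
    return issues


if __name__ == "__main__":
    # Versions taken from the slide's example.
    primary = {"Alliance Access": "7.0.71", "Alliance Gateway": "7.0.21"}
    dr_host = {"Alliance Access": "7.0.50", "Alliance Gateway": "7.0.20"}
    for issue in patch_level_mismatches(primary, dr_host):
        print(issue)
```

With the slide's versions, both products are reported as behind on the contingency host.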

Extra resiliency locally

[Diagram: Prime/Backup site with a PRIMARY host and a BACKUP host (each running Alliance Entry/Access and Alliance Gateway), plus a Contingency site with a DR host (Alliance Entry/Access and Alliance Gateway); each site connects through VPN boxes using Bronze (ISP, ISP) or Silver (LL, ISP) connectivity]

Extra resilience for emergencies

[Diagram: Prime/Backup site with a PRIMARY host and Contingency site with a DR host (each running Alliance Entry/Access and Alliance Gateway, connected through VPN boxes using Bronze (ISP, ISP) or Silver (LL, ISP) connectivity), complemented by Alliance Lifeline]

Worst case

[Diagram: Prime/Backup site with PRIMARY and BACKUP hosts and Contingency site with a DR host, each running Alliance Entry/Access and Alliance Gateway, connected through VPN boxes using Bronze (ISP, ISP) or Silver (LL, ISP) connectivity]

Secure Channel, offline request:
• Who can do it?
• Do they have an activated Secure Code Card?

Multi host with local resiliency (RAHA license)

[Diagram: Prime/Backup site and Contingency site, each with Alliance Entry/Access and Alliance Gateway hosts, DMZs and RAHA connections, over Bronze (ISP, ISP) or Silver (LL, ISP) connectivity; FIN and RMA/FileAct traffic shown]

Split traffic over two sites:
• Advantage: all sites are used constantly.
• Disadvantage: an extra SNL license is required.

• Real-time FileAct or InterAct services need a reroute to receive on a different SAG.
• FIN and SnF: automatic failover to the backup Gateway.
• The back office only needs to switch in case of SAA failure.
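The slide states that, with RAHA, FIN and SnF traffic fails over automatically to the backup Gateway; that logic lives in the SWIFT products themselves. Purely as a mental model of ordered failover (the gateway names, the `send` callable and the exception class are hypothetical and do not represent the RAHA interface), it amounts to trying each Gateway in a fixed order:

```python
class GatewayUnavailable(Exception):
    """Raised by the (hypothetical) send callable when a gateway cannot be reached."""


def send_with_failover(message, gateways, send):
    """Try each gateway in order; return the name of the first one that accepts the message.

    gateways: ordered list of gateway names, primary first
    send: callable(gateway, message) that raises GatewayUnavailable on failure
    """
    errors = []
    for gateway in gateways:
        try:
            send(gateway, message)
            return gateway
        except GatewayUnavailable as exc:
            errors.append(f"{gateway}: {exc}")
    raise GatewayUnavailable("all gateways failed: " + "; ".join(errors))


if __name__ == "__main__":
    def fake_send(gateway, message):
        # Simulate an outage of the primary Gateway.
        if gateway == "SAG-PRIMARY":
            raise GatewayUnavailable("connection refused")

    print(send_with_failover("test message", ["SAG-PRIMARY", "SAG-BACKUP"], fake_send))
```

Real-time FileAct and InterAct behave differently: as the slide notes, receiving on another SAG requires a reroute, which a sender-side loop like this does not address.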

What about the messaging?

[Diagram: Primary, Backup and Disaster hosts, each with software, database and DB REDO log; labels: DB Backup (on-line), Daily Copy, Restore, Cold Backup (no messages & events), Synch]

Restore from a database backup:
• Message loss
• ISN/OSN gap check
• Message retrieval and resend with PDE
• Archive restored

Synchronised database with REDO log:
• No message loss
• Automatic redo log replay at start-up
• Operations transparently resumed
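When the database is restored from a backup, messages exchanged after that backup are missing locally, which is why the ISN/OSN gap check comes first. The sketch below shows only the check itself. The input list and `expected_next` value are hypothetical; obtaining them, retrieving the missing messages and resending with PDE are done with the SWIFT tools, and sequence-number wrap-around is not handled.

```python
def find_gaps(sequence_numbers, expected_next=None):
    """Return missing sequence numbers as inclusive (first, last) ranges.

    sequence_numbers: ISNs or OSNs found in the restored database
    expected_next: next number expected by the other side, if known; the range
                   between the highest restored number and it is reported too
    """
    seen = sorted(set(sequence_numbers))
    gaps = []
    for prev, cur in zip(seen, seen[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    if seen and expected_next is not None and expected_next - seen[-1] > 1:
        gaps.append((seen[-1] + 1, expected_next - 1))
    return gaps


if __name__ == "__main__":
    restored_osns = [101, 102, 105, 106]
    print(find_gaps(restored_osns, expected_next=110))  # [(103, 104), (107, 109)]
```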

Resiliency with DB Recovery

[Diagram: Primary, Backup and Disaster hosts, each with software, database, DB REDO log, mirror disk, backup disk and the Database Recovery option; Primary and Backup are synchronised (Synch), replication to the Disaster host is asynchronous (Asynch) and optional]

Site Recovery – Partial DB Recovery

[Diagram: Primary and Backup hosts synchronised (Synch); the Disaster host receives asynchronous disk replication (SAN). Each host runs software, database, DB REDO log, mirror disk, backup disk and the Database Recovery option]

Attention: with asynchronous disk replication, Database Recovery runs in partial mode and recovers ‘up to last valid data’.
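One way to read ‘up to last valid data’ is as a common recovery pattern: replay logged changes in order and stop at the first record that cannot be trusted, for example one that was only partly replicated. The sketch below illustrates that generic idea with CRC-protected records; the record format is hypothetical, and this is not how Alliance Access or its Database Recovery option is implemented. Anything after the stopping point would still have to be recovered through the ISN/OSN gap check and message retrieval listed for restores.

```python
import zlib


def replay_until_last_valid(records):
    """Apply log records in order, stopping at the first record whose CRC32 does not match.

    records: list of (payload_bytes, stored_crc32) tuples (hypothetical log format)
    Returns (applied_payloads, index_of_first_invalid_record or None).
    """
    applied = []
    for index, (payload, stored_crc) in enumerate(records):
        if zlib.crc32(payload) != stored_crc:
            return applied, index  # this record and everything after it are treated as lost
        applied.append(payload)
    return applied, None


if __name__ == "__main__":
    log = [(p, zlib.crc32(p)) for p in (b"msg-1", b"msg-2")]
    log.append((b"msg-3", zlib.crc32(b"msg-3") ^ 1))  # simulate a damaged record
    log.append((b"msg-4", zlib.crc32(b"msg-4")))      # valid, but after the damage
    print(replay_until_last_valid(log))  # ([b'msg-1', b'msg-2'], 2)
```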


Preventive actions

• Disaster recovery plans
– Have a system health check run by an external company
– It is good to create a disaster plan … it is better to test it
– One test is not enough … build a book of scenarios
– Review your plan on a regular basis and adapt it
– Think the unthinkable
• Roles & Responsibilities
– One person should be responsible for maintaining the roles and responsibilities across the organization; a backup should be assigned for each role
– Ensure everybody involved is fully up to date on his/her role in a potential disaster scenario
– Regularly verify each role and update it over time
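The rule that every role needs a named backup is easy to check mechanically during the regular role review. A minimal sketch, assuming roles are kept in a simple mapping of role to (primary, backup); the role and person names are hypothetical:

```python
def roles_needing_attention(assignments):
    """Return roles with no primary, no backup, or the same person in both slots.

    assignments: dict mapping role -> (primary, backup)
    """
    return [
        role
        for role, (primary, backup) in sorted(assignments.items())
        if not primary or not backup or primary == backup
    ]


if __name__ == "__main__":
    roles = {
        "Alliance Access operator": ("Ann", "Ben"),
        "Security officer": ("Cy", ""),
        "Network partner contact": ("Di", "Di"),
    }
    print(roles_needing_attention(roles))  # ['Network partner contact', 'Security officer']
```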

Prepare for the worst:

SWIFT incident management

• Awareness at all levels of the company (including certification)
• Call out key customer contacts to verify their role and awareness
• One daily checkpoint (14:15) and one daily report to senior management (even if there is nothing to report)
• Rehearse: one drop of rain every day is better than one shower every year
• No dedicated crisis management roles; crisis management is engrained within the company. All roles rotate weekly (round robin) among line and senior (operational) management
• Clear roles and responsibilities: there is no time to argue during a crisis or incident
• Recovery of services comes first, investigation and correction second!
• Don’t be complacent: DO IT (DR, hot stand-by systems)!


SWIFT Support

We can offer you:
• Standard Support 24x7
– Highly qualified staff
– Excellent reachability
• Specific Support
– Premium, Premium Plus
• Remote Support
• Consulting Services
– Installations
– In-depth analysis
– Health checks
• …

We can’t offer you:
• Immediate on-site visits
• Conclusions without evidence
• Handholding for “easy” steps
• Management of your third parties
• Support for your back-office application
• Out-of-band config…

SWIFT tools and documentation

• Check www.swift.com for:
– Self-help guide (swift.com > Support > Self-help guide)
– Operational status (swift.com > Support > Operational Status)
– Guidelines to protect and maintain HSM boxes: Tip 2097727 – HSM Box Best Practice Guidelines
– Schedule for Troubleshooting training

Support sources

• Subscribe to:
– Operational status notifications
– Connectivity Email Contact role(s)
– Support newsletter
• Consult our Knowledge base
• Consult the Self-help guide on www.swift.com
• Contact SWIFT Customer Support
– Use Case Manager
– Use the phone for blocking situations
• Regularly check www.swift.com
– “Ordering” and “Support” pages for the latest news

Troubleshooting Tips & Tricks

What to do and how to prepare?

• State the impact on your business: do you have to meet a cut-off?
• Clarify which system is impacted (production, test, DR, …)
• Provide an accurate description of the problem
• List the actions you have already tried to troubleshoot the problem
• Did you experience similar problems in the past?
• Did you make recent network or system changes?
• Be prepared to supply relevant evidence to allow a faster investigation and resolution of the problem
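The checklist above can be turned into a small template so nothing is forgotten under pressure. A sketch, assuming a hypothetical `CaseReport` structure of your own (it is not a Case Manager or SWIFT format); `missing_items()` lists what still needs to be gathered before calling:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CaseReport:
    business_impact: str = ""                     # e.g. is a cut-off at risk?
    system: str = ""                              # production, test, DR, ...
    description: str = ""                         # accurate description of the problem
    actions_tried: List[str] = field(default_factory=list)
    similar_past_problems: Optional[bool] = None  # None means "not checked yet"
    recent_changes: Optional[bool] = None         # recent network or system changes?
    evidence: List[str] = field(default_factory=list)  # e.g. log file names

    def missing_items(self) -> List[str]:
        """Return the checklist items that are still empty or unanswered."""
        checks = [
            ("business impact", bool(self.business_impact)),
            ("impacted system", bool(self.system)),
            ("problem description", bool(self.description)),
            ("actions already tried", bool(self.actions_tried)),
            ("similar past problems", self.similar_past_problems is not None),
            ("recent changes", self.recent_changes is not None),
            ("evidence", bool(self.evidence)),
        ]
        return [name for name, done in checks if not done]


if __name__ == "__main__":
    report = CaseReport(
        business_impact="cut-off at risk",
        system="production",
        description="Alliance Gateway cannot reach SWIFTNet",
    )
    print(report.missing_items())
```

Answers of `False` count as answered, so "no recent changes" is not reported as missing.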

Network issues: specific items to remember

• Arranging an on-site visit of a network partner
- Needs alignment and clear agreements
- SWIFT is responsible for arranging the on-site visit
- The customer is responsible for providing access to buildings and equipment: keep it simple!


Agenda

Introduction

Roles and responsibilities

Technical environment

Documented processes

Generic preventive actions

Services - Best practices

What’s next?

Increased Resilience, Reduced Operational Impact

What’s next?

SWIFT:
- Continuous improvement to keep up with best practices and new technology
- Business impact assessment
- Matching KPIs: it is not about closing cases fast!
- Understand our customers! Managing a relationship is key!

New proposed service features

• Operational check-up: pre-empts possible system outages by finding potential problems
• Dedicated personal sessions around specific operational topics

What’s next?

You:
- Continuous improvement to keep up with best practices and new technology
- Business impact assessment
- Managing a relationship is key! Up-to-date records, procedures and testing will help you recover quickly from a problem.


Q&A
