MQSeries Auto Channel Recovery William Hao Communications Middleware Worldwide Technical Operations July 11, 2002


MQSeries Auto Channel Recovery

William Hao
Communications Middleware Worldwide Technical Operations
July 11, 2002


Contents

• Overview of Worldspan’s Current MQ Connectivity

• Summary of MQ Channel Issues

• Solutions to MQ Channel Issues

• Conclusion/Benefits


Overview of Worldspan’s Current MQ Connectivity

[Diagram: TPF Loosely-Coupled Complex, UNIX MQ Hub and WIN Servers, with Remote MQ Connections to OS/390, UNIX, UNISYS and OTHERS.]


Summary of MQ Channel Issues


Summary of MQ Channel Issues

• Automated channel retry in TPF (APAR PJ28758) was not yet available at the time.

[Diagram: TPF chl restart.]


Summary of MQ Channel Issues (cont.)

• Message sequence numbering between sender and receiver channel pair gets out of sync.

Sequence numbers are generated at the sending end of the channel and are incremented before being used, which means that the current seq num is that of the last message sent. They are filed for the last message transferred in a batch and are used during channel start-up to ensure that both ends agree on which messages have been transferred successfully.

[Diagram: SENDER and RECEIVER MQ Servers holding different sequence numbers (Msg seq = 00123 and Msg seq = 00113).]
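The sequence-number rule can be illustrated with a minimal sketch. The class and function names here (`SenderEnd`, `ReceiverEnd`, `in_sync`) are hypothetical and only model the behaviour described on this slide; they are not part of any MQ API.

```python
from dataclasses import dataclass


@dataclass
class SenderEnd:
    """Hypothetical model of the sending end of a channel."""
    seq: int = 0  # number of the last message sent

    def next_seq(self) -> int:
        # Increment before use, so self.seq always names the last message sent.
        self.seq += 1
        return self.seq


@dataclass
class ReceiverEnd:
    """Hypothetical model of the receiving end of a channel."""
    seq: int = 0  # number of the last message accepted

    def accept(self, seq: int) -> None:
        self.seq = seq


def in_sync(sender: SenderEnd, receiver: ReceiverEnd) -> bool:
    """Start-up check: both ends must agree on the last message transferred."""
    return sender.seq == receiver.seq


sender, receiver = SenderEnd(), ReceiverEnd()
for _ in range(3):
    receiver.accept(sender.next_seq())
print(in_sync(sender, receiver))  # both ends agree after three transfers

# The two values shown on the slide no longer agree, so start-up would fail.
print(in_sync(SenderEnd(123), ReceiverEnd(113)))
```

When the check fails, the channel pair has to be brought back to a common sequence number before messages can flow again.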


Summary of MQ Channel Issues (cont.)

• Sender channels go into INDOUBT status.

In MQ, messages are always transferred individually, but they are committed or backed out as a batch. When MQ commits a batch, it syncpoints a logical unit of work (LUW). If this syncpoint procedure is interrupted, an indoubt chl condition may occur.


Summary of MQ Channel Issues (cont.)

• TPF rcvr chl shows READY but partner sdr chl in UNIX cannot establish channel connection.

• UNIX rcvr chl shows RUNNING but partner sdr chl in TPF cannot establish channel connection.

[Diagram: one MQ Server issues start chl while the partner MQ Server shows ready / running.]


Solutions to MQ Channel Issues


Automated Channel Recovery Function in TPF

Cycle to NORM activates a time-initiated auto-chl recovery function which has the following features:

• First time around, START all sdr chls.

• CRETs to itself every minute.

• Check status of all sdr chls and perform necessary action.

• Can be activated or deactivated via functional entry.


Automated Channel Reset for TPF

Is the sender chl status neither READY nor INDOUBT?

YES: RESET and START the sender channel.


Automated Channel Resolve for TPF

The sdr chl goes into INDOUBT status if it is in doubt with the partner rcvr chl about which msgs have been sent and received. In this situation, the sdr chl has to be told whether to COMMIT or BACKOUT these msgs. Although this condition rarely occurs, it requires manual intervention to resynchronize the channels via functional entry.


Automated Channel Resolve for TPF (cont.)

Is the sender chl status INDOUBT?

YES: RESOLVE, RESET and START the sender channel.
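The two TPF decision rules (reset and resolve) can be combined into one small function. This is a sketch under stated assumptions: `tpf_sender_action` is a hypothetical name, and the strings it returns are labels for the steps named on the slides, not the actual TPF functional entries.

```python
from typing import List


def tpf_sender_action(status: str) -> List[str]:
    """Return the recovery steps for one TPF sender channel status.

    Hypothetical helper: the step names are labels taken from the slides.
    """
    status = status.strip().upper()
    if status == "INDOUBT":
        # In-doubt batch: resolve it first, then reset and restart.
        return ["RESOLVE", "RESET", "START"]
    if status != "READY":
        # Neither READY nor INDOUBT: reset and restart.
        return ["RESET", "START"]
    # READY: nothing to do.
    return []


for status in ("READY", "INDOUBT", "STOPPED"):
    print(status, tpf_sender_action(status))
```

A time-initiated task could call such a function once a minute for every sender channel and carry out whatever steps it returns.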


Automated Channel Retry for UNIX

MQSeries v5.2 on UNIX has a built-in channel retry mechanism, which may be used in conjunction with the following channel attributes:

• SHORTRTY – Short retry is the max nbr of times the sdr chl will try to allocate a session to its partner (set at 60).

• SHORTTMR – Short retry timer is the interval in sec that the sdr chl waits before retrying to establish a chl connection during the short retry mode (set at 60 sec).

• LONGRTY – Long retry kicks in after the SHORTRTY count is exhausted (set at 999999999).


Automated Channel Retry for UNIX (cont.)

• LONGTMR – Long retry timer is set at 1200 sec (20 min).

• HBINT – Heartbeat interval is the interval in sec between the heartbeat flows that the sending MCA sends when there are no msgs on the xmitq; the heartbeat unblocks the receiving MCA so that it can disconnect the channel.

• DISCINT – Disconnect interval is the time out value in sec for the sdr chl to disconnect when the xmitq becomes empty.

Note: Setting these channel attributes will work only when the Queue Manager of the partner channel supports them.
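As a worked example of the retry settings, the sketch below builds an MQSC `ALTER CHANNEL` command from the values quoted on the slides and computes how long the short retry phase lasts. The channel name `TO.PARTNER` and both helper functions are hypothetical; the HBINT and DISCINT values are not given in the slides, so they are left out.

```python
from typing import Dict

# Values quoted on the slides.
RETRY_ATTRS: Dict[str, int] = {
    "SHORTRTY": 60,
    "SHORTTMR": 60,
    "LONGRTY": 999999999,
    "LONGTMR": 1200,
}


def alter_channel_cmd(channel: str, attrs: Dict[str, int]) -> str:
    """Build an MQSC ALTER CHANNEL command for a sender channel."""
    settings = " ".join(f"{name}({value})" for name, value in attrs.items())
    return f"ALTER CHANNEL({channel}) CHLTYPE(SDR) {settings}"


def short_retry_window_sec(attrs: Dict[str, int]) -> int:
    """Approximate time spent in short retry mode before long retry starts."""
    return attrs["SHORTRTY"] * attrs["SHORTTMR"]


# TO.PARTNER is a made-up channel name used only for illustration.
print(alter_channel_cmd("TO.PARTNER", RETRY_ATTRS))
print(short_retry_window_sec(RETRY_ATTRS))  # 60 attempts * 60 sec
```

With these settings the sender retries roughly once a minute for about an hour, then falls back to one attempt every 20 minutes.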


Automated Channel Recovery Function for UNIX

The CRON table runs a script file which has the following features:

• Activated once every minute.

• Check status of all sdr chls.


Automated Channel Resolve for UNIX

Is the sender chl status INDOUBT?

YES: RESOLVE and RESET the sender channel.


Automated Channel Reset for UNIX

Is the sender channel in RETRYING mode?

YES: RESET the sender channel.


Automated Channel Reset for UNIX (cont.)

Is there no chl status for the sender channel?

YES: RESET the sender channel.
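The three UNIX checks can be sketched as a single decision function fed by parsed channel-status text. This is an assumption-laden illustration: the helper names are hypothetical, the sample stanza only imitates the `KEYWORD(VALUE)` layout of MQSC `DISPLAY CHSTATUS` output, and the returned strings are labels for the steps on the slides rather than the script actually run from cron.

```python
import re
from typing import Dict, List, Optional

# Matches KEYWORD(VALUE) pairs as they appear in MQSC display output.
PAIR = re.compile(r"([A-Z]+)\(([^)]*)\)")


def parse_stanza(text: str) -> Dict[str, str]:
    """Turn one status stanza into a keyword-to-value dictionary."""
    return {key: value.strip() for key, value in PAIR.findall(text)}


def unix_sender_action(fields: Optional[Dict[str, str]]) -> List[str]:
    """Return the recovery steps for one UNIX sender channel."""
    if not fields or "STATUS" not in fields:
        # No channel status at all: reset.
        return ["RESET"]
    if fields.get("INDOUBT") == "YES" or fields["STATUS"] == "INDOUBT":
        # In-doubt batch: resolve it, then reset.
        return ["RESOLVE", "RESET"]
    if fields["STATUS"] == "RETRYING":
        return ["RESET"]
    return []


# Illustrative stanza, not captured from a real queue manager.
sample = "CHANNEL(TO.TPF) CHLTYPE(SDR) STATUS(RETRYING) INDOUBT(NO)"
print(unix_sender_action(parse_stanza(sample)))
print(unix_sender_action(parse_stanza("")))
```

Keeping the decision separate from the parsing makes the rules easy to check without a running queue manager.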


Using TCP KeepAlive

• TCP KeepAlive knows nothing about MQSeries channels. It works on the TCP socket level.

• It sends a KeepAlive msg to the socket partner.

• If it detects that the partner is no longer available, it will disconnect the socket.
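Because KeepAlive is a socket-level option, it can be shown without any MQ code. The sketch below uses the standard `SO_KEEPALIVE` socket option; the function name `open_keepalive_socket` is hypothetical, and the probe interval is an operating-system setting that this example does not change.

```python
import socket


def open_keepalive_socket() -> socket.socket:
    """Create a TCP socket with the KeepAlive option switched on."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the TCP stack to probe an idle connection and drop it
    # if the partner no longer answers.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = open_keepalive_socket()
# A non-zero value means the option is enabled on this socket.
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```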


Using TCP KeepAlive (cont.)

• Alleviates the problem where the rcvr chl shows READY or RUNNING but the partner sdr chl is retrying to establish a new connection.

[Diagram: two MQ Servers; start chl, chl started.]


Using TCP KeepAlive (cont.)

• For TPF native stack, PJ28289 (PUT16 APAR) enables the KeepAlive option of the socket used by MQ rcvr chls.

• For TPF native stack, the socket sweeper checks if a socket has the KeepAlive option and sends a KeepAlive msg. Currently, the socket sweeper activates every 2 minutes.

• For UNIX, the KeepAlive interval is currently set to 1 minute.


Conclusion

• Automated channel restart mechanism for TPF

• Automated RESET mechanism for TPF

• Automated RESOLVE mechanism for TPF

• Automated RESET mechanism for UNIX

• Automated RESOLVE mechanism for UNIX

• Automated channel resolution between TPF and UNIX


Benefits

• Eliminates manual intervention from staff

• Faster MQ channel recovery times

• Increased uptime = $$$ more company revenues