Performance of stop-and-wait protocols over high-delay links

Inder Gopal and Parviz Kermani show that in most environments, acceptably high throughput can be obtained without sacrificing data integrity

The performance of stop-and-wait protocols over links with high propagation delay such as satellite links is discussed. Primary focus is placed on situations where a large block size is employed to compensate for the loss of throughput caused by the large propagation delay. Possible loss in data integrity due to undetected errors is investigated. The results show that in most environments, acceptably high throughput can be obtained without sacrificing data integrity.

Keywords: computer networks, data communications, satellite links, stop-and-wait protocols

Stop-and-wait protocols, for example IBM's binary synchronous communication (BSC), are among the most widely used forms of data transfer protocol. In such a protocol, data is transferred in the form of blocks, each block containing header and error-checking information. Typically, the protocol performs in the following fashion. The sender transmits a block of data and then waits for a response from the receiver. The receiver, upon receiving the block, checks for errors using the error-checking information in the block and sends to the transmitter either a positive or a negative acknowledgement. If the transmitter receives a positive acknowledgement, it proceeds to transmit the next block of data. If it receives a negative acknowledgement, it retransmits the current block.
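The exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BSC itself: the function names are hypothetical, the channel flips bits independently, acknowledgements are never lost, and the check uses the standard library's `binascii.crc_hqx`, which computes a 16-bit CRC with the polynomial x^16 + x^12 + x^5 + 1.

```python
import binascii
import random


def crc16(data: bytes) -> bytes:
    """16-bit CRC (x^16 + x^12 + x^5 + 1) of data, as two bytes."""
    return binascii.crc_hqx(data, 0).to_bytes(2, "big")


def noisy_channel(frame: bytes, bit_error_rate: float, rng: random.Random) -> bytes:
    """Flip each bit of the frame independently with the given probability."""
    out = bytearray(frame)
    for bit in range(8 * len(out)):
        if rng.random() < bit_error_rate:
            out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


def stop_and_wait(blocks, bit_error_rate, rng):
    """Send blocks one at a time; return (delivered blocks, transmissions used)."""
    delivered, transmissions = [], 0
    for block in blocks:
        frame = block + crc16(block)      # data plus error-checking information
        while True:
            transmissions += 1
            arrived = noisy_channel(frame, bit_error_rate, rng)
            data, check = arrived[:-2], arrived[-2:]
            if crc16(data) == check:      # receiver answers with a positive acknowledgement
                delivered.append(data)
                break                     # sender proceeds to the next block
            # negative acknowledgement: loop and retransmit the same block
    return delivered, transmissions


if __name__ == "__main__":
    blocks = [bytes([n]) * 32 for n in range(5)]
    _, used = stop_and_wait(blocks, 0.002, random.Random(7))
    print(f"{len(blocks)} blocks delivered in {used} transmissions")
```

Every retransmission costs a full round trip, which is why the delay of the link matters so much for this class of protocol.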

The protocol is simple to implement and performs well over low-delay links such as those that typically connect remote terminals to a host. However, over links with lengthy propagation delay, such as satellite links, the protocol is highly inefficient, as the following typical scenario indicates. The propagation delay over a satellite link is usually about a quarter of a second. If the link has a capacity of 20 kbit/s and if the size of the blocks being transmitted is 2 kbit, the time to transmit a block is only 0.1 s. On transmitting a block, however, the transmitter has to wait for the transmission to reach the receiver (0.25 s) and for the acknowledgement from the receiver to return to it (another 0.25 s), before another block can be transmitted. Thus, the transmitter is transmitting useful data for only one sixth of the time, clearly representing an underutilization of transmission resources. The problem is exacerbated in cases where the satellite has a larger capacity.

IBM TJ Watson Research Center, PO Box 218, Yorktown Heights, NY 10598, USA

0140-3664/83/030115-05 $03.00 ©

Several techniques have been suggested to make more efficient use of links with large propagation delay. The best solution is clearly to use one of the many continuous transmission protocols, such as HDLC, SDLC, ADCCP etc., and thereby eliminate the need for the transmitter to wait for a response from the receiver before sending another block. In many environments, however, this solution is not feasible because the currently used equipment supports only a stop-and-wait protocol and the cost of conversion to a continuous transmission protocol is considered to be too high. In such environments, a relatively simple solution which is often adopted is to increase the size of the blocks that the transmitter sends. In the previous example, if the block size was 200 kbit, the time to transmit a block would be 10 s. The fraction of time that the transmitter is transmitting data would now be 10/10.5, which is more than 0.95 and represents a very efficient use of satellite resources.
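The arithmetic in the two examples above can be reproduced with a short helper. The function name is hypothetical, and the sketch assumes an error-free link and a negligibly short acknowledgement.

```python
def utilization(block_bits: float, capacity_bps: float, one_way_delay_s: float) -> float:
    """Fraction of time a stop-and-wait transmitter spends sending data."""
    transmit_time = block_bits / capacity_bps
    # One propagation delay for the block, one for the returning acknowledgement.
    return transmit_time / (transmit_time + 2 * one_way_delay_s)


if __name__ == "__main__":
    for bits in (2_000, 200_000):
        print(bits, "bit block:", round(utilization(bits, 20_000, 0.25), 3))
```

With the 2 kbit block the result is one sixth; with the 200 kbit block it is 10/10.5.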

1983 Butterworth & Co. (Publishers) Ltd.

vol 6 no 3 june 1983 115

This solution has several drawbacks, however.

• A large block size requires large buffers at the transmitter and receiver.

• The probability of a transmission error in a large block is higher and thus blocks will have to be retransmitted more frequently. The efficiency gained in increasing the block size may be lost in the increased frequency of retransmission.

• The error-detection mechanism used to detect transmission errors may break down if the block size is too large. This may lead to a high rate of undetected errors. In applications where data integrity is of great importance, this could be disastrous.
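To give a feel for the second drawback, the sketch below estimates how often a block is hit by at least one bit error. It assumes independent bit errors, assumes every error is detected, and uses a bit error rate of 10^-6 chosen purely for illustration; the function names are hypothetical.

```python
import math


def block_error_probability(block_bits: int, p_err: float) -> float:
    """1 - (1 - p_err)^block_bits, evaluated stably for small p_err."""
    return -math.expm1(block_bits * math.log1p(-p_err))


def expected_transmissions(block_bits: int, p_err: float) -> float:
    """Mean number of sends per block if every damaged block is retransmitted."""
    return 1.0 / (1.0 - block_error_probability(block_bits, p_err))


if __name__ == "__main__":
    for bits in (2_000, 200_000):
        print(bits, block_error_probability(bits, 1e-6), expected_transmissions(bits, 1e-6))
```

Under these assumptions a 2 kbit block is damaged about 0.2% of the time, while a 200 kbit block is damaged about 18% of the time.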

The first two effects have been well studied and characterized¹. The third effect, however, is less well understood and an attempt is made to fill this gap by presenting some simple analytical results which provide good estimates of the frequency of undetected errors.

The results show that in most typical situations, high levels of transmission efficiency can be obtained, while maintaining acceptably low probability of undetected error.

ANALYSIS

First, an upper bound is found on the frequency of undetected errors. The probability of a block containing an undetected error is denoted by $P_{und}$. Conditioning on the number of errors in a block, one can write:

$$P_{und} = \sum_{i \ge 1} P_{und,i}\, Q(i)$$

where $P_{und,i}$ = probability that a block with $i$ bits in error will go undetected, and $Q(i)$ = probability that a block will have $i$ bits in error.

Most stop-and-wait protocols use a 16-bit cyclic redundancy code (CRC) for error detection. The most commonly used CRC polynomial is the international standard V.41 polynomial ($x^{16} + x^{12} + x^{5} + 1$). BSC uses a slightly different polynomial ($x^{16} + x^{15} + x^{2} + 1$) with essentially identical error-detecting capabilities². Both these polynomials have the following properties:

• Property 1: all blocks with an odd number of bits in error are detected³.

• Property 2: all blocks with exactly two bits in error are detected unless the bits that are in error are exactly a multiple of $2^{15} - 1$ bits apart (see Appendix).
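Both properties can be checked numerically by treating the generator polynomials as bit masks over GF(2). The sketch below is an illustration only, with hypothetical function names. It relies on two standard facts: a polynomial with an even number of terms has the factor $x + 1$, so it cannot divide an error pattern with an odd number of terms, and a two-bit error with spacing $d$ is divisible by the generator exactly when $x^d$ leaves remainder 1.

```python
V41_POLY = 0x11021  # x^16 + x^12 + x^5 + 1
BSC_POLY = 0x18005  # x^16 + x^15 + x^2 + 1


def has_factor_x_plus_1(poly: int) -> bool:
    """True if poly vanishes at x = 1 over GF(2), i.e. has an even number of terms."""
    return bin(poly).count("1") % 2 == 0


def x_power_mod(d: int, poly: int) -> int:
    """Remainder of x^d divided by poly over GF(2), one shift at a time."""
    degree = poly.bit_length() - 1
    r = 1
    for _ in range(d):
        r <<= 1
        if r >> degree:       # degree reached: subtract (XOR) the generator
            r ^= poly
    return r


def smallest_undetected_spacing(poly: int) -> int:
    """Smallest k > 0 such that x^k + 1 is divisible by poly."""
    degree = poly.bit_length() - 1
    r, k = 1, 0
    while True:
        r <<= 1
        if r >> degree:
            r ^= poly
        k += 1
        if r == 1:            # x^k leaves remainder 1, so poly divides x^k + 1
            return k


if __name__ == "__main__":
    for name, poly in (("V.41", V41_POLY), ("BSC", BSC_POLY)):
        print(name, has_factor_x_plus_1(poly), smallest_undetected_spacing(poly))
```

For both polynomials the smallest undetected spacing comes out as $2^{15} - 1 = 32767$ bits.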

Certain assumptions are now made. First, it is assumed that errors occur within a block in a completely random fashion. In other words, there is a certain probability that a bit will be received in error and this probability is independent for all the bits within a block. This probability is termed the bit error rate and is represented by $P_{err}$. In environments with burst errors there will be some dependence between the error probability of successive bits. However, the independence assumption allows one to write:

$$Q(i) = \binom{L}{i} P_{err}^{i} (1 - P_{err})^{L-i} \qquad (1)$$

where $L$ = length of each block in bits. The assumption further implies that all patterns of $i$ bits in error are equally likely. Thus, the probability that a block with $i$ bits in error goes undetected is simply the fraction of patterns of $i$ bits in error which are undetected, i.e.

$$P_{und,i} = \frac{\text{number of } i\text{-bit error patterns that cannot be detected}}{\text{total number of } i\text{-bit error patterns}} \qquad (2)$$

The second assumption that can be made is that if a block contains an even number of errors that is four or greater, then the error is never detected. The justification for this is that in the environment under consideration, $P_{err}$ is sufficiently low and $L$ sufficiently large so that:

$$Q(2) \gg Q(4) \gg Q(6) \gg \ldots \qquad (3)$$
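A quick numerical look at relation (3), using the binomial expression in equation (1), is sketched below. The block lengths and error rates are illustrative choices rather than values taken from the analysis, and the function name `q` is hypothetical.

```python
import math


def q(i: int, block_bits: int, p_err: float) -> float:
    """Q(i): probability that exactly i of the block's bits are in error."""
    return (math.comb(block_bits, i) * p_err ** i
            * math.exp((block_bits - i) * math.log1p(-p_err)))


if __name__ == "__main__":
    for block_bits, p_err in ((80_000, 1e-7), (320_000, 1e-5)):
        ratios = [q(i + 2, block_bits, p_err) / q(i, block_bits, p_err) for i in (2, 4)]
        print(block_bits, p_err, ratios)
```

For an 80 000 bit block at $P_{err} = 10^{-7}$ each successive term is smaller by more than five orders of magnitude. For a 320 000 bit block at $P_{err} = 10^{-5}$ the ratio $Q(4)/Q(2)$ is close to one, which is the regime where relation (3) no longer holds strictly.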

Thus, blocks with two bits in error are the major cause of undetected error and the assumption that four, six or more bit errors are always undetected does not affect our results greatly as their occurrence is so rare. Together with Property 1, this allows the probability of undetected error to be written as:

$$P_{und} \approx P_{und,2}\, Q(2) + \sum_{k=2}^{\infty} Q(2k) \qquad (4)$$

Equation (4) provides a good upper bound on $P_{und}$ in environments where equation (3) is not strictly correct.

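Using Property 2 and some simple combinatorial arguments, one can count the two-bit error patterns that cannot be detected. For each $m = 1, \ldots, N$ there are $L - me$ pairs of bit positions exactly $me$ bits apart, so:

$$\text{number of two-bit error patterns that cannot be detected} = \sum_{m=1}^{N} (L - me) = NL' + \frac{e}{2} N(N - 1)$$

where

$$N = \lfloor L/e \rfloor$$

is the largest integer not exceeding $L/e$,

$$L' = L - Ne$$

and

$$e = 2^{15} - 1$$

Thus, from equation (2) one can obtain:

$$P_{und,2} = \frac{NL' + \frac{e}{2} N(N - 1)}{\binom{L}{2}}$$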
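The counting step can be sanity-checked by brute force. The sketch below counts directly from Property 2: for each $m \ge 1$ with $me < L$ there are $L - me$ pairs of positions exactly $me$ apart. With $N = \lfloor L/e \rfloor$ and $L' = L - Ne$ that sum reduces to $NL' + \frac{e}{2}N(N-1)$, which is the closed form assumed in the code. Small values of $e$ keep the enumeration cheap, and the function names are hypothetical.

```python
from itertools import combinations


def undetected_pairs_closed_form(block_bits: int, e: int) -> int:
    """N*L' + (e/2)*N*(N-1), with N = floor(L/e) and L' = L - N*e."""
    n = block_bits // e
    remainder = block_bits - n * e
    return n * remainder + e * n * (n - 1) // 2


def undetected_pairs_brute_force(block_bits: int, e: int) -> int:
    """Count bit pairs (i < j) whose separation is a multiple of e."""
    return sum(1 for i, j in combinations(range(block_bits), 2) if (j - i) % e == 0)


if __name__ == "__main__":
    for e in (3, 7):
        ok = all(undetected_pairs_closed_form(n, e) == undetected_pairs_brute_force(n, e)
                 for n in range(1, 60))
        print("e =", e, "closed form matches enumeration:", ok)
    print(undetected_pairs_closed_form(80_000, 2**15 - 1))
```

Dividing the count by `math.comb(block_bits, 2)` gives the fraction of two-bit patterns that escape detection.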

The summation term in equation (4) is given by:

$$\sum_{k=2}^{\infty} Q(2k) = \sum_{k=0}^{\infty} Q(2k) - Q(0) - Q(2) \qquad (5)$$

By simple algebraic manipulations⁴, one can obtain:

$$\sum_{k=0}^{\infty} Q(2k) = \frac{1 + (1 - 2P_{err})^{L}}{2}$$

Thus, equation (5) can be simplified to:

$$\sum_{k=2}^{\infty} Q(2k) = \frac{1 + (1 - 2P_{err})^{L}}{2} - (1 - P_{err})^{L} - \frac{L(L - 1)}{2} (1 - P_{err})^{L-2} P_{err}^{2} \qquad (6)$$

From equations (4) and (6) one obtains:

$$P_{und} \approx (1 - P_{err})^{L-2} P_{err}^{2} \left[ NL' + \frac{e}{2} N(N - 1) \right] + \frac{1 + (1 - 2P_{err})^{L}}{2} - (1 - P_{err})^{L} - \frac{L(L - 1)}{2} (1 - P_{err})^{L-2} P_{err}^{2}$$

Finally, an expression is derived for the maximum effective throughput, $T_e$, achieved by the protocol. The maximum throughput is achieved in heavy traffic conditions, i.e. under the assumption that a new block is always awaiting transmission at the transmitter.

$$T_e = \frac{L}{\text{expected time to transmit a block}} \ \text{bit/s}$$

or

$$T_e = \frac{L \left[ (1 - P_{err})^{L} + P_{und} \right]}{2\delta + L/C}$$

where $\delta$ = propagation delay, and $C$ = capacity of channel in bit/s. The expected time between undetected erroneous blocks, $\tau$, is simply:

$$\tau = \frac{\text{expected time to transmit a block}}{P_{und}}$$
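The quantities of this section can be evaluated numerically along the following lines. This is a sketch under stated assumptions, not the authors' program, and all function names are hypothetical. Two choices are the sketch's own. First, the tail $\sum_{k \ge 2} Q(2k)$ is summed term by term (up to 60 errors, adequate while $L P_{err}$ is at most a few units), because the closed form subtracts nearly equal numbers when $L P_{err}$ is small. Second, "expected time to transmit a block" is read as the same quantity in both formulas, $(2\delta + L/C)$ divided by the acceptance probability. The two-bit count is taken directly from Property 2 by summing $L - me$ over the multiples $me < L$.

```python
import math

E = 2**15 - 1                      # spacing at which two-bit errors escape the CRC
SECONDS_PER_YEAR = 365 * 24 * 3600


def q_terms(block_bits, p_err, imax=60):
    """[Q(0), ..., Q(imax)] from equation (1), built with the term-to-term ratio."""
    q = [math.exp(block_bits * math.log1p(-p_err))]
    for i in range(imax):
        q.append(q[-1] * (block_bits - i) / (i + 1) * p_err / (1.0 - p_err))
    return q


def p_und2(block_bits):
    """Fraction of two-bit error patterns whose spacing is a multiple of E."""
    n = block_bits // E
    undetected = sum(block_bits - m * E for m in range(1, n + 1))
    return undetected / math.comb(block_bits, 2)


def p_und(block_bits, p_err):
    """Equation (4): two-bit term plus all even error counts of four or more."""
    q = q_terms(block_bits, p_err)
    return p_und2(block_bits) * q[2] + sum(q[4::2])


def throughput(block_bits, p_err, capacity_bps, delay_s):
    """Maximum effective throughput T_e in bit/s."""
    accepted = math.exp(block_bits * math.log1p(-p_err)) + p_und(block_bits, p_err)
    return block_bits * accepted / (2 * delay_s + block_bits / capacity_bps)


def years_between_undetected(block_bits, p_err, capacity_bps, delay_s):
    """tau, converted to years (assumption: 365-day year)."""
    block_time = block_bits / throughput(block_bits, p_err, capacity_bps, delay_s)
    return block_time / p_und(block_bits, p_err) / SECONDS_PER_YEAR


if __name__ == "__main__":
    for kbyte in (2, 4, 8, 16, 32):
        bits = 8000 * kbyte        # assumption: 1 kbyte = 1000 bytes of 8 bits
        print(kbyte, throughput(bits, 1e-7, 224_000, 0.274),
              years_between_undetected(bits, 1e-7, 224_000, 0.274))
```

Reading the delay in $\tau$ as the per-attempt time $2\delta + L/C$ instead would change the result only by the factor $(1 - P_{err})^{L} + P_{und}$.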

NUMERICAL RESULTS

The results of the previous section are used here to study the performance of the BSC protocol over a satellite link. The capacity, $C$, is set at 224 kbit/s, and the propagation delay, $\delta$, is set at 0.274 s, both being typical values for a wide-band satellite channel. Block length and error rate are used as parameters. The results are presented in Figures 1-3.

In Figure 1, maximum throughput is shown as a function of the block size for error rates ranging from $10^{-5}$ to $10^{-9}$. In all cases, the throughput first increases with block length, the reason being that for small block sizes the overhead caused by the transmitter waiting for an acknowledgement from the receiver is large compared to the transmission time of a block, resulting in low throughput. For larger block sizes, this overhead becomes relatively small and the throughput increases. For noisy channels ($P_{err} = 10^{-5}$), the throughput levels off and subsequently begins to decrease. This is because, as the blocks become larger, the chance that at least one bit will be received in error increases, thereby increasing the likelihood of retransmission as a result of detected error. Thus, the throughput drops. However, as Figure 1 shows, for small error rates (in this case for $P_{err} < 10^{-5}$), this second effect does not manifest itself in the range of block lengths considered. Here, the error rate is so small that the increased likelihood of retransmission of large blocks does not offset the gain achieved by reducing the overhead.
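The rise and fall of the throughput curve for a noisy channel can be reproduced with a simple sweep over block lengths. The sketch below is an approximation: it drops the small $P_{und}$ term from the throughput expression and takes 1 kbyte as 8000 bit, and its function names are hypothetical.

```python
import math

C = 224_000     # link capacity in bit/s, as used in this section
DELTA = 0.274   # one-way propagation delay in seconds, as used in this section


def approx_throughput(block_bits: int, p_err: float) -> float:
    """L (1 - P_err)^L / (2 delta + L / C), neglecting the P_und term."""
    p_ok = math.exp(block_bits * math.log1p(-p_err))
    return block_bits * p_ok / (2 * DELTA + block_bits / C)


def best_block_kbyte(p_err: float, max_kbyte: int = 40) -> int:
    """Whole-kbyte block length (1 kbyte taken as 8000 bit) with the highest throughput."""
    return max(range(1, max_kbyte + 1),
               key=lambda kbyte: approx_throughput(8000 * kbyte, p_err))


if __name__ == "__main__":
    for p_err in (1e-5, 1e-8):
        kbyte = best_block_kbyte(p_err)
        print(p_err, kbyte, round(approx_throughput(8000 * kbyte, p_err)))
```

At $P_{err} = 10^{-5}$ the maximum of this approximation falls near 8 kbyte, whereas at $10^{-8}$ the throughput is still rising at the 40 kbyte end of the sweep.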

Figure 1. Throughput as a function of block size: CRC size = 16 bit, link capacity = 224 kbit/s, link propagation delay = 0.274 s, error rates = A: $10^{-5}$, B: $10^{-6}$, C: $10^{-7}$, D: $10^{-8}$, E: $10^{-9}$. Horizontal axis: block length (kbyte), 0 to 40.

Figure 2 shows the average time between undetected erroneous blocks as a function of block length for different error rates. In most cases, this time is considerable (more than one year) and thus the data integrity is acceptable by even the most stringent criterion. However, for sufficiently high error rates ($> 10^{-6}$) increasing the block size can result in a breakdown of the error-detection mechanism. In particular, for $P_{err} = 10^{-5}$, increasing the block size beyond 10 kbyte results in one undetected error every second, potentially a disastrous rate in most situations. Another interesting feature of these curves is the fact that for some error rates there is a sharp decrease in the time between undetected errors when the block length increases beyond $2^{15}$ bit. The explanation for this phenomenon is that for these error rates, the frequency of two-bit errors is several orders of magnitude larger than the frequency of four- or six-bit errors. As has already been described, the CRC polynomial of the BSC protocol detects all blocks with two bits in error unless these bits are at least $2^{15} - 1 = 32\,767$ bit (about 4 096 byte) apart, in which case there is a chance that the error will not be detected. Thus, as Figure 2 shows, when the block length passes this threshold, the time between undetected erroneous blocks suddenly drops. For higher error rates, the effect is not quite so marked as four- or six-bit errors begin to occur more frequently and it has been assumed that such errors are always undetected. It should be pointed out that this assumption results in conservative values for the time between undetected blocks, and the actual time between undetected erroneous blocks is higher than the curves show.

Figure 2. Years between undetected erroneous blocks as a function of block size: CRC size = 16 bit, link capacity = 224 kbit/s, link propagation delay = 0.274 s, error rates = A: $10^{-5}$, B: $10^{-6}$, C: $10^{-7}$, D: $10^{-8}$, E: $10^{-9}$. Horizontal axis: block length (kbyte), 0 to 40.

Finally, Figure 3 shows the time between undetected erroneous blocks as a function of throughput. To obtain these curves, the block size was varied and the corresponding time between erroneous blocks versus the corresponding throughput plotted. As can be seen, in general, the higher the throughput, the lower the time between undetected erroneous blocks. The exception is when the error rate is too high ($10^{-5}$ in Figure 3) where the curve has a peak. This is caused by the same effect as the peak in the curve for this error rate in Figure 1. As in Figure 2, there is a threshold for the throughput beyond which the time between undetected erroneous blocks drops sharply for some error rates. This threshold is the throughput which corresponds to the block length being equal to $2^{15}$ bit.

Figure 3. Years between undetected erroneous blocks as a function of throughput: CRC size = 16 bit, link capacity = 224 kbit/s, link propagation delay = 0.274 s, error rates = A: $10^{-5}$, B: $10^{-6}$, C: $10^{-7}$, D: $10^{-8}$, E: $10^{-9}$. Horizontal axis: throughput (bit/s), 0 to $1.6 \times 10^{5}$.

CONCLUSIONS

The performance of a stop-and-wait protocol over a link with a high propagation delay has been studied and the possible loss of data integrity in situations where large block sizes are used to increase throughput has been investigated. As can be seen from Figures 1, 2 and 3, the results indicate that acceptably high levels of throughput can be obtained without significant loss of data integrity in most environments. In these cases, it is likely that hardware constraints such as limited buffer sizes will limit throughput well before data integrity becomes an issue. However, in environments with high error rates, increasing the block size may result in an unacceptably high rate of undetected error. In these cases, alternative methods of obtaining high throughput, such as a continuous transmission protocol or forward error correction, may be more appropriate.

REFERENCES

1 Sastry, A R K 'Error control in digital satellite networks using retransmission schemes' Computer Communications, Vol 5, No 1 (February 1982)

2 Green Jr, P E Computer network architecture and protocols, Plenum Press, USA (1982)

3 Peterson, W W and Brown, D T 'Cyclic codes for error detection' Proceedings of the IRE (January 1961) pp 228-235

4 Hamming, R W Coding and information theory, Prentice Hall, USA (1980)

118 computer communications

BIBLIOGRAPHY

Binary synchronous communications, IBM System Reference Library, USA, GA27-3004-2

Martin, J Telecommunications and the computer, Prentice Hall, USA (1976)

Peterson, W W and Weldon Jr, E J Error correcting codes, MIT Press, USA (1971)

APPENDIX: PROOF OF PROPERTY 2

Using Peterson and Brown's notation³ one can state that an error is undetected if, and only if, the error polynomial $E(x)$ is divisible exactly by the generator polynomial $P(x)$. From Reference 3 it can be seen that the generator polynomials under consideration have the property that the smallest value of $k$, such that $x^{k} + 1$ (the addition is modulo 2) is exactly divisible by $P(x)$, is $e = 2^{15} - 1$.

Consider an error polynomial, $E_2(x)$, caused by a two-bit error. It has the form $E_2(x) = x^{i} + x^{j}$. Assume $i < j$, then,

$$E_2(x) = x^{i} (1 + x^{j-i}) \qquad (A.1)$$

As $x^{i}$ is not divisible by $P(x)$ for any $i$³, $E_2(x)$ is divisible by $P(x)$ if, and only if, $1 + x^{j-i}$ is divisible by $P(x)$. However, as addition is done modulo 2, one can write:

$$1 + x^{j-i} = 1 + x^{e} + x^{e} + x^{j-i} \qquad (A.2)$$

where $e = 2^{15} - 1$. It is known that $1 + x^{e}$ is divisible by $P(x)$. Thus, the right hand side of equation (A.2) is divisible by $P(x)$ if, and only if, $x^{e} + x^{j-i}$ is divisible by $P(x)$. Thus, one can establish that $x^{i} + x^{j}$ is divisible by $P(x)$ if, and only if, $x^{e} + x^{j-i}$ is divisible by $P(x)$.

If $e < j - i$ one can, in an analogous fashion, further write that $x^{e} + x^{j-i}$ is divisible by $P(x)$ if, and only if, $x^{e} + x^{j-i-e}$ is divisible by $P(x)$. Proceeding further, one can finally write that $x^{i} + x^{j}$ is divisible by $P(x)$ if, and only if, $x^{e} + x^{j-i-ke}$ is divisible by $P(x)$, where $k$ is chosen such that $0 \le j - i - ke < e$. This last expression can be further simplified to $x^{j-i-ke} (1 + x^{e-(j-i-ke)})$. This is of the form of the expression in equation (A.1) and, by arguments used there, is divisible by $P(x)$ if, and only if, $1 + x^{e-(j-i-ke)}$ is divisible by $P(x)$. This is possible only if,

$$e - (j - i - ke) \ge e$$

which is possible only if,

$$j - i - ke = 0$$

or,

$$j - i = ke$$

Thus, $x^{i} + x^{j}$ is divisible by $P(x)$ if, and only if, $j - i$ is a multiple of $e$. In other words, a block with two bits in error goes undetected if, and only if, the two bits are a multiple of $e$ bits apart.
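The statement proved above can be checked exhaustively on a small generator of the same shape. The sketch below uses the hypothetical stand-in $(x + 1)(x^{3} + x + 1) = x^{4} + x^{3} + x^{2} + 1$, for which the smallest $k$ with $x^{k} + 1$ divisible by the generator is 7, and verifies that a two-bit error pattern is divisible exactly when its spacing is a multiple of 7. The function names are illustrative.

```python
SMALL_P = 0b11101   # x^4 + x^3 + x^2 + 1 = (x + 1)(x^3 + x + 1), stand-in generator
SMALL_E = 7         # smallest k such that x^k + 1 is divisible by SMALL_P


def gf2_remainder(a: int, p: int) -> int:
    """Remainder of polynomial a divided by polynomial p, coefficients modulo 2."""
    deg_p = p.bit_length() - 1
    while a.bit_length() - 1 >= deg_p:
        a ^= p << (a.bit_length() - 1 - deg_p)   # cancel the leading term
    return a


def two_bit_error_undetected(i: int, j: int, p: int = SMALL_P) -> bool:
    """True if the error polynomial x^i + x^j is exactly divisible by p."""
    return gf2_remainder((1 << i) | (1 << j), p) == 0


if __name__ == "__main__":
    agrees = all(two_bit_error_undetected(i, j) == ((j - i) % SMALL_E == 0)
                 for i in range(40) for j in range(i + 1, 40))
    print("divisible exactly when the spacing is a multiple of e:", agrees)
```

The same check applies in principle to the 16-bit generators with $e = 2^{15} - 1$, although enumerating every pair at that size is much slower.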
