1 CSTS WG Prototyping for Forward CSTS Performance Boulder November 2011 Martin Karch

1

CSTS WG

Prototyping for Forward CSTS Performance

Boulder, November 2011

Martin Karch

2

Prototyping for Fwd CSTS Performance

• Structure of the Presentation

• Background and Objective

• Measurement Set-Up

• Results

• Summary

3

Background and Objective

• Reports from NASA:
  • Prototyping experimental Synchronous Forward Frame Service (CLTU Orange Book)
  • CLTU Service approach limits throughput seriously
  • Target rate (25 Mbps) could only be reached if
    - 7 frames of 220 bytes length
    - 1 data parameter of one single CLTU
  • No radiation reports for individual CLTU (only the complete one)

• No investigation yet available
  • Why can throughput not be reached when frames are transferred in a single CLTU?
  • What is the cost of acknowledging every frame?

4

Background and Objective

• Based on the Reports:
  • Suggest blocking of several frames into one data parameter for
    - Potential future Forward Frame CSTS
    - Process Data Operation/Data Processing Procedure (FW)

• Objective of Prototyping
  • Verify that blocking of data items significantly increases the throughput
  • Investigate if bottleneck is in service provisioning (actual protocol between user and provider)
  • Results shall support selection of most appropriate approach for the CSTS FW Forward Specification
  • Measurements are made for Protocol Performance

5

Measurement Set-Up

• 2 machines equipped with
  • Xeon 4C X3460 2.8GHz/1333MHz/8MB
  • 4GB memory
  • Linux SLES 11 64 bit

• Isolated LAN
  • 1 Gbit
  • cable connection (no switch)

[Figure: measurement set-up. Provider: adapted SGM (SLE Ground Models) with simulated radiation process, SLE Provider and SLE API. User: adapted NIS (Network Interface System) model with SLE User and SLE API. SLE operations (CLTUs) are exchanged over TCP/IP.]

6

Measurement Set-Up

• Provider
  • SGM
  • SLE Ground Models
  • Simulation Environment

• SGM changed such that
  • Receiving Thread puts CLTUs on a Queue for Radiation
  • A ‘Radiation Thread’ removes CLTUs and discards them
  • No further simulation of Radiation process (radiation duration)

• User
  • NIS
  • Network Interface System
  • Simulation Environment

• NIS is modified to
  • Create (as fast as possible) CLTU operation objects
  • Immediately pass them to SLE API for transmission
  • No interface to a Mission Control System (MCS)
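
The adapted provider described on this slide can be pictured as a simple producer/consumer pair. The following is only a minimal sketch under stated assumptions: the function name `run_provider_sketch`, the sentinel `STOP`, and the use of Python threads are illustrative and not taken from the SGM code.

```python
import queue
import threading


def run_provider_sketch(cltus):
    """Queue received CLTUs and discard them in a 'radiation' thread.

    Hypothetical illustration only; not the actual SGM implementation.
    """
    radiation_queue = queue.Queue()
    stop = object()  # sentinel marking the end of the test run
    discarded = 0

    def receiving_thread():
        # Stands in for the thread that receives CLTUs from the SLE API.
        for cltu in cltus:
            radiation_queue.put(cltu)
        radiation_queue.put(stop)

    def radiation_thread():
        nonlocal discarded
        while True:
            item = radiation_queue.get()
            if item is stop:
                break
            # The CLTU is dropped at once: no radiation duration is simulated.
            discarded += 1

    threads = [
        threading.Thread(target=receiving_thread),
        threading.Thread(target=radiation_thread),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return discarded
```

Because the radiation thread does no work per CLTU, a set-up of this kind measures the protocol path rather than the radiation process.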

7

Measurement Set-Up

• Basis for all Steps:
  • SGM based provider
  • NIS based user

• Step 1 Measurements:
  • Variation of CLTU length
    - Simulates sending many small CLTUs
    - In one TRANSFER DATA Invocation (1st approximation)

• Step 2 Measurements:
  • SLE API modified
    - Aggregate configurable number of CLTU (SEQUENCE OF Cltu)
    - With minimum annotation (CLTU Id, sequence count)
    - Send return when last data unit is acknowledged
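
The Step 2 aggregation can be sketched as follows. This is a hedged illustration, not the modified SLE API: the names `AnnotatedCltu` and `aggregate`, the running `cltu_id` starting at 1, and the reading of "sequence count" as the position inside one invocation are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class AnnotatedCltu:
    cltu_id: int          # assumed: running identifier over the whole session
    sequence_count: int   # assumed: position of the CLTU inside its block
    data: bytes


def aggregate(cltus: Iterable[bytes], block_size: int) -> Iterator[List[AnnotatedCltu]]:
    """Group CLTUs into blocks of at most block_size items.

    Each yielded block stands for one invocation carrying a SEQUENCE OF Cltu;
    a single return would be sent once the last item of the block is acknowledged.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    block: List[AnnotatedCltu] = []
    for index, data in enumerate(cltus):
        block.append(AnnotatedCltu(cltu_id=index + 1,
                                   sequence_count=len(block),
                                   data=data))
        if len(block) == block_size:
            yield block
            block = []
    if block:  # final, partially filled block
        yield block
```

With 15 frames of 220 bytes and a block size of 7, this yields blocks of 7, 7 and 1 items, so three returns are needed instead of fifteen.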

8

Step1 / Measurement 1

• SGM + NIS model optimised: yes
• SLE API optimised: no
• Nagle + delayed ack: on
• RTT: 0.1 ms

• Linear curve
  • Proportional to CLTU size

• Constant Processing Time
  • Independent of CLTU size
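
A constant processing time per CLTU implies a data rate that grows linearly with CLTU size. The sketch below illustrates that relationship only; the function name is hypothetical, and the 0.307 ms figure is borrowed from the window-size table later in the deck (RTT 0.1 ms row) purely as an example value, not as the processing time of this measurement.

```python
def data_rate_mbit(cltu_size_bytes: int, time_per_cltu_s: float) -> float:
    """Data rate in Megabit/s when every CLTU costs the same fixed time."""
    return cltu_size_bytes * 8 / time_per_cltu_s / 1e6


# Example value only: 0.307 ms per CLTU, assumed constant for all sizes.
TIME_PER_CLTU_S = 0.307e-3

rate_1544 = data_rate_mbit(1544, TIME_PER_CLTU_S)
rate_3088 = data_rate_mbit(3088, TIME_PER_CLTU_S)
print(f"1544 byte: {rate_1544:.1f} Mbit/s, 3088 byte: {rate_3088:.1f} Mbit/s")
```

Under this model, doubling the CLTU size doubles the data rate, which is the linear curve reported on the slide.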

9

Step1 / Measurement 2

• SGM + NIS model optimised: yes
• SLE API optimised: yes
• Nagle + delayed ack: on
• RTT: 0.1 ms

10

Step1 / Measurement 3

• SGM + NIS model optimised: yes
• SLE API optimised: yes
• Nagle + delayed ack: off
• RTT: 0.1 ms
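
The slides do not say how Nagle and delayed acknowledgements were switched off in the prototype. As a hedged sketch, on a TCP socket this is commonly done with the `TCP_NODELAY` option and, on Linux, the `TCP_QUICKACK` option; the helper name below is hypothetical.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled.

    Illustrative helper; not taken from the SLE API.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY disables Nagle's algorithm: small writes are sent at once.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # TCP_QUICKACK is Linux-specific and not a permanent setting, so it is
    # applied only where available and failures are ignored.
    if hasattr(socket, "TCP_QUICKACK"):
        try:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
        except OSError:
            pass
    return sock
```

Both peers would need such settings for the effect to apply in both directions.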

11

Step1 / Measurement 4

• SGM + NIS model optimised: yes
• SLE API optimised: yes
• Nagle + delayed ack: off
• RTT: 400 ms

• Processing Time still constant

• Transfer-time increased

12

Step1 / Measurement 5.1

• Measurement 5.1: Reference measurement for the measurements with variations of RTT, using Iperf

• Measurement 5.2: Measurements using SGM + NIS

13

Step1 / Measurement 5.2

• SGM + NIS model optimised: yes
• SLE API optimised: yes
• Nagle + delayed ack: off
• RTT: variable

• Shows influence of transmission time only

• Delay is dominating factor
  • As expected (1/RTT)
  • Ratio Measurement/Iperf = 0.165 (CLTU size 1544 byte)
  • Ratio Measurement/Iperf = 0.153 (CLTU size 1000 byte)

14

Step1 / Measurement 5 (2)

• Operates with Maximum Send and Receive Buffer

• Question:
  • How big must the window size be to achieve throughput values similar to those above (for the example of 40 Mbit/s)?

• Maximum Data Rate = Buffer size/RTT

Window size  RTT [ms]  CLTU Size [byte]  CLTU Count  Data [byte]  Data Rate [Megabit/s]  Send Duration [s]  CLTU Rate [#/s]  Send Time per CLTU [ms]
64 KB        50        1544              5000        7,720,000    1.713                  36.056             138.673          7.211
64 KB        100       1544              5000        7,720,000    0.861                  71.761             69.676           14.352
64 KB        200       1544              5000        7,720,000    0.431                  143.351            34.879           28.670
64 KB        400       1544              5000        7,720,000    0.216                  286.005            17.482           57.201
64 KB        50        1000              5000        5,000,000    1.595                  25.077             199.386          5.015
64 KB        100       1000              5000        5,000,000    0.799                  50.080             99.840           10.016
64 KB        200       1000              5000        5,000,000    0.400                  100.079            49.961           20.016
64 KB        400       1000              5000        5,000,000    0.200                  200.087            24.989           40.017
13 MB        100       1544              50000       77,200,000   11.812                 52.287             956.261          1.046
25 MB        100       1544              50000       77,200,000   11.666                 52.939             944.483          1.059
25 MB        0.1       1544              50000       77,200,000   40.221                 15.355             3256.268         0.307
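
The rule "Maximum Data Rate = Buffer size/RTT" can be turned around to answer the window-size question. The sketch below is illustrative: the function names are hypothetical, and 1 KB is assumed to mean 1024 bytes.

```python
def max_data_rate_mbit(buffer_bytes: float, rtt_ms: float) -> float:
    """Upper bound on the data rate in Megabit/s: buffer size divided by RTT."""
    return buffer_bytes * 8 / (rtt_ms / 1000.0) / 1e6


def required_buffer_bytes(rate_mbit: float, rtt_ms: float) -> float:
    """Buffer (window) size in bytes needed to sustain rate_mbit at a given RTT."""
    return rate_mbit * 1e6 * (rtt_ms / 1000.0) / 8


# Upper bound for a 64 KB window at 50 ms RTT (assuming 1 KB = 1024 byte).
print(f"{max_data_rate_mbit(64 * 1024, 50):.2f} Mbit/s")

# Window needed for the 40 Mbit/s example at 100 ms and 400 ms RTT.
for rtt in (100, 400):
    print(f"RTT {rtt} ms: {required_buffer_bytes(40, rtt) / 1e6:.1f} MB")
```

Under these assumptions, 40 Mbit/s needs a window of about 0.5 MB at 100 ms RTT and about 2 MB at 400 ms RTT. The formula is only an upper bound; the measured rates in the table stay below it.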

15

Step 1 Measurements Summary

• Linear increase of data rate with CLTU length
  • sending as fast as possible
  • no network delay
  • Constant Processing Time

• Best results with
  • Optimised Code
    - 5 to 10 % performance increase (optimised SLE API only)
  • Nagle and Delayed Ack. switched off
    - (factor 2.5 lower when Nagle Alg. and Delayed Ack. are both on)
  • No network delay

• Network delay 200 ms (RTT 400 ms)
  • Performance decrease by a factor of 400 compared to Measurement 2 (the best one)
  • Maximum Data Rate = Buffer size/RTT

• We have to pay attention to the size of the CLTU

16

What is the Cost of Confirmed Operations

Data unit size = 8000 byte
• CLTU: 207.57 Mbps
• RAF: 318.32 Mbps (Frame size 8000 byte, 1 frame/buffer)
• Increase by 53%

Data unit size = 2000 byte
• CLTU: 53.36 Mbps
• RAF: 85.64 Mbps (Frame size 2000 byte, 1 frame/buffer)
• Increase by 60%
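
The quoted increases follow directly from the two throughput figures for each data unit size. A short worked check, with a hypothetical helper name:

```python
def relative_increase_percent(cltu_mbps: float, raf_mbps: float) -> float:
    """Throughput gain of RAF over CLTU, in percent of the CLTU value."""
    return (raf_mbps / cltu_mbps - 1.0) * 100.0


for size, cltu, raf in ((8000, 207.57, 318.32), (2000, 53.36, 85.64)):
    gain = relative_increase_percent(cltu, raf)
    print(f"{size} byte: RAF is {gain:.1f} % faster than CLTU")
```

Rounded to whole numbers, the results are the 53% and 60% given on the slide.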

17

Effects of Buffering (RAF)

Frame Size [byte]  Frames/Buffer  Mbps    Frames/s  ms/frame
8000               1              322.79  5,120     0.195
4000               2              244.41  7,903     0.126
2000               4              167.02  10,845    0.092
1000               8              108.18  13,310    0.075
800                10             88.67   15,102    0.066
400                20             46.25   17,128    0.058
200                40             27.05   18,969    0.052
100                80             13.78   19,122    0.052

Concatenation of 80 frames of 100 byte into a buffer back-to-back and then passed to the API as one frame: 322.43 Mbps

Frame size = 2000 byte, 1 Frame/buffer: 85.64 Mbps

18

Same in Graphical Presentation

[Figure: Effect of Buffering. Chart over the configurations "frame size - frames/buffer": 8000 - 1, 4000 - 2, 2000 - 4, 1000 - 8, 800 - 10, 400 - 20, 200 - 40, 100 - 80. Vertical axis from 0 to 350000.]

19

RAF Measurement Configuration

[Figure: RAF measurement configuration. Components: Frame Generator, SLE Service Provider Application with SLE API, Communication Server, SLE API with SLE Service User Application. Link labels: frame, transfer buffer, TCP (local), TCP.]

20

Cost of ASN.1 Encoding (RAF)

• Result of profiling for RAF, frame size 100 byte, 80 frames per buffer:
  • Encoding of Transfer Buffer including all contained frames: 6.42%
  • Encoding of Transfer Buffer Invocation alone: 2.31%

• Effects might be caused by increased interactions / interrupts, etc.

21

Summary of Observations

• Size of the data unit transferred has a significant impact
  • Almost constant end-to-end processing time independent of buffer size
  • Linear increase of net bitrate with data unit size

• Large impact of network delay due to TCP (expected)

• Significant additional cost of using confirmed operations

• Buffering of frames vs. transfer in individual frames
  • 4 frames of 2K per buffer vs. single 2K frames: factor 1.9
  • BUT: throughput for a single large data unit is much larger than for a buffer of the same size containing multiple small units

• ASN.1 encoding for worst case test accounts for 6.4% of overall local processing time