
Network Power Users
The Case for Jumbo Packets with WestGrid Examples

9k APP Project

Rutherford Research / Apparent Networks

What are Jumbo Frames & What is 9k MTU?

Frame layout: PRE | MAC/LLC | IP Header | TCP Header | Payload Data | FCS | IFG
(PRE = preamble, FCS = frame check sequence, IFG = inter-frame gap)

OSI layers: 7 Application, 6 Presentation, 5 Session, 4 Transport, 3 Network, 2 Data Link, 1 Physical

• Maximum Segment Size (MSS) = payload data: 1460 bytes
• Maximum Transmission Unit (MTU) = packet (IP header + TCP header + payload data): 1500 bytes
• Frame (MAC/LLC header + packet + FCS): 1518 bytes
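The size relationships on this slide can be checked with a little arithmetic. The sketch below assumes IPv4 and TCP headers of 20 bytes each (no options) and an untagged Ethernet frame with a 14-byte MAC header and 4-byte FCS, with preamble and inter-frame gap not counted; `frame_sizes` is an illustrative name. Under the same assumptions a 9000-byte (9k) MTU gives an 8960-byte MSS and a 9018-byte frame.

```python
"""Relate MSS, MTU and Ethernet frame size (untagged IPv4/TCP, no options)."""

IP_HEADER = 20    # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options
MAC_HEADER = 14   # bytes, destination MAC + source MAC + EtherType
FCS = 4           # bytes, frame check sequence (CRC-32)


def frame_sizes(mtu: int) -> dict:
    """Return MSS, packet (MTU) and frame sizes in bytes for a given IP MTU.

    Preamble (8 bytes) and inter-frame gap (12 bytes) use wire time but are
    not counted here, which matches the 1518-byte frame on the slide.
    """
    mss = mtu - IP_HEADER - TCP_HEADER
    frame = MAC_HEADER + mtu + FCS
    return {"mss": mss, "mtu": mtu, "frame": frame}


if __name__ == "__main__":
    for mtu in (1500, 9000):
        s = frame_sizes(mtu)
        print(f"MTU {s['mtu']:>5}: MSS {s['mss']:>5}  frame {s['frame']:>5}")
```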


9k MTU - APP - DDS Project Evolution…

9k MTU Project (4 yrs)
• Core R&E Router Troubleshooting
• Large Xfer Measurements, International Links
• Internet2-Sponsored Physics Participants
• Manual Lightpath: TRIUMF to CERN Test
• Few 9k Taps onto Core; Main Tap SDSC

9k APP Project (0.5 yrs)
• Lightpath R&E, Non-Routed… UCLP
• Build Dedicated 9k Links: ZX GBICs / quad gigE / campus MM
• CANARIE, NETERA, BCNET; 3 Universities; Only 9k Link on Campus
• Physics & Biochem Participants: HEPnet, WestGrid, TRIUMF
• 9k Node - HPC - 9k Node: Viz HPC 9k Tests, NFS HPC 9k Tests

9k DDS Project (2 yrs)
• Lightpath R&E: Clone-Tune 9k APP… Handoff
• Provide 9k Switch: Biochem at UVic, SFU, UofA
• Prelim 9k SOA Framework: Drug Discovery System Flow
• Tune 9k SOA Framework; Integrate 9k SOA Framework
• Demo 9k SOA-DDS Based Collab; 9k SOA-DDS Software Dev Examples
• Complete Handoff … ;-)


9K MTU Project - Results

[Figure: GigE 2-way bandwidth vs. MTU from Kansas City to various universities. x-axis: MTU size (bytes), 0 to 10000; y-axis: 2-way bandwidth (Mbps), 0 to 2000; series: 512, 1500 (standard), 2048, 3072, 4096, 5120, 6144, 7168, 8192 and 9000 byte MTU.]


How do Jumbo Packets Affect Bandwidth?

• If TCP window size and network capacity are not rate-limiting factors, then (roughly):

$$\text{e2e throughput} < \frac{0.7 \times \text{MSS}}{\text{RTT} \times \sqrt{p}}$$

  where MSS is the maximum segment size (set by the MTU), RTT is the round-trip time (latency) and p is the packet loss rate. (M. Mathis et al.)

• Double the MSS, double the throughput
• Effect of slow start?
• Effect of irregular flow?
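To see the "double the MSS, double the throughput" effect in numbers, the sketch below evaluates the Mathis et al. bound with the 0.7 constant used on this slide. The 20 ms round-trip time and one-in-a-million loss rate are hypothetical example values, not project measurements, and the bound ignores window limits, slow start and link capacity, exactly as the slide's precondition states.

```python
import math


def mathis_bound_bps(mss_bytes: int, rtt_s: float, loss: float, c: float = 0.7) -> float:
    """Upper bound on steady-state TCP throughput in bits/s (Mathis et al.).

    throughput < c * MSS / (RTT * sqrt(loss)); only meaningful when the TCP
    window and the network capacity are not the limiting factors.
    """
    if rtt_s <= 0 or not 0 < loss < 1:
        raise ValueError("need rtt_s > 0 and 0 < loss < 1")
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


if __name__ == "__main__":
    rtt = 0.020   # hypothetical 20 ms round trip
    loss = 1e-6   # hypothetical loss of one packet per million
    for mss in (1460, 8960):  # MSS for 1500- and 9000-byte MTUs
        mbps = mathis_bound_bps(mss, rtt, loss) / 1e6
        print(f"MSS {mss}: < {mbps:.0f} Mbit/s")
```

Because the bound is linear in MSS, moving from a 1460-byte to an 8960-byte MSS raises it by the same factor (about 6.1) whatever RTT and loss values are chosen.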


9k APP – Approaching Application Performance

• Need metrics for defining network impact on dependent applications

• Best current example – MOS (Mean Opinion Score) as an indicator of VoIP performance

• Models of network dependence required

• Applicable to QoS/SLA
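For VoIP, MOS is commonly estimated with the ITU-T G.107 E-model: network and codec impairments are combined into a transmission rating factor R, which is then mapped to MOS. The sketch below shows only that final R-to-MOS mapping; computing R from delay and loss is outside its scope, and `r_to_mos` is an illustrative name. It is offered as an example of the kind of application-level metric the slide calls for, not as part of the 9k APP method.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model rating factor R to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0   # worst possible quality
    if r >= 100:
        return 4.5   # ceiling of the E-model MOS scale
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


if __name__ == "__main__":
    for r in (50, 70, 80, 93.2):
        print(f"R = {r:>5}: MOS = {r_to_mos(r):.2f}")
```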


9k APP – User Experience

• User expectations of applications
• Examples:
  • Interaction with 3D models
  • Collaboration with multiple models/data/voice/video
  • Massive data set manipulation
  • Collaborative HPC simulations


9k APP - General Models of Network Dependency

• Near Real Time (nrt)
  • Congestion, Drop, MTU & Transit time sensitive
• Transactional (tr)
  • Congestion, MTU & Drop sensitive
• Data Transfer (dt)
  • Congestion, Drop, MTU & Transit time sensitive
• Best Effort (be)
  • Not sensitive
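The four dependency models amount to a small lookup table from application class to the network factors it is sensitive to. The sketch below encodes exactly the sensitivities listed on this slide; the `Factor` enum, the dictionary layout and the helper names are hypothetical choices made for illustration.

```python
from enum import Enum


class Factor(Enum):
    CONGESTION = "congestion"
    DROP = "drop"
    MTU = "MTU"
    TRANSIT_TIME = "transit time"


# Sensitivities as listed on the slide; the data layout itself is illustrative.
DEPENDENCY_MODELS = {
    "nrt": {Factor.CONGESTION, Factor.DROP, Factor.MTU, Factor.TRANSIT_TIME},
    "tr": {Factor.CONGESTION, Factor.MTU, Factor.DROP},
    "dt": {Factor.CONGESTION, Factor.DROP, Factor.MTU, Factor.TRANSIT_TIME},
    "be": set(),  # best effort: not sensitive
}


def sensitive_to(model: str, factor: Factor) -> bool:
    """True if the named application model is sensitive to the factor."""
    return factor in DEPENDENCY_MODELS[model]


def models_sensitive_to(factor: Factor) -> list:
    """Sorted names of all models sensitive to the factor."""
    return sorted(m for m, factors in DEPENDENCY_MODELS.items() if factor in factors)


if __name__ == "__main__":
    print("MTU-sensitive models:", models_sensitive_to(Factor.MTU))
```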


9k APP – Real-Time Applications

• Examples – IPTV and Voice-over-IP
• Requirement – support human interaction through highly subjective perceptual processes
• Nature – asynchronous, constant, low-rate, non-TCP streams
• Dependencies
  • Highly sensitive to bursty loss
  • Sensitive to latency, particularly in a conversational context
  • Robust to jitter, up to some limit


9k APP – Synchronous/Transactional

• Examples – interactive collaborative systems; distributed file systems

• Requirement – maintain some form of state at two or more remote locations

• Nature – intensive, bursty, synchronous traffic; varies from very small amounts of data to huge exchanges

• Dependencies – requires high, irregular transfer rates; highly sensitive to latency; intolerant of slow start


9k APP - Data Transfer

• Examples – FTP; data backup; emergency recovery

• Requirement – transfer massive amounts of data, as quickly as possible

• Nature – sustained one-way flows at maximum transfer rates

• Dependencies – sensitive to the characteristics of the end-host transmission protocols (i.e., TCP); require high capacity; high impact on other flows


9k APP - Best Effort

• Examples – e-mail, Web browsing and remote login

• Requirement – sufficient resources to maintain minimal state or connection

• Nature – largely stateless; low rates of data transfer required; not gated by human response

• Dependencies – no critical requirements for network responsiveness


9k APP - Choice of Application Performance

• Key performance factor – jumbo packets
• Previous 9k MTU testing applies to the simple data transfer use case
• Make the case for a more demanding 9k application performance category
• High performance:
  • Interactive/collaborative visualization
    • WestGrid visualization as exemplar
  • Distributed file systems
    • WestGrid Gridstore as exemplar


Application Performance in Context

[Diagram: the end-to-end data path seen by an application]
• Network: switch/router carrying 1.5k and 9k MTU segments on a VLAN; single-mode fibres, 10 km, 1310 nm, ~1000 megabytes/sec per 10 gigE port
• NIC: dual-port 10 Gigabit Ethernet NIC with rx FIFO buffer, tx FIFO buffer and programmable filter
• Bus: 64-bit parallel data bus, ~2000 megabytes/sec
• Host: 64-bit symmetric multiprocessor
• Software layers and their buffers: NIC driver (driver buffer), kernel daemon (kernel buffer per CPU), socket (socket buffer), application (application buffer)
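Every frame has to pass through the NIC FIFOs, the driver, the kernel and the socket layer shown above, so per-packet work scales with frame rate. The sketch below estimates frames per second on a fully loaded link at a 1500-byte versus a 9000-byte MTU, and the fraction of wire time carrying TCP payload. It assumes untagged Ethernet, 20-byte IPv4 and TCP headers, and counts the 8-byte preamble and 12-byte inter-frame gap as wire overhead; link rates are nominal, not measured.

```python
MAC_OVERHEAD = 14 + 4     # MAC header + FCS, bytes
WIRE_OVERHEAD = 8 + 12    # preamble + inter-frame gap, bytes
IP_TCP_HEADERS = 20 + 20  # IPv4 + TCP headers without options, bytes


def frames_per_second(link_bps: float, mtu: int) -> float:
    """Frames/s on a saturated Ethernet link carrying MTU-sized packets."""
    bytes_on_wire = mtu + MAC_OVERHEAD + WIRE_OVERHEAD
    return link_bps / (bytes_on_wire * 8)


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire time that carries TCP payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + MAC_OVERHEAD + WIRE_OVERHEAD)


if __name__ == "__main__":
    for rate, name in ((1e9, "gigE"), (10e9, "10 gigE")):
        for mtu in (1500, 9000):
            print(f"{name:>7} MTU {mtu}: {frames_per_second(rate, mtu):>9.0f} frames/s, "
                  f"{payload_efficiency(mtu):.1%} payload")
```

At either link rate the 9000-byte MTU cuts the frame rate by roughly a factor of six, while payload efficiency rises only from about 95% to about 99%; the reduced per-packet load on the host is the larger effect.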


9k APP Project Components

[Network diagram] ONS15454 nodes at Vancouver (CANARIE, BCNET), Calgary and Edmonton (CANARIE, Netera) and Victoria (CANARIE, BCNET), linked by UCLP STS-24c SONET lightpaths dropped onto gigE. Labelled maximum frame sizes: Cisco ONS 10,000 bytes; 10,239 bytes (near the Enterasys N7); SONET 9,216 bytes; Dell L3 9,018 bytes; Enterasys ER16 64,000 bytes; Cisco 6509 (IRMACS) 9,216 bytes.

Nodes (each with routed, lightpath and reference IPv4 addresses):
• UofA Physics, Linux behind Dell L3: lightpath.phys.ualberta.ca
  Routed IPv4 = 129.128.241.113, Lightpath IPv4 = 172.31.241.113, Reference IPv4 = 10.128.241.113
• UVic Network Services (HEPnet), Linux behind Cisco 6509: phys02.comp.uvic.ca
  Routed IPv4 = 142.104.21.13, Lightpath IPv4 = 172.31.21.13, Reference IPv4 = 10.104.21.13
• WestGrid at SFU, IBM p650 AIX on a dedicated gigE port: gridstore.westgrid.ca
  Routed IPv4 = 206.12.24.65, Lightpath IPv4 = 172.31.24.65, Reference IPv4 = 10.12.24.65
• IRMACS at SFU, SGI Onyx 3000 behind Cisco 6509 over MM fibre: vizserver.westgrid.ca
  Routed IPv4 = 206.12.24.8, Lightpath IPv4 = 172.31.24.8, Reference IPv4 = 10.12.24.8
• TRIUMF gigE; dual-port hosts; campus routers

Links and configuration:
• SM 22 km link with long-range ZX GBICs from CANARIE (Port_3 ZX)
• Lightpath hosts attached via the 172.31 VLAN
• Route all 172.31.0.0/16 (172.31) to one VLAN of 6 on a 4-port gigE link aggregator
• Phase 1: direct path from ER16 to Cisco 6509 (ZX GBIC slot not available on 6509)
• Phase 2: reconfigure N7 in series between ER16 and Cisco 6509

Note: possibly use unrouted black hole range 10.12.24.0/22 to shadow routed IPv4 from WestGrid 206.12.24.0/22 for lightpaths ... ? ... use 172.31 due to conflicts ...

• Three networks: Netera, CA*net4, BCNET
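The addresses on this slide follow a visible pattern: each lightpath address keeps the last two octets of the routed address under 172.31.0.0/16, and each reference address keeps the last three octets under 10.0.0.0/8. The sketch below derives both from a routed address and checks membership in the 172.31.0.0/16 range routed to the lightpath VLAN. The mapping rule is inferred from the four hosts listed, not a documented project convention, and the helper names are hypothetical.

```python
import ipaddress

# Range routed to the lightpath VLAN on this slide.
LIGHTPATH_NET = ipaddress.ip_network("172.31.0.0/16")


def lightpath_addr(routed: str) -> ipaddress.IPv4Address:
    """172.31.<octet 3>.<octet 4> of the routed address (inferred pattern)."""
    o = ipaddress.IPv4Address(routed).packed  # 4 bytes, one per octet
    return ipaddress.IPv4Address(bytes([172, 31, o[2], o[3]]))


def reference_addr(routed: str) -> ipaddress.IPv4Address:
    """10.<octet 2>.<octet 3>.<octet 4> of the routed address (inferred pattern)."""
    o = ipaddress.IPv4Address(routed).packed
    return ipaddress.IPv4Address(bytes([10, o[1], o[2], o[3]]))


if __name__ == "__main__":
    hosts = {
        "lightpath.phys.ualberta.ca": "129.128.241.113",
        "phys02.comp.uvic.ca": "142.104.21.13",
        "gridstore.westgrid.ca": "206.12.24.65",
        "vizserver.westgrid.ca": "206.12.24.8",
    }
    for name, routed in hosts.items():
        lp = lightpath_addr(routed)
        print(name, routed, lp, reference_addr(routed), lp in LIGHTPATH_NET)
```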


9k APP Project Sites & Nodes

• Three sites: UofA, SFU, UVic
• Four nodes
  • TRIUMF
  • HEPnet
  • Vizserver
  • Gridstore
• Why these?
  • Physics expertise
  • 9k lightpath ready
  • WestGrid applications
    • vizserver
    • gridstore


9k APP Project - Define 9K paths

• Two phases: Viz & Grid
• Phase 1 – Viz: TRIUMF, HEPnet, Vizserver
• Phase 2 – Grid: TRIUMF, HEPnet, Gridstore


9k APP Project - Phase 1 - Viz

• Vizserver session – optional compression
  • IRIX OS – render pipe
  • TCP: vary MTU from 68 to 9000 bytes at server
  • Measure performance vs. MTU
• Vizclient session – wraps local OpenGL
  • Fully interactive local X session
  • Refresh requests in render pipe
• VMD: Visual Molecular Dynamics
  • X server & graphics calls to render pipe
  • Collaborative VMD sessions


9k APP Project - Network Build

• 3 lightpaths – UCLP – SONET
  • Each path STS-24c dropped on gigE
• New quad gigE blade in BCNET ONS
  • 2 new ZX GBICs
  • SM fibre run to SFU lit
• SFU campus network
  • ZX GBIC installed in Enterasys ER16
  • Phase 1 – Viz: MM fibre run direct to Cisco 6509
  • Phase 2 – Grid: MM fibre run via Enterasys N7


9k APP Project - Short Circuit Routing

[Diagram: Machine 0 and Machine 1, each dual port (NIC-0, NIC-1), running APP-0 and APP-1]
• Addresses: IPv4-0-0 and IPv4-0-1 on Machine 0; IPv4-1-0 and IPv4-1-1 on Machine 1
• Routed path: always on, 1.5k MTU
• Lightpath short circuit: on/off, 9k MTU
• Connection stack: APP-Socket-0-1, VM/OS-Socket-0-1, Handler TCP-Session-0-1, TCP-Socket/Port-0-1, LLC-Socket/Port/ARP-0-1, MAC-0-1
• Local map unrouted black hole IPv4 to start ... try routed switchover later ... ?
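The short-circuit idea can be sketched as a per-peer choice made when a new connection is opened: use the lightpath (9k) address while the short circuit is switched on, and fall back to the always-on routed (1.5k) path otherwise. Everything below is a hypothetical illustration: the `Peer` type, the `lightpath_up` flag and how it would be maintained are assumptions, not the project's design; in practice this would more likely live in routing tables or a local host map than in application code.

```python
from dataclasses import dataclass

ROUTED_MTU = 1500     # standard routed path
LIGHTPATH_MTU = 9000  # jumbo-frame lightpath


@dataclass
class Peer:
    name: str
    routed_ip: str      # reached via the campus router (always on)
    lightpath_ip: str   # reached via the dedicated lightpath port
    lightpath_up: bool  # hypothetical on/off state of the short circuit


def choose_path(peer: Peer) -> tuple:
    """Return (destination address, expected MTU) for a new connection."""
    if peer.lightpath_up:
        return peer.lightpath_ip, LIGHTPATH_MTU
    return peer.routed_ip, ROUTED_MTU


if __name__ == "__main__":
    viz = Peer("vizserver.westgrid.ca", "206.12.24.8", "172.31.24.8", True)
    print(choose_path(viz))
    viz.lightpath_up = False  # short circuit switched off
    print(choose_path(viz))
```

Connections already open keep whichever path they started on; only new connections see the switch.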


9k APP Project – Testing Procedure - pMTU

[Network diagram] Lightpath topology used for pMTU testing: ONS15454 nodes at Vancouver (CANARIE, BCNET), Calgary and Edmonton (CANARIE, Netera) and Victoria (CANARIE, BCNET), linked by OC24 lightpaths dropped onto gigE. Labelled maximum frame sizes: Cisco ONS 10,000 bytes; Dell L3 9,018 bytes; Cisco 6509 9,216 bytes (plus a second 9,216-byte label); Enterasys ER16 64,000 bytes.

Nodes:
• UofA Physics, Linux behind Dell L3: lightpath.phys.ualberta.ca, Routed IPv4 = 129.128.241.113, Lightpath IPv4 = 10.128.241.113
• UVic (HEPnet), Linux behind Cisco 6509: phys02.comp.uvic.ca, Routed IPv4 = 142.104.21.13, Lightpath IPv4 = 10.104.21.13
• SFU IRMACS, SGI IRIX behind Cisco 6509: vizserver.westgrid.ca, Routed IPv4 = 206.12.24.8, Lightpath IPv4 = 10.12.24.8
• SFU, Linux behind Enterasys ER16: Routed IPv4 = 206.12.24-27.XXX, Lightpath IPv4 = 10.12.24-27.XXX
• TRIUMF

Probe system: Global Academic Probe Database, Analysis Server, Application Server, GUI and four Sequencers driving four Apparent Networks MTU Probes (MTU steps 68, 576, 1000, 1500, 3000, 4000, 5000, 6000, 7000, 8000, 9000 bytes). Probe Reports are built from accumulated data from the 9k MTU and 9k APP Projects.
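The MTU probe steps on this slide (68 up to 9000 bytes) suggest a stepwise search for the largest packet the path carries end to end. The sketch below simulates that search against a list of per-hop MTUs; the hop values are hypothetical, and `passes` stands in for a real probe (for example a packet sent with the Don't Fragment bit set), which is not implemented here. It is not the Apparent Networks probe algorithm.

```python
PROBE_SIZES = [68, 576, 1000, 1500, 3000, 4000, 5000, 6000, 7000, 8000, 9000]


def passes(size: int, hop_mtus: list) -> bool:
    """Simulated probe: a DF packet gets through only if every hop accepts it."""
    return all(size <= mtu for mtu in hop_mtus)


def probe_pmtu(hop_mtus: list, sizes=PROBE_SIZES) -> int:
    """Largest probe size that passes, or 0 if even the smallest fails."""
    best = 0
    for size in sorted(sizes):
        if not passes(size, hop_mtus):
            break
        best = size
    return best


def bottleneck_hop(hop_mtus: list) -> int:
    """Index of the hop with the smallest MTU, i.e. the hop limiting the path."""
    return min(range(len(hop_mtus)), key=hop_mtus.__getitem__)


if __name__ == "__main__":
    path = [9000, 9000, 1500, 9000]  # hypothetical: one hop still at 1500 bytes
    print(probe_pmtu(path), bottleneck_hop(path))
```

Because the probe steps are coarse, the result is a lower bound on the true path MTU; a path limited to 8500 bytes, for instance, would report 8000.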


9k APP Project – pMTU Issues

• As MTU increases, and increasingly varies between hops:
  • determine the optimal pMTU before, and changes during, application use
  • locate problem hops with unusual behavior
• Is a larger effective lower-layer pMTU actually better from the application perspective?
• Are packets actually sized appropriately by the packetization layer, given a larger pMTU?
• What are some effects of a larger pMTU under congestion conditions?


9k APP Project - Phase 2 - Grid

• Distributed file system
  • Possible candidates: CXFS, GPFS, NFSv4
  • Possible clients/servers: SGI IRIX, IBM AIX, Linux
• Preliminary model
  • NFSv4 on Linux & IBM AIX (gridstore)
• Primary use cases
  • File sharing – massive data sets
  • Physics
  • Bioinformatics


9k APP Project - Grid NFS Application

• NFS session
  • AIX NFSv4 server
  • Linux NFSv4 clients
  • Over TCP, not UDP
• NFS server – wraps local HPC file system
  • Fully interactive local file system session
  • Fast metadata updates for directory browsing


9k APP Project – Grid Network & Testing

• Network reconfigured with intermediate link
  • Viz performance rechecked
  • Gridstore VLAN setup
  • Fractional quad gigE
• Test via NFStest suite (open source)
  • TCP: vary MTU from 68 to 9000 bytes at server
  • Probe network after MTU alteration
  • Measure performance vs. MTU
• Linux NFSv4 to AIX NFSv4
  • Probe while test in progress


9k APP Project - Grid - NFS - Tuning

• Objectives derived from WestGrid gridstore
• Baseline performance – simple NFS client
  • Session
  • TCP – NFSv4
  • NFStest suite
• Time to complete vs. MTU
• Individual test performance vs. MTU
• Blocksize and other tuning considerations
  • NFS filesystem mount options
  • Block size = 8192 bytes
  • Fragmentation factors
  • Native filesystem block size
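The 8192-byte NFS block size interacts directly with the MTU. Over TCP, one block travels in however many MSS-sized segments it needs; the sketch below counts them for a few MTUs, assuming 20-byte IPv4 and TCP headers and an isolated block. It ignores the RPC and NFS headers that accompany each block, so real counts can be slightly higher, and `segments_per_block` is an illustrative name.

```python
import math

NFS_BLOCK = 8192  # bytes, block size from the tuning slide


def segments_per_block(mtu: int, block: int = NFS_BLOCK) -> int:
    """TCP segments needed to carry one block of payload at a given MTU."""
    mss = mtu - 20 - 20  # IPv4 + TCP headers, no options
    return math.ceil(block / mss)


if __name__ == "__main__":
    for mtu in (576, 1500, 4096, 9000):
        print(f"MTU {mtu}: {segments_per_block(mtu)} segments per {NFS_BLOCK}-byte block")
```

Note that a 4096-byte MTU still needs three segments per block because of header overhead, while a 9000-byte MTU carries a whole block in one.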


9k APP Project

• Bill Rutherford (Rutherford Research/RRX – Project Coordinator)

• Loki Jorgenson (Apparent Networks/SFU – Project Coordinator)

• Thomas Tam (CANARIE/CA*net4 – CANARIE/UCLP Coordinator)

• Bryan Caron (TRIUMF/UofAlberta – TRIUMF/UCLP Coordinator)

• Randy Sobie (HEPnet/UVic – HEPnet President/Grid Integration)

• Brian Corrie (WestGrid/IRMACS/SFU - IRMACS Coordinator)

• Rob Ballantyne (IRMACS/SFU - IRMACS Network Coordinator)

• Martin Siegert (WestGrid/SFU – WestGrid/GridStore Coordinator)

• Dave Bickle (HEPnet/UVic – HEPnet Coordinator/Grid Integration)

• Ken Howard (Network Services/UVic – Network Coordinator)

• Peter van Epp (Network Services/SFU – Network Coordinator)


9k DDS Project – Drug Discovery System

• Based on 9k APP Project
  • Combined Physics, Grid, Bioinformatics
  • Joint development of network & software
• Share network expertise
• Help develop preliminary software
• SOA approach
  • Collaborative viz
  • Distributed file systems
  • Instrument interfaces
  • Grid integration
  • Lightpath integration


9k DDS Project – Network Overview

[Network diagram] The 9k APP lightpath topology extended to biochemistry labs. ONS15454 nodes at Vancouver (CANARIE, BCNET), Calgary and Edmonton (CANARIE, Netera) and Victoria (CANARIE, BCNET) are joined by UCLP STS-24c SONET lightpaths dropped onto gigE, plus an SM 22 km link with long-range ZX GBICs from CANARIE (Port_3 ZX). Labelled maximum frame sizes: Cisco ONS 10,000 bytes; 10,239 bytes (near the Enterasys N7); Dell L3 9,018 bytes; HEPnet 9,216 bytes; Enterasys ER16 64,000 bytes; Cisco 6509 (IRMACS) 9,216 bytes. Lightpath hosts are attached via the 10/8 VLAN.

Note: available bandwidth on the "lightpath" e2e is dependent on the configuration of the ONS15454 and the activity of its ports.

Physics and biochemistry nodes (dual-port Linux hosts, each also connected to a campus router):
• UofA Physics (TRIUMF gigE), Dell L3: lightpath.phys.ualberta.ca, Routed IPv4 = 129.128.241.113, Lightpath IPv4 = 10.128.241.113
• UofA Biochemistry (PENCE gigE): host supplied by PENCE; 9k switch supplied and set up by TRIUMF, reimbursed by 9k DDS Project; lightpath.pence.ca, Routed IPv4 = 129.128.139.2XX, Lightpath IPv4 = 10.128.139.2XX
• UVic Network Services (HEPnet), Cisco 6509: phys02.comp.uvic.ca, Routed IPv4 = 142.104.21.13, Lightpath IPv4 = 10.104.21.13
• UVic Biochemistry & Microbiology (TVBR): host supplied by TVBR; 9k switch supplied and set up by HEPnet, reimbursed by 9k DDS Project; lightpath.bioc.uvic.ca, Routed IPv4 = 142.104.33.XXX, Lightpath IPv4 = 10.104.33.XXX
• SFU Molecular Biology and Biochemistry (MBB): host supplied by MBB; 9k switch supplied and set up by WestGrid, reimbursed by 9k DDS Project; lightpath.mbb.sfu.ca, Routed IPv4 = 142.58.213.XXX, Lightpath IPv4 = 10.58.213.XXX

WestGrid nodes at SFU:
• WestGrid, IBM p650 AIX on a dedicated gigE port: gridstore.westgrid.ca, Routed IPv4 = 206.12.24.65, Lightpath IPv4 = 10.12.24.65
• IRMACS, SGI Onyx 3000 behind Cisco 6509 over MM fibre: vizserver.westgrid.ca, Routed IPv4 = 206.12.24.8, Lightpath IPv4 = 10.12.24.8

Routing and phasing:
• Route all 10.0.0.0/8 (10/8) to a dedicated port of an 8-port gigE link aggregator
• Route all 10.0.0.0/8 (10/8) to one VLAN of 6 on a 4-port gigE link aggregator
• 9k APP phase 1: direct path from ER16 to Cisco 6509 (ZX GBIC slot not available on 6509)
• 9k APP phase 2: reconfigure N7 in series between ER16 and Cisco 6509
• 9k DDS phase 1: tap the TRIUMF, HEPnet and WestGrid-IRMACS gigE ports from the ONS for setup
• 9k DDS phase 2: reconfigure to dedicated gigE ports from the ONS

Note: possibly use unrouted black hole range 10.12.24.0/22 to shadow routed IPv4 from WestGrid 206.12.24.0/22 for lightpaths ... ?


Future – Performance Profiles by Application

• APP Network Performance Profile
  • Build up statistical APP profiles
  • Use APP profiles to optimize context
• Next Generation Router Design
  • Use APP profiles
  • Allocate resources
  • Design microflow queues
  • Identify MTU issues
  • Dynamically configure path mechanics


End of Presentation

Note: 9k APP Project Meeting in Room 1535 at 3:00


BCNET ANC 2005 Outline…


Outline

• Short overview of previous 9K work
  • What is 9k?
  • 9k XXX Project snapshots?
  • Bandwidth value … example data?
  • How does it affect bandwidth? – equations?
• Application performance – definition
  • User experience … limits?
  • Near Real-time (nrt)
  • Transactional (tr)
  • Bulk transfer (bt)
  • Best-effort (be)
• Value of MOS … expand? to the VoIP industry
• Project objectives
  • Isolate a single simple performance factor – packet size?
  • Identify prospective applications
    • Interactive collaborative visualization (nrt)
    • Distributed file system (tr + bt + nrt[metadata])
  • Characterize application performance in context … stack mechanics … grid integration?


Outline – cont.

• Project components
  • Three sites: UofA, SFU, UVic
  • Why these sites … only 9k available + RTT factor?
  • Three networks: BCNET, Netera, CA*net4
  • WestGrid applications
    • Brian’s WestGrid vizserver … IRMACS
    • Martin’s WestGrid grid storage
• Define 9K paths
• Application profiling
  • Visualization server – phase 1
    • Identify system
    • Define primary use case … why VMD … why collab?
    • Define network profile … describe nw build?
    • Identify testing procedure … pMTU tests … issues


Outline – cont.

  • Distributed file system – phase 2
    • Identify candidates – CXFS, GPFS, NFSv3-4
    • Identify possible clients/servers
    • Define primary use cases … physics … DDS file sharing
    • Define network profile
    • Identify testing procedure (NFStest)
    • Basic types of test … ?
• Futures
  • 9k DDS … trend to separate purpose nw … key role of UCLP
  • Performance profiles by application … ???
  • Ng rtr … mtu issues … app microflow queues?
• Credits
• Meeting Reminder


BCNET ANC 9k APP Project Meeting Agenda – April 26 3-4pm

• Review of current status
  • Lightpaths
  • IRMACS
  • Gridstore
  • HEPnet
  • TRIUMF
• Viz test
  • Network
  • Probes & schedule
  • Tests
  • Demos (incl. special medicine collab demo UVic – UofA)
• Gridstore test
  • Network
  • Probes & schedule
  • Tests
  • Demos
• Follow on
  • 9k DDS preliminary
  • UCLP integration … ?
  • SOA ideas … ?