Lambda Station: On-demand flow based routing for data intensive GRID applications over multitopology networks Fermi National Accelerator Laboratory Don Petravick (PI), Phil Demar, Matt Crawford, Andrey Bobyshev, Maxim Grigoriev California Institute of Technology. Harvey Newman (co-PI) ESCC / Internet2 Joint Techs Workshop Fermilab, July 15-19, 2007


Lambda Station: On-demand flow based routing for data intensive GRID applications over multitopology networks

Fermi National Accelerator Laboratory Don Petravick (PI), Phil Demar, Matt Crawford, Andrey Bobyshev, Maxim Grigoriev

California Institute of Technology. Harvey Newman (co-PI)

ESCC/Internet2 Joint Techs Workshop

Fermilab, July 15-19, 2007

Outline of the talk

● goals and major directions of the project

● software architecture, API, middleware

● results of using Lambda Station service

● problems and challenges, plans

Basic terms

● Lambda Station – a host running special software that controls traffic paths across the LAN and WAN on demand from applications

● PBR – policy-based routing

● PBR Client – a system or cluster, together with the applications running on it, that sources traffic flows which can be subject to policy-based routing

● Flow – a stream of packets sharing some attributes, such as endpoint IP addresses (or ranges of addresses), protocols, protocol ports where applicable, and differentiated services code point (DSCP)
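
As a minimal sketch of the flow definition above, the following Python models a flow selector and checks whether a packet's header fields fall inside it. The class name `FlowSpec`, its fields, and the sample addresses are illustrative assumptions; they are not part of the Lambda Station software.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Tuple


@dataclass(frozen=True)
class FlowSpec:
    """Hypothetical flow selector: an attribute left as None acts as a wildcard."""
    src_net: Optional[str] = None                 # e.g. "192.0.2.0/24"
    dst_net: Optional[str] = None
    protocol: Optional[str] = None                # e.g. "tcp"
    dst_ports: Optional[Tuple[int, int]] = None   # inclusive port range
    dscp: Optional[int] = None                    # 0..63

    def matches(self, src, dst, protocol, dst_port, dscp):
        """Return True if the given packet attributes fall inside this selector."""
        if self.src_net and ip_address(src) not in ip_network(self.src_net):
            return False
        if self.dst_net and ip_address(dst) not in ip_network(self.dst_net):
            return False
        if self.protocol and protocol != self.protocol:
            return False
        if self.dst_ports and not (self.dst_ports[0] <= dst_port <= self.dst_ports[1]):
            return False
        if self.dscp is not None and dscp != self.dscp:
            return False
        return True


# Example selector: TCP from one netblock, a small port range, one DSCP value.
spec = FlowSpec(src_net="192.0.2.0/24", protocol="tcp", dst_ports=(5000, 5010), dscp=32)
print(spec.matches("192.0.2.17", "198.51.100.5", "tcp", 5001, 32))  # True
print(spec.matches("192.0.2.17", "198.51.100.5", "tcp", 5001, 0))   # False
```

Treating unset attributes as wildcards reflects the idea that any combination of attributes may be used to select a flow.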

The main goal of the Lambda Station project is to design, develop and deploy a network path selection service that interfaces production storage and computing facilities with advanced research networks.

– selective forwarding on a per flow basis

– alternate network paths for high impact data movement

– access control in site edge routers for those selected flows

– on-demand from applications (authentication & authorization)

– current implementation is based on policy-based routing and includes support for DSCP marking

The goal of the project... deals with the last-mile problem in local area networks

Flows and DSCP tagging

Any combination of flow attributes can be used by the Lambda Station software to identify flows on a per-ticket basis.

Typical steps of alternative path reservation:

● generate a request for service (application, application's proxy, admin)
● LS negotiates the service and its parameters with the remote site
● configure local and wide area network (not yet available)
● mark traffic (if specified)

The current LS software can complete all these steps within 3–5 minutes. That is why it is desirable to know the flow selection parameters before a transfer starts:

● endpoint IP addresses
● DSCP
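
The reservation steps and the lead time they imply can be sketched as follows. This is only an illustration: the stage names are hypothetical labels for the steps listed above, and the 5-minute constant is simply the upper bound quoted on the slide.

```python
from datetime import datetime, timedelta

# Hypothetical stage names mirroring the reservation steps listed above.
STAGES = ["requested", "negotiated", "network_configured", "marking_active"]

# The slide quotes 3 to 5 minutes for the whole sequence; use the upper bound.
SETUP_TIME = timedelta(minutes=5)


def next_stage(stage):
    """Return the stage that follows `stage`, or None after the last one."""
    i = STAGES.index(stage)
    return STAGES[i + 1] if i + 1 < len(STAGES) else None


def latest_request_time(transfer_start):
    """Latest moment a request can be placed so the path is ready at transfer start."""
    return transfer_start - SETUP_TIME


start = datetime(2007, 7, 16, 12, 0)
print(next_stage("requested"))        # negotiated
print(latest_request_time(start))     # 2007-07-16 11:55:00
```

An application that already knows its endpoint addresses and DSCP value can place the request this far ahead of the transfer, which is the reason for wanting the selection parameters early.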

Lambda Station Building Blocks

[Diagram labels: LS request interface; LS-Management & Reporting Interface; MySQL (requests, authorization); LS persistence; LS-to-LS (LS-2-LS) service; LS Controller; local definitions; online updates; NETWORK CONFIGURATOR; vendor-specific modules; CISCO; Force10; WAN; storage & application space management; LSInterface; RemoteLambdaStation; SOAP/JClarens; SOAP over HTTPS. Legend: Data Exchange; Control & Management]

Service-based architecture:

● LS Controller – manages LS persistence and controls the other services
● LS persistence – stores the current state of the system
● LS request interface – web service for placing all kinds of “ticket”-related requests
● LS-to-LS – web service for propagating LS definitions and for LS discovery
● NETWORK CONFIGURATOR – dynamic reconfiguration of the LAN and WAN

LS JAVA API Components

[Diagram labels: Tomcat + Axis container; JClarens; LS Service Request Interface; LS PBR Clients Interface; LS Controller; Client API; LS NetConfig; LS/PBR Config Manager]

Java implementation of LS Software

● Service-oriented architecture
● uses the JClarens and Axis frameworks as the web-services toolkit
● messages are defined and strictly validated by XML schema
● the LS service is multi-threaded: one thread for LSController, one thread for the LS2LS service, and a thread pool for openSvcTicket requests
● LS2LS and client-to-LS authentication is based on the gLite library and supports standard Grid proxies and KCA-issued certificates
● authorization is based on a rule set
● general framework persistence is provided by a MySQL DB backend
● secure document/literal wrapped SOAP messages, Web Services Interoperability Profile (WS-I Basic Profile Version 1.1)
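
The threading layout described above (one thread for the controller, one for the LS2LS service, and a pool for openSvcTicket requests) can be sketched as follows. The real service is written in Java; this Python version only illustrates the structure, and every function name, the placeholder loops, and the pool size are assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_open_svc_ticket(request_id):
    """Placeholder for the work done per openSvcTicket request (hypothetical)."""
    return f"ticket-{request_id}"


def run_service(request_ids, pool_size=4):
    """Run two long-lived service threads plus a worker pool for ticket requests."""
    stop = threading.Event()

    def service_loop():
        # Stand-in for a long-running service; here it just idles until shutdown.
        stop.wait()

    controller = threading.Thread(target=service_loop, name="LSController")
    ls2ls = threading.Thread(target=service_loop, name="LS2LS")
    controller.start()
    ls2ls.start()

    # Ticket requests are handled concurrently by the pool; map keeps input order.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(handle_open_svc_ticket, request_ids))

    stop.set()          # signal both service threads to finish
    controller.join()
    ls2ls.join()
    return results


print(run_service([1, 2, 3]))   # ['ticket-1', 'ticket-2', 'ticket-3']
```

Keeping ticket handling in a pool means a slow request does not block the controller or the LS2LS exchange.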

Java implementation of LS Software (continued)

● Automated LS and PBR client configuration management
● Automated deployment (can be installed on any Linux box)
● LS Controller, LS2LS, LS AAA and the LS client interface are ready for deployment. Java and Perl clients are supported.
● Some interest from ANL in supporting a C client for the Globus Toolkit
● Network Config calls are implemented in the interface and may relay requests to the Perl service (SOA at work)
● Currently deployed and working (exchanging PBR and LS configurations) at FNAL (2 stations) and Caltech (1 station)

DSCP Tagging

Complexity of using DSCP tagging:

● preservation of DSCP is not guaranteed in WAN

● DSCP tagging needs to be synchronized between sites for dynamically configurable networks (asymmetry is bad for high-performance transfers)

LS software supports two modes of DSCP tagging:

● fixed DSCP values that identify a site's traffic

● a DSCP value assigned dynamically on a per-ticket basis
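
The two tagging modes can be illustrated with a small allocator. This is a sketch under stated assumptions: the class `DscpAllocator`, its methods, and the sample DSCP values are hypothetical and are not taken from the LS software.

```python
class DscpAllocator:
    """Hypothetical allocator illustrating fixed and per-ticket DSCP assignment."""

    def __init__(self, fixed_value=None, dynamic_pool=()):
        for value in ([fixed_value] if fixed_value is not None else []) + list(dynamic_pool):
            if not 0 <= value <= 63:          # DSCP is a 6-bit field
                raise ValueError(f"DSCP out of range: {value}")
        self.fixed_value = fixed_value
        self.free = list(dynamic_pool)        # values not currently bound to a ticket
        self.by_ticket = {}

    def assign(self, ticket_id):
        """Return the DSCP value to use for this ticket."""
        if self.fixed_value is not None:
            return self.fixed_value           # fixed mode: one value for all site traffic
        if ticket_id not in self.by_ticket:
            if not self.free:
                raise RuntimeError("no free DSCP values")
            self.by_ticket[ticket_id] = self.free.pop(0)
        return self.by_ticket[ticket_id]      # dynamic mode: one value per ticket

    def release(self, ticket_id):
        """Return a ticket's value to the pool when the ticket expires."""
        value = self.by_ticket.pop(ticket_id, None)
        if value is not None:
            self.free.append(value)


fixed = DscpAllocator(fixed_value=40)
dynamic = DscpAllocator(dynamic_pool=[10, 12])
print(fixed.assign("t1"), fixed.assign("t2"))      # 40 40
print(dynamic.assign("t1"), dynamic.assign("t2"))  # 10 12
```

In the dynamic mode both sites have to agree on the value bound to a ticket, which is the synchronization concern noted above.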

Netconfig Module

● dynamically modifies the configuration of local network devices
● a vendor-dependent component
● Cisco routers with an IOS version supporting sequenced entries in named ACLs

Tasks to configure PBR on Cisco devices:

● the interface on which PBR is applied needs to be configured with an “ip policy route-map” statement
● the route map needs to be configured as an ordered list of match/action statements
● match criteria need to be associated with ACLs
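
The three configuration tasks above can be sketched by generating the corresponding IOS statements from a list of flows. This is a minimal illustration, not the Netconfig module: the function names, ACL and route-map names, interface, and addresses are all assumptions, and only standard IOS statements (`ip access-list extended`, `route-map`, `match ip address`, `set ip next-hop`, `ip policy route-map`) are emitted.

```python
from ipaddress import ip_network


def acl_entry(seq, protocol, src_cidr, dst_cidr):
    """Build one sequenced entry of a named extended ACL (IOS uses wildcard masks)."""
    src, dst = ip_network(src_cidr), ip_network(dst_cidr)
    return (f" {seq} permit {protocol} "
            f"{src.network_address} {src.hostmask} "
            f"{dst.network_address} {dst.hostmask}")


def pbr_config(interface, route_map, acl_name, next_hop, flows):
    """Return IOS configuration text: ACL, route map, and interface policy."""
    lines = [f"ip access-list extended {acl_name}"]
    for i, (proto, src, dst) in enumerate(flows, start=1):
        lines.append(acl_entry(10 * i, proto, src, dst))   # sequence numbers 10, 20, ...
    lines += [
        f"route-map {route_map} permit 10",
        f" match ip address {acl_name}",     # match criteria tied to the ACL
        f" set ip next-hop {next_hop}",      # action: forward via the alternate path
        f"interface {interface}",
        f" ip policy route-map {route_map}", # apply PBR on the ingress interface
    ]
    return "\n".join(lines)


cfg = pbr_config(
    interface="TenGigabitEthernet1/1",
    route_map="LS-PBR",
    acl_name="LS-TICKET-42",
    next_hop="203.0.113.1",
    flows=[("tcp", "192.0.2.0/24", "198.51.100.0/24")],
)
print(cfg)
```

Sequenced ACL entries matter here because individual flows can be added or removed by sequence number without rewriting the whole list.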

openSvcTicket request

Operational modes (subject to authentication and authorization):

● new ticket – creates a new ticket
● join – provides the ID and required parameters (DSCP) of an already opened ticket
● extend ticket – extends an already created ticket

Accepts many input parameters, most of which can also be determined automatically:

● remote LambdaStation site identifiers
● source/destination PBRClient identifier and/or list of IP netblocks
● IP protocols, protocol ports or port range
● localPath, remotePath identifiers (subject to available resources)
● bandwidth out, bandwidth in
● boardTime, startTime, endTime, travelTime

Returns the ID of the locally assigned ticket.
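
A client-side view of an openSvcTicket request might look like the sketch below. It is not the actual SOAP schema or client API: the class `SvcTicketRequest`, its snake_case field names, the site identifier, and the validation rules are assumptions that loosely mirror the parameters listed on this slide.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime
from typing import List, Optional


@dataclass
class SvcTicketRequest:
    """Hypothetical request object; unset fields are left for the service to determine."""
    remote_site: str
    src_netblocks: List[str] = field(default_factory=list)
    dst_netblocks: List[str] = field(default_factory=list)
    protocol: Optional[str] = None
    ports: Optional[str] = None              # single port or range, e.g. "5000-5010"
    local_path: Optional[str] = None
    remote_path: Optional[str] = None
    bandwidth_out: Optional[int] = None
    bandwidth_in: Optional[int] = None
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None

    def validate(self):
        """Basic sanity checks before the request is sent."""
        if not self.remote_site:
            raise ValueError("remote site identifier is required")
        if self.start_time and self.end_time and self.end_time <= self.start_time:
            raise ValueError("endTime must be after startTime")
        return self

    def to_payload(self):
        """Drop unset fields so only explicitly chosen parameters are sent."""
        return {k: v for k, v in asdict(self).items() if v not in (None, [])}


req = SvcTicketRequest(remote_site="caltech", src_netblocks=["192.0.2.0/24"], protocol="tcp")
print(req.validate().to_payload())
```

Omitting unset fields from the payload matches the idea that most parameters can be determined automatically.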

LSiperf End-to-End Test

1. Data transfer started:
– 10GE host; 5 TCP streams
– Network path is via ESnet
– OC12 bottleneck…
– Path MTU is 1500B
– LambdaStation service ticket is opened

2. LambdaStation changes network path to USN

3. Host path MTUD check detects a larger path MTU

4. LambdaStation service ticket expires:
– Network path changed back to ESnet

[Figure: test plot annotated with markers 1–4 corresponding to the steps above]

Lambda Station TestBed

SRM/dCache 1.7 LS-awareness

[Diagram labels: Wide Area Network; Advanced Networks; StarLight; Site Network; USCMS Tier1; CMS core router; FNAL LambdaStation; CMS SRM; CalTech; CalTech LambdaStation; SRM. Legend: normal traffic flow; high-impact traffic]

The production USCMS SRM server sends requests to Lambda Station to steer high-impact traffic onto the advanced network infrastructure.

How can an increase in transfer rate be demonstrated? In tests we could generate a data stream and monitor its rate while switching traffic between alternative paths. Production traffic is not deterministic.

Project accomplishments

● Software version 1.0 (a fully functional prototype supporting the whole cycle of LS functionality)
● positive test results between Fermilab and Caltech
● lsiperf, lsTraceroute – wrappers around well-known applications that add Lambda Station awareness (based on prototype version 1.0)
● SRM/dCache integration – tested and added in the production 1.7.0 release
● LS-aware SRM/dCache running on the production cluster at Fermilab
● Interoperable Java implementation of the major components of LambdaStation (Perl and Java clients available)

Problems and challenges

● Traffic asymmetry is bad for high-performance applications
● Network (Lambda Station) awareness is too complex
● Defining PBR Clients is a complex issue; automatic definition is not yet available

References: http://www.lambdastation.org/

Plans

● release a fully functional Java LS
● SRM/dCache with production-quality LS support
● add real-time monitoring of resources (perfSONAR)
● add a WAN control plane module
● integration with OSCARS (unified Network Path Reservation Model?)

END

Miscellaneous Slides

● Simulation of multiple LSs
● screenshots of the ticket queue
● SC05 Demo

openSvcTicket request

Operational modes (subject to authentication, authorization and quoting):

● new ticket – creates a new ticket
● join – provides the ID and required parameters (DSCP) of an already opened ticket
● extend ticket – extends an already created ticket

Accepts many input parameters, most of which can also be determined automatically:

● Dst site identifiers
● Src/Dst PBRClient identifier and/or list of Src/Dst IP addresses (CIDR)
● IP protocols, protocol ports
● localPath, remotePath identifiers
● BWout, BWin
● boardTime, startTime, endTime

Returns the ID of the locally assigned ticket.

Directions of the project.

● building a wide-area testbed infrastructure
● designing and developing Lambda Station software and Lambda Station (network) aware applications
● researching the effects of flow-based switching on applications

Application's behavior in flow based switching environment

Tuning end systems for maximum rates is not a subject of the Lambda Station project; however, we need to see an increase in data movement performance when flows are selectively switched.

● DSCP tagging with iptables
● switching between two paths with different MTUs
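
Host-side DSCP tagging with iptables can be sketched by building the rule as an argument list. The helper name and the sample destination are assumptions; the rule itself uses the standard `DSCP` target with `--set-dscp` in the `mangle` table. The sketch only constructs and prints the command, since applying it requires root privileges.

```python
def iptables_dscp_rule(dst_cidr, dscp, protocol="tcp"):
    """Build an iptables command that marks outgoing packets to dst_cidr with a DSCP value."""
    if not 0 <= dscp <= 63:                 # DSCP is a 6-bit field
        raise ValueError(f"DSCP out of range: {dscp}")
    return [
        "iptables", "-t", "mangle",         # packet-altering rules live in the mangle table
        "-A", "OUTPUT",                     # locally generated traffic
        "-p", protocol,
        "-d", dst_cidr,
        "-j", "DSCP", "--set-dscp", str(dscp),
    ]


cmd = iptables_dscp_rule("198.51.100.0/24", 32)
print(" ".join(cmd))
# iptables -t mangle -A OUTPUT -p tcp -d 198.51.100.0/24 -j DSCP --set-dscp 32
```

Returning a list rather than a string lets a caller pass it directly to `subprocess.run` without shell quoting.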

Effect of DSCP tagging with IPTables

Motivation of the project

● unprecedented demands for data movement in physics experiments such as CDF, D0, BaBar and the coming LHC experiments
● massive, globally distributed datasets growing to 100 petabytes by 2010
● collaborative data analysis by global communities of thousands of scientists
● available data communication technology will not be able to satisfy these demands simply by increasing bandwidth in LANs and WANs, due to technology limitations and high deployment and operation costs
● advanced research optical networks – greater capacity, but no production quality of service, and not universally available for all pairs of communicating endpoints

LS multitopology network model

[Diagram labels: Multiple Network Topologies (Red, Green, Blue); Admission Group of network devices; PBR-client A; PBR-client B; PBR-client; hosts H1 H2 H3; routers RT1 RT2 RT3; NG-A; NG-B; NG-C; NG-ADM; PBR-clients or regular clients at the remote sites; BLUE is Production Path; ClientA rules for RED & GREEN topologies; Client A, RED & GREEN rules for NGC; RED-ClientA-IN; GREEN-ClientB-IN; RED-ClientB-IN; GREEN-ClientB-IN; RED-B-IN; GREEN-OUT; RED-OUT]

Cisco IOS, dynamic configuring of PBR, extended sequencing ACLs + access policy ACLs

LambdaStation SC05 Demo

[Diagram labels: lambdastation@FNAL; lambdastation@SC05; Fermilab; SC05/Seattle; nws-lab.fnal.gov; charley.fnal.gov; A122.302.sc05.org; A126.302.sc05.org; NAA-2-NAA protocol; LS-2-LS protocol; netconfig; lsiperf; srmcp; ls-request; reply; Commodity Internet/SCinet; SC05/HighSpeedLinks. Legend: default path; alternative path for specific flows only]

Note A: We believe it is a HW/ASIC problem with SNMP monitoring; from time to time an SNMP get returns the same counters as in the previous cycle.

PMTUD
