CHEETAH End-Host Software Design Specification

1. Introduction

1.1 Goals and Objectives

CHEETAH end-host software is part of our effort to develop the infrastructure and networking technologies to support a broad class of eScience projects, specifically the Terascale Supernova Initiative (TSI) [1]. The objectives of this work are to design and deploy a high-performance, experimental optical network infrastructure and to test application, middleware, and transport protocol software, developed specifically for eScience projects, on this network. Our two target applications are file transfers and remote visualization.

To meet the needs of these large-scale eScience projects, we propose an end-to-end optical networking solution called Circuit-switched High-speed End-to-End Transport ArcHitecture (CHEETAH) [2]. In this solution, high-speed optical circuits, consisting of a hybrid of high-speed Ethernet signals from/to end hosts within LANs and equivalent-rate Ethernet-over-SONET (EoS) circuits across the wide area, are set up and released dynamically. Once a circuit is established, large files can simply be streamed unhindered, resulting in low file transfer delays. Furthermore, there is almost no variation in the delays experienced by different data blocks sent over a circuit. Bit error rates on optical links are low, requiring few retransmissions, if any. Therefore, low end-to-end latency and jitter guarantees are possible in this networking solution.

To take advantage of the benefits of the CHEETAH service, software enhancements are needed at end hosts. These enhancements address research questions in various areas of networking, e.g., internetworking between different types of networks, transport protocols for dedicated end-to-end circuits, GMPLS-driven circuit provisioning, and routing. This document is a detailed design document for an implementation of the CHEETAH end-host software.

1.2 Assumptions and Constraints

The CHEETAH software discussed in this document is limited to the implementation on general-purpose computers (end hosts). Any discussion of network elements other than general-purpose end hosts is beyond the scope of this document.

1.3 Development Methods

The current CHEETAH end-host software is developed using C/C++ and Java on the Linux platform. A Win32 version will be developed later.

2. System Overview and System Architecture

An overview of the end-host CHEETAH software architecture is shown in Figure 1 (see footnote 1). We identify six basic modules for the CHEETAH end-host software: an Optical Connectivity Service (OCS) module, an Authentication, Authorization and Accounting (AAA) module, a router disconnect module, a routing decision module, an RSVP-TE signaling module, and a high-speed transport protocol (FRTP) module.

2.1 The functionality of each module

OCS module: Determines whether the correspondent end host can be reached by a direct end-to-end CHEETAH circuit through the circuit-switched network. OCS is important for enabling gradual growth in the number of CHEETAH users. If an end host with CHEETAH capability wants to communicate with an end host without such capability, it simply uses the Internet. If, through OCS, it determines that the correspondent host also has CHEETAH capability and is connected via the same optical circuit-switched network, it can use a CHEETAH circuit.

AAA module: Authenticates users, handles authorization requests, and collects accounting data.

Router disconnect module: To allow end-to-end CHEETAH circuits to be established, an enterprise needs to lease an optical circuit to the service provider’s optical circuit-switched network. It would be a waste to let this high-speed circuit lie unused when there is no CHEETAH communication. Hence we propose a scheme in which such leased circuits are used to interconnect enterprise IP routers with IP routers on Internet2, ESnet, or the Internet itself until a CHEETAH request appears. When it does, this router-to-router circuit is torn down by the router disconnect module and the CHEETAH circuit is established for the duration of the session.

Figure 1. End-host CHEETAH software architecture. Labels: SFTP, Web service, Videotelephony; End-host CHEETAH software; OCS/AAA; Router Disconnect; Routing decision; Signaling; TCP; FRTP; NIC I; NIC II; Primary TCP/IP path; End-to-end CHEETAH circuit.

Footnote 1. The CHEETAH end-host hardware configuration requires a secondary Ethernet NIC in each end host, which is accessible through the CHEETAH circuit-switched network.

Routing decision module: For communication between two entities that can be connected by a direct CHEETAH circuit, there is a choice of two paths: the primary TCP/IP path and an Ethernet/SONET circuit. The presence of two such paths raises the question of which path an end-host application should choose. It is not appropriate to attempt a circuit setup for all communication sessions. For example, for a small-file transfer (file size on the order of a few KB), the total delay incurred in setting up a circuit and then transferring the file could be larger than the delay incurred in directly using the TCP/IP path. Thus, a routing decision needs to be made at end hosts with access to the CHEETAH service.
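The small-file example can be made concrete with a deliberately simplified delay comparison. The sketch below is not the routing decision algorithm of [2]; it only assumes that the circuit delay is a fixed setup delay plus the transmission time at the circuit rate, and that the TCP/IP delay is the file size divided by an average throughput. All function names and the numbers in the usage note are illustrative assumptions.

```python
def circuit_delay(file_bytes: int, circuit_rate_bps: float, setup_delay_s: float) -> float:
    """Assumed model: circuit setup delay plus transmission time at the circuit rate."""
    return setup_delay_s + file_bytes * 8 / circuit_rate_bps


def tcp_delay(file_bytes: int, tcp_throughput_bps: float) -> float:
    """Assumed model: file size divided by the average TCP throughput."""
    return file_bytes * 8 / tcp_throughput_bps


def prefer_circuit(file_bytes: int, circuit_rate_bps: float,
                   setup_delay_s: float, tcp_throughput_bps: float) -> bool:
    """Return True when the modeled circuit delay is smaller than the TCP/IP delay."""
    return (circuit_delay(file_bytes, circuit_rate_bps, setup_delay_s)
            < tcp_delay(file_bytes, tcp_throughput_bps))
```

With an assumed 1 s setup delay, a 1 Gb/s circuit, and 10 Mb/s of TCP throughput, a 4 KB file takes about 1 s over the circuit but about 3 ms over TCP/IP, so the TCP/IP path is preferred; a 1 GB file takes about 9 s over the circuit versus about 800 s over TCP/IP, so the circuit is preferred.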

Signaling module: Initiates a call-setup request to the signaling-enabled network switches according to the GMPLS signaling standards.

Transport protocol module: For the actual data transfer on CHEETAH circuits, we recommend using a combination of a rate-based transport protocol on the unidirectional end-to-end Ethernet/SONET circuit from the server to the client and a TCP connection for the reverse direction through the IP network. Standard TCP is not well suited for end-to-end circuits [3], i.e., paths on which there are no packet switches, because of the congestion-control mechanisms built into standard TCP. This functionality is not only unnecessary if the end-to-end path is a circuit; it is also detrimental, because bit errors will be interpreted as congestion losses, causing variations in the sending rate. For full utilization of the circuit, what we need is a transport protocol that uses rate-based flow control and constantly sends data.
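The idea of sending at a constant rate can be illustrated by computing a fixed inter-packet gap. This is only a sketch of fixed-rate pacing, not the FRTP implementation; the function names are hypothetical, and the packet size and rate in the usage note are assumed values.

```python
from typing import List


def inter_packet_gap(packet_bytes: int, rate_bps: float) -> float:
    """Seconds between packet departures so that the sending rate equals rate_bps."""
    if packet_bytes <= 0 or rate_bps <= 0:
        raise ValueError("packet size and rate must be positive")
    return packet_bytes * 8 / rate_bps


def departure_times(num_packets: int, packet_bytes: int, rate_bps: float) -> List[float]:
    """Departure time of each packet, relative to the first, at a constant rate."""
    gap = inter_packet_gap(packet_bytes, rate_bps)
    return [i * gap for i in range(num_packets)]
```

For example, 1500-byte packets on a 1 Gb/s circuit leave every 12 microseconds, regardless of losses, which is the behavior that distinguishes rate-based sending from TCP's window adjustments.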

2.2 The work flow of end-host CHEETAH software

The user application (SFTP, a web application, videotelephony, etc.) shown in Figure 1 first calls the OCS module to determine whether the remote host has CHEETAH capability. If the remote end host does not have such a capability, the user application simply uses the Internet through the end host’s primary NIC. If, through OCS, it determines that the correspondent host also has CHEETAH capability and is connected via the same optical circuit-switched network, it calls the AAA module to authenticate the username and password and to authorize the user’s circuit setup request (the AAA module also records the user’s billing information if the subsequent circuit setup is successful). If the user is successfully authenticated and authorized by the AAA module, the application calls the routing decision module to decide whether to attempt a circuit setup. If the routing decision module decides to attempt a circuit setup, the CHEETAH signaling module initiates a call-setup request to the signaling-enabled network switches. If the circuit setup is successful, the CHEETAH software directs the user application to initiate data transfers on the dedicated CHEETAH circuit through the end host’s secondary NIC. Depending on the application, TCP or some other transport protocol could be used on the circuit. If, on the contrary, the routing decision module determines that the primary TCP/IP path is preferred, or if the circuit setup by the signaling module fails, the user application is directed to the primary TCP/IP path through the end host’s primary NIC.
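The work flow above amounts to a chain of checks in which any failure falls back to the primary path. The following sketch shows that control flow only; the names `Path` and `choose_path` are hypothetical, each module is stood in for by a callable returning a boolean, and falling back to the primary path when AAA fails is an assumption, since the text does not say what happens in that case.

```python
from enum import Enum
from typing import Callable


class Path(Enum):
    PRIMARY_TCP_IP = "primary NIC (TCP/IP path)"
    CHEETAH_CIRCUIT = "secondary NIC (CHEETAH circuit)"


def choose_path(
    ocs_reachable: Callable[[], bool],
    aaa_authorized: Callable[[], bool],
    circuit_recommended: Callable[[], bool],
    circuit_setup_ok: Callable[[], bool],
) -> Path:
    """Run the OCS, AAA, routing decision, and signaling steps in order.

    all() over a generator stops at the first step that returns False,
    so later modules are not invoked once an earlier one fails.
    """
    steps = (ocs_reachable, aaa_authorized, circuit_recommended, circuit_setup_ok)
    if all(step() for step in steps):
        return Path.CHEETAH_CIRCUIT
    return Path.PRIMARY_TCP_IP
```

Passing callables rather than precomputed booleans keeps the ordering explicit: signaling is never attempted unless OCS, AAA, and the routing decision have all succeeded.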

3. Detailed System Design

3.1 Optical Connectivity Service (OCS) module

3.1.1 Definition and Responsibilities

Definitions: The OCS module includes a daemon running on a server and a client program on CHEETAH end hosts. The OCS server has a database that stores the connectivity information of CHEETAH end hosts (clients), i.e., whether an end host has CHEETAH access capability. The OCS is a distributed system. One OCS server can query other OCS servers if the information cannot be obtained locally.

Responsibilities: The OCS client program on the end hosts sends OCS query messages to OCS servers when the user application requires communication with other end hosts. The OCS server accepts queries from CHEETAH clients and responds with an answer.

Constraints: The database maintained by OCS servers must be configured manually. Each end host has a corresponding TXT type Resource Record (RR) in the local OCS server database. A string like “OCS available” needs to be added to the RR for each end host with OCS capability. You can ask your OCS administrator to add this RR to the local OCS database for each host that has the OCS capability.

3.1.2 Uses/Interactions

From the client (end host) side, the NSLOOKUP program can be called to send a query to the OCS server and examine whether the remote host has the string “OCS available” in its RR. If it does, the remote host can be reached by CHEETAH. NSLOOKUP is a popular shell command that comes with almost all operating systems, such as Windows, Unix, and Linux. NSLOOKUP can also be found in the BIND installation package. The parameters of NSLOOKUP are set as follows:

> nslookup -type=TXT hostname
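A client program could wrap this lookup roughly as follows. This is a sketch, not the CHEETAH OCS client: the function names are hypothetical, and it assumes only that the TXT strings appear in double quotes somewhere in the NSLOOKUP output, which is how common nslookup versions print them.

```python
import re
import subprocess
from typing import List

OCS_MARKER = "OCS available"  # marker string stored in the TXT record


def txt_strings(nslookup_output: str) -> List[str]:
    """Extract every double-quoted string from the nslookup output."""
    return re.findall(r'"([^"]*)"', nslookup_output)


def has_cheetah_access(nslookup_output: str) -> bool:
    """True if any quoted TXT string contains the OCS marker."""
    return any(OCS_MARKER in s for s in txt_strings(nslookup_output))


def query_ocs(hostname: str, timeout_s: float = 5.0) -> bool:
    """Run `nslookup -type=TXT hostname` and check the answer for the marker."""
    result = subprocess.run(
        ["nslookup", "-type=TXT", hostname],
        capture_output=True, text=True, timeout=timeout_s, check=False,
    )
    return has_cheetah_access(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the check testable without a live OCS server.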

3.1.3 Detailed Subsystem Design

The OCS server can be realized in a way similar to the Domain Name System (DNS). DNS is a hierarchical, distributed database that provides directory services. It stores information for mapping Internet host names to IP addresses and vice versa, mail routing information, and other data used by Internet applications. Clients look up information in the DNS by calling a resolver library, which sends queries to one or more name servers and interprets the responses. By mimicking DNS, a client in OCS can find out whether the remote client has CHEETAH connectivity before it starts to signal the CHEETAH circuit setup.

In this document, the OCS server is set up according to the DNS structure and configuration. For convenience, we use DNS terminology in the following description.

The DNS name server does not require a high-end computer. In our case, an ordinary PC with an Intel 486 processor (which is required for BIND 9) can do the work. The hardware connectivity is shown in Figure 2. The DNS name server is connected to a Cisco 12000 series router, and the router is connected to the SONET cloud through the leased line. Note that in the early, experimental period of the CHEETAH network, a separate name server can be omitted for simplicity. Instead, the name server function can reside on the host machine.

Figure 2. OCS server in CHEETAH. Labels: Cisco MSPP 15454; Cisco 12000 router; DNS Name Server; SONET cloud; Internet; OC-48c links; CHEETAH circuit; connect to primary 10/100M NIC; connect to secondary GbE NIC for uplink; connect to third 10/100M NIC for downlink.

BIND (Berkeley Internet Name Domain), the de facto standard DNS software, is used for the OCS setup [4]. The software can be freely downloaded from www.isc.org. The most recent release, BIND 9.2.3, for Windows NT/2000/XP in binary form can be found at ftp://ftp.isc.org/isc/bind/contrib/ in the current release directory. http://www.isc.org/products/BIND/bind9.html contains source code that can be built for Windows NT/2000/XP using Visual C++. BIND 9 contains both a name server and a resolver library. Figure 3 shows the relationship between the name server and the resolver.

The resolvers are the interface between the user and the name servers: they extract information from name servers in response to client requests and hide the distribution of data among name servers from the user. The resolver has a cache that stores recent domain name information queried by the users, to give users quick responses and to prevent the name server from being overloaded by frequent similar queries.

3.1.4 Unit testing plan

The OCS test server is located in the Electrical Engineering Department at City College. For a demonstration, go to http://134.74.17.77/, click “OCS availability check,” enter a host name to test, and click “Submit Query.” A message from the OCS server pops up to show whether the host has CHEETAH access (for example, the host “ocs.example.com” has CHEETAH access).

3.2 Description for Authentication Authorization and Accounting (AAA) module

3.2.1 Definition and Responsibilities


Definitions: The AAA module includes a daemon running on a server and a client program on CHEETAH end hosts. The AAA server maintains an accounting database that stores the credential information and accounting information for each CHEETAH service user.

Figure 3. DNS configuration. Labels: User; Resolver; Cache; Name Server (three); user queries; user responses; queries; responses.

Responsibilities: The AAA client program on the end hosts sends AAA credential check requests to AAA servers after an end host determines, through an OCS query, that the correspondent end host has CHEETAH connectivity. The AAA server receives the request from the user and responds with an authentication message. If an end host passes the AAA credential check and the subsequent CHEETAH circuit setup is successful, the AAA server also records the CHEETAH circuit usage information for accounting purposes, e.g., the duration and the bandwidth of a CHEETAH circuit.


The following open source software is needed at CHEETAH end hosts to set up the AAA system: PostgreSQL [5], Libnet [6], Linux-PAM [7], FreeRADIUS [8].

Radclient, which is included in the FreeRADIUS installation package, can be found under the Linux directory “/usr/local/bin.” The CHEETAH end host calls Radclient to communicate with the RADIUS authentication and authorization server via ports 1812 (authentication) and 1813 (accounting).


An overview of the elements that constitute the AAA system is provided in Figure 4. The AAA server contains an authentication server and an accounting database. The clients access the AAA server through the Internet.

The authentication server holds the access control information, such as usernames and passwords. In our system, the access control information is placed together with the accounting information in the accounting database for easier management. However, this could be changed based on an operator’s decision. In our experiment, we use RADIUS for the authentication servers.

Figure 4. AAA setup. Labels: Client; Internet; AAA server.

The accounting database is an SQL database that holds the accounting information for the users and, optionally, the access control information such as usernames and passwords. The user credentials and accounting information in the PostgreSQL database can be configured over the Internet.

About RADIUS: Several AAA protocols exist today; the most common is the Remote Authentication Dial In User Service (RADIUS). RADIUS is a client-server system. The RADIUS specification defines a protocol used for authentication, authorization, and accounting. RADIUS Accounting extends the use of the RADIUS protocol to cover delivery of accounting information from the client to a RADIUS AAA server. Key features of RADIUS Accounting include:

1. Client/Server Model.

2. The RADIUS protocol uses a shared key to authenticate the authentication and accounting messages. The shared key we used is “c3pp0n4.” To change the key, edit the clients.conf file as root.

3. The client is responsible for passing user accounting information to a designated RADIUS accounting server.

4. The RADIUS accounting server is responsible for receiving the accounting request and returning a response to the client

indicating that it has successfully received the request.

5. The RADIUS accounting server can act as a proxy client to other kinds of accounting servers.

Transactions between the client and the RADIUS accounting server are authenticated through the use of a shared secret, which is never sent over the network. All transactions consist of variable-length Attribute-Length-Value 3-tuples. New attribute values can be added without disturbing existing implementations of the protocol. More information about RADIUS can be found in RFC 2866.
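The 3-tuple layout can be illustrated with a small encoder and decoder. In the RADIUS RFCs each attribute is one octet of type, one octet of length (counting the type and length octets themselves), and then the value; type 1 is User-Name. The sketch below covers only this attribute framing, not a full RADIUS packet, and the function names are hypothetical.

```python
import struct
from typing import List, Tuple

USER_NAME = 1  # RADIUS attribute type for User-Name


def encode_attribute(attr_type: int, value: bytes) -> bytes:
    """Encode one attribute as type (1 octet), length (1 octet), value."""
    length = len(value) + 2  # the length field counts the two header octets
    if not 0 <= attr_type <= 255:
        raise ValueError("attribute type must fit in one octet")
    if length > 255:
        raise ValueError("attribute value too long")
    return struct.pack("!BB", attr_type, length) + value


def decode_attributes(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a byte string of concatenated attributes into (type, value) pairs."""
    attrs = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated attribute header")
        attr_type, length = struct.unpack_from("!BB", data, offset)
        if length < 2 or offset + length > len(data):
            raise ValueError("invalid attribute length")
        attrs.append((attr_type, data[offset + 2:offset + length]))
        offset += length
    return attrs
```

Because each attribute carries its own length, a decoder can step over attribute types it does not recognize, which is why new attributes can be added without disturbing existing implementations.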


The AAA test server is located in the Electrical Engineering Department at City College. For a demonstration, go to http://134.74.17.77/.

The authentication page is shown in Figure 5. After you enter the username and password (the test username and password are both “song”) and click “submit Query”, the result is displayed as shown in Figure 6.

To view the accounting information in the accounting server, go to http://134.74.17.77/pg/. The username and the password are both “radius”. After logging in, click "radius", then "Tables". On the "radacct" row, click "browse" to see all the accounting information (shown in Figure 7).

Figure 5. Authentication page

Figure 6. Login result page

3.3 Description for optional Router Disconnect module

3.3.1 Definition and Responsibilities


Definitions: The Router Disconnect module is an executable program that talks to a Cisco router to automate the router configuration according to a script. The MSPP side also needs to be considered.

Responsibilities: In default mode, the router is configured to have several component links bundled together to form a bundled link for TCP/IP traffic. In CHEETAH mode, the router is configured to unbundle one or more component links so that these links can be used for CHEETAH traffic.

3.3.2 Constraints

“Expect,” an open source tool, needs to be installed on the client side (or with the application software) for router disconnect. It can be downloaded from http://expect.nist.gov/.

Figure 7. Accounting information page

Link bundling is used to realize router disconnect. Link bundling is supported only by Cisco 12000 series routers and line cards [9]. For a Gigabit EtherChannel or POS Channel to operate correctly with equal-cost load balancing, the Link Bundling feature requires that all line cards in a Cisco 12000 Series Internet Router support the ingress decision capability. Only the following Cisco 12000 series line cards support ingress decision capability:

• Engine 0 Asynchronous Transfer Mode (ATM) line cards

• Engine 0 Channelized DS3/E3 line cards

• Engine 0 Dynamic Packet Transport (DPT) line cards

• Engine 0 Packet-over-SONET line cards

• Engine 1 Fast Ethernet line cards

• Engine 1 Gigabit Ethernet line cards

• Engine 2 8-port OC-3/STM-1 ATM line cards

• Engine 2 Dynamic Packet Transport (DPT) line cards

• Engine 2 Gigabit Ethernet line cards

• Engine 2 Packet-over-SONET line cards

• IP Service Engine (ISE) line cards:

  – 4-Port Gigabit Ethernet ISE line card

  – 8-Port OC-3c/STM-1c POS/SDH ISE line card




• Engine 4+ line cards

The following Cisco 12000 series line cards do not support ingress decision capability:

• Engine 0 Channelized DS1/E1, OC-3, and T3 line cards

• Engine 2 Asynchronous Transfer Mode (ATM) line cards, except for the 8-port OC-3/STM-1 ATM line card

• Engine 4 line cards

3.3.3 Uses/Interactions


Expect is a tool for automating interactive applications such as telnet, ftp, passwd, fsck, rlogin, and tip. It talks to other interactive programs according to a script. Most Linux installations already contain a version of Expect. The scripts for link bundling (bundle.exp) and unbundling (unbundle.exp) are listed in Appendix 8.1 for reference.

3.3.4 Detailed Subsystem Design


In the CHEETAH network, the hosts are connected by leased high-speed links. It would be a waste to let such a high-speed circuit lie unused when there is no CHEETAH communication. Hence we propose a scheme in which such leased circuits are used to interconnect enterprise IP routers with IP routers on Internet2, ESnet, or the Internet itself until a CHEETAH request appears. When it does, this router-to-router circuit is torn down and the CHEETAH Ethernet/EoS circuit is established for the duration of the session. In the meantime, the client needs to be authenticated before it starts to use the CHEETAH service, and the length of time the client uses this service needs to be recorded for billing purposes.

Router disconnect is illustrated in Figure 8. The router-to-router circuit is present by default (shown with the bold line). When a request arrives for a CHEETAH circuit, this router-to-router circuit is released and an end-to-end Ethernet/EoS circuit is set up, as shown by the dashed line.

Our solution for router disconnect is based on the link bundling function provided by Cisco 12000 series GSR routers.

1. What is Link Bundling?

Link bundling groups multiple point-to-point links together into one logical link (shown in Figure 9) to provide higher bandwidth (a bigger pipe), redundancy, and load sharing between two routers.

Figure 8. Router disconnect. Labels: End host I, End host II, End host III; GbE NIC; 10M Eth.; Primary; CHEETAH; Shared packet-switched LAN; MSPP I, MSPP II; Eth, Control, GbE, XC, WAN; Signaling software; OC48; IP router; Representative of Enterprise I; Representative of Enterprise II; Internet, Internet2 or ESnet; (1) Release router-to-router circuit.

Figure 9. Link bundling

The following types of link bundling are supported on Cisco 12000 Series Internet Routers:

• Gigabit EtherChannel is used to bundle multiple Gigabit Ethernet (GE) interfaces.

• POS Channel is used to bundle multiple Packet-over-SONET (POS) interfaces.

Use link bundling on Cisco 12000 Series Internet Routers in networks under the following conditions:

• Faster links do not exist.

• The next step available for increasing link capacity is too expensive.

• The operational costs to increase link capacity are too high.

Gigabit Ethernet and POS Channel link bundling on Cisco 12000 Series Internet Routers provide flexible and incremental

bandwidth with link redundancy and higher layer transparency to network applications. You can use Gigabit EtherChannel and

POS channel in multiple locations in the same network.

Gigabit EtherChannel and POS Channel allow you to increase and decrease bandwidth by simply adding or removing an

interface from the link bundle. Also, by incrementally increasing bandwidth, you are no longer dependent on the fixed

increases in bandwidth (for example, 1 Gbps, 10 Gbps, and so on) determined by the physical layer technology.

The failure of a single link does not necessarily cause a network failure. Traffic is redirected to remaining links within the

channel without user intervention. As a result, the availability of a GE or POS link is increased.

On Cisco 12000 Series Internet Routers, link bundling is implemented so that a virtual interface (a POS Channel or Gigabit

EtherChannel) is created for each link bundle. You can dynamically add and delete links to the virtual interface. The virtual

interface is treated as a single interface on which you configure an IP address and other software features used by the link

bundle, instead of configuring them on individual GE and POS interfaces.

Packets sent to the link bundle are forwarded on one of the links in the bundle. Load balancing is supported on all links in a

bundle using per-destination load balancing based on a hash calculated using the source and destination IP addresses in the IP

packet. Per-destination load balancing ensures that packets are delivered in order.
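Per-destination load balancing can be sketched as a hash over the address pair that selects one member link. The router's actual hash function is not described here, so CRC32 is used purely as a stand-in; the function name `pick_link` and the link names in the usage note are illustrative assumptions.

```python
import ipaddress
import zlib
from typing import Sequence


def pick_link(src_ip: str, dst_ip: str, links: Sequence[str]) -> str:
    """Choose a member link from a hash of the source and destination addresses.

    CRC32 is an assumed stand-in for the router's hash. The same address
    pair always maps to the same link, so packets of one flow stay in order.
    """
    if not links:
        raise ValueError("the bundle has no member links")
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    return links[zlib.crc32(key) % len(links)]
```

Calling `pick_link("10.0.0.1", "10.0.0.2", ["pos3/0", "pos3/2"])` repeatedly returns the same link each time, while different address pairs are spread across the members. Note that removing a link from the list changes the modulus, so some flows move to a different link when the bundle is resized.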

2. How Does Link Bundling Work?

• A Gigabit EtherChannel or POS Channel link bundle is created on a line card or across multiple line cards.

• An adjacency representing the new bundle is created in the forwarding information base (FIB) table on the gigabit route processor (GRP) and is forwarded to all the line cards. This adjacency represents a virtual link and has pointers to individual links in the bundle.

• As incoming data packets are received by the router, line cards route packets to the link bundle as a whole. The ingress line card recognizes the virtual adjacency, and properly routes and load balances the packets across the sub-adjacencies represented by the virtual adjacency. Packets are properly routed and load balanced towards the bundle, and then properly transmitted across the bundle.

3. How Link Bundling Helps in Our Project

Link bundling can be applied during the transition between CHEETAH mode and default mode to minimize TCP packet loss. In default mode, all the component links are bundled together for TCP traffic. When a CHEETAH circuit is required, the router can unbundle one or several component links so that these links can be used for the CHEETAH circuit. The remaining links in the bundled link are used for TCP traffic.

3.3.5 Unit testing plan


The scripts for link bundling (bundle.exp) and unbundling (unbundle.exp) are located on the computer mvstu5.cs.virginia.edu under the directory /home/cuny/. Here is an example of link bundling execution:

[cuny@mvstu5 cuny]$ expect bundle.exp 128.143.67.146 1 192.168.18.1 255.255.255.0 pos3/0 pos3/2

This command bundles POS channels 3/0 and 3/2 of router 128.143.67.146 together, assigns “Group 1” to the bundled link, and assigns the IP address 192.168.18.1 to this virtual interface.

Examples of link unbundling execution:

[cuny@mvstu5 cuny]$ expect unbundle.exp 128.143.67.146 1 pos3/2

This command removes POS channel 3/2 from link-bundling group 1 of the router 128.143.67.146.

[cuny@mvstu5 cuny]$ expect unbundle.exp 128.143.67.146 1

This command totally un-bundles the pre-bundled link and removes the bundled group 1.

3.4 Description for Routing Decision (RD) Module

3.4.1 Definition and Responsibilities


Definitions: This module makes a routing decision for end-host applications that have CHEETAH connectivity. Running as a daemon process, this module measures the Internet status periodically, maintains a network-parameter database, listens for applications’ queries, uses the routing decision algorithm presented in [2] to calculate and quantitatively compare the costs of setting up a CHEETAH circuit and of using the primary Internet path, and then provides the application with a recommendation on whether to set up a circuit.

Responsibilities: When an application wants to transfer a file, it consults this module to see whether it should set up a CHEETAH circuit or just use the primary Internet path. Hence this module’s main responsibility is to give a recommendation on the choice of path. To do this on a quantitative basis, it has to measure the Internet status and then use the measurement results to calculate the cost of using the primary Internet path and that of opening a circuit. However, measuring the Internet status may take some time (for example, pathload [10] takes 5-6 seconds to finish one measurement from UVa to NCSU), which adds overhead to the circuit transfer when the CHEETAH circuit is preferred. To shorten this overhead, the Routing Decision module should maintain an Internet status database and use the data in this database as much as possible.

3.4.2 Constraints

First, we assume that the status of the Internet does not change over a very short period, say the period of a measurement carried out by this module, and that the status over the recent past indicates the trend for the next period most of the time. Second, the database requires some memory space. Third, the first phase of the implementation considers only the delay analysis and leaves out the utilization aspect.

3.4.3 Uses/Interactions


Applications communicate with this module through a socket or IPC API. Our first step is to use sockets, which means that this module’s daemon keeps listening on a well-known port for applications’ connections. After an application connects to the listening port, it sends a query that includes the destination IP address, the file size, and the expected bandwidth. The daemon then does the calculations and replies with a recommendation. Other parameters may be needed in the query if authentication or priority issues are involved; these should be added to this software in the future.
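The query exchanged over the socket could be framed in many ways; the text does not specify a wire format. The sketch below assumes a hypothetical one-line ASCII encoding of the three fields named above (destination IP address, file size, expected bandwidth). The `RDQuery` type, the field order, and the units are all assumptions for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class RDQuery:
    dest_ip: str          # destination IP address
    file_size_bytes: int  # size of the file to transfer (assumed unit: bytes)
    bandwidth_bps: int    # expected bandwidth (assumed unit: bits per second)


def encode_query(query: RDQuery) -> bytes:
    """Serialize a query as one space-separated ASCII line (hypothetical format)."""
    ipaddress.ip_address(query.dest_ip)  # raises ValueError on a malformed address
    line = f"{query.dest_ip} {query.file_size_bytes} {query.bandwidth_bps}\n"
    return line.encode("ascii")


def decode_query(line: bytes) -> RDQuery:
    """Parse a line produced by encode_query back into an RDQuery."""
    dest_ip, size, bandwidth = line.decode("ascii").split()
    ipaddress.ip_address(dest_ip)
    return RDQuery(dest_ip, int(size), int(bandwidth))
```

A line-oriented text format is easy to inspect while debugging the daemon; additional parameters for authentication or priority would be appended as further fields.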


3.4.4 Detailed Subsystem Design

1. Architecture

The Routing Decision module has seven components in total: the Socket sub-module, Consultant sub-module, Child Consultant sub-module, Calculator sub-module, DataBase (DB) sub-module, DB Manager sub-module, and Measurer sub-module, as shown in Figure 10.

Figure 10. Routing Decision module architecture. Labels: Objects & Use Cases in the RD module; Socket; RD Consultant; RD Child Consultant; RD Calculator; RD DB Manager; RD DB; RD Measurer; From listening port; Pthread; Wait; release; Return result; Query & Update.

2. Socket sub-module

We encapsulate the Linux socket APIs in a class for convenient use. The methods of our socket class are illustrated in Figure 11.

3. Consultant and Child Consultant sub-module

The Consultant sub-module listens on the well-known port and accepts applications’ connections. It then creates a Child Consultant thread to handle each query. The operation flow of the Consultant and Child Consultant sub-modules is shown in Figure 12.

4. Calculator sub-module

The Calculator sub-module computes the costs of setting up a circuit and of using the primary Internet path.

Figure 11. Socket sub-module. Labels: My Socket; MySocket (construct); myBind; myListen; myAccept; myConnect; mySend; myRecv; Server side; Client side; Both sides.

Figure 12. Consultant and Child Consultant sub-module. Flowchart labels: RD Consultant; Bind Socket & Listen; Accept a connection; Weekday? (Y/N); Create a thread; Exit; Call back function; Run Handle query; Get query params; Call DB Manager; RD DB Manager; Compare and return; Exit.

5. DB sub-module

The DB sub-module stores the Internet status parameters and is responsible for the update and query routines. The organization of the DB sub-module is presented in Table I and Figure 13.

6. DB Manager sub-module

The DB Manager sub-module is in charge of handling the retrieval queries from the Child Consultant sub-module. Its work flow is shown in Figure 14.
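The DB Manager's query, hit-or-miss, measure, and update cycle behaves like a cache in front of the Measurer. The sketch below illustrates that cycle under stated assumptions: the class name `StatusCache` is hypothetical, the measured parameters are represented as a plain dictionary, and the TTL field of the DB entry is assumed to act as an expiry time in seconds.

```python
import time
from typing import Callable, Dict, Tuple


class StatusCache:
    """Return stored parameters on a hit; measure and update the DB on a miss."""

    def __init__(self, measure: Callable[[str], dict], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._measure = measure  # stand-in for the Measurer sub-module
        self._ttl_s = ttl_s      # assumed meaning of the TTL field: entry lifetime
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def lookup(self, dest_ip: str) -> dict:
        now = self._clock()
        entry = self._entries.get(dest_ip)
        if entry is not None and now - entry[0] < self._ttl_s:
            return entry[1]                     # hit: no new measurement
        params = self._measure(dest_ip)         # miss or expired entry
        self._entries[dest_ip] = (now, params)  # update the DB
        return params
```

Answering from the stored entry avoids paying the multi-second measurement cost on every query, which is the overhead the database is meant to reduce.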

TABLE I. ENTRY IN THE DB

IP address | TTL | RTT (circuit) | Rs | K | Tsp | rho-sig | rho-sp | p | Wmax (client side) | T0 | b | RTT (tcp)

Figure 13. DB sub-module. Flowchart labels: RD DB; Query DB; Hit? (Y: Return the result; N: Call Measure, Update DB); Measurer.

7. Measurer sub-module

The Measurer sub-module is composed of code from several Internet measurement tools and works in client/server style: the client-side RD application talks to its peer on the destination machine to start the measurement. The idea is shown in Figure 15.

3.5 Description for Signaling Module

3.5.1 Definition and Responsibilities


Figure 14. DB Manager sub-module. Flowchart labels: RD DB Manager; Query DB; Hit? (Y: Return the results; N: Call DB update); RD DB; Exit.

Figure 15. Measurer sub-module. Labels: RD Measurer; Client side; Server side; Pathload; Pathrate.

Definitions: The signaling module is used to create circuits across the LAN and the backbone network. The signaling technologies include the RSVP-TE SONET extension, VLAN, and TL1 cross connection.

The signaling module exists on both the VLSR [11] and the end hosts. However, the signaling modules on the VLSR and on end hosts consist of different sub-modules. Generally, we can divide the signaling module into three parts: the RSVP-TE daemon, the RSVP API, and the SNMP (for Ethernet switches) / TL1 (for MSPPs) module. The module on end hosts consists of the RSVP-TE daemon and the RSVP API. The module on the VLSR consists of the RSVP-TE daemon and the SNMP/TL1 module. The SNMP/TL1 module is integrated with the RSVP-TE daemon and is invisible to other modules of the system.

Responsibilities: The signaling module is responsible for creating, releasing, and maintaining the circuit between two end hosts. Here the circuit includes the SONET circuit in the backbone, the VLAN setup in the LAN, and the cross connections on the MSPPs.

3.5.2 Constraints

Currently the routing function is not implemented in the VLSR RSVP-TE code, so the initiator of the RSVP-TE signaling must explicitly specify the signaling route. The end host must also supply the MSPP information, including the IP address and the port numbers of the Ethernet card and the optical card.


The RSVP-TE daemons on VLSRs do not collaborate with any other modules. The daemons on different machines communicate with each other through raw IP messages. As indicated in the definition part, the SNMP/TL1 module is invisible to other modules.

The RSVP-TE daemon on an end host talks to the user application through the RSVP_API class. A user application includes the RSVP_API header files in its source code and links against the RSVP-TE library.

For end hosts that act as RSVP-TE receivers, the application needs to know the IP addresses and port numbers of both sender and receiver before it creates the RSVP-TE session to wait for the PATH message from the sender. Specifically, the receiver application should create an instance of the RSVP_API class and define a set of upcall functions that are triggered when certain messages, such as PATH, RSV_CONFIRM, etc., are received. It should also call RSVP_API::createSession to create a session, register the upcall functions, and wait for the PATH message from the sender.

For end hosts acting as RSVP-TE senders, the application must perform all the operations required of receivers and, in addition, do the following:

• Provide the routing information (this might not be necessary once OSPF-TE is implemented).

• Provide the IP address and port information of each MSPP/Ethernet switch on the route.

• Provide the circuit request information used to generate the sender TSpec object.

• Create the label.

• Call createSender with the information above to send out the PATH message.


TBD

3.6 Description for Transport Protocol Module

3.6.1 Definition and Responsibilities (REQUIREMENTS)

The transport protocol module should:

• provide reliable data transfer. It should guarantee ordered, error-free, and duplicate-free delivery.

• provide high goodput for file transfers.

• provide high utilization of the reserved circuit. The reserved network resources should be used as efficiently as possible

and should be held for only as long as they are required.

• be efficient in a multitasking environment, i.e. the transport protocol implementation should not be CPU intensive.

• have an application-level implementation that is easy to deploy widely and does not require any kernel-level changes.

• provide an Application Programming Interface (API) which can be used by an application.

3.6.2 Constraints

TBD

3.6.3 Uses/Interactions (INTERNAL FUNCTIONS/ EXTERNAL INTERFACE)

Given below are the design decisions taken to meet the requirements listed in subsection 3.6.1.

• Application-level implementation (Req. 5): The transport protocol implementation should utilize the kernel network

stack through the socket API that provides an interface to the TCP/ UDP layer.

• High goodput, high circuit utilization (Req. 2, 3): TCP’s congestion and flow control algorithms, designed for sharing network resources fairly, adversely affect utilization of reserved resources. Use UDP to transport the data packets, since UDP has no flow or congestion control, and add these functions (if required) on top of the UDP layer. For high circuit utilization, use a steady sending rate (equal to the reserved circuit rate), maintained with a fixed inter-packet transmission time. High goodput requires a high sending rate as well as a low error/loss rate; these requirements conflict because of end-host limitations. Take end-host capabilities into consideration when choosing the sending rate (this should be done while setting up the circuit).

• Reliability (Req. 1): The reserved circuit ensures in-sequence delivery of data packets (and hence, no duplicates). Take

advantage of this by adding a sequence number to the data packets to detect loss at the receiver, instead of maintaining

timers at the sender. Since we use UDP to transmit the data packets, the UDP checksum can be used to detect errors.

The receiver should report errors/ loss to the sender. Use a reliable TCP channel for these reverse direction control

packets.

• CPU utilization and API (Req. 4, 6): These are implementation issues and should be kept in mind while writing the

code.

Internal Functions:

Sender

• Send data at the fixed circuit rate. To do this, calculate the inter-packet time for the required circuit rate, assuming packets are the maximum allowed size (MTU). Send one packet per inter-packet interval using the UDP socket. Add sequence numbers to the payload (since UDP does not add sequence numbers) so that the receiver can detect loss.

• Retransmit packets that were lost or had errors. Wait for an acknowledgement (ACK) from the receiver, keeping unACKed data in memory. Error indications from the receiver tell the sender which packets need to be retransmitted, and those portions of the data are marked for retransmission.
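The fixed-rate sending step above can be sketched as follows. This is an illustration under stated assumptions, not the FRTP implementation: the 4-byte sequence-number header, the function names, and the injected `send`, `clock` and `sleep` callables are hypothetical. The inter-packet time is simply the packet size in bits divided by the circuit rate in bits per second.

```python
import struct
import time
from typing import Callable, List

SEQ_HEADER = struct.Struct("!I")  # assumed header: 32-bit sequence number


def inter_packet_time(circuit_rate_bps: float, packet_bytes: int) -> float:
    """Seconds between packet starts so that the circuit rate is matched."""
    if circuit_rate_bps <= 0 or packet_bytes <= 0:
        raise ValueError("rate and packet size must be positive")
    return packet_bytes * 8 / circuit_rate_bps


def packetize(block: bytes, payload_bytes: int, first_seq: int = 0) -> List[bytes]:
    """Split a memory block into payloads, each prefixed with its sequence number."""
    return [
        SEQ_HEADER.pack(first_seq + i) + block[offset:offset + payload_bytes]
        for i, offset in enumerate(range(0, len(block), payload_bytes))
    ]


def paced_send(packets: List[bytes], gap: float,
               send: Callable[[bytes], object],
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], object] = time.sleep) -> None:
    """Send one packet per inter-packet interval, e.g. with send=sock.send."""
    next_time = clock()
    for packet in packets:
        delay = next_time - clock()
        if delay > 0:
            sleep(delay)
        send(packet)
        next_time += gap  # schedule against absolute time to avoid drift
```

For a 1 Gb/s circuit and 1500-byte packets the interval is 12 microseconds, which is shorter than the granularity of a typical sleep call; a real sender would likely need busy-waiting or batching, which this sketch does not attempt.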

Receiver

• Receive data and identify lost/ incorrect packets.

• ACK correctly received packets and inform the sender of missing/ incorrect packets.

• Use the data packets’ sequence numbers to reassemble the received data in memory.
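The receiver duties listed above can be sketched in a few lines. This is a minimal illustration, assuming the same hypothetical 4-byte sequence-number header in front of each payload and a known packet count per block; the `BlockReceiver` class is not part of the specified API.

```python
import struct
from typing import Dict, List, Optional

SEQ_HEADER = struct.Struct("!I")  # assumed header: 32-bit sequence number


class BlockReceiver:
    """Collects the datagrams of one block and reports what is still missing."""

    def __init__(self, expected_packets: int) -> None:
        self.expected = expected_packets
        self.payloads: Dict[int, bytes] = {}

    def on_datagram(self, datagram: bytes) -> None:
        """Store a payload under its sequence number (first copy wins)."""
        (seq,) = SEQ_HEADER.unpack_from(datagram)
        if seq < self.expected:
            self.payloads.setdefault(seq, datagram[SEQ_HEADER.size:])

    def missing(self) -> List[int]:
        """Sequence numbers to report to the sender over the control channel."""
        return [s for s in range(self.expected) if s not in self.payloads]

    def data(self) -> Optional[bytes]:
        """Reassembled block, or None while packets are still missing."""
        if self.missing():
            return None
        return b"".join(self.payloads[s] for s in range(self.expected))
```

Datagrams that fail the UDP checksum are dropped before they reach the application, so in this sketch corrupted packets show up simply as missing sequence numbers.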

External Interface

The transport protocol module interfaces with the application (e.g. SFTP or Web Application). The application uses the

transport protocol API to send/ receive data.

Sender

The application has to supply the circuit rate so that the transport protocol module can calculate the inter-packet time. The application should also provide information about the data to be sent: the start memory address and length of the block to be sent, where a block is a subset of the file to be transferred (it is unlikely that the whole file can be stored in memory). The application can either reuse these blocks of memory (the file will usually be larger than one block, requiring multiple blocks to be transmitted) or assign new memory blocks. If memory is reused, the transport protocol module should indicate to the application when a block of data has been sent correctly (HOW TO DO THIS?). If memory is not reused, the transport protocol module should deallocate the block of memory once it has been sent out correctly.

Receiver

The receiver should reassemble the received data in memory. The application should request data (PERIODICALLY?) and

the transport protocol receiver should provide the start address and length of a correctly received block of data.


TBD

3.7 SFTP

TBD

3.8 Web Application


Definitions: The Web Application is one of the use cases for CHEETAH. It is aimed at commercial users, who can use the CHEETAH service on the World Wide Web without knowing the underlying CHEETAH mechanisms. When a user clicks a URL on the web, the Web Application is triggered to use CHEETAH automatically to reduce large file transfer delays.

Responsibilities: The Web Application aims to use CHEETAH to reduce bulk data transfer delays over the Web. It consists of a CGI (Common Gateway Interface) script, a web server side application and a client side application. When a URL is requested by a web client, the CGI is invoked to run at the web server. The CGI interacts with the web server (e.g. Apache) and calls the server side application to return the file to the client. The server side application collaborates with other CHEETAH modules (OCS/AAA, routing decision, router disconnect, RSVP-TE signaling, and FRTP), while the client side application runs on the client as a background daemon and establishes an FRTP connection with the server side application. The Web Application is transparent to the users: they do not need to know whether CHEETAH is available to them, or decide whether it is appropriate to use CHEETAH. Instead, the application selects the appropriate service for them.

3.8.2 Constraints

The current version of the Web Application runs on Red Hat Linux 9.0 with Apache 2.0 as the web server. Since the Web Application invokes the other CHEETAH modules, all the constraints of those modules must also be met. Furthermore, we assume that our web site is not very large and does not need to scale. This assumption is critical because the server launches a subprocess to run the CGI program for each click on the URL, and thus comes under very heavy load when there are many requests at the same time. Given this assumption, although CGI has speed and performance issues, it is reasonable for us to use it because we do not expect many simultaneous hits on the web site (for example, more than 1000).


The Web Application collaborates with the following CHEETAH modules: OCS, AAA, routing decision, router disconnect, signaling and FRTP. The specific interactions are as follows:

1. OCS

The OCS is based on BIND9 (open source software for DNS setup). The DNS administrator needs to install BIND9 and add a string like “OCS available” to the TXT type Resource Record (RR) in the DNS server database. The Web Application then calls the program nslookup to check whether the remote host can be reached by CHEETAH. The parameters of nslookup are set as follows:

> nslookup -type=TXT hostname

If the output of nslookup contains the string “OCS available” in its RR, the remote host can be reached by a CHEETAH circuit.
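The OCS check above amounts to running nslookup and searching its output for the marker string. A minimal sketch, assuming a plain substring match is sufficient and that any failure to run nslookup should be treated as "not reachable"; the function names are hypothetical.

```python
import subprocess

OCS_MARKER = "OCS available"  # string stored in the TXT record, per the text above


def txt_output_has_ocs(nslookup_output: str) -> bool:
    """True if the nslookup output carries the OCS marker string."""
    return OCS_MARKER in nslookup_output


def host_has_ocs(hostname: str, timeout: float = 5.0) -> bool:
    """Run `nslookup -type=TXT hostname` and look for the marker."""
    try:
        result = subprocess.run(
            ["nslookup", "-type=TXT", hostname],
            capture_output=True, text=True, timeout=timeout, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # assumption: an unusable lookup means no CHEETAH service
    return txt_output_has_ocs(result.stdout)
```

Passing the arguments as a list avoids shell interpretation of the hostname, which matters when the hostname originates from a web request.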

2. AAA

The open source software freeradius-0.9.3 should be installed on the CHEETAH host. The Web Application then calls radclient, which communicates with the RADIUS server through port 1812 (authentication and authorization) and port 1813 (accounting).

• For authentication, call the following shell script:

> /usr/local/bin/radclient -d /usr/local/etc/raddb -x XX.XX.XX.XX (radius server IP address) auth c3pp0n4 (the shared

secret key stored in the server side) << EOF

> User-Name = XXXX

> password = XXXX

> EOF

If the authentication is successful, the client will receive a message saying “Access-Accept packet from host

XX.XX.XX.XX:1812, id =X, length=XX”.

• For accounting, call the following shell script to record CHEETAH circuit setup time:

> /usr/local/bin/radclient -d /usr/local/etc/raddb -x XX.XX.XX.XX (radius server IP address) acct c3pp0n4 (the shared

secret key stored in the server side) << EOF
> User-Name = XXXX

> NAS-IP-Address = 10.0.0.1 (not used)

> NAS-Port = 1813 (not used)

> Framed-IP-Address = XX.XX.XX.XX (the client’s IP address)

> Acct-Session-Id = 1

> NAS-Port-Type = 15

> EOF

Call the following shell script to record CHEETAH circuit teardown time:

> /usr/local/bin/radclient -d /usr/local/etc/raddb -x XX.XX.XX.XX (radius server IP address) acct c3pp0n4 (the shared

secret key stored in the server side) << EOF
> User-Name = XXXX

> NAS-IP-Address = 10.0.0.1

> NAS-Port = 1813

> Framed-IP-Address = XX.XX.XX.XX (the client’s IP address)

> Acct-Status-Type = Stop

> #Acct-Input-Octets = $3

> #Acct-Output-Octets = $4

> Acct-Session-Id =1

> #Acct-Session-Time = 100

> Acct-Terminate-Cause = 1

> NAS-Port-Type = 15

> EOF
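The radclient invocations above all share one shape: a fixed argument list plus attribute lines on standard input. The sketch below builds both pieces so they can be passed to a subprocess call. The helper names are hypothetical; the paths and argument order are copied from the commands above, and the server address and secret in the usage are placeholders.

```python
from typing import List, Sequence, Tuple


def build_radclient_call(server: str, command: str, secret: str,
                         attributes: Sequence[Tuple[str, str]],
                         radclient: str = "/usr/local/bin/radclient",
                         raddb: str = "/usr/local/etc/raddb") -> Tuple[List[str], str]:
    """Return (argv, stdin_text) for an `auth` or `acct` request."""
    if command not in ("auth", "acct"):
        raise ValueError("command must be 'auth' or 'acct'")
    argv = [radclient, "-d", raddb, "-x", server, command, secret]
    # One "Name = value" line per attribute, in the order given.
    stdin_text = "".join(f"{name} = {value}\n" for name, value in attributes)
    return argv, stdin_text


def is_access_accept(radclient_output: str) -> bool:
    """True if the reply printed by radclient is an Access-Accept packet."""
    return "Access-Accept" in radclient_output
```

A caller could run the result with `subprocess.run(argv, input=stdin_text, capture_output=True, text=True)` and pass the captured output to `is_access_accept`.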

3. Routing decision

Parameters: client IP address, file size and bandwidth requirement.

Return value: true or false, indicating whether the request for a CHEETAH circuit can be met.

4. Router disconnect

The open source software Expect should be installed. The scripts for link bundling (bundle.exp) and unbundling (unbundle.exp) are provided by the router disconnect module.

• For link bundling, the Web Application invokes the expect command, for example:

> expect bundle.exp 128.143.67.146 1 128.143.67.254 255.255.0.0 pos3/1 pos3/2 pos3/3

The above command bundles POS channels 3/1, 3/2 and 3/3 of router 128.143.67.146 together, assigns “Group 1” to the bundled link, and assigns the IP address 128.143.67.254 to this virtual interface.

• For link unbundling, the Web Application invokes the expect command, for example:

> expect unbundle.exp 128.143.67.146 1 pos3/1 pos3/2

The above command removes POS channels 3/1 and 3/2 from link-bundling group 1 of the router 128.143.67.146.

Another example is

> expect unbundle.exp 128.143.67.146 1

The above command completely unbundles the previously bundled link and removes bundle group 1.
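The argument order of the two expect commands above can be captured in small helpers, which keeps the calling code from assembling command strings by hand. This is a sketch with hypothetical function names; it only builds the argument lists and does not contact a router.

```python
from typing import List, Sequence


def bundle_command(router_ip: str, group: int, channel_ip: str,
                   subnet_mask: str, ports: Sequence[str]) -> List[str]:
    """Arguments for bundling the given POS ports into one link-bundling group."""
    if not ports:
        raise ValueError("bundle.exp needs at least one slot/port")
    return ["expect", "bundle.exp", router_ip, str(group),
            channel_ip, subnet_mask, *ports]


def unbundle_command(router_ip: str, group: int,
                     ports: Sequence[str] = ()) -> List[str]:
    """Arguments for removing some ports, or the whole group when none are given."""
    return ["expect", "unbundle.exp", router_ip, str(group), *ports]
```

For example, `bundle_command("128.143.67.146", 1, "128.143.67.254", "255.255.0.0", ["pos3/1", "pos3/2", "pos3/3"])` reproduces the bundling example given above.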

5. RSVP-TE Signaling

RSVP API: createSender, receiveAndProcess

6. FRTP

FRTP API: CSabulSender, CSabulRecver


Provide a detailed description of this software component (or a reference to such a description). Details of module structure,

main functions, data structure and flowcharts.

The Web Application is made up of a CGI (Common Gateway Interface) script and a client side application. The workflow

of the CGI program is depicted in Figure 16.

[Figure 16 flowchart: Receive a URL request from a web client -> Can the client be reached by a CHEETAH circuit (OCS/AAA)? -> (Yes) Get client IP address, file size and bandwidth -> Request a CHEETAH circuit (Routing Decision) -> (Yes) Router Disconnect -> Set up a circuit (Signaling) -> (Succeed) Send the file via FRTP -> Release the circuit. On a No or Fail branch: Send the file via TCP.]

Figure 16. The workflow of the Web application CGI program
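The decision flow of Figure 16 can be sketched as a single function that falls back to TCP whenever a CHEETAH step declines or fails. Every callable here is a hypothetical stand-in for the corresponding module (OCS/AAA, routing decision, router disconnect, signaling, FRTP); the order of the steps follows the module order given in the text and is an interpretation of the figure.

```python
from typing import Callable


def serve_file(client_reachable: Callable[[], bool],
               request_circuit: Callable[[], bool],
               disconnect_router: Callable[[], object],
               setup_circuit: Callable[[], bool],
               send_frtp: Callable[[], object],
               release_circuit: Callable[[], object],
               send_tcp: Callable[[], object]) -> str:
    """Return 'frtp' if the file went over a circuit, otherwise 'tcp'."""
    if not client_reachable():       # OCS/AAA check
        send_tcp()
        return "tcp"
    if not request_circuit():        # routing decision (client IP, file size, bandwidth)
        send_tcp()
        return "tcp"
    disconnect_router()              # router disconnect
    if not setup_circuit():          # RSVP-TE signaling
        send_tcp()
        return "tcp"
    try:
        send_frtp()
    finally:
        release_circuit()            # always release an established circuit
    return "frtp"
```

Releasing the circuit in a `finally` clause reflects the requirement that reserved resources be held only as long as they are needed, even if the transfer raises an error.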

4. Testing Issues

TBD

system integration and system tests

4.1 Classes of tests

The types of tests to be conducted are specified, including as much detail as is possible at this stage. Emphasis here is on

black-box and white-box testing.

4.2 Expected software response

The expected results from testing are specified.

5. Schedule

TBD

6. Glossary

TBD

7. Bibliography

[1] “NSF Experimental Infrastructure Networks (EIN) Project,” cheetah.cs.virginia.edu.

[2] M. Veeraraghavan, X. Zheng, H. Lee, M. Gardner, W. Feng, “CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture,” Proc. of Opticomm 2003, Oct. 13-17, 2003, Dallas, TX.

[3] W. Feng and P. Tinnakornsrisuphap, “The Failure of TCP in High-Performance Computational Grids,” Proceedings of SC2000: High-Performance Network and Computing Conference, Dallas, TX, November 2000.

[4] “ISC BIND,” http://www.isc.org/index.pl?/sw/bind/.

[5] “Postgresql,” http://www.postgresql.org/.

[6] “Libnet,” http://libnet.sourceforge.net/.

[7] “Linux-PAM,” http://www.kernel.org/pub/linux/libs/pam/.

[8] “Freeradius,” http://www.freeradius.org/.

[9] Cisco, “Link Bundling on Cisco 12000 Series Internet Routers,” http://www.cisco.com/en/US/products/sw/iosswrel/ps1829/products_feature_guide09186a0080103708.html.

[10] “Pathrate,” http://www.cc.gatech.edu/fac/Constantinos.Dovrolis/pathrate.html.

[11] J. Sobieski, T. Lehman, B. Jabbari, “DRAGON Dynamic Resource Allocation via GMPLS Optical Networks,” presentation at the NSF Shared Cyberinfrastructure (SCI) Division Principal Investigators Meeting, Feb. 18-20, 2004.

8. Appendix

8.1 Cisco 12000 series GSR router link bundling script

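bundle.exp:

> set timeout -1
> if {$argc<5} {
>     send_user "usage: expect bundle.exp Router_IP Channel_number Channel_IP Subnet_Mask Slot/Port\r\n"
>     exit
> }
> # Consume the positional arguments one at a time; what remains are Slot/Port names.
> set router_ip [lindex $argv 0]
> set argv [lrange $argv 1 end]
> set channel_number [lindex $argv 0]
> set argv [lrange $argv 1 end]
> set channel_ip [lindex $argv 0]
> set argv [lrange $argv 1 end]
> set subnet_mask [lindex $argv 0]
> set argv [lrange $argv 1 end]
> # Log in and enter privileged configuration mode, waiting for each prompt.
> spawn telnet $router_ip
> expect "Password:"
> send "EIN\r"
> expect "12008?>"
> send "enable\r"
> expect "Password:"
> send "EIN\r"
> expect "12008?#"
> send "configure terminal\r"
> expect "12008?(config)#"
> # Create the bundled interface and assign its IP address.
> send "interface pos-channel $channel_number\r"
> expect "12008?(config-if)#"
> send "ip address $channel_ip $subnet_mask\r"
> expect "12008?(config-if)#"
> send "exit\r"
> # Add every remaining Slot/Port to the channel group.
> while {[llength $argv]>0} {
>     set slot_port [lindex $argv 0]
>     set argv [lrange $argv 1 end]
>     send "interface $slot_port\r"
>     expect "12008?(config-if)#"
>     send "channel-group $channel_number\r"
>     send "yes\r"
>     expect "12008?(config-if)#"
> }
> send "exit\r"
> expect "12008?(config)#"
> send "exit\r"
> expect "12008?#"
> send "show interface pos-channel $channel_number\r"
> expect "12008?#"
> send "exit\r"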

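unbundle.exp:

> set timeout -1
> if {$argc<2} {
>     send_user "usage: expect unbundle.exp Router_IP Channel_number Slot/Port\r\n"
>     exit
> }
> # Consume the positional arguments; what remains are optional Slot/Port names.
> set router_ip [lindex $argv 0]
> set argv [lrange $argv 1 end]
> set channel_number [lindex $argv 0]
> set argv [lrange $argv 1 end]
> # Log in and enter privileged configuration mode, waiting for each prompt.
> spawn telnet $router_ip
> expect "Password:"
> send "EIN\r"
> expect "12008?>"
> send "enable\r"
> expect "Password:"
> send "EIN\r"
> expect "12008?#"
> send "configure terminal\r"
> expect "12008?(config)#"
> if {![llength $argv]} {
>     # No Slot/Port given: remove the whole bundled group.
>     send "no interface pos-channel $channel_number\r"
> } else {
>     # Remove each listed Slot/Port from the channel group.
>     while {[llength $argv]>0} {
>         send "interface pos-channel $channel_number\r"
>         expect "12008?(config-if)#"
>         set slot_port [lindex $argv 0]
>         set argv [lrange $argv 1 end]
>         send "interface $slot_port\r"
>         expect "12008?(config-if)#"
>         send "no channel-group $channel_number\r"
>         expect "12008?(config-if)#"
>     }
>     send "exit\r"
> }
> expect "12008?(config)#"
> send "exit\r"
> expect "12008?#"
> send "show interface pos-channel $channel_number\r"
> expect "12008?#"
> send "exit\r"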
