CEOS Working Group on Information Systems and Services
Data Services Task Team
Discussions on GRID and GRIDftp
Stuart Doescher, USGS
WGISS-15, May 2003, Toulouse, France



The Grid Problem

Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources (from “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”)

Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, assuming the absence of central location, central control, omniscience, and existing trust relationships.


The Data Grid Problem

“Enable a geographically distributed community [of thousands] to perform sophisticated, computationally intensive analyses on Petabytes of data”

It sounds like a separate class of problem, but it is actually a superset of the Grid problem.

So all work done on Grid problems applies to Data Grid problems; we just need some additional tools.


Globus Approach

Software toolkit addressing key technical areas
– Offer a modular “bag of technologies”
– Enable incremental development of grid-enabled tools and applications
– Define and standardize grid protocols and APIs (our software development supports this goal)

Focus is on inter-domain issues, not clustering
– Supports collaborative resource use spanning multiple organizations
– Integrates cleanly with intra-domain services
– Creates a “collective” service layer


Major Data Grid Projects

Earth System Grid (DOE Office of Science): Data Grid (DG) technologies, climate applications

European Data Grid (EU): DG technologies and deployment in the EU

GriPhyN – Grid Physics Network (NSF ITR): investigation of the “Virtual Data” concept

Particle Physics Data Grid (DOE Science): DG applications for high-energy and nuclear physics (HENP) experiments


Basic Data Grid Services

1. GridFTP: Data Transfer and Access
Common protocol for data movement
– Secure, efficient, reliable, flexible, extensible, etc.
– Grid Forum (Internet) Draft
Family of tools supporting this protocol
– Wu-ftpd, ncftp, Globus Toolkit SDKs, etc.

2. Replica Management Architecture
Simple scheme for managing:
– multiple copies of files
– collections of files
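The replica management idea can be pictured as a catalog that maps each logical file name to the physical copies held at different sites, with named collections grouping logical files. The following is a minimal in-memory sketch; the class and method names and the `gsiftp://` URLs on example.org are hypothetical and do not represent the Globus replica catalog API.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Hypothetical in-memory replica catalog (illustration only)."""

    def __init__(self):
        # logical file name -> set of physical URLs holding a copy
        self._replicas = defaultdict(set)
        # collection name -> set of logical file names
        self._collections = defaultdict(set)

    def register(self, logical_name, url):
        """Record that a copy of logical_name is available at url."""
        self._replicas[logical_name].add(url)

    def unregister(self, logical_name, url):
        """Forget one copy; drop the entry when no copies remain."""
        copies = self._replicas.get(logical_name)
        if copies is not None:
            copies.discard(url)
            if not copies:
                del self._replicas[logical_name]

    def add_to_collection(self, collection, logical_name):
        """Group a logical file under a named collection."""
        self._collections[collection].add(logical_name)

    def locate(self, logical_name):
        """All known copies of one logical file, sorted for stable output."""
        return sorted(self._replicas.get(logical_name, ()))

    def locate_collection(self, collection):
        """Map every file in a collection to its known copies."""
        return {name: self.locate(name)
                for name in sorted(self._collections.get(collection, ()))}


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    catalog.register("scene_001.tif", "gsiftp://site-a.example.org/data/scene_001.tif")
    catalog.register("scene_001.tif", "gsiftp://site-b.example.org/mirror/scene_001.tif")
    catalog.add_to_collection("example-collection", "scene_001.tif")
    print(catalog.locate_collection("example-collection"))
```

A client would pick one of the returned copies (for example, the nearest site) and fetch it with GridFTP; keeping the catalog separate from the transfer protocol is what lets copies be added or retired without changing client code.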


GridFTP: Basic Approach

FTP is defined by several IETF RFCs
Start with most commonly used subset
– Standard FTP: get/put etc., third-party transfer
Implement standard but often unused features
– GSS binding, extended directory listing, simple restart
Extend in various ways, while preserving interoperability with existing servers
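Standard FTP (RFC 959) already allows a third-party transfer. A client holds control connections to two servers. It puts the receiving server in passive mode, passes the address returned in the 227 reply to the sending server with PORT, and then issues STOR and RETR. The sketch below only builds that command sequence from a PASV reply; it opens no sockets, and the function names and file paths are hypothetical. GridFTP adds its security and performance extensions on top of this pattern.

```python
import re

# The six comma-separated numbers in a reply such as
# "227 Entering Passive Mode (192,168,1,10,19,137)."
_PASV_RE = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_pasv_reply(reply):
    """Return (host, port) from an FTP 227 reply."""
    if not reply.startswith("227"):
        raise ValueError(f"not a PASV reply: {reply!r}")
    match = _PASV_RE.search(reply)
    if match is None:
        raise ValueError(f"no address in reply: {reply!r}")
    nums = [int(n) for n in match.groups()]
    if any(n > 255 for n in nums):
        raise ValueError(f"invalid byte value in reply: {reply!r}")
    host = ".".join(str(n) for n in nums[:4])
    port = nums[4] * 256 + nums[5]   # port is sent as two 8-bit halves
    return host, port


def port_argument(host, port):
    """Encode host and port as the h1,h2,h3,h4,p1,p2 argument of PORT."""
    return ",".join(host.split(".") + [str(port // 256), str(port % 256)])


def third_party_plan(pasv_reply, source_path, dest_path):
    """Ordered (server, command) pairs for a server-to-server copy.

    Assumes PASV was already sent to the destination server, which
    answered with pasv_reply; the source server is told to connect there.
    """
    host, port = parse_pasv_reply(pasv_reply)
    return [
        ("source", f"PORT {port_argument(host, port)}"),
        ("dest", f"STOR {dest_path}"),
        ("source", f"RETR {source_path}"),
    ]
```

The controlling client never touches the data itself: it only exchanges commands, while the data connection runs directly between the two servers. That is the property GridFTP's third-party control builds on.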


Features of GridFTP

Grid Security Infrastructure and Kerberos support: Robust and flexible authentication, integrity, and confidentiality

Third-party control of data transfer: user or application at one site initiates, monitors and controls a data transfer between two other sites

Parallel data transfer: On wide-area links, use multiple TCP streams in parallel between the same source and destination

Striped data transfer: Use multiple TCP streams to transfer data that is striped or interleaved across multiple servers
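Parallel and striped transfers work because each block of data carries its own position in the file. Blocks can therefore travel over different TCP streams (or come from different servers) and arrive in any order. The sketch below simulates this in memory. It cuts a buffer into blocks tagged with (offset, data), deals them round-robin across N streams, shuffles the arrival order, and reassembles the blocks by offset. It is only loosely modeled on GridFTP's extended block mode: the block size and function names are assumptions, and no sockets or real wire format are involved.

```python
import random


def make_blocks(data, block_size):
    """Cut data into (offset, chunk) blocks; the offset travels with each block."""
    return [(off, data[off:off + block_size])
            for off in range(0, len(data), block_size)]


def deal_round_robin(blocks, n_streams):
    """Assign blocks to n_streams streams (or striped servers) in turn."""
    streams = [[] for _ in range(n_streams)]
    for i, block in enumerate(blocks):
        streams[i % n_streams].append(block)
    return streams


def reassemble(received_blocks, total_size):
    """Write each block at its own offset, so arrival order does not matter."""
    out = bytearray(total_size)
    for off, chunk in received_blocks:
        out[off:off + len(chunk)] = chunk
    return bytes(out)


if __name__ == "__main__":
    payload = bytes(range(256)) * 40            # 10,240 bytes of sample data
    streams = deal_round_robin(make_blocks(payload, 1000), n_streams=4)
    arrivals = [blk for s in streams for blk in s]
    random.shuffle(arrivals)                     # simulate out-of-order arrival
    assert reassemble(arrivals, len(payload)) == payload
    print("reassembled", len(payload), "bytes from", len(streams), "streams")
```

With plain stream-mode FTP the byte order on the single connection *is* the file order, which rules out splitting one file across connections; tagging blocks with offsets removes that constraint.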


Features of GridFTP (cont.)

Partial file transfer: Standard FTP allows transfer of the remainder of a file starting at an offset. GridFTP supports transfers of arbitrary subsets or regions of a file.

Automatic negotiation of TCP buffer/window sizes: optimal settings for TCP buffer/window sizes can dramatically improve performance.

Support for reliable and restartable data transfer: The FTP standard includes basic features for restart that are not widely implemented. GridFTP exploits these features and extends them.
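Restartable and partial transfers both reduce to bookkeeping over byte ranges. The receiver records which ranges have arrived, and after a failure only the gaps need to be requested again. For a partial transfer, the "wanted" range is simply the region of interest rather than the whole file. The sketch below shows only that range arithmetic. It assumes half-open [start, end) ranges and uses hypothetical function names. It does not reproduce the restart-marker format defined in the GridFTP protocol specification.

```python
def merge_ranges(ranges):
    """Merge overlapping or touching half-open [start, end) byte ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def missing_ranges(received, wanted_start, wanted_end):
    """Gaps in [wanted_start, wanted_end) not covered by the received ranges.

    Use (0, file_size) to restart a whole-file transfer, or a smaller
    region for a partial file transfer.
    """
    gaps = []
    cursor = wanted_start
    for start, end in merge_ranges(received):
        if end <= cursor:          # entirely before the part still needed
            continue
        if start >= wanted_end:    # entirely after the wanted region
            break
        if start > cursor:         # hole between cursor and this range
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < wanted_end:
        gaps.append((cursor, wanted_end))
    return gaps
```

For example, if bytes [0, 100) and [200, 300) of a 400-byte file arrived before a failure, `missing_ranges([(0, 100), (200, 300)], 0, 400)` yields `[(100, 200), (300, 400)]`, so only those two regions need to be resent.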


GridFTP for Efficient WAN Data Transfer

Secure authentication
Parallel transfer gets the job done quickly
Partial file access gets only the required data
Up to 2.8 Gb/s using a striped server architecture

Parallel transfer: fully utilizes the bandwidth of the network interface on single nodes.

Striped transfer: fully utilizes the bandwidth of a Gb+ WAN using multiple nodes.

[Figure: parallel and striped transfer between parallel filesystems; plot “GridFTP (globus-url-copy)” of bandwidth (Mb/s, axis 0 to 80) versus number of parallel streams (axis 0 to 35)]
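Bandwidth-versus-streams curves like the globus-url-copy measurement on this slide can be reasoned about with a simple, idealized model. A TCP stream with window W over a path with round-trip time RTT carries at most about W/RTT. N such streams therefore carry at most N·W/RTT, until the link capacity is reached. Conversely, a single stream needs a buffer of roughly bandwidth × RTT (the bandwidth-delay product) to fill the link, which is why buffer/window tuning matters. The sketch below evaluates this model. The link speed, RTT, and window size in the example are assumptions, not values read from the plot. The model also ignores packet loss, congestion control, and host limits.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes, rtt_s, n_streams, link_bps):
    """Idealized aggregate throughput of n_streams window-limited TCP streams.

    Each stream delivers at most one window per round trip, and the total
    can never exceed the link capacity. Loss and congestion are ignored.
    """
    per_stream = window_bytes * 8 / rtt_s
    return min(n_streams * per_stream, link_bps)


if __name__ == "__main__":
    link = 100e6          # assumed 100 Mb/s path
    rtt = 0.05            # assumed 50 ms round-trip time
    window = 64 * 1024    # assumed 64 KiB default TCP window

    print(f"buffer needed for one stream: {bdp_bytes(link, rtt) / 1e3:.0f} kB")
    for n in (1, 2, 4, 8, 16):
        mbps = window_limited_throughput_bps(window, rtt, n, link) / 1e6
        print(f"{n:2d} streams -> {mbps:5.1f} Mb/s")
```

Under these assumptions, throughput grows roughly linearly with the number of streams and then flattens at the link rate. Real measurements usually flatten earlier and can decline, because the streams compete with each other and with other traffic.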


Current Data Delivery Process (FTP-based)

Pull: semi-anonymous FTP
– Product is ready
– Email is sent to the user with instructions and a password
– User connects via FTP as “anonymous” with the provided password
– FTP daemon positions the user in the appropriate directory
– User pulls the data

Push: routine data flows to high-volume users
– Account is provided on the remote system
– When data are available, they are pushed to the remote system
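The current pull step can be sketched with Python's standard `ftplib`. The client logs in as “anonymous” with the password from the notification email, then retrieves the product from the directory the server places the user in. The host, password, and file names are hypothetical. The `ftp_factory` parameter is an illustrative addition: it accepts any object with the same methods as `ftplib.FTP`, so the function can be exercised without a network connection.

```python
from ftplib import FTP


def pull_product(host, password, filename, local_path, ftp_factory=FTP):
    """Fetch one product file over semi-anonymous FTP.

    Assumes the server places the user in the directory holding the
    product right after login, so no cwd() call is issued.
    """
    # ftplib.FTP connects in its constructor; used as a context manager it
    # sends QUIT (falling back to closing the socket) when the block exits.
    with ftp_factory(host) as ftp:
        ftp.login(user="anonymous", passwd=password)
        with open(local_path, "wb") as out:
            # Binary-mode RETR; each received chunk is appended to the file.
            ftp.retrbinary(f"RETR {filename}", out.write)
```

A call would look like `pull_product("ftp.example.org", "password-from-email", "order_12345.tar", "order_12345.tar")`. Note that the shared password travels in clear text over the control connection, which is one motivation for the certificate-based alternatives discussed next.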


Potential Future Data Delivery (GridFTP-based)

For routine multiple-usage customers:

Establish a “certificate process” with the customer
– Self-signed certificate authority
– Customer generates a private/public key pair
– Generate a user certificate (signed by the CA) from the customer's public key
– Add the user certificate to the list of trusted users

Customer must install a GridFTP client
– Globus Toolkit data management client bundle
– Gsincftp
– Java Commodity Grid Kit for Windows
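In Globus-based GridFTP servers, one common way to implement the "list of trusted users" is a grid-mapfile. Each line maps the distinguished name (DN) from a user certificate, in double quotes, to one or more local accounts. The sketch below parses that simple form and looks up an authenticated DN. It is an illustration under stated assumptions, not the Globus implementation. It assumes the certificate chain has already been verified against the self-signed CA. It handles only the one-DN-per-line form with comma-separated accounts, and the DNs and account names are hypothetical.

```python
import shlex


def parse_grid_mapfile(text):
    """Map certificate subject DNs to lists of local account names.

    Expects lines such as:  "/O=Grid/OU=Example/CN=Jane Doe" jdoe
    Blank lines and lines starting with '#' are skipped.
    """
    mapping = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = shlex.split(stripped)   # keeps the quoted DN as one token
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected DN and account list")
        dn, accounts = parts
        mapping[dn] = accounts.split(",")
    return mapping


def authorize(mapping, subject_dn):
    """Return the first local account for an authenticated DN, or None."""
    accounts = mapping.get(subject_dn)
    return accounts[0] if accounts else None
```

Separating authentication (is the certificate valid and signed by a trusted CA?) from authorization (which local account, if any, does this DN map to?) is what allows a customer to be added or removed by editing one line, without reissuing certificates.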


Potential Future Data Delivery (GridFTP-based)

For routine multiple-usage customers:

Pull
– Product is ready
– Email notifies the user that the data are ready
– User authenticates with GridFTP using the user certificate, is granted access, and pulls the data

Push
– Account is provided on the remote system with a host certificate and our user certificate
– These Grid certificates establish a virtual organization between the two parties
– When data are available, GridFTP is used to push them to the remote system
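Once certificates are in place, the push step could be scripted around the Globus Toolkit's `globus-url-copy` client. The sketch below only builds the command line for pushing a local file to a `gsiftp://` URL, with a chosen number of parallel streams and, optionally, a TCP buffer size. The `-p` and `-tcp-bs` options reflect our understanding of the client's usage; check the installed client's help output before relying on them. The host names, paths, and function name are hypothetical. The sketch also assumes that a valid proxy credential derived from the user certificate already exists.

```python
import shlex
from pathlib import PurePosixPath


def push_command(local_file, remote_host, remote_dir, streams=4, tcp_buffer=None):
    """Build (but do not run) a globus-url-copy argv for a push transfer."""
    src = PurePosixPath(local_file)
    dst_dir = PurePosixPath(remote_dir)
    if not src.is_absolute() or not dst_dir.is_absolute():
        raise ValueError("local_file and remote_dir must be absolute paths")
    source = f"file://{src}"                          # e.g. file:///data/x.tif
    dest = f"gsiftp://{remote_host}{dst_dir / src.name}"
    argv = ["globus-url-copy", "-p", str(streams)]    # parallel TCP streams
    if tcp_buffer is not None:
        argv += ["-tcp-bs", str(tcp_buffer)]          # TCP buffer size, bytes
    argv += [source, dest]
    return argv


if __name__ == "__main__":
    cmd = push_command("/data/out/scene_001.tif", "partner.example.org",
                       "/incoming", streams=8, tcp_buffer=1048576)
    print(shlex.join(cmd))   # printed only; pass to subprocess.run to execute
```

Building the argument list rather than a shell string avoids quoting problems with file names. The stream count and buffer size are the two knobs the earlier slides identify as the main levers for wide-area performance.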


Potential Future Data Delivery (GridFTP-based)

For single-usage customers, the process to
– establish a “certificate process” with the customer, and
– have the customer install a GridFTP client
currently seems too complex (not worth the effort).

We would like a simplified method, such as:
– Emailing a one-time-use “user certificate”
– A GridFTP client built into the browser