CEOS Working Group on Information Systems and Services - 1
Data Services Task Team
Discussions on
Grid and GridFTP
Stuart Doescher, USGS
WGISS-15
May 2003
Toulouse, France
The Grid Problem
Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources
(From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”)
Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, assuming the absence of…
central location, central control, omniscience, existing trust relationships.
The Data Grid Problem
“Enable a geographically distributed community [of thousands] to perform sophisticated, computationally intensive analyses on Petabytes of data”
Sounds like a separate class of problem, but is actually a superset of the Grid problem.
So all work done on “Grid problems” applies to “Data Grid problems”; we just need some additional tools.
Globus Approach
Software toolkit addressing key technical areas
– Offer a modular “bag of technologies”
– Enable incremental development of grid-enabled tools and applications
– Define and standardize grid protocols and APIs
(Our software development supports this goal.)
Focus is on inter-domain issues, not clustering
– Supports collaborative resource use spanning multiple organizations
– Integrates cleanly with intra-domain services
– Creates a “collective” service layer
Major Data Grid Projects
Earth System Grid (DOE Office of Science): Data Grid technologies, climate applications
European Data Grid (EU): Data Grid technologies and deployment in the EU
GriPhyN – Grid Physics Network (NSF ITR): investigation of the “Virtual Data” concept
Particle Physics Data Grid (DOE Science): Data Grid applications for HENP experiments
Basic Data Grid Services
1. GridFTP: Data Transfer and Access
Common protocol for data movement
– Secure, efficient, reliable, flexible, extensible, etc.
– Grid Forum (Internet) Draft
Family of tools supporting this protocol
– Wu-ftpd, ncftp, Globus Toolkit SDKs, etc.
2. Replica Management Architecture
Simple scheme for managing:
– multiple copies of files
– collections of files
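The Python sketch below makes the replica-management idea concrete. It is a catalog that maps a logical file name to the physical locations holding copies of it, and it groups logical names into collections. The class and method names (`ReplicaCatalog`, `register_replica`, and so on) and the gsiftp:// URLs are hypothetical. This is not the Globus replica catalog interface. It only illustrates the two things the architecture manages.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Hypothetical in-memory replica catalog (not the Globus API)."""

    def __init__(self):
        # Logical file name (LFN) -> set of physical URLs holding a copy.
        self._replicas = defaultdict(set)
        # Collection name -> set of LFNs grouped under it.
        self._collections = defaultdict(set)

    def register_replica(self, lfn, url):
        """Record that `url` holds a copy of logical file `lfn`."""
        self._replicas[lfn].add(url)

    def add_to_collection(self, collection, lfn):
        """Group logical file `lfn` under a named collection."""
        self._collections[collection].add(lfn)

    def replicas(self, lfn):
        """All known copies of one file, sorted for stable output."""
        return sorted(self._replicas.get(lfn, set()))

    def collection_replicas(self, collection):
        """Every file in a collection together with its copies."""
        members = sorted(self._collections.get(collection, set()))
        return {lfn: self.replicas(lfn) for lfn in members}


catalog = ReplicaCatalog()
catalog.register_replica("scene_001.hdf", "gsiftp://site-a.example.org/data/scene_001.hdf")
catalog.register_replica("scene_001.hdf", "gsiftp://site-b.example.org/mirror/scene_001.hdf")
catalog.add_to_collection("may-2003", "scene_001.hdf")
print(catalog.collection_replicas("may-2003"))
```

A client would then pick one of the returned URLs, for example the nearest site, and fetch it with GridFTP.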
GridFTP: Basic Approach
FTP is defined by several IETF RFCs
Start with the most commonly used subset
– Standard FTP: get/put etc., third-party transfer
Implement standard but often unused features
– GSS binding, extended directory listing, simple restart
Extend in various ways, while preserving interoperability with existing servers
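Standard FTP already supports third-party transfer. A client holds control connections to two servers and tells them to move data directly between themselves. The Python sketch below plans that command sequence as described in RFC 959: PASV on the destination, PORT on the source, then STOR and RETR. It assumes the destination has already answered PASV. It leaves out the socket handling, the reply checking, and the GSS security that GridFTP layers on top. Function names and paths are illustrative.

```python
import re

# Six comma-separated numbers h1,h2,h3,h4,p1,p2 as in
# "227 Entering Passive Mode (192,168,1,10,195,80)".
_PASV_ADDR = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_pasv_reply(reply):
    """Return (host, port) from a 227 reply to PASV (RFC 959)."""
    if not reply.startswith("227"):
        raise ValueError(f"not a PASV reply: {reply!r}")
    match = _PASV_ADDR.search(reply)
    if match is None:
        raise ValueError(f"no address in PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    # The port is encoded as two bytes: high byte p1, low byte p2.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


def third_party_plan(dest_pasv_reply, src_path, dst_path):
    """Ordered (server, command) steps for a server-to-server copy.

    Assumes PASV was already sent to the destination, which is now listening
    at the address in `dest_pasv_reply`; the source will connect to it.
    """
    host, port = parse_pasv_reply(dest_pasv_reply)
    port_arg = ",".join(host.split(".") + [str(port // 256), str(port % 256)])
    return [
        ("source", f"PORT {port_arg}"),       # tell the source where to send data
        ("destination", f"STOR {dst_path}"),  # destination waits to receive
        ("source", f"RETR {src_path}"),       # source connects and sends
    ]


for server, command in third_party_plan(
        "227 Entering Passive Mode (192,168,1,10,195,80)", "/pub/in.dat", "/incoming/out.dat"):
    print(server, command)
```

Each planned command could then be sent over the matching control connection, for example with `ftplib.FTP.sendcmd`. Many production servers refuse PORT addresses that differ from the client's own for security reasons, however.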
Features of GridFTP
Grid Security Infrastructure and Kerberos support: Robust and flexible authentication, integrity, and confidentiality
Third-party control of data transfer: user or application at one site initiates, monitors and controls a data transfer between two other sites
Parallel data transfer: On wide-area links, use multiple TCP streams in parallel between the same source and destination
Striped data transfer: Use multiple TCP streams to transfer data that is striped or interleaved across multiple servers
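Parallel and striped transfer both come down to deciding which bytes travel over which connection. The Python sketch below shows one simple static scheme, which is an assumption for illustration. Parallel transfer splits a file into contiguous byte ranges, one per TCP stream. Striped transfer deals fixed-size blocks round-robin across several servers. Real GridFTP implementations send offset-tagged blocks in extended block mode (MODE E), so the receiver can reassemble data arriving out of order. They need not partition files this way.

```python
def parallel_ranges(file_size, streams):
    """Split [0, file_size) into contiguous (offset, length) ranges,
    one per TCP stream between the same source and destination."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(file_size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        # The first `extra` streams carry one extra byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def striped_blocks(file_size, block_size, servers):
    """Deal fixed-size blocks round-robin across servers, so each node
    moves every len(servers)-th block of the file."""
    layout = {server: [] for server in servers}
    for n, offset in enumerate(range(0, file_size, block_size)):
        length = min(block_size, file_size - offset)  # last block may be short
        layout[servers[n % len(servers)]].append((offset, length))
    return layout


print(parallel_ranges(10, 3))
print(striped_blocks(10, 4, ["node1", "node2"]))
```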
Features of GridFTP (cont.)
Partial file transfer: Standard FTP allows transfer of the remainder of a file starting at an offset. GridFTP supports transfers of arbitrary subsets or regions of a file
Automatic negotiation of TCP buffer/window sizes: optimal settings for TCP buffer/window sizes can dramatically improve performance
Support for reliable and restartable data transfer: the FTP standard includes basic restart features that are not widely implemented. GridFTP exploits these features and extends them.
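Two of the features above reduce to simple arithmetic. The TCP buffer size that keeps a path full is the bandwidth-delay product (link rate × round-trip time). Resuming a transfer means fetching only the byte ranges not yet received. The Python sketch below shows both. It assumes the receiver can report which (offset, length) ranges it already holds. The function names are hypothetical rather than part of any GridFTP API. The gaps it returns are the kind of file regions a partial-file transfer could request.

```python
def tcp_buffer_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product in bytes: data in flight needed to fill the path."""
    return int(bandwidth_bps * rtt_seconds / 8)


def missing_ranges(file_size, received):
    """Return the (offset, length) gaps not covered by `received` ranges.

    `received` may be unsorted and overlapping, e.g. restart markers
    collected from several parallel streams before a failure.
    """
    gaps, cursor = [], 0
    for offset, length in sorted(received):
        if offset > cursor:
            gaps.append((cursor, offset - cursor))
        cursor = max(cursor, offset + length)
    if cursor < file_size:
        gaps.append((cursor, file_size - cursor))
    return gaps


# A 1 Gb/s path with a 100 ms round trip needs about 12.5 MB of buffer.
print(tcp_buffer_bytes(1_000_000_000, 0.1))  # 12500000
# Resume a 100-byte file after receiving bytes 0-14 and 50-69.
print(missing_ranges(100, [(50, 20), (0, 10), (5, 10)]))  # [(15, 35), (70, 30)]
```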
GridFTP for Efficient WAN Data Transfer
Secure authentication
Parallel transfer gets the job done quickly
Partial file access gets only the required data
Up to 2.8 Gb/s using a striped server architecture
Parallel transfer: fully utilizes the bandwidth of the network interface on single nodes.
Striped transfer: fully utilizes the bandwidth of a Gb+ WAN using multiple nodes.
[Diagram: two parallel filesystems]
[Chart: GridFTP (globus-url-copy) bandwidth (Mb/s) vs. # of parallel streams]
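A common simplified model helps explain why adding streams raises bandwidth on a wide-area path. Each TCP stream's throughput is capped by its window size W (in bits) divided by the round-trip time. N streams can together approach N times that cap until they reach the link capacity C. The model ignores packet loss and congestion control. The example numbers are hypothetical rather than read from the chart.

```latex
T_{\text{stream}} \le \frac{W}{\mathrm{RTT}}, \qquad
T(N) \approx \min\!\left(C,\; N \cdot \frac{W}{\mathrm{RTT}}\right)

% Example: W = 64\ \text{KB} = 524288\ \text{bit},\ \mathrm{RTT} = 70\ \text{ms}
\frac{524288\ \text{bit}}{0.07\ \text{s}} \approx 7.5\ \text{Mb/s per stream},
\qquad T(10) \approx \min\left(C,\ 75\ \text{Mb/s}\right)
```

In this model, bandwidth grows roughly linearly with the number of streams and then levels off once N·W/RTT reaches C.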
Current Data Delivery Process: FTP based
Pull – semi-anonymous FTP
– Product ready
– Email sent to user with instructions and password
– User connects via FTP as “anonymous” with the provided password
– FTP daemon positions user in the appropriate directory
– User pulls data
Push – routine data flows to high-volume users
– Account provided on remote system
– When data is available, it is pushed to the remote system
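The semi-anonymous pull above maps directly onto Python's standard `ftplib`. The sketch below logs in as “anonymous” with the emailed password, lists the directory the daemon placed the user in, and retrieves each file in binary mode. The function name and the `write_file` callback are hypothetical. The FTP object is taken as a parameter so the logic can be exercised without a live server.

```python
def pull_product(ftp, password, write_file):
    """Semi-anonymous pull over plain FTP.

    `ftp` is an ftplib.FTP instance (or anything with the same methods);
    `write_file(name)` returns a context-managed binary file to write into.
    Returns the names of the files retrieved.
    """
    ftp.login(user="anonymous", passwd=password)
    fetched = []
    # No CWD: the FTP daemon has already positioned the user in the right directory.
    for name in ftp.nlst():
        with write_file(name) as out:
            ftp.retrbinary(f"RETR {name}", out.write)
        fetched.append(name)
    ftp.quit()
    return fetched
```

Against a real server this might be called as `pull_product(ftplib.FTP("ftp.example.gov"), emailed_password, lambda n: open(n, "wb"))`, where the host name is a placeholder. Plain FTP sends that password in clear text, which is one motivation for the certificate-based approach.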
Potential Future Data Delivery: GridFTP based
For routine, multiple-usage customers:
Establish a “certificate process” with the customer
– Self-signed certificate authority
– Customer generates a private/public key pair
– Generate a user certificate with the customer's public key
– Add the user certificate to the list of trusted users
Customer must install a GridFTP client
– Globus Toolkit data management client bundle
– Gsincftp
– Java Commodity Grid Kit for Windows
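The last step, adding the user certificate to the list of trusted users, is commonly done in Globus installations through a grid-mapfile. This is a text file that maps each trusted certificate's subject distinguished name (DN) to a local account. The Python sketch below parses that layout and checks whether a subject is listed. It assumes one double-quoted DN per line followed by one or more comma-separated account names. The DNs are made up. The sketch covers only authorization: verifying that the certificate was actually signed by the self-signed CA is GSI's job and is not shown.

```python
import shlex


def parse_grid_mapfile(text):
    """Map certificate subject DNs to lists of local account names.

    Assumed layout per line: "<quoted subject DN>" account[,account...]
    Blank lines and lines starting with '#' are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # keeps the quoted DN (with spaces) as one token
        if len(parts) < 2:
            raise ValueError(f"malformed grid-mapfile line: {line!r}")
        dn, accounts = parts[0], " ".join(parts[1:])
        mapping[dn] = [a.strip() for a in accounts.split(",") if a.strip()]
    return mapping


def is_trusted(mapping, subject_dn):
    """A user is authorized only if their certificate subject is listed."""
    return subject_dn in mapping


example = '"/C=US/O=Example/CN=Jane Doe" jdoe\n"/C=US/O=Example/CN=Bob" bob, ftpuser\n'
print(parse_grid_mapfile(example))
```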
Potential Future Data Delivery: GridFTP based
For routine, multiple-usage customers:
Pull
– Product ready
– Email notifies user that data is ready
– User authenticates with GridFTP using the user certificate, is granted access, and pulls the data
Push
– Account provided on remote system with a host certificate and our user certificate
– These Grid certificates establish a Virtual Organization between the two parties
– When data is available, GridFTP is used to push it to the remote system
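With certificates in place, the push step can be scripted around `globus-url-copy`, the Globus Toolkit's GridFTP client. The Python sketch below builds the command line (source URL first, destination second) and runs it. It assumes the sender already holds a valid proxy credential, for example from `grid-proxy-init`. The paths and host are placeholders. `-p` sets the number of parallel streams and `-tcp-bs` sets the TCP buffer size. The default of four streams is an arbitrary choice for illustration.

```python
import os
import subprocess
from typing import List, Optional


def gridftp_push_command(local_path: str, remote_url: str, streams: int = 4,
                         tcp_buffer_bytes: Optional[int] = None) -> List[str]:
    """Build a globus-url-copy command that pushes a local file to a gsiftp:// URL."""
    if not os.path.isabs(local_path):
        raise ValueError("file:// URLs need an absolute local path")
    cmd = ["globus-url-copy", "-p", str(streams)]  # parallel TCP streams
    if tcp_buffer_bytes is not None:
        cmd += ["-tcp-bs", str(tcp_buffer_bytes)]  # TCP buffer size in bytes
    # Source first, then destination.
    cmd += [f"file://{local_path}", remote_url]
    return cmd


def push(local_path: str, remote_url: str, **options) -> None:
    """Run the transfer; raises CalledProcessError if globus-url-copy fails."""
    subprocess.run(gridftp_push_command(local_path, remote_url, **options), check=True)
```

Pulling uses the same command with the URLs reversed: a `gsiftp://` source and a local `file://` destination.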
Potential Future Data Delivery: GridFTP based
For single-usage customers, the process to
– establish a “certificate process” with the customer, and
– have the customer install a GridFTP client
currently seems too complex (not worth the effort).
Would like to have a simplified method, such as
– emailing a one-time-use “user certificate”
– a GridFTP client built into the web browser