CLRC - e-Science Centre, Kerstin Kleese - van Dam and SDSC, George Kremenek
CLRC e-Science Centre
SRB
Kerstin Kleese -van Dam
Special thanks to:
George Kremenek
Alasdair Earl
Contents
Introduction
Architecture description
What is good
What needs improving
What can it be used for
Introduction
More and more information is available today. It can be:
Random Information (e.g. news items)
Scientific Data
Commercial or Administrative Data
Data about Data (metadata describing the content of the actual data)
The information is generally available via/from:
Web sites, filesystems, databases, tape libraries, or paper and other non-digital media.
Introduction (2)
How do you find the information:
Search Engines, Catalogue Systems or Hard Work (big bucket)
How do you evaluate the information:
Combine, Compare, Present
How do you manage the information:
Preservation, Sharing, Replicating, Transferring, Securing
Where does SRB fit into this Scenario?
SRB - the Storage Resource Broker can:
Integrate distributed, heterogeneous storage devices
Make data access transparent for the user
Help to share, replicate, transfer and preserve data
SRB cannot:
Replace metadata catalogues
Provide high level information services
How does SRB fit into a Grid Environment?
SRB can be used to:
Manage information required internally by Portals
Integrate data across various media
Integrate data across sites
SRB can be used:
For a particular site
In a research collaboration
In a wider Grid community
General Facts
Storage Resource Broker - SRB
Developed by the San Diego Supercomputer Center (SDSC) since the mid-1990s for the US government's National Partnership for Advanced Computational Infrastructure (NPACI).
Initial release 1997
Latest version V1.1.8 - released February 2001
In the US approximately 200TB of data are shared via SRB between 30 participating Universities.
Used by the HPCPortal developed by Mary Thomas's group at SDSC.
The SRB/MCAT Core Team
• SDSC Team
– Reagan Moore, Arcot Rajasekar, Michael Wan, George Kremenek, Charlie Coward, Sheau Yen Chen, Roman Olschanowski
• SRB Expertise at SDSC:
– Michael Wan (SRB client/server, drivers, srbBrowser)
– Arcot Rajasekar (MCAT, DB drivers)
– George Kremenek (SRB Client Modules, Security, DAM, application design)
– Charlie Coward (Windows Servers and Browser)
– Sheau Yen Chen (administration)
– Roman Olschanowski (testing)
What is SRB?
• SRB is an Intelligent Data Access System
• SRB provides protocol transparency to diverse and distributed storage systems
• SRB provides location transparency to distributed datasets
• SRB provides access transparency to remote users
• Extends File Systems
• Extends Database Systems
• Extends I/O protocol
SRB Access
SRB can be accessed in three ways:
High Level graphical Java interface - SRB Browser
Application Programming interface - SRB API (high and low level)
Unix shell Command Line Interface - SRB Scommands
SRB Concepts(1)
• Provide Scalability (Hosts, Resource Types, Resources, Collections, Data Objects - size and number, Users & Groups)
• Provide Uniform Interfaces (to Resources, Collections and Datasets, authentication across SRB Space)
• Replication of Datasets
• Access Control Lists
• Ticket-based Access
• Authentication and Encryption (text password, encrypted password, SEA and GSI)
• Server-side proxy Operations
• Metadata-based Discovery
SRB Concepts(2)
• Provide Logical Abstractions
– srbSpace - an abstract storage space
– Resource Types - resource defined by properties
– Resources - resource identified by name and type
  • multiple resources tied together as a single resource
– Collections - abstraction over directory structure
  • distributed & curated
– Datasets - identified by properties
– Users - authenticated across hosts/networks
– Domain - abstraction over physical domains
– Metadata Schema/Attributes
What is MCAT?
• Cataloging System
• Metadata Repository
– Digital Object Metadata
  • type, format, lineage, usage methods, domain-specific attributes, collection info, etc
– System-level Metadata
  • access control, audit trails, location, replication, resource types, user groups, etc
– Schema-level Metadata
  • ontology, relationships among attributes/schemas, semantics of attributes, etc
• Uniform Access and Federation interface
Contents
Introduction
Architecture description
What is good
What needs improving
What can it be used for
SRB V1.x Features
• Multi-platform (clients and servers)
– SunOS/Solaris, AIX, Cray C90, SGI, OSX
• API and command line interfaces
• “Low-level” and “high-level” APIs
• Storage systems supported
– Oracle, DB2, Sybase, HPSS, UNIX FS, W2000/NT FS,
• Support for distributed servers, GSI authentication, password encryption
The Storage Resource Broker
Figure: components shown: Application (SRB client); SRB Server; MCAT; Distributed Storage Resources (database systems, archival storage systems, file systems, ftp), labelled DB2, Oracle, Sybase, ObjectStore; HPSS, UniTree; UNIX, ftp.
How does SRB work?
The SRB Server spawns an SRB Agent, which authenticates the User/Application (SRB Client) against the information stored in MCAT
The SRB Agent finds the file location in MCAT
The SRB Agent checks the user request against the permissions stored in MCAT
The SRB Agent returns the result of the request to the user
The SRB Agent communicates with the user through a port specific to this client session, and it can handle one or more requests from the client.
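The request flow on this slide can be sketched as a toy model. This is only an illustration in Python, not SRB code: the `Catalog` and `Agent` classes, their fields, and the plain-text password check are hypothetical stand-ins for MCAT and an SRB Agent.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical stand-in for MCAT: users, file locations, permissions."""
    passwords: dict = field(default_factory=dict)    # user -> password
    locations: dict = field(default_factory=dict)    # logical name -> (host, path)
    permissions: dict = field(default_factory=dict)  # (user, logical name) -> set of actions


class Agent:
    """Hypothetical stand-in for an SRB Agent serving one client session."""

    def __init__(self, catalog, user, password):
        # Step 1: authenticate the client against the catalog.
        if catalog.passwords.get(user) != password:
            raise PermissionError("authentication failed")
        self.catalog = catalog
        self.user = user

    def handle(self, action, logical_name):
        # Step 2: find the file location in the catalog.
        location = self.catalog.locations.get(logical_name)
        if location is None:
            return ("error", "no such object")
        # Step 3: check the request against the stored permissions.
        allowed = self.catalog.permissions.get((self.user, logical_name), set())
        if action not in allowed:
            return ("error", "permission denied")
        # Step 4: report the result back to the client.
        return ("ok", location)
```

One agent object corresponds to one client session, so several `handle` calls can follow a single authentication, as the slide describes.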
The SRB Process Model
Figure: components shown: Application, SRBMaster, SRB agents, MCAT; connection labels: (host, port) and (port).
How does SRB handle remote Data Access?
Steps 1-3 are the same as in the simple case: spawn an SRB Agent on the local machine, authenticate, check the user request, locate the file
The SRB Agent contacts a remote SRB Agent via the SRB Server on the remote machine where the data is stored
The second SRB Agent returns the pointer to the data item to the first SRB Agent, which passes it on to the user
The SRB Client can then interact with the data item as described before; however, all communication still runs via the first SRB Agent and the machine it is situated on
Remote SRB Operation
Figure: components shown: Application, two SRB servers each paired with an SRB agent, MCAT; the arrows are numbered 1 to 6.
SRB Space
The SRB Space consists of:
• A number of SRB Servers (possibly across multiple sites)
• Many heterogeneous Storage Resources linked to SRB Servers via SRB Media Drivers
• One MCAT System
• Many Users
The SRB Space provides a single view on all the data within the Space.
SRB Space
Figure: a network of SRB servers connecting nodes of four kinds.
Legend: DR - Data Repository, DL - Digital Library, MC - Meta Catalog, CP - Comp Process / SRB Client
MCAT: Metadata Catalog
• Stores metadata about
– Users, Data sets, Resources, Methods
• Provides “collection” abstraction
• Stores detailed access control information
• Maintains audit trail information on data sets
• Implemented as a relational database with referential integrity constraints (currently uses Oracle, DB2, Sybase)
MCAT Architecture
Figure: components shown: MCAT Interface Functions; MAPS to Schema Convertor; Dynamic Query Generator; Answer Extractor & Cursor Control; Schema to MAPS Convertor; MAPS Initialization; MAPS Semantics; Schema Initialization; Schema Semantics; DB2 Query System; Oracle Query System.
Federated Catalog Architecture
Figure: components shown: MCAT; catalogs CAT-1 and CAT-2; Internal Catalogs; External Catalog Interface; Local Interface; Local Routines; Semantics & Definitions; MAPS; MAPS Interface.
New MCAT Features
• Meta-Schema to hold System and User meta data schema information
• Extensible meta data schema
• Distributed meta data schema
• Metadata exchange Interface Protocol
– MAPS- Metadata Attribute Presentation Structure
• query, update and result structures
• Close to Z39.50
New MCAT Features (contd.)
• Core Schema Implemented
– MCAT Core - Data, Resources, Users and Methods
– Dublin Core
– IV Core - Image Visualization attributes
• Web-based Prototype User Interface
– extensible schema functions
– query, insert and update of meta data
– integrated presentation of meta data and data
SRB Data Replication Support
• Replication via Resource Set definition
• Replication support integrated into write function
• srbObjReplicate API can be used for post facto replication
• Synchronous replication across all sites. Can choose any k out of n
• Can choose specific replica on read operation
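The replication behaviour listed above can be illustrated with a toy model. This is a sketch in Python under one reading of "any k out of n" (a write goes to k of the n resources in the set); the `LogicalResource` class and its methods are hypothetical and are not the SRB API.

```python
class LogicalResource:
    """Hypothetical resource set: n physical stores behind one logical name."""

    def __init__(self, stores):
        self.stores = stores   # store name -> dict acting as a physical store
        self.replicas = {}     # object name -> list of store names holding a copy

    def write(self, name, data, k=None):
        # Replication is part of the write: copy to k of the n stores
        # (all n when k is not given) before returning.
        n = len(self.stores)
        k = n if k is None else k
        if not 1 <= k <= n:
            raise ValueError("k must be between 1 and n")
        targets = list(self.stores)[:k]
        for store_name in targets:
            self.stores[store_name][name] = data
        self.replicas[name] = targets
        return targets

    def replicate(self, name, store_name):
        # Post facto replication: copy an existing object to another store.
        source = self.replicas[name][0]
        self.stores[store_name][name] = self.stores[source][name]
        if store_name not in self.replicas[name]:
            self.replicas[name].append(store_name)

    def read(self, name, replica=0):
        # The caller may pick a specific replica by index.
        return self.stores[self.replicas[name][replica]][name]
```

In this model the catalogue of replicas (`replicas`) plays the role that MCAT plays in SRB: it records which resources hold a copy of each object.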
Data Replication Example
Figure: labels shown: Application, SRB, MCAT, NWS; sites SDSC, NCSA, Caltech, SAIC; storage systems Oracle, DB2, HPSS, Unix; logical resources LogRsrc1 and LogRsrc2.
Ticket-based Access Control
• Owner can request ticket for a data set
• Ticket can be issued for a data set or a collection
• Ticket controls access by
– time-period (start and expire timestamps)
– number of accesses (count)
– user names (any, single or group users)
• Non-registered Users can also access using tickets
• Useful for sharing data and access through the web
• Tickets generated and stored in MCAT
• Currently supports read-only tickets
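The ticket rules above can be illustrated with a small model. This is a sketch in Python, not SRB code: the `Ticket` class, its field names, and the path-prefix test for collections are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    """Hypothetical read-only ticket for a data set or a collection."""
    target: str                        # data set or collection path
    start: float                       # start timestamp
    expire: float                      # expiry timestamp
    remaining: int                     # number of accesses still allowed
    users: Optional[frozenset] = None  # None means any user

    def allows(self, user, path, now):
        # The path must be the target itself or lie inside the target collection.
        inside = path.startswith(self.target.rstrip("/") + "/")
        if path != self.target and not inside:
            return False
        if not self.start <= now <= self.expire:
            return False
        if self.remaining <= 0:
            return False
        return self.users is None or user in self.users

    def read(self, user, path, now):
        # Each successful read uses up one access from the count.
        if not self.allows(user, path, now):
            raise PermissionError("ticket does not permit this access")
        self.remaining -= 1
        return path
```

Because the check needs only the ticket, a user who is not registered can still be granted access, which is what makes tickets useful for sharing through the web.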
SRB API
• Programmatic API
– High-level API
– Low-level API
– SRB Manager API
• Command Level Interface - Scommands
• Graphical User Interface - srbBrowser
• Web Utilities
SRB API Interface
Figure: components shown: Application, SRB Master, MCAT.
High & Low-level API
• Low-level API
– talks to resource drivers
– no registration of data sets in MCAT
– no authentication through MCAT
– User provides all information
• High-level API
– Uses low-level API to access resources
– Registers data management information in MCAT
– Uses MCAT for authentication and meta information
– Uses MCAT for resource and data discovery
– Access/store data in remote SRB
System Manager API
• srbChkMdasAuth(conn, userName, userAuth, domain)
• srbChkMdasSysAuth(conn, userName, userAuth, domain)
• srbRegisterUser(conn, userName, domain, password, userType, userAddress, userPhone, userEmail)
• srbRegisterUserGrp(conn, userGrpName, userGrpPassword, userGrpType, userGrpAddress, userGrpPhone, userGrpEmail)
srbBrowser - A SRB Graphical Interface
Figure: components shown: USER; Windows or Java GUI; SRB Agent (proxy operation); MCAT. Arrow labels: "Obtain user's metadata information via SRB" and "Invoke SRB operations".
• A Java GUI
• Interfaces with SRB servers using the client API library.
• Performs most SRB operations - cp, replicate, import, export, metadata query, etc.
SRB Command Line Interface
Figure: components shown: USER; Environment File; SRB "shell" commands (Sls, Scp, Scat, Sput, Sget, ...); SRB Agent (proxy operation); MCAT.
Scommands
• Sinit - initialize S-environment
• Sexit - clean up
• Sman - get manpage for Scommand
• Scat - display srbObject on screen
• Sput - copy local file into srbSpace
• Sget - copy srbObject to local space
• Sappend - append to srbObject
• Srename - change srbObject name
• Srm - remove srbObject
• Schmod - change/grant access to srbObject
• Scd - change collection
• Spwd - display current collection
• Sls - list collection
• Smkdir - make new collection
• Srmdir - remove old collection
• SgetD - get srbObject information
• SgetR - get resource information
• SgetU - get user information
• SmodD - modify srbObject info
• SmodU - modify user info
• Stoken - get native type information
• Scopy - copy srbObject in another collection and under another name
• Sreplicate - clone object in new resource - same internal id
• Smove - move srbObject to new collection or resource
Scommands (contd …)
• ingestUser - adding a new user or group
• ingestResource - adding a new resource
• ingestLogicalResource - making a new resource grouping
• addLogicalResource - adding to a resource grouping
• ingestLocation - adding new location information
• ingestToken - adding new native types (eg. resourceType, objectType, userType, domainName, ActionType, . . .)
Scommands
• Sls
– Sls [-h] [-L number] [-Y number] [-r|-f] [collection ...]
– Sls [-L number] [-Y number] srbObj …
• Sput
– Sput [-p] [-D dataType] [-R resourceName] [-P pathName] localFileName ... TargetName
– Sput [-p] [-D dataType] [-R resourceName] [-P pathName] -i TargetName
• Sget
– Sget [-C_n ] [-p] srbObj ... localFile
• Sreplicate
– Sreplicate [-Cn] [-p] [-R resourceName] [-P pathName] srbObj ...
SRBIO
• open
• creat
• read
• write
• close
• lseek
• fopen
• fread
• fwrite
• fclose
• fseek
• fflush
• fgetc
• fgets
• fputc
• fputs
• getc
• putc
• ungetc
• rewind
• vfprintf
• fprintf
• fscanf
Contents
Introduction
Architecture description
What is good
What needs improving
What can it be used for
Useful features
Easy interfaces to access data held in SRB
Transparent access independent of location or type
Support for replication of data
Support for logical structuring of data
Database support to locate data
Ticket system
Enhanced access right structure
Modular SRB Media Drivers
Useful to users and system administrators
Contents
Introduction
Architecture description
What is good
What needs improving
What can it be used for
Current Obstacles
Only one MCAT catalogue - single point of failure, performance, ownership
All MCAT metadata is visible to everyone
Data access at remote sites - too many interim steps
Documentation not up-to-date
Installation not straightforward - patches needed, dependent on other software
Licence required
Contents
Introduction
Architecture description
What is good
What needs improving
What can it be used for
Grid Applications within CLRC
Various Portals to access experimental, data and computing facilities within CLRC and outside.
Issues:
Data held widely distributed across the site and in community owned facilities
Data required where it is not stored
Data located through service that is not local to data holding
Planned Structure of CLRC - Services
Figure: labels shown: DataPortal, Local Archives, Remote Archives, HPCPortal, Remote systems, Local systems, Computing, Applications, Experimental Facilities, Problem Solving Environments, CLRC Authentication.
Integrated Solution for Earth Science
Figure: labels shown: ES User, Application, DataPortal, HPCPortal, HPC, SRB, RasDaMan, BADC Catalogue, Data Storage (Disk, Tape).
General CLRC DataPortal Architecture
Figure: labels shown: CLRC DataPortal Server; Common metadata catalogue database; XML wrapper (shown twice); Facility 1; Local data; Local metadata.
Server Architecture
Figure: labels shown: USER; User input interpreter; pre-set XSL Script; Query Generator; Central metadata repository; XML Parser; XML File (shown twice); XML Schema; Response Generator; User output generator; Wrapper for other Catalogues.
Key: Internal module, External agent, http, Ascii file
Architecture for integrating existing Catalogues
Figure: labels shown: DataPortal Server; External agent; SQL input translator; Local Metadata Catalogue; XML output generator; Response Generator; RasDaMan; SRB; XML Wrapper; Request file(s); Internal ANSI or RAS SQL.
Key: Internal module, External agent, Http XML, Http SQL, Internal SQL
Local Integration of SRB
Figure: labels shown: BADC, RasDaMan, DataPortal; MCAT; SRB Server; four SRB Agents; storage systems DB2, Oracle, Illustra, ObjectStore - HPSS, Unitree - Unix, ftp.
Key: module, Internal In, External agent, Internal two-way, External two-way
Remote Integration of SRB
Figure: labels shown: User; DataPortal; HPCPortal; BADC; RasDaMan; EPCC; CSAR; two groups each consisting of MCAT, SRB Server and three SRB Agents. Arrow labels: Job submission, Locating Data, Data location, Data itself.
Conclusions
SRB is a useful tool in the GRID context:
It has many plus points
But there is still a lot to do
There is nothing comparable out there!
Where can you get more information?
For a SRB license send mail to:
For general information see the UK Grid Support Centre:
http://www.grid-support.ac.uk/
For specific questions register with the Centre:
http://www.grid-support.ac.uk/form.html
For information on e-science research within CLRC see the CLRC e-Science Centre:
http://www.e-science.clrc.ac.uk/