8/14/2019 Ticer Summer School 24Aug06
TICER Summer School, August 24th 2006 1
Ticer Summer School
Thursday 24th August 2006
Dave Berry & Malcolm Atkinson
National e-Science Centre, Edinburgh
www.nesc.ac.uk
Digital Libraries, Grids & E-Science
What is E-Science?
What is Grid Computing?
Data Grids
Requirements
Examples
Technologies
Data Virtualisation
The Open Grid Services Architecture
Challenges
What is e-Science?
Goal: to enable better research in all disciplines
Method: Develop collaboration supported by advanced distributed computation
to generate, curate and analyse rich data resources
From experiments, observations, simulations & publications
Quality management, preservation and reliable evidence
to develop and explore models and simulations
Computation and data at all scales
Trustworthy, economic, timely and relevant results
to enable dynamic distributed collaboration
Facilitating collaboration with information and resource sharing
Security, trust, reliability, accountability, manageability and agility
Courtesy of David Gavaghan & IB Team
Integrative Biology
Tackling two Grand Challenge research questions:
What causes heart disease?
How does a cancer form and grow?
Together these diseases cause 61% of all UK deaths
Building a powerful, fault-tolerant Grid infrastructure for biomedical science
Enabling biomedical researchers to use distributed resources such as high-performance computers, databases and visualisation tools to develop coupled multi-scale models of how these killer diseases develop.
Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES)
Figure components: the CFG Virtual Organisation (Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands), private data at each site, publicly curated data (Ensembl, MGI, HUGO, OMIM, SWISS-PROT, RGD), DATAHUB, Synteny Grid Service, BLAST, Portal
http://www.brc.dcs.gla.ac.uk/projects/bridges/
eDiaMoND: Screening for Breast Cancer
From 1 Trust to many Trusts: collaborative working, audit capability, epidemiology
Other modalities: MRI, PET, ultrasound
Better access to case information and digital tools
Supplement mentoring with access to digital training cases and sharing of information across clinics
Figure components: the eDiaMoND Grid linking screening, assessment/symptomatic and biopsy; secondary capture or FFD; X-rays and case information; digital reading; SMF; CAD; temporal comparison; 3D images; electronic patient records; radiology reporting systems; letters; training (manage training cases, perform training); patients
Provided by eDiamond project: Prof. Sir Mike Brady et al.
E-Science Data Resources
Curated databases
Public, institutional, group, personal
Online journals and preprints
Text mining and indexing services
Raw storage (disk & tape)
Replicated files
Persistent archives
Registries
EBank
Slide from Jeremy Frey
Biomedical data making connections
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg
Slide provided by Carole Goble: University of Manchester
Using Workflows to Link Services
Describe the steps in a Scripting Language
Steps performed by Workflow Enactment Engine
Many languages in use
Trade off: familiarity & availability
Trade off: detailed control versus abstraction
Incrementally develop correct process
Sharable & Editable
Basis for scientific communication & validation
Valuable IPR asset
Repetition is now easy
Parameterised explicitly & implicitly
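The workflow idea on this slide (steps described in a script, run in order by an enactment engine, with explicit parameters so repetition is easy) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Taverna, Kepler or any other system discussed here works: the step format, the step names and `run_workflow` are hypothetical, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) stands in for a real engine's scheduler.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable

# Hypothetical step format: (names of upstream steps, function to run).
# Each function receives upstream outputs positionally plus the shared parameters.
Step = tuple[list[str], Callable[..., Any]]


def run_workflow(steps: dict[str, Step], params: dict[str, Any]) -> dict[str, Any]:
    """Enact the steps in dependency order and return every step's output."""
    graph = {name: set(deps) for name, (deps, _) in steps.items()}
    results: dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        deps, func = steps[name]
        results[name] = func(*(results[d] for d in deps), **params)
    return results


# Toy steps standing in for calls to remote services.
workflow: dict[str, Step] = {
    "fetch": ([], lambda **p: "ACGT" * p["repeats"]),
    "count": (["fetch"], lambda seq, **p: seq.count(p["base"])),
    "report": (["fetch", "count"], lambda seq, n, **p: f"{n} x {p['base']} in {len(seq)} bases"),
}

print(run_workflow(workflow, {"repeats": 3, "base": "G"})["report"])  # prints: 3 x G in 12 bases
```

Re-running with different `params` repeats the same process on new inputs, which is what makes a parameterised workflow cheap to share and repeat.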
Workflow Systems
Language | WF Enactment | Comments
Shell scripts | Shell + OS | Common but not often thought of as WF. Depend on context, e.g. NFS across all sites
Perl | Perl runtime | Popular in bioinformatics. Similar context dependence; distribution has to be coded
Java | JVM | Popular target because of JVM ubiquity. Similar dependence; distribution has to be coded
BPEL | BPEL Enactment | OASIS standard for industry, coordinating use of multiple Web Services. Low-level detail; tools
Scufl | Taverna | EBI, OMII-UK & MyGrid; http://taverna.sourceforge.net/index.php
VDT / Pegasus | Chimera & DAGman | High-level abstract formulation of workflows, automated mapping towards executable forms, cached result re-use
Kepler | Kepler | BIRN, GEON & SEEK; http://kepler-project.org/
Workflow example
Taverna in MyGrid http://www.mygrid.org.uk/
allows the e-Scientist to describe and enact their experimental processes in a structured, repeatable and verifiable way
Figure components: GUI, workflow language, enactment engine
Notification
Pub/Sub for laboratory data using a broker and ultimately delivered over GPRS
Comb-e-chem: Jeremy Frey
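The publish/subscribe pattern behind this notification set-up can be sketched as a minimal in-process broker. The `Broker` class, the topic names and the callbacks are hypothetical; a real deployment would run the broker as a network service and push notifications over a link such as GPRS, which this sketch does not model.

```python
from collections import defaultdict
from typing import Callable

Callback = Callable[[str, dict], None]


class Broker:
    """Minimal in-process, topic-based publish/subscribe broker."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver message to every subscriber of topic; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, message)
        return len(callbacks)


# A lab sensor publishes readings; a phone-style client subscribes to one topic.
inbox: list[str] = []
broker = Broker()
broker.subscribe("lab/temperature", lambda t, m: inbox.append(f"{t}: {m['celsius']} C"))
broker.publish("lab/temperature", {"celsius": 21.5})
broker.publish("lab/pressure", {"kpa": 101.3})  # no subscribers, so nothing is delivered
print(inbox)  # prints: ['lab/temperature: 21.5 C']
```

Publishers and subscribers only share topic names, so instruments can be added or removed without changing the clients that consume their data.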
Relevance to Digital Libraries
Similar concerns
Data curation & management
Metadata, discovery
Secure access (AAA +)
Provenance & data quality
Local autonomy
Availability, resilience
Common technology
Grid as an implementation technology
What is a Grid?
A grid is a system consisting of:
distributed but connected resources, and
software and/or hardware that provides and manages logically seamless access to those resources to meet desired objectives
Figure: example grid resources: licence, printer, R2AD, database, web server, data center, cluster, handheld, supercomputer, workstation, server
Source: Hiro Kishimoto GGF17 Keynote May 2006
Virtualizing Resources
Figure: resources (computers, storage, sensors, applications, information) have resource-specific interfaces; Web services provide access to them through type-specific interfaces and common interfaces
Hiro Kishimoto: Keynote GGF17
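The layering in this figure (resource-specific implementations underneath, type-specific and common interfaces on top) can be illustrated with a Python abstract base class. The class and method names are hypothetical and do not come from any grid specification; they only show how a client can program against one common interface while each resource type keeps its own operations.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class GridResource(ABC):
    """Common interface: every virtualised resource can describe itself."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def describe(self) -> dict:
        """Return a resource-independent description for registries and clients."""


class StorageResource(GridResource):
    """Type-specific interface for storage, hiding the resource-specific back end."""

    def __init__(self, name: str, capacity_gb: int) -> None:
        super().__init__(name)
        self.capacity_gb = capacity_gb
        self._files: dict[str, bytes] = {}  # stands in for disk, tape or a database

    def describe(self) -> dict:
        return {"name": self.name, "type": "storage", "capacity_gb": self.capacity_gb}

    def put(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def get(self, path: str) -> bytes:
        return self._files[path]


class ComputeResource(GridResource):
    """Type-specific interface for computation."""

    def __init__(self, name: str, cores: int) -> None:
        super().__init__(name)
        self.cores = cores

    def describe(self) -> dict:
        return {"name": self.name, "type": "compute", "cores": self.cores}

    def run(self, func: Callable[..., Any], *args: Any) -> Any:
        return func(*args)  # a real compute resource would queue a remote job


resources: list[GridResource] = [StorageResource("disk-1", 500), ComputeResource("cluster-a", 64)]
for resource in resources:
    print(resource.describe())  # the same call works for every resource type
```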
Ideas and Forms
Key ideas
Virtualised resources
Secure access
Local autonomy
Many forms
Cycle stealing
Linked supercomputers
Distributed file systems
Federated databases
Commercial data centres
Utility computing
Grid Middleware
Figure: grid middleware services over virtualized resources: brokering service, registry service, data service, CPU resource, printer service, job-submit service, compute service and application service, with Advertise and Notify interactions
Hiro Kishimoto: Keynote GGF17
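The interaction in this diagram, where services advertise themselves to a registry and a brokering service consults the registry to pick a resource for an application, can be sketched as follows. `Registry`, `Broker`, `ServiceRecord` and the selection rule (most free CPUs wins) are illustrative assumptions rather than the behaviour of any particular middleware.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServiceRecord:
    name: str
    kind: str            # e.g. "compute", "data", "printer"
    free_cpus: int = 0


@dataclass
class Registry:
    records: list[ServiceRecord] = field(default_factory=list)

    def advertise(self, record: ServiceRecord) -> None:
        """Services announce themselves to the registry."""
        self.records.append(record)

    def find(self, kind: str) -> list[ServiceRecord]:
        return [r for r in self.records if r.kind == kind]


class Broker:
    """Chooses a compute service on behalf of an application."""

    def __init__(self, registry: Registry) -> None:
        self.registry = registry

    def select_compute(self, cpus_needed: int) -> Optional[ServiceRecord]:
        candidates = [r for r in self.registry.find("compute") if r.free_cpus >= cpus_needed]
        # Illustrative policy only: prefer the service with the most free CPUs.
        return max(candidates, key=lambda r: r.free_cpus, default=None)


registry = Registry()
registry.advertise(ServiceRecord("cluster-a", "compute", free_cpus=16))
registry.advertise(ServiceRecord("cluster-b", "compute", free_cpus=64))
registry.advertise(ServiceRecord("archive", "data"))
broker = Broker(registry)
print(broker.select_compute(cpus_needed=32).name)  # prints: cluster-b
```

Because the application only talks to the broker, the selection policy can change (cost, locality, queue length) without touching application code.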
Key Drivers for Grids
Collaboration
Expertise is distributed
Resources (data, software licences) are location-specific
Necessary to achieve critical mass of effort
Necessary to raise sufficient resources
Computational Power
Rapid growth in number of processors
Powered by Moore's law + device roadmap
Challenge to transform models to exploit this
Deluge of Data
Growth in scale: number and size of resources
Growth in complexity
Policy drives greater data availability
Minimum Grid Functionalities
Supports distributed computation
Data and computation
Over a variety of
hardware components (servers, data stores, …)
software components (services: resource managers, computation and data services)
With regularity that can be exploited
By applications
By other middleware & tools
By providers and operations
It will normally have security mechanisms
To develop and sustain trust regimes
Grid & Related Paradigms
Utility Computing: computing services; no knowledge of provider; enabled by grid technology
Distributed Computing: loosely coupled; heterogeneous; single administration
Cluster: tightly coupled; homogeneous; cooperative working
Grid Computing: large scale; cross-organizational; geographical distribution; distributed management
Source: Hiro Kishimoto GGF17 Keynote May 2006
Why use / build Grids?
Research Arguments
Enables new ways of working
New distributed & collaborative research
Unprecedented scale and resources
Economic Arguments
Reduced system management costs
Shared resources: better utilisation
Pooled resources: increased capacity
Load sharing & utility computing
Cheaper disaster recovery
Why use / build Grids?
Operational Arguments
Enable autonomous organisations to:
Write complementary software components
Set up, run & use complementary services
Share operational responsibility
General & consistent environment for abstraction, automation, optimisation & tools
Political & Management Arguments
Stimulate innovation
Promote intra-organisation collaboration
Promote inter-enterprise collaboration
Grids In Use: E-Science Examples
Data sharing and integration
Life sciences: sharing standard data-sets, combining collaborative data-sets
Medical informatics: integrating hospital information systems for better care and better science
Sciences: high-energy physics
Capability computing
Life sciences: molecular modeling, tomography
Engineering: materials science
Sciences: astronomy, physics
High-throughput, capacity computing for
Life sciences: BLAST, CHARMM, drug screening
Engineering: aircraft design, materials, biomedical
Sciences: high-energy physics, economic modeling
Simulation-based science and engineering
Earthquake simulation
Source: Hiro Kishimoto GGF17 Keynote May 2006
Database Growth
Figure: database growth; PDB: 33,367 protein structures; EMBL DB: 111,416,302,701 nucleotides
Slide provided by Richard Baldock: MRC HGU Edinburgh
Requirements: Users' viewpoint
Find Data
Registries & human communication
Understand data
Metadata description, standard / familiar formats & representations, standard value systems & ontologies
Data Access
Find how to interact with data resource
Obtain permission (authority)
Make connection
Make selection
Move Data
In bulk or streamed (in increments)
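The last requirement, moving data in bulk or streamed in increments, can be illustrated with a Python generator. `move_in_bulk`, `stream_chunks` and the 4-byte chunk size are illustrative assumptions; real grid transfers would go through a protocol such as GridFTP rather than in-memory buffers.

```python
import io
from typing import BinaryIO, Iterator


def move_in_bulk(source: BinaryIO) -> bytes:
    """Read the whole resource at once: simple, but needs memory for all of it."""
    return source.read()


def stream_chunks(source: BinaryIO, chunk_size: int = 4) -> Iterator[bytes]:
    """Yield the resource in increments so processing can start immediately."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        yield chunk


data = b"observations-2006"  # 17 bytes
bulk = move_in_bulk(io.BytesIO(data))
chunks = list(stream_chunks(io.BytesIO(data)))
print(len(chunks), chunks[0])  # prints: 5 b'obse'
```

Streaming lets a consumer start on the first increment before the rest arrives and keeps memory use bounded by the chunk size; bulk movement is simpler when the whole resource fits comfortably in memory.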
Requirements: Users' viewpoint 2
Transform Data
To format, organisation & representation required for computation or integration
Combine data
Standard database operations + operations relevant to the application model
Present results
To humans: data movement + transform for viewing
To application code: data movement + transform to the required format
To standard analysis tools, e.g. R
To standard visualisation tools, e.g. Spotfire
Requirements: Owners' viewpoint
Create Data
Automated generation, accession policies, metadata generation
Storage Resources
Preserve Data
Archiving
Replication
Metadata
Protection
Provide Services with available resources
Definition & implementation: costs & stability
Resources: storage, compute & bandwidth
Requirements: Owners' viewpoint 2
Protect Services
Authentication, Authorisation, Accounting, Audit
Reputation
Protect data
Comply with owner requirements: encryption for privacy
Monitor and Control use
Detect and handle failures, attacks, misbehaving users
Plan for future loads and services
Establish case for Continuation
Usage statistics
Discoveries enabled
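Several of these owner requirements (authorisation, accounting, audit, and the usage statistics that support the case for continuation) can be combined in one wrapper around a service operation. The `protected` decorator, the access-control list and the in-memory audit log are hypothetical; production grids rely on certificate-based authentication, which this sketch assumes has already identified the caller.

```python
import functools
from collections import Counter
from datetime import datetime, timezone

AUDIT_LOG: list[tuple[str, str, str, bool]] = []  # (time, user, operation, allowed)
USAGE = Counter()                                 # accounting: successful calls per user
ACL = {"query_catalogue": {"alice", "bob"}}       # hypothetical authorisation policy


def protected(func):
    """Authorise the (already authenticated) caller, then audit and account for the call."""
    @functools.wraps(func)
    def wrapper(user: str, *args, **kwargs):
        allowed = user in ACL.get(func.__name__, set())
        AUDIT_LOG.append((datetime.now(timezone.utc).isoformat(), user, func.__name__, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not call {func.__name__}")
        USAGE[user] += 1
        return func(*args, **kwargs)
    return wrapper


@protected
def query_catalogue(term: str) -> list[str]:
    return [name for name in ("heart.dat", "tumour.dat") if term in name]


print(query_catalogue("alice", "heart"))  # prints: ['heart.dat']
try:
    query_catalogue("mallory", "heart")
except PermissionError as err:
    print(err)  # refused calls are still recorded in the audit log
print(USAGE, len(AUDIT_LOG))
```

Logging refused attempts as well as successful ones is what makes the audit trail useful for detecting attacks and misbehaving users.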
Large Hadron Collider
The most powerful instrument ever built to investigate elementary particle physics
Data Challenge: 10 Petabytes/year of data
20 million CDs each year!
Simulation, reconstruction, analysis: LHC data handling requires computing power equivalent to ~100,000 of today's fastest PC processors
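The CD comparison can be checked with one line of arithmetic. Assuming decimal units (1 PB = 10^15 bytes), the slide's figures imply roughly 500 MB per CD; with the common 700 MB CD, the same yearly volume would be closer to 14 million discs.

```latex
\frac{10\ \text{PB/year}}{2\times10^{7}\ \text{CDs/year}}
  = \frac{10^{16}\ \text{B}}{2\times10^{7}}
  = 5\times10^{8}\ \text{B} = 500\ \text{MB per CD},
\qquad
\frac{10^{16}\ \text{B}}{7\times10^{8}\ \text{B per CD}} \approx 1.4\times10^{7}\ \text{CDs}
```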
Composing Observations in Astronomy
Data and images courtesy Alex Szalay, Johns Hopkins
Figure: number & sizes of data sets as of mid-2002, grouped by wavelength
12 waveband coverage of large areas of the sky
Total about 200 TB data
Doubling every 12 months
Largest catalogues near 1B objects
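The growth claim corresponds to a simple exponential model. Taking the slide's mid-2002 total of about 200 TB and its 12-month doubling time, and assuming purely for illustration that the trend continued unchanged, the volume after t months would be:

```latex
\begin{aligned}
V(t) &= V_0 \cdot 2^{t/T}, \quad V_0 \approx 200\ \text{TB},\ T = 12\ \text{months} \\
V(48\ \text{months}) &\approx 200\ \text{TB} \times 2^{4} = 3200\ \text{TB} = 3.2\ \text{PB}
\end{aligned}
```

Four doublings are enough to move the total from hundreds of terabytes into petabytes, which is why storage and transfer capacity have to be planned well ahead of need.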
Global In-flight Engine Diagnostics
Figure components: in-flight data, airline, maintenance centre, ground station, global network (e.g. SITA), internet, e-mail, pager, DS&S Engine Health Center, data centre
Distributed Aircraft Maintenance Environment: Leeds, Oxford, Sheffield & York, Jim Austin
100,000 aircraft
0.5 GB/flight
4 flights/day
200 TB/day
Now BROADEN
Significant in getting Boeing 787 engine contract
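The daily data volume on this slide follows directly from the three figures listed with it. A quick check in Python, assuming decimal units (1 TB = 1,000 GB):

```python
aircraft = 100_000
gb_per_flight = 0.5
flights_per_day = 4

gb_per_day = aircraft * gb_per_flight * flights_per_day  # 200,000 GB
tb_per_day = gb_per_day / 1_000                          # decimal units: 1 TB = 1,000 GB
print(f"{tb_per_day:.0f} TB/day")  # prints: 200 TB/day
```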
Storage Resource Manager (SRM)
http://sdm.lbl.gov/srm-wg/
De facto & written standard in physics
Collaborative effort: CERN, FNAL, JLAB, LBNL and RAL (http://sdm.lbl.gov/srm-wg/collaboration.html)
Essential bulk file storage
(Pre-)allocation of storage
Abstraction over storage systems
File delivery / registration / access
Data movement interfaces
E.g. GridFTP
Rich function set
Space management, permissions, directory, data transfer & discovery
Storage Resource Broker (SRB)
http://www.sdsc.edu/srb/index.php/Main_Page
SDSC developed
Widely used
Archival document storage
Scientific data: bio-sciences, medicine, geo-sciences
Manages
Storage resource allocation
Abstraction over storage systems
File storage
Collections of files
Metadata describing files, collections, etc.
Data transfer services
Condor Data Management
Stork
Manages File Transfers
May manage reservations
NeST
Manages Data Storage
C.f. GridFTP with reservations
Over multiple protocols
Globus Tools and Services for Data Management
GridFTP: a secure, robust, efficient data transfer protocol
The Reliable File Transfer Service (RFT): Web services-based, stores state about transfers
The Data Access and Integration Service (OGSA-DAI): service to access data resources, particularly relational and XML databases
The Replica Location Service (RLS): distributed registry that records locations of data copies
The Data Replication Service: Web services-based, combines data replication and registration functionality
Slides from Ann Chervenak
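The core job of a replica location service, recording where the copies of each logical file live, is a one-to-many mapping from logical file names (LFNs) to physical replica locations. The `ReplicaCatalogue` class and the `gsiftp://` URLs on `example.org` hosts are illustrative; the real Globus RLS is a distributed service with its own client API, which this sketch does not reproduce.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Maps logical file names (LFNs) to the physical locations of their copies."""

    def __init__(self) -> None:
        self._replicas: defaultdict[str, set[str]] = defaultdict(set)

    def register(self, lfn: str, pfn: str) -> None:
        """Record that a physical copy of lfn exists at pfn."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn: str, pfn: str) -> None:
        """Forget one copy; drop the logical name once no copies remain."""
        self._replicas[lfn].discard(pfn)
        if not self._replicas[lfn]:
            del self._replicas[lfn]

    def lookup(self, lfn: str) -> list[str]:
        return sorted(self._replicas.get(lfn, set()))


rc = ReplicaCatalogue()
rc.register("lfn:run42.dat", "gsiftp://site-a.example.org/data/run42.dat")
rc.register("lfn:run42.dat", "gsiftp://site-b.example.org/store/run42.dat")
print(rc.lookup("lfn:run42.dat"))
```

Applications refer only to the logical name, so replicas can be added, moved or retired without changing the code that reads them.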
RLS in Production Use: LIGO
Laser Interferometer Gravitational Wave Observatory
Currently uses RLS servers at 10 sites
These contain mappings from 6 million logical files to over 40 million physical replicas
Used in a customized data management system: the LIGO Lightweight Data Replicator System (LDR)
Includes RLS, GridFTP, custom metadata catalog, tools for storage management and data validation
Slides from Ann Chervenak
RLS in Production Use: ESG
Earth System Grid: climate modeling data (CCSM, PCM, IPCC)
RLS at 4 sites
Data management coordinated by ESG portal
Datasets stored at NCAR: 64.41 TB in 397,253 total files; 1,230 portal users
IPCC data at LLNL: 26.50 TB in 59,300 files; 400 registered users; data downloaded: 56.80 TB in 263,800 files; avg. 300 GB downloaded/day; 200+ research papers being written
Slides from Ann Chervenak
Enabling Grids for E-sciencE
INFSO-RI-508833
gLite Data Management
FTS: File Transfer Service
LFC: Logical File Catalogue
Replication Service: accessed through LFC
AMGA: Metadata services
Data Management Services
FiReMan catalog: resolves logical filenames (LFN) to physical locations of files and storage elements; Oracle and MySQL versions available; secure services; attribute support; symbolic link support; deployed on the Pre-Production Service and DILIGENT testbed
gLite I/O: POSIX-like access to Grid files; Castor, dCache and DPM support; has been used for the BioMedical Demo; deployed on the Pre-Production Service and the DILIGENT testbed
AMGA MetaData Catalog: used by the LHCb experiment; has been used for the BioMedical Demo
Medical Data Management
Figure components: client, Medical Data Management application, MDM client library, grid catalogs (metadata catalog AMGA, file catalog Fireman), medical imager, encryption keystore (Hydra), SRM DICOM, MDM trigger, GridFTP, gLite I/O
Trigger:
Retrieve DICOM files from imager
Register file in Fireman
gLite EDS client: generate encryption keys and store them in Hydra
Register metadata in AMGA
Client library:
Look up file through metadata (AMGA)
Use gLite EDS client:
Retrieve file through gLite I/O
Retrieve encryption key from Hydra
Decrypt data
Serve it up to the application
https://edms.cern.ch/file/678613/1/EGEE-JRA1-PRE-678613-ENCRYPTS-261005-V1.0.ppt
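The two sequences on the Medical Data Management slide (the trigger storing an image, the client library retrieving it) can be sketched end to end with in-memory stand-ins. Everything here is an assumption for illustration: plain dictionaries play the roles of the Fireman file catalogue, the AMGA metadata catalogue, the Hydra keystore and grid storage, and a one-time-pad XOR replaces whatever cipher gLite EDS actually uses. The point is the order of operations, in particular that the key is stored apart from the encrypted file.

```python
import os
import uuid

file_catalogue: dict[str, str] = {}       # stand-in for Fireman: LFN -> storage path
metadata_catalogue: dict[str, dict] = {}  # stand-in for AMGA: LFN -> metadata
keystore: dict[str, bytes] = {}           # stand-in for Hydra: LFN -> key
storage: dict[str, bytes] = {}            # stand-in for grid storage (SRM / GridFTP)


def xor(data: bytes, key: bytes) -> bytes:
    """One-time-pad XOR: secure only if the key is random, as long as the data, never reused."""
    return bytes(d ^ k for d, k in zip(data, key))


def trigger_store(dicom: bytes, metadata: dict) -> str:
    """Trigger side: encrypt, store, register file, keep key separately, register metadata."""
    lfn = f"lfn:{uuid.uuid4()}"
    key = os.urandom(len(dicom))
    path = f"/grid/storage/{lfn[4:]}.dcm"
    storage[path] = xor(dicom, key)  # only ciphertext reaches storage
    file_catalogue[lfn] = path
    keystore[lfn] = key
    metadata_catalogue[lfn] = metadata
    return lfn


def client_fetch(**query) -> list[bytes]:
    """Client side: look up by metadata, retrieve file and key, decrypt."""
    results = []
    for lfn, meta in metadata_catalogue.items():
        if all(meta.get(k) == v for k, v in query.items()):
            ciphertext = storage[file_catalogue[lfn]]
            results.append(xor(ciphertext, keystore[lfn]))
    return results


trigger_store(b"DICM-example-image", {"patient": "P001", "modality": "MG"})
print(client_fetch(patient="P001"))  # prints: [b'DICM-example-image']
```

Keeping keys in a separate service means a compromise of the storage element alone exposes only ciphertext.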
File Transfer Service
Reliable file transfer
Full scalable implementation
Java Web Service front-end, C++ Agents, Oracle or MySQL database support
Support for Channel, Site and VO management
Interfaces for management and statistics monitoring
Gsiftp, SRM and SRM-copy support
Support for MySQL and Oracle
Multi-VO support
GridFTP and SRM copy support
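What makes a transfer service "reliable" is that it keeps state for every request and retries failures itself instead of leaving that to the user. A minimal sketch of that idea follows; the `TransferJob` states, the retry limit of 3, the `srm://` URLs on `example.org` hosts and the simulated flaky `copy` function are assumptions, and the sketch keeps state in memory where FTS uses an Oracle or MySQL database.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class TransferJob:
    source: str
    destination: str
    state: str = "Submitted"  # Submitted -> Active -> Done | Failed
    attempts: int = 0


def run_transfers(jobs: list[TransferJob], copy: Callable[[str, str], bool],
                  max_attempts: int = 3) -> None:
    """Drive each job to Done or Failed, retrying failed copies up to max_attempts."""
    for job in jobs:
        while job.state not in ("Done", "Failed"):
            job.state = "Active"
            job.attempts += 1
            if copy(job.source, job.destination):
                job.state = "Done"
            elif job.attempts >= max_attempts:
                job.state = "Failed"


rng = random.Random(1)  # seeded so the simulated failures are repeatable


def flaky_copy(src: str, dst: str) -> bool:
    return rng.random() > 0.5  # simulated network: fails about half the time


jobs = [TransferJob(f"srm://site-a.example.org/f{i}", f"srm://site-b.example.org/f{i}")
        for i in range(4)]
run_transfers(jobs, flaky_copy)
for job in jobs:
    print(job.destination, job.state, job.attempts)
```

Because each job's state survives between attempts, a service built this way can also report progress and statistics, as the management and monitoring interfaces on the slide suggest.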
Commercial Solutions
Vendors include:
Avaki, Data Synapse
Benefits & costs
Well packaged and documented
Support
Can be expensive
But look for academic rates
Data Integration Strategies
Use a Service provided by a Data Owner
Use a scripted workflow
Use data virtualisation services
Arrange that multiple data services have common properties
Arrange federations of these
Arrange access presenting the common properties
Expose the important differences
Support integration accommodating those differences
Data Virtualisation Services
Form a federation
Set of data resources: incremental addition
Registration & description of collected resources
Warehouse data or access dynamically to obtain updated data
Virtual data warehouses: automating division between collection and dynamic access
Describe relevant relationships between data sources
Incremental description + refinement / correction
Run jobs, queries & workflows against combined set of data resources
Automated distribution & transformation
Example systems
IBM's Information Integrator
GEON, BIRN & SEEK
OGSA-DAI is an extensible framework for building such systems
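A toy federation shows the main moves on this slide: members are added incrementally, each is described by a mapping onto a common schema (including a regular choice of units), and one query then runs over the combined set. The member names, field names and unit conversion are hypothetical, and systems such as OGSA-DAI or IBM's Information Integrator typically push query work to the sources rather than fetching every row and filtering centrally as this sketch does.

```python
from typing import Callable, Iterable

Row = dict[str, object]
Fetch = Callable[[], Iterable[Row]]
ToCommon = Callable[[Row], Row]


class Federation:
    """Virtualises several data resources behind one common schema."""

    def __init__(self) -> None:
        self._members: dict[str, tuple[Fetch, ToCommon]] = {}

    def add_member(self, name: str, fetch: Fetch, to_common: ToCommon) -> None:
        """Incremental addition: register a source and its mapping to the common schema."""
        self._members[name] = (fetch, to_common)

    def query(self, predicate: Callable[[Row], bool]) -> list[Row]:
        """Fetch from every member at query time, map, filter and record each row's source."""
        results: list[Row] = []
        for name, (fetch, to_common) in self._members.items():
            for row in fetch():
                common = to_common(row)
                if predicate(common):
                    results.append({**common, "source": name})  # allows drill-through
        return results


# Two hypothetical sources that disagree on field names and units.
lab_a = [{"sample": "s1", "temp_c": 37.0}, {"sample": "s2", "temp_c": 39.5}]
lab_b = [{"id": "x9", "temp_f": 102.2}]

fed = Federation()
fed.add_member("lab_a", lambda: lab_a,
               lambda r: {"sample_id": r["sample"], "temp_c": r["temp_c"]})
fed.add_member("lab_b", lambda: lab_b,
               lambda r: {"sample_id": r["id"], "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)})
print(fed.query(lambda r: r["temp_c"] > 38))
```

Tagging each result with its source keeps the differences between members visible, so a user can drill through to the primary form when the common view is not enough.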
Virtualisation variations
Extent to which homogeneity obtained
Regular representation choices, e.g. units
Consistent ontologies
Consistent data model
Consistent schema: integrated super-schema
DB operations supported across federation
Ease of adding federation elements
Ease of accommodating change as federation members change their schema and policies
Drill through to primary forms supported
OGSA-DAI
http://www.ogsadai.org.uk
A framework for data virtualisation
Wide use in e-Science: BRIDGES, GEON, caBIG, GeneGrid, MyGrid, BioSimGrid, e-Diamond, IU RGRBench
Collaborative effort: NeSC, EPCC, IBM, Oracle, Manchester, Newcastle
Querying of data resources
Relational databases
XML databases
Structured flat files
Extensible activity documents
Customisation for particular applications
The Open Grid Services Architecture
An open, service-oriented architecture (SOA)
Resources as first-class entities
Dynamic service/resource creation and destruction
Built on a Web services infrastructure
Resource virtualization at the core
Build grids from a small number of standards-based components
Replaceable, coarse-grained
e.g. brokers
Customizable
Support for dynamic, domain-specific content within the same standardized framework
Hiro Kishimoto: Keynote GGF17
OGSA Capabilities
Security: cross-organizational users; trust nobody; authorized access only
Information Services: registry; notification; logging/auditing
Execution Management: job description & submission; scheduling; resource provisioning
Data Services: common access facilities; efficient & reliable transport; replication services
Self-Management: self-configuration; self-optimization; self-healing
Resource Management: discovery; monitoring; control
Figure layers: OGSA, OGSA profiles, Web services foundation
Hiro Kishimoto: Keynote GGF17
Basic Data Interfaces
Storage Management: e.g. Storage Resource Management (SRM)
Data Access: ByteIO; Data Access & Integration (DAI)
Data Transfer: Data Movement Interface Specification (DMIS); protocols (e.g. GridFTP)
Replica management
Metadata catalog
Cache management
Hiro Kishimoto: Keynote GGF17
The State of the Art
Many successful Grid & E-Science projects
A few examples shown in this talk
Many Grid systems
All largely incompatible
Interoperation talks under way
Standardisation efforts
Mainly via the Open Grid Forum
A merger of the GGF & EGA
Significant user investment required
Few out-of-the-box solutions
Technical Challenges
Issues you can't avoid
Lack of Complete Knowledge (LOCK)
Latency
Heterogeneity
Autonomy
Unreliability
Scalability
Change
A challenging goal: balance technical feasibility against virtual homogeneity, stability and reliability, while remaining affordable, manageable and maintainable
Areas In Development
Data provenance
Quality of Service: Service Level Agreements
Resource brokering
Across all resources
Workflow scheduling
Co-scheduling
Licence management
Software provisioning
Deployment and update
Other areas too!
Operational Challenges
Management of distributed systems
With local autonomy
Deployment, testing & monitoring
User training
User support
Rollout of upgrades
Security
Distributed identity management
Authorisation
Revocation
Incident response
Grids as a Foundation for Solutions
The grid per se doesn't provide
Supported e-Science methods
Supported data & information resources
Computations
Convenient access
Grids help providers of these, via
International & national secure e-Infrastructure
Standards for interoperation
Standard APIs to promote re-use
But Research Support must be built by
Application developers
Resource providers
Collaboration Challenges
Defining common goals
Defining common formats
E.g. schemas for data and metadata
Defining a common vocabulary
E.g. for metadata
Finding common technology
Standards should help, eventually
Collecting metadata
Automate where possible
Social Challenges
Changing cultures
Rewarding data & resource sharing
Require publication of data
Taking the first steps
If everyone shares, everyone wins
The first people to share must not lose out
Sustainable funding
Technology must persist
Data must persist
Summary
E-Science exploits distributed computing resources to enable new discoveries, new collaborations and new ways of working
Grid is an enabling technology for e-science.
Many successful projects exist
Many challenges remain
UK e-Science
Figure: UK e-Science bodies: Globus Alliance, CeSC (Cambridge), Digital Curation Centre, e-Science Institute, Grid Operations Support Centre, National Centre for e-Social Science, National Institute for Environmental e-Science, Open Middleware Infrastructure Institute