Ticer Summer School 24Aug06


  • 8/14/2019 Ticer Summer School 24Aug06

    1/73

    TICER Summer School, August 24th 2006 1

    Ticer Summer School

    Thursday 24th August 2006

    Dave Berry & Malcolm Atkinson

    National e-Science Centre, Edinburgh

    www.nesc.ac.uk

    Digital Libraries, Grids & E-Science

    What is E-Science?

    What is Grid Computing?

    Data Grids

    Requirements Examples

    Technologies

    Data Virtualisation

    The Open Grid Services Architecture

    Challenges

    What is e-Science?

    Goal: to enable better research in all disciplines

    Method: develop collaboration supported by advanced distributed computation

    to generate, curate and analyse rich data resources
      From experiments, observations, simulations & publications
      Quality management, preservation and reliable evidence

    to develop and explore models and simulations
      Computation and data at all scales
      Trustworthy, economic, timely and relevant results

    to enable dynamic distributed collaboration
      Facilitating collaboration with information and resource sharing
      Security, trust, reliability, accountability, manageability and agility


    Courtesy of David Gavaghan & IB Team

    Integrative Biology

    Tackling two Grand Challenge research questions:

    What causes heart disease?

    How does a cancer form and grow?

    Together these diseases cause 61% of all UK deaths

    Building a powerful, fault-tolerant Grid infrastructure for biomedical science

    Enabling biomedical researchers to use distributed resources such as high-performance computers, databases and visualisation tools to develop coupled multi-scale models of how these killer diseases develop.

    Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES)

    [Diagram labels: CFG Virtual Organisation. Sites: Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands. Private data (six instances). Publicly curated data: Ensembl, MGI, HUGO, OMIM, SWISS-PROT, RGD. Components: DataHub, Synteny Grid Service, BLAST, Portal]

    http://www.brc.dcs.gla.ac.uk/projects/bridges/

    eDiaMoND: Screening for Breast Cancer

    [Diagram: the eDiaMoND Grid and its uses]

    From 1 Trust to many Trusts: collaborative working, audit capability, epidemiology

    Other modalities: MRI, PET, ultrasound

    Better access to case information and digital tools

    Supplement mentoring with access to digital training cases and sharing of information across clinics

    Screening: X-rays and case information, digital reading, SMF, CAD, temporal comparison; case and reading information

    Assessment / symptomatic: biopsy, case and reading information, symptomatic/assessment information

    Training: manage training cases, perform training; SMF, CAD, 3D images

    Other labels: letters, radiology reporting systems, secondary capture or FFD, electronic patient records, patients

    Provided by eDiamond project: Prof. Sir Mike Brady et al.

    E-Science Data Resources

    Curated databases

    Public, institutional, group, personal

    Online journals and preprints

    Text mining and indexing services

    Raw storage (disk & tape)

    Replicated files

    Persistent archives

    Registries

    EBank

    Slide from Jeremy Frey

    Biomedical data: making connections

    12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
    12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
    12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
    12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
    12421 taggtgactt gcctgttttt ttttaattgg

    Slide provided by Carole Goble: University of Manchester

    Using Workflows to Link Services

    Describe the steps in a Scripting Language

    Steps performed by Workflow Enactment Engine

    Many languages in use

    Trade off: familiarity & availability

    Trade off: detailed control versus abstraction

    Incrementally develop correct process

    Sharable & Editable

    Basis for scientific communication & validation

    Valuable IPR asset

    Repetition is now easy

    Parameterised explicitly & implicitly
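
A minimal sketch of the idea on this slide, assuming a toy workflow format: the steps are described as data, and a small enactment engine performs them in order. The step functions, the `enact` function and the `repeat` parameter are all hypothetical names chosen for illustration; the workflow systems discussed in this talk use their own languages and engines.

```python
from typing import Any, Callable, Dict, List, Tuple

# A workflow step: (name, function taking and returning the shared data dict).
Step = Tuple[str, Callable[[Dict[str, Any]], Dict[str, Any]]]


def fetch_sequence(data: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical step: a real workflow would call a remote data service here.
    return {**data, "sequence": "acatttctac" * data["repeat"]}


def count_bases(data: Dict[str, Any]) -> Dict[str, Any]:
    seq = data["sequence"]
    return {**data, "counts": {base: seq.count(base) for base in "acgt"}}


def enact(workflow: List[Step], params: Dict[str, Any]) -> Dict[str, Any]:
    """Toy enactment engine: run each step in order and record what ran."""
    data = dict(params)
    log = []
    for name, step in workflow:
        data = step(data)
        log.append(name)
    data["provenance"] = log
    return data


workflow: List[Step] = [("fetch", fetch_sequence), ("count", count_bases)]

# Repetition is easy: the same workflow is re-run with explicit parameters.
result = enact(workflow, {"repeat": 2})
print(result["counts"], result["provenance"])
```

Because the workflow is an ordinary list, it can be shared, edited and re-run with different parameters, which is the property the slide highlights.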

    Workflow Systems

    Language | WF Enact. | Comments
    Shell scripts | Shell + OS | Common but not often thought of as WF. Depend on context, e.g. NFS across all sites
    Perl | Perl runtime | Popular in bioinformatics. Similar context dependence; distribution has to be coded
    Java | JVM | Popular target because of JVM ubiquity. Similar dependence; distribution has to be coded
    BPEL | BPEL Enactment | OASIS standard for industry coordinating use of multiple Web Services; low-level detail; tools
    Taverna | Scufl | EBI, OMII-UK & MyGrid; http://taverna.sourceforge.net/index.php
    VDT / Pegasus | Chimera & DAGMan | High-level abstract formulation of workflows, automated mapping towards executable forms, cached result re-use
    Kepler | Kepler | BIRN, GEON & SEEK; http://kepler-project.org/

    Workflow example

    Taverna in MyGrid http://www.mygrid.org.uk/

    allows the e-Scientist to describe and enact their experimental processes in a structured, repeatable and verifiable way

    GUI

    Workflow language

    Enactment engine

    Pub/Sub for Laboratory data using a broker and ultimately delivered over GPRS

    Notification

    Comb-e-chem: Jeremy Frey
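
The publish/subscribe pattern named on this slide can be sketched as follows. This is an in-memory illustration only: the `Broker` class, the topic name and the "phone inbox" are assumptions made for the example, not the Comb-e-chem implementation, and no real network transport is involved.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Toy in-memory broker: routes each published message to topic subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver message to every subscriber of topic; return deliveries made."""
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])


broker = Broker()
phone_inbox: List[str] = []  # stands in for a handheld device reached over GPRS
lab_log: List[str] = []

broker.subscribe("lab/temperature", phone_inbox.append)
broker.subscribe("lab/temperature", lab_log.append)

delivered = broker.publish("lab/temperature", "21.5 C")
print(delivered, phone_inbox, lab_log)
```

The publisher never needs to know who is listening, so new consumers of laboratory data can be added without changing the instrument side.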

    Relevance to Digital Libraries

    Similar concerns

    Data curation & management

    Metadata, discovery

    Secure access (AAA +)

    Provenance & data quality

    Local autonomy

    Availability, resilience

    Common technology

    Grid as an implementation technology


    What is a Grid?

    A grid is a system consisting of

    Distributed but connected resources, and

    Software and/or hardware that provides and manages logically seamless access to those resources to meet desired objectives

    [Diagram labels: License, Printer, R2AD, Database, Web server, Data Center, Cluster, Handheld, Supercomputer, Workstation, Server]

    Source: Hiro Kishimoto GGF17 Keynote May 2006


    Virtualizing Resources

    [Diagram labels: Resources (Computers, Storage, Sensors, Applications, Information); Resource-specific Interfaces; Type-specific interfaces; Common Interfaces; Web services; Access]

    Hiro Kishimoto: Keynote GGF17

    Ideas and Forms

    Key ideas

    Virtualised resources

    Secure access

    Local autonomy

    Many forms

    Cycle stealing

    Linked supercomputers

    Distributed file systems

    Federated databases

    Commercial data centres

    Utility computing


    Grid Middleware

    [Diagram labels: Grid middleware services; Virtualized resources; Brokering Service, Registry Service, Data Service, CPU Resource, Printer Service, Job-Submit Service, Compute Service, Application Service; interactions: Notify, Advertise]

    Hiro Kishimoto: Keynote GGF17

    Key Drivers for Grids

    Collaboration
      Expertise is distributed
      Resources (data, software licences) are location-specific
      Necessary to achieve critical mass of effort
      Necessary to raise sufficient resources

    Computational Power
      Rapid growth in number of processors
      Powered by Moore's law + device roadmap
      Challenge to transform models to exploit this

    Deluge of Data
      Growth in scale: number and size of resources
      Growth in complexity
      Policy drives greater data availability

    Minimum Grid Functionalities

    Supports distributed computation
      Data and computation
      Over a variety of
        hardware components (servers, data stores, ...)
        software components (services: resource managers, computation and data services)
      With regularity that can be exploited
        By applications
        By other middleware & tools
        By providers and operations

    It will normally have security mechanisms
      To develop and sustain trust regimes

    Grid & Related Paradigms

    Distributed Computing: loosely coupled, heterogeneous, single administration

    Cluster: tightly coupled, homogeneous, cooperative working

    Grid Computing: large scale, cross-organizational, geographical distribution, distributed management

    Utility Computing: computing services, no knowledge of provider, enabled by grid technology

    Source: Hiro Kishimoto GGF17 Keynote May 2006

    Why use / build Grids?

    Research Arguments

    Enables new ways of working

    New distributed & collaborative research

    Unprecedented scale and resources

    Economic Arguments

    Reduced system management costs

    Shared resources: better utilisation

    Pooled resources: increased capacity

    Load sharing & utility computing

    Cheaper disaster recovery

    Why use / build Grids?

    Operational Arguments
      Enable autonomous organisations to
        Write complementary software components
        Set up, run & use complementary services
        Share operational responsibility
      General & consistent environment for abstraction, automation, optimisation & tools

    Political & Management Arguments

    Stimulate innovation

    Promote intra-organisation collaboration

    Promote inter-enterprise collaboration


    Grids In Use: E-Science Examples

    Data sharing and integration
      Life sciences: sharing standard data-sets, combining collaborative data-sets
      Medical informatics: integrating hospital information systems for better care and better science
      Sciences: high-energy physics

    Capability computing
      Life sciences: molecular modeling, tomography
      Engineering: materials science
      Sciences: astronomy, physics

    High-throughput, capacity computing for
      Life sciences: BLAST, CHARMM, drug screening
      Engineering: aircraft design, materials, biomedical
      Sciences: high-energy physics, economic modeling

    Simulation-based science and engineering
      Earthquake simulation

    Source: Hiro Kishimoto GGF17 Keynote May 2006

    Database Growth

    PDB: 33,367 protein structures

    EMBL DB: 111,416,302,701 nucleotides

    Slide provided by Richard Baldock: MRC HGU Edinburgh

    Requirements: User's viewpoint

    Find Data
      Registries & human communication

    Understand data
      Metadata description, standard / familiar formats & representations, standard value systems & ontologies

    Data Access
      Find how to interact with data resource
      Obtain permission (authority)
      Make connection
      Make selection

    Move Data
      In bulk or streamed (in increments)

    Requirements: User's viewpoint 2

    Transform Data

    To format, organisation & representation required for computation or integration

    Combine data

    Standard database operations + operations relevant to

    the application model

    Present results

    To humans: data movement + transform for viewing

    To application code: data movement + transform to the required format

    To standard analysis tools, e.g. R

    To standard visualisation tools, e.g. Spitfire

    Requirements: Owner's viewpoint

    Create Data
      Automated generation, accession policies, metadata generation
      Storage resources

    Preserve Data
      Archiving
      Replication
      Metadata
      Protection

    Provide Services with available resources
      Definition & implementation: costs & stability
      Resources: storage, compute & bandwidth

    Requirements: Owner's viewpoint 2

    Protect Services
      Authentication, Authorisation, Accounting, Audit
      Reputation

    Protect data
      Comply with owner requirements: encryption for privacy, ...

    Monitor and Control use
      Detect and handle failures, attacks, misbehaving users
      Plan for future loads and services

    Establish case for Continuation
      Usage statistics
      Discoveries enabled

    Large Hadron Collider

    The most powerful instrument ever built to investigate elementary particle physics

    Data Challenge:
      10 Petabytes/year of data
      20 million CDs each year!

    Simulation, reconstruction, analysis:
      LHC data handling requires computing power equivalent to ~100,000 of today's fastest PC processors
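
As a quick consistency check on the two data-volume figures above, the per-CD capacity they imply can be worked out directly. The sketch assumes decimal units (1 PB = 10^15 bytes); that convention is an assumption, not something the slide states.

```python
PETABYTE = 10**15  # bytes, decimal convention (an assumption)

data_per_year = 10 * PETABYTE  # 10 Petabytes/year, from the slide
cds_per_year = 20_000_000      # 20 million CDs each year, from the slide

bytes_per_cd = data_per_year / cds_per_year
print(f"{bytes_per_cd / 10**6:.0f} MB per CD")  # prints "500 MB per CD"
```

The implied 500 MB per disc is of the same order as a CD's nominal capacity of roughly 650 to 700 MB, so the two figures are consistent as a round-number comparison.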

    Composing Observations in Astronomy

    Data and images courtesy Alex Szalay, Johns Hopkins

    No. & sizes of data sets as of mid-2002, grouped by wavelength

    12 waveband coverage of large areas of the sky

    Total about 200 TB data

    Doubling every 12 months

    Largest catalogues near 1B objects

    Global In-flight Engine Diagnostics

    in-flight data

    airline

    maintenance centre

    ground station

    global network, e.g. SITA

    internet, e-mail, pager

    DS&S Engine Health Center

    data centre

    Distributed Aircraft Maintenance Environment: Leeds, Oxford, Sheffield & York, Jim Austin

    100,000 aircraft

    0.5 GB/flight

    4 flights/day

    200 TB/day

    Now BROADEN

    Significant in getting Boeing 787 engine contract
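
The daily data volume quoted on this slide follows from the other three figures. A short check, assuming decimal units (1 TB = 1000 GB), which the slide does not state explicitly:

```python
aircraft = 100_000     # from the slide
gb_per_flight = 0.5    # from the slide
flights_per_day = 4    # from the slide

gb_per_day = aircraft * gb_per_flight * flights_per_day
tb_per_day = gb_per_day / 1000  # decimal units: 1 TB = 1000 GB (an assumption)
print(f"{tb_per_day:.0f} TB/day")  # prints "200 TB/day"
```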

    Storage Resource Manager (SRM)

    http://sdm.lbl.gov/srm-wg/

    De facto & written standard in physics

    Collaborative effort
      CERN, FNAL, JLAB, LBNL and RAL

    Essential bulk file storage
      (Pre-)allocation of storage
      Abstraction over storage systems
      File delivery / registration / access
      Data movement interfaces, e.g. GridFTP

    Rich function set
      Space management, permissions, directory, data transfer & discovery

    Storage Resource Broker (SRB)

    http://www.sdsc.edu/srb/index.php/Main_Page

    SDSC developed

    Widely used
      Archival document storage
      Scientific data: bio-sciences, medicine, geo-sciences, ...

    Manages
      Storage resource allocation
      Abstraction over storage systems
      File storage
      Collections of files
      Metadata describing files, collections, etc.
      Data transfer services

    Condor Data Management

    Stork

    Manages File Transfers

    May manage reservations

    NeST

    Manages Data Storage

    C.f. GridFTP with reservations

    Over multiple protocols

    Globus Tools and Services for Data Management

    GridFTP
      A secure, robust, efficient data transfer protocol

    The Reliable File Transfer Service (RFT)
      Web services-based, stores state about transfers

    The Data Access and Integration Service (OGSA-DAI)
      Service to access data resources, particularly relational and XML databases

    The Replica Location Service (RLS)
      Distributed registry that records locations of data copies

    The Data Replication Service
      Web services-based, combines data replication and registration functionality

    Slides from Ann Chervenak


    RLS in Production Use: LIGO

    Laser Interferometer Gravitational Wave Observatory

    Currently use RLS servers at 10 sites
      Contain mappings from 6 million logical files to over 40 million physical replicas

    Used in customized data management system: the LIGO Lightweight Data Replicator System (LDR)
      Includes RLS, GridFTP, custom metadata catalog, tools for storage management and data validation
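
The core idea of a replica catalogue, mapping one logical file name to many physical copies, can be sketched as below. This is an in-memory illustration only: the `ReplicaCatalog` class, the `lfn:` names and the `example.org` hosts are hypothetical and do not reproduce the RLS interface.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class ReplicaCatalog:
    """Toy replica catalogue: logical file name -> set of physical replica URLs."""

    def __init__(self) -> None:
        self._replicas: DefaultDict[str, Set[str]] = defaultdict(set)

    def register(self, logical_name: str, physical_url: str) -> None:
        self._replicas[logical_name].add(physical_url)

    def lookup(self, logical_name: str) -> List[str]:
        """Return all known replicas, sorted for a stable order."""
        return sorted(self._replicas.get(logical_name, set()))


catalog = ReplicaCatalog()
# Hypothetical names and hosts, for illustration only.
catalog.register("lfn:run42/frame-001", "gsiftp://site-a.example.org/data/frame-001")
catalog.register("lfn:run42/frame-001", "gsiftp://site-b.example.org/store/frame-001")

print(catalog.lookup("lfn:run42/frame-001"))
print(catalog.lookup("lfn:missing"))  # unknown logical names have no replicas
```

A client that only knows the logical name can then pick whichever replica is closest or least loaded.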

    Slides from Ann Chervenak


    RLS in Production Use: ESG

    Earth System Grid: climate modeling data (CCSM, PCM, IPCC)

    RLS at 4 sites

    Data management coordinated by ESG portal

    Datasets stored at NCAR
      64.41 TB in 397,253 total files
      1,230 portal users

    IPCC data at LLNL
      26.50 TB in 59,300 files
      400 registered users
      Data downloaded: 56.80 TB in 263,800 files
      Avg. 300 GB downloaded/day
      200+ research papers being written

    Slides from Ann Chervenak


    Enabling Grids for E-sciencE

    INFSO-RI-508833

    gLite Data Management

    FTS: File Transfer Service

    LFC: Logical file catalogue

    Replication Service: accessed through LFC

    AMGA: Metadata services

    Data Management Services

    FiReMan catalog
      Resolves logical filenames (LFN) to physical location of files and storage elements
      Oracle and MySQL versions available
      Secure services
      Attribute support
      Symbolic link support
      Deployed on the Pre-Production Service and DILIGENT testbed

    gLite I/O
      POSIX-like access to Grid files
      Castor, dCache and DPM support
      Has been used for the BioMedical Demo
      Deployed on the Pre-Production Service and the DILIGENT testbed

    AMGA MetaData Catalog
      Used by the LHCb experiment
      Has been used for the BioMedical Demo

    Medical Data Management

    [Diagram labels: Client Application, MDM Client Library, Grid Catalogs (Metadata Catalog (AMGA), Encryption Keystore (Hydra), File Catalog (Fireman)), Medical Imager, SRM DICOM, MDM Trigger, GridFTP, gLite I/O]

    Trigger:
      Retrieve DICOM files from imager
      Register file in Fireman
      gLite EDS client: generate encryption keys and store them in Hydra
      Register metadata in AMGA

    Client Library:
      Look up file through metadata (AMGA)
      Use gLite EDS client:
        Retrieve file through gLite I/O
        Retrieve encryption key from Hydra
        Decrypt data
      Serve it up to the application
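
The trigger and client-library steps on this slide can be sketched with in-memory stand-ins for each service. Everything here is hypothetical: the dictionaries only imitate the roles of Fireman, Hydra, AMGA and the storage element, and the XOR "cipher" is a toy used to keep the example dependency-free. It is not the encryption scheme used by gLite.

```python
import secrets
from typing import Dict

# In-memory stand-ins for the services named on the slide (all hypothetical).
file_catalog: Dict[str, str] = {}  # Fireman: logical name -> storage location
keystore: Dict[str, bytes] = {}    # Hydra: logical name -> encryption key
metadata: Dict[str, str] = {}      # AMGA: study tag -> logical name
storage: Dict[str, bytes] = {}     # storage element reached through gLite I/O


def xor_bytes(data: bytes, key: bytes) -> bytes:
    # Toy one-time-pad XOR; NOT the real gLite encryption.
    return bytes(d ^ k for d, k in zip(data, key))


def trigger(tag: str, dicom_bytes: bytes) -> str:
    """Trigger side: store the encrypted image and register it in each catalogue."""
    logical_name = f"lfn:/mdm/{tag}"
    location = f"se://store/{tag}"
    key = secrets.token_bytes(len(dicom_bytes))
    storage[location] = xor_bytes(dicom_bytes, key)
    file_catalog[logical_name] = location
    keystore[logical_name] = key
    metadata[tag] = logical_name
    return logical_name


def client_read(tag: str) -> bytes:
    """Client side: metadata lookup, fetch file, fetch key, decrypt, return data."""
    logical_name = metadata[tag]
    encrypted = storage[file_catalog[logical_name]]
    return xor_bytes(encrypted, keystore[logical_name])


trigger("study-17", b"DICOM image bytes")
print(client_read("study-17"))
```

Keeping the key in a separate store from the file means that access to the storage element alone does not reveal the image, which is the separation of concerns the diagram shows.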

    https://edms.cern.ch/file/678613/1/EGEE-JRA1-PRE-678613-ENCRYPTS-261005-V1.0.ppt

    File Transfer Service

    Reliable file transfer

    Fully scalable implementation
      Java Web Service front-end, C++ agents, Oracle or MySQL database support
      Support for Channel, Site and VO management
      Interfaces for management and statistics monitoring
      Gsiftp, SRM and SRM-copy support

    Support for MySQL and Oracle

    Multi-VO support

    GridFTP and SRM copy support

    Commercial Solutions

    Vendors include:

    Avaki

    DataSynapse

    Benefits & costs

    Well packaged and documented

    Support

    Can be expensive

    But look for academic rates

    Data Integration Strategies

    Use a Service provided by a Data Owner

    Use a scripted workflow

    Use data virtualisation services
      Arrange that multiple data services have common properties
      Arrange federations of these
      Arrange access presenting the common properties
      Expose the important differences
      Support integration accommodating those differences

    Data Virtualisation Services

    Form a federation
      Set of data resources: incremental addition
      Registration & description of collected resources
      Warehouse data or access dynamically to obtain updated data
      Virtual data warehouses: automating division between collection and dynamic access

    Describe relevant relationships between data sources
      Incremental description + refinement / correction

    Run jobs, queries & workflows against combined set of data resources
      Automated distribution & transformation

    Example systems
      IBM's Information Integrator
      GEON, BIRN & SEEK
      OGSA-DAI is an extensible framework for building such systems
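
A minimal sketch of the federation idea, assuming two small SQLite databases stand in for independently owned data resources. The `Federation` class, the table layout and the unit conversion are all hypothetical; none of this reproduces the API of the example systems named above.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, float]


class Federation:
    """Toy federation: named data resources queried as one combined set."""

    def __init__(self) -> None:
        self._resources: Dict[str, Tuple[sqlite3.Connection, Callable[[Row], Row]]] = {}

    def register(self, name: str, conn: sqlite3.Connection,
                 transform: Callable[[Row], Row] = lambda row: row) -> None:
        # Incremental addition: resources can join the federation at any time.
        self._resources[name] = (conn, transform)

    def query(self, sql: str) -> List[Row]:
        """Run sql on every resource, transform rows to the common form, merge."""
        combined: List[Row] = []
        for conn, transform in self._resources.values():
            combined.extend(transform(row) for row in conn.execute(sql))
        return sorted(combined)


def make_db(rows: List[Row]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id TEXT, length REAL)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    return conn


fed = Federation()
fed.register("site_a", make_db([("a1", 2.0)]))                # lengths in metres
fed.register("site_b", make_db([("b1", 150.0)]),              # lengths in centimetres
             transform=lambda row: (row[0], row[1] / 100.0))  # convert to metres

print(fed.query("SELECT id, length FROM samples"))
```

The per-resource `transform` is where differences between members (here, units) are reconciled, so that a single query sees one consistent view.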

    Virtualisation variations

    Extent to which homogeneity obtained
      Regular representation choices, e.g. units
      Consistent ontologies
      Consistent data model
      Consistent schema: integrated super-schema
      DB operations supported across federation

    Ease of adding federation elements

    Ease of accommodating change as federation members change their schema and policies

    Drill through to primary forms supported

    OGSA-DAI

    http://www.ogsadai.org.uk

    A framework for data virtualisation

    Wide use in e-Science
      BRIDGES, GEON, CaBiG, GeneGrid, MyGrid, BioSimGrid, e-Diamond, IU RGRBench, ...

    Collaborative effort
      NeSC, EPCC, IBM, Oracle, Manchester, Newcastle

    Querying of data resources
      Relational databases
      XML databases
      Structured flat files

    Extensible activity documents
      Customisation for particular applications

    The Open Grid Services Architecture

    An open, service-oriented architecture (SOA)
      Resources as first-class entities
      Dynamic service/resource creation and destruction

    Built on a Web services infrastructure

    Resource virtualization at the core

    Build grids from small number of standards-based components
      Replaceable, coarse-grained, e.g. brokers

    Customizable
      Support for dynamic, domain-specific content within the same standardized framework

    Hiro Kishimoto: Keynote GGF17


    OGSA Capabilities

    Security

    Cross-organizational users

    Trust nobody

    Authorized access only

    Information Services

    Registry

    Notification

    Logging/auditing

    Execution Management

    Job description & submission

    Scheduling

    Resource provisioning

    Data Services

    Common access facilities

    Efficient & reliable transport

    Replication services

    Self-Management

    Self-configuration

    Self-optimization

    Self-healing

    Resource Management

    Discovery

    Monitoring

    Control

    OGSA

    OGSA profiles

    Web services foundation

    Hiro Kishimoto: Keynote GGF17


    Basic Data Interfaces

    Storage Management

    e.g. Storage Resource Management (SRM)

    Data Access

    ByteIO

    Data Access & Integration (DAI)

    Data Transfer

    Data Movement Interface Specification (DMIS)

    Protocols (e.g. GridFTP)

    Replica management

    Metadata catalog

    Cache management

    Hiro Kishimoto: Keynote GGF17
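
    As a rough illustration of how replica management and a metadata catalogue fit together, the sketch below maps a logical file name to several physical copies and falls back to the next copy when one site is unreachable. Everything here is an assumption made for illustration: the `ReplicaCatalogue` class, the `fetch` helper, the example.org URLs and the stand-in transfer function are hypothetical and do not reproduce any OGSA interface or GridFTP client.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalogue:
    """Hypothetical catalogue: logical name -> physical copies, plus metadata."""
    replicas: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)

    def register(self, logical_name, url, **attrs):
        """Record one physical copy and any descriptive attributes."""
        self.replicas.setdefault(logical_name, []).append(url)
        self.metadata.setdefault(logical_name, {}).update(attrs)

    def locate(self, logical_name):
        """Return all known physical copies of a logical file."""
        return list(self.replicas.get(logical_name, []))

    def find(self, **attrs):
        """Return logical names whose metadata matches every attribute given."""
        return sorted(
            name for name, md in self.metadata.items()
            if all(md.get(k) == v for k, v in attrs.items())
        )


def fetch(catalogue, logical_name, transfer):
    """Try each replica in turn; `transfer` is a caller-supplied callable."""
    errors = []
    for url in catalogue.locate(logical_name):
        try:
            return transfer(url)
        except OSError as exc:  # includes ConnectionError and TimeoutError
            errors.append((url, str(exc)))
    raise LookupError(f"no reachable replica for {logical_name!r}: {errors}")


catalogue = ReplicaCatalogue()
catalogue.register("run42.dat", "gsiftp://site-a.example.org/data/run42.dat",
                   experiment="demo")
catalogue.register("run42.dat", "gsiftp://site-b.example.org/data/run42.dat")


def fake_transfer(url):
    """Stand-in for a real transfer protocol; pretends site-a is down."""
    if "site-a" in url:
        raise ConnectionError("site-a is down")
    return f"contents fetched from {url}"


result = fetch(catalogue, "run42.dat", fake_transfer)
print(result)
print(catalogue.find(experiment="demo"))
```

    Keeping the transfer step as a parameter means the catalogue stays independent of the protocol used to move the data.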


    The State of the Art

    Many successful Grid & E-Science projects

    A few examples shown in this talk

    Many Grid systems

    All largely incompatible

    Interoperation talks under way

    Standardisation efforts

    Mainly via the Open Grid Forum

    A merger of the GGF & EGA

    Significant user investment required

    Few out-of-the-box solutions


    Technical Challenges

    Issues you can't avoid

    Lack of Complete Knowledge (LOCK)

    Latency

    Heterogeneity

    Autonomy

    Unreliability

    Scalability

    Change

    A challenging goal: balance technical feasibility against virtual homogeneity, stability and reliability while remaining affordable, manageable and maintainable


    Areas In Development

    Data provenance

    Quality of Service

    Service Level Agreements

    Resource brokering

    Across all resources

    Workflow scheduling

    Co-scheduling

    Licence management

    Software provisioning

    Deployment and update

    Other areas too!


    Operational Challenges

    Management of distributed systems

    With local autonomy

    Deployment, testing & monitoring

    User training

    User support

    Rollout of upgrades

    Security

    Distributed identity management

    Authorisation

    Revocation

    Incident response

    Grids as a Foundation for Solutions

    The grid per se doesn't provide

    Supported e-Science methods

    Supported data & information resources

    Computations

    Convenient access

    Grids help providers of these, via

    International & national secure e-Infrastructure

    Standards for interoperation

    Standard APIs to promote re-use

    But Research Support must be built

    Application developers

    Resource providers

    Collaboration Challenges

    Defining common goals

    Defining common formats

    E.g. schemas for data and metadata

    Defining a common vocabulary

    E.g. for metadata

    Finding common technology

    Standards should help, eventually

    Collecting metadata

    Automate where possible

    Social Challenges

    Changing cultures

    Rewarding data & resource sharing

    Require publication of data

    Taking the first steps

    If everyone shares, everyone wins

    The first people to share must not lose out

    Sustainable funding

    Technology must persist

    Data must persist

    Summary

    E-Science exploits distributed computing resources to enable new discoveries, new collaborations and new ways of working

    Grid is an enabling technology for e-Science

    Many successful projects exist

    Many challenges remain

    UK e-Science

    Globus Alliance

    CeSC (Cambridge)

    Digital Curation Centre

    e-Science Institute

    Grid Operations Support Centre

    National Centre for e-Social Science

    National Institute for Environmental e-Science

    Open Middleware Infrastructure Institute
