The Global Bio Grid Andrew Grimshaw University of Virginia January, 2006 Virginia Center for Grid Research

Page 1: The Global Bio Grid

The Global Bio Grid

Andrew Grimshaw

University of Virginia

January 2006

Virginia Center for Grid Research

Page 2: The Global Bio Grid

• Why Bio Grids?

• Grid Basics

• The Global Bio Grid

Page 3: The Global Bio Grid

In ten years the world will be very different.

Page 4: The Global Bio Grid

Think back ten years.

• No web

• Widespread Internet access was new

• Human Genome Project still far from completion

• Science (biology) done primarily in individual labs

Page 5: The Global Bio Grid

Today

• Billions a year in e-commerce

• Internet everywhere

• Broadband to your home

• Wireless becoming pervasive

• Pervasive devices are proliferating – motes

• Sequencing of organisms a daily event. Bioinformatics hitting the mainstream

Page 6: The Global Bio Grid
Page 7: The Global Bio Grid

Tomorrow

• $1000/sequence for humans – becomes standard clinical practice

• “Biology is becoming an information science” (Large-Scale Biomedical Science: Exploring Strategies for Future Research, Institute of Medicine, National Research Council, 2003)

• Global interconnected networks – grids

• Provide transparent, secure access to data, applications, and on-demand compute.

• Research using not just your data, but all trusted data, not just your applications, but any trusted application.

• Implications for progress are significant.

Page 8: The Global Bio Grid

There are a number of “catches”

• So much data!

• So many organizations with so little trust!

• So much complexity!

Page 9: The Global Bio Grid

An IT guy’s view

• Data is all over, of all different forms, with lots of different policies

  • Need to get the right data in the right place at the right time

• Ontology problem – how do we compare and integrate the databases?

  • Need to understand semantics, automatically transform

• Semantics

• Knowledge Discovery – “mining”

Page 10: The Global Bio Grid

This is where grids enter the picture

(we do the plumbing)

Page 11: The Global Bio Grid

Some lessons learned

• 10+ years in academic and commercial grids

• Most (if not all) problems are not technical

• Users don’t want change!

• Too many grids are technology centric

• Must keep “activation energy” low

• Need a user-centric approach

• There are at least four classes of users

• Wide variance in computational savvy

Page 12: The Global Bio Grid

What is a Grid?

A grid is all about gathering together resources and making them accessible to users and applications.

A grid enables users to collaborate securely by sharing processing, applications, workflows and processes, and data across heterogeneous systems and administrative domains. The payoff is faster application execution and easier access to data.

The emphasis is on secure access to a wide variety of resources.

Page 13: The Global Bio Grid

Characteristics of Grid systems

• Numerous resources

• Ownership by mutually distrustful organizations & individuals

• Potentially faulty resources

• Different security requirements & policies required

• Resources are heterogeneous

• Geographically separated

• Different resource management policies

• Connected by heterogeneous, multi-level networks

Page 15: The Global Bio Grid

What grids are not

• The solution to all problems

• Clusters of machines

• SETI@home

• Any one particular technology

Page 16: The Global Bio Grid

User’s view

[Figure: Users reach resources at Site 0, Site 1, Site 2, and Site 3 (clusters, HPSS) through the Grid. Functions shown: run programs, access data, collaborate, provide shared services.]

Page 17: The Global Bio Grid

Grid Computing Scenarios

Desktop Cycle Aggregation
• Limited acceptance in commercial enterprises

Cluster Grids
• Single owner, department, project
• Single domain, file system
• LAN connection

Campus/Enterprise Grids
• Multiple owners, domains
• Multiple file systems
• WAN connection

Partner Grids
• Multiple owners, sites, domains
• Multiple file systems
• Internet connectivity

Legion Grid Software – Compute and Data Grid

Page 18: The Global Bio Grid

Standards

• Global Grid Forum – ggf.org

• OGSA – Open Grid Services Architecture

• Web-Services based IPC

• WSRF and possibly others

• OGSA-BES – Basic Execution Service

• OGSA-ByteIO – file IO

• WS-Naming – abstract name to EPR

• RNS-lite – Resource Name Space
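The two naming items on this slide (a resource name space, and abstract names that map to endpoint references) can be illustrated with a toy sketch. This is not the WS-Naming or RNS interface; the `NameSpace` class, the `urn:example:...` name, and the `example.org` addresses are all hypothetical, chosen only to show why two levels of indirection help when a resource moves between sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointReference:
    """Toy stand-in for an endpoint reference: only an address is modelled."""
    address: str


class NameSpace:
    """Hypothetical two-level lookup: path -> abstract name -> endpoint."""

    def __init__(self):
        self._paths = {}     # human-readable path -> abstract name
        self._bindings = {}  # abstract name -> current endpoint reference

    def register(self, path, abstract_name, epr):
        self._paths[path] = abstract_name
        self._bindings[abstract_name] = epr

    def rebind(self, abstract_name, epr):
        # Only the binding changes; paths that users hold stay valid.
        if abstract_name not in self._bindings:
            raise KeyError(abstract_name)
        self._bindings[abstract_name] = epr

    def resolve(self, path):
        return self._bindings[self._paths[path]]


ns = NameSpace()
ns.register(
    "/gbg/data/uniprot",
    "urn:example:resource:42",
    EndpointReference("https://site-a.example.org/byteio"),
)
before = ns.resolve("/gbg/data/uniprot")

# The resource migrates to another site; the user-visible path is unchanged.
ns.rebind(
    "urn:example:resource:42",
    EndpointReference("https://site-b.example.org/byteio"),
)
after = ns.resolve("/gbg/data/uniprot")
```

After the rebind, the same path resolves to the new address, which is the location transparency the standards aim to provide.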

Page 19: The Global Bio Grid

The Global Bio Grid

Page 20: The Global Bio Grid

GBG concept

• Federated access to multiple

  • Data sources

    • Public databases

    • Commercial databases

    • In-house databases, annotations, etc.

  • Application suites (including processes and workflows)

  • Compute resources

• Shared among collaborative research teams

  • Multiple research locations

  • Virtual organizations

• Built on evolving computing standards (GGF, I3C, WS-*)

Page 21: The Global Bio Grid

Global Bio Grid

• Datagrid using Avaki DG technology

  • Working on ADG available free for “.edu”

  • UVA, NCBIO, U-Texas, Texas Tech

  • Already operational

  • Flat file and relational

  • Working on an OGSA-compliant implementation

• Compute grid at UVA on-line

  • 64 dual-processor Opterons available

  • Sunfires

  • Hundreds of Windows machines

  • Legion 1.8 based – moving towards OGSA-compliant services

• Applications

  • Biomarker

  • Searching PubMed

  • Hospital info integration

Page 22: The Global Bio Grid

Three resource classes illustrate the Grid-effect

• Data

• Processing

• Applications

Page 23: The Global Bio Grid

Data

• Suppose you have collaborators with critical databases (clinical, protein, other) that you need to use.

• You use a number of databases that change on a regular basis.

• You want to “mine” heterogeneous data sets (relational, flat-file, XML, …) in different locations – say in a hospital

• You want to produce, consume, or share derivative data products, e.g., the result of a set of joins and data transformation steps.

• This applies to business data (BI/EII) as well as life science data
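A derivative data product of the kind described on this slide can be sketched in a few lines. The example below is a minimal illustration, not the Avaki or Legion data grid: it joins a relational source (an in-memory SQLite table) with a flat file (CSV text). The table, column names, and records are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source, as a collaborator might publish it.
PROTEIN_FLAT_FILE = "protein_id,name\nP001,kinase A\nP002,transporter B\n"


def make_clinical_db():
    """Hypothetical in-house relational source: samples and their proteins."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (sample_id TEXT, protein_id TEXT)")
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        [("S1", "P001"), ("S2", "P002"), ("S3", "P999")],
    )
    return conn


def load_flat_file(text):
    """Parse the CSV text into a protein_id -> name lookup."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["protein_id"]: row["name"] for row in reader}


def join_sources(conn, proteins):
    """Inner join across the two formats; unmatched samples are dropped."""
    rows = conn.execute(
        "SELECT sample_id, protein_id FROM samples ORDER BY sample_id"
    )
    return [(sid, pid, proteins[pid]) for sid, pid in rows if pid in proteins]


derived = join_sources(make_clinical_db(), load_flat_file(PROTEIN_FLAT_FILE))
for record in derived:
    print(record)
```

In a data grid the two sources would sit in different administrative domains; here both are local so the join logic stays visible.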

Page 24: The Global Bio Grid

DataGrid: Unifying fabric for data access

• Transparent access to multiple DBs

• Multiple domains

• Highly secure, flexible access control

• Automatic cache management and coherence

[Figure: a research institution (Biology, Biochemistry; APP 1, APP 2) and partner institutions holding sequence data (SEQ_1, SEQ_2, SEQ_3), together with public databases (PDB, NCBI, EMBL), joined by a Data layer.]

Page 25: The Global Bio Grid

Three Concrete Examples

• KDS – “data mining” on widely separated data sets such as PubMed.

• “Map” UniProt datasets into data grid

• Researchers no longer need to spend time downloading latest

• Extended Hospital

Page 26: The Global Bio Grid

Extended Hospital

[Figure labels: HOSPITAL containing a data warehouse and three department domains, each with its own data; outside parties shown are insurance companies, emergency vehicles, research, clinics / large practices, non-related hospitals, and authorized family.]

Page 27: The Global Bio Grid

Processing

• Classic high-throughput computing

• Suppose you have thousands of computationally intensive jobs to run

  • SW, CHARMm, Sequest, a.out

• Your usage is bursty – need a lot over short period of time, but often have idle resources

• You wish you had more!
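The high-throughput pattern on this slide (many independent jobs, occasional failures, a shared pool of workers) can be sketched locally. The example below is an illustration only: a thread pool stands in for grid resources, `run_job` is a hypothetical placeholder for a real application run, and the failure rule (every third job fails once) is invented so that the retry path is exercised.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ATTEMPTS = 3


def run_job(job_id, attempt):
    """Placeholder for one compute job.

    Hypothetical failure rule: every third job fails on its first attempt.
    """
    if job_id % 3 == 0 and attempt == 1:
        raise RuntimeError(f"job {job_id} lost its worker")
    return job_id * job_id


def run_with_retry(job_id):
    """Re-run a failed job up to MAX_ATTEMPTS times."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return job_id, run_job(job_id, attempt), attempt
        except RuntimeError:
            continue
    return job_id, None, MAX_ATTEMPTS


def run_batch(job_ids, workers=8):
    """Fan the jobs out over a worker pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_with_retry, job_ids))


results = run_batch(range(1, 1001))
retried = sum(1 for _, _, attempts in results if attempts > 1)
print(f"{len(results)} jobs finished, {retried} needed a retry")
```

A real compute grid adds scheduling policy, data staging, and accounting on top of this loop; the sketch shows only the fan-out and the recovery from failed jobs.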

Page 28: The Global Bio Grid

Compute Grid: Shared access to processing

• Flexible, location-independent access to virtually unlimited processing, on-demand

• Scheduling, usage, management policies

• System detects, recovers from job failures

• Heterogeneous platform support

• Usage accounting, as required

[Figure: a research institution (Biology, Biochemistry; APP 1, APP 2), partner institutions (SEQ_1, SEQ_2, SEQ_3), and public databases (PDB, NCBI, EMBL) on a Data layer, with a Processing layer of Cluster 1, Cluster 2, through Cluster N.]

Page 29: The Global Bio Grid

Concrete Examples

• Biomarkers project wants to run Sequest-2 using public databases

• Charmm/Amber

• Gnomad (Altman et al)

• BLAST, FASTA, ….

• Autodock

Page 30: The Global Bio Grid

Applications

• Suppose you want to use applications or workflows developed, maintained, and supported by others – without the hassle of installing all of them on your gear.

• Suppose you want to couple multiple applications developed at different institutions together.

Page 31: The Global Bio Grid

Grid users share applications, employing multiple data & processing resources

• Flexible binary management

• No need to recompile applications

• Securely share applications

  • Restrict who gains access

  • Restrict where apps run

[Figure: a research institution (Biology, Biochemistry), partner institutions, and public databases connected through three layers: Data (PDB, NCBI, EMBL, SEQ_1 through SEQ_N), Processing (Cluster 1, Cluster 2, through Cluster N), and Applications (APP 1, APP 2, through APP N).]

Page 32: The Global Bio Grid

Better Research, Faster

• Secure, wide-area access to global breadth of consistent, current data

• Access to vast processing power

• Ability to securely share proprietary data and applications, as needed

[Figure: a research institution (Biology, Biochemistry), partner institutions (SEQ_1, SEQ_2, SEQ_3), and public databases (PDB, NCBI, EMBL) connected through Data, Processing (Cluster 1, Cluster 2, through Cluster N), and Applications (APP 1, APP 2, through APP N) layers.]

Page 33: The Global Bio Grid

Evolution in action

• 50’s: Bare metal programming

• 60’s to 80’s: Batch OS, multi-user timeshare

• Today: Low-level network programming

• Now & future: Grid & WS

Page 34: The Global Bio Grid

Summary

• Grids will have a huge impact on the life sciences

• Prototype GBG operational

• Applications are underway

• We’re always looking for new applications