Shared CyberInfrastructure for Global Medical Research (pdf)

Preview:

Citation preview

Garuda : The National Grid Computing Initiative - The shared cyberinfrastructure for data and compute intensive research

Subrata ChattopadhyayCDAC Knowledge Park,Bangaloresubratac@cdac.in

www.garudaindia.in

Outline

• Introduction on Garuda • NKN – highlights• Tools and Services - GarudaWare • Major Applications• Collaborations • Q & A

Global Access to Resources Using Distributed Architecture

Garuda on MPLS based NKN

LegendH Head NodeG Gateway

CC--DAC, DAC, BangaloreBangalore

LANLocalUser

Compute Nodes

H

InternetAccess

Partner Partner without without

resourcesresources

PartnerPartnerwith resourceswith resources

Compute Nodes

H

User

Tele-scope

LAN

Storage

LANAccess Terminal

Gridfs AccessTerminal

Access Terminal

M P L S AccessM P L S Access

Access Terminals

G

G

G

• High Capacity, Highly Scalable Backbone

• Provide Quality of Service (QoS) and Security

• Wide Geographical Coverage

• Common Standard Platform

• Bandwidth from Many NLD’s

• Highly Reliable & Available by Design

• Test beds ( for various implementation)

• Dedicated and Owned.

NKN – National Knowledge Network

Garuda High Level System Components

ProgrammingDevelopment Environment

Computing Resources and Virtual Organizations

Research Organizations

Educational institutions Computing Centers

WSRF+GT4 + other Services + Cloud S/W (Nimbus/ VMware)WSRF+GT4 + other Services + Cloud S/W (Nimbus/ VMware)

NKN

Grid PSE

Virtualization support

Workflows

Grid Security and High-Performance Grid Networking

Data

Grid

Reso

urce

En

ab

ler &

Mo

nito

ring

CDAC Resource centers

Access PortalCLI Visualization

Federated Information Server Job Scheduler

Programming Environments Grid ApplicationsSecurity

Resource Management User

EnvironmentsMiddleware Data GridResources

Hand held devices

GARUDA – enabled Applications

Non – Research

Organizations

Cloud Interface

GARUDA Middleware componentsGARUDA Middleware components

Utility tools• RAT• Compiler Service• Gridftp GUI• GARUDA Information

Registry

Access Methods• Access Portal • Problem Solving

Environments• Workflows• Visualization

gateways• Hand held device• Cloud Interface

Management, Monitoring & Accounting• Paryaveekshanam• GARUDA Accounting• MDS4

Security Framework• IGCA Certificates• VOMS • MyProxy• Login Service

Resource Mgmt & Scheduling• Resource Reservation• QoS• GridWay Meta-scheduler• Torque, Load Leveler• Globus 4.x (WS Components)

Legend••••

Data Management• SRB• GSRM• GridFTP

– Indian Grid Certification Authority located at C-DAC, Knowledge Park, Bangalore, India.

– IGCA is the accredited member of APGridPMA.– Issues X.509 Certificates to support the secure environment

in Grid. (for GARUDA, institutes that do research in grid from India and foreign institutes that collaborates with GARUDA).

– http://ca.garudaindia.in

GARUDA SLCS provides gridusers an instant access toGARUDA grid for a trial periodof 30days.

Highlights:• Hassle free registration• Get an access in less than 5mins.• Service over the internet.

Features:• GARUDA Job submission portal• GARUDA Compiler Service

Website: http://labs.garudaindia.in

GARUDA Short Live Certificate

GARUDA Resources

CDAC Resource :

Fourteen of the partner institutions are also contributing resources including satellite terminals.Total computing power is more than 5500 CPUs equivalent to 65TFStorage space 220 TB

GARUDA Resources – cont...Institution Location Resources

Space Application Centre Ahmedabad VSAT Terminal - 2 Nos.

Indian Institute of Science Bangalore 64 cpu; POWER5; Linux

Raman Research Institute Bangalore 32 cpu; Opteron; Linux

Institute of Mathematical Sciences Chennai 24 cpu; Opteron cluster (Cray XD1)

Madras Institute of Technology Chennai 16 cpu; P4; Linux

Indian Institute of Technology Delhi 32 cpu; Opteron; Linux

Jawaharlal Nehru University Delhi 32+16+16 cpu; Opteron, Opteron, Itanium; Linux

Institute of Genomics and Integrative Biology

Delhi 48 cpu; Xeon; Linux

Indian Institute of Technology Guwahati 128 cpu; Opteron; Linux

University of Hyderabad Hyderabad 32 way SMP; POWER4, AIX

Indian Institute of Technology Kharagapur 16+16 cpu; Power PC2, Xeon; AIX, Linux

Physical Research Laboratory Ahmedabad 320cpus; 64bit AMD

CDAC Bangalore 64 cpu Power 5; 320 cpu Xeon Linux

CDAC Hyderabad 320 cpu Xeon Linux

CDAC chennai 320 cpu Xeon Linux

CDAC Pune 32 cpu Xeon Linux: 4068 CPU Linux

GARUDA Operations & Management

• Looks after deployment of

middleware and network

• Operates from CDAC KP

Bangalore

• Operation Centre with High

resolution, scalable display wall

• Conduct Regular Monday

meetings among administrators

to maintain Garuda health

gridsupport@garudaindia.in rt@cdacb.ernet.in

GARUDA Partners• Motivation

– To Collaborate on Research and Engineering of Technologies, Architectures, Standards and Applications

– To Contribute to the aggregation of GARUDA resources

• Participation– 36 research & academic

institutions in the 17 cities– 8 centres of C-DAC– Total of 45 institutions– Additional over 20 labs

with LOE

Virtual User Community (VOMS)Group Name Description

Bioinformatics application of statistics and computer science to the molecular biology

ClimateModelling Deals with the dynamics of the climate system.

OSDD Community dedicated to develop drugs for tropical infectious diseases like malaria, tuberculosis

GeoPhysis Study related to physics of the Earth and its environment in space

CAE usage of computer software to solve engineering problems

IndianHeritage Focused on technology products for preserving & processing Heritage texts

HealthInformatics Focused on utilizing compute power for health informatics

MaterialScience interdisciplinary field applying the properties of matter to science and engineering

Euindia The vision of a worldwide Grid for Research by both Europe and India

ToolsDeveloper Forum to communicate and collaborate on developing Garuda Tools

GarudaAdmin Meant for administrators from resource providers & Garuda Operation team members

Applications on GARUDA

OSDD Chemo-informatics

datasets

Curatedmolecule datasets

CheminformaticsModels

Analysis

Data Mining and

Analysis

HT Virtual screening

PubChem

ChEMBL

DrugBank

Experimental Assays

Community of About 400

Role of Garuda Grid in OSDD

Project Team

Internet/NKN

Results

NKN

OSDD-Garuda Interface

Galaxy Workflow

Weka Workflow

Customized Galaxy Framework on GARUDA for OSDD:Chemo-informatics

• Integrated with Grid Authentication mechanism - Indian Grid Certificate Authority (IGCA)

• Integrated with Gridway Metascheduler - Job scheduling and management

• Integrated OSDD required tools - Weka (for data mining) and Autodock (Virtual screening)

• Provided support to upload multiple input files as tar file

• Data libraries of OSDD community are uploaded and are shared by all users

• Integrated with PostgreSQL

Bioinformatics: Protein Structure Prediction on Grid

• Genetic Algorithm for Protein Structure Prediction (PSP), an in-house developed code is Grid-enabled

• Concurrent jobs of PSP are done by splitting the protein molecule into multiple overlapping parts

• Uses Divide-and-Construct approach for– Reduction in Complexity– Possibility of Concurrency– To handle larger protein molecules

Flow of PSP

Dividing the sequence into parts

Mapping of each part onto a grid resource and to run GA

Constructing the molecule by combining parts and to run GA on combined sequence

Input:Protein

Sequence

Protein Sequence:

Part 1

Protein Sequence:

Part 2

Protein Sequence:

Part 3

Torsion Angles of

Part 1

Torsion Angles of

Part 2

Torsion Angles of

Part 3

Grid Resource 1 Grid

Resource 2Grid Resource 3

Combined GA output

for full molecule

Dividing the sequence into parts

Mapping of each part onto a grid resource and to run GA

Constructing the molecule by combining parts and to run GA on combined sequence

Input:Protein

Sequence

Protein Sequence:

Part 1

Protein Sequence:

Part 2

Protein Sequence:

Part 3

Torsion Angles of

Part 1

Torsion Angles of

Part 2

Torsion Angles of

Part 3

Grid Resource 1 Grid

Resource 2Grid Resource 3

Combined GA output

for full molecule

Performance of GA based PSP on Garuda

• Dataset:– 1TUP – a tumor suppressor protein having 219

amino acids.

• Molecule is splitted into 9 parts and each part has 30 amino acids

• GA on full molecule took 76 hours whereas distributed GA on Garuda took only 3 hours

Data/Memory Intensive Applications on Garuda

Computationally Intensive Applications on Garuda

caBIG - Garuda

• Exploring possibilities in Collaboration

• Interoperability of the “grid” technologies– Make the software

components talk to each other– Follow same Data standards

for publication– Common tool base for

researchers• Leverage HPC capabilities for

applications

Other Areas of Collaboration

• Indian Cancer Grid

• Protein folding analysis (using caGrid workflow, transport and security technology)

• caTissue – implement in the Software as a Service (SaaS) model

• Building a regional biobanking system—based on caTissue—at the Tata Memorial Centre & Hospital in Mumbai

Overall goal• Facilitate meeting priorities of NCI and ICMR towards discovering

and application of carcinogenesis biology into cancer prevention.• Discuss development of personalized approach to cancer

prevention and control through linking cancer biology to population diversity.

• To consider development and validation of cost effective biomarkers capable of early detection of cancers through global scientific, population and technological resources.

• Improve and share population databases from India and the United States to compare cancer biology, incidence, mortality, natural history, geographic and population diversity.

• To create understanding between India and Western nations to develop collaborative studies.

• Further details at http://canbio.in/overview.htm

Translational Cancer Prevention & Biomarkers Workshop 2011 @ Bangalore

Highlights• Founded in 2002 by two Yale trained physicians.

• Teleradiology services to hospitals around the globe.

• Teleradiology services include interpretation of all non-invasive imaging studies, namely CT, MRI, ultrasound, nuclear medicine studies and digitized Xrays.

• Emergency reports are provided within thirty minutes.

• Joint research partnerships with major technology vendors such as GE, to explore new techniques in 3D imaging analysis

• Further details at http://www.telradsol.com/

Innovative Startup

31

Collaborative Class Room

Supported Features:-

• Interface to Access grid• GSRM based data storage for maintaining course repositories• Indexing of course material based on key words

Website: http://ccr.garudaindia.in

Interoperability with International Grids

• Integrating technological components of Garuda and EGI– Glite and Globus– Customizing Gridway meta-scheduler – To run real life application across both Infrastructures

• Collaboration between CaBig and Garuda– Interoperation of technological service among these grids– Cancer Research application portability – Contribution to standards for using distributed computing in Health care

!

! !

! !

Thank you!很好, 谢谢

!

Grazie tanto!

Recommended