

www.hp-see.eu

HP-SEE project and the HPC Bioinformatics Life Science gateway

M. KOZLOVSZKY, Obuda University

The HP-SEE initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no. 261499


Overview

The HP-SEE project

HP-SEE Life Sciences Virtual Community

HP-SEE Bioinformatics Life Science gateway

Sequence alignment applications workflow based online bioinformatics services

Working with workflows/gUSE

Summer School on Workflows and Gateways for Grids and Clouds 2012 – Budapest, Hungary, 2-6.07.2012


Pan-European e-Infrastructures vision

The Research Network infrastructure provides fast interconnection and advanced services among Research and Education institutes of different countries

The Research Distributed Computing Infrastructure (Grid, HPC) provides a distributed environment for sharing computing power, storage, instruments and databases through the appropriate software (middleware) in order to solve complex application problems

This integrated environment is called an electronic infrastructure (eInfrastructure). It enables new methods of global collaborative research, often referred to as electronic science (eScience)

The creation of the eInfrastructure is one of the key objectives to facilitate building of the European Research Area

[Figure: layers labelled Network Infrastructure, DCI Infrastructure, e-Science Collaborations]


Context: the Model - Converged Communication & Service Infrastructure for South-East Europe

SEE-LIGHT & GEANT

Comp physics, Comp chem, Life sciences

Seismology, Meteorology, Environment

HP-SEE


Context: Timeline and funding


HP-SEE: Project

Contract: RI-261499

Project type: CP & CSA

Call: INFRA-2010-1.2.3: VRCs

Start date: 01/09/2010

Duration: 24 + 9 months

Total budget: 3 885 196 €

Funding from the EC: 2 100 000 €

Total funded effort, PMs: 539.5

Web site: www.hp-see.eu


HP-SEE: Partnership

Contractors (14)

Third Party / JRU mechanism used

Associate universities / research centres

HP-SEE: Project Objectives

Objective 1 – Empowering multi-disciplinary virtual research communities

Objective 2 – Deploying integrated infrastructure for virtual research communities, including a GEANT link to Southern Caucasus

Objective 3 – Policy development and stimulating regional inclusion in pan-European HPC trends

Objective 4 – Strengthening the regional and national human network


The HP-SEE Life Science VRC and its objectives

Main goal: Match the combined HPC resources to the regional needs of the life/bioscience communities, foster research in the field within the region with the help of the large-scale, high-availability infrastructure, and facilitate cooperation between the sparsely distributed life science research centres.

Data and limitations

The Life Sciences domain has been revolutionized by advances in both computer hardware and software algorithms.

Assembling the Human Genome

Gene-expression chips to understand cellular processes

Exponential growth in the amount of publicly available genomic data (GenBank)

Traditional database approaches are no longer sufficient for rapidly performing life science queries involving the fusion of data types.

Existing computational tools were created by experimentalists dealing with data sets that were minuscule in comparison to those available today. As a result, software that was once perfectly adequate now runs slowly or cannot complete an analysis on traditional computational platforms.


Accessible infrastructure

HP-SEE Supercomputing infrastructure

SEE-GRID-SCI Grid infrastructure

Country            Center            Computing cores   Teraflops
Bulgaria           BG Blue Gene/P    8192              27.85
Bulgaria           HPCG              576               3.23
FYR of Macedonia   FINKI SC          2016              9
Hungary            NIIFI SC          144               0.5
Hungary            Pecs SC           1152              10
Hungary            Debrecen SC       3078              18
Hungary            Szeged            2112              14
Romania            InfraGRID         400               2.5
Romania            IFIN_BIO          256               2.72
Romania            IFIN_BC           368               3.9
Romania            NCIT              562               3.4
Romania            UVT Blue Gene/P   4096              13.9
Serbia             PARADOX           672               6.26
TOTAL                                23624             115.26
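
As a quick consistency check on the infrastructure table, the per-centre figures can be tallied per country and compared with the TOTAL row. The sketch below is only an illustration: the numbers are copied from the table, while the data layout and the function names `totals_by_country` and `grand_total` are assumptions made for this example.

```python
from collections import defaultdict

# (country, center, cores, teraflops), copied from the infrastructure table.
CENTERS = [
    ("Bulgaria", "BG Blue Gene/P", 8192, 27.85),
    ("Bulgaria", "HPCG", 576, 3.23),
    ("FYR of Macedonia", "FINKI SC", 2016, 9.0),
    ("Hungary", "NIIFI SC", 144, 0.5),
    ("Hungary", "Pecs SC", 1152, 10.0),
    ("Hungary", "Debrecen SC", 3078, 18.0),
    ("Hungary", "Szeged", 2112, 14.0),
    ("Romania", "InfraGRID", 400, 2.5),
    ("Romania", "IFIN_BIO", 256, 2.72),
    ("Romania", "IFIN_BC", 368, 3.9),
    ("Romania", "NCIT", 562, 3.4),
    ("Romania", "UVT Blue Gene/P", 4096, 13.9),
    ("Serbia", "PARADOX", 672, 6.26),
]


def totals_by_country(centers):
    """Return {country: (cores, teraflops)} summed over that country's centers."""
    acc = defaultdict(lambda: [0, 0.0])
    for country, _center, cores, tflops in centers:
        acc[country][0] += cores
        acc[country][1] += tflops
    # Round the teraflops to two decimals to hide floating-point noise.
    return {c: (cores, round(tf, 2)) for c, (cores, tf) in acc.items()}


def grand_total(centers):
    """Return (total cores, total teraflops) over all centers."""
    cores = sum(row[2] for row in centers)
    tflops = round(sum(row[3] for row in centers), 2)
    return cores, tflops


if __name__ == "__main__":
    for country, (cores, tf) in totals_by_country(CENTERS).items():
        print(f"{country}: {cores} cores, {tf} Tflops")
    print("Total:", grand_total(CENTERS))
```

The grand total computed this way agrees with the TOTAL row of the table (23624 cores, 115.26 Teraflops).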


HP-SEE’s LS Applications

7 applications from 5 countries

Greece: Searching for novel miRNA genes and their targets (miRs); Network models of short and long term memory (CMSLTM)

Montenegro: DNA Multi-core Analysis (DNAMA)

Hungary: Deep sequencing for short fragment alignment (DeepAligner), gUSE & workflow based; In-silico Disease Gene Mapper (DiseaseGene), gUSE & workflow based

Georgia: Modeling of some biochemical processes with the purpose of realization of their thin and purposeful synthesis (MSBP)

Armenia: Molecular Dynamics Study of Complex systems (MDSCS)


Why gUSE/WS-PGRADE

Infrastructure:

HP-SEE infrastructure, based on gLite and ARC as middleware

Authentication procedures are painful (as usual)

Interoperability with grids is a plus

Application:

Workflow-like process with embedded (legacy) applications

Restricted input parameter sets for the algorithms

Service-like operation

Portal features for a community

Knowledge, licensing & support:

Open source software environment needed

Knowledge transfer required for the application specific modules
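
The "restricted input parameter sets" and "service like operation" points can be pictured as a thin validation layer in front of a legacy executable: the gateway exposes only a whitelisted subset of parameters and refuses anything else. The following is a minimal sketch of that idea, not gUSE/WS-PGRADE code. The parameter names, their permitted values, the executable name `legacy_aligner`, and the function `build_command` are all hypothetical.

```python
# Hypothetical whitelist: parameter name -> container of permitted values.
ALLOWED = {
    "max_mismatches": range(0, 4),         # assumed range 0..3
    "output_format": {"sam", "tabular"},   # assumed choices
}


def build_command(executable, params):
    """Validate params against ALLOWED and return an argv list.

    Raises ValueError for a parameter the service does not expose or for a
    value outside the permitted set, so the legacy tool is never invoked
    with unchecked input.
    """
    argv = [executable]
    for name in sorted(params):  # sorted for a deterministic command line
        if name not in ALLOWED:
            raise ValueError(f"parameter not exposed by the service: {name}")
        value = params[name]
        if value not in ALLOWED[name]:
            raise ValueError(f"value {value!r} not permitted for {name}")
        argv += [f"--{name.replace('_', '-')}", str(value)]
    return argv


if __name__ == "__main__":
    # 'legacy_aligner' is a placeholder, not a real program.
    print(build_command("legacy_aligner",
                        {"output_format": "sam", "max_mismatches": 2}))
```

Returning an argument list rather than a shell string keeps user-supplied values out of shell interpretation, which matters when the tool is offered as an online service.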


HP-SEE Bioinformatics eScience Gateway

HP-SEE Bioinformatics eScience Gateway hosted at Obuda University, operated by MTA SZTAKI.

gUSE+WS-PGRADE (v3.3.2), Liferay based

SEE region’s supercomputing & grid infrastructure used

Accessible at: http://ls-hpsee.nik.uni-obuda.hu:8080/liferay-portal-6.0.5


Architecture and application porting steps

Unified porting steps of the applications:


DeepAligner - Deep sequencing for short fragment alignment

Description & Objectives

Mapping short fragment reads to open-access eukaryotic genomes can be done with a group of algorithms (BLAST, BWA, PatternHunter, and other sequence alignment tools). BLAST (in its mpiBLAST or ScalaBLAST variants) is one of the most frequently used tools in bioinformatics; the others are relatively new, fast, lightweight tools that align short sequences. Local installations of these algorithms typically cannot handle this problem size, so the procedure runs slowly, while web-based implementations cannot accept a high number of queries. The HP-SEE infrastructure gives access to massively parallel architectures, and the sequence alignment code is distributed free for academia.

Result: Online workflow based short sequence alignment service

Impact: Freely available service/code for large scale short sequence alignment

Collaborations: Hungarian Bioinformatics Association, Semmelweis University

HP-SEE infrastructure used: Hungarian HPC, NIIF’s supercomputing sites
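
One common way to put a short-read alignment job onto a parallel machine is to split the read set into chunks, align each chunk independently, and merge the results. The sketch below shows only the splitting step for FASTA input. It illustrates that general pattern under stated assumptions and is not DeepAligner's actual code, which these slides do not show; `parse_fasta` and `split_round_robin` are hypothetical helper names.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        elif header is not None:
            # Sequence data may span several lines; join them later.
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_round_robin(records, n_chunks):
    """Distribute records over n_chunks lists; sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return chunks


if __name__ == "__main__":
    reads = ">r1\nACGT\n>r2\nGGCA\nTT\n>r3\nTTAA\n"
    for idx, chunk in enumerate(split_round_robin(parse_fasta(reads), 2)):
        print(idx, [name for name, _seq in chunk])
```

Round-robin assignment keeps the chunk sizes balanced without knowing the number of reads in advance. Each chunk could then be handed to one worker or one workflow node running the aligner of choice.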


DeepAligner - Deep sequencing for short fragment alignment (contd.)

Small scale launch (Home cluster): PBS/Linux Cluster, at the Obuda University – John von Neumann Faculty of Informatics.

Activity and technical assistance in pre-production stage: Technical assistance was provided by MTA SZTAKI and NIIF.

Porting: The application was ported using Perl/C. The workflow and GUI for the application were created by Obuda University.

Benchmarking: Scaled from 32 cores to 96 cores (MPI).

DeepAligner status: The online service uses two sites of NIIF’s supercomputing infrastructure (Budapest and Szeged).

Foreseen activities: Optimization of parameter assignment in the GUI, and more scientific publications about short sequence alignment. Further scaling is planned, with performance analysis.

More information: http://hpseewiki.ipb.ac.rs/index.php/DeepAligner


Development & working on gUSE/WS-PGRADE

Pros

Close collaboration and useful support

ARC middleware connector was developed from scratch by MTA SZTAKI on request

ASM and ARC submitter related bugs have been found and reported

Helpful and skilled support & development team

Cons

Internal ARC middleware problems are hard to find


Future plans

Additional plug-in like online bioinformatics services:

More sequence alignment workflows

More multiple sequence alignment workflows

Sequence database quality measurement workflows

Open up the gateway for users outside SEE region

Thank you for your attention!

Questions?


gUSE/WS-PGRADE architecture

[Figure: architecture diagram with labels ASM (Application Specific Module), WS-PGRADE, DeepAligner, DiseaseGene]
