Scientific IT Services (SIS)

Bioinformatics and cluster support by Scientific IT Services of ETH

Using the CLC Genomics Server with the EULER cluster

Michal Okoniewski, Samuel Fux, Scientific IT Services, ETH Zurich

24.11.2016




SIS @ ETH Zurich: An Integrative Approach

Consulting & Training

High-Performance Computing

Scientific Software and Data Management

Research Informatics


Bio/Medical/Social Sciences: Data Avalanche ...

Growing a quantitative branch

Computing know-how greatly varying

Availability of “big data”

Much larger data volumes

Increasing data complexity

More complex workflows

Large collaborations

More groups and sites

More people

Longer projects


… and Data Analysis / Management

Data Management

Provenance Tracking

Lifecycle Mgt.

Automation

Processing

Analysis

Data Integration

Visualization

Sharing


Triaging in the «Jungle» of Computing Options


Consulting and Training

Courses:

Best practices in scientific programming

Various Python courses

Introduction to Apache Spark for large scale data processing

Workshop on next-generation sequencing analysis using HPC

Usage of portals (e.g. proteomics data analysis)

Introduction to “electronic lab notebook”

Data management plans for research proposals (sustainability, reproducible research)

Procurement of large scale computational infrastructures


EULER cluster. Euler I (right) & II (left)

© 2015 Olivier Byrde

What is EULER?

EULER stands for Erweiterbarer, Umweltfreundlicher, Leistungsfähiger ETH Rechner (extensible, environmentally friendly, high-performance ETH computer)

It is the 5th central (shared) cluster of ETH

1999–2007 Asgard ➔ decommissioned

2004–2008 Hreidar ➔ integrated into Brutus

2005–2008 Gonzales ➔ integrated into Brutus

2007–2016 Brutus

2014–2018+ Euler

It benefits from the 16 years of experience gained with those previous large clusters

Shareholder model

Like its predecessors, Euler has been financed (for the most part) by its users

Since 2014 over 70 (!) research groups from almost all departments of ETH have invested in Euler

These so-called “shareholders” receive a share of the cluster’s resources (processors, memory, storage) proportional to their investment

The (small) share of Euler financed by IT Services is open to all members of ETH

The only requirement is a valid NETHZ account

These “guest users” can use limited resources

If someone needs more computing power, he/she can invest in the cluster and become a shareholder at any time
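As a rough illustration of "proportional to their investment", the Python sketch below splits a pool of cores among shareholders. The group names, the amounts, and the pool size are invented for this example, and the actual allocation on Euler may follow rules that are not covered on this slide.

```python
from fractions import Fraction


def proportional_shares(investments, total_resource):
    """Split total_resource among groups in proportion to their investment."""
    total_invested = sum(investments.values())
    if total_invested <= 0:
        raise ValueError("total investment must be positive")
    return {
        group: Fraction(amount, total_invested) * total_resource
        for group, amount in investments.items()
    }


# Hypothetical shareholders and a hypothetical pool of 2400 cores.
example_investments = {"group_a": 100_000, "group_b": 50_000, "group_c": 50_000}
shares = proportional_shares(example_investments, total_resource=2400)

for group, cores in shares.items():
    print(group, float(cores))
```

Exact fractions are used so that the shares always add up to the full pool, without rounding drift.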

Shareholders by department (January 2016)

BSSE    5%
CHAB    14%
ERDW    25%
GESS    4%
MATH    8%
MATL    1%
MAVT    10%
USYS    14%
Other   4%
Public  11%
Cloud   3%
Admin   1%

Hardware generations

Euler I (2014): 448 x HP BL460c Gen8 (352 x 64 GB, 32 x 128 GB, 64 x 256 GB)

Each node contains 2 x 12-core Intel Xeon E5-2697v2 @ 2.7 GHz

Euler II (2015): 768 x HP BL460c Gen9 (736 x 64 GB, 32 x 512 GB)

Each node contains 2 x 12-core Intel Xeon E5-2680v3 @ 2.5 GHz

Plus 4 very large memory nodes (4 x 3072 GB)

Euler III (2016): Over 1200 compute nodes with faster CPUs (3.0-3.5 GHz)

To be delivered in November 2016, in production in January 2017

High-speed networks: 10-Gigabit Ethernet (Cisco) for file access

56-Gigabit InfiniBand (Mellanox) for inter-node communication
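The node counts on this slide can be turned into rough totals. The Python sketch below is illustrative only: it sums nodes, cores, and memory for Euler I and Euler II from the figures listed here. The four very large memory nodes are counted for memory only, because their core count is not given on the slide. All variable and function names are made up for this example.

```python
# Node configurations as listed on the slide: (number of nodes, GB of memory per node).
EULER_I = [(352, 64), (32, 128), (64, 256)]
EULER_II = [(736, 64), (32, 512)]
LARGE_MEMORY_NODES = [(4, 3072)]  # core count not stated on the slide

CORES_PER_NODE = 2 * 12  # two 12-core Xeon CPUs per node


def total_nodes(config):
    """Number of nodes in a configuration."""
    return sum(count for count, _ in config)


def total_memory_gb(config):
    """Aggregate memory of a configuration in GB."""
    return sum(count * gb for count, gb in config)


def total_cores(config):
    """Aggregate core count, assuming every node has CORES_PER_NODE cores."""
    return total_nodes(config) * CORES_PER_NODE


for name, config in (("Euler I", EULER_I), ("Euler II", EULER_II)):
    print(name, total_nodes(config), "nodes,",
          total_cores(config), "cores,",
          total_memory_gb(config), "GB")
print("Large memory nodes:", total_memory_gb(LARGE_MEMORY_NODES), "GB")
```

The node totals reproduce the 448 and 768 nodes quoted for Euler I and Euler II, which is a quick consistency check on the memory breakdown.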

Performance growth

[Figure: peak performance in TF (axis from 0 to 1200) per year, 2010 to 2016, for Brutus and Euler]

Who can use Euler

The only requirement to use Euler is a valid NETHZ account

No need to fill out an account request form

Immediate access using your NETHZ credentials

You can log in right now using your NETHZ account

ssh [email protected]

Your Euler account is created automatically upon first login

It becomes active once you have accepted the cluster’s usage rules

Euler uses the NETHZ database to identify shareholders and guest users, and sets privileges and priorities automatically

Adding the server plugin in your Workbench

Start the Workbench as administrator

In the Workbench, go to Help => Plugins

Connect to CLC server

File => CLC Server Login

Log in with your CLC username and password

You have your own space for data on EULER

Importing the data into the EULER space

Use Import

Choose “Workbench” to get your local file

Choose a local file

Choose a destination on EULER

Data processing on EULER

“Grid” option

Choosing the right cluster queue for the compute jobs

Typical tasks that can use multiple cores

Multiple BLAST

Next-gen sequencing alignment to a genome

De-novo assembly

Alignment to contigs

…

Please only choose the parallel queues if you are running one of these tasks
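The rule on this slide can be written down as a small lookup. The Python sketch below is only an illustration of the decision: the task list mirrors the slide (which is explicitly not exhaustive), `single_core` is the Grid option named later in this deck, and the preset name `parallel` is a placeholder assumption, since the actual queue names are only visible in the Workbench dialog.

```python
# Tasks named on the slide as able to use multiple cores (list is not exhaustive).
MULTI_CORE_TASKS = {
    "multiple blast",
    "next-gen sequencing alignment to a genome",
    "de-novo assembly",
    "alignment to contigs",
}


def choose_grid_preset(task, parallel_preset="parallel"):
    """Return a Grid preset name for a task.

    parallel_preset is a hypothetical name; check the Grid options
    offered in your Workbench for the real queue names.
    """
    if task.strip().lower() in MULTI_CORE_TASKS:
        return parallel_preset
    return "single_core"


print(choose_grid_preset("De-novo assembly"))
print(choose_grid_preset("Export"))
```

Defaulting to `single_core` keeps the parallel queues free for the tasks that can actually use them.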

De-novo assembly example

Import the paired reads into the Workbench

Choose the de-novo assembly option…

… and Grid option with large memory

Choose the input files…

… and parameters

Wait until the process is executed (surprisingly fast…)

Check the created assembly

Exporting the (bigger) data from EULER

It is enough to use Grid => single_core

Select the files

Be careful with “custom file name” for multiple files: use wildcards, e.g. {1}.{2} for filename, extension

Destination can be one of the admins’ folders

Talk to the admin

To get the files

To get your own export folder on EULER
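To see why the “custom file name” warning matters for multiple files, the Python sketch below imitates the placeholder behaviour described on the slide, with {1} standing for the file name and {2} for the extension. It is not CLC's implementation, and the function name and sample names are invented for this example.

```python
def expand_export_name(pattern, input_name, extension):
    """Fill the {1} (file name) and {2} (extension) placeholders in a pattern."""
    return pattern.replace("{1}", input_name).replace("{2}", extension)


# Hypothetical data elements selected for export.
inputs = ["sample_A", "sample_B"]

# A fixed custom name gives every exported file the same name.
fixed_names = [expand_export_name("result.fa", name, "fa") for name in inputs]

# A pattern with wildcards keeps the exported files distinct.
templated_names = [expand_export_name("{1}.{2}", name, "fa") for name in inputs]

print(fixed_names)
print(templated_names)
```

With the fixed name both exports collide on `result.fa`, while the `{1}.{2}` pattern yields one distinct file per input.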

Summary – support modes, Scientific IT Services

Scientific IT Services and ID representatives in DBIOL are glad to help you with your bioinformatics needs

CLC support

Help with parallelization of big computing tasks on EULER

Help with the EULER command line software stack

Help with data co-analysis

Time slots for small projects (up to 3 person-days of work) available upon request and discussion

“Code clinic” for your programs/scripts

https://sis.id.ethz.ch

https://scientific.ethz.ch

Contact / Getting help - EULER

Wiki

https://scicomp.ethz.ch

https://scicomp.ethz.ch/wiki/Getting_started_with_clusters

Ticket system

http://tinyurl.com/cluster-support (NETHZ authentication)

E-mail

[email protected]

Please do not send questions to individual members of the team

Person-to-person

Contact us to set up an appointment at your place

Visit us at Weinbergstrasse 11, WEC, D floor (please call first)

Responsible for bioinformatic software in SIS

Samuel Fux – application specialist

EULER software stack

CLC server administration

Michal Okoniewski – bioinformatician

bioinformatics support

co-analysis projects

code clinic


Thank you for your attention!

Questions/comments?
