Sequence Services Phase 2--Eagle Genomics and Cycle Computing


DESCRIPTION

William Spooner (Eagle) and Carl Chesal (Cycle) introduce the proof of concept provided by this consortium for Phase 2 of the Pistoia Alliance Sequence Services project. The presentation was delivered at the Pistoia Alliance Conference in Boston, MA, on April 24, 2012.


Sequence Services Phase 2

Pistoia Alliance AGM, Boston MA, April 24th 2012

Nurture: Build trust, shared language

Collaborate: Enterprise, Academia, Government, Foundations

Open Innovation

Explore: Work together to find a common purpose

Exploit: Turn ideas into tangible benefits

2/ ElasticAP, Pistoia Alliance Conference, Boston MA, 24th April 2012

The Requirements



Share

FUNCTIONAL
• Login and workspace
• Manage users
• Manage data
• Upload private data
• Access public data
• Export
• Delete/archive
• Manage applications
• Upload scripts/pipelines
• Analyse data
• Monitor use/performance

NON-FUNCTIONAL
• Charging Model
• Service Support
• Operational Requirements
• Security Requirements

The Partnership


Cycle Computing
Established: 2005
Domain: High performance computing
Employees: 18, of whom 16 are engineers
Location: Across USA/Canada
Sectors: Pharmaceutical, biotechnology, financial, computer gaming, engineering, academia
Customers: North America, Europe
Partnerships: Schrodinger, VMWare, Canonical

Eagle Genomics
Established: 2008
Domain: Operational bioinformatics
Employees: 12, of whom 9 are engineers, plus a pool of external consultants
Location: Cambridge, UK
Sectors: Pharmaceutical, biotechnology, agri-biotechnology, consumer goods, food, other life sciences
Customers: North America, Europe, Asia
Partnerships: Amazon Web Services, Cognizant, European Bioinformatics Institute, University of Manchester, John Innes Centre

The Platform

The platform for storage, analysis and sharing of life sciences data in the cloud


The Proposal


Figure: proposed analysis workflow (upload; stored data; analyses run as a pipeline process or a manual process, with start and stop; stored data; share). Roles shown: Depositor, Collaborator, Bioinformatician, CIO, Biologist.

The Architecture


Figure: ElasticAP architecture on the Amazon EC2 cloud. Components shown: Bioinformatician, Collaborator and Depositor users connecting over HTTPS; a Shibboleth gateway; an OpenAM IdP and the customer's single sign-on, linked by SAML token exchange and SAML authentication; a CycleCloud web server with a MySQL assets DB; a SEEK web server; an encrypt/decrypt layer in front of S3 storage holding each customer sandbox's data files; and customer sandboxes of EC2 instances/AMIs running Condor, Ensembl and BioLinux.
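The diagram does not say how customer sandboxes are kept apart or how sharing with collaborators is enforced. As a hypothetical sketch only, the class below namespaces every data file under its customer's prefix and allows a read only by members of that sandbox or by users the file has been explicitly shared with. The class, its methods and the key layout are invented for illustration; in the architecture above, user identity would come from the Shibboleth/SAML single sign-on, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class Sandbox:
    """Hypothetical model of one customer sandbox and its sharing grants."""
    customer: str
    members: set[str] = field(default_factory=set)
    # object key -> users outside the sandbox who may read that object
    shares: dict[str, set[str]] = field(default_factory=dict)

    def key_for(self, filename: str) -> str:
        # Namespace every data file under its customer's prefix.
        return f"{self.customer}/{filename}"

    def share(self, filename: str, user: str) -> None:
        # Grant read access on one file to a user outside the sandbox.
        self.shares.setdefault(self.key_for(filename), set()).add(user)

    def can_read(self, user: str, key: str) -> bool:
        if not key.startswith(f"{self.customer}/"):
            return False  # the key belongs to another customer's sandbox
        return user in self.members or user in self.shares.get(key, set())
```

For example, a depositor in a sandbox named `acme` can read any `acme/...` key, while a collaborator who was sent `trio1.bam` can read only `acme/trio1.bam` and nothing else in that sandbox.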

The Present


Figure: user roles (Bioinformatician, Depositor, Collaborator).

The 1000 Genomes Project

• A deep catalogue of human variation
– Freely available on AWS
– 1,700 individuals
– 200 TB of data
– 10,000s of data files
– Almost no metadata!

• ElasticAP is evaluating 1000 Genomes Project Pilot 2
– 20X resequencing
– 2 trios (6 individuals)
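"20X resequencing" refers to mean sequencing depth. As a rough worked example, the standard relation C = N × L / G (coverage equals the number of reads times read length, divided by genome size) can estimate how much sequence 20X implies for Pilot 2's six individuals. The genome size of about 3.1 billion bases and the 100-base read length are assumptions chosen for illustration, not figures from the slide.

```python
# Assumed values for illustration only (not taken from the slide):
GENOME_SIZE = 3_100_000_000  # approximate human genome size, in bases
READ_LENGTH = 100            # hypothetical read length, in bases


def reads_needed(coverage: int, read_length: int, genome_size: int) -> int:
    """Reads required for a target mean coverage, from C = N * L / G."""
    # Integer ceiling division: a partial read still has to be sequenced.
    return (coverage * genome_size + read_length - 1) // read_length


def bases_needed(coverage: int, genome_size: int) -> int:
    """Total sequenced bases for a target mean coverage (C * G)."""
    return coverage * genome_size


if __name__ == "__main__":
    individuals = 6  # Pilot 2: two trios
    per_person = reads_needed(20, READ_LENGTH, GENOME_SIZE)
    print(f"reads per individual at 20X: {per_person:,}")
    print(f"reads for {individuals} individuals: {per_person * individuals:,}")
    print(f"bases per individual at 20X: {bases_needed(20, GENOME_SIZE):,}")
```

Under these assumptions, 20X needs 620,000,000 reads (62 billion bases) per individual, which gives a sense of why even a six-person pilot is a storage and compute problem.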

TRUP: Tumor RNA-seq Unified Pipeline

• Collaboration between:
– Max Planck Institute for Molecular Genetics
– Bayer Pharma AG

• Identifies gene fusion events in tumor samples

• Involves both alignment and de novo assembly steps

• Pipeline is being implemented on ElasticAP
– Using public GEO datasets for validation
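TRUP's internals are not described on this slide. As a much-simplified illustration of one common signal for gene fusions, the sketch below counts RNA-seq read pairs whose two mates were assigned to different genes (discordant pairs) and reports gene pairs backed by at least a minimum number of reads. The `ReadPair` record, the threshold and the example reads are hypothetical; real fusion pipelines, TRUP included, combine this kind of evidence with the alignment and de novo steps listed above and with extensive filtering.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical record: the gene each mate of a paired read aligned to."""
    read_id: str
    gene_mate1: str
    gene_mate2: str


def fusion_candidates(
    pairs: Iterable[ReadPair], min_support: int = 2
) -> dict[tuple[str, str], int]:
    """Count discordant pairs (mates in different genes) per unordered gene pair."""
    support: Counter = Counter()
    for pair in pairs:
        if pair.gene_mate1 != pair.gene_mate2:
            # Sort so that A-B and B-A evidence counts towards one candidate.
            a, b = sorted((pair.gene_mate1, pair.gene_mate2))
            support[(a, b)] += 1
    # Keep only gene pairs with enough supporting read pairs.
    return {genes: n for genes, n in support.items() if n >= min_support}


pairs = [
    ReadPair("r1", "BCR", "ABL1"),
    ReadPair("r2", "ABL1", "BCR"),
    ReadPair("r3", "TP53", "TP53"),  # concordant: both mates in one gene
    ReadPair("r4", "EML4", "ALK"),   # only one supporting pair
]
```

With the default threshold, `fusion_candidates(pairs)` reports only the BCR/ABL1 pair, supported by two reads; lowering `min_support` to 1 also admits EML4/ALK. Sorting the gene names makes the candidate independent of which mate landed in which gene.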

The PoC


FUNCTIONAL
• Login and workspace
• Load data
• Manage public data
• Load scripts and pipelines
• Analyse data
• Export data
• Archive data
• Manage applications
• Manage users
• Monitor use/performance

NON-FUNCTIONAL
• Charging Model
• Service Support
• Operational Requirements
• Security Requirements

KEY: Fully implemented / Partially implemented / To-do list


The Prior Art

• Eagle have been building analysis pipelines and hosting secure cloud apps for years.

• Cycle have been developing HPC solutions and deploying them on the cloud for years.

• We built this as a platform we could use ourselves in order to carry on delivering what we already do.

• But now the results are interactive, and everyone can share and participate.

• The most common tasks won’t need to involve us at all.


The Price

• AWS-style pay-as-you-go business model
– Free sign-up and account creation.
– Tiered application pricing, charged by the hour.
– Discounts in exchange for an up-front reservation fee.
– Offline data import/export also available.
– Flat-rate data storage, charged per gigabyte-month.
– Data backup, charged per gigabyte-month.
– Monthly billing.
– Support contracts available.

• Customisation and new pipelines at Eagle/Cycle standard consulting rates.
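The charging model can be illustrated with a small monthly-bill calculator. Every rate, tier name and the reservation discount below are hypothetical placeholders, since the slide gives no prices; the structure only mirrors the slide's items: hourly charges per application tier, a discount once an up-front reservation fee has been paid, and storage and backup charged per gigabyte-month. Offline import/export, support contracts and consulting work are not modelled. This is a sketch, not ElasticAP's actual billing logic.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical placeholder rates; the slide does not publish prices.
HOURLY_RATE_BY_TIER = {"small": Decimal("0.50"), "large": Decimal("2.00")}
STORAGE_RATE_PER_GB_MONTH = Decimal("0.10")  # flat-rate data storage
BACKUP_RATE_PER_GB_MONTH = Decimal("0.05")   # backup data
RESERVED_DISCOUNT = Decimal("0.30")          # discount on hourly use when reserved


@dataclass
class MonthlyUsage:
    app_hours: dict[str, Decimal]  # application tier -> hours used this month
    storage_gb_months: Decimal
    backup_gb_months: Decimal
    reserved: bool = False  # True if an up-front reservation fee was paid


def monthly_bill(usage: MonthlyUsage) -> Decimal:
    """Pay-as-you-go bill for one month, rounded to cents."""
    compute = sum(
        (HOURLY_RATE_BY_TIER[tier] * hours for tier, hours in usage.app_hours.items()),
        Decimal("0"),
    )
    if usage.reserved:
        # The reservation fee itself is paid up front, not on the monthly bill.
        compute *= 1 - RESERVED_DISCOUNT
    storage = usage.storage_gb_months * STORAGE_RATE_PER_GB_MONTH
    backup = usage.backup_gb_months * BACKUP_RATE_PER_GB_MONTH
    # Sign-up is free, so there is no fixed monthly component.
    return (compute + storage + backup).quantize(Decimal("0.01"))
```

With these placeholder rates, 10 hours on the small tier, 5 hours on the large tier and 100 GB-months each of storage and backup come to 30.00; with a reservation in place, the hourly part is discounted and the total drops to 25.50. `Decimal` is used rather than `float` so that currency amounts round predictably.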


The Plan

• Early access for preferred partner customers in July
– Talk to us now if you’d like to be part of that.

• Full production in September, with all partially implemented and to-do items completed.

• Increased number of public datasets.

• Increased range of applications and pipelines.

• User interface improvements based on feedback from early access period.


The Potential

• Available as customisation projects:

– Conversions to other clouds.

– Conversions to run on in-house infrastructure.

• Truly secure and scalable R&D collaboration environment.
– Applicable to all sciences, not just genomics.


Change the way you do science

Will Spooner, will.spooner@eaglegenomics.com

www.eaglegenomics.com
