Science Cloud Paul Watson Newcastle University, UK paul.watson@ncl.ac.uk

Research Challenge

Understanding the brain is the greatest informatics challenge

• Enormous implications for science:

• Medicine

• Biology

• Computer Science

Collecting the Evidence

100,000 neuroscientists generate huge quantities of data:
– molecular (genomic/proteomic)
– neurophysiological (time-series activity)
– anatomical (spatial)
– behavioural

Neuroinformatics Problems

• Data is:
– expensive to collect but rarely shared
– in proprietary formats & locally described
• The result is:
– a shortage of analysis techniques that can be applied across neuronal systems
– limited interaction between research centres with complementary expertise

Data in Science

• Bowker’s “Standard Scientific Model”

1. Collect data

2. Publish papers

3. Gradually lose the original data

The New Knowledge Economy & Science & Technology Policy, G.C. Bowker

• Problems:
– papers often draw conclusions from data that is not published
– inability to replicate experiments
– data cannot be re-used

Codes in Science

• Three stages for codes

1. Write code and apply to data

2. Publish papers

3. Gradually lose the original codes

• Problems:
– papers often draw conclusions from codes that are not published
– inability to replicate experiments
– codes cannot be re-used

Plan

• Neuroinformatics - a challenging e-science application
• CARMEN – addressing the challenges
• Cloud Computing for e-science
– Lessons we’ve Learnt
• The Promise of Commercial Clouds

Cracking the neural code

[Figure: recorded signal traces labelled neurone 1, neurone 2 and neurone 3]

Raw voltage signal data is typically collected using single or multi-electrode array recording

Focus on Neural Activity

Epilepsy Exemplar

Data analysis guides surgeon during operation

Further analysis provides evidence

WARNING! The next 2 slides show an exposed human brain

CARMEN

enables sharing and collaborative exploitation of data, analysis code and expertise that are not physically collocated

CARMEN Project

Stirling

St. Andrews

Newcastle

York

Sheffield

Cambridge

Imperial

Plymouth

Warwick

Leicester

Manchester

UK EPSRC e-Science Pilot

$7M (2006-10)

20 Investigators

Industry & Associates

CARMEN e-Science Requirements

• Store
– very large quantities of data (100TB+)
• Analyse
– suite of neuroinformatics services
– support data intensive analysis
• Automate
– workflow
• Share
– under user-control

Background: North East Regional e-Science Centre

• 25 Research Projects across many domains:
– Bioinformatics, Ageing & Health, Neuroscience, Chemical Engineering, Transport, Geomatics, Video Archives, Artistic Performance Analysis, Computer Performance Analysis, ...
• Same key needs:
– Store
– Analyse
– Automate
– Share

Result: e-Science Central

• Integrated Store-Analyse-Automate-Share infrastructure
• Web-based
• Generic
– CARMEN neuroinformatics & chemistry as pilots

Science Cloud Architecture

[Figure: a cloud providing data storage and analysis, accessed over the Internet (typically via browser); users upload data & services and run analyses]

Cloud Services Continuum (based on Robert Anderson)

http://et.cairene.net/2008/07/03/cloud-services-continuum/

• Service layers: Software (SaaS), Platform (PaaS), Infrastructure (IaaS)
• Examples shown: Google Apps, Salesforce.com, Google AppEngine, Microsoft Azure, Amazon EC2 & S3

Science Cloud Options

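[Figure: two options.
Option 1: Science App 1 .... Science App n built directly on Cloud Infrastructure: Storage & Compute.
Option 2: Science App 1 .... Science App n built on a Science Platform, which runs on Cloud Infrastructure: Storage & Compute.
Also labelled: Users, Service Developers.]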

CARMEN Cloud

[Figure: CARMEN Cloud components]
• Filestore with Pattern Search
• Database
• Metadata
• Service Repository
• Processing
• Workflow Enactment
• Workflow
• Security
• Browsers & Rich Clients

Editing and Running a Workflow on the Web

Viewing the output of Workflow Runs

Workflow

Result File

Viewing results

Blogs and links

Communicating Results

Linking to results & workflows

What we learnt: Moving into a Cloud

• Moving existing technologies into a cloud can be difficult
– some can’t run in a Cloud at all

Raw Data Exploration with Signal Data Explorer

What we learnt: Scalability

• Clouds offer the potential for scalability
– grab compute power only when needed
• But developers have to write scalable code
– for Infrastructure as a Service Clouds

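Dynasoar: Dynamic Deployment

[Figure: a Consumer (C) sends a request (req) to a Web Service Provider (WSP) and receives a response (res). The Host Provider manages node 1 (s2, s5), node 2 and node n (s2). A Service Repository (SR) holds the services. Numbered arrows 1 to 3 trace a request to s4; step 2 is service fetch & deploy.]

The deployed service remains in place and can be re-used - unlike job scheduling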

Dynasoar

[Figure: a Consumer (C) sends a request (req) to a Web Service Provider (WSP) and receives a response (res). The Host Provider manages node 1 (s2, s5), node 2 and node n (s2).]

A request for s2 is routed to an existing deployment of the service
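The routing behaviour on the two Dynasoar slides can be sketched in a few lines of Python. This is a toy model only: a request is served by an existing deployment when one exists; otherwise the service is fetched from the repository, deployed on a node, and left in place for re-use. The names `Node`, `HostProvider`, `handle`, `make_service` and `example_provider`, and the least-loaded placement rule, are assumptions made for illustration and are not Dynasoar's actual API or policy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A "service" is modelled as a plain function from request payload to response.
Service = Callable[[str], str]


@dataclass
class Node:
    name: str
    deployed: Dict[str, Service] = field(default_factory=dict)


class HostProvider:
    """Hypothetical host provider: routes requests and deploys on demand."""

    def __init__(self, nodes: List[Node], repository: Dict[str, Service]):
        self.nodes = nodes
        self.repository = repository  # stand-in for the Service Repository
        self.deployments = 0          # number of fetch-and-deploy operations

    def handle(self, service: str, payload: str) -> Tuple[str, str]:
        # Case 1: route to an existing deployment of the service.
        for node in self.nodes:
            if service in node.deployed:
                return node.name, node.deployed[service](payload)
        # Case 2: fetch from the repository and deploy.
        # Assumption: place it on the node with the fewest deployed services.
        target = min(self.nodes, key=lambda n: len(n.deployed))
        target.deployed[service] = self.repository[service]
        self.deployments += 1
        # The deployment stays in place, so later requests hit case 1.
        return target.name, target.deployed[service](payload)


def make_service(name: str) -> Service:
    """Build a dummy service that just echoes its name and payload."""
    return lambda payload: f"{name}({payload})"


def example_provider() -> HostProvider:
    """Mirror the slide: node 1 has s2 and s5, node 2 is empty, node n has s2."""
    repo = {s: make_service(s) for s in ("s2", "s4", "s5")}
    nodes = [
        Node("node 1", {"s2": repo["s2"], "s5": repo["s5"]}),
        Node("node 2"),
        Node("node n", {"s2": repo["s2"]}),
    ]
    return HostProvider(nodes, repo)
```

In this sketch, the first request to s4 triggers one fetch-and-deploy, and every later request to s4 re-uses that deployment, which is the contrast with job scheduling drawn on the slide.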

Adaptive Dynamic Deployment with Dynasoar

[Chart: response time (seconds) and processors in pool, plotted against arrival rate (messages per second) from 0.03 to 1]

Adding processors as you need them optimises resources and saves money in pay-as-you-go clouds

Commercial pay-as-you-go clouds would allow us to avoid this limit
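One way to picture "adding processors as you need them" is to size the pool from the offered load (arrival rate multiplied by service time). The sketch below is an illustration under stated assumptions, not the policy Dynasoar uses: the function names, the 10-second service time in the usage note and the 0.7 target utilisation are all hypothetical. Only the arrival rates (0.03 to 1 messages per second) come from the chart axis. The optional `max_pool` models a fixed-size local pool, the kind of limit a pay-as-you-go cloud would remove.

```python
import math
from typing import List, Optional


def processors_needed(
    arrival_rate: float,                # messages per second
    service_time: float,                # seconds of processing per message (assumed)
    target_utilisation: float = 0.7,    # assumed headroom target, not from the slides
    max_pool: Optional[int] = None,     # size of a fixed local pool, if any
) -> int:
    """Smallest pool that keeps average utilisation at or below the target."""
    offered_load = arrival_rate * service_time  # average number of busy processors
    needed = max(1, math.ceil(offered_load / target_utilisation))
    if max_pool is not None:
        needed = min(needed, max_pool)          # a fixed pool caps what we can add
    return needed


def pool_schedule(
    rates: List[float],
    service_time: float,
    max_pool: Optional[int] = None,
) -> List[int]:
    """Pool size chosen for each measurement interval."""
    return [processors_needed(r, service_time, max_pool=max_pool) for r in rates]


def processor_intervals(schedule: List[int]) -> int:
    """Total processor-intervals consumed, a proxy for pay-as-you-go cost."""
    return sum(schedule)
```

With the chart's arrival rates and the assumed 10-second service time, `pool_schedule([0.03, 0.06, 0.13, 0.25, 0.5, 1.0], 10.0)` grows the pool only as load rises, and comparing `processor_intervals(schedule)` with `max(schedule) * len(schedule)` contrasts adaptive allocation with a pool held at peak size throughout.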

Hot Off the Press..

• Recent experiments with Microsoft Azure Cloud
– running Chemical analyses
– Silverlight UI

Thanks to:

- Paul Appleby & Team at the Microsoft Technology Centre, Reading

- & MS e-Science Group

Microsoft Azure Cloud for e-Science Demo

Why are Commercial Clouds Important: Before

Research

1. Have good idea

2. Write proposal

3. Wait 6 months

4. If successful, wait 3 months

5. Install Computers

6. Start Work

Science Start-ups

1. Have good idea

2. Write Business Plan

3. Ask VCs to fund

4. If successful..

5. Install Computers

6. Start Work

Why Use Commercial Clouds:

1. Have good idea

2. Grab nodes from Cloud provider

3. Start Work

4. Pay for what you used

• also scalability, cost, sustainability

Commercial Clouds to the Rescue?

• Focus currently on infrastructure as a service

• But, this is only part of the stack

• Can we have pay-as-you-go Science Cloud Platforms?

A Sustainable Science Cloud

[Figure: layered stack. Science App 1 .... Science App n sit on Science Platform as a Service, which sits on Cloud Infrastructure: Storage & Compute. Labels: e-Science Central; Commercial Clouds; two question marks.]

Problem: delivering the e-science platform

www.inkspotscience.com

Summary: e-Science Central & CARMEN

[Figure: e-Science Central / CARMEN combines three areas]

• Software as a Service
– Web based
– Works anywhere
• Cloud Computing
– Dynamic Resource Allocation
– Pay-as-you-Go*
• Social Networking
– Controlled Sharing
– Collaboration
– Communities

Summary

• e-Science Central
– Store-Analyse-Automate-Share e-science platform
– Adding content from a range of domains
• CARMEN is piloting this approach for neuroinformatics
• Cloud computing can revolutionise e-science
– reduce time from idea to realisation
