© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
@boofla #scico
2016-06-30
HPC Clusters as code in the [almost]* Infinite cloud
Brendan Bouffler, Global Scientific Computing
Science is one of the greatest areas of computation, with a huge, really disruptive impact. It's what we at Amazon are about.
Map of scientific collaboration between researchers - Olivier H. Beauchesne - http://bit.ly/e9ekP2
Research means Collaboration
Existing:
1. Oregon
2. California
3. Virginia
4. Dublin
5. Frankfurt
6. Singapore
7. Sydney
8. Seoul
9. Tokyo
10. Sao Paulo
11. Beijing
12. US GovCloud
Coming soon:
1. Ohio
2. India
3. UK
4. Canada
5. China+1
AWS Regions & Availability Zones: regions are sovereign; your data never leaves.
More time spent computing the data than moving the data.
Peak: 58K cores
Valley: 12K cores
In the first year:
Over 400,000 scenes available
Over 1 billion hits globally
Used for new product development by:
Colin Reilly
Senior Director GIS
NYC Department of IT & Telecom
“Within 2 hours of the Ecuador test project going live with a first set of 1,300 images, each photo had been checked at least 20 times. ‘It was one of the fastest responses I’ve seen,’ says Brooke Simmons, an astronomer at the University of California, San Diego, who leads the image processing. Steven Reece, who heads the Oxford team’s machine-learning effort, says that results, a ‘heat map’ of damage with possible road blockages, were ready in another two hours.”
ESA, Planet Labs, Copernicus data from ESA’s Sentinels & Zooniverse
The first live public test for an effort dubbed the Planetary Response Network (PRN)
Cray Supercomputer
A top500 supercomputer … 2013-style
750+ popular scientific applications, available immediately from the AWS Marketplace.
Introducing Alces Flight: self-scaling HPC clusters, instantly ready to compute, billed by the hour, and using the AWS Spot market by default to achieve supercomputing for ~1c per core per hour.
http://boofla.io/u/alcesFlight
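A back-of-the-envelope sketch of what that pricing implies, assuming the ~1c per core per hour Spot figure quoted above (the rate and cluster size here are illustrative, not a quote from Alces or AWS):

```python
# Rough cost estimate for a Spot-backed cluster, assuming the
# ~1 cent per core per hour figure from the slide (illustrative).
def cluster_cost(cores, hours, usd_per_core_hour=0.01):
    """Total cost of running `cores` cores for `hours` hours."""
    return cores * hours * usd_per_core_hour

# e.g. a 1,024-core cluster running for 8 hours:
print(f"${cluster_cost(1024, 8):,.2f}")  # → $81.92
```

At that rate, a day on a thousand-core cluster costs less than a conference dinner, which is the point of the slide.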
[Diagram: a head node instance plus an Auto Scaling group of compute node instances, connected by a 10G network inside a Virtual Private Cloud and sharing a /shared filesystem]
Head Instance
- 2 or more cores (as needed)
- CentOS 7.x
- OpenMPI, gcc, etc.
- Choice of scheduler: SGE (now); PBS Pro & SLURM (coming soon)
Compute Instances
- 2 or more cores (as needed)
- CentOS 7.x, Amazon Linux
- Auto Scaling group driven by scheduler queue length: can start with 0 (zero) nodes and only scale when there are jobs.
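The queue-driven scaling rule can be sketched in a few lines. This is illustrative only; the function name and parameters are hypothetical, not the Alces Flight or AWS Auto Scaling API:

```python
# Sketch of scheduler-driven scaling: size the Auto Scaling group
# from the queue, starting at zero when the cluster is idle.
# (Hypothetical helper, not the actual Alces Flight integration.)
def desired_nodes(queued_jobs, slots_per_node, max_nodes):
    """Nodes needed to drain the queue, capped by the group maximum."""
    if queued_jobs == 0:
        return 0                                 # idle: scale to zero
    needed = -(-queued_jobs // slots_per_node)   # ceiling division
    return min(needed, max_nodes)

print(desired_nodes(0, 4, 8))    # → 0 (no jobs, no nodes)
print(desired_nodes(10, 4, 8))   # → 3 (10 jobs across 4-slot nodes)
print(desired_nodes(100, 4, 8))  # → 8 (capped at the group maximum)
```

Scaling to zero is what makes the "only pay when there are jobs" model work.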
Wall clock time: ~1 hour vs. ~1 week. Cost: the same.
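The "cost the same" claim is just core-hour arithmetic: for an embarrassingly parallel job the total core-hours are fixed, so the bill doesn't depend on how wide you run. The rate and node sizes below are illustrative:

```python
# Same total core-hours, very different wall clock time,
# identical bill (rate is an assumed ~1c per core-hour).
RATE = 0.01                  # USD per core-hour (illustrative)
narrow = 16 * 168 * RATE     # 16 cores for a week (168 hours)
wide   = 2688 * 1 * RATE     # 2,688 cores for a single hour

assert narrow == wide
print(f"${narrow:.2f} either way")  # → $26.88 either way
```

In the cloud, the only thing the week-long run buys you is the wait.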
AWS Building blocks
Technical & Business Support: Account Management, Support, Professional Services, Solutions Architects, Training & Certification, Security & Pricing Reports, Partner Ecosystem
AWS Marketplace: Backup, Big Data & HPC, Business Apps, Databases, Development, Industry Solutions, Security
Management Tools: Queuing, Notifications, Search, Orchestration
Enterprise Apps: Virtual Desktops, Storage Gateway, Sharing & Collaboration, Email & Calendaring, Directories
Hybrid Cloud Management: Backups, Deployment, Direct Connect, Identity Federation, Integrated Management
Security & Management: Virtual Private Networks, Identity & Access, Encryption Keys, Configuration, Monitoring, Dedicated
Infrastructure Services: Regions, Availability Zones, Compute, Storage (Objects, Blocks, Files), Databases (SQL, NoSQL, Caching), CDN, Networking
Platform Services: App (Mobile & Web Front-end, Functions, Identity, Data Store, Real-time), Development (Containers, Source Code, Build Tools, Deployment, DevOps), Mobile (Sync, Identity, Push Notifications, Mobile Analytics, Mobile Backend), Analytics (Data Warehousing, Hadoop, Streaming, Data Pipelines, Machine Learning)
C4: Intel Xeon E5-2666 v3, custom built for AWS.
Intel Haswell, 16 FLOPS/tick
2.9 GHz, turbo to 3.5 GHz
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/c4-instances.html
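The per-core peak follows directly from the figures above (16 FLOPS per cycle with AVX2 FMA, at the 2.9 GHz base clock); this is a derived number, not one quoted on the slide:

```python
# Peak double-precision throughput per core, derived from the
# slide's figures: 16 FLOPS/cycle at a 2.9 GHz base clock.
flops_per_cycle, base_ghz = 16, 2.9
peak_gflops = flops_per_cycle * base_ghz
print(f"{peak_gflops:.1f} GFLOPS per core at base clock")  # → 46.4 GFLOPS per core at base clock
```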
Feature                                                    Specification
Processor Number                                           E5-2666 v3
Intel® Smart Cache                                         25 MiB
Instruction Set                                            64-bit
Instruction Set Extensions                                 AVX 2.0
Lithography                                                22 nm
Processor Base Frequency                                   2.9 GHz
Max All Core Turbo Frequency                               3.2 GHz
Max Turbo Frequency                                        3.5 GHz (available on c4.2xlarge)
Intel® Turbo Boost Technology                              2.0
Intel® vPro Technology                                     Yes
Intel® Hyper-Threading Technology                          Yes
Intel® Virtualization Technology (VT-x)                    Yes
Intel® Virtualization Technology for Directed I/O (VT-d)   Yes
Intel® VT-x with Extended Page Tables (EPT)                Yes
Intel® 64                                                  Yes
All the traditional command-line tools will be familiar, but you can also create an Alces “session” and
immediately launch a desktop view of your cluster to run graphical apps.
Command Line (ssh) or Graphical Console: share the graphical console simultaneously with your collaborators and work together with the visual console in real time.
Make more discoveries, faster.
There are cluster filesystem options, too, for when you need extreme I/O scaling.
Disrupting research, wherever it’s happening.
39 years of computational chemistry in 9 hours
Novartis ran a project that involved virtually screening 10 million
compounds against a common cancer target in less than a week. They
calculated that it would take 50,000 cores and close to a $40 million
investment if they wanted to run the experiment internally.
Partnering with Cycle Computing and Amazon Web
Services (AWS), Novartis built a platform that ran
across 10,600 Spot Instances (~87,000 cores) and
allowed Novartis to conduct 39 years of
computational chemistry in 9 hours for a cost of
$4,232. Out of the 10 million compounds screened,
three were successfully identified.
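A quick sanity check on those numbers: the cores, hours, and dollar figures come from the slide above, while the per-core-hour rate is derived here, and it lands in the same ballpark as the ~1c Spot figure quoted earlier:

```python
# Derived from the Novartis figures above: cost per core-hour.
cores, hours, cost = 87_000, 9, 4_232
core_hours = cores * hours                        # 783,000 core-hours
print(f"${cost / core_hours:.4f} per core-hour")  # → $0.0054 per core-hour
```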
CHILES will produce the first HI deep field, to be carried out with the VLA in B
array and covering a redshift range from z=0 to z=0.45. The field is centered
at the COSMOS field. It will produce neutral hydrogen images of at least 300
galaxies spread over the entire redshift range.
The team at ICRAR in Australia has been able to implement the entire
processing pipeline in the cloud for around $2,000 per month by exploiting the
Spot market, which means the $1.75M they would otherwise have needed to spend on
an HPC cluster can be spent on way cooler things that impact their research
… like astronomers.
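Put another way, using the slide's own figures (the months-of-runtime number is derived, not quoted):

```python
# Derived from the ICRAR figures above: how long the cluster budget
# would fund the cloud pipeline instead.
cluster_budget, monthly_cloud = 1_750_000, 2_000
months = cluster_budget / monthly_cloud
print(f"{months:.0f} months (~{months / 12:.0f} years)")  # → 875 months (~73 years)
```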
1. AWS Account
3. A problem to solve