
ORNL is managed by UT-Battelle for the US Department of Energy

ORNL Science DMZ and Bridging CADES Workflows

Compute and Data Environment for Science (CADES)

Advanced Data Workflow Group

Ryan Prout


Goals of the Presentation

• CADES overview
• SDMZ architecture
• Workflows and projects
• Future


CADES Resources

• OpenStack cloud
  – 16 hypervisors (1,024 vCPUs, 2 TB memory, 20.5 TB storage)
  – Lustre host aggregate
  – Birthright resource for the lab
  – Expanding quickly

• Compute
• Storage
  – Lustre, NFS, Scality (Research Data Archive)

• Data transfer nodes (DTNs) and SDMZ
• Workflow design and support


CADES Deployment

[Diagram components: OIC, Cray condos, CADES Moderate, CADES Open, hybrid cloud, unique heterogeneous platforms, large-scale storage, object store, PHI enclave, high-speed interconnects]

• ~6,000 cores of integrated condos on InfiniBand
• ~5,000 cores of hybrid, expandable cloud
• SGI UV, Urika-GD/XA: GX
• 5 PB+ high-speed storage
• ~3,000 cores of XK7

• ~5,000 cores of integrated condos on InfiniBand
• ~10,000 OIC cores
• Attested PHI enclave
• Integrated with UCAMS and XCAMS

...and several other smaller projects, plus several ORNL projects on OIC


Science DMZ roadmap


ORNL SDMZ Advantages

• Create advanced workflows
  – CrossBOW Project
  – ARM <-> CADES <-> NCCS/OLCF

• High performance
• Scalability
• Internal and external collaborations
• Scientific workflow systems


Bridging workflows through SDMZ

• ARM <-> CADES <-> NCCS/OLCF
  – Atmospheric Radiation Measurement (ARM) Climate Research Facility
  – Phase I: globus-url-copy for data movement and automation
  – Phase II: Globus APIs and applications

• CrossBOW Project (Cross-platform Big Data Operational Workflows)
  – Globus APIs and a CKAN server with the CrossBOW API
  – Focus on deep learning workflows
  – Challenge: automating and scheduling analysis tasks
  – https://ramanathanlab.org/cosc526/


ARM Resource Overview


Phase I: ARM Workflow and Connection Types

[Diagram components: ARM-DTN, CADES-ARM-DTN, OpenDTN, HPSS-DTN, Stratus login, Cumulus login, DataCADES, compute/storage, OLCF compute/storage, long-term storage, Science DMZ. Connection types: SSHFTP, GSIFTP, storage mounts, compute jobs.]
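Phase I moved data between the DTNs with globus-url-copy over GSIFTP and SSHFTP. The Python sketch below shows one way such transfers might be scripted for automation. It assumes GridFTP endpoints reachable through gsiftp:// or sshftp:// URLs; the hostnames, paths, stream count, and helper names (`Endpoint`, `build_copy_command`, `run_transfer`) are hypothetical. By default it performs a dry run that only prints the command.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class Endpoint:
    """A GridFTP-reachable data transfer node (hypothetical example values)."""
    scheme: str  # "gsiftp" (GSI authentication) or "sshftp" (SSH authentication)
    host: str

    def url(self, path: str) -> str:
        # Directory paths should end in "/" when used with globus-url-copy.
        return f"{self.scheme}://{self.host}{path}"


def build_copy_command(src: Endpoint, src_path: str,
                       dst: Endpoint, dst_path: str,
                       streams: int = 4, recursive: bool = False) -> List[str]:
    """Assemble a globus-url-copy argument list without running it."""
    allowed = ("gsiftp", "sshftp")
    if src.scheme not in allowed or dst.scheme not in allowed:
        raise ValueError("expected gsiftp:// or sshftp:// endpoints")
    # -vb: report performance, -cd: create destination dirs, -p: parallel streams
    cmd = ["globus-url-copy", "-vb", "-cd", "-p", str(streams)]
    if recursive:
        cmd.append("-r")
    cmd += [src.url(src_path), dst.url(dst_path)]
    return cmd


def run_transfer(cmd: List[str], dry_run: bool = True) -> int:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    arm_dtn = Endpoint("sshftp", "arm-dtn.example.org")          # hypothetical host
    cades_dtn = Endpoint("gsiftp", "cades-arm-dtn.example.org")  # hypothetical host
    cmd = build_copy_command(arm_dtn, "/data/incoming/",
                             cades_dtn, "/lustre/arm/incoming/",
                             streams=8, recursive=True)
    run_transfer(cmd)
```

Building the argument list separately from running it keeps the transfer logic testable and lets the same command be handed to cron or a batch scheduler.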


Phase II: ARM Workflow

• Begin working toward use of the Globus APIs

• Shared endpoints

• Integrate workflow portals

• Use the Phase I period to better understand processes and needs
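Phase II moves from the command line to the Globus APIs. As a rough illustration, the sketch below assembles a transfer request body loosely modeled on the transfer document described in the Globus Transfer REST API documentation; field names should be checked against the current docs. The endpoint IDs, label, and helper names are hypothetical placeholders, and authentication and submission are deliberately left out.

```python
import json
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TransferItem:
    source_path: str
    destination_path: str
    recursive: bool = False


def build_transfer_document(source_endpoint: str,
                            destination_endpoint: str,
                            items: List[TransferItem],
                            label: str,
                            submission_id: Optional[str] = None,
                            sync_level: Optional[int] = None) -> dict:
    """Assemble a transfer request body; authentication and submission are not handled here."""
    if not items:
        raise ValueError("a transfer needs at least one item")
    doc = {
        "DATA_TYPE": "transfer",
        # Placeholder ID; consult the Globus docs for how submission IDs are obtained.
        "submission_id": submission_id or str(uuid.uuid4()),
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "label": label,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": item.source_path,
                "destination_path": item.destination_path,
                "recursive": item.recursive,
            }
            for item in items
        ],
    }
    if sync_level is not None:
        # Assumed semantics: 0-3 compare existence, size, mtime, or checksum.
        doc["sync_level"] = sync_level
    return doc


if __name__ == "__main__":
    doc = build_transfer_document(
        "ARM-SOURCE-ENDPOINT-ID",    # hypothetical endpoint ID
        "CADES-SHARED-ENDPOINT-ID",  # hypothetical shared endpoint ID
        [TransferItem("/data/incoming/", "/arm/incoming/", recursive=True)],
        label="arm-to-cades-sync",
        sync_level=3,
    )
    print(json.dumps(doc, indent=2))
```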


CrossBOW: Cross-platform Big Data Operational Workflows

Front-end portal: http://data.ornl.gov

• Possible connections to ORNL (i.e., registered OLCF) users

• XCAMS integration

• Scientific datasets

CKAN repository

• File pointers within the Lustre file system, stored as URIs

• Access based on OLCF / CADES policies

[Diagram components: Lustre PFS; workflow scheduler; METIS, DGX-1, Spark VMs, Rhea, Titan; model cache, model zoo, intermediate results; DL-specific and ML-specific components]

ORNL LDRD 8279
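The CKAN repository stores file pointers into the Lustre file system as URIs, with access governed by OLCF/CADES policies. A minimal sketch of resolving such pointers, assuming resources are published as file:// URLs in a CKAN `package_show` response and that the first directory under a hypothetical `LUSTRE_ROOT` names the owning project. That project-based rule is an illustrative stand-in, not the actual OLCF/CADES policy, and the helper names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Set
from urllib.parse import unquote, urlparse

# Hypothetical mount point; adjust to the actual Lustre namespace.
LUSTRE_ROOT = PurePosixPath("/lustre/atlas")


def lustre_paths(package_show_response: dict) -> List[PurePosixPath]:
    """Extract Lustre file pointers from a CKAN package_show-style response."""
    if not package_show_response.get("success"):
        raise RuntimeError("CKAN request did not succeed")
    paths = []
    for resource in package_show_response["result"].get("resources", []):
        uri = urlparse(resource.get("url", ""))
        if uri.scheme != "file":
            continue  # ordinary web links are not file pointers
        path = PurePosixPath(unquote(uri.path))
        if ".." in path.parts:
            continue  # refuse pointers that could escape the namespace
        if LUSTRE_ROOT in path.parents:
            paths.append(path)
    return paths


def can_access(user_projects: Set[str], path: PurePosixPath) -> bool:
    """Hypothetical policy: the first directory below LUSTRE_ROOT names the owning project."""
    parts = path.relative_to(LUSTRE_ROOT).parts
    return bool(parts) and parts[0] in user_projects


if __name__ == "__main__":
    response = {  # trimmed, hypothetical package_show result
        "success": True,
        "result": {"resources": [
            {"url": "file:///lustre/atlas/proj1/climate/run1.nc"},
            {"url": "https://data.ornl.gov/"},
        ]},
    }
    for p in lustre_paths(response):
        print(p, can_access({"proj1"}, p))
```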


Inside CrossBOW

[Diagram components: data manager, resource manager, model manager, visualization manager, parameter manager (Hyperopt, Spearmint), CKAN repository and CKAN web service, HPFS, Spark VMs, Rhea, resource listeners, model cache, model zoo, intermediate results. Flow labels: new data available, schedule runner, optional spawn Spark, schedule parameter sweep, fetch model, preload model, resource URIs. Annotation: overlap with SWIFT.]
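The diagram's flow labels (new data available, schedule runner, schedule parameter sweep) suggest an event-driven loop. The sketch below is one way to express it, assuming each CKAN resource is a dict with an "id" key. The exhaustive grid sweep is a deliberately simple stand-in for Hyperopt or Spearmint, which choose parameter points adaptively, and the target names and helper names (`ResourceListener`, `schedule_sweep`) are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Run:
    resource_id: str
    target: str                             # hypothetical compute target name
    params: Tuple[Tuple[str, object], ...]  # sorted (name, value) pairs


@dataclass
class ResourceListener:
    """Remembers CKAN resource IDs it has already reported."""
    seen: Set[str] = field(default_factory=set)

    def poll(self, resources: Iterable[dict]) -> List[dict]:
        """Return only resources not seen before (the 'new data available' event)."""
        new = []
        for resource in resources:
            if resource["id"] not in self.seen:
                self.seen.add(resource["id"])
                new.append(resource)
        return new


def schedule_sweep(resource_id: str,
                   grid: Dict[str, Sequence],
                   targets: Sequence[str]) -> List[Run]:
    """Expand a parameter grid and assign runs to targets round-robin."""
    if not targets:
        raise ValueError("need at least one compute target")
    names = sorted(grid)
    target_cycle = itertools.cycle(targets)
    return [
        Run(resource_id, next(target_cycle), tuple(zip(names, values)))
        for values in itertools.product(*(grid[name] for name in names))
    ]


if __name__ == "__main__":
    listener = ResourceListener()
    batch = [{"id": "arm-2017-001"}, {"id": "arm-2017-002"}]  # hypothetical IDs
    for resource in listener.poll(batch):
        for run in schedule_sweep(resource["id"],
                                  {"learning_rate": [0.1, 0.01], "batch_size": [32, 64]},
                                  ["spark-vm", "rhea"]):
            print(run)
```

Separating the listener from the scheduler mirrors the diagram's split between resource listeners and the schedule runner, so a smarter search strategy could replace `schedule_sweep` without touching the polling logic.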


Grid and Cloud Engine

Catania Science Gateway Framework

http://csgf.readthedocs.io/en/latest/grid-and-cloud-engine/docs/index.html

Similar to the “engine” piece of CrossBOW


VA Data Transfer – Genomics Research

• Private 10G circuit for data transfer
  – globus-url-copy between sites (the sites are not currently permitted to connect to the Globus service)
  – Private SDMZ

• Cloud infrastructure for researchers
  – Big data: Spark cluster
  – VMs
  – Science gateway


Future Work

• “Beef up” ORNL SDMZ infrastructure
• Boost ORNL SDMZ project usage
• Collaborate on SDMZ workflow systems
• Investigate Globus API building blocks and portal integration
• Create an abstracted cross-infrastructure environment to enable easy workflow automation
• Make data sharing easy between environments
• Private SDMZ: a medical SDMZ?


Credit to others

Susan Hicks (CADES)
Jason Anderson (OLCF)
Dustin Leverman (OLCF)
Anthony Clodfelter (ARM)
Rob Records (ARM)
Arvind Ramanathan (CrossBOW)