
R3 Kickoff Meeting

Ocean Observatories Initiative

Common Execution Infrastructure (CEI) Subsystem

OOI CI System Architecture Team:


CEI Developers

04/18/23

• CEI Developer: John Bresnahan, Argonne National Lab (part-time)
• CEI Developer: Patrick Armstrong, University of Chicago
• CEI Developer: Pierre Riteau, University of Chicago (part-time)
• CEI Senior Developer: Pierre Riteau, University of Chicago


Subsystem Purpose

• Allow OOI applications and system to
  – Provide Highly Available (HA) services
  – Scale to demand

• Enact OOI deployment policies in an elastic environment

• Provide a deployment foundation for OOI CI


Core System Structure: Service Layers


CEI Scope

• Elastic Computing Services
  – Implement elastic computing services to provide on-demand scaling and high availability.

• Execution Engine Catalog & Repository Services
  – Work with operations and ITV to develop and refine tools to upload and sync the different deployable type representations adapted to each site.

• Process Management Services
  – Provide the management services for policy-based process execution within specified deployable types intended to support the data distribution services; as such, the processes are sequential and primarily require a process-to-resource match.

• Process Catalog & Repository Services
  – Maintain process definitions as well as lists of active processes.

• Integration with the National Computing Infrastructure
  – Provide the capability to deploy OOI processing on the Amazon cloud services as well as academic clouds.


High Availability and Scaling

• High Availability
  – Towards an always-on service model
  – Failures in outsourced resources
  – Providing a pool of replenishable compute resources

• Autoscaling
  – Provide resources for peaks in demand
  – Ensure good utilization during “valleys” in demand
  – Flexible resource mix
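
As a rough illustration of the autoscaling goals above (capacity for peaks, good utilization in valleys), the sketch below sizes a worker pool from the current queue depth. The function name, the `jobs_per_worker` parameter, and the minimum/maximum bounds are assumptions made for illustration; they are not taken from the CEI design.

```python
import math


def target_workers(queued_jobs: int, jobs_per_worker: int,
                   min_workers: int = 1, max_workers: int = 10) -> int:
    """Return how many workers a hypothetical scaling policy would request."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    # Enough workers to cover the current demand, rounded up.
    needed = math.ceil(queued_jobs / jobs_per_worker)
    # Keep a floor for availability and a ceiling for cost control.
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    # A valley, moderate demand, and a peak that hits the ceiling.
    for demand in (0, 12, 250):
        print(demand, "->", target_workers(demand, jobs_per_worker=5))
```

The floor keeps at least one worker alive during a valley, while the ceiling caps spending during a peak; a real policy would likely also weigh the mix of resource types.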


Resources for HA and Scaling


EPU Management: monitor and regulate set properties based on system-specific and application-specific metrics

– Cloud resources are available on-demand, but any particular resource may fail at any time
– Applications/processes can absorb new resources
– Applications/processes can tolerate failures

[Diagram label: EPU]


Managing Resources


Elastic Processing Unit (EPU) Management

[Diagram. Labels: EE ioncore 1.3, EE ioncore 1.2, EE matlab 6.1; context-agent and ou-agent (three pairs); EPU Management (×3); Decision Engine; Provisioner; DTRS; CB; IaaS ("create instance"). Legend: AMQP, Other.]


Making the EPU HA

[Diagram. Labels: ou-agent (×3); EPU Worker (×9); Bootstrap EPU; Dedicated DE; Provisioner/DTRS; IaaS ("create instance"); cloudinit.d. Legend: AMQP, Other.]


Managing Processes


Creating a Process I

[Diagram. Labels: Process Definition Registry; Process Dispatcher; Process Instance Registry; EE type A instance; ee-agent; Decision Engine. Arrow labels: "request to activate process X", lookup, launch, enter. Legend: AMQP, Other.]


Creating a Process II

[Diagram. Labels: Process Definition Registry; Process Dispatcher; Process Instance Registry; EE type A instance; ee-agent; Decision Engine; EPU Management; Provisioner/DTRS; IaaS. Arrow labels: "request to activate process X", lookup, launch, enter, request instance, create instance. Legend: AMQP, Other.]
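
Read together, the labels on the two "Creating a Process" diagrams suggest a flow along these lines: a request to activate a process reaches the Process Dispatcher, the definition is looked up, the process is launched on an execution engine, and the instance is entered in the Process Instance Registry; when no engine has room, a new instance is requested first. The sketch below is one interpretation of that flow, not the CEI implementation. Every class, method, and field name is hypothetical, and the registries are plain in-memory dictionaries rather than AMQP services.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionEngine:
    """Stand-in for an "EE type A instance" with a fixed number of slots."""
    name: str
    slots: int
    running: list = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.running) < self.slots

    def launch(self, process_id: str) -> None:
        self.running.append(process_id)


class ProcessDispatcher:
    """Hypothetical dispatcher holding both registries in memory."""

    def __init__(self, definitions: dict, slots_per_engine: int = 2):
        self.definitions = definitions   # process definition registry
        self.instances = {}              # process instance registry
        self.engines = []                # engines currently available
        self.slots_per_engine = slots_per_engine

    def _request_instance(self) -> ExecutionEngine:
        # Placeholder for "request instance" / "create instance" on the IaaS.
        name = f"ee-{len(self.engines) + 1}"
        engine = ExecutionEngine(name, self.slots_per_engine)
        self.engines.append(engine)
        return engine

    def activate(self, process_id: str) -> str:
        if process_id not in self.definitions:            # lookup
            raise KeyError(f"no definition for {process_id!r}")
        engine = next((e for e in self.engines if e.has_capacity()), None)
        if engine is None:                                 # no free engine
            engine = self._request_instance()
        engine.launch(process_id)                          # launch
        self.instances[process_id] = engine.name           # enter
        return engine.name


if __name__ == "__main__":
    dispatcher = ProcessDispatcher({"X": "adapter", "Y": "adapter", "Z": "adapter"})
    for pid in ("X", "Y", "Z"):
        print(pid, "->", dispatcher.activate(pid))
```

With two slots per engine, the third activation finds no free capacity and triggers the "request instance" path, which is the difference between the first and second diagram.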


Inside an Execution Engine

[Diagram. Labels: EE type A instance; CC instance (×2); context-agent; ee-agent; ou-agent; supervisord (×3); Matlab script; process (adapter) 1; Process Dispatcher; EPU Management; Package Server; "datastream subscription result". Operation markers on the connections: C, M, CMR, CMK, CMKO. Legend: AMQP, Other.]

Key: C – create, M – monitor, R – restart, K – kill, O – I/O
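
The create, monitor, restart, and kill operations in the key can be illustrated with a small process wrapper. This is a minimal sketch that uses Python's standard `subprocess` module directly; it does not use supervisord or any CEI agent, and the `ManagedProcess` class and its method names are hypothetical.

```python
import subprocess
import sys


class ManagedProcess:
    """Hypothetical wrapper exposing create / monitor / restart / kill."""

    def __init__(self, argv):
        self.argv = list(argv)
        self.proc = None

    def create(self):
        # C: start the child process.
        self.proc = subprocess.Popen(self.argv)

    def monitor(self):
        # M: report whether the child is still alive.
        if self.proc is None:
            return "not created"
        return "running" if self.proc.poll() is None else "exited"

    def kill(self):
        # K: terminate the child and wait so it does not linger as a zombie.
        if self.proc is not None and self.proc.poll() is None:
            self.proc.kill()
            self.proc.wait()

    def restart(self):
        # R: kill the child if it is still running, then create it again.
        self.kill()
        self.create()


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    managed = ManagedProcess(sleeper)
    managed.create()
    print(managed.monitor())
    managed.restart()
    print(managed.monitor())
    managed.kill()
    print(managed.monitor())
```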


Adventures in Availability

• Time to repair (TTR)
  – Diagnosis
  – Time to scale (TTS)
    • PENDING (request)
    • STARTED (deployment)
    • RUNNING (contextualization)

A = MTBF / (MTBF + MTTR)

MTBF: mean time between failures
MTTR: mean time to repair

TTS: preliminary results for 2,000 VMs provisioned on AWS EC2
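
To make the availability formula on this slide concrete, the sketch below computes A for illustrative numbers. It treats time to repair as diagnosis time plus time to scale, with TTS split into the PENDING, STARTED, and RUNNING phases. The additive model is an assumption, and all durations are made-up placeholders, not measurements from the EC2 experiment.

```python
def availability(mtbf: float, mttr: float) -> float:
    """A = MTBF / (MTBF + MTTR); both arguments use the same time unit."""
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("need non-negative times with a positive sum")
    return mtbf / (mtbf + mttr)


def time_to_repair(diagnosis: float, pending: float,
                   started: float, running: float) -> float:
    """Assumed model: TTR = diagnosis + TTS, with TTS the sum of its phases."""
    time_to_scale = pending + started + running
    return diagnosis + time_to_scale


if __name__ == "__main__":
    # Hypothetical durations in seconds.
    ttr = time_to_repair(diagnosis=60, pending=5, started=90, running=45)
    print("TTR (s):", ttr)
    # One failure per day on average, repaired in `ttr` seconds.
    print("A:", availability(mtbf=24 * 3600, mttr=ttr))
```

Because MTTR sits in the denominator, shortening any TTS phase raises A without changing how often failures occur.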


R3 Scope

• Process management
  – Activation and validation
  – New execution site registration

• Integration with National Infrastructure
  – Framework for integration of academic cloud providers, TeraGrid and OSG
  – Integration with Microsoft cloud


R3 Activities

• Refine/change scope to achieve a complete and maintainable system

• Decide on specific solutions for R3 scope


Questions?
