DICE Team presentation, February 2014

Distributed Computing Environments Team

Marian Bubak

[email protected]

Department of Computer Science and ACC Cyfronet AGH, AGH University of Science and Technology

Krakow, Poland

dice.cyfronet.pl


DICE Team

Academic Computer Centre CYFRONET AGH (1973)

120 employees

http://www.cyfronet.pl/en/

Department of Computer Science AGH (1980)

800 students, 70 employees

http://www.ki.agh.edu.pl/uk/index.htm

Faculty of Computer Science, Electronics and Telecommunication (2012)

2000 students, 200 employees

http://www.iet.agh.edu.pl/

AGH University of Science and Technology (1919)

16 faculties, 36000 students, 4000 employees

http://www.agh.edu.pl/en

Other 15 faculties

Distributed Computing Environments (DICE) Team http://dice.cyfronet.pl

• Investigation of methods for building complex scientific collaborative applications
• Elaboration of environments and tools for e-Science
• Integration of large-scale distributed computing infrastructures
• Knowledge-based approach to services, components, and their semantic composition




• Investigating applicability of the cloud computing model for complex scientific applications
• Optimization of resource allocation for applications on clouds
• Resource management for services on heterogeneous resources
• Urgent computing scenarios on distributed infrastructures
• Billing and accounting models
• Procedural and technical aspects of ensuring efficient yet secure data storage, transfer and processing
• Methods for component dependency management, composition and deployment
• Information representation model for a cloud federating platform, its components and operating procedures

Current research objectives

• Optimization of service deployment on clouds
  – Constraint satisfaction and optimization of multiple criteria (cost, performance)
  – Static deployment planning and dynamic auto-scaling
• Billing and accounting model
  – Adapted for the federated cloud infrastructure
  – Handle multiple billing models
• Supporting system-level (e)Science
  – Tools for effective scientific research and collaboration
  – Advanced scientific analyses using HPC/HTC resources
• Cloud security
  – Security of data transfer
  – Reliable storage and removal of the data
• Cross-cloud service deployment based on container model

Topics for collaboration

[Figure: grid dynamics. Annotations: asynchronous and frequent failures and hardware/software upgrades; long and unpredictable job waiting times; time scales from seconds to 3 hours; 1 job vs. 100 jobs; ~95%; <10%]

J. T. Moscicki: Understanding and mastering dynamics in Computing Grids, UvA PhD thesis, promoter: M. Bubak, co-promoter: P. Sloot; 12.04.2011

Spatial and temporal dynamics in grids

• Grids increase research capabilities for science
• Large-scale federation of computing and storage resources
  – 300 sites, 60 countries, 200 Virtual Organizations
  – 10^5 CPUs, 20 PB data storage, 10^5 jobs daily
• However, operational and runtime dynamics have a negative impact on reliability and efficiency

[Figure: completion time with late binding vs. completion time with early binding; values shown: 40 hours and 1.5 hours]

J. T. Moscicki, M. Lamanna, M. Bubak, P. M. A. Sloot: Processing moldable tasks on the Grid: late job binding with lightweight user-level overlay, FGCS 27(6) pp 725-736, 2011

User-level overlay with late binding scheduling

• Improved job execution characteristics
• HTC-HPC interoperability
• Heuristic resource selection
• Application-aware task scheduling
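The benefit of late binding over early binding can be illustrated with a toy simulation. This is a minimal sketch, not the user-level overlay from the cited paper: it assumes identical tasks, workers with fixed but different speeds, and no failures. `early_binding_makespan` pre-assigns tasks round-robin, while `late_binding_makespan` lets each worker pull the next task from a shared queue as soon as it becomes idle.

```python
import heapq


def early_binding_makespan(task_times, worker_speeds):
    """Assign tasks round-robin up front; each worker then runs its fixed share."""
    loads = [0.0] * len(worker_speeds)
    for i, duration in enumerate(task_times):
        w = i % len(worker_speeds)
        loads[w] += duration / worker_speeds[w]
    return max(loads)


def late_binding_makespan(task_times, worker_speeds):
    """Each idle worker pulls the next task from a shared queue."""
    # Heap of (time at which the worker becomes idle, worker index).
    idle_at = [(0.0, w) for w in range(len(worker_speeds))]
    heapq.heapify(idle_at)
    makespan = 0.0
    for duration in task_times:
        now, w = heapq.heappop(idle_at)  # earliest idle worker takes the task
        done = now + duration / worker_speeds[w]
        makespan = max(makespan, done)
        heapq.heappush(idle_at, (done, w))
    return makespan


tasks = [1.0] * 100       # hypothetical: 100 identical unit-length tasks
speeds = [1.0, 1.0, 0.1]  # hypothetical: one worker is ten times slower
print(early_binding_makespan(tasks, speeds), late_binding_makespan(tasks, speeds))
```

With pre-assignment the slow worker is stuck with a third of the tasks; with pulling, the two fast workers absorb most of the load and the slow one only delays the end by its last task.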

Criteria (weight): (a) EEA zoning (20), (b) jClouds API support (20), (c) BLOB storage support (10), (d) per-hour instance billing (5), (e) API access (5), (f) published price (5), (g) VM image import/export (3), (h) relational DB support (2)

Rank  IaaS provider        a   b   c   d   e   f   g   h  Score
 1    Amazon AWS           1   1   1   1   1   1   0   1     27
 2    Rackspace            1   1   1   1   1   1   0   1     27
 3    SoftLayer            1   1   1   1   1   1   0   0     25
 4    CloudSigma           1   1   0   1   1   1   1   0     18
 5    ElasticHosts         1   1   0   1   1   1   1   0     18
 6    Serverlove           1   1   0   1   1   1   1   0     18
 7    GoGrid               1   1   0   1   1   1   0   0     15
 8    Terremark ecloud     1   1   0   1   1   0   1   0     13
 9    RimuHosting          1   1   0   0   1   1   0   1     12
10    Stratogen            1   1   0   0   1   0   1   0      8
11    Bluelock             1   1   0   0   1   0   0   0      5
12    Fujitsu GCP          1   1   0   0   1   0   0   0      5

• Evaluation of open source cloud stacks (Eucalyptus, OpenNebula, OpenStack)
  – Performance of VM deployment times
  – Virtualization overhead
• Survey of European public cloud providers
• Performance evaluation of top cloud providers (EC2, Rackspace, SoftLayer)
• A grant from Amazon has been obtained

M. Bubak, M. Kasztelnik, M. Malawski, J. Meizner, P. Nowakowski and S. Varma: Evaluation of Cloud Providers for VPH Applications, poster at CCGrid2013 - 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Delft, the Netherlands, May 13-16, 2013

Cloud performance evaluation
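The Score column of the provider table can be reproduced with a weighted sum. One observation from the numbers: the listed scores match exactly when EEA zoning and jClouds API support (which every listed provider satisfies) act as pass/fail filters and only criteria (c) to (h) are summed with their weights. That reading is our assumption, not something stated on the slide; the criterion identifiers below are also our own.

```python
# Criteria (a)-(h) in table order, with the weights from the table legend.
CRITERIA = [
    ("eea_zoning", 20),
    ("jclouds_api_support", 20),
    ("blob_storage", 10),
    ("per_hour_billing", 5),
    ("api_access", 5),
    ("published_price", 5),
    ("vm_image_import_export", 3),
    ("relational_db", 2),
]
# Assumption: (a) and (b) are pass/fail filters and are not added to Score.
FILTERS = {"eea_zoning", "jclouds_api_support"}

PROVIDERS = {
    "Amazon AWS":       (1, 1, 1, 1, 1, 1, 0, 1),
    "Rackspace":        (1, 1, 1, 1, 1, 1, 0, 1),
    "SoftLayer":        (1, 1, 1, 1, 1, 1, 0, 0),
    "CloudSigma":       (1, 1, 0, 1, 1, 1, 1, 0),
    "ElasticHosts":     (1, 1, 0, 1, 1, 1, 1, 0),
    "Serverlove":       (1, 1, 0, 1, 1, 1, 1, 0),
    "GoGrid":           (1, 1, 0, 1, 1, 1, 0, 0),
    "Terremark ecloud": (1, 1, 0, 1, 1, 0, 1, 0),
    "RimuHosting":      (1, 1, 0, 0, 1, 1, 0, 1),
    "Stratogen":        (1, 1, 0, 0, 1, 0, 1, 0),
    "Bluelock":         (1, 1, 0, 0, 1, 0, 0, 0),
    "Fujitsu GCP":      (1, 1, 0, 0, 1, 0, 0, 0),
}


def score(flags):
    """Weighted sum of the non-filter criteria, or None if a filter fails."""
    values = dict(zip((name for name, _ in CRITERIA), flags))
    if not all(values[name] for name in FILTERS):
        return None
    return sum(w * values[name] for name, w in CRITERIA if name not in FILTERS)


# sorted() is stable, so ties keep their table order.
ranking = sorted(PROVIDERS, key=lambda p: score(PROVIDERS[p]), reverse=True)
for provider in ranking:
    print(f"{provider:<18} {score(PROVIDERS[provider])}")
```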

• Infrastructure model
  – Multiple compute and storage clouds
  – Heterogeneous instance types
• Application model
  – Bag of tasks
  – Layered workflows
• Modeling with AMPL (A Modeling Language for Mathematical Programming)
• Cost optimization under deadline constraints
• Mixed integer programming
• Bonmin, CPLEX solvers

M. Malawski, K. Figiela, J. Nabrzyski: Cost minimization for computational applications on hybrid cloud infrastructures, Future Generation Computer Systems, Volume 29, Issue 7, September 2013, Pages 1786-1794, ISSN 0167-739X, http://dx.doi.org/10.1016/j.future.2013.01.004

Cost optimization of applications on clouds
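The cited work formulates cost minimization under a deadline as a mixed integer program in AMPL, solved with Bonmin or CPLEX. As a much smaller stand-in (exhaustive search, not a MIP solver), the sketch below enumerates instance counts for a bag of identical tasks and keeps the cheapest combination that finishes before the deadline. The instance types, prices and throughputs are hypothetical, every chosen instance is assumed to be billed for the whole deadline, and storage and data transfer costs are ignored.

```python
import itertools


def min_cost_plan(n_tasks, deadline_h, instance_types, max_per_type=10):
    """Exhaustively search instance counts for a bag of tasks.

    instance_types maps a name to (price per hour, tasks processed per hour).
    Returns (cost, {name: count}) for the cheapest plan meeting the deadline,
    or None if no plan with at most max_per_type instances per type is feasible.
    """
    names = list(instance_types)
    best = None
    for counts in itertools.product(range(max_per_type + 1), repeat=len(names)):
        throughput = sum(c * instance_types[n][1] for c, n in zip(counts, names))
        if throughput * deadline_h < n_tasks:
            continue  # this plan would miss the deadline
        # Simplification: every chosen instance is billed for the whole deadline.
        cost = sum(c * instance_types[n][0] for c, n in zip(counts, names)) * deadline_h
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, counts)))
    return best


# Hypothetical instance types: (price per hour, tasks per hour)
catalogue = {"small": (0.10, 2), "large": (0.40, 10)}
print(min_cost_plan(n_tasks=100, deadline_h=5, instance_types=catalogue))
```

The search space grows exponentially with the number of instance types, which is one reason to hand a formulation like this to an integer programming solver instead.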

[Figure: Atmosphere architecture. Elements shown: VPH-Share Master Interface; admin, developer and scientist users; development mode; VPH-Share Core Services Host; OpenStack/Nova computational cloud site with a head node, worker nodes and an image store (Glance); Cloud Facade (secure RESTful API); Other CS; Amazon EC2; Atmosphere Management Service (AMS); cloud stack plugins (Fog); Atmosphere Internal Registry (AIR); Cloud Manager; Generic Invoker; workflow management; external application; Cloud Facade client]

Customized applications may interface with Atmosphere directly via its RESTful API, called the Cloud Facade.

The Atmosphere Cloud Platform is a one-stop management service for hybrid cloud resources, ensuring optimal deployment of application services on the underlying hardware.

P. Nowakowski, T. Bartynski, T. Gubala, D. Harezlak, M. Kasztelnik, M. Malawski, J. Meizner, M. Bubak: Cloud Platform for Medical Applications, eScience 2012 (2012)

Resource allocation management

DRI is a tool which keeps track of binary data stored in a cloud infrastructure, monitors data availability and facilitates optimal deployment of application services in a hybrid cloud (bringing computations to data or the other way around).

[Figure: DRI architecture. Elements shown: binary data registry; LOBCDER; distributed cloud storage: Amazon S3, OpenStack Swift, Cumulus; store and marshal data; end-user features (browsing, querying, direct access to data, checksumming); register files, get metadata, migrate LOBs, get usage stats (etc.); VPH Master Interface; data management portlet (with DRI management extensions); DRI Service with validation policy, configurable validation runtime (registry-driven), runtime layer and extensible resource client layer; metadata extensions for DRI]

DRI Service: a standalone application service, capable of autonomous operation. It periodically verifies access to any datasets submitted for validation and is capable of issuing alerts to dataset owners and system administrators in case of irregularities.

Data reliability and integrity
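The periodic validation performed by the DRI Service can be sketched as a checksum check over registered datasets. This is an illustration under assumptions: the registry is a plain dict from dataset id to an expected SHA-256 digest, `fetch` is a hypothetical callback standing in for the resource client layer, and alerts are returned as strings rather than sent to dataset owners and administrators.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def validate_datasets(registry, fetch):
    """Check every registered dataset once and return a list of alert messages.

    registry: dataset id -> expected SHA-256 hex digest (assumed format).
    fetch: hypothetical callback returning the dataset bytes or raising OSError.
    """
    alerts = []
    for dataset_id, expected in registry.items():
        try:
            data = fetch(dataset_id)
        except OSError as exc:  # storage unreachable, object deleted, ...
            alerts.append(f"{dataset_id}: unavailable ({exc})")
            continue
        if sha256_of(data) != expected:
            alerts.append(f"{dataset_id}: checksum mismatch")
    return alerts


# Hypothetical in-memory stand-in for cloud storage.
_store = {"scan-001": b"voxel data", "scan-002": b"corrupted!"}


def fetch_from_store(dataset_id):
    if dataset_id not in _store:
        raise FileNotFoundError(dataset_id)
    return _store[dataset_id]


registry = {
    "scan-001": sha256_of(b"voxel data"),
    "scan-002": sha256_of(b"original data"),
    "scan-003": sha256_of(b"missing"),
}
for alert in validate_datasets(registry, fetch_from_store):
    print(alert)
```

A long-running service would call `validate_datasets` on a schedule; the scheduling and alert delivery are left out here.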

Data security in clouds

Jan Meizner, Marian Bubak, Maciej Malawski, and Piotr Nowakowski: Secure storage and processing of confidential data on public clouds. In: Proceedings of the International Conference On Parallel Processing and Applied Mathematics (PPAM) 2013

• To ensure security of data in transit:
  – Modern applications use secure transport protocols (e.g. TLS)
  – For legacy unencrypted protocols, if absolutely needed, or as an additional security measure:
    – Site-to-site VPN, e.g. between cloud sites, outside of the instance
    – Remote access, for individual users accessing e.g. from their laptops
• Data should be securely stored and reliably deleted when no longer needed
• Clouds are not secure enough: data optimisations prevent ensuring that data were deleted
• A solution:
  – End-to-end encryption (the decryption key stays in a protected/private zone)
  – Data dispersal (portions of data dispersed between nodes so that it is non-trivial or impossible to recover the whole message)
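Data dispersal can be illustrated with the simplest possible scheme, n-of-n XOR splitting: every share is as long as the message, and any n-1 shares are uniformly random, so nothing can be learned without all of them. This sketches the idea only and is not the scheme from the cited paper; practical dispersal schemes (e.g. threshold secret sharing or information dispersal) tolerate missing shares, and dispersal would complement end-to-end encryption rather than replace it.

```python
import secrets
from functools import reduce


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split(message: bytes, n: int) -> list:
    """Split message into n shares; all n are needed to rebuild it."""
    if n < 2:
        raise ValueError("need at least two shares")
    random_shares = [secrets.token_bytes(len(message)) for _ in range(n - 1)]
    # Last share = message XOR all random shares.
    last = reduce(_xor, random_shares, message)
    return random_shares + [last]


def combine(shares) -> bytes:
    """XOR of all shares restores the original message."""
    return reduce(_xor, shares)


shares = split(b"patient record #1234", 3)  # e.g. one share per storage node
assert combine(shares) == b"patient record #1234"
```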

• GWorkflowDL language (with A. Hoheisel)
• Dynamic, ad-hoc refinement of workflows based on semantic descriptions in ontologies
• Novelty
  – Abstract, functional blocks translated automatically into computation unit candidates (services)
  – Expansion of a single block into a subworkflow with proper concurrency and parallelism constructs (based on Petri nets)
  – Runtime refinement: unknown or failed branches are reconstructed with different computation unit candidates

T. Gubala, D. Harezlak, M. Bubak, M. Malawski: Semantic Composition of Scientific Workflows Based on the Petri Nets Formalism. In: "The 2nd IEEE International Conference on e-Science and Grid Computing", IEEE Computer Society Press, http://doi.ieeecomputersociety.org/10.1109/E-SCIENCE.2006.127, 2006

Semantic workflow composition
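Expansion of a block into a subworkflow with concurrency constructs can be pictured with a small place/transition net. The sketch below is a generic Petri net with token firing, not GWorkflowDL itself; the place and transition names are hypothetical and model a fork into two parallel tasks followed by a join that becomes enabled only when both branches have produced a token.

```python
from collections import Counter


class PetriNet:
    """Place/transition net: a transition fires by moving tokens."""

    def __init__(self, marking):
        self.marking = Counter(marking)  # place -> number of tokens
        self.transitions = {}            # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (list(inputs), list(outputs))

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking[p] >= k for p, k in Counter(inputs).items())

    def fire(self, name):
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        self.marking.subtract(inputs)  # consume one token per input arc
        self.marking.update(outputs)   # produce one token per output arc


# Hypothetical expansion of one abstract block into two parallel tasks.
net = PetriNet({"start": 1})
net.add_transition("fork", ["start"], ["a_ready", "b_ready"])
net.add_transition("task_a", ["a_ready"], ["a_done"])
net.add_transition("task_b", ["b_ready"], ["b_done"])
net.add_transition("join", ["a_done", "b_done"], ["end"])

for t in ["fork", "task_a", "task_b", "join"]:
    net.fire(t)
print(net.marking["end"])
```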

• Design of a laboratory for virologists, epidemiologists and clinicians investigating the HIV virus and the possibilities of treating HIV-positive patients

• Based on notion of in-silico experiments built and refined by cooperating teams of programmers, scientists and clinicians

• Novelty

– Employed the full concept-prototype-refinement-production cycle for virology tools

– A set of dedicated yet interoperable tools binds programmers and scientists together on a single task

– Support for system-level science with concept of result reuse between different experiments

T. Gubala, M. Bubak, P. M. A. Sloot: Semantic Integration of Collaborative Research Environments, chapter XXVI in “Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare”, Information Science Reference IGI Global 2009, ISBN: 978-1-60566-374-6, pages 514-530

Cooperative virtual laboratory for e-Science

T. Gubala, K. Prymula, P. Nowakowski, M. Bubak: Semantic Integration for Model-based Life Science Applications. In: SIMULTECH 2013 Proceedings of the 3rd International Conference on Simulation and Modeling Methodologies, Technologies and Applications, Reykjavik, Iceland 29 - 31 July, 2013, pp. 74-81

• Concept of describing scientific domains for in-silico experimentation and collaboration within laboratories

• Based on separating the domain model, which contains concepts of the subject of experimentation, from the integration model, which concerns the method of (virtual) experimentation (tools, processes, computations)

• Facets defined in the integration model are automatically mixed into concepts from the domain model: any piece of data may show any desired behavior

• Proposed, designed and deployed the method for 3 domains of science:
  – Computational chemistry inside the InSilicoLab chemistry portal
  – Sensor processing for early warning and crisis simulation in UrbanFlood EWS
  – Processing of results of massive bioinformatic computations for protein folding method comparison
  – Composition and execution of multiscale simulations
  – Setup and management of VPH applications

Semantic integration for science domains
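Mixing integration-model facets into domain-model concepts can be mimicked with dynamic class creation in Python. All class names here (`Molecule`, `Plottable`, `Archivable`) are hypothetical, and the `label` adapter connecting a facet to a concept's own data is an assumption of this sketch rather than part of the described method.

```python
class Molecule:
    """Hypothetical domain-model concept: knows only its own data."""

    def __init__(self, formula: str):
        self.formula = formula


class Plottable:
    """Hypothetical integration-model facet; relies on a label() method."""

    def plot_title(self) -> str:
        return f"Plot of {self.label()}"


class Archivable:
    """Hypothetical facet deriving a storage key from label()."""

    def archive_key(self) -> str:
        return f"{type(self).__name__.lower()}/{self.label()}"


def mix_in(concept, *facets, label):
    """Build a subclass of `concept` that also carries the given facets.

    `label` adapts the concept's data to what the facets expect (assumption).
    """
    return type(concept.__name__, (concept, *facets), {"label": label})


FacetedMolecule = mix_in(Molecule, Plottable, Archivable, label=lambda self: self.formula)
water = FacetedMolecule("H2O")
print(water.plot_title(), water.archive_key())
```

The domain class stays untouched; behaviour is attached from outside, which mirrors the separation of domain and integration models described above.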

GridSpace - platform for e-Science applications

• Experiment: an e-science application composed of code fragments (snippets), expressed in general-purpose scripting languages, domain-specific languages or purpose-specific notations. Each snippet is evaluated by a corresponding interpreter.

• GridSpace2 Experiment Workbench: a web application - an entry point to GridSpace2. It facilitates exploratory development, execution and management of e-science experiments.

• Embedded Experiment: a published experiment embedded in a web site.

• GridSpace2 Core: a Java library providing an API for development, storage, management and execution of experiments. Records all available interpreters and their installations on the underlying computational resources.

• Computational Resources: servers, clusters, grids, clouds and e-infrastructures where the experiments are computed.

E. Ciepiela, D. Harezlak, J. Kocot, T. Bartynski, M. Kasztelnik, P. Nowakowski, T. Gubała, M. Malawski, M. Bubak: Exploratory Programming in the Virtual Laboratory. In: Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 621-628, October 2010, the best paper award.
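The experiment model above, a sequence of snippets each evaluated by the interpreter registered for its language, can be sketched as a dispatch table. GridSpace2 Core is a Java library whose API is not reproduced here; the interpreter names, the `name = value` notation and the shared context dictionary through which snippets exchange values are all assumptions of this illustration. `exec` runs arbitrary code, so only trusted snippets should be evaluated this way.

```python
def python_interpreter(code: str, ctx: dict) -> None:
    """Run a Python snippet; names it defines are stored in ctx (trusted code only)."""
    exec(code, ctx)


def params_interpreter(code: str, ctx: dict) -> None:
    """Hypothetical purpose-specific notation: one `name = value` per line."""
    for line in code.splitlines():
        if line.strip():
            name, value = line.split("=", 1)
            ctx[name.strip()] = value.strip()


INTERPRETERS = {"python": python_interpreter, "params": params_interpreter}


def run_experiment(snippets):
    """Evaluate (language, code) snippets in order over a shared context."""
    ctx = {}
    for language, code in snippets:
        if language not in INTERPRETERS:
            raise ValueError(f"no interpreter registered for {language!r}")
        INTERPRETERS[language](code, ctx)
    return ctx


experiment = [
    ("params", "n = 5"),
    ("python", "result = sum(range(int(n)))"),
]
print(run_experiment(experiment)["result"])
```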

Goal: Extending the traditional scientific publishing model with computational access and interactivity mechanisms; enabling readers (including reviewers) to replicate and verify experimentation results and browse large-scale result spaces.

Challenges:
– Scientific: a common description schema for primary data (experimental data, algorithms, software, workflows, scripts) as part of publications; deployment mechanisms for on-demand reenactment of experiments in e-Science.
– Technological: an integrated architecture for storing, annotating, publishing, referencing and reusing primary data sources.
– Organizational: provisioning of executable paper services to a large community of users representing various branches of computational science; fostering further uptake through involvement of major players in the field of scientific publishing.

P. Nowakowski, E. Ciepiela, D. Harężlak, J. Kocot, M. Kasztelnik, T. Bartyński, J. Meizner, G. Dyk, M. Malawski: The Collage Authoring Environment. In: Proceedings of the International Conference on Computational Science, ICCS 2011 (2011), winner of the Elsevier/ICCS Executable Paper Grand Challenge

E. Ciepiela, D. Harężlak, M. Kasztelnik, J. Meizner, G. Dyk, P. Nowakowski, M. Bubak: The Collage Authoring Environment: From Proof-of-Concept Prototype to Pilot Service in Procedia Computer Science, vol. 18, 2013

Collage - executable e-Science publications


• Goal: Extend the traditional way of authoring and publishing scientific methods with computational access and interactivity mechanisms thus bringing reproducibility to scientific computational workflows and publications

• Scientific challenge: Conceive a model and methodology to embrace reproducibility in scientific workflows and publications

• Technological challenge: support these by modern Internet technologies and available computing infrastructures

• Solution proposed:
  – GridSpace2: web-oriented distributed computing platform
  – Collage: authoring environment for executable publications

GridSpace2 / Collage - Executable e-Science Publications

Results:
• GridSpace2/Collage won the Executable Paper Grand Challenge in 2011
• Collage was integrated with the Elsevier ScienceDirect portal so that papers can be linked and presented with the corresponding computational experiments

• Special Issue of Computers & Graphics journal featuring Collage-based executable papers was released in May 2013

• GridSpace2/Collage has been applied to multiple computational workflows in the scope of PL-Grid, PL-Grid Plus and Mapper projects

E. Ciepiela, P. Nowakowski, J. Kocot, D. Harężlak, T. Gubała, J. Meizner, M. Kasztelnik, T. Bartyński, M. Malawski, M. Bubak: Managing entire lifecycles of e-science applications in the GridSpace2 virtual laboratory–from motivation through idea to operable web-accessible environment built on top of PL-grid e-infrastructure. In: Building a National Distributed e-Infrastructure–PL-Grid, 2012

P. Nowakowski, E. Ciepiela, D. Harężlak, J. Kocot, M. Kasztelnik, T. Bartyński, J. Meizner, G. Dyk, M. Malawski: The Collage Authoring Environment. In: Procedia Computer Science, vol. 4, 2011

GridSpace2 / Collage - Executable e-Science Publications


Common Information Space (CIS)
• Facilitate creation, deployment and robust operation of Early Warning Systems in a virtualized cloud environment
• Early Warning System (EWS): any system working according to four steps: monitoring, analysis, judgment, action (e.g. environmental monitoring)

B. Balis, M. Kasztelnik, M. Bubak, T. Bartynski, T. Gubala, P. Nowakowski, J. Broekhuijsen: The UrbanFlood Common Information Space for Early Warning Systems. In: Elsevier Procedia Computer Science, vol 4, pp 96-105, ICCS 2011.

Common Information Space
• connects distributed components into an EWS and deploys it on the cloud
• optimizes resource usage taking into account the EWS importance level
• provides EWS monitoring and self-monitoring
• equipped with auto-healing
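The four EWS steps can be sketched as a chain of functions. The sensor readings, the moving-average indicator and the threshold are hypothetical choices made for illustration; in the Common Information Space these steps would be carried out by separate distributed components rather than local function calls.

```python
from statistics import mean


def monitor(readings):
    """Step 1: collect the latest sensor readings (here: a given list)."""
    return list(readings)


def analyse(samples, window=3):
    """Step 2: derive an indicator, here a moving average of recent samples."""
    return mean(samples[-window:])


def judge(indicator, threshold):
    """Step 3: decide whether the situation is dangerous."""
    return indicator >= threshold


def act(alarm):
    """Step 4: trigger an action."""
    return "raise alert" if alarm else "no action"


def ews_cycle(readings, threshold):
    return act(judge(analyse(monitor(readings)), threshold))


# Hypothetical water-level readings (metres) and alarm threshold.
print(ews_cycle([1.0, 2.0, 4.0, 5.0, 6.0], threshold=5.0))
```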

• Simple yet expressive model for complex scientific apps
• App = set of processes performing well-defined functions and exchanging signals

HyperFlow model JSON serialization:

{
  "name": "...",          name of the app
  "processes": [ ... ],   processes of the app
  "functions": [ ... ],   functions used by processes
  "signals": [ ... ],     exchanged signals info
  "ins": [ ... ],         inputs of the app
  "outs": [ ... ]         outputs of the app
}

• Supports a rich set of workflow patterns

• Suitable for various application classes

• Abstracts from other distributed app aspects (service model, data exchange model, communication protocols, etc.)

HyperFlow: model & execution engine
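To make the model concrete, the sketch below builds a two-process app using the JSON field names shown above and runs it with a toy enactment loop: a process fires once all of its input signals are available, and its function's results become its output signals. These firing rules and the per-process keys (`function`, `ins`, `outs`) are assumptions of this sketch; the actual HyperFlow engine supports richer process types and workflow patterns.

```python
def run_app(app, functions, inputs):
    """Toy enactment loop over a HyperFlow-style app description.

    Assumed semantics: each process fires once, as soon as all of its input
    signals carry a value; its function returns one value per output signal.
    """
    signals = dict(inputs)
    pending = list(app["processes"])
    fired = True
    while pending and fired:
        fired = False
        for proc in list(pending):
            if all(name in signals for name in proc["ins"]):
                args = [signals[name] for name in proc["ins"]]
                results = functions[proc["function"]](*args)
                signals.update(zip(proc["outs"], results))
                pending.remove(proc)
                fired = True
    if pending:
        stuck = [p["name"] for p in pending]
        raise RuntimeError(f"processes never received all inputs: {stuck}")
    return {name: signals[name] for name in app["outs"]}


# Hypothetical two-process app using the fields listed above.
app = {
    "name": "square-and-add",
    "processes": [
        {"name": "square", "function": "square", "ins": ["x"], "outs": ["x2"]},
        {"name": "add", "function": "add", "ins": ["x2", "y"], "outs": ["total"]},
    ],
    "functions": [{"name": "square"}, {"name": "add"}],
    "signals": [{"name": n} for n in ["x", "y", "x2", "total"]],
    "ins": ["x", "y"],
    "outs": ["total"],
}
functions = {"square": lambda x: (x * x,), "add": lambda a, b: (a + b,)}
print(run_app(app, functions, {"x": 3, "y": 4}))
```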

• HyperFlow model & engine for distributed apps

• App optimization & scheduling

• Autoscaling and dynamic app reconfiguration

• Multi-cloud resource provisioning

[Figure: platform for distributed applications. Elements shown: execution platform; provisioning platform; cloud with VMs; executor; input data; trigger app execution; monitoring; measurements; provisioner; start/stop/reconfigure VM; autoscaler; scaling rules; optimizer & scheduler; reconfigure app; HyperFlow Enactment Engine; enact; execute; app model; app state; composite app; initial deployment]

Platform for distributed applications
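A scaling rule of the kind consumed by the autoscaler can be as simple as a pair of utilisation thresholds. The sketch below is a hypothetical threshold rule, not the platform's actual policy: it adds or removes one VM at a time based on a measured average utilisation and clamps the result to configured bounds.

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Hypothetical threshold rule on average utilisation (0.0-1.0)."""
    scale_out_above: float
    scale_in_below: float
    min_vms: int
    max_vms: int


def desired_vms(current: int, utilisation: float, rule: ScalingRule) -> int:
    """Return the VM count the provisioner should move to."""
    if utilisation > rule.scale_out_above:
        target = current + 1  # overloaded: start one more VM
    elif utilisation < rule.scale_in_below:
        target = current - 1  # underused: stop one VM
    else:
        target = current      # within the band: keep as is
    return max(rule.min_vms, min(rule.max_vms, target))


rule = ScalingRule(scale_out_above=0.8, scale_in_below=0.3, min_vms=1, max_vms=5)
print(desired_vms(current=2, utilisation=0.9, rule=rule))
```

Keeping a gap between the two thresholds avoids oscillating between scale-out and scale-in on small fluctuations.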

Objectives
• Provide means for ad-hoc metadata model creation and deployment of corresponding storage facilities
• Create a research space for metadata model exchange and discovery with associated data repositories with access restrictions in place
• Support different types of storage sites and data transfer protocols
• Support the exploratory paradigm by making the models evolve together with data

Architecture
• Web Interface is used by users to create, extend and discover metadata models
• Model repositories are deployed in the PaaS cloud layer for scalable and reliable access from computing nodes through REST interfaces
• Data items from Storage Sites are linked from the model repositories

Collaborative metadata management

• MAPPER Memory (MaMe): a semantics-aware persistence store to record metadata about models and scales

• Multiscale Application Designer (MAD): a visual composition tool transforming a high-level description into an executable experiment

• GridSpace Experiment Workbench (GridSpace): execution and result management of experiments

[Figure: MAD, GridSpace and MaMe; Mapper A, Mapper B, Submodule A, Submodule B; choose/add/delete]

K. Rycerz, E. Ciepiela, G. Dyk, D. Groen, T. Gubala, D. Harezlak, M. Pawlik, J. Suter, S. Zasada, P. Coveney, M. Bubak: Support for Multiscale Simulations with Molecular Dynamics, Procedia Computer Science, Volume 18, 2013, pp. 1116-1125, ISSN 1877-0509

K. Rycerz, M. Bubak, E. Ciepiela, D. Harezlak, T. Gubala, J. Meizner, M. Pawlik, B.Wilk: Composing, Execution and Sharing of Multiscale Applications, submitted to Future Generation Computer Systems, after 1st review (2013)

K. Rycerz, M. Bubak, E. Ciepiela, M. Pawlik, O. Hoenen, D. Harezlak, B. Wilk, T. Gubala, J. Meizner, and D. Coster: Enabling Multiscale Fusion Simulations on Distributed Computing Resources, submitted to PLGrid PLUS book 2014

• A method and an environment for composing multiscale applications from single-scale models
• Validation of the method against real applications structured using the tools
• Extension of application composition techniques to multiscale simulations
• Support for multisite execution of multiscale simulations
• Proof-of-concept transformation of high-level formal descriptions into actual execution using e-infrastructures

Multiscale programming and execution tools

Research on Feature Modeling:
• modelling the component hierarchy of eScience application families
• modelling requirements
• methods of mapping Feature Models to Software Product Line architectures

Research on adapting Software Product Line principles in scientific software projects:
• automatic composition of distributed eScience applications based on Feature Model configuration
• architectural design of a Software Product Line engine framework

B. Wilk, M. Bubak, M. Kasztelnik: Software for eScience: from feature modeling to automatic setup of environments, Advances in Software Development, Scientific Papers of the Polish Information Processing Society Scientific Council, 2013, pp. 83-96

Building scientific software based on Feature Model
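Validating a Feature Model configuration before composing an application can be sketched with a few set operations. The feature tree, the alternative group and the cross-tree `requires` constraint below are hypothetical examples, and only a subset of feature-model semantics is checked (parent, mandatory, exactly-one-of, requires).

```python
# Hypothetical feature tree: feature -> parent (None for the root).
FEATURES = {
    "app": None,
    "storage": "app",
    "s3": "storage",
    "swift": "storage",
    "visualisation": "app",
}
MANDATORY = {"app", "storage"}
ALTERNATIVES = {"storage": {"s3", "swift"}}  # exactly one child if parent chosen
REQUIRES = {"visualisation": {"s3"}}         # hypothetical cross-tree constraint


def validate(selected):
    """Return a list of constraint violations (empty means a valid configuration)."""
    selected = set(selected)
    errors = []
    for feature in sorted(selected):
        if feature not in FEATURES:
            errors.append(f"unknown feature: {feature}")
        elif FEATURES[feature] is not None and FEATURES[feature] not in selected:
            errors.append(f"{feature} selected without its parent {FEATURES[feature]}")
    for feature in sorted(MANDATORY - selected):
        errors.append(f"mandatory feature missing: {feature}")
    for parent, group in ALTERNATIVES.items():
        if parent in selected and len(group & selected) != 1:
            errors.append(f"exactly one of {sorted(group)} must be chosen under {parent}")
    for feature, needed in REQUIRES.items():
        if feature in selected:
            for missing in sorted(needed - selected):
                errors.append(f"{feature} requires {missing}")
    return errors


print(validate({"app", "storage", "swift", "visualisation"}))
```

Only a configuration that returns an empty list would be passed on to automatic composition of the application.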

CrossGrid 2002-2005 Interactive compute- and data-intensive applications

K-Wf Grid 2004-2007 Knowledge-based composition of grid workflow applications

CoreGRID 2004-2008 Problem solving environments, programming models for grid applications

GREDIA 2006-2009 Grid platform for media and banking applications

ViroLab 2006-2009 Script based composition of applications, GridSpace virtual laboratory

PL-Grid, PL-Grid Plus 2009-2015 Advanced virtual laboratory, DataNet – metadata models (2 large Polish projects)

gSLM 2009-2012 Service level management for grid and clouds

UrbanFlood 2009-2012 Common Information Space for Early Warning Systems

MAPPER 2010-2013 Computational strategies, software and services for distributed multiscale simulations

VPH-Share 2011-2015 Federating cloud resources for VPH compute- and data-intensive applications

Collage 2011-2013 Executable Papers; 1st award of Elsevier Competition at ICCS2011 (Elsevier project)

ISMOP 2013-2016 Management of cloud resources, workflows, big data storage and access, analysis tools (MCBiR)

PaaSage 2013-2016 Optimization of workflow applications on cloud resources

DICE team in EU projects
