Project Acronym: Fed4FIRE
Project Title: Federation for FIRE
Instrument: Large scale integrating project (IP)
Call identifier: FP7-ICT-2011-8
Project number: 318389
Project website: www.fed4fire.eu

D2.1 – First federation architecture
Work package: WP2
Task: T2.1
Due date: 31/12/2012
Submission date: 31/12/2012
Deliverable lead: Brecht Vermeulen (iMinds)
Version: 1.3
Authors: Brecht Vermeulen (iMinds), Wim Vandenberghe (iMinds), Thomas Leonard (IT Innovation), Timur Friedman (UPMC), Florian Schreiner (Fraunhofer), Kostas Kavoussanakis (UEDIN), Max Ott (NICTA), Felicia Lobillo Vilela (ATOS)
Reviewers: Ally Hume (UEDIN), Georgios Androulidakis (NTUA)
Abstract

This first deliverable of task 2.1 in the WP2 architecture work package defines the architecture for cycle 1 of the Fed4FIRE project. It takes as input the requirements of WP3 (infrastructure community), WP4 (service community), WP8 (First Level Support), task 2.3 (sustainability) and the general FIRE federation requirements (EC call objectives, Fed4FIRE DoW). It defines an architecture that copes with as many of these requirements as possible and that will be implemented for cycle 1 of Fed4FIRE.

Keywords: Federation, Architecture, Experiment lifecycle

Nature of the deliverable: R – Report (X); P – Prototype; D – Demonstrator; O – Other
Dissemination level:
PU – Public (X)
PP – Restricted to other programme participants (including the Commission)
RE – Restricted to a group specified by the consortium (including the Commission)
CO – Confidential, only for members of the consortium (including the Commission)
Disclaimer

The information, documentation and figures available in this deliverable are written by the Fed4FIRE (Federation for FIRE) project consortium under EC co-financing contract FP7-ICT-318389 and do not necessarily reflect the views of the European Commission. The European Commission is not liable for any use that may be made of the information contained herein.
Executive Summary

This first deliverable of task 2.1 in the WP2 architecture work package defines the architecture for cycle 1 of the Fed4FIRE project. It takes as input the requirements of WP3 (infrastructure community), WP4 (service community), WP8 (First Level Support), task 2.3 (sustainability) and the general FIRE federation requirements (EC call objectives, Fed4FIRE DoW). It defines an architecture that copes with as many of these requirements as possible and that will be implemented for cycle 1 of Fed4FIRE. As such, it will also be deployed for the first round of Open Calls. The details of the first cycle implementation will be worked out in deliverables D5.1, D6.1 and D7.1 and in milestones M3.1 and M4.1.

Based on all these requirements, we have evaluated four types of architectures, ranging from a fully central management framework, over a simple website listing all testbeds, to a fully distributed federation of homogeneous or heterogeneous testbeds. Based on this evaluation, we propose the architecture for cycle 1 for the different steps in the experiment lifecycle (discovery, requirements, reservation, provisioning, monitoring and measurement, experiment control and storage).

For cycle 1, discovery, requirements and provisioning will be based on the SFA GENI AM API v3 standard, while extensions will be made for advanced reservation and extended policy-based authorization. Regarding the RSpecs, the aim for cycle 1 is to keep them as unified as possible (with GENI RSpec v3 as a guideline), while for cycles 2 and 3 a unified ontology-based RSpec definition is the target. In cycle 1, full interoperability of tools and testbeds is required, but tools may have to interpret multiple RSpec types to achieve this; the number of types should nevertheless be limited.

The architecture also defines multiple identity providers with a chain of trust, and a central portal accompanied by an identity provider and directories (machine- and human-readable) for tools, testbeds and certificates. These services function as 'broker services'. For First Level Support, the architecture defines facility monitoring, which should be identical for all testbeds and should give first level support a high-level overview of testbed and resource availability. Infrastructure monitoring, experiment measurement and experiment control are not yet standardized in cycle 1, as no standards exist yet. This is not a blocking issue, as experimenters can deploy their own tools for these functions without depending on the testbed providers; infrastructure monitoring is partly the exception to this, but it will be further tackled in WP6.

We have evaluated all high-priority requirements to see which are already tackled. The requirements not yet tackled are those regarding interconnectivity between testbeds and storage of all kinds of information; both will be tackled in cycles 2 and 3 of Fed4FIRE. Some other requirements are currently only partly resolved, but will be fully resolved when the ontology-based RSpecs/resource descriptions are deployed in cycles 2 and 3. As a general conclusion, the first cycle architecture and the way forward already cope with a large number of the requirements set by the other work packages.
Acronyms and Abbreviations

AM – Aggregate Manager
API – Application Programming Interface
DoW – Description of Work (official document describing the Fed4FIRE project)
EC – Experiment Controller (in this document, EC also refers to the European Commission)
FIRE – Future Internet Research and Experimentation
FLS – First Level Support
GENI – Global Environment for Network Innovations
HTTP – HyperText Transfer Protocol
LDAP – Lightweight Directory Access Protocol
OCCI – Open Cloud Computing Interface
OEDL – OMF Experiment Description Language
OMF – cOntrol and Management Framework
OML – Measurement Library
PLE – PlanetLab Europe
RC – Resource Controller
REST – Representational State Transfer
RSpec – Resource Specification
SFA – Slice-based Federation Architecture
SOAP – Simple Object Access Protocol
SQL – Structured Query Language
SSH – Secure Shell
SSL – Secure Socket Layer
Testbed – Combination of testbed resources (e.g. servers) and testbed management software
URN – Uniform Resource Name
VCT – Virtual Customer Testbed
XML-RPC – eXtensible Markup Language Remote Procedure Call
XMPP – Extensible Messaging and Presence Protocol
Table of Contents

Executive Summary
Acronyms and Abbreviations
Table of Contents
1 Introduction
  1.1 This deliverable
  1.2 Experimental lifecycle
2 Overview of inputs
  2.1 Generic requirements of a FIRE federation
  2.2 Requirements from the infrastructures community (D3.1)
  2.3 Requirements from the services community (D4.1)
  2.4 Requirements from First Level Support (D8.1)
  2.5 Requirements from a sustainability point of view (Task 2.3)
  2.6 Insights derived from the Fed4FIRE site visits
    2.6.1 Input from visits for discovery, reservation, provisioning
    2.6.2 Input from visits for authentication/authorization
    2.6.3 Input from visits for monitoring/measurements
    2.6.4 Input from visits for experiment control
3 Evaluation of possible architectural approaches
  3.1 Evaluation criteria
  3.2 Overview of possible architectural approaches
    3.2.1 Introduction and definitions
    3.2.2 Federated testbed resources under a central management framework
    3.2.3 Central overview on testbeds
    3.2.4 Homogenous federation
    3.2.5 Heterogeneous federation
  3.3 Evaluation of the different approaches
4 Architecture for Fed4FIRE development cycle 1
  4.1 Introduction
  4.2 Resource discovery, resource requirement, resource reservation and resource provisioning
    4.2.1 Central location(s)
    4.2.2 Testbeds
    4.2.3 SFA (Slice-based Federation Architecture)
    4.2.4 Authentication and authorization
  4.3 Monitoring and measurement
  4.4 Experiment control
5 Requirements which are fulfilled with the architecture in cycle 1
  5.1 Generic requirements of a FIRE federation
  5.2 High priority requirements of the infrastructure community (WP3)
  5.3 High priority requirements of the services community (WP4)
  5.4 High priority requirements of WP8
  5.5 Requirements from a sustainability point of view (Task 2.3)
6 Conclusion
References
Appendix A: Architectural details regarding the Fed4FIRE testbeds
  A.1 OMF based wireless testbeds
    A.1.1 NORBIT
    A.1.2 w-iLab.t
    A.1.3 NITOS
    A.1.4 NETMODE
  A.2 FuSeCo testbed
  A.3 Smart Santander
  A.4 OFELIA testbeds
  A.5 Virtual Wall
  A.6 PlanetLab Europe
  A.7 BonFIRE testbeds
  A.8 Grid'5000
1 Introduction

This section first situates this deliverable within the Fed4FIRE project and then briefly describes the experiment lifecycle definition as used in the Fed4FIRE project.
1.1 This deliverable

This deliverable is the first in Task 2.1, the architecture task of the Fed4FIRE project. The Fed4FIRE project is structured around three development cycles. This deliverable defines the baseline architecture for cycle 1, which will be further refined in subsequent deliverables after each development cycle. It is based on several inputs, which are listed in more detail in chapter 2:
• D3.1: requirements from the infrastructure community
• D4.1: requirements from the services community
• D8.1: requirements from first level support
• Testbed visits and discussions
• Input from task 2.3 on sustainability
• Input from the Fed4FIRE DoW and from the general call objectives as published by the EC
In the figure below, the three cycles are shown. This deliverable describes the architecture for cycle 1; the architecture will then be revised in cycles 2 and 3.
[Figure: Fed4FIRE development and experiment cycles over the 48 project months. The development cycles consist of seven steps: (1) requirements gathering (WP3-4-8), (2) architecture and release plan (WP2), (3) detailed specifications (WP5-6-7), (4) implementation and alpha testing (WP3-7), (5) beta and interoperability testing (WP3-7), (6) deployment on facilities (WP3-4) and (7) operational with FLS (WP3-4-8); these steps are repeated for cycles 1, 2 and 3, after which the federation enters a sustainable mode. The experiment cycles cover: call open (WP1), negotiation (WP1), detailed design (WP10), experiment set-up and implementation (WP10), experiment execution and validation (WP10) and final reporting (WP10), with a first and a second wave of experimentation facilities.]
The objective of Fed4FIRE is to develop a federated FIRE facility that offers benefits to both experimenters and facility providers, as summarized in the following overview (from the DoW).
Benefits of federation for the experimenter:
• Potential: obtain a better view on the availability and offered functionalities of the different FI experimentation facilities in Europe in order to allow optimal selection.
• Consistency of process: access a number of common functionalities (e.g. resource description, resource discovery, trust environment including authentication and authorization, reporting of facility usage, ...).
• Reuse of investment: use as much as possible the common practices and tools of the experimenter's own community. This should not conflict with the previous benefit on common functionalities, because these could (partly) be embedded in the specific community tools.
• Advanced resources: access a broader range of resources from experimental facilities that could not be acquired individually or from a single facility provider (commercial or experimental). This could be in order to repeat the same experiments in different environments, to combine different facilities for more complex scenarios, to test innovation pathways through a sequential use of facilities, etc.
• Best practice: learn from a broader experimenter community in terms of best practices in experimentation.

Benefits of federation for the facility provider:
• Advanced capability: improve the capabilities of the facility by providing more functionalities in terms of experiment lifecycle management and trustworthiness (e.g. identity management, security assurance and accountability).
• Demand: improve the attractiveness of the facility by embedding it into a broader community (e.g. by the use of common interfaces and best practices) and increasing the usage of the facility (by both academic and industrial researchers).
• Return on investment: reuse as much as possible existing tools, frameworks and libraries in order to reduce the cost of further enhancing the functionality of the facility.
• Efficiency: allow easy interworking or coupling with other facilities in order to support multi-facility experiments.
• Promotion: advertise the facility within a FIRE facility portal so that experimenters can learn about the facility capabilities, sign up for them, and obtain the tools and information necessary to use them.
• Sustainability: enhance the sustainability of the facility (from both a technical and a financial point of view).
1.2 Experimental lifecycle

Whenever an experimenter performs an experiment, he or she performs a set of sequential actions that help to execute an experiment idea on an experimentation facility and obtain the desired results. These actions are considered part of what is called "the experimental lifecycle", which can be divided into different functions. In Fed4FIRE, the experimental lifecycle and its different functions are defined as follows:
• Resource discovery (1): discovery of the facility resources (e.g., uniform resource description model).
• Resource requirements (2): specification of the resources required during the experiment, including compute, network, storage and software libraries. E.g., 5 compute nodes, 100 Mbps network links, a specific network topology, 1 TB storage node, 1 IMS server, 5 measurement agents.
• Resource reservation (3): how the resources can be reserved. Examples: (a) no hard reservation, or best effort (use of a calendar that is loosely linked to the facility); (b) hard reservation (once reserved, resource availability is guaranteed). Other options: one should reserve sufficient time in advance, or one can do instant reservations.
• Resource provisioning (4):
  o Direct: instantiation of specific resources directly (it is the responsibility of the experimenter to select individual resources).
  o Orchestrated: instantiation of resources through a functional component orchestrating resource provisioning (e.g., OpenNebula or the PII orchestration engine) that decides which resources fit best with the experimenter's requirements. E.g., the experimenter requests 10 dual-core machines with video screens and 5 temperature sensors.
• Experiment control (5): control of the experimentation facility resources and experimenter scripts during experiment execution (e.g. via the OMF experiment controller or SSH). This can cover predefined interactions and commands to be executed on resources (events at startup or during the experiment workflow), such as startup or shutdown of compute nodes, a change in wireless transmission frequency, instantiation of software components during the experiment, or breaking a link at a certain time in the experiment. Real-time interactions that depend on unpredictable events during the execution of the experiment are also considered.
• Monitoring (6):
  o Resources: monitoring of the infrastructure health and resource usage (e.g., CPU, RAM, network, experiment status and availability of nodes).
  o Experiment: monitoring of user-defined experimentation metrics (e.g., service metrics, application parameters) of the system under test. Examples include: number of simultaneous download sessions, number of refused service requests, packet loss and spectrum analysis.
• Permanent storage (data, experiment descriptor) (7): storage of the experiment descriptor or experimentation data beyond the experiment lifetime (e.g., disk images and NFS).
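To make the lifecycle concrete, the sketch below walks through steps (1) to (4) against a single testbed aggregate manager using the GENI AM API v3 over XML-RPC, the interface chosen for cycle 1 (see the executive summary). This is a minimal sketch: the endpoint URL, certificate and credential file names, and the slice URN are purely illustrative, and a real call requires credentials issued by a trusted identity provider.

```python
import ssl
import xmlrpc.client

# Hypothetical aggregate manager (AM) endpoint and credential files.
AM_URL = "https://am.example.org:12346"

ctx = ssl.create_default_context()
ctx.check_hostname = False      # testbed AMs commonly use self-signed certs
ctx.verify_mode = ssl.CERT_NONE
ctx.load_cert_chain(certfile="experimenter.pem")  # certificate + private key

am = xmlrpc.client.ServerProxy(AM_URL, context=ctx)
creds = [{"geni_type": "geni_sfa", "geni_version": "3",
          "geni_value": open("slice-credential.xml").read()}]
rspec_version = {"geni_rspec_version": {"type": "GENI", "version": "3"}}

# (1) Resource discovery: fetch the advertisement RSpec of the testbed.
advertisement = am.ListResources(creds, rspec_version)

# (2)+(3) Resource requirements/reservation: submit a request RSpec.
slice_urn = "urn:publicid:IDN+example:fed4fire+slice+demo"
request_rspec = open("request.rspec").read()
allocation = am.Allocate(slice_urn, creds, request_rspec, {})

# (4) Resource provisioning: instantiate the allocated resources.
manifest = am.Provision([slice_urn], creds, rspec_version)

# Steps (5)-(7), experiment control, monitoring and permanent storage,
# are handled by separate tools (e.g. SSH, OMF, OML), not by the AM API.
```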
2 Overview of inputs

This section describes all inputs leading to the architecture choice for cycle 1:
• Generic requirements (including requirements from the EC call objectives and the project DoW)
• Requirements from the infrastructures community (D3.1)
• Requirements from the services community (D4.1)
• Requirements from First Level Support (D8.1)
• Requirements from a sustainability point of view (Task 2.3)
• Insights from the Fed4FIRE site visits
2.1 Generic requirements of a FIRE federation

From a FIRE federation architecture point of view, several generic requirements for a FIRE federation can clearly be distinguished. Many of them have already been described in the EC call for projects (call objective) that resulted in the Fed4FIRE project, or in the approved corresponding project DoW. The entire list of generic requirements is given below.
• Sustainability:
  o DoW: develop a federation architecture taking into account an evolving demand from facility providers and experimenters coming from different research communities. Can testbeds join and leave easily?
  o Call objective: implementing a demand-driven high-level federation framework for all FIRE prototype facilities and beyond.
  o DoW: develop, adapt or adopt a common set of tools for experiment lifecycle management, measurements, and trust management. This includes the specification of APIs towards the experimentation facilities.
  o DoW: provide the route towards experimentation infrastructures that are sustainable, while adapted to different usage and business scenarios.
  o DoW: adapt the experimentation facilities towards a subset of common APIs supporting experiment lifecycle management and trust management.
  o Is there a minimal dependency on specific tools/software? This should avoid the situation where the whole ecosystem stops working when the development of a specific tool stops, or where that tool has to be put in sustained development mode.
  o Call objective: building as far as possible on proven existing federation models, for the use of open standards.
• Scalability:
  o How can the architecture cope with a large number and a wide range of testbeds, resources, experimenters, experiments and tools?
• Robustness:
  o How vulnerable is the federation to single points of failure (of software, network, servers)?
• Support:
  o How easily can components/testbeds/software be upgraded?
  o How can different versions of protocols be supported (e.g. to introduce a new type of resource or new functionality)?
• Portal:
  o Call objective: including the development of a joint FIRE portal, operated until the end of 2015.
• Cost:
  o Keep the cost of rolling out the federation as low as possible.
• Experimenter ease of use:
  o DoW: the final goal is to make it easier for experimenters to use all kinds of testbeds and tools. If an experimenter wants to access resources on multiple testbeds, this should be possible from a single experimenter tool environment.
2.2 Requirements from the infrastructures community (D3.1)

The purpose of the document "D3.1 Infrastructure community federation requirements" was to gather requirements from the infrastructure community's perspective in order to build a federation of FIRE facilities. It is the first deliverable in a cycle of three which all focus on this same community. Five varied use cases have been developed in order to illustrate the benefits of joining several testbeds together. These use cases have been designed by different project partners based on their experiences in real-world collaborative scenarios involving heterogeneous resources, and thus represent research and commercial trends. Each use case has produced a set of requirements from the perspective of what would be required to deploy it in a federated infrastructure. These requirements have been assembled under the functional areas that Fed4FIRE has considered for achieving a federation:
• Experiment lifecycle: including discovery, reservation and experiment control
• Measurement and monitoring: covering metrics, instrumentation, data management
• Trustworthiness: gathering federated identity management and access control, privacy, accountability, SLA management
• Interconnection: including access networks, routing, etc.
These requirements have been prioritised (high, medium, low) according to how critical they are for the use case (or cases) that produced them. If requirements were valid for the majority of the described use cases, they received the status “generic”. The generic requirements with a high priority are considered essential and their implementation should be included in the first cycle of developments as much as possible and as long as this is feasible in terms of other constraints (e.g. effort). From section 4 of D3.1 we get the following requirements matrix:
Resource discovery
• I.1.101 Node capabilities: Fed4FIRE must provide a clear view on which node (= testbed resource) capabilities are available, and these should be defined in the same way across the federation.
• I.1.105 Discovery through federation-wide APIs: resource discovery must be integrated into uniform tools through federation-wide APIs. Ideally, these APIs would be compatible with discovery APIs already supported by the infrastructures and/or existing uniform tools. This would decrease the development costs for the infrastructure providers and tool builders.
• I.1.106 Intra-infrastructure topology information: for nodes that have wired and/or wireless network connections to other nodes within the same testbed, it should be possible to identify the physical topology. This relates to connections which are part of the data plane of an experiment, not the control interfaces. Similarly, if virtualized topologies are supported, the corresponding possibilities should also be communicated to the experimenter.
• I.1.107 Inter-infrastructure topology information: it should be known how different infrastructures are or can be interconnected. Important parameters are the type of interconnection (layer 2, layer 3) and the support for bandwidth reservation. If resources are also reachable beyond the boundaries of the Fed4FIRE partners' infrastructures (e.g., because they are directly connected to the public Internet), this should also be mentioned. Information regarding IPv6 support on the inter-infrastructure topologies is also required.
• I.1.109 Query search: it should be possible to build tools that allow a query search for suitable infrastructures/nodes.
• I.1.110 Catalogue search: if an experimenter does not know which parameters to fill in using the query search, he/she should be able to browse through some kind of Fed4FIRE infrastructures catalogue to find pointers towards the suitable facilities. Likewise, when in doubt regarding resources returned by the query search, such a catalogue would also be useful.

Resource requirements
• I.1.201 Manually extract requirements from discovery query results: when the query in the discovery phase returns a certain list of resources, it should be possible for the experimenter to select the resources he/she would like to include in the experiment. This should be supported in relation to a specific resource ID (e.g., "I want this specific node at this specific Wi-Fi testbed").

Resource reservation
• I.1.301 Hard resource reservation: Fed4FIRE must provide hard reservations of the available resources. It should be possible to perform immediate reservations (starting from now) or more advanced reservations (given a specific future timeslot that the experimenter wants, or having the reservation system look for the first available slot where all desired resources are available).
• I.1.304 Automated reservations handling: the Fed4FIRE reservation system should be able to approve/deny reservation requests in a fully automated manner, without any manual intervention by the infrastructure operators.
• I.1.305 Reservation information: Fed4FIRE must provide information on the availability of resources at a specific timeslot. The other way around, it should also be possible for experimenters and infrastructure providers to get a clear view on which resources are already reserved, and when.

Resource provisioning
• I.1.401 Provisioning API: APIs are required to enable direct instantiation of both physical and virtualized resources for experiments. Ideally, these APIs would be compatible with provisioning APIs already supported by the infrastructures and/or existing uniform tools. This would decrease the development costs for the infrastructure providers and tool builders.
• I.1.403 Root access: Fed4FIRE must provide the possibility to access a node as root user.
• I.1.404 Internet access to software package repositories: in Fed4FIRE, software installation through a package manager (e.g., apt-get) must be possible. Hence the package manager should have Internet access to external software package repositories.

Experiment control
• I.1.501 SSH access: nodes must be accessible via SSH.
• I.1.502 Scripted control engine: it must be possible to describe advanced experiment scenarios by means of a script that is executed by a control engine. The engine performs all required shell commands on the appropriate resources at the appropriate time (a minimal illustration is sketched after this matrix). Ideally, this control engine would be compatible with engines already supported by some of the infrastructures, which would decrease the development costs for the infrastructure providers.
• I.1.504 Generality of control engine: the experiment control engine should be general enough to support the control of all possible kinds of Future Internet technology: wireless networks, optical networks, OpenFlow devices, cloud computing platforms, etc.

Monitoring
• I.2.101 Measurement support framework: Fed4FIRE must provide an easy way for experimenters to store measurements during the experiment runtime for later analysis. The data should be clearly correlated to the experiment ID.
• I.2.104 Monitoring resources for operational support: Fed4FIRE must provide tools to continuously monitor the state of the resources so testbed managers can prevent and/or solve problems with them. In case of detected issues with the infrastructure, Fed4FIRE should warn the facility providers. If experimenters are actually trying to use these resources at that moment, they should also be informed.
• I.2.105 Monitoring resources for suitable resource selection and measurement interpretation: Fed4FIRE must also provide the monitoring information regarding the state of the resources to the experimenters. This way they can choose the best resources for their experiments. This information also provides the experimenters with the means to distinguish errors introduced by the experiment from errors related to the infrastructure.
• I.2.106 Minimal impact of monitoring and measuring tools: the monitoring and measurement support frameworks should introduce as little overhead as possible. The impact of the measurement tools on the experiment results should be negligible.

Permanent storage
• I.2.201 Data storage: Fed4FIRE must provide the means to store the data measured during an experiment. This data should be accessible during and after the experiment, and should be clearly correlated to the experiment run ID.
• I.2.202 Data security: access to the data should be properly secured.
• I.2.203 Stored experiment configuration: experiment configurations should be stored in order to replay experiments and compare results of different runs. These configurations should be versioned in a way that corresponds with significant milestones in the experiment development.

Dynamic federated identity management
• I.3.101 Single account: Fed4FIRE must provide the means of accessing all testbeds within the federation using one single account (username/password). Ideally, this authentication framework would be compatible with those already supported by some of the infrastructures. This would decrease the development costs for the infrastructure providers.
• I.3.102 Public keys: Fed4FIRE should also provide authentication by the use of public SSH keys.
• I.3.103 OpenVPN handling: Fed4FIRE should take into account that some facilities are currently behind an OpenVPN based authentication system. A seamless relation with the single Fed4FIRE account should be put in place, or the OpenVPN based interconnections should be abandoned.
• I.3.104 Authentication of API calls: access to the Fed4FIRE APIs (discovery, reservation, provisioning, etc.) should also be protected by an authentication mechanism.

Authorization
• I.3.201 Per-experimenter restrictions: it should be possible for infrastructures to dynamically decide which resources they make available to a certain Fed4FIRE experimenter, and which experimentation quota are appropriate. This can be based on a set of possible experimenter roles, on specific attributes, etc.

Trust and user experience
• I.3.402 Experiment descriptions: in Fed4FIRE, experimenters that create an experiment will need to provide a short high-level description of the experiment and its purpose. This allows infrastructure providers to keep track of the usage of the infrastructure, and enables them to report about this to their funding sources.
• I.3.403 Accountability: Fed4FIRE should provide the possibility to trace network traffic back to the originating experiment. This is useful when misuse of the infrastructure has been detected and the corresponding experimenter should be sanctioned (e.g., by revoking his/her account). The fact that accountability mechanisms are in place will automatically increase the level of trust that infrastructure providers can have in Fed4FIRE experimenters who are unknown to them.

Interconnectivity
• I.4.001 Layer 3 connectivity between testbeds: the resources within a Fed4FIRE infrastructure should be able to reach the resources deployed in the other Fed4FIRE infrastructures through a layer 3 Internet connection.
• I.4.003 Transparency: providers must be able to offer the resources of all federated testbeds in a transparent way. Interconnectivity solutions should not introduce unneeded complexity in the experiment.
• I.4.005 IPv6 support: the ability to conduct IPv6 measurements and to interact with the nodes of other testbeds over IPv6 should be enabled.
• I.4.006 Information about testbed interconnections: the experimenter needs to know how the several testbeds are interconnected, e.g., via layer 3 or layer 2. In particular, he/she needs to know which gateways the resources should use in order to interconnect them with other testbed resources.
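As a minimal illustration of requirements I.1.501 (SSH access) and I.1.502 (scripted control engine), the sketch below executes a timed scenario of shell commands over SSH on a set of nodes. It is not the OMF engine itself, only the underlying pattern; the node names, user name and key file are assumptions, and the third-party paramiko library must be installed separately.

```python
import time
import paramiko  # third-party SSH library (pip install paramiko)

# Hypothetical node addresses; in practice these come from the
# manifest returned by the provisioning step.
NODES = ["node1.testbed-a.example.org", "node2.testbed-b.example.org"]

# A scenario: (seconds after experiment start, shell command to run).
SCENARIO = [
    (0, "sudo systemctl start my-server"),
    (30, "ping -c 10 10.0.0.2 > /tmp/ping.log"),
]

def run(host, command):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username="experimenter", key_filename="id_rsa")
    _, stdout, _ = client.exec_command(command)
    status = stdout.channel.recv_exit_status()  # block until the command ends
    client.close()
    return status

start = time.time()
for offset, command in SCENARIO:
    time.sleep(max(0, start + offset - time.time()))  # wait until t = offset
    for node in NODES:
        run(node, command)
```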
2.3 Requirements from the services community (D4.1)

The purpose of the document "D4.1 First input from community to architecture" was to gather requirements from the service community's perspective in order to build a federation of FIRE facilities. It is the first deliverable in a cycle of three which all focus on this same community. The services and applications community gathers relevant players within ICT Challenge 1 (Obj 1.2, 1.3, 1.5 and FI-PPP) as defined by the European Commission. All partners within Fed4FIRE WP4 are members of relevant communities and participate in associated FIRE projects (BonFIRE, TEFIS, EXPERIMEDIA and SmartSantander). Additionally, several Fed4FIRE partners are involved in the FI-PPP initiative (ATOS, Fraunhofer, iMinds, IT Innovation, UC, etc.). Six varied use cases have been developed in order to illustrate the benefits of joining several testbeds together. These use cases have been designed by different project partners based on their experiences in real-world collaborative scenarios involving heterogeneous resources and thus represent research and commercial trends (see D4.1 for more details).
Each use case has produced a set of requirements from the perspective of what would be required to deploy it in a federated infrastructure. These requirements have been assembled under the functional areas that Fed4FIRE has considered for achieving a federation:
• Experiment lifecycle: including discovery, reservation and experiment control
• Measurement and monitoring: covering metrics, instrumentation, data management
• Trustworthiness: gathering federated identity management and access control, privacy, accountability, SLA management
• Interconnection: including access networks, routing, etc.

Requirements have also been prioritised and annotated with the statement that they are generic, or only related to specific use cases. To create a requirements matrix similar to the one produced by D3.1, we retain only the requirements which have a high priority and are generic. These are the most important requirements, and should be pursued as much as possible in this architecture for development cycle 1 of the project. This requirements matrix is given below:
Resource discovery
• ST.1.005 Describe connectivity options: users must be able to find testbeds whose network connectivity is compatible with the experiment envisioned. In particular, the addressing scheme (public/private IPv4, IPv6) and firewall policies must be taken into account.

Resource provisioning
• ST.1.007 Experiment deployment: Fed4FIRE must provide tools to deploy experiments over a subset of federated testbeds.

Resource requirements
• ST.1.013 Resource description: resources must be described in a homogeneous manner so that the experimenter can better compare the experiments in different testbeds.

Resource reservation
• ST.1.017 Testbed reservation information and methods according to experimenter profiles/policies: Fed4FIRE will make the distinction between requests of local users, PhD students from other institutes (research) and students (practical exercises), in order to know what kind of experimenter is logging in and from where, and to apply policies accordingly.
• ST.1.020 Scheduling parallel processing: users must be able to access resources on more than one testbed at a time and run parallel processes.

Experiment control
• ST.1.023 Access to internal and external services from a federated testbed: during an experiment, a Fed4FIRE experimenter must be able to automatically (without interaction of testbed administrators) invoke services inside and outside a testbed.
• ST.1.029 Multiple testbeds available: multiple testbeds of different kinds are federated together in one experiment, for example computing resources and network resources connected together, running an experiment on resources provided by different hosting organisations. Information must be retrieved from several testbeds and processed in another one. Also, data must be replicated between different clouds, with requests redirected to the appropriate data centre to retrieve the information.

Monitoring
• ST.2.001 Monitoring management control: Fed4FIRE must provide tools to create, view, update, and terminate monitoring configurations related to shared resource types or experiments. Monitoring data should be reportable for visualisation and analysis purposes with several reporting strategies (only alarms, all data, filters, etc.).

Permanent storage
• ST.2.007 Monitoring data and experiment result storage; audit, archiving, accountability: Fed4FIRE must provide storage facilities for the experimenter to retrieve historical data concerning an experiment data-set. Data centres have to keep a log of stored data and of access to external resources. This is important for audits and historical data tracking.

Authorization
• ST.3.002 Single sign-on for all testbeds required for an experiment: Fed4FIRE should enable the user to access all services/testbeds with his/her own privileges by performing login only once at the start of the experiment.

Dynamic federated identity management
• ST.3.003 Identity protection: any identity information must be protected within the Fed4FIRE system.
• ST.3.004 Single set of credentials: experimenters will use a single set of credentials for all operations on the platform (experimenting, monitoring, etc.); an access-pattern sketch follows this matrix.

Interconnectivity
• ST.4.001 Access between testbeds (interconnectivity): WAN links/public Internet access between testbeds, with detailed monitoring for accountability (for the testbed provider).

Policy
• ST.4.002 Describe and implement access policy: within the limits accepted by all parties in the federation, testbed owners should be able to control access to their testbed.
• ST.4.003 Enable testbed access for users outside the federation: within the limits accepted by all parties in the federation, testbed owners should be able to grant credentials to users outside the scope of the federation for local usage.
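Requirements ST.3.004 (single set of credentials) and ST.1.020 (parallel access to several testbeds) together imply the access pattern sketched below: one experimenter certificate reused against the aggregate manager of every federated testbed, queried in parallel. This is a sketch under stated assumptions: the endpoints and file names are illustrative, and the credential list is left empty for brevity (a real AM would reject the request without a valid credential, as in the earlier lifecycle sketch).

```python
import ssl
import xmlrpc.client
from concurrent.futures import ThreadPoolExecutor

# Hypothetical AM endpoints of two federated testbeds.
AMS = ["https://am.testbed-a.example.org:12346",
       "https://am.testbed-b.example.org:12346"]

# One client certificate: the "single set of credentials" of ST.3.004.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
ctx.load_cert_chain(certfile="experimenter.pem")

def list_resources(url):
    am = xmlrpc.client.ServerProxy(url, context=ctx)
    # Credential list left empty for brevity; see the lifecycle sketch.
    return am.ListResources([], {"geni_rspec_version":
                                 {"type": "GENI", "version": "3"}})

# Query all testbeds with the same identity, in parallel (ST.1.020).
with ThreadPoolExecutor() as pool:
    advertisements = list(pool.map(list_resources, AMS))
```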
2.4 Requirements from First Level Support (D8.1)

The First Level Support (FLS) relies on operational information provided by the individual testbeds in the federation. It represents a significant element of the service architecture of the federation, and the information flows between the testbeds and the FLS need to be incorporated into the overall design. The complete picture will depend on the level of integration achieved within the Fed4FIRE project. The minimum set of requirements at this stage focuses on monitoring functionality:
• FLS.1: Facility monitoring should push RAG (Red, Amber, Green) status to a central dashboard for FLS reactive monitoring (a minimal sketch follows this list).
• FLS.2: The facility RAG status should be based upon the monitoring of key components of each facility that indicate the overall availability status of the facility.
• FLS.3: The FLS should be able to drill down from the facility RAG status to see which components are degraded or down.
• FLS.4: The key components monitored for facility monitoring should be standardised across all facilities as much as possible.
• FLS.5: A commitment is required from each testbed to maintain the quality of monitoring information (FLS is "monitoring the monitoring", and the information FLS has is only as good as the facility monitoring data).
• FLS.6: Any central federation-level systems/components that may be implemented will need to be monitored by FLS (e.g. a central directory).
• FLS.7: FLS requires visibility of planned outages, on a push basis, from testbeds and from administrators of central systems.
• FLS.8: Exception alerts from both testbeds and central systems should be filtered before reaching the FLS, to avoid reacting to alerts unnecessarily.
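The simplest realisation of FLS.1 and FLS.3 is each facility periodically pushing its RAG status, with per-component detail, to a central dashboard. The sketch below illustrates the idea with a plain HTTP POST; the dashboard URL, payload fields and component names are all assumptions, since the actual FLS interface is to be defined by WP8.

```python
import json
import urllib.request

# Hypothetical FLS dashboard endpoint.
DASHBOARD_URL = "https://fls.example.org/api/status"

status = {
    "facility": "testbed-a",
    "rag": "AMBER",            # overall Red/Amber/Green status (FLS.1)
    "components": {            # per-component detail for drill-down (FLS.3)
        "am_api": "GREEN",
        "ssh_gateway": "GREEN",
        "image_server": "RED",
    },
}

request = urllib.request.Request(
    DASHBOARD_URL,
    data=json.dumps(status).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(request)  # push the status to the dashboard
```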
2.5 Requirements from a sustainability point of view (Task 2.3)

In this early phase of the project, Task 2.3 on sustainability has not yet advanced sufficiently to make detailed statements regarding financial sustainability. However, during the site visits described in more detail in section 2.6, the topic of sustainability was also touched upon. This paved the way to an early definition of requirements from a sustainability point of view. In the architecture deliverables for cycles 2 and 3, these requirements will be worked out in more detail, since by then the first sustainability plan will have been delivered. The current sustainability requirements are defined below (and overlap with the requirements the EC has set in the call objectives or the approved Fed4FIRE DoW):
• From a sustainability point of view, it is preferable that the number of required central components is minimal, as such components put a high risk on a federation in terms of scalability and long-term operability.
• It is also required that the federation framework supports testbeds joining and leaving very easily, as this will be common practice.
• New tools should be easy to add, while the dependency on specific tools or components for the federation should be minimized, to avoid that the end of support of a specific tool makes the federation unusable.
• Finally, it is also required that experimenters can join and leave the federation easily, that there is some notion of delegation (to make it scale to large numbers of experimenters), and that PIs (principal investigators) can set an end time for experimenters (e.g. students) or withdraw experiments they have approved.
2.6 Insights derived from the Fed4FIRE site visits

A team of iMinds and UPMC visited all European Fed4FIRE testbeds. The goal was to compare testbed frameworks and architectures and to point out the similarities and differences. A detailed overview of each testbed can be found in the appendix; the most important insights are listed here:
• Two testbeds (Virtual Wall and PlanetLab Europe) support SFA in an operational way, six other testbeds are currently implementing it, and four testbeds were not involved with an SFA implementation at the start of Fed4FIRE.
• All testbeds have different experimenter sign-up procedures.
• All testbeds have local experimenter databases that they want to keep supporting.
• Most testbeds are part of multiple federations. It is not possible to drop these federations and put, e.g., all authentication/resource allocation in a central place.
• Only Grid'5000 has a future reservation method; NITOS is currently implementing one.
• Some testbeds are interconnected, but this is ad hoc with VPN solutions in a project environment (BonFIRE, OFELIA).
• Most testbeds provide SSH access to nodes; Smart Santander is an exception.
Some more details relating to the different steps in experiment lifecycle management are provided below.
2.6.1 Input from visits for discovery, reservation, provisioning

§ Norbit: not visited (but confirmed that they are working on SFA)
§ Virtual Wall: proprietary + SFA (GENI RSpec)
§ w-iLab.t: proprietary + OMF REST (working on SFA)
§ Nitos: OMF REST + SFA (wrapper)
§ Netmode: OMF REST (reservation: RSpec over SOAP)
§ Fuseco: FiTeagle REST + SFA (wrapper)
§ Smart Santander: SOAP (WiseML)
§ Ofelia (i2cat/Bristol): Expedient (Q4 2012: AMsoil -> GENI AM with SFA)
§ PlanetLab: proprietary + SFA (wrapper, PlanetLab+GENI SFA)
§ Grid'5000: proprietary
§ BonFIRE: OCCI
§ Koren: not visited
2.6.2 Input from visits for authentication/authorization

§ Norbit: not visited (Unix account based)
§ Virtual Wall: login/password, SFA certificate federated with Utah/GENI
§ w-iLab.t: Linux account on experiment controller
§ Nitos: Linux account on experiment controller
§ Netmode: Linux account on experiment controller
§ Fuseco: HTTP authentication
§ Smart Santander: WISEBED SNAA (Shibboleth)
§ Ofelia (i2cat/Bristol): central LDAP
§ PlanetLab: login/password + SFA certificate
§ Grid'5000: central LDAP (time limitation of accounts is foreseen)
§ BonFIRE: central LDAP
§ Koren: not visited
2.6.3 Input from visits for monitoring/measurements

The following types of monitoring and measurement are identified:
• Facility monitoring: this is monitoring for system administrators to see if the testbed is up and running
• Infrastructure monitoring: this is monitoring of the infrastructure which is useful for experimenters. E.g. monitoring of switch traffic or physical host performance if the experimenter uses virtual machines.
• Experiment measurement: measurements which are done by a framework that the experimenter uses and which can be deployed by the experimenters themselves.
Note: "OML (prototype)" in the table below means that the deployment and use of OML in that testbed is a prototype, i.e. it is not yet used on a large scale there; OML itself is not a prototype.

Testbed | Facility monitoring | Infrastructure monitoring | Experiment measurement
Norbit | N/A or unknown | OMF chassis manager [8] | OML [14]
Virtual Wall | Zabbix [9] | N/A | Zabbix (BonFIRE), OML (prototype)
w-iLab.t | N/A or unknown | OMF chassis manager | OML
Nitos | N/A or unknown | OMF chassis manager | OML
Netmode | N/A or unknown | Nagios [10] | OML
Fuseco | N/A or unknown | Zabbix | Multi-hop packet tracing
Smart Santander | N/A or unknown | SOAP (proprietary) | Proprietary
Ofelia | Zenoss [11] | Zenoss | N/A
PlanetLab | TopHat [13] | TopHat | OML (prototype)
Grid'5000 | Ganglia [12] | Ganglia | Ganglia
BonFIRE | Zabbix | Zabbix | Zabbix
Koren | not visited | |
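For experiment measurement, most OMF-based testbeds in the table rely on OML: the experimenter's application declares a measurement point and streams tuples to an OML collection server. A minimal sketch is given below, assuming the oml4py client bindings (whose exact API may differ between versions); the application name, experiment and node identifiers and server address are illustrative.

```python
import oml4py  # OML client bindings for Python (assumed available)

# Illustrative application name, experiment id, node id and OML server.
oml = oml4py.OMLBase("myapp", "exp-001", "node1", "tcp:oml.example.org:3003")

# Declare a measurement point with a typed schema, then start reporting.
oml.addmp("link_quality", "seq:long rssi:double")
oml.start()

for seq, rssi in enumerate([-48.0, -52.5, -50.1]):
    oml.inject("link_quality", [seq, rssi])  # one measurement tuple

oml.close()
```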
2.6.4 Input from visits for experiment control

§ Norbit: not visited (OMF is in use)
§ Virtual Wall: Emulab experiment control, OMF (prototype, meaning it is not yet used on a large scale in the Virtual Wall)
§ w-iLab.t: OMF
§ Nitos: OMF
§ Netmode: OMF
§ Fuseco: VCT tool
§ Smart Santander: N/A
§ Ofelia (i2cat/Bristol): N/A
§ PlanetLab: OMF (prototype)
§ Grid'5000: proprietary
§ BonFIRE: proprietary Ruby API to REST OCCI; runs when the experimenter runs it
§ Koren: not visited
3 Evaluation of possible architectural approaches

This chapter first briefly describes how we will evaluate the architecture, then four possible types of architectures, followed by the choice of the architecture type that will be used further in Fed4FIRE.
3.1 Evaluation criteria

Several technical approaches are possible when thinking about a FIRE federation architecture. To assess which of these approaches are most suitable, the main evaluation criterion applied is the degree to which the requirements described in section 2 are covered. Also, if an architectural approach does not yet cover a requirement, it is important that the approach does not block support for that requirement in cycles 2 and 3 of the project. If an approach is in line with the insights derived from the site visits, this is also considered beneficial.
3.2 Overview of possible architectural approaches

3.2.1 Introduction and definitions

In Figure 1 below we depict the different components that we will discuss in the different architectures. We consider four layers: testbed resources, testbed management software, broker services and experimenter tools.

• At the bottom we have the testbed resources (servers, virtual machines, switches, sensors, ...). This can be, e.g., a physical server with several virtual servers, where a software component (daemon) can run on the virtual servers for, e.g., experiment control. The unit of resource shown is typically accessible by SSH (e.g. a Linux node accessible by SSH, to which sensor devices are attached that can be used in the experiment).
• Then there is the testbed management software which manages these resources: the software which manages the resources, users and experiments of a testbed. Examples are the PlanetLab software, the Emulab software, OMF and the BonFIRE platform. It runs on specific servers. In appendix A, more information on this is given per testbed.
• On top of that we have a 'broker services' layer which contains broker services: services run by third parties or by Fed4FIRE that broker between the testbeds and the experimenters. An example is a broker reservation service that searches which testbeds can provide which resources in the future and tries to match the resources demanded by an experimenter with the testbed availability; an orchestration service or a portal is also seen as a broker.
• On top we have the experimenter tools/user interfaces which the experimenters use to communicate with the testbed management frameworks, the testbed resources and, if needed, the brokers. This can be as simple as a web browser on the experimenter's PC talking to a central broker/portal (e.g. MySlice [6]), a standalone client such as Flack [3], or a command line client such as omni [4] or SFI [5] which talks directly to testbeds. An SSH client (or a script using SSH to log in) is also an experimenter tool, talking to the nodes of an experiment. The experimenter tool runs on the PC of the experimenter.

We define a testbed in Fed4FIRE as the combination of testbed resources and testbed management software (and all testbed management software typically comes with one or more experimenter tools, at least a command line tool, to use the testbed). Twenty servers without software are not a testbed under this definition. A federation of testbeds is defined as testbeds in different administrative domains which can be used under the same policies or in the same way (the same experimenter tools can be used).
Software components have an interface on top of them which describes how other components can communicate with that component. Identical interface colours in the figure denote identical interfaces; different colours denote different interfaces. The arrows between components show the interactions (typically not all possible interactions are shown, only illustrative examples). In the figure, three administrative domains are envisioned as vertical columns: testbed A, testbed B and the central location(s). A domain is a logical location, not a physical one: testbed A resources can be distributed over multiple locations (e.g. PlanetLab), but the management of that testbed is under a single administration. The same holds for the central location(s): components can be distributed over multiple datacenters, but they are under a single administrative entity (multiple 'central locations' then means multiple administrative entities).
Figure 1: Overview of architectural layers and components
In the following sections we discuss four architecture types:
• federated testbed resources under a central management framework
• the thinnest federation layer, in the form of a website listing all testbeds
• a homogeneous federation of testbeds which all run the same testbed management software framework
• a heterogeneous federation of testbeds which run different testbed management frameworks
3.2.2 Federated testbed resources under a central management framework
Figure 2: Centrally managed testbed resources
For this architecture, all management software is in a central location and manages all resources of all testbeds remotely. The experimenter tools talk with this central management software and directly with the testbed resources. The PlanetLab testbed is an example of this: resources (servers) are distributed over multiple locations without a separate local testbed software framework. The local testbed administrator only boots the nodes once with a specific operating system image (provided by PlanetLab), and the central PlanetLab software then contacts and manages the nodes (no extra management software is needed locally).

Evaluation: although this architecture has a lot of advantages (witness the success of PlanetLab), the following disadvantages make it infeasible for reaching the Fed4FIRE goals in a federation of multiple testbeds:
• This architecture would implicitly mean that all existing testbed management software is removed and substituted by a single software framework which copes with all the different kinds of resources. As during the testbed visits no candidate software framework was seen which can handle all available testbed resource types at this moment (in the flexible way that they are used in the testbeds today), a lot of plugins/adaptations would be needed for this central software, while the current testbed management frameworks are in place and can already cope perfectly with these resources. As such, the cost of putting in place a central software framework which can cope with all the different types of resources would be very high. The same applies to the long-term sustainability of such software. (As a comparison, the PlanetLab software targets virtual machines for the experimenters while the physical host is under full control of the PlanetLab administration, which makes this manageable for that specific case.)
• Besides this cost, some practical problems also arise:
o In many testbeds it is possible to deploy disk images on physical nodes, and in many cases multicast is used to speed up this process. When deploying disk images from a central point, the network connectivity has to be high speed and all resources need a public IP address. Both can be tackled by deploying local components, but then one is again moving management components to the testbeds, and one depends on software (which has to be developed) running both at the testbed and centrally.
o Testbed providers would have to delegate full control to the central management organization. This is not feasible for testbeds which participate in multiple federations, unless they split up their resources (after which one needs to go through multiple federation frameworks to be able to use all resources of such a testbed).
o At this moment all testbeds have local users, who would have to be transferred to the central place. ST.4.002 and ST.4.003, described in section 2.3, imply that each administrative domain should still be able to create 'local' accounts, meaning users that can access the testbed resources under its administration. The central policy engine would become complex, as every administration wants to implement its own specific policies.
o The transition phase to a central framework will bring all testbeds down at a certain moment in time.
o Scalability: all experiment lifecycles run through the central management software, so it must scale sufficiently (the number of experiments and experimenters is roughly proportional to the number of testbed resources).
o Single point of failure: when the central management software, or the servers it runs on, goes down, no experiment at all can be set up, while with software per testbed only a single testbed goes down. It is of course technically feasible to make all components and servers redundant, but this adds yet another cost (during the visits, no testbed software was seen which was implemented in a fully redundant way). Some testbeds, such as the virtual walls at iMinds, run multiple instantiations of the management software, each with a part of the nodes, and federate between them. If one instance fails, the remaining nodes are still fully usable, without a large investment in software engineering to make the testbed software fully redundant.
• Each administration is fully dependent on the central management. If the organization which runs it decides to quit the central management, every testbed resource becomes unusable. Most testbed administrators and authorities want to keep control of their testbed.
3.2.3 Central overview of testbeds
In this architecture (Figure 3), nothing is changed about the testbeds and tools, so tools remain tied to certain testbeds and each testbed keeps its own registration procedure and user database. At a central location, there is only a list of testbeds and tools.

Evaluation:
• This is feasible and cheap.
• It is not a real federation. E.g. if an experimenter wants to use resources on three different testbeds, he has to have three different accounts and use three different tools/interfaces, which heavily contradicts the federation requirements.
• There is no common API.
Figure 3: Only a central view, through an overview website of all testbeds
3.2.4 Homogeneous federation
In this architecture (Figure 4), the same testbed management software is deployed locally on all testbeds. This solves some problems of the fully centrally managed architecture (as described in 3.2.2):
• Each testbed runs its own instance of the management software and as such is independent of central components or organizations.
• Each testbed has full administration of its resources and users.
• Deploying images on nodes can happen locally, which is fast.
However, the following disadvantages remain:
• This architecture would implicitly mean that all existing testbed management software is removed and substituted by a single software framework which copes with all the different kinds of resources. As during the testbed visits no candidate software framework was seen which can handle all available testbed resource types at this moment, a lot of plugins/adaptations would be needed for this software, while the current testbed management frameworks are in place and can already cope perfectly with these resources. As such, the cost of putting in place a software framework which can cope with all the different types of resources would be very high. The same applies to the long-term sustainability of such software.
• When bringing in a new software framework, specific functionality will be lost on testbeds, certainly in a first phase. Otherwise, everyone would already be using the same software (e.g. a lot of wireless testbeds indeed use OMF already, but testbeds such as PlanetLab, the Virtual Wall, Ofelia and BonFIRE differ a lot in functionality).
• The transition phase to the new software will bring all testbeds down at a certain moment in time.
• Experimenters have to deal with new tools and user interfaces.
Figure 4: Homogeneous federation
3.2.5 Heterogeneous federation
In this architecture (Figure 5), each testbed keeps its own management software, but interfaces on top of the testbed software are specified, standardized and made interoperable across the federation. E.g. in Figure 5, testbed A and testbed B have different management software, but the interface on top of one of the components (e.g. the provisioning component) is the same. This means that an experimenter tool that can speak to that interface can talk to both testbeds. Other interfaces (e.g. for future reservation) may differ, so different tools must be used for those. This kind of architecture has the following advantages:
• Each testbed remains managed by the existing software, but interfaces are created on top.
• Federated functionality can grow in three directions:
o more types of federation interfaces can be specified for the different steps in the experiment lifecycle
o a federation interface can start with limited functionality and be extended later on
o testbeds can choose when they implement certain interfaces
In this way, there is no need for a clean-slate approach to the testbed software, and the federated functionality can easily grow over time (also if, e.g., new testbeds join the federation).
• By standardizing interfaces, anyone can implement such a standard interface and let standard tools operate with it.
Disadvantages:
• Specifying federation interfaces and testing interoperability is a lot of work (and thus a cost), but this cost is lower than adapting a single testbed framework to all types of resources. This is because the experiment lifecycle is very similar across all testbeds and all testbeds have an API for setting up experiments. The APIs are not the same, however, and should be standardized.
• Each testbed management software must implement the interfaces.
Figure 5: Heterogeneous federation
3.3 Evaluation of the different approaches
In the previous section we listed four main types of architectural approaches to make distributed resources/testbeds available to experimenters:

a) federated testbed resources under a central management framework
b) the thinnest federation layer, in the form of a website listing all testbeds
c) a homogeneous federation of testbeds which all run the same testbed management software framework
d) a heterogeneous federation of testbeds which run different testbed management frameworks

Solution a) seems very difficult technically, cost-wise and sustainability-wise. Solution b) is technically and cost-wise feasible, but lacks too much functionality/added value. Solution c) is technically more achievable than a), but is more difficult and costs more than d), as all testbeds already have some kind of API set available which is comparable across the experiment lifecycle. All testbed administrators of the visited
testbeds also seem to ask for d), as in this way they can still offer extra functionality and keep their existing users (as the testbed works today), while opening up common functionality to a lot more experimenters. They can also easily grow within the federation without a clean-slate approach. Solution d) does not exclude solution c), of course, as testbeds can decide which management software they want to use: if it turns out to be simpler and handier to switch to another framework than to implement the standard interfaces themselves, they can do so. In practice, this means that there will be clusters of testbeds with the same management software and functionality (e.g. PlanetLab based testbeds, OMF based testbeds, Emulab based testbeds, …). Once a specific standardized federation API is implemented for such a framework, it can be deployed on all testbeds running that software. The next chapter goes into more detail on the architecture for the management of a heterogeneous federation.
4 Architecture for Fed4FIRE development cycle 1
4.1 Introduction
In the previous chapter it was argued that the Fed4FIRE architecture should cope with heterogeneous testbed software frameworks, with a focus on common federating interfaces on top of those frameworks, so that tools can work with multiple testbeds, orchestration engines only have to cope with a single type of interface, and user accounts are shared across the testbeds; these are important requirements. We split up the detailed architecture discussion into multiple parts according to the experiment lifecycle:
• Resource discovery, resource requirements, resource reservation and resource provisioning
• Monitoring of resources and experiments
• Experiment control
• Permanent storage: will be discussed in cycle 2 of Fed4FIRE
4.2 Resource discovery, resource requirements, resource reservation and resource provisioning
Figure 6 depicts the components of the cycle 1 architecture of Fed4FIRE which play a role in the resource discovery, resource requirements, resource reservation and resource provisioning steps of the experiment lifecycle. The architecture is distributed, with components at the testbed locations, some at the central location(s) (or multiple 'central' locations) and the experimenter clients on the experimenters' PCs/laptops. It is a goal that components added at central locations only facilitate the use of the testbeds, without being necessary for correct operation. We call these 'brokers', as they provide brokered access between experimenter tools and the testbeds. Compare this to the DNS service on the internet: a browser/application can perfectly well use IP addresses to reach services, but DNS eases this by introducing a mapping between a human-readable hostname/domain and an IP address. DNS is, however, in principle not necessary in the process of reaching the service.
Figure 6: Fed4FIRE cycle 1 architecture for discovery, requirements, reservation and provisioning
4.2.1 Central location(s)
The following components will be provided in one or more central locations in cycle 1 for resource discovery, resource requirements, resource reservation and resource provisioning:
• Portal: a central starting and registration place for new experimenters.
• Identity provider: experimenters who register at the portal are registered at this identity provider (as can be seen in the figure, there will also be identity providers at testbeds).
• Testbed directory: readable by humans and by computers, giving an overview of all testbeds in the federation.
• Tool directory: gives an overview of the available tools for the experimenter.
• Certificate directory: for the chain of trust, there should be a trusted location for the root certificates of the identity providers.
• Future reservation broker: to facilitate future reservations of resources, this broker can help to find the right time slots and resources over multiple testbeds. Instead of an experimenter tool having to query all testbeds, it can do a single query to this broker, as sketched below.
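The following minimal Python sketch illustrates this single-query idea. The broker URL, endpoint and JSON fields are purely hypothetical assumptions for illustration; the actual reservation broker API will be defined in WP5.

```python
# Hypothetical sketch: query the future reservation broker once, instead of
# sending a discovery/reservation query to every testbed individually.
# BROKER_URL, the query parameters and the response format are assumptions.
import json
import urllib.parse
import urllib.request

BROKER_URL = "https://broker.fed4fire.eu/reservations/search"  # hypothetical

def find_slots(resource_type, count, start, duration_hours):
    """Ask the broker which testbeds can offer `count` resources of
    `resource_type` from `start` (ISO 8601) for `duration_hours` hours."""
    query = urllib.parse.urlencode({
        "type": resource_type,          # e.g. "wifi-node"
        "count": count,
        "start": start,
        "duration_hours": duration_hours,
    })
    with urllib.request.urlopen(BROKER_URL + "?" + query) as resp:
        return json.load(resp)          # e.g. [{"testbed": ..., "nodes": [...]}]

# One call to the broker replaces a separate query to each testbed:
for offer in find_slots("wifi-node", 10, "2013-03-01T09:00:00Z", 4):
    print(offer["testbed"], offer["nodes"])
```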
4.2.2 Testbeds
At the testbed side we have the following components:
• A component which does discovery, reservation and provisioning with a common interface.
• A testbed may or may not be an identity provider.
• For authentication/authorization between users and testbeds a trust model is used, where identity providers trust each other and specific experimenter properties are included in the experimenter's certificate, which is signed by the identity provider. This way, testbeds can do rules-based authorization.
• A testbed can query/trust the central certificate directory to see which root certificates it should trust.
4.2.3 SFA (Slice-based Federation Architecture)
In the GENI framework, the SFA framework [7] was developed, which addresses the same requirements and environment (note: this is a historical document drafting a rough consensus; do not read it as an API description, but only to get the concepts). The current concepts and APIs of this framework are documented on the GENI wiki (http://groups.geni.net/geni/wiki/GeniDesign), and the key interfaces are the following:
• GENI Aggregate Manager (AM) API version 3 [1]:
o It contains a description of the API for discovery, resource requirements and provisioning.
o It defines the GENI certificates for authentication. The GENI AM API uses XML-RPC over SSL with client authentication using X.509v3 certificates.
o It defines GENI credentials for authorization.
o It defines GENI URN identifiers for identifying and naming users, slices, slivers, nodes, aggregates and others.
• GENI RSpec (resource specification) version 3 [2]; the RSpec comes in three flavours (a parsing sketch follows after this list):
o Advertisement RSpec: for resource discovery (getting a list of all resources)
o Request RSpec: for requesting specific resources
o Manifest RSpec: for describing the resources in an experiment
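As a small illustration of the advertisement flavour, the sketch below extracts the available nodes from a GENI v3 advertisement RSpec. The sample RSpec is invented for illustration; real advertisements are testbed-specific, which is exactly the looseness that the ontology work in WP5 aims to address.

```python
# Sketch: list the currently available nodes in a GENI v3 advertisement RSpec.
import xml.etree.ElementTree as ET

GENI_NS = "{http://www.geni.net/resources/rspec/3}"

SAMPLE_AD = """<?xml version="1.0"?>
<rspec type="advertisement" xmlns="http://www.geni.net/resources/rspec/3">
  <node component_id="urn:publicid:IDN+example.testbed+node+node1"
        component_name="node1" exclusive="true">
    <available now="true"/>
  </node>
</rspec>"""

def list_available_nodes(ad_rspec):
    """Return component IDs of nodes advertised as available right now."""
    root = ET.fromstring(ad_rspec)
    nodes = []
    for node in root.iter(GENI_NS + "node"):
        available = node.find(GENI_NS + "available")
        if available is not None and available.get("now") == "true":
            nodes.append(node.get("component_id"))
    return nodes

print(list_available_nodes(SAMPLE_AD))
# ['urn:publicid:IDN+example.testbed+node+node1']
```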
Some of the services that we identified as 'broker services' at central locations are identified in GENI as a Clearinghouse (http://groups.geni.net/geni/wiki/GeniClearinghouse):
• An Identity Provider (IdP) provides certificates and PKI key materials to human users, registering them with the GENI federation as GENI users.
• A Project Authority asserts the existence of projects and the roles of members (e.g. PI, Experimenter).
• A Slice Authority provides experimenters with slice credentials by which to invoke AM (Aggregate Manager) API calls on federation aggregates.
• A Service Registry provides experimenters with a ‘yellow pages’ of URLs of all trusted services of different kinds. In particular, the list of all available aggregate managers trusted by GENI (possibly satisfying particular criteria) is provided.
• A Single-Sign-On Portal provides web-based authentication and access to the authorized Clearinghouse services and other GENI user tools.
As most of the testbeds in Fed4FIRE already have some implementation of these SFA interfaces and concepts, or are currently implementing them, and as it matches the architecture of cycle 1, it was decided to adopt SFA as the framework for Fed4FIRE cycle 1. However, three important add-ons are needed for the Fed4FIRE community:
• The GENI RSpecs are not tightly specified, which means that the same types of resources (e.g. virtual machines) are defined in multiple ways. It is the goal in Fed4FIRE to explore the use of ontology-based descriptions for these RSpecs. This will be done in WP5, and should make it easier for experimenters, experimenter tool developers and broker developers to use these resources.
• Policy-based authorization is very important for the Fed4FIRE partners. Credentials, certificates and policy engines should be extended accordingly. This will be tackled in WP7, further specifying the 'Clearinghouse' functionality.
• There is no concept of future reservation in the AM API at this moment. This extension will be studied in detail in WP5.
The component in Figure 6 which does the discovery, reservation and provisioning is thus an Aggregate Manager (AM), with the GENI AM API v3 as the common interface (the blue interface in the figure). A minimal sketch of talking to such an AM is shown below.
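The AM URL and key/certificate file names in this sketch are placeholders; GetVersion and ListResources are calls defined by the GENI AM API v3 specification [1].

```python
# Sketch: call a GENI AM API v3 endpoint (XML-RPC over SSL with an X.509v3
# client certificate, as mandated by the AM API).
import ssl
import xmlrpc.client

AM_URL = "https://am.example-testbed.eu/am/3.0"  # placeholder URL

ctx = ssl.create_default_context()
ctx.load_cert_chain(certfile="user_cert.pem", keyfile="user_key.pem")
# Note: many testbeds use self-signed server certificates; how server trust
# is handled is deployment-specific and not shown here.

am = xmlrpc.client.ServerProxy(AM_URL, context=ctx)
print(am.GetVersion())  # the API and RSpec versions this AM supports

credentials = []  # slice/user credentials obtained from the identity provider
options = {"geni_rspec_version": {"type": "GENI", "version": "3"},
           "geni_available": True}
advertisement = am.ListResources(credentials, options)  # advertisement RSpec
```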
4.2.4 Authentication and authorization
One of the advantages of SFA is that authentication and authorization are based on X.509v3 certificates, credentials and a chain of trust. This means that it is by nature distributed and very scalable. It should, however, be further defined in WP7 how exactly all policy-based authorization can be done and how the chain of trust will be defined. The sketch below illustrates the basic chain-of-trust idea.
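This is a minimal sketch of the concept, not the WP7 design. It assumes the third-party 'cryptography' package (recent versions provide verify_directly_issued_by) and placeholder file names: a testbed accepts an experimenter certificate if it was issued by one of the root certificates retrieved from the trusted certificate directory.

```python
# Sketch: accept an experimenter certificate only if it was signed by a
# trusted identity provider root (the chain-of-trust idea).
from cryptography import x509

def load_pem_cert(path):
    with open(path, "rb") as f:
        return x509.load_pem_x509_certificate(f.read())

def is_trusted(user_cert, root_certs):
    """True if user_cert was directly issued by one of the trusted roots."""
    for root in root_certs:
        try:
            # Verifies issuer name and signature against the root's public key
            user_cert.verify_directly_issued_by(root)
            return True
        except Exception:
            continue
    return False

# Placeholder file names; the real root bundle comes from the certificate directory
roots = [load_pem_cert(p) for p in ("root_idp_a.pem", "root_idp_b.pem")]
user = load_pem_cert("experimenter_cert.pem")
print(is_trusted(user, roots))
# Rules-based authorization would additionally inspect the experimenter
# properties embedded in the certificate (see 4.2.2).
```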
4.3 Monitoring and measurement
The following types of monitoring and measurement are identified (Figure 7):
• Facility monitoring: this monitoring is used in first level support to see if the testbed facilities are still up and running. The most straightforward way to do this is to have a common distributed tool which monitors each facility (Zabbix [9], Nagios [10] or similar tools). The interface on top of this facility monitoring should be the same everywhere and will be further specified in WP6 (in this case it seems more straightforward to use the same monitoring tool everywhere than to define and implement new interfaces).
• Infrastructure monitoring: this is monitoring of the infrastructure itself, which is useful for experimenters, e.g. monitoring of switch traffic, the wireless spectrum, or physical host performance if the experimenter uses virtual machines. This has to be provided by the testbed provider (an experimenter has e.g. no access to the physical host if he uses virtual machines), so a common interface would be very handy, but none exists today.
• Experiment measuring¹: measurements which are done by a framework that the experimenter uses and which the experimenter can deploy himself on the testbed resources in his experiment. In the figure one can see two experiment measuring frameworks, each with its own interfaces (and thus its own experimenter tools). Of course, a testbed provider can ease this by providing e.g. OS images with certain frameworks pre-deployed.
In the first cycle of Fed4FIRE, the facility monitoring will be rolled out on all testbeds. Infrastructure monitoring and experiment measuring will be further discussed in WP6.
¹ The measuring components could be located at the experimenter, broker or testbed management layer; this will depend on the further specifications in WP6.
Figure 7: Monitoring and measurement architecture for cycle 1
4.4 Experiment control
For experiment control², the testbeds or central locations need not run specific components, as the experimenter can fully roll this out on his own. However, the testbed providers could ease this by pre-installing certain frameworks in certain images. Figure 8 shows two experiment control frameworks, each with its own interfaces and experimenter user tools/command-line tools.
² The experiment control components could be located at the experimenter, broker or testbed management layer; this will depend on the further specifications in WP5.
Figure 8: Cycle 1 architecture for Experiment Control
5 Requirements which are fulfilled with the architecture in cycle 1
This chapter takes all requirements from chapter 2 and evaluates them against the specific Fed4FIRE architecture of the previous chapter. We show in green the requirements which will be fully fulfilled by the proposed architecture, in orange the requirements which are partially fulfilled, and in red the requirements which are not yet fulfilled.
5.1 Generic requirements of a FIRE federation
• Sustainability:
o DoW: Develop a federation architecture taking into account an evolving demand from facility providers and experimenters coming from different research communities. Can testbeds join and leave easily? Infrastructure is very dynamic.
o Call objective: implementing a demand-driven high-level federation framework for all FIRE prototype facilities and beyond
▪ The architecture supports multiple identity providers/portals. There is a common API for discovery, requirements, reservation and provisioning, while it imposes no restrictions on the use of specific experiment control, monitoring and storage tools. The common API makes it straightforward to add new tools and testbeds, while a testbed can also act as an extra identity provider.
o DoW: Develop, adapt or adopt a common set of tools for experiment lifecycle management, measurements, and trust management. This includes the specification of APIs towards the experimentation facilities.
▪ Common API for discovery, requirements, reservation and provisioning.
o DoW: Provide the route towards experimentation infrastructures that are sustainable, while adapted to different usage and business scenarios.
▪ If you have an experimenter tool and a testbed, you can set up an experiment, so no central component functionality is needed in a minimal setup. This means that central components are not needed for the basic functionality of setting up an experiment, which makes the federation very sustainable. New tools, testbeds and broker services can easily be added, and as such the whole framework is sustainable from an architectural point of view.
o DoW: Adapt the experimentation facilities towards a subset of common APIs supporting the experiment lifecycle management and trust management.
▪ This will be done as described above.
o Is there a minimum dependency on specific tools/software? (This should avoid that, when the development of a specific tool stops, the whole ecosystem stops working, or that the specific tool has to be put in sustained development mode.)
▪ Yes: no central component is needed to use the testbeds. Of course, the central portal, identity provider, testbed directory, tool directory and certificate directory ease the use of the federation, as all the information needed to get new experimenters using testbeds and tools is available in a single place.
o Call objective: for building as far as possible on proven existing federation models, for the use of open standards
▪ SFA is adopted by many testbeds already.
• Scalability:
o How can the architecture cope with a large number and a wide range of testbeds, resources, experimenters, experiments and tools?
▪ As tools can speak directly to the testbeds through SFA in a peer-to-peer way, this is inherently scalable. The same holds for the multiple identity providers with a chain-of-trust model. For very large experiments over multiple testbeds, experimenter tools can talk directly to all testbeds, or broker services can orchestrate this (making the experimenter tool simpler).
• Robustness:
o How dependent is the federation on single points of failure (of software, network, servers)?
▪ Not at all; it is fully distributed.
• Support:
o How easily can components/testbeds/software be upgraded?
▪ For this, the APIs should be versioned, and tools and testbeds should support two or three versions at the same time, so that all components can be upgraded gradually.
o How can different versions of protocols be supported (e.g. an upgrade of the RSpec)?
▪ Through versioning.
• Portal:
o Call objective: including the development of a joint FIRE portal, operated until the end of 2015
▪ Yes.
• Cost:
o Keep the cost for the roll-out of the federation as low as possible.
▪ The architecture seems very feasible for cycle 1, as SFA is already present or on the roadmap of a lot of testbeds and tools.
• Experimenter ease of use:
o DoW: The final goal is to make it easier for experimenters to use all kinds of testbeds and tools. If an experimenter wants to access resources on multiple testbeds, this should be possible from a single experimenter tool environment.
▪ This is possible, but Fed4FIRE should also aim to keep such tools up to date during the lifetime of the project and to set up a body which can further define the APIs, also after the project.
5.2 High priority requirements of the infrastructure community (WP3)
Federation aspect | Req. id | Req. statement | Remark
Resource discovery | I.1.101 | Node capabilities | In cycle 1, the basic node capabilities can be retrieved in the same way. Not all facilities will provide all capabilities, but they will provide the most tangible ones for their infrastructure. In cycles 2 and 3 this will be further refined to an ontology model of the resources.
Resource discovery | I.1.105 | Discovery through federation-wide APIs | SFA will be adopted for this on all testbeds.
Resource discovery | I.1.106 | Intra-infrastructure topology information | Will be provided as-is by the facilities, but will be refined further in cycles 2 and 3 with the ontology model.
Resource discovery | I.1.107 | Inter-infrastructure topology information | Interconnecting infrastructures in a structured way is not targeted for cycle 1.
Resource discovery | I.1.109 | Query search | Based on the resource discovery of SFA, this will be possible. Brokers (with the same interface) can help in doing this on multiple testbeds at once.
Resource discovery | I.1.110 | Catalogue search | Will be tackled by the portal, which will list all testbeds, and by user tools which list all resources based on the discovery phase.
Resource requirements | I.1.201 | Manually extract requirements from discovery query results | When the query in the discovery phase returns a certain list of resources, it should be possible for the experimenter to select the resources he/she would like to include in the experiment. This should be supported in relation with a specific resource ID (e.g., I want this specific node at this specific Wi-Fi testbed).
Resource reservation | I.1.301 | Hard resource reservation | SFA will be extended to cope with future reservations. Testbeds should implement hard reservation (not an architecture requirement).
Resource reservation | I.1.304 | Automated reservations handling | SFA will be extended in this way; brokers can handle this in a multi-testbed way.
Resource reservation | I.1.305 | Reservation information | SFA will be extended in this way.
Resource provisioning | I.1.401 | Provisioning API | Will be done by SFA.
Resource provisioning | I.1.403 | Root access | No architecture requirement.
Resource provisioning | I.1.404 | Internet access to software package repositories | No architecture requirement.
Experiment control | I.1.501 | SSH access | The architecture will make it possible to distribute SSH public keys as part of the SFA API (geni_users option in the Provision call).
Experiment control | I.1.502 | Scripted control engine | The architecture facilitates the use of experiment control engines.
Experiment control | I.1.504 | Generality of control engine | No architecture requirement.
Monitoring | I.2.101 | Measurement support framework | The architecture facilitates the use of a measurement framework.
Monitoring | I.2.104 | Monitoring resources for operational support | The architecture facilitates the use of such a monitoring framework.
Monitoring | I.2.105 | Monitoring resources for suitable resource selection and measurement interpretation | The architecture makes a distinction between facility monitoring, infrastructure monitoring and experiment monitoring. All three are supported by the architecture, but should be worked out in WP6. This requirement seems to be more a WP6 requirement.
Monitoring | I.2.106 | Minimal impact of monitoring and measuring tools | No architecture requirement.
Permanent storage | I.2.201 | Data storage | Doing this in a structured way will be tackled in cycle 2 or 3.
Permanent storage | I.2.202 | Data security | Doing this in a structured way will be tackled in cycle 2 or 3.
Permanent storage | I.2.203 | Stored experiment configuration | Doing this in a structured way will be tackled in cycle 2 or 3.
Dynamic federated identity management | I.3.101 | Single account | The use of SFA with a chain of trust makes it possible to access all testbeds with the same certificate, if the testbeds trust a bundle of root certificates.
Dynamic federated identity management | I.3.102 | Public keys | Public SSH keys will be distributed so that the right nodes can be accessed (geni_users option in the Provision call of the AM API v3).
Dynamic federated identity management | I.3.103 | OpenVPN handling | The vision in cycle 1 is that these should be abandoned by means of IPv6 or SSH gateways.
Dynamic federated identity management | I.3.104 | Authentication of API calls | SFA does this with certificates.
Authorization | I.3.201 | Per-experimenter restrictions | This will be partially possible in cycle 1 based on attributes in the certificates, but will be further tackled in cycles 2 and 3 with advanced policy control.
Trust and user experience | I.3.402 | Experiment descriptions | This is not an architecture requirement, but a requirement for the user sign-up process (and in fact for every experiment/slice the experimenter creates!).
Trust and user experience | I.3.403 | Accountability | Structured interconnectivity of testbeds and internet access is not envisioned for all testbeds in cycle 1. PlanetLab Europe does provide internet access and already has an advanced traffic trace framework.
Interconnectivity | I.4.001 | Layer 3 connectivity between testbeds | Not present in a structured way in cycle 1.
Interconnectivity | I.4.003 | Transparency | Not present in a structured way in cycle 1.
Interconnectivity | I.4.005 | IPv6 support | In cycle 1, some testbed resources will be reachable through IPv6. The architecture can cope with this, e.g. through DNS names which resolve to IPv6.
Interconnectivity | I.4.006 | Information about testbed interconnections | Not present in a structured way in cycle 1.
5.3 High priority requirements of the services community (WP4)
Field | Req. id | Req. statement | Remark
Resource Discovery | ST.1.005 | Describe connectivity options | Interconnectivity in a structured way is not tackled in cycle 1.
Resource Provisioning | ST.1.007 | Experiment deployment | The architecture makes it possible to deploy such tools.
Resource Requirements | ST.1.013 | Resource description | Identical to I.1.101.
Resource Reservation | ST.1.017 | Testbed reservation information and methods according to experimenter profiles/policies | Similar to I.3.201.
Resource Reservation | ST.1.020 | Scheduling parallel processing | The architecture in cycle 1 can cope with this, but interconnectivity will not yet be supported in a structured way.
Experiment Control | ST.1.023 | Access to internal and external services from a federated testbed | Not an architecture requirement.
Experiment Control | ST.1.029 | Multiple testbeds available | The architecture in cycle 1 can cope with this, but interconnectivity will not yet be supported in a structured way. Some testbeds will probably use public IP addresses (IPv4 or IPv6) which can be used for inter-testbed communication over the public internet.
Monitoring | ST.2.001 | Monitoring management control | From an architectural viewpoint, the facility, infrastructure and experiment monitoring can cope with this. The details should be filled in by WP6.
Permanent Storage | ST.2.007 | Monitoring data and experiment result storage. Audit, archiving, accountability | Not yet foreseen in the architecture.
Authorization | ST.3.002 | Single sign-on for all testbeds required for experiment | This is tackled by using certificates signed by identity providers. Once you upload this certificate in the experimenter tool and provide a passphrase, you can use it on all testbeds.
Dynamic federated identity management | ST.3.003 | Identity protection | The certificates of the experimenters are not freely listed on the internet, but of course the testbeds know the identity of an experimenter wanting to create an experiment on their testbed.
Dynamic federated identity management | ST.3.004 | Single set of credentials | The single set of credentials (signed certificate) works for resource discovery, requirements, reservation and provisioning, but not for monitoring and experiment control (as these are not standardized yet).
Interconnectivity | ST.4.001 | Access between testbeds (interconnectivity) | Interconnectivity in a structured way is not tackled in cycle 1.
Policy | ST.4.002 | Describe and implement access policy | This will be possible with agreed attributes in the certificates.
Policy | ST.4.003 | Enable testbed access to users outside the federation | The testbeds can still be an identity provider for their own testbed and as such provide access to users outside of the federation.
5.4 High priority requirements of WP8

Requirement | Description
FLS.1 | Facility monitoring should push RAG (Red, Amber, Green) status to a central dashboard for FLS reactive monitoring.
FLS.2 | The facility RAG status should be based upon the monitoring of key components of each facility that indicate the overall availability status of the facility.
FLS.3 | The FLS should be able to drill down from the facility RAG status to see which components are degraded or down.
FLS.4 | The key components monitored for facility monitoring should be standardised across all facilities as much as possible.
FLS.5 | A commitment is required from each testbed to maintain the quality of monitoring information (FLS is "monitoring the monitoring" and the information FLS has is only as good as the facility monitoring data).
FLS.6 | Any central federation-level systems/components that may be implemented will need to be monitored by FLS (e.g. a central directory).
FLS.7 | FLS requires visibility of planned outages on a push basis from testbeds and administrators of central systems.
FLS.8 | Exception alerts from both testbeds and central systems should be filtered prior to reaching the FLS, to avoid reacting to alerts unnecessarily.
The architecture defines facility monitoring which should be standardized across all testbeds. All the FLS requirements can as such be fulfilled from an architectural point of view, but the tools and protocols used for this are part of WP6. The sketch below illustrates the kind of RAG push that FLS.1 describes.
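This is an illustration only: the dashboard URL, payload format and component checks are assumptions; the actual tools and protocols are a WP6 matter.

```python
# Hypothetical sketch of FLS.1: push a RAG status to a central dashboard.
import json
import urllib.request

DASHBOARD_URL = "https://fls.fed4fire.eu/status"  # hypothetical endpoint

def check_components():
    """One boolean per key component (True = up); stubbed for illustration."""
    return {"aggregate_manager": True, "imaging_service": True, "power_control": False}

def rag_status(components):
    up = sum(components.values())
    if up == len(components):
        return "GREEN"
    return "AMBER" if up > 0 else "RED"

components = check_components()
payload = json.dumps({
    "testbed": "example-testbed",
    "status": rag_status(components),   # overall facility status (FLS.1/FLS.2)
    "components": components,           # per-component detail for drill-down (FLS.3)
}).encode()

req = urllib.request.Request(DASHBOARD_URL, data=payload,
                             headers={"Content-Type": "application/json"})
urllib.request.urlopen(req)
```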
5.5 Requirements from a sustainability point of view (Task 2.3)

• From a sustainability point of view, it is preferable that the number of required central components is minimal, as these components put a high risk on a federation in terms of scalability and long-term operability.
o Yes: there is no central component which is needed to use the testbeds. Of course, the central portal, identity provider, testbed directory, tool directory and certificate directory ease the use of the federation, as all the information needed to get new experimenters using testbeds and tools is available in a single place.
• It is also required that the federation framework supports the joining and leaving of testbeds very easily, as this will be common practice.
o The architecture supports multiple identity providers/portals. There is a common API for discovery, requirements, reservation and provisioning, while it imposes no restrictions on the use of specific experiment control, monitoring and storage tools. The common API makes it straightforward to add new tools and testbeds, while a testbed can also act as an extra identity provider.
• New tools can be easily added, while the dependency on specific tools or components for the federation should be minimized, to avoid that the end of support for a specific tool makes the federation unusable.
o See the previous point.
• Finally, it is also required that experimenters can join and leave the federation easily, that there is some notion of delegation (to make it more scalable for lots of experimenters), and that PIs (principal investigators) can put an end time on experimenters (e.g. students) or can withdraw experiments they have approved.
o The architecture supports this through the certificates, but of course the certificate creation and sign-up process itself has to be defined in detail in WP7 and Task 5.6 (portal definition).
6 Conclusion

This first deliverable of task 2.1 defines the architecture for cycle 1 of the Fed4FIRE project. It takes as input the requirements of WP3 (infrastructure community), WP4 (service community), WP8 (first level support), task 2.3 (sustainability) and general FIRE federation requirements (EC call objectives, Fed4FIRE DoW). It defines an architecture that copes with as many requirements as possible and that will be implemented for cycle 1 of Fed4FIRE. As such, it will also be deployed for the first round of Open Calls. The details of the first cycle implementation will be worked out in deliverables D5.1, D6.1 and D7.1 and in milestones M3.1 and M4.1.

In chapter 2 all these requirements are listed, while in chapter 3 we evaluated four types of architectures, ranging from fully central and a mere website listing all testbeds to a fully distributed federation of homogeneous or heterogeneous testbeds. Based on this evaluation, in chapter 4 we described the architecture for cycle 1 for the different steps in the experiment lifecycle (discovery, requirements, reservation, provisioning, monitoring and measurement, experiment control and storage). For cycle 1, discovery, requirements and provisioning will be based on the SFA GENI AM API v3 standard, while extensions will be made for advanced reservation and extended policy-based authorization. Regarding the RSpecs, the aim is to get them as unified as possible in cycle 1 (with GENI RSpec v3 as a guideline), while for cycles 2 and 3 a unified ontology-based RSpec definition is the target. In cycle 1, full interoperability of tools and testbeds is required, but tools may have to interpret multiple RSpec types for this; the number of types should be limited, though. The architecture also defines multiple identity providers with a chain of trust, and a central portal accompanied by an identity provider and directories (machine and human readable) for tools, testbeds and certificates.

For first level support, the architecture defines facility monitoring which should be identical for all testbeds and should make it possible for first level support to have a high-level overview of testbed and resource availability. Infrastructure monitoring, experiment measurement and experiment control are not yet standardized, as no standards exist yet. This, however, is no blocking issue, as the experimenters can deploy their own tools for this and are not dependent on the testbed providers; infrastructure monitoring is partly the exception to this, but it will be further tackled in WP6.

Chapter 5 then lists all high priority requirements and evaluates which are already tackled. The non-tackled requirements are the ones regarding interconnectivity between testbeds and storage of all kinds of information; both will be tackled in cycles 2 and 3 of Fed4FIRE. Other requirements are now partly resolved, but will be fully resolved when the ontology-based RSpecs/resource descriptions are deployed. As a general conclusion, the first cycle architecture and way forward will already cope with a large number of the requirements that have been set by the other work packages.
References
[1] GENI AM API v3, http://groups.geni.net/geni/wiki/GAPI_AM_API_V3
[2] GENI RSpec v3, http://www.geni.net/resources/rspec/3/
[3] Flack, http://www.protogeni.net/trac/protogeni/wiki/Flack
[4] Omni, http://trac.gpolab.bbn.com/gcf/wiki/Omni
[5] SFI, http://svn.planet-lab.org/wiki/SFAGettingStartedGuide
[6] MySlice, http://myslice.info/
[7] SFA, Slice-Based Federation Architecture, version 2.0, July 2010, http://groups.geni.net/geni/wiki/SliceFedArch
[8] OMF, cOntrol and Management Framework, http://mytestbed.net
[9] Zabbix, http://www.zabbix.com/
[10] Nagios, http://www.nagios.org/
[11] Zenoss, http://www.zenoss.com/
[12] Ganglia, http://ganglia.sourceforge.net/
[13] TopHat, http://www.top-hat.info/
[14] OML, http://mytestbed.net/projects/oml
Appendix A: Architectural details regarding the Fed4FIRE testbeds
In order to define a suitable architecture for FIRE federation, it is of interest to get a deeper understanding of the architectural details of the different FIRE facilities belonging to this project (Figure 9). Through the identification of commonalities, it should be possible to minimize the required adaptations on the testbeds, while the reuse of tools can be stimulated. This is not only beneficial in the context of the successful completion of the Fed4FIRE project, but will also ease the introduction of additional new infrastructures into the federation in the long term. Keeping this goal in mind, the architectures of the different testbeds currently belonging to the project are presented in this section in a uniform manner. The corresponding legend is depicted in Figure 10. For the sake of clarity, similar testbeds are clustered as much as possible.
Figure 9: Overview of testbeds currently belonging to the Fed4FIRE consortium
Figure 10: Legend of the uniform architectural representation
A.1 OMF based wireless testbeds
A first cluster of similar testbeds that can be identified is that of the wireless testbeds NORBIT, NETMODE, NITOS and w-iLab.t. They are all similar in technical scope: their main purpose is IEEE 802.11 (Wi-Fi) experimentation, possibly expanded with additional capabilities such as wireless sensor networking or OpenFlow experimentation. From a testbed management point of view, they have in common that they all rely on the OMF framework [8]. In the remainder of this section, more details regarding each of these four testbeds are given.
A.1.1 NORBIT
The NORBIT testbed is a Wi-Fi testbed located in Sydney, Australia. It belongs to the NICTA group. The testbed consists of 38 nodes equipped with a VIA C3 CPU, 512 MB RAM, a 40 GB hard disk, 2 Ethernet interfaces and 2 IEEE 802.11a/b/g interfaces. These nodes are installed indoors in an office environment, and are fully managed by OMF. The corresponding generic OMF architecture is depicted in Figure 11.

At the bottom the resources are depicted, in this case an embedded PC with several wired and wireless connections. Through SSH the experimenter can log in to this resource. On the OMF layer, three entities can be observed. The Aggregate Manager (AM) is responsible for maintaining the inventory of all managed resources. It also provides a service (based on the Frisbee software) that can extract binary images from a node's hard drive, and that can restore such an image later on for continuation or repetition of the experiment. The third service of the AM is the Chassis Manager, which is responsible for powering nodes on or off when needed. These three services are reachable through a REST webinterface.

The second OMF management entity is the Experiment Controller (EC). It processes a description of the experiment scenario (written in the OEDL language) and automatically triggers the right actions at the right nodes when needed. Although it is part of the OMF management layer from a logical point of view, the EC can both be provided as a service by the testbed and be run locally by the
experimenter on his own PC. To perform these actions at the resources, the EC sends a message to a daemon running on every resource: the Resource Controller (RC). The RC is capable of executing everything a user could do manually on the command line. It can also abstract certain commands by wrapping them in OMF commands. An example is the OMF command to change the Wi-Fi channel: behind the scenes it determines which wireless driver is used on the resource, and then selects the suitable set of command-line commands to execute, depending on the driver. To support this messaging between EC and RC, the XMPP protocol is used. Hence an XMPP server is added as the third entity of the OMF control framework, and all elements within the OMF architecture are given an XMPP interface.
The OMF framework is currently designed to do two things: provisioning the nodes and controlling the experiment. The provisioning operation is executed as follows (Figure 11):
1. The experimenter gains access to the PC that runs the EC (his own PC, or a server at the testbed that is reachable through SSH)
2. The experimenter starts the provisioning procedure by giving the command “OMF load”. The name of the image to be loaded is given as an argument.
3. If the target node is powered off, the EC will call the chassis manager service of the AM to power it on. For this it will send an appropriate XMPP message.
4. Once the target node is powered on, the EC will call the disk imaging service of the AM to flash the hard drive of the node. Again an XMPP message will be sent. The Frisbee software will be used to actually perform the flash operation.

Figure 11: Details of OMF provisioning functionality
The experiment control on the other hand is executed as follows:
1. The experimenter gains access to the PC that runs the EC (his own PC, or a server at the testbed that is reachable through SSH)
2. The experimenter starts the experiment control procedure by giving the command “OMF exec”. The name of the scenario description that is to be executed is given as an argument. The EC will process this description, and will initiate specific commands at certain resources at the appropriate time.
3. If the target node is powered off, the EC will call the chassis manager service of the AM to power it on. For this it will send an appropriate XMPP message.
4. Once the target node is powered on, the EC will request the RC running at the desired resource to perform a certain command. As mentioned, this can be any command that could also be given on the command line manually. To trigger the RC, an XMPP message is sent from the EC to the RC.
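Both flows thus boil down to two commands. As a small illustration, the snippet below drives them from a Python script; the exact command-line flags differ between OMF versions, so the -t (target nodes) and -i (image) options shown here are assumptions.

```python
# Sketch: invoke the two OMF workflows described above from a script.
import subprocess

def omf_load(nodes, image):
    """Provisioning: flash a disk image onto the given nodes via the EC/AM."""
    subprocess.run(["omf", "load", "-t", nodes, "-i", image], check=True)

def omf_exec(scenario):
    """Experiment control: execute an OEDL scenario description."""
    subprocess.run(["omf", "exec", scenario], check=True)

omf_load("node1,node2", "baseline.ndz")  # step 2 of the provisioning flow
omf_exec("my_experiment.rb")             # step 2 of the experiment control flow
```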
It is a common misconception that OMF and OML are the same thing, while in fact they are not. OMF is the framework for provisioning and experiment control as described above; OML is an additional software suite targeting experiment monitoring. They are very often deployed together in testbeds, but this is not a strict requirement. From a logical point of view they can be considered as two separate entities: you can run OMF without running OML, and vice versa. Their corresponding software libraries are, however, installed on the same entities. As depicted in Figure 12, OML consists of a service running on the resource, and a service and database running on the AM. On the resource, the Measurement Library (ML) takes measured values as input and is responsible for getting them added to the database at the AM. It does so by calling the OML service running at the AM. Again XMPP messaging is applied. Annotations to the measured values, such as experiment ID, source ID and so on, are automatically added by the OML framework. From an experimenter point of view, it is sufficient to redirect the measured values to the ML to collect all of them in a single place for future processing.

Figure 12: Details of OML experiment monitoring functionality

All the described architectural characteristics are common to all OMF testbeds described in this section. However, some properties specific to NORBIT should also be mentioned. Reservations on the testbed are currently performed using a Google calendar. It is not coupled to the NORBIT management software; hence it can be considered as a separate tool to coordinate the testbed usage according to a gentlemen's agreement. Authentication is performed by logging in to an experiment controller which is connected to the same LAN as the testbed. Therefore, when not located at the NORBIT premises, one needs to log in to the permanent EC server provided by the testbed. Authentication is then based on a Linux user account that is configured by the testbed administrators for a new user. No authorization mechanisms are in place at the moment.

Another remark to be made is that a new version of OMF will be released soon: OMF6. Important in this version is that the format of the XMPP messages will be standardized. This enables developers to create new experiment control tools that interact with the deployed OMF resource controllers. Another important feature of OMF6 is that it will include a native SFA interface.
A.1.2 w-iLab.t
The w-iLab.t testbed is intended for Wi-Fi and sensor networking experimentation. It is located in Gent and belongs to iMinds. Two different installations can be identified. The first, w-iLab.t Office, consists of 200 nodes spread across three floors of the iMinds building (90 m x 18 m); each node consists of an embedded PC, 2 IEEE 802.11a/b/g interfaces and a sensor node (IEEE 802.15.4). The second installation, w-iLab.t Zwijnaarde, consists of 60 fixed nodes in a pseudo-shielded environment (an attic above a cleanroom); here the embedded PCs are more powerful, and they are equipped with 2 IEEE 802.11a/b/g/n interfaces, a sensor node (IEEE 802.15.4) and a Bluetooth interface. An additional 20 nodes are installed on mobile robots.

Since the testbed is managed by the OMF framework, its architecture looks very similar to that of NORBIT. The only differences are the larger heterogeneity of resources and the presence of one custom service at the AM: the Wical service. A custom webinterface is also provided to retrieve information about the resources, together with a specific Google Calendar account for reservations. As depicted in Figure 13, the webinterface polls the inventory of the AM through the REST interface.

Figure 13: Discovery in the w-iLab.t testbed

Figure 14 illustrates how reservations are implemented:
1. On the webinterface an experimenter can check on a Google Calendar page which reservations have been scheduled for the testbed.
2. If the experimenter wants to schedule his/her own reservation, he/she fills in the correct information in his/her personal Google Calendar page. In short, when creating a meeting, a specific Google Calendar account related to the testbed has to be invited, and information regarding which nodes to reserve has to be put in the subject field in a well-defined manner.
Figure 12: Details of OML experiment monitoring functionality
3. The Wical service periodically polls the Google Calendar API, checking for new reservations.
4. If a new reservation has been identified, the service checks whether it conflicts with any already existing reservation. If there is no conflict, it approves the reservation.
5. When nodes are not part of any active reservation, they are automatically shut down by the Wical service. Likewise, they are automatically powered on when needed in a reservation.
It should be mentioned that this is a soft reservation: after reserving a node, other users will still be able to log into it, reflash the hard drive, reboot it through the web interface, etc. The Google Calendar can be considered as a tool to coordinate the testbed usage according to a gentlemen's agreement. A sketch of the conflict check performed in step 4 is given below.
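This is a hypothetical reconstruction, assuming each reservation carries a start time, an end time and a set of nodes; the actual Wical implementation and the exact Google Calendar payloads are not reproduced here.

from datetime import datetime

def overlaps(start_a, end_a, start_b, end_b):
    # Two time intervals overlap if each starts before the other ends.
    return start_a < end_b and start_b < end_a

def approve(new, existing):
    """Return True only if 'new' conflicts with no existing reservation."""
    for res in existing:
        if overlaps(new["start"], new["end"], res["start"], res["end"]) \
                and new["nodes"] & res["nodes"]:
            return False
    return True

existing = [{"start": datetime(2013, 1, 7, 9), "end": datetime(2013, 1, 7, 12),
             "nodes": {"node1", "node2"}}]
request = {"start": datetime(2013, 1, 7, 11), "end": datetime(2013, 1, 7, 14),
           "nodes": {"node2", "node3"}}
print(approve(request, existing))  # False: node2 is double-booked from 11h to 12h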
Figure 13: Discovery in the w-‐iLab.t testbed
Provisioning in w-iLab.t is standard OMF, and hence equal to the process of NORBIT. The same can be said about experiment monitoring, permanent storage, authentication and authorization. Users can control their experiment by manually logging in to the nodes through SSH, or by using the OMF experiment control framework.
A.1.3 NITOS
The NITOS testbed is a Wi-Fi testbed of the University of Thessaly, Volos, Greece. It consists of 50 nodes installed indoors and outdoors, and 50 additional nodes are currently being installed in an indoor shielded environment. Three different hardware types are installed, with different performance characteristics and 802.11 technologies. Each node is equipped with an experimental Ethernet interface which is connected to an OpenFlow switch. This way, the testbed can also be used for OpenFlow experimentation. The NITOS testbed is managed by the OMF framework. Two additional developments differentiate it from the standard OMF deployment as described previously: it provides an SFA interface to the testbed, and it implements a scheduler on top of OMF. The former is illustrated in Figure 15, the latter in Figure 16.
Figure 14: Reservation in the w-‐iLab.t testbed
Discovery in the NITOS testbed is supported in two different ways. First of all, a dedicated web interface can be used. This webpage calls the REST interface of the AM, which in turn queries the Inventory to compile the list of testbed nodes. The other approach is to use the SFA interface. This interface is based on the Generic SFA wrapper from INRIA, a framework that already implements the SFA side of the interface (authentication, definition of the SFA API calls, etc.) and provides stubs where the corresponding calls to the specific testbed APIs should be injected. Therefore any testbed wanting to adopt this generic SFA wrapper needs to have such an API in place. In NITOS a novel XML-RPC interface was added to the AM, and this API is called by the Generic SFA wrapper. So when using the SFA interface of the testbed, the user relies on an SFA tool for discovery (at the moment only MySlice has been intensively tested on NITOS). This tool connects to the SFA interface on the AM, which calls the XML-RPC interface on the AM, which queries the scheduler database. This database contains all information available in the Inventory, and can therefore respond with the appropriate information.
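As an illustration of this second approach, the sketch below performs a discovery call against an XML-RPC interface such as the one NITOS added to its AM. The endpoint URL and the listResources method name are assumptions; only the call pattern (tool, XML-RPC interface, scheduler database) is shown.

import xmlrpc.client

# Hypothetical endpoint of the XML-RPC interface on the NITOS AM.
am = xmlrpc.client.ServerProxy("https://nitos.example.org:8001/RPC2")
try:
    # Hypothetical method; behind it, the AM queries the scheduler database.
    for node in am.listResources():
        print(node)
except xmlrpc.client.Fault as err:
    print("AM returned a fault:", err.faultString)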
Figure 15: Discovery in the NITOS testbed
Of course the Scheduler DB is more than merely a duplicate of the Inventory: it also contains all scheduling information. This content is manipulated by the NITOS scheduler, an additional service implemented on the AM that can be called through the XML-RPC and SFA interfaces. It gives users the possibility to reserve nodes and channels, an important feature in the context of wireless experimentation. Users can only view resources which are still available for a given timeslot. If the scheduler decides that a new reservation request is not in conflict with any existing reservation, it will add it to the DB.

Note that the nodes are hard reserved (meaning that only the experimenter who reserved them can access them), while channels are soft reserved (a gentlemen's agreement not to use any other channel than your own). To implement this hard node reservation, the firewall on the NITOS EC continuously polls the scheduler DB and dynamically adjusts its settings. Since experimenters are forced to SSH to their nodes through the EC, this gives the scheduler full control regarding node access. Similarly, the "omf exec" command was adjusted to first check in the Scheduler DB whether the experimenter has reserved the nodes before processing the experiment descriptor. A third service that regularly polls the scheduler DB is the AM Chassis manager, which will always attempt to shut down non-reserved nodes.

One additional remark to be made about this reservation mechanism is the support for OpenFlow. As mentioned previously, every NITOS node has one experimental interface that is directly connected to an OpenFlow switch. When a node is reserved, the corresponding port on the OpenFlow switch is automatically reserved as well.
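A minimal sketch of the firewall enforcement loop is given below, assuming a simple SQLite scheduler database and an iptables chain (here called RESERVATIONS, hooked into the OUTPUT chain with a default block towards the nodes). Both the database schema and the firewall layout are illustrative assumptions, not the actual NITOS implementation.

import sqlite3
import subprocess
import time

def active_reservations(db_path):
    # Hypothetical schema: reservations(user, node, start_ts, end_ts).
    conn = sqlite3.connect(db_path)
    now = time.time()
    rows = conn.execute(
        "SELECT user, node FROM reservations WHERE start_ts <= ? AND end_ts >= ?",
        (now, now)).fetchall()
    conn.close()
    return rows

def sync_firewall(db_path):
    # Flush the chain and re-add one ACCEPT rule per active (user, node)
    # pair; SSH connections towards all other nodes stay blocked.
    subprocess.run(["iptables", "-F", "RESERVATIONS"], check=True)
    for user, node in active_reservations(db_path):
        subprocess.run(["iptables", "-A", "RESERVATIONS", "-d", node,
                        "-m", "owner", "--uid-owner", user, "-j", "ACCEPT"],
                       check=True)

while True:
    sync_firewall("/var/lib/scheduler/reservations.db")  # hypothetical path
    time.sleep(60)  # poll the scheduler DB once a minute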
Figure 16: Reservation in the NITOS testbed
It must be briefly mentioned that NITOS has recently federated with PlanetLab Europe. This federation was achieved by adding an XMPP server to PLE, and by providing its users with a specific installation for their virtual machines that contains the OMF resource controller. By configuring both XMPP servers so that they know about each other's pub/sub domains, an experiment controller from NITOS or PLE can control both the NITOS and the PLE nodes that are part of its experiment. Note that the experiment control is federated, but the reservation is not. This means that an experimenter who wants to perform a joint NITOS-PLE experiment first has to reserve all required nodes separately on the two testbeds.
A.1.4 NETMODE
The NETMODE testbed is a Wi-Fi testbed belonging to the National Technical University of Athens (NTUA). It consists of 20 x86-compatible nodes positioned indoors in an office environment. This testbed is managed by the OMF framework. The main differences with the default OMF installation as described previously are the added NETMODE scheduler and the Nagios server that was installed for the purpose of infrastructure monitoring.
Similar to the NITOS testbed, NETMODE added a scheduler service to the OMF AM. However, NETMODE extended the OMF Inventory to include all information related to reservations (whereas NITOS introduced a new database). Again users can hard-reserve nodes and soft-reserve channels, but this time only through the native web interface. An SFA interface is not yet provided, but the format for reservation requests is based on GENI v2 SFA RSpecs. The NETMODE scheduler performs conflict resolution, and if it grants a request, it stores the appropriate information in the extended Inventory. Similar to NITOS, the firewall of the NETMODE OMF EC periodically polls this inventory, and dynamically changes its settings when needed.
Figure 17: Reservation at the NETMODE testbed
To support infrastructure monitoring besides the experiment measuring capabilities provided by OML, a Nagios server was added to the NETMODE setup. It periodically checks whether all testbed nodes are up and running (can they be pinged?), and whether users can connect to them (is there a response to SSH requests?). These checks do not require the presence of any agent on the nodes; only the Nagios server needs to be in place. The results are provided through the standard Nagios web interface.
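The two checks can be illustrated with the sketch below, which probes hosts in the same agentless way: an ICMP echo request and a TCP connection to the SSH port. Hostnames and timeouts are placeholders; a real deployment would of course rely on Nagios' own check_ping and check_ssh plug-ins rather than a custom script.

import socket
import subprocess

def node_pings(host, timeout_s=2):
    # One ICMP echo request via the system ping utility (Linux flags).
    result = subprocess.run(["ping", "-c", "1", "-W", str(timeout_s), host],
                            capture_output=True)
    return result.returncode == 0

def ssh_responds(host, port=22, timeout_s=2):
    # An SSH server announces itself with a version string such as "SSH-2.0-...".
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            return sock.recv(64).startswith(b"SSH-")
    except OSError:
        return False

for host in ["node1.netmode.example", "node2.netmode.example"]:  # placeholders
    print(host, "ping:", node_pings(host), "ssh:", ssh_responds(host))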
A.2 FuSeCo testbed
The FuSeCo testbed belongs to the Fraunhofer FOKUS group located in Berlin. It consists of two different test infrastructures: OpenIMS and OpenEPC. The former is intended to test IMS applications; IMS can be considered as an advanced form of VoIP (video, many-to-many, etc.) that evolved from SIP. The latter is a reference implementation of the LTE packet core network. Both test infrastructures are managed and used through the same tool: FITeagle. The GUI that experimenters use to define and run their experiments is called the VCT Tool. Behind the curtains it relies on the FITeagle framework to get its job done.
Figure 18: Infrastructure monitoring at the NETMODE testbed
The discovery of resources in the FuSeCo testbed is managed by FITeagle. The appropriate steps are listed here (a sketch of such a REST call is given after this list):
1. First of all, testbed owners have to provide the required information about their testbed to FITeagle. They can enter this info on the FITeagle Web portal. This portal will call the FITeagle REST interface, which pushes this information to the repository. In the future it is intended to automate this process; in that case the resource adapters will do this automatically by calling the REST interface.
2. When an experimenter wants to discover all resources, he/she can do so on the Web portal. The portal will call the REST interface of FITeagle, which will collect the required information from the repository.
3. Experimenters can also discover resources using the VCT Tool. It will call the same REST interface, and the same information will be collected from the repository.
4. Experimenters can also use SFA tools to discover resources. This is possible because of the SFA Aggregate Manager deployed at FuSeCo. It will translate the SFA discovery call to the appropriate FITeagle REST call, and communicate the results back.
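To make this flow concrete, the sketch below shows how a tool could query a REST interface like FITeagle's for the list of resources. The base URL, resource path and JSON structure are assumptions; only the pattern (client calls the REST interface, which collects the information from the repository) is illustrated.

import json
import urllib.request

BASE = "https://fuseco.example.org/fiteagle/api"  # hypothetical base URL

def list_resources():
    # Hypothetical path; the REST interface answers from the repository.
    with urllib.request.urlopen(BASE + "/resources") as resp:
        return json.load(resp)

for resource in list_resources():
    print(resource.get("name"), resource.get("type"))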
Figure 19: Discovery in the FuSeCo testbed
Provisioning in the FuSeCo testbed is also managed by FITeagle. The process goes as follows:
1. Using the VCT Tool, an experimenter books a certain experiment on FuSeCo. To do so, the tool will contact the FITeagle REST interface.
2. This request will be handled by the FITeagle Request processor. It retrieves the details about the experiment from the repository, checks whether it is possible and allowed using the policy engine, and if all is good it will book it using the orchestration engine.
3. The orchestration engine will compile the correct sequence of provisioning commands, and will send them in the correct order to the FITeagle gateway.
4. This gateway forwards these provisioning commands to the appropriate domain managers. These will forward them through the resource adapters to the actual testbed components.
Note that an experimenter can do the same using an SFA tool. In that case the experiment provisioning request first goes to the SFA aggregate manager, where it will be translated to the appropriate call on the FITeagle REST interface.
A.3 Smart Santander
The Smart Santander testbed is a large-scale smart city deployment in the Spanish city of Santander. It consists of more than 1000 sensor nodes based on IEEE 802.15.4 technology, and 23 gateway nodes interconnecting these sensor nodes to the Internet. The testbed supports two types of experiments: Internet of Things native experimentation (wireless sensor network experiments) and service provision experiments (applications using smart city data). For the former a testbed runtime based on Wisebed is used; for the latter the Telefonica iDAS system is used, together with a specific service provisioning gateway. At the moment, it is only planned to include service provision experiments in Fed4FIRE. In this case, no reservations are needed, and no typical provisioning has to be performed.
Figure 20: Provisioning in the FuSeCo testbed
The architecture of Smart Santander is illustrated below. Only the part related to service provision experiments needs further explanation, since for now only this is within the scope of Fed4FIRE. In this case it is important to know that all sensors deployed in the testbed continuously upload their data to the iDAS system. Service applications retrieve this information according to their needs. They always have to contact the iDAS system through the Service Provisioning Gateway; direct contact is not allowed.
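As an illustration, a service provision experiment could retrieve observations roughly as sketched below. The gateway URL, query parameters and response fields are hypothetical; the point is simply that every request passes through the Service Provisioning Gateway instead of contacting iDAS directly.

import json
import urllib.request

GATEWAY = "https://gateway.smartsantander.example/idas"  # hypothetical URL

def latest_observations(phenomenon):
    # The gateway mediates all access to the iDAS system.
    url = "{}/observations?phenomenon={}".format(GATEWAY, phenomenon)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

for obs in latest_observations("temperature"):
    print(obs["sensor"], obs["value"], obs["timestamp"])  # assumed fields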
A.4 OFELIA testbeds
In OFELIA, experiments are all about OpenFlow, on top of optical hardware or packet switches. In a typical experiment some virtual machines are deployed both as data sources and as data consumers, an OpenFlow path is established between them, and an OpenFlow controller is deployed that can dynamically change the path based on certain conditions. In Fed4FIRE, three different OFELIA testbeds are represented: the instantiations at i2CAT (Experimenta), the University of Bristol and iMinds (Virtual Wall). Since the architecture of these three OFELIA deployments is identical, they are described here together.
Figure 21: Experiment monitoring in the Smart Santander testbed
In OFELIA much of the functionality is included in the component called Expedient. It provides the web-based GUI, acts as a clearinghouse and hosts the plug-ins for the VT and Optin components. The VT component in turn is responsible for the management of the virtual machines. The Optin component is responsible for the management of the OpenFlow components of the testbed. Whenever the Optin component wants to manipulate this OpenFlow hardware, the corresponding command is first verified by the Flowvisor component. This ensures that experiments cannot alter each other's flowspace. If an experimenter deploys a NOX server to control the OpenFlow paths, it is deployed on a virtual machine belonging to the experiment.

Provisioning in OFELIA is rather straightforward. The experimenter uses Expedient to give the provisioning command, while the VT plug-in makes sure that the appropriate calls get redirected to the XML-RPC interface of the VT AM. The Optin plug-in on the other hand will redirect the corresponding calls to the XML-RPC interface of Optin.

It should be mentioned that within the OFELIA project a lot of effort is currently being invested in a redesign of this architecture. Since it is not yet released it cannot be described in detail in this deliverable, but it can be stated that in this redesign the monolithic Expedient tool is replaced by a more modular design, and that SFA support will be provided natively (based on GENI RSpecs).
Figure 22: Provisioning in OFELIA
Experiment control in OFELIA is also rather straightforward. Experimenters can always log in as root on their virtual machines using SSH. If they want to control their OpenFlow paths, they have to log in to their specific VM running the NOX service. When any settings on this NOX service are changed, the corresponding configuration commands sent to the OpenFlow hardware will always be validated first by the Flowvisor component.
A.5 Virtual Wall
The Virtual Wall is an Emulab instantiation at iMinds (Ghent). It comprises three separate installations of 100 nodes each. On each setup, any desired topology can be emulated. This includes features such as LAN and WAN connectivity, and link configuration (delay, jitter, throughput). This emulation is based on automatic VLAN configurations.
At the management plane of the Virtual Wall, three different components can be distinguished. The Boss server is the main management entity, and is responsible for the inventory of the resources, the disk imaging service and the chassis manager. The Ops server is responsible for user-related aspects, while a separate file server provides all home and project directories. Provisioning at the Virtual Wall goes as follows:
1. The experimenter creates the provision request. For this, he/she can use the native Emulab web interface, a specific Python script or the Flack tool. Each tool calls a different interface on the Boss server (Web, XML-RPC, SFA); a sketch of such a scripted call is given after this list.
2. The Boss server will first deploy the appropriate images to the hard drives of the nodes participating in the experiment. This is done using the Frisbee tool. It will also configure the appropriate VLANs on the switch that interconnects all nodes. This way, specific topologies can be emulated. If link characteristics have been defined, it will also take into account any impairment nodes that have to be in place behind the curtains.
3. Once the images are all deployed, the chassis manager will perform a cold reboot of all nodes.
4. Once rebooted, all nodes mount their home and project directories on the file server. If a startup script is present, it will be automatically started.
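The scripted variant of step 1 could look roughly like the sketch below. The endpoint, method name and argument structure are hypothetical stand-ins (the real Boss XML-RPC API is not reproduced here); it only illustrates that a topology, described in an NS script, is submitted to the Boss server for provisioning.

import xmlrpc.client

# Hypothetical XML-RPC endpoint of the Boss server.
boss = xmlrpc.client.ServerProxy("https://boss.virtualwall.example:3069/")

# Hypothetical method and arguments; the NS script describes the topology.
response = boss.startExperiment({
    "project": "fed4fire",
    "experiment": "demo1",
    "nsfile": open("topology.ns").read(),
})
print(response)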
Figure 23: Provisioning at the Virtual Wall
Experiment control on the Virtual Wall goes as follows:
1. If an experimenter wants to manipulate his/her nodes, he/she just logs in to them as root using SSH.
2. If an experimenter wants to manipulate the link characteristics, he/she does so using the web interface, Python script or Flack. These tools call the Boss server, which instructs the corresponding impairment nodes to change their settings appropriately.
3. It has been demonstrated that the experiment control capabilities of OMF can be installed on top of the Virtual Wall. In that case an OMF resource controller has to be added to the images used by the nodes.
A.6 PlanetLab Europe
PlanetLab Europe (PLE) is a European instance of PlanetLab. This means that organisations can join the testbed if they provide two physical servers at their premises, which are directly connected to the Internet (no firewalls) and are put under the control of the PLE management software. Once you are a member of PLE, you can request virtual machines on any of the PLE physical servers. These virtual machines will always be directly connected to the public Internet. As an experimenter, you have full control over these virtual machines.
Several tools can be used in PLE for discovery. The different steps are explained here (a sketch of the XML-RPC approach is given after this list):
1. The default PlanetLab management component is called MyPLC. It provides an XML-RPC interface which can e.g. list all resources. When using an appropriate tool such as the library called Plcsh, experimenters can call this XML-RPC interface for discovery.
2. On top of this XML-‐RPC interface of MyPLC, the INRIA Generic SFA wrapper has been deployed. It can translate SFA calls to the appropriate XML-‐RPC calls. This way, any SFA client can be used to discover resources on PLE.
3. Specific to PLE is the MySlice management entity. Its core functionality is exposed in several ways (Python library, XML-‐RPC API, web interface), but it comes down to one thing: it combines the standard information available in the MyPLC component with any additional information collected by the TopHat framework. This can be information about connectivity, node reliability, etc. TopHat can include information from any desired data source, here CoMon is given as an example.
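For the first approach, a discovery call against the MyPLC XML-RPC interface could look as follows. GetNodes is a standard PLCAPI method; the endpoint shown is assumed to be the public PLE one, and the credentials are obviously placeholders.

import xmlrpc.client

plc = xmlrpc.client.ServerProxy("https://www.planet-lab.eu/PLCAPI/")  # assumed endpoint
auth = {"AuthMethod": "password",
        "Username": "experimenter@example.org",  # placeholder credentials
        "AuthString": "secret"}
# Empty filter; only the fields we care about are returned.
for node in plc.GetNodes(auth, {}, ["hostname", "boot_state"]):
    print(node["hostname"], node["boot_state"])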
Figure 24: Discovery in PlanetLab Europe
Experiment control is supported in two different ways on PLE:
1. Experimenters can log in as root on their VMs using SSH, and perform any experiment control manually.
2. Experimenters can use OMF on PLE. In that case, they must have selected a specific configuration for their VMs that contains the OMF resource controller.
A.7 BonFIRE testbeds
The focus of BonFIRE is on cloud computing experiments. The difference with running a cloud service on a commercial platform is that on BonFIRE, experimenters can retrieve highly detailed information about the status of the physical hosts of the virtual machines. In addition, experimenters can better control how elasticity is performed, and on which physical machines their virtual machines will be deployed (if this control is desired). BonFIRE is represented in Fed4FIRE with three different instantiations: the EPCC site, the Grid'5000 site and the Virtual Wall site. The BonFIRE architecture is shown in Figure 26; it is depicted for a single instance. An important component is the BonFIRE broker. It contains the Experiment manager, which exposes the BonFIRE API. Another element of the broker is the Resource manager, which is responsible for reservations, scheduling, accounting and monitoring. The enactor is the third broker element; it translates commands from the resource manager to the appropriate API calls of a specific BonFIRE testbed, so it can be considered as a generic BonFIRE wrapper around a testbed. On the testbed itself, the testbed manager processes these calls from the enactor. An SSH gateway, storage capabilities and optionally a Zabbix instance are also expected to be present at any BonFIRE testbed.
Figure 25: Experiment control in PLE
Provisioning in BonFIRE goes as follows:
Figure 26: BonFIRE architecture
1. The experimenter defines his/her experiment on the Portal. In that case, the provision command is an OCCI create request, which is called on the OCCI interface of the Resource manager (a sketch of such a request is given after this list). The Resource manager will then forward the appropriate OCCI provision requests to the enactors belonging to the testbeds participating in the experiment. As a result, the required virtual machines will be created.
2. The experimenter can also define his/her experiment using a BonFIRE client that calls the BonFIRE API. In that case the Experiment manager will first translate this request to the corresponding OCCI call on the Resource manager. From then on, everything proceeds as in the previous step.
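A hedged sketch of the OCCI create request of step 1 is given below. The endpoint, payload schema and content type are illustrative assumptions rather than the exact BonFIRE API definition.

import urllib.request

# Assumed XML payload describing the virtual machine to create.
payload = b"""<compute xmlns="http://api.bonfire-project.eu/doc/schemas/occi">
  <name>experiment-vm-1</name>
  <instance_type>small</instance_type>
</compute>"""

req = urllib.request.Request(
    "https://api.bonfire.example/locations/epcc/computes",  # hypothetical endpoint
    data=payload,
    headers={"Content-Type": "application/vnd.bonfire+xml"},  # assumed content type
    method="POST")
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read()[:200])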
Monitoring is a key aspect in BonFIRE. Both the physical and the virtual machines can be monitored in detail using the Zabbix software:
1. When monitoring the physical hosts, the experimenter contacts the Zabbix server running at the specific testbed on which his/her experiment is running. The Zabbix server can be called through the Zabbix web interface, or through a BonFIRE client which in this case calls the Zabbix API.
2. When monitoring the virtual machines, the experimenter can use the same tools, but this time he/she connects to a specific virtual machine. On this machine, a specific Zabbix server is deployed that only monitors the virtual machines belonging to the experiment.
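A sketch of such a monitoring query through the Zabbix JSON-RPC API is shown below. The user.login and history.get methods are standard Zabbix API calls; the server URL, credentials and item id are placeholders.

import json
import urllib.request

ZABBIX = "https://zabbix.bonfire.example/api_jsonrpc.php"  # placeholder URL

def call(method, params, auth=None, req_id=1):
    body = json.dumps({"jsonrpc": "2.0", "method": method,
                       "params": params, "auth": auth, "id": req_id}).encode()
    req = urllib.request.Request(ZABBIX, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]

token = call("user.login", {"user": "experimenter", "password": "secret"})
history = call("history.get", {"itemids": "23296", "limit": 10}, auth=token)
print(history)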
A.8 Grid'5000
As mentioned, Grid'5000 is one of the BonFIRE instances in Fed4FIRE. At the moment, only inclusion of this testbed through the BonFIRE framework is considered. However, it is not impossible that in a later phase it will be decided to include the entire Grid'5000 testbed natively in Fed4FIRE. In that
case the architecture of this testbed becomes important. As can be seen in Figure 27, this testbed is managed by a rather large collection of loosely coupled tools. Since this architecture is currently out of scope for Fed4FIRE, a detailed discussion is not appropriate here. However, it is important to know that experimenters can only get SSH access to their nodes through an SSH gateway and site frontend. A reference API is provided for other user tools. A very important management component is OAR2, which can perform advanced scheduling.
Discovery in Grid’5000 goes as follows:
1. Experimenters can call the OAR2 custom discovery interface using SSH. In this case they first have to log in to the site frontend, through the SSH gateway.
2. Experimenters can also call the OAR2 custom discovery interface using the Web portal. In this case the Reference API is called, which in turn calls the OAR2 interface (see the sketch below).
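The Reference API is a REST interface; a minimal sketch of querying it directly is given below. The version path and response layout are assumptions, and the authentication required for external access is omitted for brevity.

import json
import urllib.request

API = "https://api.grid5000.fr/stable"  # assumed Reference API entry point

def get(path):
    with urllib.request.urlopen(API + path) as resp:
        return json.load(resp)

# Assumed JSON collection format with an "items" list.
for site in get("/sites")["items"]:
    print(site["uid"])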
Figure 27: Discovery in Grid'5000