CS ROMANIA SA PACII 29, 200692 CRAIOVA, ROMANIA

TEL: +40 (0)251 412850 FAX: +40 (0)251 417307

EMAIL: [email protected]; WEB: www.c-s.ro

ISO 9001:2015 OHSAS 18001:2008

ISO 14001:2015 ISO 27001:2013

CS ROMANIA S.A.

WITH SUBSCRIBED AND PAID-IN SHARE CAPITAL 114.800 LEI - R.C. DOLJ J16/2041/91 - SIRUES 164431207 - C.F. RO2316981

CS COMMUNICATION & SYSTÈMES

Date : 8 December 2017

Origin : CS ROMANIA SA

Business : Tool Augmentation by User Enhancements and Orchestration (TAO)

Title : DESIGN JUSTIFICATION FILE (DJF)

Reference : V16ESA0101/DJF0010 version 1.1

Status : APPROVED

Last name / First name Date Signature

Authors: Ilioiu Alexandru, Fomferra Norman 2017-05-16

Verified by: Udroiu Cosmin 2017-06-19

Approved by: Cara Cosmin 2017-06-20

RESTRICTED CLIENT

TAO

Design Justification File (DJF)

date

08/12/2017

page

2 / 15

reference

V16ESA0101/DJF0010

version

1.1

© Copyright TAO Consortium

RESTRICTED CLIENT

SUCCESSIVE VERSIONS

Vers.  Date        Authors                 Verification  Approval  Motive
1      16/05/2017  A. Ilioiu, N. Fomferra  C. Udroiu     C. Cara   Creation.
1.1    08/12/2017  C. Cara                 C. Udroiu     C. Cara   Updated section 3.10


TABLE OF CONTENTS

1 INTRODUCTION
1.1 PURPOSE AND SCOPE
1.2 STRUCTURE OF THE DOCUMENT
1.3 REFERENCES
1.3.1 Applicable documents
1.3.2 Reference documents
2 TERMS AND ABBREVIATIONS
2.1 TERMS
2.2 ABBREVIATIONS
3 DESIGN DECISIONS
3.1 GENERAL
3.2 SYNTHESIS STACK DIAGRAM
3.3 PLATFORM GUI
3.4 GUI FOR WORKFLOW DEFINITION
3.5 RESOURCE / USER CATALOG
3.6 DATA VISUALIZATION
3.7 SERVICES CONNECTOR
3.8 PROCESSING COMPONENT DEPLOYMENT
3.9 SERVICE LAYER
3.10 CLUSTER MANAGER
3.11 DATA ACCESS CONNECTOR
3.12 DATABASE
3.13 INTEGRATION WITH EXTERNAL PROCESSING PLATFORMS
3.14 SNAP TOOLBOX INTEGRATION
3.14.1 Additional SNAP Operators and Operator Enhancements

LIST OF TABLES

Table 3-1 : List of operators not available in SNAP
Table 3-2 : List of SNAP operators that should be enhanced

LIST OF FIGURES

Figure 1 : High-level overview of the TAO logical model
Figure 2 : Stack TAO components and COTS
Figure 3 : Comparison between jQuery and react


1 INTRODUCTION

This document represents the Design Justification File (DJF) for the “Tool Augmentation by User Enhancements and Orchestration (TAO)” project funded by the European Space Agency (ESA).

1.1 PURPOSE AND SCOPE

The overall objectives of the TAO project are to:

1. Assess the existing software toolboxes, libraries and processing frameworks in order to identify commonalities and reuse scenarios;
2. Query the EO user communities in order to extract a common set of requirements to be fulfilled by the TAO framework;
3. Select the relevant open standards for Machine-to-Machine and Human-Machine interfaces that would allow opening the framework to other software toolboxes;
4. Design and develop a software framework for the integration and orchestration of heterogeneous processing modules and libraries that would allow the automation and parallelization of processing chains;
5. Define several use case scenarios that would allow demonstrating the effectiveness of the developed framework.

The DJF presents the results of all significant design choices, trade-offs, technical analyses, and benchmarking assessments justifying the design of the processing component integration and workflow orchestration.

It records all relevant information showing that the proposed solutions meet the requirements.

1.2 STRUCTURE OF THE DOCUMENT

This document contains sections that describe:

• the decisions taken in terms of system design
• a few considerations regarding alternative solutions

1.3 REFERENCES

1.3.1 Applicable documents

[A1] – TAO Software Requirements Specification CSRO/DMAS/CC/AI/17/0329 version 1.1

1.3.2 Reference documents

[R1] – TAO Statement of Work, ESA-EOPG-GSTP-SOW-0004 issue 1 rev 1

[R2] – TAO Technical Proposal, CSRO/DMAS/CC/ET/16/0536

[R3] – Software requirements specification (SRS) - DRD, ECSS-E-ST-40C Annex D

[R4] – Technologies, Interfaces and Standards Technical Note, V16ESA0101/TN0001 v1

[R5] – User Survey Technical Note, V16ESA0101/TN0002 v1

[R6] – Software and Tools Technical Note, V16ESA0101/TN0003 v1


2 TERMS AND ABBREVIATIONS

2.1 TERMS

Term: Description

Catalog: Collection of metadata information about a list of (software) objects. The objects can be stored in a database or a file system.

Cloud: A service for the delivery of on-demand computing resources (everything from applications to data centers) over the internet on a pay-for-use basis.

Cluster (computing): A set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system. Computer clusters have each node set to perform the same task, controlled and scheduled by software.

Data product: A descriptor for one or more files. The files represent an EO product, such as a GeoTIFF raster file or a Sentinel-2 collection.

Data source: Describes how a set of data can be accessed; it is an abstract representation of a way to connect to a live set of data (file system, database, stream, web service, etc.).

Deployment: Process by which the processing components are transferred, installed and configured on a node.

Framework: A universal, reusable software environment that provides particular functionality as part of a larger software platform to facilitate the development of software applications, products and solutions.

Grid (computing): The collection of computer resources from multiple locations to reach a common goal, with each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed (thus not physically coupled).

Job: At execution time, a workflow is interpreted as a job to be executed.

Module: By using a modular design, a system can be subdivided into smaller parts called modules, which can be independently created and then used in different systems.

Open source: Computer software with its source code made available under a license in which the copyright holder provides the rights to study, change, and distribute the software to anyone and for any purpose.

Orchestration: The automated arrangement, coordination, and management of computer systems, middleware, and services. An "orchestrator" is understood to be the entity which manages complex cross-domain (system, enterprise, firewall) processes and handles exceptions. Orchestration includes a workflow and provides a directed action towards larger goals and objectives.

Packaging: The operation, handled by a collection of software tools, that automates the process of installing, upgrading, configuring, and removing computer programs for a computer's operating system in a consistent manner.

Platform: A place (logical or hardware) where a piece of software is executed. Platforms can be specialized for certain purposes.

Processing component: A software application or a toolbox whose main goal is to process (perform scientific treatments on) input data. Examples: OTB, Sentinel Toolbox.

Processing node: A processing location (a hardware or virtual machine) where processing components are deployed.

Scheduler: A software application for controlling the unattended background execution of jobs.

Task: A pre-defined operation within a job.

Toolbox: A set of modules (or functions) which serve a common goal.

Workflow: Processing service provided in the form of a set of pre-defined operations chained into a graph. Workflows have pre-defined input and output data types, orchestration logic and scope of the processing, which may be customized by the user via a set of processing parameters.
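As a rough illustration of how the Workflow, Job and Task terms relate, the following Java sketch models a workflow as a graph of operations and derives an order in which the tasks of the resulting job could be executed. The class WorkflowSketch, its methods and the operation names are hypothetical and are not taken from the TAO design; topological sorting (Kahn's algorithm) is used here only as one plausible way to interpret such a graph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a workflow as a directed graph of operations. */
final class WorkflowSketch {
    // Maps each operation to the operations that consume its output.
    private final Map<String, List<String>> successors = new LinkedHashMap<>();

    void addOperation(String name) {
        successors.putIfAbsent(name, new ArrayList<>());
    }

    void connect(String from, String to) {
        addOperation(from);
        addOperation(to);
        successors.get(from).add(to);
    }

    /** Returns the tasks of the job in an order that respects every link (Kahn's algorithm). */
    List<String> executionOrder() {
        // Count, for every operation, how many inputs are still waiting for a producer.
        Map<String, Integer> pendingInputs = new LinkedHashMap<>();
        for (String op : successors.keySet()) {
            pendingInputs.put(op, 0);
        }
        for (List<String> targets : successors.values()) {
            for (String target : targets) {
                pendingInputs.merge(target, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pendingInputs.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String op = ready.remove();
            order.add(op);
            for (String next : successors.get(op)) {
                if (pendingInputs.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != successors.size()) {
            throw new IllegalStateException("workflow contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        WorkflowSketch wf = new WorkflowSketch();
        wf.connect("read", "calibrate");
        wf.connect("calibrate", "classify");
        wf.connect("read", "classify");
        System.out.println(wf.executionOrder());
    }
}
```

In this reading, each entry of the returned list would correspond to one task of the job; a real orchestrator would additionally run independent tasks in parallel.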

2.2 ABBREVIATIONS

Abbreviation Definition

API Application Programming Interface

AWS Amazon Web Services

CSW Catalogue Service for the Web

EO Earth Observation

ESA European Space Agency

GeoTIFF Georeferenced Tagged Image File Format

GIS Geographical Information System

GPF Graph Processing Framework

GPT Graph Processing Tool

GUI Graphical User Interface

HT Hyper Threading

HTTP HyperText Transfer Protocol

IP Internet Protocol

JNA Java Native Access

JNI Java Native Interface

JRMP Java Remote Method Protocol

JSE Java Standard Edition

LDAP Lightweight Directory Access Protocol

LGPL Lesser General Public License

MIT Massachusetts Institute of Technology

NetBIOS Network Basic Input/Output System

OGC Open Geospatial Consortium


OTB Orfeo ToolBox

RAM Random Access Memory

REST REpresentational State Transfer

RQa Additional Requirement

SMT Simultaneous MultiThreading

SNAP SeNtinels Application Platform

SOAP Simple Object Access Protocol

SRSD Software Requirements Specification Document

SSH Secure Shell

SSO Single Sign-On

TAO Tool Augmentation by user enhancements and Orchestration

TCP Transmission Control Protocol

UML Unified Modelling Language

URL Uniform Resource Locator

V&V Verification & Validation

WCS Web Coverage Service

WMS Web Map Service

WPS Web Processing Service

XML eXtensible Markup Language


3 DESIGN DECISIONS

3.1 GENERAL

The purpose of the TAO platform is to provide a means for the orchestration of heterogeneous processing components and libraries in order to process scientific data. As explained in the TAO SRS [A1], in terms of macro-components, the logical data flow is described by the following diagram:

Figure 1 : High-level overview of the TAO logical model

The TAO platform is designed as a web application that integrates other components specialized in different activities, and is developed in Java using the Spring Framework (https://spring.io/).

The Spring Framework is lightweight, uses modern techniques such as Aspect-Oriented Programming, Inversion of Control and Dependency Injection, and has native support for web services.

3.2 SYNTHESIS STACK DIAGRAM

The following diagram gives an overview of the external components chosen to cover the different functional layers. Note that the coverage of a component can cut across multiple layers, meaning that a single external component can respond to requirements at different levels.

[Diagram not reproduced. Labels recoverable from the extraction: End-User, User Workspace, Resource Management, Workflow Execution Management, Processing Components Integration, External Data Providers, Data Products, Processing Components, Workflow Definition, Workflow Configuration, Execution, Query Data Products, Query Processing Components.]

Figure 2 : Stack TAO components and COTS

3.3 PLATFORM GUI

[TAO] The user interface of the application will be written in HTML5. HTML5 is the latest version of the HyperText Markup Language, is supported by current browsers (IE, Chrome, Firefox), and offers new features such as:

• a simplified and clear syntax
• the possibility to include multimedia elements such as fluid animations and video streams without the help of specialized frameworks like Flash or Silverlight
• natural semantic tags instead of generic DIV tags
• elegant forms with native validation, which reduces the need for JavaScript
• reducing the size of cookies, or replacing them entirely, by using the sessionStorage and localStorage mechanisms
• obtaining the user's geographical location
• improved performance through the JavaScript threading mechanism

3.4 GUI FOR WORKFLOW DEFINITION

[COTS] Based on the analysis in [R4], jsPlumb (https://jsplumbtoolkit.com/) was selected as the framework for workflow definition.


3.5 RESOURCE / USER CATALOG

[COTS] The "catalog" feature will be handled by the third-party application GeoStorm (http://www.geostorm.eu/), a geospatial platform. It provides most of the required functionality and will be extended to cover all requirements of the TAO platform for handling concepts such as processing component, workflow definition, node, etc. Furthermore, GeoStorm will be used for EO data visualization.

3.6 DATA VISUALIZATION

[COTS] The data products used in TAO (such as images, raster maps, vector maps, sensors, etc.) will be visualized with GeoStorm's rendering feature, which is based on open-source components such as OpenLayers and MapServer.

3.7 SERVICES CONNECTOR

[COTS] The Service Layer will expose a series of REST web services that will be consumed by the platform GUI through jQuery (https://jquery.com/). jQuery is a fast, small, and feature-rich JavaScript library.

An alternative is the JavaScript library React.js (https://facebook.github.io/react/). React is a library for designing and rendering user interfaces. A major difference from jQuery is that React works through a "virtual DOM", which is essentially a lightweight description of the HTML elements rather than the elements themselves, whereas jQuery interacts with the DOM directly. The idea is that DOM elements carry around a lot of data that is not needed for rendering, and the virtual DOM abstracts the relevant parts, allowing for faster performance. In React, the application modifies the virtual DOM; React then compares it with the previous state and applies only the necessary changes to the real DOM.

Figure 3 : Comparison between jQuery and react
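The virtual DOM idea described above can be illustrated independently of any browser. The sketch below is written in Java, the platform language, purely for illustration; it is not React code and does not reflect React's actual reconciliation algorithm. VNode, VirtualDomSketch and the patch strings are hypothetical names, and the diff is deliberately naive (children are compared by position only).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lightweight description of an element: tag, text and children only. */
final class VNode {
    final String tag;
    final String text;
    final List<VNode> children;

    VNode(String tag, String text, VNode... children) {
        this.tag = tag;
        this.text = text;
        this.children = List.of(children);
    }
}

final class VirtualDomSketch {
    /** Compares two non-null virtual trees and lists the changes the real DOM would need. */
    static List<String> diff(VNode oldNode, VNode newNode) {
        List<String> patches = new ArrayList<>();
        diff(oldNode, newNode, "root", patches);
        return patches;
    }

    private static void diff(VNode oldNode, VNode newNode, String path, List<String> patches) {
        if (oldNode == null) {
            // Element exists only in the new tree.
            patches.add("CREATE " + path + " <" + newNode.tag + ">");
        } else if (newNode == null) {
            // Element exists only in the old tree.
            patches.add("REMOVE " + path);
        } else if (!oldNode.tag.equals(newNode.tag)) {
            // Different element type: replace the whole subtree.
            patches.add("REPLACE " + path + " <" + newNode.tag + ">");
        } else {
            if (!oldNode.text.equals(newNode.text)) {
                patches.add("SET_TEXT " + path + " \"" + newNode.text + "\"");
            }
            int count = Math.max(oldNode.children.size(), newNode.children.size());
            for (int i = 0; i < count; i++) {
                VNode oldChild = i < oldNode.children.size() ? oldNode.children.get(i) : null;
                VNode newChild = i < newNode.children.size() ? newNode.children.get(i) : null;
                diff(oldChild, newChild, path + "/" + i, patches);
            }
        }
    }

    public static void main(String[] args) {
        VNode before = new VNode("ul", "",
                new VNode("li", "job 1: running"), new VNode("li", "job 2: queued"));
        VNode after = new VNode("ul", "",
                new VNode("li", "job 1: done"), new VNode("li", "job 2: queued"));
        System.out.println(diff(before, after));
    }
}
```

Only the changed list item produces a patch; the unchanged item is never touched, which is the source of the performance argument made in the text.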

3.8 PROCESSING COMPONENT DEPLOYMENT

[COTS] Each processing component (OTB, SNAP, Python, etc.) needs to be installed into a lightweight, standalone, isolated and safe container. A container image is a packaged application together with all the parts it needs, such as libraries and other dependencies. Containers behave like a virtual machine: to the outside world, they can look like their own complete system. But unlike a virtual machine, a container does not replicate an entire operating system, only the individual components it needs in order to operate. This gives a significant performance boost and reduces the size of the application.

Docker Engine (https://www.docker.com/) is one of the leading software container platforms. In the TAO platform it will be used to deploy processing components onto nodes and to manage them through the Docker agent.
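A minimal sketch of how a containerized processing component might be started on a node is given below. It only assembles a standard "docker run" command line ("--rm" and "-v" are regular Docker CLI options); the class DockerRunSketch, the image name tao/snap:latest, the /data mount point and the host directory are assumptions made for illustration and are not part of the TAO design.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles a "docker run" invocation for a processing component. */
final class DockerRunSketch {
    static List<String> buildCommand(String image, String hostDataDir, List<String> componentCommand) {
        List<String> cmd = new ArrayList<>();
        cmd.add("docker");
        cmd.add("run");
        cmd.add("--rm");                  // remove the container once the component exits
        cmd.add("-v");
        cmd.add(hostDataDir + ":/data");  // share the node's data directory (assumed mount point)
        cmd.add(image);
        cmd.addAll(componentCommand);     // the command executed inside the container
        return cmd;
    }

    public static void main(String[] args) {
        // Image name and paths are illustrative assumptions.
        List<String> cmd = buildCommand("tao/snap:latest", "/mnt/products",
                List.of("gpt", "/data/graph.xml"));
        System.out.println(String.join(" ", cmd));
        // To actually start the container on a node where Docker is installed:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Keeping the command as a list of arguments, rather than a single string, avoids shell quoting problems when paths contain spaces.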

3.9 SERVICE LAYER

[TAO] To keep the platform flexible, web services should be implemented with the built-in capabilities of the Spring framework (e.g. Spring-WS).

3.10 CLUSTER MANAGER

[COTS] The most interesting feature of the TAO framework is the possibility to create and execute complex workflows on distributed nodes. For this responsibility we consider Torque Resource Manager (http://www.adaptivecomputing.com/products/open-source/torque/) a suitable candidate: it provides control over batch jobs and distributed compute nodes. Torque works together with Moab, a component separate from the resource manager which, put simply, schedules the workload as jobs.

At the same time, we consider another candidate, an open-source, fault-tolerant cluster management and job scheduling system: Slurm Workload Manager (https://slurm.schedmd.com/). It fits the TAO requirements through its main capabilities:

• allocates resources within a cluster
• launches and manages jobs
• supports complex scheduling algorithms
• supports resource limits (by queue, user, group)
• is portable through its plugin mechanism

During the implementation of the Torque DRMAA Java binding, several issues were found in Torque or in the Torque DRMAA C binding:

• General unreliability:
  o we have seen Torque completely refuse to start new jobs after a job with a questionable configuration was submitted (the account that submitted it was missing on the processing node); for some reason, this only happened more than a day later;
  o removing the job mentioned above was really hard, as it kept coming back; this might have been a user error, or it may be expected behaviour.
• Unexpected API semantics:
  o the client library caches job information on its connection object; if the caller closes the connection and creates a new one later, the library will not recognize the submitted jobs, even though the server is aware of them.
• Quality of implementation issues:
  o there seem to be a number of memory and thread-safety issues that were, and still are, being corrected. Reference: https://i.imgur.com/T8pyOSw.png
  o this is one we have run into: https://github.com/adaptivecomputing/torque/pull/436


  o the client libraries use a mixed error handling approach, with both error codes and global variables; this plays badly with threading (which was probably introduced at a later time), and sometimes errors are not propagated correctly
  o return values are sometimes not checked: https://github.com/adaptivecomputing/torque/blob/develop/src/drmaa/src/submit.c#L271
  o locking is sometimes missing for static variable initialization: https://github.com/adaptivecomputing/torque/blob/develop/src/drmaa/src/session.c#L554-L558
• Maintenance concerns:
  o the PR linked above still has no response from the maintainers
  o the unit tests (CI) seem to fail (and have been failing for a long time)

After this experience, Slurm becomes the recommended candidate for the Cluster Manager software, and the implementation of the Slurm DRMAA Java binding will be given higher priority.
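For orientation, the sketch below shows what a job submission to Slurm could look like from Java. Note that it goes through the sbatch command line rather than through the DRMAA binding discussed above, so it is an alternative illustration and not the planned implementation. The options used (--job-name, --partition, --cpus-per-task, --wrap) are standard sbatch options, and "Submitted batch job <id>" is the message sbatch prints on success; the class name, job name, partition name and wrapped command are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper around the sbatch command line (not the DRMAA binding). */
final class SlurmSubmitSketch {
    private static final Pattern SUBMITTED = Pattern.compile("Submitted batch job (\\d+)");

    /** Builds an sbatch invocation that wraps a single shell command as a batch job. */
    static List<String> buildSbatch(String jobName, String partition, int cpusPerTask, String command) {
        return List.of(
                "sbatch",
                "--job-name=" + jobName,
                "--partition=" + partition,
                "--cpus-per-task=" + cpusPerTask,
                "--wrap=" + command);
    }

    /** Extracts the numeric job id from the line sbatch prints after a successful submission. */
    static long parseJobId(String sbatchOutput) {
        Matcher m = SUBMITTED.matcher(sbatchOutput);
        if (!m.find()) {
            throw new IllegalArgumentException("unexpected sbatch output: " + sbatchOutput);
        }
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) {
        // Job name, partition and command are illustrative assumptions.
        List<String> cmd = buildSbatch("tao-ndvi", "compute", 4, "gpt /data/graph.xml");
        System.out.println(String.join(" ", cmd));
        System.out.println(parseJobId("Submitted batch job 4242"));
    }
}
```

Because the caller keeps the numeric job id itself, a job can still be queried after the client reconnects, which sidesteps the connection-bound job cache described under "Unexpected API semantics".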

3.11 DATA ACCESS CONNECTOR

[COTS] For Object-Relational Mapping (ORM), the Spring framework, chosen as the development platform, supports integration with Hibernate (http://hibernate.org/orm/) and the Java Persistence API (JPA). The main goals of Spring's ORM integration are clear application layering, independent of the data access and transaction technology, and loose coupling of application objects. In this way we can avoid business service dependencies on the data access or transaction strategy, hard-coded resource lookups, hard-to-replace singletons, custom service registries, etc.
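The layering argument above can be sketched in plain Java, without Spring or Hibernate on the classpath. All names below (WorkflowRepository, InMemoryWorkflowRepository, WorkflowService) are hypothetical; in the real platform the repository implementation would presumably be backed by Hibernate/JPA and wired in by the Spring container rather than constructed by hand.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Data access contract: the only thing the business layer knows about persistence. */
interface WorkflowRepository {
    void save(String name, String definition);

    Optional<String> findByName(String name);
}

/** Stand-in implementation; an ORM-backed class could replace it without touching the service. */
final class InMemoryWorkflowRepository implements WorkflowRepository {
    private final Map<String, String> store = new HashMap<>();

    @Override
    public void save(String name, String definition) {
        store.put(name, definition);
    }

    @Override
    public Optional<String> findByName(String name) {
        return Optional.ofNullable(store.get(name));
    }
}

/** Business service: receives its repository from outside (constructor injection). */
final class WorkflowService {
    private final WorkflowRepository repository;

    WorkflowService(WorkflowRepository repository) {
        this.repository = repository;
    }

    /** Registers a workflow definition unless the name is already taken. */
    boolean register(String name, String definition) {
        if (repository.findByName(name).isPresent()) {
            return false;
        }
        repository.save(name, definition);
        return true;
    }

    public static void main(String[] args) {
        WorkflowService service = new WorkflowService(new InMemoryWorkflowRepository());
        System.out.println(service.register("ndvi", "<graph/>"));
        System.out.println(service.register("ndvi", "<graph/>"));
    }
}
```

The service contains no lookup code and no reference to a concrete persistence technology, which is the property the text attributes to Spring's ORM integration.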

3.12 DATABASE

[COTS] PostgreSQL (https://www.postgresql.org) is a powerful open-source object-relational database system. It offers a number of advantages over its competitors:

• it is proven and mature in production environments
• it implements the SQL standard very well
• it includes support for advanced SQL features such as window functions and common table expressions
• it supports many advanced data types, such as (multi-dimensional) arrays and user-defined types
• it supports a wide range of performance optimizations
• it has a GIS extension (PostGIS) which allows easy manipulation of geographical data types

3.13 INTEGRATION WITH EXTERNAL PROCESSING PLATFORMS

[TAO] Integration with components from an external processing platform will be ensured by implementing the OGC Web Processing Service (WPS) standard in the TAO platform, as both client and server.

In this way a remote processing component can be used inside a workflow definition just like any local processing component.
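One way to picture "just like any local processing component" is a common interface with a local and a remote implementation. The sketch below assumes the key-value-pair (GET) encoding of a WPS 1.0.0 Execute request; the interface ProcessingComponent, both classes, the endpoint URL, the process identifier and the local command-line format are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** What a workflow step needs to know about a component, wherever it runs. */
interface ProcessingComponent {
    String describeInvocation(Map<String, String> inputs);
}

/** Component installed on a processing node; rendered here as a command line (assumed format). */
final class LocalComponent implements ProcessingComponent {
    private final String executable;

    LocalComponent(String executable) {
        this.executable = executable;
    }

    @Override
    public String describeInvocation(Map<String, String> inputs) {
        StringJoiner cmd = new StringJoiner(" ");
        cmd.add(executable);
        inputs.forEach((key, value) -> cmd.add("-" + key).add(value));
        return cmd.toString();
    }
}

/** Component hosted by an external platform and reached through a WPS 1.0.0 Execute request. */
final class RemoteWpsComponent implements ProcessingComponent {
    private final String endpoint;
    private final String identifier;

    RemoteWpsComponent(String endpoint, String identifier) {
        this.endpoint = endpoint;
        this.identifier = identifier;
    }

    @Override
    public String describeInvocation(Map<String, String> inputs) {
        // DataInputs is a semicolon-separated list of key=value pairs, URL-encoded as a whole.
        StringJoiner dataInputs = new StringJoiner(";");
        inputs.forEach((key, value) -> dataInputs.add(key + "=" + value));
        return endpoint + "?service=WPS&version=1.0.0&request=Execute"
                + "&identifier=" + URLEncoder.encode(identifier, StandardCharsets.UTF_8)
                + "&DataInputs=" + URLEncoder.encode(dataInputs.toString(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> inputs = new LinkedHashMap<>();
        inputs.put("input", "S2A_tile");
        // Endpoint and identifiers are illustrative assumptions.
        ProcessingComponent local = new LocalComponent("ndvi_tool");
        ProcessingComponent remote = new RemoteWpsComponent("http://example.org/wps", "ndvi");
        System.out.println(local.describeInvocation(inputs));
        System.out.println(remote.describeInvocation(inputs));
    }
}
```

A workflow engine written against ProcessingComponent would not need to distinguish the two cases; only the implementation decides whether the work happens on a local node or behind a WPS endpoint.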

3.14 SNAP TOOLBOX INTEGRATION


3.14.1 Additional SNAP Operators and Operator Enhancements

To fulfil the proposed user scenarios, the following operators have been identified as either not available in SNAP or lacking an equivalent in OTB, and should therefore be developed:

Operator Name: Fuzzy Decision Tree
Description: The Decision Tree shall enable the assignment of pixels to categories according to several conditions.
• GUI to define the tree
• Definition of classes (name, color, ID)
• Definition of fuzzy logic decisions by band-math expressions
• Include pixel neighborhood in decisions
• Applying without GUI should be possible
• Probability / uncertainty for each pixel to be categorized to a certain class
Use Case / Purpose: Intertidal flat classification; in general, any kind of land cover classification or water type classification

Operator Name: 3-D scatter plot
Description: The 3-D scatter plot is an analysis tool. It should plot selected pixels (under a ROI mask) in 2- and 3-dimensional space.
• The dimensions of the space should be selectable bands.
• The points should be colored. The coloring can either be a density plot or depend on a certain value of the pixels (e.g. from a band: chlorophyll concentration, or a category obtained from a classification).
• The user should be able to rotate the space.
Use Case / Purpose: Analysis of classification results

Table 3-1 : List of operators not available in SNAP
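To make the "fuzzy logic decisions by band-math expressions" and the per-pixel "probability / uncertainty" items more tangible, the following Java sketch computes fuzzy class memberships for a single pixel. It covers only the membership part, not a full decision tree, GUI or neighborhood handling. The class FuzzyClassifierSketch, the class names and every threshold are made-up assumptions, not values from the TAO requirements.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-pixel fuzzy classification based on a band-math expression. */
final class FuzzyClassifierSketch {
    /** Linear ramp membership: 0 at or below lo, 1 at or above hi, linear in between. */
    static double ramp(double value, double lo, double hi) {
        if (value <= lo) {
            return 0.0;
        }
        if (value >= hi) {
            return 1.0;
        }
        return (value - lo) / (hi - lo);
    }

    /** Band-math style expression on two band values of one pixel: (a - b) / (a + b). */
    static double normalizedDifference(double a, double b) {
        return (a + b) == 0.0 ? 0.0 : (a - b) / (a + b);
    }

    /** Degree of membership in [0, 1] for each class; thresholds are illustrative only. */
    static Map<String, Double> memberships(double nir, double red) {
        double index = normalizedDifference(nir, red);
        Map<String, Double> result = new LinkedHashMap<>();
        result.put("vegetation", ramp(index, 0.2, 0.6));
        result.put("water", ramp(-index, 0.0, 0.3));
        result.put("bare", 1.0 - Math.max(result.get("vegetation"), result.get("water")));
        return result;
    }

    /** Assigns the class with the highest membership (first one wins on ties). */
    static String classify(Map<String, Double> memberships) {
        String best = null;
        double bestValue = -1.0;
        for (Map.Entry<String, Double> e : memberships.entrySet()) {
            if (e.getValue() > bestValue) {
                best = e.getKey();
                bestValue = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> m = memberships(0.5, 0.3);
        System.out.println(m + " -> " + classify(m));
    }
}
```

The membership map is what would be reported as the per-pixel uncertainty, while classify yields the crisp category written to the output band.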

The following operators are available in SNAP but must be enhanced to address a desired use case:

Operator Name: Collocation tool
Enhancement:
• Selection of the bands to be included in the master product (until now all bands from both input products are included)
• Combination of more than 2 products (until now only 2 products can be combined at once)
• Change of the size of the target product (spatial sub-setting, e.g. only use the intersection of the input products)
• Corresponding to ENVI layer stacking
• Conflict handling in case of identical band names in the input products
Use Case / Purpose: The collocation tool combines bands from different products into a single product. Intertidal flat classification, but it is a very general tool that is used for any application of EO data for different purposes.

Operator Name: Feature extraction (optional)
Enhancement:
• Add retrieval algorithms for individual sets of low-level features for atmospheric, land, and ocean applications
• Setup and definition of feature query databases for atmospheric, land, and ocean applications
• Overcome existing shortcomings, e.g. variable image segment size, CRS.
Use Case / Purpose: Create feature query databases for atmospheric, land, and ocean applications. Allows not only for spatiotemporal queries, but also for feature-based queries. (Heritage of ESA Project “Product Feature Extraction and Analysis” - PFA)

Table 3-2 : List of SNAP operators that should be enhanced


FILES

Software User files

Windows 10

Word 2016 Document Normal.dotm

model : V16ESA0101_DFJ0010_TAO_DJF_v1.1.docx of 08/12/2017 11:12.

Publishing list

Publishing reason: Contractual Delivery

P.  Name          Company
M   Iapaolo       ESA
J   Van Bemmelen  ESA
M   Savinaud      CS SI
N   Fomferra      BC

=== End of document ===