CS ROMANIA SA PACII 29, 200692 CRAIOVA, ROMANIA
TEL: +40 (0)251 412850 FAX: +40 (0)251 417307
EMAIL: [email protected]; WEB: www.c-s.ro
ISO 9001:2015 OHSAS 18001:2008
ISO 14001:2015 ISO 27001:2013
CS ROMANIA S.A.
WITH SUBSCRIBED AND PAID- IN SHARE CAPITAL 114.800 LEI - R.C. DOLJ J16/2041/91 - SIRUES 164431207 - C.F. RO2316981
CS COMMUNICATION & SYSTÈMES
Date : 8 December 2017
Origin : CS ROMANIA SA
Business : Tool Augmentation by User Enhancements and Orchestration (TAO)
Title : DESIGN JUSTIFICATION FILE (DJF)
Reference : V16ESA0101/DJF0010 version 1.1
Status : APPROVED
Last name / First name Date Signature
Authors: Ilioiu Alexandru, Fomferra Norman 2017-05-16
Verified by: Udroiu Cosmin 2017-06-19
Approved by: Cara Cosmin 2017-06-20
RESTRICTED CLIENT
TAO
Design Justification File (DJF)
date
08/12/2017
page
2 / 15
reference
V16ESA0101/DJF0010
version
1.1
© Copyright TAO Consortium
RESTRICTED CLIENT
SUCCESSIVE VERSIONS
Vers. | Date | Authors | Verification | Approval | Motive
1 | 16/05/2017 | A. Ilioiu, N. Fomferra | C. Udroiu | C. Cara | Creation.
1.1 | 08/12/2017 | C. Cara | C. Udroiu | C. Cara | Updated section 3.10
TABLE OF CONTENTS
1 INTRODUCTION
1.1 PURPOSE AND SCOPE
1.2 STRUCTURE OF THE DOCUMENT
1.3 REFERENCES
1.3.1 Applicable documents
1.3.2 Reference documents
2 TERMS AND ABBREVIATIONS
2.1 TERMS
2.2 ABBREVIATIONS
3 DESIGN DECISIONS
3.1 GENERAL
3.2 SYNTHESIS STACK DIAGRAM
3.3 PLATFORM GUI
3.4 GUI FOR WORKFLOW DEFINITION
3.5 RESOURCE / USER CATALOG
3.6 DATA VISUALIZATION
3.7 SERVICES CONNECTOR
3.8 PROCESSING COMPONENT DEPLOYMENT
3.9 SERVICE LAYER
3.10 CLUSTER MANAGER
3.11 DATA ACCESS CONNECTOR
3.12 DATABASE
3.13 INTEGRATION WITH EXTERNAL PROCESSING PLATFORMS
3.14 SNAP TOOLBOX INTEGRATION
3.14.1 Additional SNAP Operators and Operator Enhancements
LIST OF TABLES
Table 3-1 : List of operators not available in SNAP
Table 3-2 : List of SNAP operators that should be enhanced
LIST OF FIGURES
Figure 1 : High-level overview of the TAO logical model
Figure 2 : Stack TAO components and COTS
Figure 3 : Comparison between jQuery and react
1 INTRODUCTION
This document is the Design Justification File (DJF) for the “Tool Augmentation by User
Enhancements and Orchestration (TAO)” project, funded by the European Space Agency (ESA).
1.1 PURPOSE AND SCOPE
The overall objectives for the TAO project are to:
1. Assess the existing software toolboxes, libraries and processing frameworks in order to identify
commonalities and reuse scenarios;
2. Query the EO user communities in order to extract a common set of requirements to be fulfilled
by the TAO framework;
3. Select the relevant open standards for Machine-to-Machine and Human-Machine interfaces that
would allow opening the framework to other software toolboxes;
4. Design and develop a software framework for integration and orchestration of heterogeneous
processing modules and libraries that would allow the automation and parallelization of
processing chains;
5. Define several use case scenarios that would allow demonstrating the effectiveness of the
developed framework.
The DJF presents the result of all significant design choices, trade-offs, technical analyses, and
benchmarking assessments justifying the design of processing components integration and workflow
orchestration.
It records all relevant information showing that the proposed solutions meet the requirements.
1.2 STRUCTURE OF THE DOCUMENT
This document contains the sections that describe:
• the decisions taken in terms of system design
• a few considerations regarding alternative solutions
1.3 REFERENCES
1.3.1 Applicable documents
[A1] – TAO Software Requirements Specification CSRO/DMAS/CC/AI/17/0329 version 1.1
1.3.2 Reference documents
[R1] – TAO Statement of Work, ESA-EOPG-GSTP-SOW-0004 issue 1 rev 1
[R2] – TAO Technical Proposal, CSRO/DMAS/CC/ET/16/0536
[R3] – Software requirements specification (SRS) - DRD, ECSS-E-ST-40C Annex D
[R4] – Technologies, Interfaces and Standards Technical Note, V16ESA0101/TN0001 v1
[R5] – User Survey Technical Note, V16ESA0101/TN0002 v1
[R6] – Software and Tools Technical Note, V16ESA0101/TN0003 v1
2 TERMS AND ABBREVIATIONS
2.1 TERMS
Term : Description
Catalog : Collection of metadata about a list of (software) objects. The objects can be stored in a database or a file system.
Cloud : A service for the delivery of on-demand computing resources (everything from applications to data centers) over the internet on a pay-for-use basis.
Cluster (computing) : A set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system. Computer clusters have each node set to perform the same task, controlled and scheduled by software.
Data product : A descriptor for one or more files. The files represent an EO product, such as a GeoTIFF raster file or a Sentinel-2 collection.
Data source : Describes how a set of data can be accessed; it is an abstract representation of a way to connect to a live set of data (file system, database, stream, web service, etc.).
Deployment : Process by which the processing components are transferred, installed and configured on a node.
Framework : A universal, reusable software environment that provides particular functionality as part of a larger software platform to facilitate the development of software applications, products and solutions.
Grid (computing) : The collection of computer resources from multiple locations to reach a common goal, with each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed (thus not physically coupled).
Job : At execution time, a workflow is interpreted as a job to be executed.
Module : With a modular design, a system can be subdivided into smaller parts called modules, which can be created independently and then used in different systems.
Open source : Computer software whose source code is made available under a license in which the copyright holder grants the rights to study, change, and distribute the software to anyone and for any purpose.
Orchestration : The automated arrangement, coordination, and management of computer systems, middleware, and services. An "orchestrator" is understood to be the entity which manages complex cross-domain (system, enterprise, firewall) processes and handles exceptions. Orchestration includes a workflow and provides a directed action towards larger goals and objectives.
Packaging : The operation, handled by a collection of software tools, that automates the process of installing, upgrading, configuring, and
removing computer programs for a computer's operating system in a consistent manner.
Platform : A place (logical or hardware) where a piece of software is executed. Platforms can be specialized for certain purposes.
Processing component : A software application or toolbox whose main goal is to process (perform scientific treatments on) input data. Examples: OTB, Sentinel Toolbox.
Processing node : A processing location (a hardware or virtual machine) where processing components are deployed.
Scheduler : A software application for controlling the unattended background execution of jobs.
Task : A pre-defined operation within a job.
Toolbox : A set of modules (or functions) that serve a common goal.
Workflow : Processing service provided in the form of a set of pre-defined operations chained into a graph. Workflows have pre-defined input and output data types, orchestration logic and scope of the processing, which may be customized by the user via a set of processing parameters.
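The relationship between the terms Workflow, Job and Task can be illustrated with a minimal sketch. It assumes a workflow is stored as a mapping from each operation to the operations it depends on; the operation names and the `job_tasks` helper are hypothetical and are not part of the TAO design.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is an operation, each value is the set of
# operations whose output it consumes (its predecessors in the graph).
workflow = {
    "resample": set(),
    "subset": {"resample"},
    "ndvi": {"subset"},
    "cloud_mask": {"subset"},
    "export": {"ndvi", "cloud_mask"},
}


def job_tasks(graph):
    """Interpret a workflow graph as the ordered task list of one job."""
    return list(TopologicalSorter(graph).static_order())


tasks = job_tasks(workflow)
print(tasks)
```

Any order that respects the dependencies is valid, so "ndvi" and "cloud_mask" may appear in either order; a real scheduler could run such independent tasks in parallel on different nodes.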
2.2 ABBREVIATIONS
Abbreviation Definition
API Application Programming Interface
AWS Amazon Web Services
CSW Catalogue Service for the Web
EO Earth Observation
ESA European Space Agency
GeoTIFF Georeferenced Tagged Image File Format
GIS Geographical Information System
GPF Graph Processing Framework
GPT Graph Processing Tool
GUI Graphical User Interface
HT Hyper Threading
HTTP HyperText Transfer Protocol
IP Internet Protocol
JNA Java Native Access
JNI Java Native Interface
JRMP Java Remote Method Protocol
JSE Java Standard Edition
LDAP Lightweight Directory Access Protocol
LGPL Lesser General Public License
MIT Massachusetts Institute of Technology
NetBIOS Network Basic Input/Output System
OGC Open Geospatial Consortium
OTB Orfeo ToolBox
RAM Random Access Memory
REST REpresentational State Transfer
RQa Additional Requirement
SMT Simultaneous MultiThreading
SNAP SeNtinels Application Platform
SOAP Simple Object Access Protocol
SRSD Software Requirements Specification Document
SSH Secure Shell
SSO Single Sign-On
TAO Tool Augmentation by user enhancements and Orchestration
TCP Transmission Control Protocol
UML Unified Modelling Language
URL Uniform Resource Locator
V&V Verification & Validation
WCS Web Coverage Service
WMS Web Map Service
WPS Web Processing Service
XML eXtensible Markup Language
3 DESIGN DECISIONS
3.1 GENERAL
The purpose of the TAO platform is to provide a means of orchestrating heterogeneous processing
components and libraries in order to process scientific data. As explained in the TAO SRS [A1], in terms of
macro-components, the logical data flow is described by the following diagram:
Figure 1 : High-level overview of the TAO logical model
The TAO platform is designed as a web application that integrates other components
specialized in different activities, using the Spring Framework (https://spring.io/) and the Java language.
The Spring Framework is lightweight, uses modern techniques such as Aspect-Oriented Programming, Inversion
of Control and Dependency Injection, and has native support for web services.
3.2 SYNTHESIS STACK DIAGRAM
The following diagram gives an overview of the external components chosen to cover the
different functional layers. Note that a component's coverage can cut across multiple layers,
meaning that an external component can respond to requirements at different levels.
[Diagram; recoverable labels: User Workspace, Resource Management, Workflow Execution Management, Processing Components Integration, End-User, Execution, Data Products, External Data Providers, Query, Workflow Configuration, Workflow Definition, Processing Components]
Figure 2 : Stack TAO components and COTS
3.3 PLATFORM GUI
[TAO] The user interface of the application will be written in HTML5. HTML5 is the latest version of the
HyperText Markup Language, is supported by current browsers (IE, Chrome, Firefox) and offers new
features such as:
• simplified and clear syntax
• multimedia elements such as fluid animations and video streams, without the help of
specialized frameworks like Flash or Silverlight
• natural semantic tags instead of DIV tags
• elegant forms with native validation, which reduces the need for JavaScript
• sessionStorage and localStorage, which reduce or entirely replace the use of cookies
• access to the user's geographical location
• improved performance through the JavaScript threading mechanism (Web Workers)
3.4 GUI FOR WORKFLOW DEFINITION
[COTS] Based on the analysis in [R4], jsPlumb (https://jsplumbtoolkit.com/) was selected as the
framework for workflow definition.
3.5 RESOURCE / USER CATALOG
[COTS] The "catalog" feature will be handled by the third-party application GeoStorm
(http://www.geostorm.eu/), a geospatial platform. It provides most of the required functionality and will be
extended to cover all the requirements of the TAO platform for handling concepts such as processing
component, workflow definition, node, etc. Furthermore, GeoStorm will be used for EO data visualization.
3.6 DATA VISUALIZATION
[COTS] Data products used in TAO (images, raster maps, vector maps, sensors, etc.) will be visualized
with GeoStorm's rendering feature, which is based on open-source components such as OpenLayers and MapServer.
3.7 SERVICES CONNECTOR
[COTS] The Service Layer will expose a series of REST web services that will be consumed by the platform GUI
through jQuery (https://jquery.com/). jQuery is a fast, small, and feature-rich JavaScript library.
An alternative is the JavaScript library React.js (https://facebook.github.io/react/). React is a library for
designing and rendering user interfaces. A big difference from jQuery is that React works
through the "virtual DOM", which is basically just the data about the HTML elements rather than the
elements themselves, whereas jQuery interacts with the DOM directly. The idea is that DOM elements
carry around too much unnecessary data, and the virtual DOM abstracts the relevant parts, allowing for
faster performance. In React, you modify the virtual DOM; React then compares it with its previous
version and applies only the necessary changes to the real DOM elements.
Figure 3 : Comparison between jQuery and react
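The virtual DOM comparison described in section 3.7 can be sketched in a few lines. This is a deliberately simplified illustration and not React's actual reconciliation algorithm: elements are modelled as plain dictionaries, and the `diff` function and patch names are assumptions made for this example.

```python
def diff(old, new, path=""):
    """Return the list of patches that turn the tree `old` into `new`."""
    if old.get("tag") != new.get("tag"):
        # Different element type: replace the whole subtree.
        return [("replace", path or "/", new)]
    patches = []
    if old.get("text") != new.get("text"):
        patches.append(("set_text", path or "/", new.get("text")))
    old_kids = old.get("children", [])
    new_kids = new.get("children", [])
    for i in range(max(len(old_kids), len(new_kids))):
        child_path = f"{path}/{i}"
        if i >= len(old_kids):
            patches.append(("append", child_path, new_kids[i]))
        elif i >= len(new_kids):
            patches.append(("remove", child_path, None))
        else:
            patches.extend(diff(old_kids[i], new_kids[i], child_path))
    return patches


before = {"tag": "ul", "children": [
    {"tag": "li", "text": "job 1: running"},
    {"tag": "li", "text": "job 2: queued"},
]}
after = {"tag": "ul", "children": [
    {"tag": "li", "text": "job 1: done"},
    {"tag": "li", "text": "job 2: queued"},
]}

patches = diff(before, after)
print(patches)
```

Only the first list item changed, so a single text patch is produced and the second item is left untouched. With jQuery, the equivalent update would be written by hand against the real DOM element.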
3.8 PROCESSING COMPONENT DEPLOYMENT
[COTS] Each processing component (OTB, SNAP, Python, etc.) needs to be installed in a lightweight,
standalone, isolated and safe container. A container image is a packaged application with all the
parts it needs, such as libraries and other dependencies. Containers behave like a virtual machine. To the
outside world, they can look like a complete system of their own. Unlike a virtual machine, however,
a container does not replicate an entire operating system, only the individual components it needs
in order to operate. This gives a significant performance boost
and reduces the size of the application.
Docker Engine (https://www.docker.com/) is one of the leading software container platforms. In the
TAO platform, it will be used to deploy processing components onto nodes and to manage them through the
Docker agent.
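As an illustration of running a processing component inside a container, the following minimal sketch builds a `docker run` command line for one processing step. The image name, the mount points and the `docker_run_command` helper are hypothetical; the sketch only assembles the arguments and does not start a container.

```python
import shlex


def docker_run_command(image, command, input_dir, output_dir):
    """Build the argv for running one processing step in a throwaway container."""
    return [
        "docker", "run",
        "--rm",                              # remove the container when it exits
        "-v", f"{input_dir}:/data/in:ro",    # mount the inputs read-only
        "-v", f"{output_dir}:/data/out",     # mount a writable output folder
        image,
        *command,
    ]


argv = docker_run_command(
    image="example/otb:latest",              # hypothetical image name
    command=["otbcli_BandMath", "-il", "/data/in/scene.tif",
             "-out", "/data/out/ndvi.tif",
             "-exp", "(im1b4-im1b3)/(im1b4+im1b3)"],
    input_dir="/mnt/products",
    output_dir="/mnt/results",
)
print(shlex.join(argv))
```

Because the toolbox and its libraries live in the image, the node itself only needs Docker installed; the input and output folders are the only parts of the host file system the component can see.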
3.9 SERVICE LAYER
[TAO] To keep the platform flexible, web services should be implemented with the built-in
capabilities of the Spring framework (e.g. Spring-WS).
3.10 CLUSTER MANAGER
[COTS] The most interesting feature of the TAO framework is the possibility to create and execute complex
workflows on distributed nodes. For this responsibility, we regard Torque Resource
Manager (http://www.adaptivecomputing.com/products/open-source/torque/) as a credible candidate: it provides control
over batch jobs and distributed compute nodes. Torque works together with Moab, a scheduler
separate from the resource manager, which, put simply, schedules the workload as jobs.
At the same time, we consider another candidate, an open-source, fault-tolerant cluster management
and job scheduling system: Slurm Workload Manager (https://slurm.schedmd.com/). It fits the TAO
requirements through its main capabilities:
• allocates resources within a cluster
• launches and manages jobs
• supports complex scheduling algorithms
• supports resource limits (by queue, user, group)
• is portable through its plugin mechanism
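To make the first two capabilities concrete, the sketch below renders a batch script of the kind submitted with Slurm's `sbatch` command. It is an illustration only: the platform is meant to talk to the cluster manager through a DRMAA binding, and the job name, the command and the `sbatch_script` helper are hypothetical.

```python
def sbatch_script(job_name, command, nodes=1, time_limit="01:00:00", partition=None):
    """Render a batch script that Slurm's `sbatch` command can submit."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",          # resources to allocate
        f"#SBATCH --time={time_limit}",      # wall-clock limit for the job
        "#SBATCH --output=%x-%j.out",        # %x = job name, %j = job id
    ]
    if partition:
        lines.append(f"#SBATCH --partition={partition}")
    lines.append(command)
    return "\n".join(lines) + "\n"


script = sbatch_script("tao-ndvi", "srun gpt graph.xml", nodes=2)
print(script)
```

The `#SBATCH` lines carry the resource request, and the last line is the command Slurm launches once the allocation is granted.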
During the implementation of the Torque DRMAA Java binding, several issues were found in
Torque or in the Torque DRMAA C binding:
• General unreliability:
o we have seen Torque completely refuse to start new jobs after a job with a
questionable configuration was submitted (the account that submitted it was missing on the processing
node); for some reason, this only happened more than a day later;
o removing the job mentioned above was really hard, as it kept coming back; this might
have been a user error, or it may be expected behaviour.
• Unexpected API semantics:
o the client library caches job information on its connection object; if the caller closes the
connection and creates a new one later, the library will not recognize the submitted jobs,
even though the server is aware of them.
• Quality of implementation issues:
o there seem to be a number of memory and thread-safety issues that were, and still are,
being corrected. Reference: https://i.imgur.com/T8pyOSw.png
o this is one we ran into: https://github.com/adaptivecomputing/torque/pull/436
o the client libraries use a mixed error-handling approach, with both error codes and global
variables; this interacts badly with threading (which was probably introduced at a later time),
and sometimes errors are not propagated correctly
o return values are sometimes not checked:
https://github.com/adaptivecomputing/torque/blob/develop/src/drmaa/src/submit.c#L271
o locking is sometimes missing for static variable initialization:
https://github.com/adaptivecomputing/torque/blob/develop/src/drmaa/src/session.c#L554-L558
• Maintenance concerns:
o the PR linked above still has no response from the maintainers
o the unit tests (CI) seem to fail (and have been failing for a long time)
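The connection-cache behaviour listed under "Unexpected API semantics" can be reproduced with a toy model. The classes below are hypothetical stand-ins written for this illustration; they do not reproduce the Torque or DRMAA API, only the pattern of keeping job knowledge on the connection object.

```python
class Server:
    """Stands in for the batch server, which remembers every job."""

    def __init__(self):
        self.jobs = {}
        self._next_id = 1

    def submit(self, script):
        job_id = str(self._next_id)
        self._next_id += 1
        self.jobs[job_id] = "QUEUED"
        return job_id


class CachingSession:
    """Client session that only knows the jobs submitted through itself."""

    def __init__(self, server):
        self._server = server
        self._known = set()          # per-connection cache, lost on close

    def submit(self, script):
        job_id = self._server.submit(script)
        self._known.add(job_id)
        return job_id

    def status(self, job_id):
        if job_id not in self._known:
            raise KeyError(f"unknown job {job_id}")
        return self._server.jobs[job_id]


class StatelessSession(CachingSession):
    """Alternative: always ask the server, so reconnecting loses nothing."""

    def status(self, job_id):
        return self._server.jobs[job_id]


server = Server()
first = CachingSession(server)
job = first.submit("echo hello")

second = CachingSession(server)      # simulates closing and reconnecting
try:
    second.status(job)
    recognized = True
except KeyError:
    recognized = False

print(recognized, StatelessSession(server).status(job))
```

The server still holds the job, but the new caching session rejects its identifier. A long-running platform that restarts or reconnects would therefore have to persist job identifiers itself or query the server directly.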
After this experience, SLURM is now the preferred candidate for the Cluster Manager
software, and the implementation of the SLURM DRMAA Java binding will be given higher priority.
3.11 DATA ACCESS CONNECTOR
[COTS] For Object-Relational Mapping (ORM), the Spring framework, chosen as the development platform, supports
integration with Hibernate (http://hibernate.org/orm/) and the Java Persistence API (JPA). The major goals of
Spring’s ORM integration are clear application layering, with any data access and transaction technology,
and loose coupling of application objects. This avoids business-service dependencies on
the data access or transaction strategy, hard-coded resource lookups, hard-to-replace singletons, custom
service registries, etc.
3.12 DATABASE
[COTS] PostgreSQL (https://www.postgresql.org) is a powerful open-source object-relational database
system. It has a number of advantages over its competitors:
• it is proven and mature in production environments
• it implements the SQL standard very well
• it supports advanced SQL features such as window functions and common table expressions
• it supports many advanced data types, such as (multi-dimensional) arrays and user-defined types
• it supports a wide range of performance optimizations
• it has a GIS extension (PostGIS) that allows easy manipulation of geographical data types
3.13 INTEGRATION WITH EXTERNAL PROCESSING PLATFORMS
[TAO] Integration with components from an external processing platform will be ensured by
implementing the OGC Web Processing Service (WPS) standard in the TAO platform, as both client and
server.
In this way, a remote processing component can be used inside a workflow definition just like any
local processing component.
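On the client side, discovering a remote processing component comes down to two requests of the WPS key-value-pair (KVP) encoding. The sketch below assumes WPS version 1.0.0; the endpoint URL, the process identifier and the `wps_url` helper are hypothetical.

```python
from urllib.parse import urlencode


def wps_url(endpoint, request, **params):
    """Build a WPS 1.0.0 key-value-pair (KVP) GET request URL."""
    query = {"service": "WPS", "request": request}
    if request != "GetCapabilities":
        # GetCapabilities negotiates the version; other requests state it.
        query["version"] = "1.0.0"
    query.update(params)
    return f"{endpoint}?{urlencode(query)}"


ENDPOINT = "https://example.org/wps"     # hypothetical remote platform

capabilities = wps_url(ENDPOINT, "GetCapabilities")
describe = wps_url(ENDPOINT, "DescribeProcess", identifier="ndvi")

print(capabilities)
print(describe)
```

The first request lists the processes a remote platform offers and the second returns the inputs and outputs of one of them, which is the information needed to present it in a workflow like a local component.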
3.14 SNAP TOOLBOX INTEGRATION
3.14.1 Additional SNAP Operators and Operator Enhancements
For the purpose of fulfilling the proposed user scenarios, the following operators have been identified
as not available in SNAP or without an equivalent in OTB; they should therefore be
developed:
Operator Name : Fuzzy Decision Tree
Description : The Decision Tree shall enable the categorization of pixels into certain categories according to several conditions.
• GUI to define the tree
• Definition of classes (name, color, ID)
• Definition of fuzzy logic decisions by band-math expressions
• Include pixel neighborhood in decisions
• Applying without GUI should be possible
• Probability / uncertainty for each pixel to be categorized to a certain class
Use Case / Purpose : Intertidal flat classification; in general, any kind of land cover classification or water type classification
Operator Name : 3-D scatter plot
Description : The 3-D scatter plot is an analysis tool. It should plot selected pixels (under ROI mask) in 2- and 3-dimensional space.
• The dimensions of the space should be selectable bands.
• The points should be colored. This can either be a density plot, or depend on a certain value of the pixels (e.g. from a band; chlorophyll concentration or category gained from a classification)
• The user should be able to rotate the space
Use Case / Purpose : Analysis of classification results
Table 3-1 : List of operators not available in SNAP
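The fuzzy decisions and per-pixel membership requested for the Fuzzy Decision Tree in Table 3-1 can be sketched as follows. This is a minimal illustration under stated assumptions, not a design for the operator: the class names, thresholds, band names and the `ramp` and `classify` helpers are all hypothetical.

```python
def ramp(value, low, high):
    """Fuzzy membership rising linearly from 0 at `low` to 1 at `high`."""
    if high == low:
        return 1.0 if value >= high else 0.0
    return min(1.0, max(0.0, (value - low) / (high - low)))


# Hypothetical classes: (name, id, rule). Each rule maps a pixel's band
# values to a membership degree in [0, 1]; min() acts as a fuzzy AND.
CLASSES = [
    ("water", 1, lambda p: min(1.0 - ramp(p["nir"], 0.05, 0.15),
                               ramp(p["ndwi"], 0.0, 0.3))),
    ("sand", 2, lambda p: min(ramp(p["red"], 0.2, 0.4),
                              1.0 - ramp(p["ndwi"], 0.0, 0.3))),
]


def classify(pixel):
    """Return (class name, class id, membership) of the best-matching class."""
    scored = [(rule(pixel), name, class_id) for name, class_id, rule in CLASSES]
    degree, name, class_id = max(scored)
    return name, class_id, degree


result = classify({"nir": 0.04, "red": 0.08, "ndwi": 0.45})
print(result)
```

Returning the membership degree alongside the class gives a per-pixel measure of how firmly the pixel belongs to it. Taking pixel neighborhood into account would require passing a window of pixels to each rule instead of a single one.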
The following operators are available in SNAP but must be enhanced to address a desired use case:
Operator Name : Collocation tool
Enhancement :
• Selection of bands to be included in the master product (until now, all bands from both input products are included)
• Combination of more than 2 products (until now, only 2 products can be combined at once)
• Change of the product size of the target product (spatial sub-setting, e.g. only use the intersection of the input products)
• Corresponding to ENVI layer stacking
• Conflict handling in case of identical band names in the input products
Use Case / Purpose : The collocation tool combines bands from different products into a single product. Intertidal flat classification, but it is a very general tool that is used in any application of EO data for different purposes.
Operator Name : Feature extraction (optional)
Enhancement :
• Add retrieval algorithms for individual sets of low level features for atmospheric, land, and ocean applications
• Setup and definition of feature query databases for atmospheric, land, and ocean applications
• Overcome existing shortcomings, e.g. variable image segment size, CRS.
Use Case / Purpose : Create feature query databases for atmospheric, land, and ocean applications. Allows not only for spatiotemporal queries, but also for feature-based queries. (Heritage of ESA Project “Product Feature Extraction and Analysis” - PFA)
Table 3-2 : List of SNAP operators that should be enhanced
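Two of the collocation enhancements in Table 3-2, band selection and conflict handling for identical band names across more than two products, can be illustrated with a small sketch. The index-suffix naming scheme, the product and band names and the `collocate_band_names` helper are assumptions made for this example and do not describe SNAP's behaviour.

```python
def collocate_band_names(products, selected=None):
    """Map each (product, band) pair to a unique name in the target product.

    `products` is a list of (product name, [band names]); `selected`, if
    given, restricts the bands that are copied to the target product.
    """
    counts = {}
    for _, bands in products:
        for band in bands:
            if selected is None or band in selected:
                counts[band] = counts.get(band, 0) + 1

    target = {}
    for index, (product, bands) in enumerate(products):
        for band in bands:
            if selected is not None and band not in selected:
                continue
            # Only clashing names get a suffix; unique names stay unchanged.
            name = f"{band}_{index}" if counts[band] > 1 else band
            target[(product, band)] = name
    return target


products = [
    ("S2A", ["B4", "B8"]),
    ("S3A", ["B4", "Oa10"]),
    ("L8", ["B4"]),
]
names = collocate_band_names(products, selected={"B4", "Oa10"})
print(names)
```

The sketch does not guard against a source band that is already named like a generated name (for example "B4_0"); a real implementation would have to check for that case as well.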
FILES
Software User files
Windows 10
Word 2016 Document Normal.dotm
model : V16ESA0101_DFJ0010_TAO_DJF_v1.1.docx of 08/12/2017 11:12.
Publishing list
Publishing reason: Contractual Delivery
P. Name Company
M Iapaolo ESA
M Savinaud CS SI
J Van Bemmelen ESA
N Fomferra BC
=== End of document ===