A Microservice Architecture for the Processing of Large Geospatial Data … · 2018. 12. 17. ·...

A Microservice Architecture for the Processing of Large Geospatial Data in the Cloud

Michel Krämer

Urban planning use case

Paparoditis et al. (2012)

Goal:

Identify trees in 3D point clouds acquired by LMMS

Challenge:

Process data at least as fast as it is acquired, in order to be able to continuously monitor tree growth

Land monitoring use case

High hydraulic energy

Heavy rains

Small and steep basins

Floods and landslides

Abandoned agricultural terraces

Goal:

Analyse topography and orographic precipitation to prepare against hazardous events (floods and landslides)

Challenge:

Being able to continuously monitor evolution of terrain for the first time

Problem statement

• Similar challenges in other use cases
  • change detection in urban areas
  • traffic pattern analysis
  • monitoring of seabed and coastal changes
  • etc.

• These use cases require “Big Geo Data”

• Goals cannot be reached with state-of-the-art approaches (Yang et al., 2011; Kitchin & McArdle, 2016)

Processing of large geospatial data

Cloud-based data processing

Users: use platform

Developers/researchers: contribute processing algorithms

Problem statement – Users

Requirements:

A platform to process Big Geo Data

Automate recurring processing tasks

High-level interface for process automation

My approach:

Microservice architecture for Cloud-based processing

Workflow management

Domain-Specific Language

Problem statement – Developers/researchers

Requirements:

Execute existing algorithms in the Cloud without modifications

Focus on algorithms and not on the details of Cloud

Orchestrate algorithms to workflows

My approach:

Service integration based on lightweight metadata

Workflow management

Workflow management
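One way to picture "service integration based on lightweight metadata" is a small descriptor that tells the platform how to call an unmodified command-line program. The sketch below is an illustration only: the schema (`id`, `executable`, `parameters`, `kind`, `label`), the `tree-detector` executable and the file names are hypothetical and not taken from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Parameter:
    """One command-line argument of a wrapped algorithm (hypothetical schema)."""
    id: str
    kind: str          # "input", "output" or "argument"
    label: str = ""    # command-line flag, e.g. "--in"; empty for positional


@dataclass
class ServiceMetadata:
    """Describes how to call an existing executable without modifying it."""
    id: str
    executable: str
    parameters: list[Parameter] = field(default_factory=list)

    def build_command(self, values: dict[str, str]) -> list[str]:
        """Turn parameter values into an argument vector for the executable."""
        command = [self.executable]
        for param in self.parameters:
            if param.id not in values:
                raise KeyError(f"missing value for parameter {param.id!r}")
            if param.label:
                command.append(param.label)
            command.append(values[param.id])
        return command


# Hypothetical descriptor for a tree detection program.
tree_detection = ServiceMetadata(
    id="tree-detection",
    executable="tree-detector",
    parameters=[
        Parameter(id="input_file", kind="input", label="--in"),
        Parameter(id="output_file", kind="output", label="--out"),
        Parameter(id="min_height", kind="argument", label="--min-height"),
    ],
)

command = tree_detection.build_command(
    {"input_file": "tile_001.las", "output_file": "trees_001.las", "min_height": "2.5"}
)
print(command)
```

Because the descriptor lives outside the program, the algorithm itself stays untouched; only the metadata has to be written when a new service is contributed.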

Hypothesis

A microservice architecture and Domain-Specific Languages can be used to orchestrate existing geospatial processing algorithms, and to compose and execute geospatial workflows in a Cloud environment for efficient application development and enhanced stakeholder experience.

Microservice architecture

Domain-Specific Languages

Enhanced stakeholder experience

Orchestrate existing geospatial processing algorithms

Cloud

Compose and execute geospatial workflows

Efficient application development

Related work – Cloud for geospatial applications

Related work:

Mostly for Smart Cities, e.g. Khan, Anjum, & Kiani (2013), Krylovskiy, Jahn, & Patti (2015)

Single experiments, e.g. Qazi, Smyth, & McCarthy (2013), Warren et al. (2015), Li et al. (2010)

Specific platform/processing model, e.g. Aji et al. (2013), Eldawy & Mokbel (2013)

No workflow management

My approach:

Comprehensive platform for user-defined workflows

Supports various use cases

Supports mix of multiple programming paradigms

Workflow management

System design

Architecture overview

Figure: architecture diagram. Labelled elements: GIS expert, main user interface, workflow editor, data browser, file upload/download, workflow service, parser, interpreter, JobManager, catalogue service (workflows, data catalogue, service catalogue), notification, data access service, processing services (processing cloud), distributed file system (storage cloud). Source: Krämer & Senner (2015)

Processing services

Figure: processing services running on three compute nodes (Node A, Node B, Node C), with a job manager and a distributed file system. Labelled services: MapReduce jobs, Algorithm A (single core), Algorithm B (distributed), Algorithm C (multi core), Algorithm D (GPU). Source: Krämer & Senner (2015)

Benefits of microservice architecture

• Focus

• Independency

• Development distributability

• Composability

see also Krämer & Senner (2015)

Figure: three architecture styles. Diagram labels: presentation tier, logic tier, data tier, presentation, data, gateway, shared data.

a) A monolithic software with a standard three-tier architecture

b) A pure microservice architecture with a complex deployment

c) A microservice architecture decomposed along bounded contexts

Example Workflow

Dynamic workflow execution

Existing work: a priori design-time knowledge (one DAG for the whole workflow)

My approach: a priori runtime knowledge (iterative)

Figure: example workflow graph with nodes A, B, C, D and E, executed by the JobManager
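The difference between the two strategies shows up in a workflow whose shape depends on intermediate results. In the minimal sketch below, a hypothetical `split` step produces a number of tiles that is only known once it has run, so the per-tile steps can only be planned afterwards. The step names and the functions `plan_next_steps` and `run_step` are invented for illustration and are not part of the JobManager.

```python
def plan_next_steps(results: dict[str, list[str]]) -> list[str]:
    """Return the steps that can be planned given the results known so far."""
    if "split" not in results:
        # Nothing has run yet: only the first step is known.
        return ["split"]
    # The number of tiles is only known after "split" has finished.
    pending = [f"process:{tile}" for tile in results["split"]
               if f"process:{tile}" not in results]
    if pending:
        return pending
    if "merge" not in results:
        return ["merge"]
    return []  # workflow finished


def run_step(step: str) -> list[str]:
    """Stand-in for executing a processing service (hypothetical outputs)."""
    if step == "split":
        return ["tile_a", "tile_b", "tile_c"]
    return [f"{step}.out"]


results: dict[str, list[str]] = {}
executed: list[str] = []
while True:
    steps = plan_next_steps(results)
    if not steps:
        break
    for step in steps:
        results[step] = run_step(step)
        executed.append(step)

print(executed)
```

A planner that needs one DAG for the whole workflow would have to know the three tiles before `split` runs; the iterative variant only ever plans what the results so far allow.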

JobManager architecture

Figure: component diagram of the JobManager. Labelled elements: Client, HTTP Server, Controller, Rule System (Rules, Working Memory), Process Chain Manager, Processing Connector 1 to Processing Connector n, Processing Cloud, Workflows, Process Chains, Service metadata, Data metadata.

Controller

Figure: activity diagram for "Execute workflow" with two lanes, Controller and Rule System.

Activities: load metadata, start reasoning, fire rules, register process chains, look up process chain results, register workflow results

Transition labels: new process chains, no new process chains, all process chains succeeded, error
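The labels of the Controller diagram suggest a loop in which the controller repeatedly asks the rule system for new process chains until none are produced. The following is a minimal sketch of that reading, not the JobManager's actual implementation: representing rules as plain Python functions over a working-memory dictionary, the function `execute_workflow`, and the two example rules are all assumptions.

```python
from typing import Callable

WorkingMemory = dict[str, object]
Rule = Callable[[WorkingMemory], list[str]]


def fire_rules(rules: list[Rule], memory: WorkingMemory) -> list[str]:
    """Rule system: collect the process chains proposed by all rules."""
    chains: list[str] = []
    for rule in rules:
        chains.extend(rule(memory))
    return chains


def execute_workflow(rules: list[Rule], metadata: WorkingMemory,
                     run_chain: Callable[[str], object]) -> WorkingMemory:
    """Controller: loop until the rules produce no new process chains."""
    memory: WorkingMemory = dict(metadata)           # load metadata
    while True:
        new_chains = fire_rules(rules, memory)       # start reasoning, fire rules
        if not new_chains:
            return memory                            # register workflow results
        for chain in new_chains:                     # register process chains
            try:
                memory[chain] = run_chain(chain)     # look up process chain results
            except Exception as exc:                 # "error" transition
                raise RuntimeError(f"process chain {chain!r} failed") from exc


# Two hypothetical rules: "label" may only be planned once "detect" has a result.
def rule_detect(memory: WorkingMemory) -> list[str]:
    return [] if "detect" in memory else ["detect"]


def rule_label(memory: WorkingMemory) -> list[str]:
    if "detect" in memory and "label" not in memory:
        return ["label"]
    return []


result = execute_workflow([rule_detect, rule_label], {"input": "tile.las"},
                          run_chain=lambda chain: f"{chain}-done")
print(result)
```

Each pass adds results to the working memory, which in turn lets further rules fire; the loop ends when a pass yields no new process chains.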

Process Chain Manager

Figure: activity diagram of the Process Chain Manager.

Activities: look up next process chain, select node, execute process chain, request process chain status, register process chain results

Transition labels: new, running, no node available, still running, finished
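Read as a scheduling loop, the Process Chain Manager labels could correspond to the following minimal sketch. The state names mirror the diagram ("new", "running", "finished"); the `Node` and `ProcessChain` classes, the one-chain-per-node policy, and the simulated execution (a chain finishes after a fixed number of ticks) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessChain:
    name: str
    ticks_needed: int          # simulated runtime in scheduler ticks
    status: str = "new"        # "new" -> "running" -> "finished"
    node: Optional[str] = None


@dataclass
class Node:
    name: str
    busy: bool = False


def select_node(nodes: list[Node]) -> Optional[Node]:
    """Return a free node, or None if no node is available."""
    return next((n for n in nodes if not n.busy), None)


def tick(chains: list[ProcessChain], nodes: list[Node],
         results: dict[str, str]) -> None:
    """One pass of the manager loop over all process chains."""
    by_name = {n.name: n for n in nodes}
    for chain in chains:
        if chain.status == "new":
            node = select_node(nodes)
            if node is None:
                continue                     # no node available: retry next pass
            node.busy = True                 # execute process chain
            chain.node, chain.status = node.name, "running"
        elif chain.status == "running":
            chain.ticks_needed -= 1          # request process chain status
            if chain.ticks_needed <= 0:      # finished: register results
                chain.status = "finished"
                results[chain.name] = f"output of {chain.name}"
                by_name[chain.node].busy = False
            # otherwise the chain is still running


chains = [ProcessChain("pc1", 2), ProcessChain("pc2", 1), ProcessChain("pc3", 1)]
nodes = [Node("A"), Node("B")]
results: dict[str, str] = {}
ticks = 0
while any(c.status != "finished" for c in chains):
    tick(chains, nodes, results)
    ticks += 1
print(ticks, sorted(results))
```

With two nodes and three chains, the third chain waits in state "new" until a node is released, which is the "no node available" branch of the diagram.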

Results

Urban planning use case

Dataset:

City of Toulouse, 120.63 GiB, 1.58 billion points, 529 tiles

Acquisition time: 1h 53m

Evaluation results:

Processing time: 1h 51m

18 compute nodes on Fraunhofer IGD Cloud

Almost linear scalability

Result: Point cloud with labels for individual trees
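These figures can be checked against the challenge stated for this use case (process data at least as fast as it is acquired) with a few lines of arithmetic. The sketch uses only the numbers reported on this slide; the variable names are illustrative.

```python
acquisition_min = 1 * 60 + 53      # acquisition time: 1h 53m
processing_min = 1 * 60 + 51       # processing time: 1h 51m
size_gib = 120.63                  # dataset size in GiB
points = 1.58e9                    # number of points in the dataset

# A ratio below 1 means processing keeps up with acquisition.
ratio = processing_min / acquisition_min
gib_per_min = size_gib / processing_min
points_per_s = points / (processing_min * 60)

print(f"processing/acquisition ratio: {ratio:.3f}")
print(f"throughput: {gib_per_min:.2f} GiB/min, {points_per_s:,.0f} points/s")
```

The ratio comes out just under 1, so on the 18 nodes used the data was processed slightly faster than it had been acquired.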

Land monitoring use case

Dataset:

3D point cloud of Liguria Region

451.16 GiB, 17.35 billion points, 684 strips

Evaluation results:

Processing time: 35m 49s

Previously (without my work): several days (*)

Result: Point cloud allowing for fast extraction of basins in a certain level of detail

* according to GIS users from the Liguria region (personal communication)

Scientific contributions

Microservice Architecture

• Scalability

• Modifiability

• Development distributability

• Availability

Processing

• Service integration

• Service orchestration

• Dynamic workflow management

• Rule-based workflow execution

Workflow Modelling

• Domain-Specific Language (DSL)

• Method for DSL modelling

Conclusions

Research hypothesis is supported by

• Positive results from detailed evaluation

• Successfully satisfied stakeholder requirements

Microservice architecture

Domain-Specific Languages

Enhanced stakeholder experience

Orchestrate existing geospatial processing algorithms

Cloud

Compose and execute geospatial workflows

Efficient application development

My thesis documents a major step in the paradigm shift from desktop GIS to the Cloud within the geospatial community and geoinformatics

(Future) work

JobManager has been put into production at Deutsche Telekom AG

Several improvements:

• Create VMs on demand (OpenStack)

• Capability-based scheduling

• Improved security

General features

‣ Geospatial feature store
‣ Schema agnostic
‣ Format preserving
‣ Cloud-based
‣ Event-driven
‣ Easy to use/integrate

Michel Krämer
Fraunhofer-Institut für Graphische Datenverarbeitung IGD
Fraunhoferstraße 5
64283 Darmstadt
michel.kraemer@igd.fraunhofer.de
