A Microservice Architecture for the Processing of Large Geospatial Data in the Cloud
Michel Krämer
Urban planning use case
Paparoditis et al. (2012)
Goal:
Identify trees in 3D point clouds acquired by LMMS
Challenge:
Process data at least as fast as it is acquired, in order to be able to continuously monitor tree growth
Land monitoring use case
High hydraulic energy
Heavy rains
Small and steep basins
Floods and landslides
Abandoned agricultural terraces
Goal:
Analyse topography and orographic precipitation to prepare against hazardous events (floods and landslides)
Challenge:
Being able to continuously monitor evolution of terrain for the first time
Problem statement
• Similar challenges in other use cases
  • change detection in urban areas
  • traffic pattern analysis
  • monitoring of seabed and coastal changes
  • etc.
• These use cases require “Big Geo Data”
• Goals cannot be reached with state-of-the-art approaches: Yang et al. (2011), Kitchin & McArdle (2016)
Processing of large geospatial data
Cloud-based data processing
Users: use platform
Developers/researchers: contribute processing algorithms
Problem statement – Users
Requirements:
A platform to process Big Geo Data
Automate recurring processing tasks
High-level interface for process automation
My approach:
Microservice architecture for Cloud-based processing
Workflow management
Domain-Specific Language
Problem statement – Developers/researchers
Requirements:
Execute existing algorithms in the Cloud without modifications
Focus on algorithms and not on the details of Cloud
Orchestrate algorithms to workflows
My approach:
Service integration based on lightweight metadata
Workflow management
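One way to picture service integration based on lightweight metadata is a small description of an existing command-line program from which the platform builds the call, so the algorithm itself stays unmodified. The following is only a sketch under assumptions: the field names (`executable`, `parameters`, `label`, `default`), the executable path and the file names are hypothetical and are not taken from the slides.

```python
import shlex

# Hypothetical metadata for an existing command-line algorithm.
# Field names, path and flags are assumptions made for illustration only.
SERVICE_METADATA = {
    "id": "tree-detection",
    "executable": "/opt/algorithms/detect_trees",
    "parameters": [
        {"id": "input", "label": "--in"},
        {"id": "output", "label": "--out"},
        {"id": "min_height", "label": "--min-height", "default": "2.0"},
    ],
}


def build_command(metadata, values):
    """Turn service metadata plus concrete values into an argument list."""
    command = [metadata["executable"]]
    for param in metadata["parameters"]:
        # Use the supplied value, or fall back to the declared default.
        value = values.get(param["id"], param.get("default"))
        if value is None:
            raise ValueError(f"missing value for parameter {param['id']!r}")
        command += [param["label"], str(value)]
    return command


command = build_command(
    SERVICE_METADATA,
    {"input": "/data/tile_001.las", "output": "/data/tile_001_trees.las"},
)
print(shlex.join(command))
```

Because the program is only described and never changed, a developer could register an algorithm by writing such a description instead of adapting the code to the Cloud.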
Hypothesis
“A microservice architecture and Domain-Specific Languages can be used to orchestrate existing geospatial processing algorithms, and to compose and execute geospatial workflows in a Cloud environment for efficient application development and enhanced stakeholder experience.”
Microservice architecture
Domain-Specific Languages
Enhanced stakeholder experience
Orchestrate existing geospatial processing algorithms
Cloud
Compose and execute geospatial workflows
Efficient application development
Related work – Cloud for geospatial applications
Related work:
• Mostly for Smart Cities, e.g. Khan, Anjum, & Kiani (2013), Krylovskiy, Jahn, & Patti (2015)
• Single experiments, e.g. Qazi, Smyth, & McCarthy (2013), Warren et al. (2015), Li et al. (2010)
• Specific platform/processing model, e.g. Aji et al. (2013), Eldawy & Mokbel (2013)
• No workflow management
My approach:
• Comprehensive platform for user-defined workflows
• Supports various use cases
• Supports mix of multiple programming paradigms
• Workflow management
System design
Architecture overview
[Figure: architecture overview, Krämer & Senner (2015). Components shown: main user interface (used by the GIS expert), workflow editor, data browser, file upload/download, workflow service, parser, interpreter, JobManager, catalogue service, workflows, data catalogue, service catalogue, notification, data access service, processing services (processing cloud), distributed file system (storage cloud).]
Processing services
[Figure: processing services, Krämer & Senner (2015). A job manager runs work on Node A, Node B and Node C on top of a distributed file system: MapReduce jobs, Algorithm A (single core), Algorithm B (distributed), Algorithm C (multi core) and Algorithm D (GPU).]
Benefits of microservice architecture
• Focus
• Independency
• Development distributability
• Composability
see also Krämer & Senner (2015)
[Figure: three architecture styles. Labels: presentation tier, logic tier, data tier, presentation, data, gateway, shared data.]
a) A monolithic software with a standard three-tier architecture
b) A pure microservice architecture with a complex deployment
c) A microservice architecture decomposed along bounded contexts
Example Workflow
Dynamic workflow execution
Existing work: a priori design-time knowledge (one DAG for the whole workflow)
My approach: a priori runtime knowledge (iterative)
[Figure: example workflow graph with nodes A, B, C, D and E, executed by the JobManager]
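The difference between planning one DAG up front and planning iteratively at runtime can be sketched as a loop that re-evaluates, after every batch, which actions have all their inputs available. This is a minimal illustration only: the action dictionaries, the edges between A to E and the `execute` callback are assumptions and do not reproduce the JobManager's actual data model.

```python
def run_iteratively(actions, initial_data, execute):
    """Execute actions batch by batch, re-planning after every batch."""
    available = set(initial_data)   # data sets that exist right now
    finished = set()                # ids of actions that already ran
    batches = []                    # what was executed in each iteration
    while True:
        # Plan only with what is known at this point in time.
        runnable = [
            a for a in actions
            if a["id"] not in finished and set(a["inputs"]) <= available
        ]
        if not runnable:
            return batches
        batches.append([a["id"] for a in runnable])
        for action in runnable:
            # The outputs become known only after execution.
            available.update(execute(action))
            finished.add(action["id"])


# Hypothetical dependencies between the five nodes of the figure.
ACTIONS = [
    {"id": "A", "inputs": ["raw"], "outputs": ["a"]},
    {"id": "B", "inputs": ["a"], "outputs": ["b"]},
    {"id": "C", "inputs": ["a"], "outputs": ["c"]},
    {"id": "D", "inputs": ["b", "c"], "outputs": ["d"]},
    {"id": "E", "inputs": ["d"], "outputs": ["e"]},
]

batches = run_iteratively(ACTIONS, ["raw"], lambda action: action["outputs"])
print(batches)  # [['A'], ['B', 'C'], ['D'], ['E']]
```

Since `execute` reports what an action actually produced, the next iteration can react to outputs whose number or names were not known at design time.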
JobManager architecture
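[Figure: JobManager architecture. A client sends requests to the JobManager through an HTTP server. Components shown inside the JobManager: controller, rule system with rules and working memory, process chain manager, and processing connectors 1 to n towards the processing cloud. Data shown: workflows, process chains, service metadata, data metadata.]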
Controller
[State diagram: Execute workflow, interaction between Controller and Rule System]
States: load metadata, start reasoning, fire rules, register process chains, lookup process chain results, register workflow results
Transitions: new process chains, no new process chains, all process chains succeeded, error
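Read as a loop, the controller states above could look roughly like the following sketch. The order of the steps is inferred from the state names, and the three callbacks (`load_metadata`, `fire_rules`, `run_process_chains`) are hypothetical stand-ins for the metadata store, the rule system and the process chain manager.

```python
class WorkflowFailed(Exception):
    """Raised when at least one process chain reports an error."""


def execute_workflow(workflow, load_metadata, fire_rules, run_process_chains):
    metadata = load_metadata()                            # load metadata
    results = {}                                          # results known so far
    while True:                                           # reasoning loop
        chains = fire_rules(workflow, metadata, results)  # fire rules
        if not chains:                                    # no new process chains
            return results                                # register workflow results
        outcome = run_process_chains(chains)              # register chains, look up results
        failed = [c for c in chains if outcome.get(c) is None]
        if failed:                                        # error
            raise WorkflowFailed(failed)
        results.update(outcome)                           # all process chains succeeded


def demo_rules(workflow, metadata, results):
    # Stand-in for the rule system: emit the first step that has no result yet.
    pending = [step for step in workflow if step not in results]
    return pending[:1]


def demo_runner(chains):
    # Stand-in for the process chain manager: every chain succeeds.
    return {chain: f"{chain}-output" for chain in chains}


results = execute_workflow(["tile", "classify"], lambda: {}, demo_rules, demo_runner)
print(results)
```

The rules are fired again after each round of process chains, so new process chains can depend on results that did not exist when the workflow was submitted.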
Process Chain Manager
[State diagram: Process Chain Manager]
States: lookup next process chain, select node, execute process chain, request process chain status, register process chain results
Transitions: new, running, finished, still running, no node available
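The process chain manager states above suggest a polling loop over process chains. The sketch below is one possible reading, not the actual implementation: the `ProcessChain` class, the node names, the `is_finished` callback and the result strings are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessChain:
    id: str
    status: str = "new"          # "new", "running" or "finished"
    node: Optional[str] = None   # node the chain was assigned to


def poll_once(chains, free_nodes, is_finished, results):
    """One pass over all process chains (lookup next process chain)."""
    for chain in chains:
        if chain.status == "new":
            if not free_nodes:                # no node available: retry later
                continue
            chain.node = free_nodes.pop()     # select node
            chain.status = "running"          # execute process chain
        elif chain.status == "running":
            if is_finished(chain):            # request process chain status
                chain.status = "finished"
                results[chain.id] = f"result of {chain.id}"  # register results
                free_nodes.append(chain.node)
            # otherwise the chain is still running; check again next pass


chains = [ProcessChain("pc1"), ProcessChain("pc2"), ProcessChain("pc3")]
free_nodes = ["node-a", "node-b"]
results = {}
passes = 0
while any(c.status != "finished" for c in chains):
    poll_once(chains, free_nodes, lambda chain: True, results)
    passes += 1
print(passes, sorted(results))
```

With two nodes and three chains, the third chain waits until a node is released, which corresponds to the "no node available" transition.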
Results
Urban planning use case
Dataset:
City of Toulouse, 120.63 GiB, 1.58 billion points, 529 tiles
Acquisition time: 1h 53m
Evaluation results:
Processing time: 1h 51m
18 compute nodes on Fraunhofer IGD Cloud
Almost linear scalability
Result: Point cloud with labels for individual trees
Land monitoring use case
Dataset:
3D point cloud of Liguria Region
451.16 GiB, 17.35 billion points, 684 strips
Evaluation results:
Processing time: 35m 49s
Previously (without my work): several days (*)
Result: Point cloud allowing for fast extraction of basins at a certain level of detail
* according to GIS users from the Liguria region (personal communication)
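The reported figures can be turned into rough throughput numbers with a few lines of arithmetic. The sketch uses only the point counts and durations stated on the two result slides; the two use cases run different algorithms, so the rates are not directly comparable.

```python
from datetime import timedelta


def points_per_second(points, duration):
    """Average number of points processed per second of wall-clock time."""
    return points / duration.total_seconds()


# Urban planning use case (Toulouse): 1.58 billion points in 1h 51m.
urban_processing = timedelta(hours=1, minutes=51)
urban_acquisition = timedelta(hours=1, minutes=53)
urban_rate = points_per_second(1.58e9, urban_processing)

# Land monitoring use case (Liguria): 17.35 billion points in 35m 49s.
land_rate = points_per_second(17.35e9, timedelta(minutes=35, seconds=49))

# A ratio below 1 means processing keeps up with acquisition.
ratio = urban_processing / urban_acquisition

print(f"urban: {urban_rate:,.0f} points/s")
print(f"land:  {land_rate:,.0f} points/s")
print(f"processing/acquisition time: {ratio:.3f}")
```

This gives roughly 237,000 points per second for the urban data set and roughly 8.07 million points per second for the land monitoring data set. The processing-to-acquisition ratio of about 0.98 matches the stated goal of processing data at least as fast as it is acquired.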
Scientific contributions
Microservice Architecture
• Scalability
• Modifiability
• Development distributability
• Availability
Processing
• Service integration
• Service orchestration
• Dynamic workflow management
• Rule-based workflow execution
Workflow Modelling
• Domain-Specific Language (DSL)
• Method for DSL modelling
Conclusions
Research hypothesis is supported by
• Positive results from detailed evaluation
• Successfully satisfied stakeholder requirements
My thesis documents a major step in the paradigm shift from desktop GIS to the Cloud within the geospatial community and geoinformatics
(Future) work
JobManager has been put into production at Deutsche Telekom AG
Several improvements:
• Create VMs on demand (OpenStack)
• Capability-based scheduling
• Improved security
General features
‣ Geospatial feature store
‣ Schema agnostic
‣ Format preserving
‣ Cloud-based
‣ Event-driven
‣ Easy to use/integrate
Michel Krämer
Fraunhofer-Institut für Graphische Datenverarbeitung IGD
Fraunhoferstraße 5
64283 Darmstadt [email protected]