12
The VERCE Architecture for Data-Intensive Seismology Iraklis A. Klampanos [email protected] http://www.verce.eu 1 Wednesday, 19 June 13

The VERCE Architecture for Data-Intensive Seismologyciara.fiu.edu/osdc-verce_Iraklis.pdf · The VERCE Architecture for Data-Intensive Seismology OSDC Workshop, Edinburgh, 19 June

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The VERCE Architecture for Data-Intensive Seismologyciara.fiu.edu/osdc-verce_Iraklis.pdf · The VERCE Architecture for Data-Intensive Seismology OSDC Workshop, Edinburgh, 19 June

The VERCE Architecture for Data-Intensive Seismology

Iraklis A. Klampanos

[email protected]

http://www.verce.eu1

Wednesday, 19 June 13

Page 2: The VERCE Architecture for Data-Intensive Seismologyciara.fiu.edu/osdc-verce_Iraklis.pdf · The VERCE Architecture for Data-Intensive Seismology OSDC Workshop, Edinburgh, 19 June

OSDC Workshop, Edinburgh, 19 June 2013The VERCE Architecture for Data-Intensive Seismology

Overall Objectives

• VERCE: Virtual Earthquake and Seismology Research Community in Europe

• Unified, Europe-wide computing infrastructure

• Support common data- and CPU-intensive tasks

• Exploit local and remote processing and storage resources

• Use-cases:

• Ambient noise cross-correlation → Data-intensive

• Earth model adjustment → CPU-intensive 2

Wednesday, 19 June 13

Page 3: The VERCE Architecture for Data-Intensive Seismologyciara.fiu.edu/osdc-verce_Iraklis.pdf · The VERCE Architecture for Data-Intensive Seismology OSDC Workshop, Edinburgh, 19 June

OSDC Workshop, Edinburgh, 19 June 2013The VERCE Architecture for Data-Intensive Seismology

Technical Goals

• Transparent parallelism and maximum use of local resources

• Fine-grained Dispel streaming workflows

• Automated data handling

• Preemptive and reactive movement/replication

• Extensive use of science-specific metadata

• Customisation of workflows

• Use of community oriented library tools, e.g. ObsPy

3

Wednesday, 19 June 13

Page 4: The VERCE Architecture for Data-Intensive Seismologyciara.fiu.edu/osdc-verce_Iraklis.pdf · The VERCE Architecture for Data-Intensive Seismology OSDC Workshop, Edinburgh, 19 June

OSDC Workshop, Edinburgh, 19 June 2013The VERCE Architecture for Data-Intensive Seismology

Data-Intensive Requirements

• Data-intensive:

• Size, numbers of files, frequency of movement, etc.

• Data staging, pre-processing and cross-correlation

• mainly on private or Grid resources

• Arbitrarily many continuous waveforms (typically 24h records from local, regional, global networks)

• Over arbitrarily long periods of time

• Unified view of data sources (files, online feeds)

• Through metadata (e.g. station, channel, time, etc.)

4

Wednesday, 19 June 13

Page 5: The VERCE Architecture for Data-Intensive Seismologyciara.fiu.edu/osdc-verce_Iraklis.pdf · The VERCE Architecture for Data-Intensive Seismology OSDC Workshop, Edinburgh, 19 June

OSDC Workshop, Edinburgh, 19 June 2013The VERCE Architecture for Data-Intensive Seismology

Architectural Overview

• Science Gateway: extensible portal

• Users, resources, profiles, collaboration

• Services Integration

• Dispel processing - interpreter, optimiser

• Execution engine

• Registry

• Provenance

5

Science Gateway

Ser

vice

s In

tegr

atio

n

Scientific Catalogues

Data and metadata

Stores

Computing Resources

Catalogues and stores: external scientific and resource description catalogues and registries (e.g. UNICORE)Computing resources: local Dispel/D-I/adhoc, local HPC, external PRACE/HPC, external EGI/Grid, (future) Cloud

Wednesday, 19 June 13

Page 6: The VERCE Architecture for Data-Intensive Seismologyciara.fiu.edu/osdc-verce_Iraklis.pdf · The VERCE Architecture for Data-Intensive Seismology OSDC Workshop, Edinburgh, 19 June

OSDC Workshop, Edinburgh, 19 June 2013The VERCE Architecture for Data-Intensive Seismology

Dispel: Fine-Grained Scientific Workflows

• Composition of arbitrarily fine-grained workflows through processing elements and functions/constructors

• Stream-oriented / online processing:

• Decoupling from lower-level, technical details

• Composable workflow patterns

• An example seismology processing element may have

• High-level specification in Dispel

• Typed I/O streams and parameters

• Lower-level implementation in ObsPy, or other libraries

• Tweaking parameters alters behaviour

6

Wednesday, 19 June 13

Page 7: The VERCE Architecture for Data-Intensive Seismologyciara.fiu.edu/osdc-verce_Iraklis.pdf · The VERCE Architecture for Data-Intensive Seismology OSDC Workshop, Edinburgh, 19 June

OSDC Workshop, Edinburgh, 19 June 2013The VERCE Architecture for Data-Intensive Seismology

The VERCE Registry

• Provides consistency across sites

• w.r.t. Dispel components, resources, data

• Concurrent collaborative workflow development

• Interfaces with external registries and catalogues

• Major registry components

• Dispel

• Resources

• User and workload profiling information7

Wednesday, 19 June 13

Page 8: The VERCE Architecture for Data-Intensive Seismologyciara.fiu.edu/osdc-verce_Iraklis.pdf · The VERCE Architecture for Data-Intensive Seismology OSDC Workshop, Edinburgh, 19 June

OSDC Workshop, Edinburgh, 19 June 2013The VERCE Architecture for Data-Intensive Seismology

Provenance

• Pioneering use of W3C PROV model

• Gathered from remote resources

• Mirrored workflow

• User access through the science gateway

8

Processing*Stage:*CompositePE+**

***

**

**

Processing*PE***

Data*Flow****

Provenance***

Valida5on****

Parameters:*/ *Processing**

Visualize***Store***

*/ *Parallel****Branches*

Novel

Wednesday, 19 June 13

Page 9: The VERCE Architecture for Data-Intensive Seismologyciara.fiu.edu/osdc-verce_Iraklis.pdf · The VERCE Architecture for Data-Intensive Seismology OSDC Workshop, Edinburgh, 19 June

OSDC Workshop, Edinburgh, 19 June 2013The VERCE Architecture for Data-Intensive Seismology

Execution Engine

• OGSA-DAI

• Distributed data access and management

• Inherited from the ADMIRE project

• Storm

• Well designed high-level optimisation defaults

• Stress tested-used by companies and researchers

• Dispel-to-Storm interpreter9

Wednesday, 19 June 13

Page 10: The VERCE Architecture for Data-Intensive Seismologyciara.fiu.edu/osdc-verce_Iraklis.pdf · The VERCE Architecture for Data-Intensive Seismology OSDC Workshop, Edinburgh, 19 June

OSDC Workshop, Edinburgh, 19 June 2013The VERCE Architecture for Data-Intensive Seismology

Research Challenge 1: Optimisation

• Dispel semantics - close to users’ cognitive level

• Optimisation levels

• High-level: workflow partitioning and deployment

• Low-level: on-site data handling, data and code movement, execution management

• Constant ‘learning’ model making use of information from Registry and Provenance

• Adjust to changes in the environmental context10

Wednesday, 19 June 13

Page 11: The VERCE Architecture for Data-Intensive Seismologyciara.fiu.edu/osdc-verce_Iraklis.pdf · The VERCE Architecture for Data-Intensive Seismology OSDC Workshop, Edinburgh, 19 June

OSDC Workshop, Edinburgh, 19 June 2013The VERCE Architecture for Data-Intensive Seismology

Cross-Correlation Optimisation example

11

DataGathering Pre X-Corr. Post Data

Storage

In: 3 stations time periodOut: [waveform]

In: waveformOut: waveform

In: [waveform]Out: waveform

In: waveformOut: waveform

In: waveformOut: Ν/Α

DataGathering Pre Post Data

Storage

In: 3 stations time periodOut: [waveform]

In: waveformOut: Ν/Α

Wi

Wj

Wk

Resource x Resource y Resource x

Pre

Pre

Wi

Wi

s1

s2

s3

Wj

X-Corr.

X-Corr.

User’s view

High-level optimisation

Wednesday, 19 June 13

Page 12: The VERCE Architecture for Data-Intensive Seismologyciara.fiu.edu/osdc-verce_Iraklis.pdf · The VERCE Architecture for Data-Intensive Seismology OSDC Workshop, Edinburgh, 19 June

OSDC Workshop, Edinburgh, 19 June 2013The VERCE Architecture for Data-Intensive Seismology

Research Challenge 2:Registry Design and Evaluation• Registry’s role central for

• informing and supporting users

• being a good citizen in a community of services

• enabling efficient operation

• Distributed Vs. Centralised Vs. Hybrid design

• Balance between replication and federation

• Prioritisation of features/metadata

• Models for data and information retrieval

• Multiple evaluation angles

12

Wednesday, 19 June 13