#DPHEP: Status and Outlook

Sustainable Strategies for Long-Term DP at the Exa-scale

[email protected] LHCC Referees Meeting

International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics

Overview

• Sustainable Strategy

• Collaboration Agreement

• Research Data Alliance

• H2020 (NSF?) Prospects

2020 Vision for LT DP in HEP

• Long-term – e.g. LC timescales: disruptive change

– By 2020, all archived data – e.g. that described in Blueprint, including LHC data – easily findable, fully usable by designated communities with clear (Open) access policies and possibilities to annotate further

– Best practices, tools and services well run-in, fully documented and sustainable; built in common with other disciplines, based on standards

Vision achievable, but we are far from this today

Data Preservation Maturity Model

Level 4
– Metric: Reproducible results by “citizen scientists”
– Implications: Desired(?) by funding agencies: people able to reproduce an analysis should be awarded “a degree” – beyond what can realistically be afforded?

Level 3
– Metric: Reproducible results where consumer ≠ producer and outside immediate community
– Implications: Stronger demonstration of long-term preservation. Knowledge stored is sufficient for physicist outside immediate community to reproduce results

Level 2
– Metric: Reproducible results where consumer ≠ producer but within same “larger community”, e.g. LHC (ATLAS / CMS; CDF / D0, …)
– Implications: Highly desirable for “minimal” long-term preservation. “Knowledge” stored is sufficient for a physicist from a different collaboration (but within same overall programme) to reproduce results

Level 1
– Metric: Reproducible results where consumer = producer
– Implications: Required during lifetime of collaboration

Level 0
– Metric: N/A
– Implications: Data is lost: logically or physically. This is probably the reality for the bulk of pre-DPHEP experiments (and even some of those??)

• Scale (complexity) is probably “exponential”
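The five levels of the maturity model above are ordered, so they can be written down as a small ordered enumeration. The sketch below is illustrative only: the member names and the helper `meets_minimal_long_term` are hypothetical and do not come from any DPHEP tool; only the level numbers and their descriptions are taken from the slide.

```python
from enum import IntEnum


class PreservationLevel(IntEnum):
    """Levels 0-4 of the maturity model (member names are hypothetical)."""

    DATA_LOST = 0          # data is lost, logically or physically
    SAME_PRODUCER = 1      # reproducible where consumer = producer
    SAME_PROGRAMME = 2     # consumer != producer, same "larger community"
    OUTSIDE_COMMUNITY = 3  # consumer outside the immediate community
    CITIZEN_SCIENTIST = 4  # reproducible by "citizen scientists"


def meets_minimal_long_term(level: PreservationLevel) -> bool:
    """True from Level 2 upwards, the level the slide calls highly
    desirable for "minimal" long-term preservation."""
    return level >= PreservationLevel.SAME_PROGRAMME


if __name__ == "__main__":
    for level in PreservationLevel:
        print(int(level), level.name, meets_minimal_long_term(level))
```

Using an integer-backed enumeration keeps the ordering explicit, so a statement such as "support for Levels 3-4" becomes a simple comparison.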

Software Preservation Maturity Model

Level 4
– Metric: Reproducible results by “citizen scientists”
– Implications: Desired(?) by funding agencies: people able to reproduce an analysis should be awarded “a degree” – beyond what can realistically be afforded?

Level 3
– Metric: Reproducible results where consumer ≠ producer and outside immediate community
– Implications: Stronger demonstration of long-term preservation. Knowledge stored is sufficient for physicist outside immediate community to reproduce results

Level 2
– Metric: Reproducible results where consumer ≠ producer but within same “larger community”, e.g. LHC (ATLAS / CMS; CDF / D0, …)
– Implications: Highly desirable for “minimal” long-term preservation. “Knowledge” stored is sufficient for a physicist from a different collaboration (but within same overall programme) to reproduce results

Level 1
– Metric: Reproducible results where consumer = producer
– Implications: Required during lifetime of collaboration

Level 0
– Metric: N/A
– Implications: Data is lost: logically or physically. This is probably the reality for the bulk of pre-DPHEP experiments (and even some of those??)

REPRODUCIBLE RESULTS AFTER “PORTING” TO NEW ENVIRONMENT!

Sustainable Strategy

• A document on a sustainable strategy for LTDP is available – discussed at DPHEP IB today

• This version focuses on CERN (IT) – presented yesterday (attached to agenda: doc, ppt)

• Some comments received (DESY, INFN)
– DESY comments included in current draft;
– INFN: stress need for standards, e.g. for outreach activities based on data from multiple experiments

• Intent is to update document to reflect activities of other “Collaboration Members”

Summary of Recommendations

ICFA Statement on LTDP

• The International Committee for Future Accelerators (ICFA) supports the efforts of the Data Preservation in High Energy Physics (DPHEP) study group on long-term data preservation and welcomes its transition to an active international collaboration with a full-time project manager. It encourages laboratories, institutes and experiments to review the draft DPHEP Collaboration Agreement with a view to joining by mid- to late-2013.

• ICFA notes the lack of effort available to pursue these activities in the short-term and the possible consequences on data preservation in the medium to long-term. We further note the opportunities in this area for international collaboration with other disciplines and encourage the DPHEP Collaboration to vigorously pursue its activities. In particular, the effort required to prepare project proposals must be prioritized, in addition to supporting on-going data preservation activities.

• ICFA notes the important benefits of long-term data preservation to exploit the full scientific potential of the, often unique, datasets. This potential includes not only future scientific publications but also educational outreach purposes, and the Open Access policies emerging from the funding agencies.

• 15 March 2013

DPHEP Collaboration Agreement

• A draft has been prepared by the CERN legal service, has been sent to ICFA, and has been available to DPHEP since early 2013

• Some comments have been received and integrated

• AFAIK CERN, DESY, FNAL and SLAC “ready” to sign

• Target: prior to CHEP 2013 (RDA-2 might be better!)

• Next steps: get legal services in touch with each other and complete process

• CERN & DESY: defining activities as part of Collaboration

RDA Preservation WG

• The RDA – strongly supported by EU, NSF, AU – seen as an element of implementing HLEG 2030 vision

• A WG on DP was approved in May
– Chair: David Giaretta (APA, SCIDIP-ES, author of “Advanced DP”, ex-DCC, ex-STFC)
– Co-chair: JDS

• The intent is to show progress by each RDA plenary (March, September) and co-ordinate international activities, identify candidate services for standardization, lobby for funding…

Component Breakdown

• Can break this down into three distinct areas
– (OAIS reference model is somewhat more complex: this is a zeroth iteration)

• “Archive issues”

• Digital Libraries & “Adding Value” to data

• “Knowledge retention” – the Crux of the Matter

Archive Issues

We (HEP) have significant experience of 100PB+ distributed data stores

Plan is to coordinate long-term “bit preservation” issues via HEPiX

And with other disciplines e.g. via IEEE MSST

Sustainable models for long-term multi-disciplinary data archives still to be solved

H2020 funding targeted for this

Digital Libraries

Significant investment in this space, including multiple EU (and other) funded projects

No reason to believe that the issues will not be solved, nor that funding models will not exist, e.g. adapted from “traditional” libraries

Related topics: “linked data”, “adding value to data” – again with projects / communities

Should work closely with these projects / communities, not start new initiatives

Where to Invest – Summary

Tools and Services, e.g. Invenio: could be solved. (2-3 years?)

Archival Storage Functionality: should be solved. (i.e. “now”)

Support to the Experiments for DPHEP Levels 3-4: must be solved – but how?

Who Can Help?

• Mobilize resources through existing structures:
– Research Data Alliance:
  • Funding / strong interest from EU, US, AU, others
  • Part of roadmap to “Riding the Wave” 2030 Vision
  • STFC and DCC personnel strongly involved in setup
– WLCG:
  • Efforts on “software re-design” for new architectures
  • Experiment efforts on Software Validation (to be coordinated via DPHEP), building on DESY & others
– DPHEP:
  • Coordination within HEP and with other projects / disciplines

• National & International Projects
– H2020 / NSF funding lines
– National projects also play an important role

[Figure: Collaborative Data Infrastructure – Riding The Wave HLEG Report. Diagram labels: Trust; Data Curation; Data Generators; Users; Community Support Services; Common Data Services. Function descriptions shown in the diagram: user functionalities, data capture & transfer, virtual research environments; data discovery & navigation, workflow generation, annotation, interoperability; persistent storage, identification, authenticity, workflow execution, mining.]

H2020 Prospects

• According to Kostas Glinos (e-IRG meeting, Dublin) first calls: December 11 2013

• “Framework for action” (part of open consultation) has a “fiche” targeting DP

• DPHEP ICFA report (2020 vision) sent to Carlos MP

• “References to RDA are appreciated and I really hope that you take a leading role in bringing people and key players together around a global initiative to tackle the issue of ‘highly reliable and highly trusted infrastructures for research data preservation’.”

• IMHO: need to prepare now (collaboration, WP, tasks) – likely discuss this at RDA Plenary, CHEP 2013, PV …

A Strategy for H2020?

• Front-end: collaborate with on-going efforts in Digital Libraries, Linked Data, PV etc.
– Significant effort (also HEP expertise): very high probability of further funding in H2020 (+RDA)
– DP(HEP) is already part of these projects: feed in requirements & collaborate (PRELIDA WS??)

• Back-end: collaborate through HEPiX & IEEE MSST
– Seek specific H2020 funding for CDIs, including TCO, long-term, sustainable inter-disciplinary archives

• Middle:
– Collaborative effort on Validation Frameworks, Virtualization, Training, Outreach etc.
  • Includes institute / national funding
– Work for “Concurrency Framework” and other efforts so that future migrations are less painful and more repeatable
– [ CERNLIB consortium ]
– Seek further funds (H2020, RDA) to further develop and generalize

• Several (all?) relevant “fiches” in “Call for Action” document
– fiche 01: community support data services
– fiche 02: infrastructure for Open Access
– fiche 03: storing, managing and preserving research data
– fiche 04: discovery and provenance of research data
– fiche 05: towards global data e-infrastructures
– fiche 06: global A&A e-infrastructures
– fiche 07: skills and new professions for research data

Other Activities

• Various project proposals in preparation / review

• On-going activities in the experiments: “DPHEP classic” as well as LHC

• Discussions with CMS on validation system – other LHC experiments expected to join

• DPHEP session at CHEP 2013 – outlook for CHEP 2015? (tighter integration into programme)

• Presentations accepted at numerous conferences / workshops – building more links with other disciplines

• DPHEP IB (modeled on WLCG) monthly call

What: When

Collaboration Agreement: Q3-Q4 2013
Preparation for H2020: Now – Q3/Q4 2013
HEPiX WG in place: <Q4 2014
First H2020 calls open: Dec 2014
ICFA report (work plan, including sustainability plan): DESY, Feb 20-21 2014
H2020 Proposal: End Q1 2014
DPHEP Portal Available: mid 2014
H2020 news: July 2014
LEP Data “recovery” (CERNLIB???): End 2014?
Validation framework(s): 2014 / 2015?
Long-term CDI #1: 2015 – 2017
Full(?) understanding of costs: 2016/17?
Sustainable, repeatable LTDP: 201?

Summary

• Making good progress on multiple fronts

• “Sustainable strategy” being discussed (and then put in place)

• Good inter-disciplinary collaboration

• Optimistic regarding H2020 and also NSF(+) – but needs work!

• #DPHEP for news!