Evolution of storage and data management Ian Bird GDB: 12th May 2010


Evolution of storage and data management

Ian Bird

GDB: 12th May 2010


Discussion on evolving WLCG Storage and Data Management
• Background:
  – Experiments’ concern with performance and scalability of access to data
    • Particularly for analysis
  – Concern over long term sustainability of existing solutions
  – Recognize:
    • Evolution of technology (networking, file systems, etc.)
    • Huge amounts of disk available to experiments (in aggregate!)
    • Other communities exist that have solved similar problems or will have similar problems soon
  – Short term concerns:
    • Performance, scalability, bottlenecks (e.g. SRM)
  – The MONARC model assumed networking was a problem
    • It is actually something we should invest more in


Scope
• Focus on analysis use cases
  – Production is in hand
• Start from viewpoint of user access to data
  – Hide details of back-end storage systems etc.
• Timescale for solutions in production – 2013 run
  – But with incremental working prototypes that address some of the short term concerns
• Should not be seen as a reason for new development projects
  – Must use available tools as far as possible – must keep long term sustainability in mind
• Working model:
  – More network-centric than now
  – Few large archival repositories – long term data curation
  – “Cloud” of storage making optimal use of available capacity
• Should be a single effort of the community
  – There may be funding opportunities – but we should make sure that they help us in our goals and in obtaining what we need
• Must be driven by the real needs of the experiments ...
  – But we must learn the lesson of SRM – too many unrealistic “requirements” and no consensus on implementation


Technical areas
• Data archives and storage cloud
  – Simplify so that “tape” is really an archive
  – Allow remote data access when needed
  – Look into peer-peer technologies
• Data access layer
  – E.g. Xrootd/GFAL + some intelligence to determine when to cache/when to use remote access
• Output datasets
  – Still need a service for (asynchronous) movement of datasets to an archive
• Global home directory facility
  – Model is a global file system. Industrial solutions available?
• “Catalogues”
  – Still need to locate data. Issue of consistency between different storage systems.
• Authorization mechanisms
  – For access to files in storage systems (archive + cloud), and quotas etc.
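
The "intelligence to determine when to cache/when to use remote access" in the data access layer item could, as one illustration, be a simple cost comparison. The sketch below is a minimal example under stated assumptions and is not part of Xrootd or GFAL: the names AccessRequest and choose_access_mode, and the rule of comparing bytes moved over the network, are hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical description of one analysis job's need for a file."""
    file_size_bytes: int   # total size of the file
    bytes_needed: int      # bytes the job expects to read from it
    expected_reads: int    # how often the file is likely to be read at this site


def choose_access_mode(req: AccessRequest, cache_free_bytes: int) -> str:
    """Return "cache" to copy the whole file locally, or "remote" to read it in place.

    Assumed policy: cache only if the file fits in the free cache space and
    repeated remote reads would move at least as many bytes as one full copy.
    """
    if req.file_size_bytes <= 0:
        raise ValueError("file_size_bytes must be positive")
    if req.file_size_bytes > cache_free_bytes:
        return "remote"  # the file does not fit in the local cache
    remote_bytes = req.bytes_needed * req.expected_reads  # network cost of remote reads
    cache_bytes = req.file_size_bytes                     # network cost of one full copy
    return "cache" if remote_bytes >= cache_bytes else "remote"


if __name__ == "__main__":
    sparse_read = AccessRequest(file_size_bytes=1000, bytes_needed=100, expected_reads=1)
    print(choose_access_mode(sparse_read, cache_free_bytes=10_000))
```

A real policy would presumably also weigh network latency, cache eviction and site load; the point here is only that the decision can be made per request from a few measurable quantities.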


Next steps

• Jamboree
  – 3-day workshop to look at existing tools in each of the areas: June 16-18 in Amsterdam
• Elaboration of a more concrete plan and timelines; set up working groups
  – Could be at WLCG workshop in London, July 7-9
• Develop demonstrator prototypes in each area – testing of components/technologies
• Experiment testing – integration into frameworks


Jamboree agenda – 1

10:00  Welcome & Introduction to the Jamboree - goals of the workshop. Speaker: Ian
10:15  Presentation of a strawman for a new model of data access and management - should include background and rationale behind this. Speaker: Experiment person (Ian F/Kors?)
11:00  Status of networking (cf. expectation in 2001), prospects including intercontinental and far (e.g. Asia) connectivity. Speaker: David Foster
11:45  Back-end storage - to be used as a true archive: implications/simplifications possible. Should discuss the need to not know about or have to access back-end storage (tape). ==> What should be the interface to storage? Speaker: Dirk Duellmann?
12:30  Lunch
14:00  Needs from a data access layer - how analysis needs to access data, etc. Speaker: Experiment person
14:45  Data transfer use cases: peer-peer (many-many) or point-point. On demand vs scheduled. Speaker: Experiment person
15:30  Coffee
15:45  Namespaces, authorization needs, quotas, catalogues - what is needed? Speaker: Experiment person
16:30  The need for global home directory service - missing so far. What are the use cases? Speaker: Experiment person
17:00  Conclusions - summary of discussion points, points arising. Progress on model? Short term improvements possible?


Agenda – 2

Technology

9:00   File systems. Summary of work on Hadoop, Lustre, GPFS, etc. Speaker: HEPiX WG?
9:45   Xrootd - outlook. Speaker: Fabrizio Furano
10:30  Coffee
10:45  FTS, LFC - what is OK, what is not? What is still useful? Speaker: ??
11:30  Alien FC - why is it interesting? Speaker: Pablo Saiz
12:15  P2P technologies - can we simply adapt them? Speaker: ??
13:00  Lunch
14:00  ROOT - outlook and developments. Speaker: Rene Brun

Site Experiences

14:45  NDGF/ARC. Speaker: Josva Kleist
15:15  GridKa - xrootd with tape back-end. Speaker: ??
15:45  Coffee
16:00  CNAF - GPFS/TSM/Storm. Speaker: ??
16:30  Hadoop at a Tier 2. Speaker: Brian Bockelman?
17:00  Summary and conclusions - are there areas for potential early demonstrators?


Agenda – 3

9:00   Conclusions of the discussions: summaries from secretaries of Days 1 and 2
10:00  Next steps: points to address:
       - Draft an outline plan with timelines
       - Create working groups - agree mandates and convenors
       - Set expectations for July workshop
11:00  Summary: draft summary of the workshop and conclusions for presentation to wider community
12:00  Close