IT-SDC: Support for Distributed Computing. CMS: T1 Disk/Tape separation. Nicolò Magini, CERN IT/SDC; Oliver Gutsche, FNAL. November 11th 2013

Page 1: CMS : T1  Disk/Tape separation

IT-SDC : Support for Distributed Computing

CMS: T1 Disk/Tape separation

Nicolò Magini, CERN IT/SDC

Oliver Gutsche, FNAL

November 11th 2013

Page 2: CMS : T1  Disk/Tape separation

WLCG Workshop: Disk/Tape separation, IT-SDC

Outline

  • Motivation: gains in operations
  • Impact on data federation
  • Progress and technical issues
  • Changes in operations and procedures

2013-11-11

Page 3: CMS : T1  Disk/Tape separation


Introduction

  • CMS asked the Tier-1 sites to change their storage setup to gain more flexibility and control of the available disk and tape resources
  • Old setup: one mass storage system (MSS) controlling both disk and tape
      • Automatic migration of new files to tape
      • Disk pool automatically purges unpopular files to make room for more popular files
      • Automatic recall of files from tape when accessing files without a disk copy
  • Several disadvantages:
      • Pre-staging needed for organized processing; not 100% efficient because the system was still allowed to automatically purge files if needed
      • User analysis was not allowed at Tier-1 sites, to protect the tape drives from chaotic user access patterns


Page 4: CMS : T1  Disk/Tape separation


Disk/Tape separation

  • CMS asked the Tier-1 sites to separate disk and tape and to base the management of both on PhEDEx
  • Sites were asked to deploy two independent [*] PhEDEx endpoints:
      • “Large” [**] persistent disk
      • Tape archive with “small” [**] disk buffer
  • All file access will be restricted to the disk endpoint
  • All processing will write only to the disk endpoint

[*] Can write/delete a file on disk only, on tape only, or on both simultaneously

[**] “small” ~ 10% of “large”, but can be sized according to expected rates to tape
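
The sizing note in [**] can be turned into a small calculation. The sketch below is illustrative only. The function name, its parameters, and the rate-times-residency formula are assumptions made here. Only the ~10% default comes from the slide.

```python
def tape_buffer_tb(disk_tb, rate_mb_per_s=None, residency_hours=None, fraction=0.10):
    """Estimate the size (in TB) of the disk buffer in front of tape.

    Without rate information, fall back to the ~10% rule of thumb from the slide.
    With an expected write rate to tape and an assumed time that files stay in
    the buffer, size the buffer as rate * residency time (an assumption of this
    sketch, not a CMS prescription).
    """
    if rate_mb_per_s is None or residency_hours is None:
        return disk_tb * fraction
    bytes_in_buffer = rate_mb_per_s * 1e6 * residency_hours * 3600
    return bytes_in_buffer / 1e12  # decimal terabytes


# Hypothetical site with 1000 TB of persistent disk
default_size = tape_buffer_tb(1000)
rate_based_size = tape_buffer_tb(1000, rate_mb_per_s=500, residency_hours=48)
```

With these made-up inputs, the two estimates land in the same range, which is consistent with the slide's remark that the fraction is only a starting point.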


Page 5: CMS : T1  Disk/Tape separation


Motivation

Increase flexibility for Tier-1 processing

Enable user analysis at Tier-1s

Enable remote access of Tier-1 data


Page 6: CMS : T1  Disk/Tape separation


Processing at Tier-1s: Location independence

  • Use case:
      • Organized processing needs to access input samples stored custodially on tape at one of the Tier-1 sites
  • Old model:
      • Jobs needed to run close to the tape endpoint hosting input and output data (custodial location)
  • New model:
      • Jobs can run against any disk endpoint, not necessarily close to the tape endpoint hosting input or output data
  • Benefit of new model:
      • Custodial distribution optimizes tape space utilization, taking into account the processing capacities of the Tier-1 sites
      • Not all data is accessed at the same time, which causes uneven processing resource utilization
      • Location independence makes it possible to use both tape and processing resources efficiently at the same time
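
The difference between the two models can be sketched as a site-selection rule. This is a toy illustration, not the CMS workload management logic. The function names, the site names, and the "most free slots" criterion are assumptions made for the example.

```python
def pick_site_old_model(custodial_site, free_slots):
    """Old model: jobs run at the custodial (tape) site, whatever its load."""
    return custodial_site


def pick_site_new_model(disk_sites, free_slots):
    """New model: any disk endpoint is eligible; here we pick the least busy one.

    Ties are broken alphabetically so the result is deterministic.
    """
    if not disk_sites:
        raise ValueError("at least one disk endpoint is required")
    return max(sorted(disk_sites), key=lambda site: free_slots.get(site, 0))


# Hypothetical snapshot of free job slots per Tier-1
free_slots = {"T1_A": 0, "T1_B": 500, "T1_C": 200}
old_choice = pick_site_old_model("T1_A", free_slots)
new_choice = pick_site_new_model(["T1_A", "T1_B", "T1_C"], free_slots)
```

In this snapshot the old rule sends work to a site with no free slots, while the new rule can use idle capacity elsewhere.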


Page 7: CMS : T1  Disk/Tape separation


Processing at Tier-1s: Pre-staging and Pinning

  • Use case:
      • Staging and pinning input files to local disk for organized processing is required to optimize CPU efficiency
      • Input files need to be released from disk when processing is done
  • Old model:
      • Pre-staging via SRM or Savannah tickets was used to convince the MSS to have input files available on disk
      • Release of input relied on automatic purge within the MSS
  • New model:
      • CMS will centrally subscribe, and therefore pre-stage, input files to have them available on disk before jobs start
      • CMS will permanently keep input files on disk for regular activities
  • Benefit of new model:
      • CMS is in control of what is on disk at the Tier-1 sites and can optimize disk utilization (CMS will have to actively manage the disk space through PhEDEx)
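
The subscribe, process, release cycle described above can be sketched as a small state model. This is not the PhEDEx API. The class, its methods, and the dataset name are hypothetical, and the sketch treats a subscription as completing instantly, whereas a real transfer takes time.

```python
class DiskEndpoint:
    """Toy model of a Tier-1 disk endpoint whose content is managed centrally."""

    def __init__(self, name):
        self.name = name
        self.on_disk = set()  # datasets explicitly placed (pinned) on disk

    def subscribe(self, dataset):
        """Central subscription: the dataset stays on disk until released."""
        self.on_disk.add(dataset)

    def release(self, dataset):
        """Central deletion: free the space once processing is done."""
        self.on_disk.discard(dataset)

    def can_run(self, dataset):
        """Jobs start only when their input is on disk; no automatic recall."""
        return dataset in self.on_disk


endpoint = DiskEndpoint("T1_Example_Disk")  # hypothetical endpoint name
ready_before = endpoint.can_run("/Example/Dataset/RAW")
endpoint.subscribe("/Example/Dataset/RAW")
ready_after = endpoint.can_run("/Example/Dataset/RAW")
endpoint.release("/Example/Dataset/RAW")
ready_released = endpoint.can_run("/Example/Dataset/RAW")
```

The point of the model is that nothing enters or leaves the disk set unless it is asked for explicitly, in contrast to the automatic purge of the old setup.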


Page 8: CMS : T1  Disk/Tape separation


Processing at Tier-1s: Output from central processing

  • Use case:
      • Central processing produces output which needs to be archived on tape
  • Old model:
      • Output of individual workflows could only be produced at one site, the site of the custodial location
  • New model:
      • Output can be produced at one or more disk endpoints, then migrated to tape only at a single final custodial location
  • Benefit of new model:
      • CMS can optimize processing resource utilization
      • Tier-1s with no free tape are no longer idle
      • CMS can validate data before final tape migration, reducing unnecessary tape usage
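
A minimal sketch of the "validate first, then archive once" flow in the new model follows. The function, the block names, the site names, and the validation callback are all hypothetical; the real flow goes through PhEDEx subscriptions.

```python
def plan_tape_archival(blocks_by_disk_site, is_valid, custodial_tape_site):
    """Return (subscriptions, rejected) for output produced at several disk sites.

    Each subscription is a (block, source_disk_site, tape_site) tuple. Every
    validated block goes to the single custodial tape site. Blocks that fail
    validation never reach tape.
    """
    subscriptions = []
    rejected = []
    for disk_site, blocks in blocks_by_disk_site.items():
        for block in blocks:
            if is_valid(block):
                subscriptions.append((block, disk_site, custodial_tape_site))
            else:
                rejected.append((block, disk_site))
    return subscriptions, rejected


# Hypothetical output of one workflow spread over two disk endpoints
produced = {
    "T1_A_Disk": ["block1", "block2"],
    "T1_B_Disk": ["block3_bad"],
}
subs, bad = plan_tape_archival(
    produced,
    is_valid=lambda block: not block.endswith("_bad"),
    custodial_tape_site="T1_C_Tape",
)
```

Here the block that fails validation stays off tape, which is the saving in tape usage that the slide refers to.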


Page 9: CMS : T1  Disk/Tape separation


Impact on data federation

  • CMS would like to benefit from a fully deployed CMS data federation
      • Tier-1s need to publish files on the disk endpoints in the Xrootd federation
      • Eventually, all popular data will be accessible through the federation
  • Benefits:
      • Further optimize processing resource utilization by processing input files without the need to relocate samples through PhEDEx
      • Enables processing not only on remote Tier-1 sites through the LHCOPN but also at Tier-2 sites


Page 10: CMS : T1  Disk/Tape separation


Technical implementation

  • Sites and storage providers are free to choose the implementation
  • Two possibilities identified in practice:
      • Two independent storage endpoints
          • CERN, FNAL
      • Single storage endpoint with two different trees in the namespace
          • RAL, KIT, CNAF, CCIN2P3, PIC
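
For the single-endpoint option, the two namespace trees can be pictured as two prefixes applied to the same logical file name. The sketch below is illustrative only. The prefixes, the function name, and the example file name are made up and do not describe any site's actual configuration.

```python
# Hypothetical namespace roots on one storage endpoint
ENDPOINT_ROOTS = {
    "disk": "/storage/cms/disk",
    "tape": "/storage/cms/tape",
}


def lfn_to_pfn(lfn, endpoint):
    """Map a logical file name to a physical path in the disk or tape tree."""
    if endpoint not in ENDPOINT_ROOTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    if not lfn.startswith("/"):
        raise ValueError("logical file names are expected to be absolute")
    return ENDPOINT_ROOTS[endpoint] + lfn


lfn = "/store/data/example/file.root"  # made-up file name
disk_path = lfn_to_pfn(lfn, "disk")
tape_path = lfn_to_pfn(lfn, "tape")
```

Because the same logical name resolves to two different paths, a file can exist on disk only, on tape only, or on both, which is the independence requirement placed on the two PhEDEx endpoints.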


Page 11: CMS : T1  Disk/Tape separation


Internal transfers

  • Currently using standard tools for disk↔tape buffer transfers at all sites
      • e.g. FTS, xrdcp
  • No bottleneck seen so far
  • If needed, internal optimizations are possible with a single endpoint
      • e.g. on a single dCache endpoint, internal data flow can be delegated to the pools


Page 12: CMS : T1  Disk/Tape separation


Site concerns

  • Main site concern has been duplication of space used between disk and tape buffer
      • Should not be a big effect given the “small” size of the buffer in front of tape
  • For dCache, a solution is planned:
      • “flush-on-demand” command creating a hard link in the tape namespace instead of a copy
      • Development schedule will depend on need; for now, gather experience with the current version


Page 13: CMS : T1  Disk/Tape separation


Current status

  • DONE
      • RAL, CNAF
      • KIT (in commissioning last week)
  • ~ DONE
      • CERN (except for Tier-0 streamers and user)
  • IN PROGRESS
      • PIC, CCIN2P3, FNAL


Page 14: CMS : T1  Disk/Tape separation


Issues

  • At sites
      • No blocking technical issues
      • Not stress-tested yet: challenge in 2014?
  • In CMS software
      • Minor update needed in PhEDEx to handle disk↔tape moves
      • Need to settle data location for job matching
          • PhEDEx node vs. SE…
          • CMS internal, in progress


Page 15: CMS : T1  Disk/Tape separation


Changes in operations and procedures

  • The Tier-1 disk endpoint is a central space
      • CMS will manage subscriptions and deletions on disk
  • Tape endpoint subscriptions are subject to approval by Tier-1 data managers (functions that are held by site-local colleagues)
  • CMS would like to auto-approve disk subscription and deletion requests to be able to reduce latencies
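
The proposed approval policy can be written down as a simple rule. This is a sketch of the policy as stated on the slide, not PhEDEx code. The function name and the string labels are hypothetical, and treating every request that is not a disk subscription or deletion as needing manual approval is an assumption of this sketch.

```python
def needs_manual_approval(endpoint_kind, request_type):
    """Return True if a site-local data manager must approve the request.

    Disk subscriptions and deletions are auto-approved, as CMS proposes.
    Everything else, including tape subscriptions, goes to the data manager.
    """
    if endpoint_kind == "disk" and request_type in ("subscription", "deletion"):
        return False
    return True


disk_sub = needs_manual_approval("disk", "subscription")
disk_del = needs_manual_approval("disk", "deletion")
tape_sub = needs_manual_approval("tape", "subscription")
```

Under this rule, only requests that touch tape wait for a person, which is where the latency reduction for disk operations would come from.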


Page 16: CMS : T1  Disk/Tape separation


Changes in operations and procedures

  • Tape families:
      • Together with the Tier-1 sites, CMS optimized placement of files on tape for reading by requesting tape families
      • In the old model, tape family requests needed to be made before processing started, which could lead to complications if forgotten
      • The new model allows processing on disk endpoints without the need for tape families
          • A PhEDEx subscription archives the output to tape: it needs to be approved by the site-local data manager
          • Tape family requests by CMS are not needed anymore; sites can create tape families before approving archival PhEDEx subscriptions
          • CMS is happy to be available to help the sites optimize rules for tape family creation
  • CMS would like to evolve the tape family procedure from requesting individual families to a dialogue with the sites defining tape family setups and rules


Page 17: CMS : T1  Disk/Tape separation


Changes in site readiness

  • Site readiness metrics for Tier-1s will evolve, taking into account separated disk and tape PhEDEx endpoints
      • SAM tests only on CEs close to disk
      • SAM tests for SRM both on disk and on tape endpoints
      • More links to monitor:
          • disk↔WAN
          • tape↔WAN
          • disk↔tape
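
To see how the number of monitored links grows, the three link types can be enumerated per site. The sketch is illustrative only. The endpoint naming (`_Disk`, `_Tape`), the `WAN` placeholder for remote sites, and the assumption that each link is monitored in both directions are made up for this example.

```python
def links_to_monitor(site):
    """List directed links for one Tier-1 with separated disk and tape endpoints."""
    disk = f"{site}_Disk"
    tape = f"{site}_Tape"
    wan = "WAN"  # stands in for all remote sites
    pairs = [(disk, wan), (tape, wan), (disk, tape)]
    links = []
    for source, destination in pairs:
        links.append((source, destination))
        links.append((destination, source))  # assume both directions matter
    return links


example_links = links_to_monitor("T1_Example")  # hypothetical site name
```

Compared with a single combined endpoint, which has only its WAN links, the separated setup adds the tape WAN links and the internal disk to tape link.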


Page 18: CMS : T1  Disk/Tape separation


Conclusions

  • Hosting Tier-1 data on disk will increase flexibility in all computing workflows
  • Technical solutions identified for all sites
  • Deployment in progress with no blocking issues, expecting completion at all sites by beginning of 2014
  • For more details:
      • https://twiki.cern.ch/twiki/bin/view/CMSPublic/CompProjDiskTape
      • https://indico.cern.ch/conferenceDisplay.py?confId=249032
