The FIFE Project: Distributed Computing for Experiments
Vito Di Benedetto for the FIFE Project | CHEP 2018 | 9 July 2018



FabrIc for Frontier Experiments (FIFE)
• Standardize the computing framework for non-LHC FNAL experiments
• Offer a robust, shared, and modular set of tools for experiments, including:
  – Job submission and monitoring
  – Workflow management software
  – Data management and transfer tools
  – Continuous Integration service
• Work closely with experiment contacts during all phases, from experiment workflow development and testing to production campaigns: standing meetings, regular workshops, and tutorials
  – Provide an on-boarding process for an experiment's production team


Experiment logos shown: g-2, DES, NOvA.


FIFE Project Benefits

• Experiments can leverage each other's expertise and lessons learned
• Greatly simplifies life for those on multiple experiments
• Allows common problems to be identified and addressed with common solutions
• Takes the burden of gaining access to new resources away from experiments:
  – OSG
  – European Grid
  – HPC
• Simplifies the transition from one common service to another:
  – From FermiGrid to HEPCloud
  – From GUMS & VOMS-Admin to a new authorization registry service (FERRY)


FIFE Project: Common Problems, Common Solutions

The FIFE project brings experiments under a common umbrella. It provides:
• Unified job submission tool for distributed computing: JobSub (a submission sketch follows this list)
• Workflow monitors, alarms, and automated job submission (FIFEMON)
• Production Operations Management Service (POMS)
• Data handling and distribution:
  – Sequential Access via Metadata (FIFE SAM)
  – dCache/Enstore (data caching and transfer / long-term tape storage)
  – Fermilab File Transfer Service (FIFE F-FTS)
  – Intensity Frontier Data Handling client (ifdh)
• Software stack distribution via CVMFS
• User support for authentication, proxy generation, and security
• Continuous Integration (CI) service
• Electronic logbooks, databases, and beam information
• Integration with new technologies and resources: GPUs and HPC
• Access to a modular software framework such as art
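To make the JobSub entry concrete, here is a minimal submission sketch as it might be run from an experiment's interactive node. The experiment name, script path, and resource values are illustrative, and the flag names reflect common jobsub_client usage rather than a guaranteed interface.

    # Minimal JobSub submission sketch; all values are illustrative.
    import subprocess

    cmd = [
        "jobsub_submit",
        "-G", "nova",                    # experiment / VO group (illustrative)
        "-N", "10",                      # number of job instances
        "--expected-lifetime", "8h",     # requested wall-clock time
        "--memory", "2000MB",            # requested memory per job
        "--resource-provides=usage_model=DEDICATED,OPPORTUNISTIC",
        "file:///path/to/run_reco.sh",   # user job script shipped to the worker node
    ]
    subprocess.run(cmd, check=True)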


FIFE Distributed Computing Architecture

[Architecture diagram. Components shown: job submission client and job submission service; job batch scheduler & manager; resource provisioning agent and resource reservation manager; resource holder agents on Grid / HEPCloud site / HPC worker nodes and VMs; security and authorization services; metadata/location catalog and metadata & auxiliary-data handling services; file transfer service; disk and tape storage; auxiliary data; monitoring service; production workflow management. Legend distinguishes job submission, resource holder agent submission, and data transfer / database queries.]


FIFE Experiment Data and Job Volumes

• Nearly 11.6 PB of new data catalogued over the past 6 months across all experiments
• Average throughput of 4.1 PB/week through FNAL dCache
• Typically 16K concurrently running jobs; peaks over 36K; approximately 3M wall hours per week
• Combined numbers approaching the scale of the LHC (a factor of 6-7 smaller with respect to ATLAS+CMS)

[Plots: FNAL dCache daily throughput by experiment, last 6 months; total daily wall time by experiment, last 6 months.]


Going Global with User Jobs
• International collaborators often bring additional computing resources to bear
• Users want to be able to run seamlessly at all sites with a unified submission command
• First international location was for NOvA at FZU in Prague; now expanded to many other sites
• Following the OSG setup process makes it easy to have sites around the globe communicate with a common interface, with a variety of job management systems underneath
• Integration times as short as 1-2 weeks; all accessible via standard submission tools. Record set-up time is just 2 hours!

FIFE jobs at non-US sites, past 6 months:
• FZU: NOvA, DUNE/ProtoDUNE
• JINR: NOvA
• Lancaster, Manchester: MicroBooNE
• Imperial, Sheffield, CERN Tier 0, Manchester, Liverpool: DUNE/ProtoDUNE
• INFN-Pisa: g-2


FIFE: Work in Progress

• Identifying common problems and looking for common solutions!
• Roadmap for this year:
  – Migration from FermiGrid to HEPCloud
  – Improving data management (Rucio evaluation)
  – Provide a common approach for monitoring (Landscape Program):
    • service status and actions
    • production-team shifter dashboards
    • comprehensive storage monitoring
  – Popularize/customize POMS for production teams
  – Solve the analyzers' code distribution problem
  – Help with auxiliary file management


Elastic Provisioning with HEPCloud

• Enables elastic expansion of Fermilab resources to both other grids (OSG, EGI, etc.) and commercial cloud resources (Amazon, Google, Microsoft). This new framework is known as HEPCloud
  ○ Moves into production December 2018
• Job routes are chosen by a decision engine, which considers workflow cost, efficiency, and the requirements of the scientific workflow

More details at http://hepcloud.fnal.gov/ and http://hepcloud.fnal.gov/papers/. Also, see Eric Vaandering’s talk & Dirk Hufnagel’s poster.

[Plot by B. Holzman, FNAL: CMS running 160K simultaneous jobs on Google.]


HEPCloud Demonstration Success Story: NOvA @ NERSC

• The NOvA experiment uses the Feldman-Cousins procedure as part of its final analysis
• Using standard HTC resources, analysis of the full dataset would have taken several weeks; impossible to complete in time without significantly expanding resources
• NOvA was able to use 1M cores simultaneously at NERSC via HEPCloud, achieving an overall speedup factor of about 150
• Results presented at Neutrino 2018

One million cores!


Strive for Common Tools (Rucio Evaluation)

• Changes will be needed to data transfer mechanisms as infrastructure evolves and needs grow.

• Modify existing tools, or adopt something else?
• Rucio has been steadily gaining traction outside of ATLAS; it is being evaluated for several other experiments
• Fermilab is also doing a Rucio evaluation:
  – FIFE personnel will interface with the evaluation team and the experiments
  – FIFE will assist those experiments that wish to transition if Rucio is eventually adopted


Monitoring (Landscape Program)


Off-line Production Shifter Dashboards

● Customizable dashboards for each experiment

● Display information experiments deem relevant to shifters in a single place

● Can drill down to more details


protoDUNE Cross-Lab Monitoring

● protoDUNE detectors, located at CERN, will take data in Fall 2018

○ Significant computing infrastructure at both CERN and FNAL (buffer disk, long-term storage, reconstruction and analysis resources)

● Developed a unified monitoring system to show data from both laboratories side-by-side:

○ Covers data transfer rates, storage health and usage, job submission and status


Production Workflow Submission

• Several workflow management tools were previously available; they work fairly well for individual analyzers, but did not scale well to large production workflows
• Started working on an in-house Production Operations Management Service (POMS) suitable for production workflows for FIFE experiments. POMS:
  – Tracks what processing needs to be done ("campaigns") and the job submissions from a campaign
  – Makes automatic job submissions
  – Launches recovery jobs for files that didn't process, automatically
  – Launches jobs for dependent campaigns automatically, to process the output of previous passes
  – Interfaces with SAM to track the files processed by submissions and campaigns
  – Provides a "Triage" interface for examining individual jobs/logs and debugging failures


Production Workflow Submission Example

Test workflow for SBND

● Each stage of the workflow takes as input a SAM dataset containing the output files of the previous stage (sketched below)

● Each stage starts automatically when the previous one is finished
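To illustrate the stage-chaining idea only (this is not the POMS configuration format; the stage names, dataset names, and fields are hypothetical), a dependent-stage campaign can be thought of like this:

    # Conceptual model of a dependent-stage campaign: each stage consumes the SAM
    # dataset produced by the previous stage and is launched when that stage ends.
    # Names and fields are hypothetical, not POMS syntax.
    campaign = [
        {"stage": "gen",    "input_dataset": None,              "output_dataset": "sbnd_gen_out"},
        {"stage": "g4",     "input_dataset": "sbnd_gen_out",    "output_dataset": "sbnd_g4_out"},
        {"stage": "detsim", "input_dataset": "sbnd_g4_out",     "output_dataset": "sbnd_detsim_out"},
        {"stage": "reco",   "input_dataset": "sbnd_detsim_out", "output_dataset": "sbnd_reco_out"},
    ]

    def next_stage(finished):
        """Return the stage that consumes the dataset written by the finished stage."""
        for stage in campaign:
            if stage["input_dataset"] == finished["output_dataset"]:
                return stage
        return None

    print(next_stage(campaign[0])["stage"])   # -> "g4"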

[POMS GUI screenshots: campaign stages and campaign results.]

Efficient User Code Distribution

• Until January 2018, analyzers were able to access their BlueArc code development areas
  – BlueArc was NFS-mounted on all FermiGrid nodes
• After BlueArc was unmounted, analyzers started to upload tarballs to the dCache scratch area and stage them from dCache to the worker nodes. It turned out to be a bad idea:
  – too much hammering on single pools when many jobs of the same type start
  – can completely block access to files owned by other users on a given pool
• Interim solution: stage tarballs in the "resilient" dCache area (files replicated 20x on different pools); a staging sketch follows this list
  – Advantages: keeps the system running, doesn't block users, more efficient
  – Disadvantages: tarballs use 20x the space; need careful cleanup policies
• Longer-term solution: looking into automated distribution via CVMFS
  – May be better to wait for the CVMFS release that allows parallel publishing (should be released by the end of 2018)
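A minimal sketch of the interim tarball approach described above, assuming illustrative dCache paths and an experiment-specific resilient area (the exact paths and cleanup policy are experiment-dependent):

    # Pack the user's code, stage the tarball to the resilient dCache area with
    # ifdh, and let each grid job fetch and unpack it. Paths are illustrative.
    import subprocess

    tarball = "mycode.tar.gz"
    resilient_path = "/pnfs/nova/resilient/users/vito/mycode.tar.gz"      # illustrative

    subprocess.run(["tar", "czf", tarball, "srcs/"], check=True)          # pack local code
    subprocess.run(["ifdh", "cp", tarball, resilient_path], check=True)   # 20x-replicated area

    # On the worker node, the job script would then do (illustrative):
    #   ifdh cp /pnfs/nova/resilient/users/vito/mycode.tar.gz ./mycode.tar.gz
    #   tar xzf mycode.tar.gz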


Auxiliary File Delivery via OSG's StashCache

Example of the problem: several neutrino experiments use O(100 MB) "flux files" for simulating neutrinos in rock, etc.
• Several files are used per simulation job
• Total input is a few GB
• Not well suited to CVMFS for several reasons (each job uses a random subset; can thrash the worker-node cache)
• Transferring them every time is sub-optimal (plus there is no caching at all that way)

The StashCache service was developed, and is widely used, by OSG to solve just such a problem:
• Files are replicated to regional caches and streamed via a transparent xrootd connection from the closest cache (geoIP) if not cached locally (small cache on the machine itself)
• Can be layered on top of a dedicated CVMFS repository
• The user sees a consistent POSIX file path across all sites (see the sketch below); no need to worry about "how" the file gets there

Four FNAL experiments are using the service so far; FIFE assists experiments with setting up and populating the area.
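A sketch of the consistent POSIX view from the job's point of view; the repository name and file path are illustrative of a CVMFS-backed StashCache area, not a guaranteed layout:

    # The job just opens a POSIX path; CVMFS + StashCache stream the file from the
    # nearest regional cache behind the scenes. The path below is illustrative.
    flux_path = "/cvmfs/nova.osgstorage.org/pnfs/fnal.gov/usr/nova/flux/flux_file_0042.root"

    with open(flux_path, "rb") as f:
        header = f.read(1024)          # identical code at every site
    print(f"read {len(header)} bytes from {flux_path}")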


Looking to the Future with FIFE
• Containers (Docker, Singularity, etc.) are becoming more important in increasingly heterogeneous environments (including GPU machines). Help shepherd users through this process and create some common containers for them
• Help define the overall computing model of the future and guide experiments
  – Seamlessly integrate dedicated, opportunistic, HPC, and commercial computing resources
• Lower barriers to accessing computing elements around the world on multiple architectures
  – Help to connect experimenters and computing professionals to drive experiment software toward increased multithreading and smaller memory-per-core footprints
  – Federated identity management (reduced access barriers for international partners)


BACKUP


Data management: FIFE SAM and FTS

SAM was originally developed for CDF and D0; many FNAL experiments now use it. SAM provides:
• A file metadata/provenance catalog
• A file replica catalog (data need not be at Fermilab)
• Metadata query-based "dataset" creation (see the query sketch below)
• An optimized file delivery system (command-line, C++, and Python APIs available)
• PostgreSQL backend; client communication via HTTP eliminates the need to worry about opening ports for communication with the server in nearly all cases
  – Heretofore mostly followed the "send the data to the jobs" model; enhancements are underway to also make allowances for the "jobs to the data" school of thought
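A minimal sketch of a metadata-query-based dataset lookup using the SAM Python client; the experiment name, query, and dataset definition name are illustrative, and the calls follow the samweb_client API as commonly used:

    # Query SAM for files by metadata "dimensions", and list a saved dataset
    # definition. Experiment, query, and definition names are illustrative.
    from samweb_client import SAMWebClient

    samweb = SAMWebClient(experiment="nova")

    files = samweb.listFiles("data_tier reconstructed and run_number 12345")
    dataset_files = samweb.listFiles("defname: my_reco_dataset")

    print(len(files), "files from the query;", len(dataset_files), "files in the dataset")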


Data management: FIFE SAM and FTS (2)

Fermilab File Transfer Service
• Watches one or more dropboxes for new files
• Can extract metadata from files and declare it to SAM, or handle files already declared (see the metadata sketch below)
• Copies files to one or more destinations based on file metadata and/or the dropbox used, and registers the locations with SAM
• Can automatically clean dropboxes, usually N days after the files are on tape
• Does not have to run at Fermilab, nor do the source or destination have to be at Fermilab
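For illustration, the kind of per-file metadata record FTS might extract and declare to SAM, written here as a Python dict. The field names are typical of SAM metadata, but the exact schema is experiment-specific and all values are invented:

    # Illustrative SAM-style file metadata; field names typical, values invented.
    metadata = {
        "file_name": "nova_reco_r12345_s01.root",
        "file_size": 2147483648,                 # bytes
        "data_tier": "reconstructed",
        "file_type": "detector",
        "runs": [[12345, 1, "physics"]],         # run, subrun, run type
        "checksum": ["enstore:1234567890"],
        "group": "nova",
    }
    # FTS would declare this record to SAM, copy the file to its destinations,
    # and then register the resulting locations (tape and/or dCache) with SAM.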


Simplifying I/O with IFDH
• File I/O is a complex problem (Best place to read from? What protocol? Best place to send output?)
• FIFE cannot force opportunistic sites to install specific protocols
• The Intensity Frontier Data Handling client (ifdh) was developed as a common wrapper around standard data-movement tools; it shields the user from site-specific requirements and from choosing transfer protocols
• Nearly a drop-in replacement for cp, rm, etc. (see the sketch below), but also has extensive features to interface with SAM (ifdh can be used to pull in the files associated with a SAM project, etc.)
• Supports a wide variety of protocols (including xrootd); automatically chooses the best protocol depending on the host machine, source location, and destination (can be overridden if desired)
  – Backend behavior can be changed, or new protocols added, in completely transparent ways
  – Special logic for automatically parsing Fermilab dCache and CERN EOS paths shields the user from having to know the proper URIs
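A minimal usage sketch from inside a job: fetch an input file and return an output file with ifdh, letting it choose the transfer protocol. The dCache paths are illustrative; the Python bindings offer equivalent calls.

    # Copy input in and output back out with ifdh; protocol choice is automatic.
    import subprocess

    subprocess.run(
        ["ifdh", "cp", "/pnfs/nova/persistent/users/vito/input.root", "./input.root"],
        check=True,
    )

    # ... run the analysis on ./input.root, producing ./output.root ...

    subprocess.run(
        ["ifdh", "cp", "./output.root", "/pnfs/nova/scratch/users/vito/output.root"],
        check=True,
    )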


Full workflow management


• Now combining job submission, data management, databases, and monitoring tools into a complete workflow management system
  – Production Operations Management Service (POMS)
• Can specify user-designed "campaigns" via a GUI, describing job dependencies, automatic resubmission of failed jobs, and complete monitoring and progress tracking in a DB
  – Visible in the standard job monitoring tools


Improving Productivity with Continuous Integration
• Have built up a Jenkins-based Continuous Integration system designed for both common software infrastructure (e.g. art) and experiment-specific software, with a full web UI
• In addition to software builds, it can also perform physics validation tests of new code: run specific datasets as grid jobs and compare the results to reference plots (a conceptual sketch follows this list)
• Supports SL6/7; OS X and Ubuntu support in progress; experiments are free to choose any combination of platforms
• Targeted email notifications for failures
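As a conceptual sketch of the physics-validation step only (not the FIFE CI implementation; bin contents and the threshold are invented), a build could be flagged by comparing a newly produced distribution against a reference:

    # Compare a binned distribution from the new build to a blessed reference and
    # fail the validation if they disagree too much. All numbers are invented.
    def chi2_per_bin(new, ref):
        return sum((n - r) ** 2 / r for n, r in zip(new, ref) if r > 0) / len(ref)

    reference = [120, 340, 560, 410, 150]   # blessed reference plot
    new_build = [118, 352, 548, 402, 161]   # same plot from the new code

    if chi2_per_bin(new_build, reference) > 2.0:   # illustrative threshold
        print("validation FAILED: send targeted email notification")
    else:
        print("validation OK")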

[Plot: the NOvA experiment's CI tests.]


Service adoption rate

• Led by jobsub; ifdh and FIFEmon are also popular
• POMS is a newer service


Access to High Performance Computing
• Clear push from the U.S. Department of Energy to use more HPC resources (supercomputers)
• Somewhat of a different paradigm, but current workflows can be adapted
• Resources typically require an allocation to access them
• FIFE can help experiments link allocations to the existing job submission tools
  – Looks like just another site to the job, but shields the user from the complexity of gaining access
  – Successfully integrated with NOvA at the Ohio Supercomputing Center and MINOS+ at the Texas Advanced Computing Center
  – The Mu2e experiment is now testing at NERSC (via HEPCloud)

[Photo by Roy Kaltschmidt, LBNL.]


Access to GPU Resources
• Lots of (justified) excitement about GPUs; heard quite a bit already this week
• Currently there is no standardized way to access these resources
• FIFE is now developing such a standard interface within the existing job submission system
  – Uses a GPU discovery tool from OSG to characterize the system (GPU type, CUDA/OpenCL version, driver info, etc.)
  – Advertises GPU capabilities in a standard way across sites; users can simply add the required capabilities to their job requirements ("I need GPU type X", "I need CUDA > 1.23", etc.), and the system will match jobs and slots accordingly (see the sketch below)
• Rolling out to experiments over the next several weeks
• In discussions with non-FIFE experiments (CMS, soon ATLAS, and smaller Open Science Grid projects) about trying to speak a common language as much as possible in this new area
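A hedged sketch of how GPU requirements might be expressed at submission time under this interface; the jobsub options and the ClassAd-style attribute name are illustrative rather than the finalized FIFE syntax:

    # Hypothetical GPU submission: request a GPU slot and constrain the match on
    # an advertised capability attribute. Option and attribute names illustrative.
    import subprocess

    cmd = [
        "jobsub_submit",
        "-G", "nova",
        "--lines=+RequestGPUs=1",                                # ask for a GPU slot
        "--append_condor_requirements=(CUDACapability >= 3.5)",  # hypothetical attribute
        "file:///path/to/gpu_job.sh",
    ]
    subprocess.run(cmd, check=True)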