Distributed Analysis Tools Panda & Ganga
Tadashi Maeno (Brookhaven National Laboratory)


Page 1: Distributed Analysis Tools Panda & Ganga

Distributed Analysis Tools
Panda & Ganga

Tadashi Maeno
(Brookhaven National Laboratory)

Page 2: Distributed Analysis Tools Panda & Ganga

Distributed Analysis on Grid

Local analysis (script, exe): limited resources
  – 1~4 CPUs
  – ~1 TB of disk

Grid: a framework to utilize large-scale and geographically distributed resources

Page 3: Distributed Analysis Tools Panda & Ganga

Distributed Analysis on Grid

[Figure: local analysis and the Grid]

Page 4: Distributed Analysis Tools Panda & Ganga

Distributed Analysis on Grid

Run your own analysis on distributed resources
  – Parallelization for fast turnaround
    • 1 CPU × 800 hours → 800 CPUs × 1 hour
  – Unavoidably distributed data
    • 10 T1 computing centers for ATLAS, but no T1 can host all data
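The turnaround figure above is simple arithmetic, sketched here under the idealized assumption that the work splits evenly and that scheduling, data-access, and merging overheads are ignored. The function name `ideal_wall_hours` is hypothetical and not part of any tool discussed in these slides.

```python
def ideal_wall_hours(total_cpu_hours: float, n_cpus: int) -> float:
    """Wall-clock hours if the work splits evenly and overhead is ignored."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return total_cpu_hours / n_cpus


if __name__ == "__main__":
    # The slide's example: 800 CPU-hours of work on 1 CPU versus 800 CPUs.
    for n in (1, 800):
        print(n, "CPU(s):", ideal_wall_hours(800, n), "hours")
```

Real jobs fall short of this ideal, since each sub-job adds its own startup and data-staging cost.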

Page 5: Distributed Analysis Tools Panda & Ganga

Traditional Procedure of DA

[Diagram: local analysis sending a job to the Grid, with these components:]
  – Brokerage: site selection
  – Gate Keeper (= Computing Element): file upload, authentication
  – CPUs (= Worker Nodes): job execution
  – Storage: input/output data

Page 6: Distributed Analysis Tools Panda & Ganga

Traditional Procedure of DA

[Same diagram as on the previous slide, with these annotations:]
  – Different among grid flavors = grid-middleware dependent
  – Local batch system: Condor, LSF, PBS, …
  – Local storage for EGEE/OSG (dcap, dpm, xrootd, castor); remote storage for NDGF

Page 7: Distributed Analysis Tools Panda & Ganga

Three middleware

Grid =
  – EGEE backend
  – OSG backend
  – NDGF backend

e.g., upload a file using a different protocol

Not entirely true! Just for intuitive understanding

Page 8: Distributed Analysis Tools Panda & Ganga

Simple Implementation of Common User I/F for Various Backends

    # EGEE, OSG, NDGF are backend-type constants;
    # egeeModule, osgModule, ndgfModule are one plug-in module per backend.
    def upload(file, backendType):
        if backendType == EGEE:
            egeeModule.upload(file)
        elif backendType == OSG:
            osgModule.upload(file)
        elif backendType == NDGF:
            ndgfModule.upload(file)

Prepare a plug-in module for each backend
Implementation of GANGA to support multiple backends
Easily extended for other backends

Page 9: Distributed Analysis Tools Panda & Ganga

Simple Implementation of Common User I/F for Various Backends

Ultimately users have to understand each backend
  – e.g., connection failure → each backend uses a different port → check each port
3 backends → expertise and support/development work are 3 times more
  – Limited manpower
Capability for easy extension is useful in the R&D phase but is redundant in the production phase

Page 10: Distributed Analysis Tools Panda & Ganga

Common I/F using pilot (PANDA System)

[Diagram: the end-user submits an analysis job to the Panda server over https. Pilot factories send pilots to EGEE and OSG, and aCT sends pilots to NDGF via arc. The pilots pull jobs from the Panda server over https.]

Page 11: Distributed Analysis Tools Panda & Ganga

Common I/F using pilot (PANDA System)

[Same diagram as on the previous slide, with two callouts:]
  – Interaction with backends is done centrally
  – Users access a common server using a single protocol

Page 12: Distributed Analysis Tools Panda & Ganga

Operation/Service Model of PANDA

End-users are insulated from GRID
  – Communicate with the Panda (HTTP) server
  – Lower threshold especially for physicists
Pilot factory sends pilots using GRID middleware
  – Only the operator of the scheduler needs to have enough expertise on GRID
Production and Analysis run on the same infrastructure
  – Production should suffer from the same problems as analysis
  – Once the production team (one shift crew) fixes a problem for official production, analysis gets cured automatically → no additional manpower is needed for analysis

Page 13: Distributed Analysis Tools Panda & Ganga

Panda

PANDA = Production ANd Distributed Analysis system
  – Designed for analysis as well as production
  – Project started Aug 2005, prototype Sep 2005, production Dec 2005
  – The backend for all ATLAS production jobs
  – The primary backend for all ATLAS analysis jobs
A single task queue and pilots
  – Apache-based central server
  – Pilots retrieve jobs from the server as soon as CPU is available → low latency
Highly automated, has an integrated monitoring system, and requires low operation manpower
Integrated with ATLAS Distributed Data Management (DDM) system
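The "single task queue and pilots" idea can be sketched as a pull model: a pilot that already holds a CPU asks the central queue for work, so no job waits in a site-level batch queue. The sketch below is an in-memory toy under that assumption. `TaskQueue`, `get_job`, and `run_pilot` are hypothetical names, and the real system talks to the server over https rather than through a Python object.

```python
import queue
from typing import List, Optional


class TaskQueue:
    """Stand-in for the central server's single task queue (in memory)."""

    def __init__(self) -> None:
        self._jobs: "queue.Queue[str]" = queue.Queue()

    def submit(self, job: str) -> None:
        """Called on behalf of a user: add a job to the central queue."""
        self._jobs.put(job)

    def get_job(self) -> Optional[str]:
        """Called by a pilot: return the next job, or None if none is waiting."""
        try:
            return self._jobs.get_nowait()
        except queue.Empty:
            return None


def run_pilot(server: TaskQueue, site: str) -> List[str]:
    """A pilot holding a CPU keeps pulling jobs until the queue is empty."""
    done = []
    while True:
        job = server.get_job()
        if job is None:
            break
        done.append(f"{job} ran at {site}")  # placeholder for real execution
    return done
```

Because the pilot asks for work only after it is running on a worker node, the user's job is matched to a CPU that is known to be free.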

Page 14: Distributed Analysis Tools Panda & Ganga

Panda System Overview

[Diagram: end-users submit analysis jobs to the Panda server over https, and bamboo submits production jobs taken from ProdDB. autopyfactory sends pilots via condor-g to worker nodes at EGEE and OSG, and aCT submits pilots to NDGF via arc. The pilots pull jobs from the Panda server over https. Other components shown: logger, LFC, DQ2.]

Page 15: Distributed Analysis Tools Panda & Ganga

Ownership Issue

GK maps each job to an individual UNIX ID
  – In the traditional model, each user sends jobs to GK → possible to know who runs a process
  – In pilot models, the pilot factory sends pilots to GK → impossible to distinguish processes using UID. Note that each role is mapped to a different UID and thus it is possible to distinguish role-ed users from end-users
Separation between physical/logical layers is popular
  – Virtualization (e.g., cloud, LVM, …)
  – But conflicts with a “policy”
WLCG is going to introduce glexec which changes UID on WN
  – Each site admin will be able to see who runs a process without peeking into the logical layer
  – File ownership is unrelated to UID since SRM itself sets the owner using the proxy
  – Only proxy delegation is required (glexec requires proxy delegation)

Page 16: Distributed Analysis Tools Panda & Ganga

panda-client

Tools to submit or manage analysis jobs on Panda

Five tools
  – pathena
    • Athena jobs
  – prun
    • General jobs (ROOT, python, sh, exe, …)
  – pbook
    • Bookkeeping
  – psequencer
    • Analysis chain (e.g., submit job + download output)
  – puserinfo
    • Access control

Page 17: Distributed Analysis Tools Panda & Ganga

pathena (1/2)

To submit Athena jobs to Panda
A simple command-line tool, but contains advanced capabilities for more complex needs
Provides a consistent interface to users who are familiar with Athena

    $ athena jobO.py
    $ pathena jobO.py --inDS inputDatasetName --outDS outputDatasetName

Page 18: Distributed Analysis Tools Panda & Ganga

pathena (2/2)

What pathena does
1. Extract job configuration by running Athena with a fake application manager
2. Collect source/jobO files in the local working area
3. Assign the job to a site where
    • the Athena version is available
    • the input dataset is available
    • CPUs are free
4. Prepare one buildJob to compile source files, and one or many runAthena jobs to run Athena
5. Send them to Panda
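Step 3 above (site assignment) can be sketched as a filter over candidate sites. The `Site` record and `choose_site` function are hypothetical and only mirror the three conditions listed on the slide. Picking the site with the most free CPUs among the matches is an assumption of this sketch, since the slide does not say how ties are resolved.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Site:
    """Hypothetical description of a site as seen by the brokerage."""
    name: str
    athena_versions: Set[str] = field(default_factory=set)
    datasets: Set[str] = field(default_factory=set)
    free_cpus: int = 0


def choose_site(sites: List[Site], athena_version: str,
                input_dataset: str) -> Optional[Site]:
    """Return a site that has the release, the input dataset, and free CPUs."""
    candidates = [
        s for s in sites
        if athena_version in s.athena_versions
        and input_dataset in s.datasets
        and s.free_cpus > 0
    ]
    if not candidates:
        return None
    # Assumption: prefer the matching site with the most free CPUs.
    return max(candidates, key=lambda s: s.free_cpus)
```

A site that holds the data but lacks the release, or has no free CPUs, is never selected; if no site passes all three checks the function returns `None`.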

Page 19: Distributed Analysis Tools Panda & Ganga

What happens when job is submitted (1/2)

[Diagram: pathena submits the local sources. On the remote side, one buildJob compiles them and stores the binaries in storage, which triggers the runAthena jobs. The input dataset is automatically split; each runAthena downloads the binaries, reads its inputs, and writes its outputs to the output dataset, which is retrieved locally with dq2.]

Single job = buildJob × 1 + runAthena × N
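The automatic split of the input dataset into one buildJob plus N runAthena jobs can be sketched as follows. The helper names, the dictionary layout, and the fixed `files_per_job` parameter are assumptions made for illustration and do not reflect how pathena represents jobs internally.

```python
from typing import Dict, List


def split_inputs(files: List[str], files_per_job: int) -> List[List[str]]:
    """Split the input file list into consecutive chunks, one per sub-job."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    return [files[i:i + files_per_job]
            for i in range(0, len(files), files_per_job)]


def make_job_set(files: List[str], files_per_job: int) -> List[Dict]:
    """Build one buildJob followed by one runAthena job per chunk."""
    jobs: List[Dict] = [{"type": "buildJob"}]
    for chunk in split_inputs(files, files_per_job):
        jobs.append({"type": "runAthena", "inputs": chunk})
    return jobs
```

For five input files and two files per job, this yields one buildJob and three runAthena jobs, the last of which receives the single leftover file.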

Page 20: Distributed Analysis Tools Panda & Ganga

What happens when job is submitted (2/2)

Why is buildJob required?
  – Platform (OS, CPU architecture) may be different between local and remote
    • SL5/64bit binaries cannot run on SL4/32bit
  – Athena creates some absolute links in InstallArea, i.e., generally not relocatable
  – The total time of (buildJob + N × runAthena) is shorter than N × (buildJob + runAthena)
    • Use CPUs more efficiently
  – buildJob can be skipped using an option if you know the step is not required
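The comparison between (buildJob + N × runAthena) and N × (buildJob + runAthena) can be checked with a few lines of arithmetic. This sketch reads "total time" as summed CPU time, which is an interpretation, and the durations in the usage example are made-up numbers, not measurements.

```python
def cpu_time_shared_build(build: float, run: float, n: int) -> float:
    """Total CPU time when one buildJob is shared by n runAthena jobs."""
    return build + n * run


def cpu_time_build_per_job(build: float, run: float, n: int) -> float:
    """Total CPU time when every one of the n jobs compiles for itself."""
    return n * (build + run)


if __name__ == "__main__":
    build_min, run_min, n_jobs = 10, 30, 100  # illustrative values only
    shared = cpu_time_shared_build(build_min, run_min, n_jobs)
    per_job = cpu_time_build_per_job(build_min, run_min, n_jobs)
    print("shared build:", shared, "minutes")
    print("build per job:", per_job, "minutes")
    print("saved:", per_job - shared, "minutes")
```

The difference is always (N − 1) × buildJob, so the saving grows with the number of sub-jobs.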

Page 21: Distributed Analysis Tools Panda & Ganga

prun

To submit general jobs to Panda
  – ROOT (ARA), Python, shell script, exe, …
Two-staged analysis model of ATLAS
  – Run Athena on AOD/ESD to produce DPD → pathena
  – Run ROOT or something on DPD to produce final plots → prun
In principle you can do anything, but please avoid careless network operations unless you know well about the scalability of those operations
  – svn co, wget, lcg-cp, …
  – A single job is split into many sub-jobs running in parallel, which can easily break remote servers

Page 22: Distributed Analysis Tools Panda & Ganga

pbook

Bookkeeping of Panda jobs
  – Browsing
  – Kill
  – Retry
Make a local sqlite3 repository to keep personal job information
  – IMAP-like sync-diff mechanism
  – Not scanning the global Panda repository → quick response
Dual user interface
  – Command-line
  – Graphical
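The local-repository idea can be sketched with Python's standard `sqlite3` module: keep one row per job locally and, on each sync, write only the rows whose remote status differs. The table layout and the `open_repo` and `sync` functions are hypothetical and are not pbook's actual schema or code; `remote_jobs` stands in for whatever the server reports for the user's own jobs.

```python
import sqlite3
from typing import Dict


def open_repo(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local job repository."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def sync(conn: sqlite3.Connection, remote_jobs: Dict[int, str]) -> int:
    """Write only the jobs that are new or changed; return how many."""
    local = dict(conn.execute("SELECT job_id, status FROM jobs"))
    changed = {job_id: status for job_id, status in remote_jobs.items()
               if local.get(job_id) != status}
    conn.executemany(
        "INSERT OR REPLACE INTO jobs (job_id, status) VALUES (?, ?)",
        list(changed.items()),
    )
    conn.commit()
    return len(changed)
```

Browsing then reads only the local table, which is why responses can be quick: nothing scans a global repository.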

Page 23: Distributed Analysis Tools Panda & Ganga

GangaPanda

Plug-in to access PANDA

    def upload(file, backendType):
        if backendType == EGEE:
            egeeModule.upload(file)
        elif backendType == OSG:
            osgModule.upload(file)
        elif backendType == NDGF:
            ndgfModule.upload(file)
        elif backendType == PANDA:
            pandaModule.upload(file)

All ATLAS backends will be consolidated to PANDA
Other backends are still maintained for some reason