Extending the ATLAS PanDA Workload Management System for New Big Data Applications
Alexei Klimentov, Brookhaven National Laboratory
XXIV International Symposium on Nuclear Electronics and Computing (NEC 2013), Varna, September 12, 2013
Slide 2
Alexei Klimentov BNL/PAS
Main topics:
- Introduction
- The Large Hadron Collider at CERN
- The ATLAS experiment
- The ATLAS Computing Model and the Big Data Experiment
- Computing Challenges
- PanDA: Workload Management System for Big Data
Slide 3
Alexei Klimentov BNL/PAS
Enter a New Era in Fundamental Science
The Large Hadron Collider (LHC), one of the largest and truly global scientific projects ever built, is the most exciting turning point in particle physics. Exploration of a new energy frontier: proton-proton and heavy-ion collisions at E_CM up to 14 TeV.
LHC ring: 27 km circumference. Experiments on the ring: ATLAS, CMS, ALICE, LHCb, TOTEM, LHCf, MoEDAL.
Slide 4
Alexei Klimentov BNL/PAS
Proton-Proton Collisions at the LHC
The LHC delivered billions of collision events to the experiments from proton-proton and proton-lead collisions in the Run 1 period (2009-2013).
Collisions every 50 ns = 20 MHz crossing rate; 1.6 x 10^11 protons per bunch at L_pk ~ 0.8 x 10^34 cm^-2 s^-1; ~35 pp interactions per crossing (pile-up); ~10^9 pp interactions per second; ~1600 charged particles produced in each collision.
This is an enormous challenge for the detectors and for data collection, storage and analysis: the raw data rate from an LHC detector is ~1 PB/s, which translates into petabytes of data recorded world-wide (Grid).
The challenge of processing and analyzing the data and producing timely physics results was substantial, but in the end it was a great success.
[Figure: candidate Higgs decay to four electrons recorded by ATLAS in 2012.]
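A quick consistency check of the collision-rate numbers above; this is rough arithmetic only, not an official ATLAS calculation:

    bunch_crossing_rate = 20e6   # bunch crossings per second at 50 ns spacing
    pileup = 35                  # pp interactions per crossing
    interactions_per_s = bunch_crossing_rate * pileup
    print(f"{interactions_per_s:.1e} pp interactions per second")  # ~7e8, of order 10^9 as quoted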
Slide 5
Alexei Klimentov BNL/PAS
Our Task
We use experiments to inquire about what reality (nature) does. "The goal is to understand in the most general; that's usually also the simplest." - A. Eddington. We intend to fill the gap between reality and theory.
ATLAS physics goals: explore the high-energy frontier of particle physics; search for new physics (the Higgs boson and its properties; physics beyond the Standard Model: SUSY, Dark Matter, extra dimensions, Dark Energy, etc.); precision measurements of Standard Model parameters.
Slide 6
Alexei Klimentov BNL/PAS
ATLAS Experiment at CERN
A Toroidal LHC ApparatuS (ATLAS) is one of the six particle detector experiments at the Large Hadron Collider (LHC) at CERN, and one of the two multi-purpose detectors. The project involves more than 3000 scientists and engineers from 38 countries. ATLAS is 44 meters long and 25 meters in diameter, and weighs about 7,000 tons: it is about half as big as the Notre Dame Cathedral in Paris and weighs the same as the Eiffel Tower or a hundred 747 jets.
ATLAS Collaboration: 3000 scientists and more than 1200 students from 174 universities and labs in 38 countries.
Slide 7
Alexei Klimentov BNL/PAS
ATLAS: a Big Data Experiment
Slide 8
Alexei Klimentov BNL/PAS
Some numbers from ATLAS
Data: the rate of events streaming out of the High-Level Trigger farm is ~400 Hz, and each event has a size of the order of 1.5 MB; this gives about 10^7 events in total per day and, with roughly 170 physics days per year, about 10^9 events/year, i.e. a few PB.
Prompt processing: reconstruction time per event on a standard CPU is < 30 s (on a CERN batch node) and increases with pile-up (more combinatorics in the tracking).
Simulation: very CPU intensive; simulating a few billion events is mostly done at computing centers outside CERN.
Software: ~4 million lines of code (reconstruction and simulation), ~1000 software developers on ATLAS.
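A back-of-the-envelope check of these numbers; the duty-cycle factor hidden in the "10^7 events per day" figure is an assumption:

    event_rate_hz = 400         # events out of the High-Level Trigger
    event_size_mb = 1.5         # MB per event
    physics_days = 170          # approximate physics days per year

    events_per_day = 1e7        # ~400 Hz with a realistic LHC duty cycle (24 h running would give ~3.5e7)
    events_per_year = events_per_day * physics_days        # ~1.7e9, consistent with "about 10^9 evts/year"
    raw_volume_pb = events_per_year * event_size_mb / 1e9  # 1 PB = 1e9 MB
    print(f"~{events_per_year:.1e} events/year, ~{raw_volume_pb:.1f} PB of prompt data")  # a few PB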
Slide 9
Alexei Klimentov BNL/PAS
Reduce the data volume in stages: Higgs selection using the trigger
Level 1: not all information available, coarse granularity.
Level 2: reconstruct events; improved ability to reject events.
Level 3: high-quality reconstruction algorithms, using information from all detectors; output rate ~400 Hz.
Slide 10
Alexei Klimentov BNL/PAS
ATLAS High-Level Trigger (part)
A total of 15,000 cores in 1,500 machines, feeding a disk buffer and then the Grid.
We really do throw away ~99.9999% of LHC data before writing it to persistent storage: the data volume is reduced in stages, selecting ONLY interesting events. Initial collision rate (50 ns bunch spacing): 40,000,000 events/s; selected and stored: 400 events/s.
Slide 11
Alexei Klimentov BNL/PAS
Two main types of physics analysis at the LHC: searching for new particles and making precision measurements.
Searches are statistically limited: more data is the way to improve the search, and if you don't see anything new you set limits on what you have excluded.
Precision measurements are often limited by systematic uncertainties; precision measurements of Standard Model parameters allow important tests of the consistency of the theory.
Data analysis chain: collect data from many channels on many sub-detectors (millions); decide whether to read out everything or throw the event away (trigger); build the event (put the information together); store the data; reconstruct the data; analyze them. The analysis starts with the output of reconstruction: apply an event selection based on reconstructed object quantities, estimate the efficiency of the selection, estimate the background after selection, make plots, do the same with a simulation, correct the data for detector effects, make final plots, and compare data and theory. Along this chain the data volume handled per analysis shrinks from the ~TB scale toward the ~kB scale.
Slide 12
Alexei Klimentov BNL/PAS
Like looking for a single drop of water from the Geneva Jet d'Eau over 2+ days.
Slide 13
Alexei Klimentov BNL/PAS
The ATLAS Data Challenge
Starting from this event, we are looking for this signature. Selectivity: 1 in 10^13 - like looking for 1 person in a thousand world populations, or for a needle in 20 million haystacks!
800,000,000 proton-proton interactions per second, 0.0002 Higgs per second, ~150,000,000 electronic channels.
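The quoted selectivity follows directly from the two rates above; a rough estimate only:

    interactions_per_s = 8e8    # proton-proton interactions per second
    higgs_per_s = 2e-4          # Higgs events per second
    selectivity = higgs_per_s / interactions_per_s     # ~2.5e-13
    print(f"about 1 Higgs in {1/selectivity:.0e} interactions")  # ~4e12, i.e. roughly 1 in 10^13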
Slide 14
Alexei Klimentov BNL/PAS
Data Volume and Data Storage
An event size of 1.5 MB times a rate of 400 Hz, taking into account the LHC duty cycle, gives on the order of 3 PB per year per experiment. ATLAS total data storage is 130+ PB, distributed between O(100) computing centers. New physics is rare, and interesting events are like a single drop from the Jet d'Eau.
Slide 15
Big Data in the arts and humanities
Letter of Benjamin Franklin to Lord Kames, April 11, 1767. Franklin warned a British official what would happen if the English kept trying to control the colonists by imposing taxes like the Stamp Act: they would revolt. The political, scientific and literary papers of Franklin comprise approx. 40 volumes containing approx. 100,000 documents.
Slide 16
Big Data in the arts and humanities
George W. Bush Presidential Library: 200 million e-mails, 4 million photographs. (A. Prescott slide)
Slide 17
Alexei Klimentov BNL/PAS
Big Data Has Arrived at an Almost Unimaginable Scale
Business e-mails sent per year: 3,000 PB. Content uploaded to Facebook each year: 182 PB. Google search index: 98 PB. Health records: 30 PB. YouTube: 15 PB. LHC annual data: 15 PB. (Other entries on the chart: climate data, Library of Congress, Nasdaq, US census.) Source: http://www.wired.com/magazine/2013/04/bigdata
For comparison: ATLAS annual data volume ~30 PB; ATLAS managed data volume ~130 PB.
Slide 18
Alexei Klimentov BNL/PAS
ATLAS Computing Challenges
A lot of data in a highly distributed environment: petabytes of data to be treated and analyzed. The ATLAS detector generates about 1 PB of raw data per second, most of it filtered out in real time by the trigger system; interesting events are recorded for further reconstruction and analysis. As of 2013 ATLAS manages ~130 PB of data, distributed world-wide to O(100) computing centers and analyzed by O(1000) physicists. The expected rate of data influx into the ATLAS Grid is ~40 PB per year in 2015.
A very large international collaboration: 174 institutes and universities from 38 countries; thousands of physicists analyze the data.
ATLAS uses the grid computing paradigm to organize distributed resources. A few years ago ATLAS started a Cloud Computing R&D project to explore virtualization and clouds, gaining experience with different cloud platforms: commercial (Amazon, Google), academic and national. Now we are evaluating how high-performance computers and supercomputers can be used for data processing and analysis.
Slide 19
Alexei Klimentov BNL/PAS
ATLAS Computing Model
The LHC experiments rely on distributed computing resources. The Worldwide LHC Computing Grid (WLCG) is a global solution, based on Grid technologies/middleware, with a tiered structure: Tier-0 (CERN), 11 Tier-1s, 140 Tier-2s. Capacity: 350,000 CPU cores, 200 PB of disk space, 200 PB of tape space. In ATLAS, sites are grouped into clouds for organizational reasons. Possible communications: optical private network T0-T1 and T1-T1, national networks T1-T2. Restricted communications: inter-cloud T1-T2 and inter-cloud T2-T2. (K. De slide)
Slide 20
ATLAS Data Volume: 131.5 PB
[Chart: ATLAS data volume (in TB) on the Grid sites, with multiple replicas, by format - raw, simulated and derived data - at the end of Run 1.]
Data distribution patterns are discipline dependent: the ATLAS RAW data volume is ~3 PB/year, while the ATLAS data volume on Grid sites is 131.5 PB.
Slide 21
How does 3 PB become 130+ PB?
1. Duplicate the raw data (not such a bad idea).
2. Add a similar volume of simulated data (essential).
3. Make a rich set of derived data products (some of them larger than the raw data).
4. Re-create the derived data products whenever the software has been significantly improved (several times a year) and keep the old versions for quite a while.
5. Place up to 15 copies of the derived data around the world, so that when you send jobs to the data you can send them almost anywhere.
6. Do the math! (A rough sketch of the arithmetic follows below.) There are now far fewer copies, due to demand-driven temporary replication, but much more reliance on wide-area networks.
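A minimal sketch of that arithmetic; the individual multipliers below are illustrative assumptions, not exact ATLAS bookkeeping:

    raw_pb = 3.0                          # raw data recorded per year, PB
    raw_copies = 2                        # the raw data is duplicated
    sim_pb = raw_pb                       # a similar volume of simulated data
    derived_per_version = raw_pb + sim_pb # derived products, which can exceed the raw volume
    versions_kept = 3                     # reprocessings per year whose outputs are kept for a while
    derived_replicas = 5                  # average number of derived-data copies on the Grid (up to 15)

    total_pb = raw_pb * raw_copies + sim_pb + derived_per_version * versions_kept * derived_replicas
    print(f"~{total_pb:.0f} PB managed from {raw_pb:.0f} PB of raw data")  # of order 100 PB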
Slide 22
Alexei Klimentov BNL/PAS
Processing the Experiment's Big Data
The simplest solution for processing LHC data is to use data affinity for the jobs: data are staged to the site where the compute resources are located, and the analysis code accesses the data from local, site-resident storage.
However, in a distributed computing environment we do not have enough disk space to host all our data at every Grid site, so we distribute (pre-place) our data across our sites. The popularity of data sets is difficult to predict in advance, so the computing capacity at a site might not match the demand for certain data sets.
Different approaches are being implemented. Dynamic and/or on-demand data replication: dynamic - if certain data are popular over the Grid (i.e. processed and/or analyzed often), additional copies are made on other Grid sites (up to 15 replicas); on-demand - a user can request a local or additional data copy. Remote access: popular data can be accessed remotely. Both approaches share the underlying scenario that puts the WAN between the data and the executing analysis code.
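A minimal sketch of a popularity-driven replication decision of the kind described above; the threshold, the access metric and the replica cap behaviour are illustrative assumptions, not the actual ATLAS replication algorithm:

    MAX_REPLICAS = 15   # upper bound on copies of a popular dataset

    def replicas_wanted(n_accesses_last_week: int, current_replicas: int) -> int:
        """Ask for roughly one extra copy per 50 recent accesses, up to the cap."""
        target = 1 + n_accesses_last_week // 50
        return min(max(target, current_replicas), MAX_REPLICAS)

    # Example: a dataset read 240 times last week and currently held at 2 sites
    print(replicas_wanted(240, 2))   # -> 5, i.e. three additional replicas would be requested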
Slide 23
Alexei Klimentov BNL/PAS
PanDA: Production and Data Analysis System
ATLAS computational resources are managed by the PanDA Workload Management System (WMS). The PanDA (Production and Data Analysis) project was started in fall 2005 by the BNL and UTA groups: an automated yet flexible workload management system which can optimally make distributed resources accessible to all users. It was adopted as the ATLAS-wide WMS in 2008 (first LHC data in 2009) for all computing applications; it was adopted by AMS in 2012 and is in pre-production by CMS.
Through PanDA, physicists see a single computing facility that is used to run all data processing for the experiment, even though the data centers are physically scattered all over the world.
PanDA is flexible: it insulates physicists from hardware, middleware and the complexities of the underlying systems, and it adapts to evolving hardware and network configurations.
Major groups of PanDA jobs: central computing tasks, which are automatically scheduled and executed; physics groups' production tasks, carried out by groups of physicists of varying size; and user analysis tasks.
PanDA now successfully manages O(10^2) sites, O(10^5) cores, O(10^8) jobs per year and O(10^3) users.
Slide 24
Alexei Klimentov BNL/PAS
PanDA Philosophy
PanDA Workload Management System design goals:
- Deliver transparency of data processing in a distributed computing environment
- Achieve a high level of automation to reduce operational effort
- Flexibility in adapting to evolving hardware, computing technologies and network configurations
- Scalability to the experiment's requirements
- Support for diverse and changing middleware
- Insulate the user from hardware, middleware, and all other complexities of the underlying system
- A unified system for central Monte Carlo production and user data analysis
- Support for the custom workflows of individual physicists
- Incremental and adaptive software development
Slide 25
PanDA: the ATLAS Workload Management System
[Architecture diagram. Main components: the PanDA server with its task/job repository (Production DB); the submitter (bamboo), through which production managers define work; pilot schedulers (autopyfactory) submitting pilots via condor-g to EGEE/EGI and OSG worker nodes, and via the ARC interface (aCT) to NDGF; pilots pulling production and end-user analysis jobs from the server over https; the Data Management System (DQ2) with Local Replica Catalogs (LFC); and the Logging System.]
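To make the pull model in the diagram concrete, here is a heavily simplified pilot loop; the server URL, endpoint paths and field names are invented for illustration and are not the real PanDA server API:

    import subprocess
    import time
    import requests  # assumed to be available on the worker node

    PANDA_SERVER = "https://pandaserver.example.org"  # placeholder, not the real endpoint

    def run_pilot(site_name: str) -> None:
        """Minimal pull-model pilot: ask the server for a job, run it, report the outcome."""
        while True:
            resp = requests.get(f"{PANDA_SERVER}/getJob",
                                params={"site": site_name}, timeout=60)
            job = resp.json()
            if not job.get("jobId"):      # nothing matched to this queue right now
                time.sleep(120)
                continue
            result = subprocess.run(job["command"], shell=True)
            state = "finished" if result.returncode == 0 else "failed"
            requests.post(f"{PANDA_SERVER}/updateJob",
                          data={"jobId": job["jobId"], "state": state}, timeout=60)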
Slide 26
ATLAS Distributed Computing
[Chart: number of concurrently running PanDA jobs (daily average), Aug 2012 - Aug 2013, reaching ~150k; the dominant categories are MC simulation and analysis.]
Includes central production and data (re)processing, plus user and group analysis on the WLCG Grid. Running on ~100,000 cores worldwide, consuming at peak ~0.2 petaflops; the available resources are fully used/stressed.
Slide 27
ATLAS Distributed Computing
[Chart: number of completed PanDA jobs by job type (daily average around 1M, with a maximum of 1.7M jobs/day), Aug 2012 - Aug 2013.]
Slide 28
ATLAS Distributed Computing
[Chart: data transfer volume in TB (weekly average), Aug 2012 - Aug 2013, reaching ~6 PB per week.]
CERN to BNL data transfer time: on average 3.7 hours to export data from CERN.
Slide 29
Alexei Klimentov BNL/PAS
PanDA's Success
PanDA was able to cope with the increasing LHC luminosity and ATLAS data-taking rate, and adapted to the evolution of the ATLAS computing model. Two leading experiments in HEP and astroparticle physics (CMS and AMS) have chosen PanDA as their workload management system for data processing and analysis, and ALICE is interested in evaluating PanDA for Grid MC production and Leadership Computing Facilities. PanDA was chosen as a core component of the Common Analysis Framework by the CERN-IT/ATLAS/CMS project. PanDA was also cited in the document "Fact sheet: Big Data across the Federal Government", prepared by the Executive Office of the President of the United States, as an example of successful technology already in place at the time of the Big Data Research and Development Initiative announcement.
Slide 30
Alexei Klimentov BNL/PAS
Evolving PanDA for Advanced Scientific Computing
A proposal titled "Next Generation Workload Management and Analysis System for Big Data" (Big PanDA) was submitted to DoE ASCR in April 2012; the DoE ASCR- and HEP-funded project started in September 2012. The goal is the generalization of PanDA as a meta-application, providing location transparency of processing and data management, for HEP and other data-intensive sciences and for a wider exascale community.
Related efforts: PanDA (a US ATLAS funded project) and networking (Advanced Network Services).
There are three dimensions to the evolution of PanDA: making PanDA available beyond ATLAS and High Energy Physics; extending beyond the Grid (Leadership Computing Facilities, clouds, university clusters); and integrating the network as a resource in workload management.
Slide 31
Alexei Klimentov BNL/PAS
Big PanDA work plan
Factorizing the code: factorizing the core components of PanDA to enable adoption by a wide range of exascale scientific communities.
Extending the scope: evolving PanDA to support extreme-scale computing clouds and Leadership Computing Facilities.
Leveraging intelligent networks: integrating network services and real-time data access into the PanDA workflow.
Three-year plan: Year 1 - set up the collaboration, define algorithms and metrics; Year 2 - prototyping and implementation; Year 3 - production and operations.
Slide 32
Alexei Klimentov BNL/PAS
Big PanDA: Factorizing the Core
Evolving the PanDA pilot. Until recently the pilot has been ATLAS-specific, with lots of code relevant only for ATLAS. To meet the needs of the Common Analysis Framework project, the pilot is being refactored with experiments as plug-ins: new experiment-specific classes are being introduced, enabling better organization of the code, e.g. containing methods for how a job should be set up, metadata and site information handling, etc., that are unique to each experiment. CMS experiment classes have been implemented. Changes are being introduced gradually, to avoid affecting the current production PanDA instance.
PanDA instance at Amazon Elastic Compute Cloud (EC2): port the PanDA database back to MySQL and make the instance VO-independent, so it can be used as a test-bed for non-LHC experiments. A PanDA instance with all functionalities is installed and running at EC2; the database migration from Oracle to MySQL is finished; the instance is VO-independent. LSST MC production is the first use case for the new instance. The next step will be refactoring the PanDA monitoring package.
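A minimal sketch of the "experiments as plug-ins" idea; the class names, methods and setup commands below are illustrative assumptions, not the actual refactored pilot code:

    from abc import ABC, abstractmethod

    class Experiment(ABC):
        """Behaviour that differs between experiments lives behind this interface."""

        @abstractmethod
        def setup_command(self, job: dict) -> str:
            """Shell command preparing the job environment."""

        @abstractmethod
        def job_metadata(self, job: dict) -> dict:
            """Experiment-specific metadata to report back."""

    class ATLASExperiment(Experiment):
        def setup_command(self, job):
            return f"asetup {job['release']}"                           # illustrative only
        def job_metadata(self, job):
            return {"events": job.get("nEvents")}

    class CMSExperiment(Experiment):
        def setup_command(self, job):
            return "source /cvmfs/cms.cern.ch/cmsset_default.sh"        # illustrative only
        def job_metadata(self, job):
            return {"lumis": job.get("lumiSections")}

    def get_experiment(name: str) -> Experiment:
        """The pilot core picks the plug-in by name; everything else stays generic."""
        return {"ATLAS": ATLASExperiment, "CMS": CMSExperiment}[name]()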
Slide 33
Alexei Klimentov BNL/PAS
Big PanDA: Extending the Scope
Why are we looking for opportunistic resources? The LHC environment after 2014 will be more demanding: higher energy and more complex collisions, and we plan to record, process and analyze more data, motivated by physics (Higgs coupling measurements; physics beyond the Standard Model: SUSY, Dark Matter, extra dimensions, Dark Energy, etc.). The demands on computing resources to accommodate the Run 2 physics needs increase accordingly: "HEP now risks to compromise physics because of lack of computing resources" (*), something that has not been true for ~20 years.
There are several avenues to explore in the next years: optimizing/changing our workflows, both in analysis and on the Grid; High-Performance Computing and Leadership Computing Facilities; research and commercial clouds; ATLAS@home.
A common characteristic of the opportunistic resources is that we have to be agile in how we use them: quick onto them (software, data and workloads) when they become available; quick off them when they are about to disappear; robust against their disappearing under us with no notice; and using them until they disappear - don't allow holes with unused cycles, fill them with fine-grained workloads.
(*) I. Bird (WLCG Leader), presentation at the workshop on HPC and super-computing for Future Science Applications (BNL, June 2013).
Slide 34
Alexei Klimentov BNL/PAS
Spikes in demand for computational resources can significantly exceed the available ATLAS Grid resources; a lack of resources slows down the pace of discovery.
[Chart: job demand peaks reaching the ~1M scale.]
Slide 35
Alexei Klimentov BNL/PAS
Big PanDA: Extending the Scope - Google Compute Engine (GCE) preview project
Google allocated additional resources to ATLAS for free: ~5M CPU hours, 4,000 cores for about 2 months (the original preview allocation was 1k cores). The resources are organized as an HTCondor-based PanDA queue: CentOS 6 based custom-built images with SL5 compatibility libraries to run ATLAS software; the Condor head node and proxies are at BNL; output is exported to the BNL SE; work is ongoing on capturing the GCE setup in Puppet. This provides transparent inclusion of cloud resources into the ATLAS Grid.
The idea was to test long-term stability while running a cloud cluster similar in size to a Tier-2 site in ATLAS, intended for CPU-intensive Monte Carlo simulation workloads and planned as a production-type run, delivered to ATLAS as a resource and not as an R&D platform. We also tested a high-performance PROOF-based analysis cluster.
Slide 36
Alexei Klimentov BNL/PAS
Running PanDA on Google Compute Engine
We ran for about 8 weeks (2 weeks were planned for scaling up). Running was very stable on the cloud side - GCE was rock solid; most problems that we had were on the ATLAS side. We ran computationally intensive jobs: physics event generators, fast detector simulation and full detector simulation. We completed 458,000 jobs and generated and processed about 214M events, reaching a throughput of 15K jobs per day; most job failures occurred during the start-up and scale-up phase.
[Chart: failed and finished jobs per day over the running period.]
Slide 37
Alexei Klimentov BNL/PAS
Big PanDA for Leadership Computing Facilities
Expanding PanDA from the Grid to Leadership Class Facilities (LCF) will require significant changes in our system, since each LCF is unique: unique architecture and hardware; specialized OS, weak worker nodes and limited memory per worker node, so code cross-compilation is typically required; unique job submission systems; and a unique security environment, so pilot submission to a worker node is typically not feasible and a pilot/agent per supercomputer or per queue model is needed.
Tests have been done on the BlueGene systems at BNL and ANL (Geant4 port to BG/P), and a PanDA project is under way at the Oak Ridge National Laboratory LCF (Titan).
Slide 38
Alexei Klimentov BNL/PAS
Leadership Computing Facilities: Titan (slide from Ken Read)
Slide 39
Alexei Klimentov BNL/PAS
Big PanDA project on the ORNL LCF
Get experience with all relevant aspects of the platform and workload:
- job submission mechanism
- job output handling
- local storage system details
- details of outside transfers
- security environment
- adjusting the monitoring model
Develop an appropriate pilot/agent model for Titan. This is a collaboration between ANL, BNL, ORNL, SLAC, UTA and UTK, and a cross-disciplinary project spanning HEP, Nuclear Physics and High-Performance Computing.
Slide 40
Alexei Klimentov BNL/PAS
ATLAS PanDA Coming to the Oak Ridge Leadership Computing Facility: the Big PanDA/ORNL common project.
Slide 41
Alexei Klimentov BNL/PAS
Big PanDA at ORNL
PanDA and the multicore jobs scenario; PanDA on the Oak Ridge Leadership Computing Facility. PanDA deployment at OLCF was discussed and agreed, including the AIMS project component; cyber-security issues were discussed both for the near and the longer term, together with OLCF Operations. ROOT-based analysis has been tested, and payloads for the Titan CE are being prepared (to be followed by discussion in ATLAS). (D. Oleynik slide)
Slide 42
Alexei Klimentov BNL/PAS
Adding Network Awareness to PanDA
The LHC computing model was for a decade based on the MONARC model, which assumes poor networking: connections are seen as insufficient or unreliable, data need to be pre-placed and come from specific places, and there is a hierarchy of functionality and capability (Grid sites organized in clouds in ATLAS, sites with specific functions). Nothing can happen utilizing remote resources at the time a job is running.
The canonical HEP strategy is "jobs go to data": data are partitioned between sites, some sites are more important (get more important data) than others, and replicas are planned. A dataset (a collection of files produced under the same conditions and the same software) is the unit of replication, and data and replica catalogs are needed to broker jobs. An analysis job requiring data from several sites either triggers data replication and consolidation at one site, or is split into several jobs running on all the sites; a data analysis job must wait for all its data to be present at the site. The situation can easily degrade into a complex n-to-m matching problem. In this static data distribution scenario there was no need to consider the network as a resource in the WMS.
New networking capabilities and initiatives have appeared in the last 2 years (such as LHCONE): extensive standardized network performance monitoring (perfSONAR); traffic engineering capabilities, with rerouting of high-impact flows onto separate infrastructure; intelligent networking, Virtual Network On Demand and dynamic circuits; and dramatic changes in computing models, from a strict hierarchy of connections toward more of a mesh, with data access over the wide area and no division in functionality between sites.
We would like to benefit from the new networking capabilities and integrate networking services with PanDA: we are starting to consider the network as a resource, in a similar way to CPU and data storage.
Slide 43
Alexei Klimentov BNL/PAS
Network as a resource: optimal site selection to run PanDA jobs
Take network capability into account in job assignment and task brokerage.
Assigned -> activated job workflow: the number of assigned jobs depends on the number of running jobs - can we use the network status to adjust the rate up or down? Jobs are reassigned if a transfer times out (fixed duration) - can knowledge of the network status help reduce the timeout?
Task brokerage currently considers: free disk space at the Tier-1; availability of the input dataset (a set of files); the amount of CPU resources, measured as the number of running jobs in the cloud (a static information system is not used); downtime at the Tier-1; and already queued tasks with equal or higher priorities (a high-priority task can jump over low-priority tasks).
Can knowledge of the network help? Can we consider the availability of the network as a resource, as we consider storage and CPU resources? What kind of information is useful? Can we consider similar (highlighted) factors for networking? (See the sketch below.)
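A minimal sketch of what network-aware brokerage could look like; the scoring, the weights and the metric names (throughput from perfSONAR-like monitoring, queue backlog, free disk) are illustrative assumptions, not the actual PanDA brokerage algorithm:

    from dataclasses import dataclass

    @dataclass
    class SiteState:
        name: str
        free_disk_tb: float         # free disk space at the site
        running_jobs: int           # proxy for available CPU at the site
        queued_jobs: int            # backlog of already assigned work
        wan_throughput_mbps: float  # measured throughput from the data source, e.g. via perfSONAR

    def broker_score(site: SiteState, input_size_gb: float) -> float:
        """Higher is better: prefer sites that can absorb the work and pull the input quickly."""
        if site.free_disk_tb * 1000 < input_size_gb:
            return float("-inf")                          # cannot host the input at all
        transfer_hours = input_size_gb * 8 / (site.wan_throughput_mbps * 3.6)  # Mbps -> Gb per hour
        backlog_penalty = site.queued_jobs / max(site.running_jobs, 1)
        return -transfer_hours - backlog_penalty

    sites = [SiteState("SiteA", 500, 4000, 1000, 2000),
             SiteState("SiteB", 50, 800, 4000, 200)]
    best = max(sites, key=lambda s: broker_score(s, input_size_gb=5000))
    print(best.name)   # SiteA: faster network and smaller backlog win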
Slide 44
Alexei Klimentov BNL/PAS
Intelligent Network Services and PanDA
Example: a quick re-run of a prior workflow (a bug was found in a reconstruction algorithm). Site A has enough job slots but no input data; the input data are distributed between sites B, C and D, but those sites have a backlog of jobs. Jobs may be sent to site A and, at the same time, virtual circuits connecting sites B, C and D to site A will be built; VNOD will make sure that such virtual circuits have sufficient bandwidth reservation. Alternatively, the data can be accessed remotely (if the connectivity between the sites is reliable and this information is available from perfSONAR). In the canonical approach the data would have to be replicated to site A first.
HEP computing is often described as an example of a parallel workflow. That is correct on the scale of a worker node (WN): a WN does not communicate with other WNs during job execution. But the large-scale global workflow is highly interconnected, because each job typically does not produce an end result by itself; often the data produced by a job serve as input to the next job in the workflow. PanDA manages this workflow extremely well (1M jobs/day in ATLAS). The new intelligent services will allow the needed data transport channels to be created dynamically, on demand.
Slide 45
Alexei Klimentov BNL/PAS
Intelligent Network Services and PanDA
In Big PanDA we will use information on how much bandwidth is available and can be reserved before data movement is initiated. In the task definition the user will specify the data volume to be transferred and the deadline by which the task should be completed. The calculations of (i) how much bandwidth to reserve, (ii) when to reserve it, and (iii) along what path to reserve it will be carried out by Virtual Network On Demand (VNOD). (A. Petrosyan slide)
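The minimum reservation implied by such a task definition is easy to estimate; a small sketch, where the safety margin is an assumed parameter and the when/where decisions remain with VNOD:

    def min_bandwidth_gbps(volume_tb: float, hours_to_deadline: float, margin: float = 1.5) -> float:
        """Smallest sustained bandwidth that moves volume_tb before the deadline, with headroom."""
        gigabits = volume_tb * 8000.0        # 1 TB = 8000 Gb
        seconds = hours_to_deadline * 3600.0
        return margin * gigabits / seconds

    # Example: 100 TB requested to be in place within 24 hours
    print(f"{min_bandwidth_gbps(100, 24):.1f} Gbps")   # ~13.9 Gbps to reserve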
Slide 46
Alexei Klimentov BNL/PAS
Conclusions
The performance of the ATLAS experiment's Distributed Computing and Software was a great success in LHC Run 1: the challenge of processing and analyzing the data and producing timely physics results was substantial, but in the end it was met with great success.
ASCR gave us a great opportunity to evolve PanDA beyond ATLAS and HEP and to start the Big PanDA project. The project team has been set up, the work on extending PanDA to LCFs has started, and large-scale PanDA deployments on commercial clouds are already producing valuable results. There is strong interest in the project from several experiments (disciplines) and scientific centers in a joint effort.
Slide 47
Alexei Klimentov BNL/PAS
Acknowledgements
Many thanks to J. Boyd, K. De, B. Kersevan, T. Maeno, R. Mount, P. Nilsson, D. Oleynik, S. Panitkin, A. Petrosyan, A. Prescott, K. Read and T. Wenaus for slides and materials used in this talk.