LHC Scale Physics in 2008: Grids, Networks and Petabytes
Shawn McKee ([email protected]), May 18th, 2005
Pan-American Advanced Studies Institute (PASI), Mendoza, Argentina
Acknowledgements
• Much of this talk was constructed from various sources. I would like to acknowledge:
– Rob Gardner (U Chicago)
– Harvey Newman (Caltech)
– Paul Avery (U Florida)
– Ian Foster (U Chicago/ANL)
– Alan Wilson (Michigan)
– The Globus Team
– The ATLAS Collaboration
– Trillium
Outline
• Large Datasets in High Energy Physics
  – Overview of High Energy Physics and the LHC
  – The ATLAS Experiment's Data Model
• Managing LHC Scale Data
  – Grids and Networks Computing Model
  – Current Planning, Tools, Middleware and Projects
• LHC Scale Physics in 2008
• Grids and Networks at Michigan
• Virtual Data
• The Future of Data Intensive Science
Large Datasets in High Energy Physics
Introduction to High-Energy Physics
• Before I can talk in detail about large datasets I want to provide a quick context for you to understand where all this data comes from.
• High-energy physics explores the very small constituents of nature by colliding "high energy" particles and reconstructing the zoo of particles which result.
• One of the most intriguing issues we are trying to address in high-energy physics is the origin of mass…
Physics with ATLAS: The Higgs Particle
• The Riddle of Mass
• One of the main goals of the ATLAS program is to discover and study the Higgs particle. The Higgs particle is of critical importance in particle theories and is directly related to the concept of particle mass and therefore to all masses.
High-Energy: From an Electron-Volt to Trillions of Electron-Volts
• Energies are often expressed in units of "electron-volts". An electron-volt (eV) is the energy acquired by an electron (or any particle with the same charge) when it is accelerated through a potential difference of 1 volt.
• Typical energies involved in atomic processes (processes such as chemical reactions or the emission of light) are of order a few eV. That is why batteries typically produce about 1 volt, and have to be connected in series to get much larger potentials.
• Energies in nuclear processes (like nuclear fission or radioactive decay) are typically of order one million electron-volts (1 MeV).
• The highest energy accelerator now operating (at Fermilab) accelerates protons to 1 million million electron-volts (1 TeV = 10^12 eV).
• The Large Hadron Collider (LHC) at CERN will accelerate each of two counter-rotating beams of protons to 7 TeV per proton. (A short unit-conversion sketch follows below.)
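To make these scales concrete, here is a minimal Python sketch converting the energies quoted above into joules. The eV-to-joule constant is the standard value; the other numbers come straight from the bullets above.

```python
# Convert the energy scales quoted above (eV, MeV, TeV) into joules.
# 1 eV = 1.602176634e-19 J is the standard defined value.

EV_IN_JOULES = 1.602176634e-19

def to_joules(value_ev):
    """Energy in joules for an energy given in electron-volts."""
    return value_ev * EV_IN_JOULES

scales = [
    ("atomic / chemical process", 1.0),      # ~1 eV
    ("nuclear process",           1.0e6),    # ~1 MeV
    ("Tevatron proton",           1.0e12),   # ~1 TeV
    ("LHC proton (per beam)",     7.0e12),   # 7 TeV
]

for label, ev in scales:
    print("%-28s %10.2e eV = %10.3e J" % (label, ev, to_joules(ev)))
# A 7 TeV proton carries only ~1.1e-6 J; the interesting quantity is the
# enormous number of protons and collisions, not any single particle.
```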
What is an Event?
• In the ATLAS detector there will be about a billion collision events per second, a data rate equivalent to twenty simultaneous telephone conversations by every person on the earth.
ATLAS will measure the collisions of 7 TeV protons.
Each proton-proton collision, or single-particle decay, is called an "event".
How Many Collisions?
• If two bunches of protons meet head on, the number of collisions can range from zero upwards. How often are there actually collisions?
  – For a fixed bunch size, this depends on how many protons there are in each bunch, and how large each proton is.
• A proton can be roughly thought of as being about 10^-15 meters in radius. If you had bunches 10^-6 meters in radius and only, say, 10 protons in each bunch, the chance of even one proton-proton collision when two bunches met would be extremely small.
• If each bunch had a billion-billion (10^18) protons, so that its entire cross section were just filled with protons, every proton from one bunch would collide with one from the other bunch, and you would have a billion-billion collisions per bunch crossing.
• The LHC situation is in between these two extremes: a few collisions (up to 20) per bunch crossing, which requires about a billion protons in each bunch.
As you will see, this leads to a lot of data to sift through. (A toy version of this estimate is sketched below.)
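The geometric argument above can be turned into a few lines of Python. This is only a toy estimate built from the radii and proton counts quoted on the slide, not the real LHC luminosity calculation: the expected number of collisions per crossing is roughly N^2 (r_proton / r_bunch)^2.

```python
# Toy estimate of proton-proton collisions per bunch crossing, following
# the "filled cross-section" argument above.  Each of the N protons in one
# bunch overlaps a fraction (r_proton / r_bunch)**2 of the oncoming bunch
# area per proton, so expected collisions ~ N**2 * (r_proton / r_bunch)**2.

r_proton = 1e-15   # m, rough proton radius from the text
r_bunch  = 1e-6    # m, bunch radius used in the example

def collisions_per_crossing(n_protons):
    return n_protons ** 2 * (r_proton / r_bunch) ** 2

for n in (10, 1e9, 1e18):
    print("N = %8.0e protons/bunch -> ~%.1e collisions per crossing"
          % (n, collisions_per_crossing(n)))
# N = 10    -> ~1e-16 (essentially never)
# N = 1e9   -> ~1     (the "few per crossing" LHC regime)
# N = 1e18  -> ~1e18  (every proton collides: the filled-bunch limit)
```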
The Large Hadron Collider (LHC), CERN, Geneva: 2007 Start
• First Beams: April 2007; Physics Runs: from Summer 2007
• 27 km Tunnel in Switzerland & France
(Schematic of the ring and its experiments:)
• ATLAS: pp, general purpose; HI
• CMS and TOTEM: pp, general purpose; HI
• LHCb: B-physics
• ALICE: HI
Data Comparison: LHC vs Prior Experiments
(Chart of Level-1 trigger rate (Hz) versus event size (bytes) for past and planned experiments: LEP, UA1, NA49, H1/ZEUS, CDF/D0, Tevatron Run II, KLOE, HERA-B, LHCb, ALICE, ATLAS and CMS. The LHC experiments combine a high Level-1 trigger rate (up to ~1 MHz), a high number of channels and high bandwidth (~500 Gbit/s), and petabyte-scale data archives. Source: Hans Hoffman, DOE/NSF Review, Nov 2000.)
The ATLAS Experiment
ATLAS
• A Toroidal LHC ApparatuS
• Collaboration
  – 150 institutes
  – 1850 physicists
• Detector
  – Inner tracker
  – Calorimeter
  – Magnet
  – Muon
• United States ATLAS
  – 29 universities, 3 national labs
  – 20% of ATLAS
Data Flow from ATLAS
• Level 1 (special hardware): 40 MHz input (~PB/sec), reduced to 75 kHz (75 GB/sec)
• Level 2 (embedded processors): reduced to 5 kHz (5 GB/sec)
• Level 3 (PCs): reduced to 200 Hz (100-400 MB/sec) for data recording & offline analysis
• ATLAS total: ~10 PB/y (simulated + raw + summary)
(The rate reduction at each level is tabulated in the sketch below.)
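The numbers in this data-flow figure can be tabulated directly. The Python sketch below uses the rates from the figure plus the ~1.6 MB raw event size quoted in the data-size table later in the talk, and prints the rejection factor at each trigger level and the resulting recorded data rate.

```python
# Rate reduction through the ATLAS trigger chain, using the figures above.

levels = [
    ("collision rate",           40e6, "input to level 1 (special hardware)"),
    ("level 1 output",           75e3, "custom hardware"),
    ("level 2 output",            5e3, "embedded processors"),
    ("level 3 output (to tape)",  200, "PC farm"),
]

prev = None
for name, rate_hz, where in levels:
    if prev is None:
        print("%-26s %10.3g Hz  (%s)" % (name, rate_hz, where))
    else:
        print("%-26s %10.3g Hz  (%s), rejection ~1/%d"
              % (name, rate_hz, where, round(prev / rate_hz)))
    prev = rate_hz

# At ~1.6 MB per raw event, 200 Hz to storage is ~320 MB/s, i.e. within
# the 100-400 MB/s range quoted above.
print("recorded data rate: ~%.0f MB/s" % (200 * 1.6))
```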
LHC Timeline for Service Challenges
Timeline (service challenges SC2-SC4 leading to LHC service operation; each challenge goes through preparation, setup and service phases):
• Apr 05 – SC2 complete
• Jun 05 – Technical Design Report
• Jul 05 – SC3 throughput test
• Sep 05 – SC3 service phase
• Dec 05 – Tier-1 network operational
• Apr 06 – SC4 throughput test
• May 06 – SC4 service phase starts
• Sep 06 – Initial LHC service in stable operation
• Apr 07 – LHC service commissioned
• 2007 – LHC service operation: cosmics, first beams, first physics
• 2008 – full physics run
We are here … not much time to get things ready!
Managing LHC Scale Data
The Data Challenge for LHC
• There is a very real challenge in managing tens of petabytes of data yearly for a globally distributed collaboration of 2000 physicists!
• While much of the interesting data we seek is small in volume, we must understand and sort through a huge volume of relatively uninteresting "events" to discover new physics.
• The primary (only!) plan for the LHC is to use Grid middleware and high-performance networks to harness the complete global resources of our collaborations to meet this data analysis challenge.
Managing LHC Scale Data
Grids and Networks Computing Model
The Problem
Petabytes…
The Solution
What is “The Grid”?
• There are many answers and interpretations
• The term was originally coined in the mid-1990s (in analogy with the power grid) and can be described as follows: "The grid provides flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources (virtual organizations, or VOs)."
Grid Perspectives
• User's viewpoint:
  – A virtual computer which minimizes time to completion for my application while transparently managing access to inputs and resources
• Programmer's viewpoint:
  – A toolkit of applications and APIs which provide transparent access to distributed resources
• Administrator's viewpoint:
  – An environment to monitor, manage and secure access to geographically distributed computers, storage and networks.
Data Grids for High Energy Physics (ATLAS version from Harvey Newman's original)
(Diagram of the tiered LHC computing model: the online system feeds the Tier 0+1 CERN computer centre (~25 TIPS) at ~100-400 MBytes/sec, with ~PByte/sec produced at the detector; Tier 1 national centres (BNL, France, Italy, UK) connect to CERN at 10-40 Gbits/sec; Tier 2 regional centres connect at ~10+ Gbps; Tier 3 institutes (~0.25 TIPS, holding physics data caches) connect at 100-10000 Mbits/sec; Tier 4 is individual workstations. CERN/Outside resource ratio ~1:4; Tier0 : (ΣTier1) : (ΣTier2) ~ 1:2:2. Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels.)
Managing LHC Scale Data
Current Planning, Tools, Middleware and Testbeds
Grids and Networks: Why Now?
• Moore's-law improvements in computing produce highly functional end systems
• The Internet and burgeoning wired and wireless networks provide ~universal connectivity
• Changing modes of working and problem solving emphasize teamwork and computation
• Network exponentials produce dramatic changes in geometry and geography
Living in an Exponential World (1): Computing & Sensors
Moore's Law: transistor count doubles each ~18 months
(Example applications shown: magnetohydrodynamics, star formation.)
Living in an Exponential World (2): Storage
• Storage density doubles every ~12 months
• This has led to dramatic growth in HEP online data (1 petabyte = 1000 terabytes = 1,000,000 gigabytes):
  – 2000: ~0.5 petabyte
  – 2005: ~10 petabytes
  – 2010: ~100 petabytes
  – 2015: ~1000 petabytes
• It's transforming entire disciplines in the physical and, increasingly, biological sciences; humanities next?
Network Exponentials
• Network vs. computer performance
  – Computer speed doubles every 18 months
  – Network speed doubles every 9 months
  – Difference: an order of magnitude per 5 years
• 1986 to 2000
  – Computers: x 500
  – Networks: x 340,000
• 2001 to 2010
  – Computers: x 60
  – Networks: x 4,000
(Moore's Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan 2001) by Cleo Vilett; source: Vinod Khosla, Kleiner Perkins Caufield & Byers. A rough consistency check of these factors follows below.)
The Network
• As the previous slide suggests, it can be argued that the evolution of the network has been the primary motivator for the Grid.
• Ubiquitous, dependable worldwide networks have opened up the possibility of tying together geographically distributed resources
• The success of the WWW for sharing information has spawned a push for a system to share resources
• The network has become the “virtual bus” of a virtual computer.
What Is Needed for LHC-HEP?
• We require a number of high-level capabilities to do High-Energy Physics:
  – Data Processing: All data needs to be reconstructed, first into fundamental components like tracks and energy deposition, and then into "physics" objects like electrons, muons, hadrons, neutrinos, etc.
    • Raw -> Reconstructed -> Summarized
    • Simulation follows the same path; it is critical to understanding our detectors and the underlying physics.
  – Data Discovery: We must be able to locate events of interest
  – Data Movement: We must be able to move discovered data as needed for analysis or reprocessing
  – Data Analysis: We must be able to apply our analyses to the data to determine whether they contain new physics
  – Collaborative Tools: Vital to maintain our global collaborations
  – Policy and Resource Management: Allow resource owners to specify the conditions under which they will share, and allow them to manage those resources as they evolve
Monitoring Example on OSG-ITB
Collaborative Tools Example: EVO
Managing LHC Scale Data
HEP Related Grid/Network Projects
The Evolution of Data Movement
• The recent history of data movement capabilities exemplifies the evolution of network capacity.
• NSFNet started with a 56 kbit/s modem link as the US network backbone
• Current networks are so fast that end systems are only able to fully drive them when storage clusters are used at each end
NSFNET 56 Kb/s Site Architecture
(Diagram: a VAX and a "Fuzzball" router at each site. Moving a 1024 MB dataset: at 4 MB/s it takes 256 s (4 min); at 1 MB/s, across the room, 1024 s (17 min); at .007 MB/s (56 kb/s), across the country, ~150,000 s (41 hrs).)
Bandwidth in terms of burst data transfer and user wait time.
2002 Cluster-WAN Architecture
(Diagram: a 1 TB dataset on a cluster with n x GbE links (small n), connected through an OC-12 link to an OC-48 cloud. Across the room, at 0.5 GB/s, the transfer takes 2000 s (33 min); across the country, at 78 MB/s, it takes ~13k s (3.6 h).)
Distributed Terascale Cluster
(Diagram: two 10 TB clusters, each with a big, fast interconnect and n x GbE links (large n), joined by an OC-192 link. At the 5 GB/s wire-speed limit (not yet achieved) a 10 TB transfer takes 2000 s (33 min).)
(These burst-transfer times are recomputed in the sketch below.)
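The wait times in the three generations sketched above all follow from (dataset size) / (usable bandwidth). The Python sketch below recomputes them from the numbers on the slides; small differences from the quoted figures are just rounding.

```python
# Burst-transfer wait time = dataset size / usable bandwidth, for the three
# architectures sketched above (sizes and rates taken from the slides).

scenarios = [
    # (era,                    bytes,    bytes/s,   path)
    ("NSFNET 56 kb/s era",     1024e6,   0.007e6,   "across the country"),
    ("2002 cluster + OC-12",   1e12,     78e6,      "across the country"),
    ("terascale + OC-192",     10e12,    5e9,       "wire-speed limit"),
]

for era, size, bw, path in scenarios:
    seconds = size / bw
    print("%-22s %8.1f GB at %9.3f MB/s (%s): %9.0f s (~%.1f h)"
          % (era, size / 1e9, bw / 1e6, path, seconds, seconds / 3600))
```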
UltraLight Goal (Near Future)
• A more modest goal in terms of bandwidth achieved is being targeted by the UltraLight collaboration.
• Build, tune and deploy moderately priced servers capable of delivering 1 GB/s between 2 such servers over the WAN
• Provides the ability to utilize the full capability of lambdas (optical wavelengths), as available, without requiring 10s-100s of nodes at each end.
  – Easier to manage, coordinate and deploy a smaller number of performant servers than a much larger number of less capable ones
• Easier to scale up as needed to match the available bandwidth
What is UltraLight?
• UltraLight is a program to explore the integration of cutting-edge network technology with the grid computing and data infrastructure of HEP/Astronomy
• The program intends to explore network configurations from common shared infrastructure (current IP networks) through dedicated point-to-point optical paths.
• A critical aspect of UltraLight is its integration with two driving application domains in support of their national and international eScience collaborations: LHC-HEP and eVLBI-Astronomy
• The collaboration includes: Caltech, Florida Int. Univ., MIT, Univ. of Florida, Univ. of Michigan, UC Riverside, BNL, FNAL, SLAC, UCAID/Internet2
UltraLight Network: PHASE I
• Implementation via “sharing” with HOPI/NLR
• MIT not yet “optically” coupled
UltraLight Network: PHASE III (by 2008)
• Move into production – terabyte datasets in 10 minutes
• Optical switching fully enabled amongst primary sites
• Integrated international infrastructure
LHC Scale Physics in 2008
ATLAS Discovery Potential for the SM Higgs Boson
(Plot of expected signal significance versus Higgs mass for the various decay channels.)
• Good sensitivity over the full mass range from ~100 GeV to ~1 TeV
• For most of the mass range at least two channels are available
• Detector performance is crucial: b-tagging, leptons, photons, energy resolution, photon/jet separation, ...
ATLAS
(Simulated ATLAS event displays of Higgs decays, including H -> ZZ* -> ee + leptons.)
Data Intensive Computing and Grids
• The term "Data Grid" is often used
  – Unfortunate, as it implies a distinct infrastructure, which it isn't; but it is easy to say
• Data-intensive computing shares numerous requirements with collaboration, instrumentation, computation, ...
  – Security, resource management, info services, etc.
• It is important to exploit commonalities, as it is very unlikely that multiple infrastructures can be maintained
• Fortunately this seems easy to do!
A Model Architecture for Data Grids
(Diagram: an application presents an attribute specification to a Metadata Catalog, which returns a logical collection and logical file names; the Replica Catalog maps these to multiple locations; a Replica Selection service, using performance information and predictions from NWS and MDS, picks the best copy; the selected replica is then moved over a GridFTP data channel (with a GridFTP control channel) from storage at Replica Locations 1-3 (tape library, disk cache, disk array) to a local disk cache. A minimal sketch of the replica-selection step follows below.)
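Here is a minimal Python sketch of the replica-selection step in the architecture above: given several replica locations and bandwidth estimates (as an NWS-like service would supply), pick the location with the shortest predicted transfer time. The catalogue contents, site names and bandwidth numbers are invented for illustration, and a real system would hand the chosen URL to GridFTP rather than this stub.

```python
# Minimal replica-selection sketch for the data-grid architecture above.
# A replica catalog maps a logical file name to several physical copies;
# a network-prediction service supplies estimated bandwidth to each site;
# we choose the copy with the shortest predicted transfer time.
# All names and numbers here are illustrative, not a real API.

replica_catalog = {
    "lfn:atlas/esd/run1234.root": [
        ("gsiftp://cern.ch/data/run1234.root",   "cern"),
        ("gsiftp://bnl.gov/data/run1234.root",   "bnl"),
        ("gsiftp://umich.edu/data/run1234.root", "umich"),
    ],
}

predicted_bandwidth_mb_s = {"cern": 12.0, "bnl": 45.0, "umich": 90.0}  # invented

def select_replica(lfn, size_mb):
    """Return (url, predicted_seconds) for the fastest predicted copy."""
    candidates = []
    for url, site in replica_catalog[lfn]:
        bw = predicted_bandwidth_mb_s.get(site, 1.0)
        candidates.append((size_mb / bw, url))
    seconds, url = min(candidates)
    return url, seconds

url, eta = select_replica("lfn:atlas/esd/run1234.root", size_mb=500)
print("fetch %s (predicted %.0f s); hand the URL to a GridFTP client" % (url, eta))
```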
Examples of Desired Data Grid Functionality
• High-speed, reliable access to remote data
• Automated discovery of the "best" copy of data
• Manage replication to improve performance
• Co-schedule compute, storage, network
• "Transparency" with respect to delivered performance
• Enforce access control on data
• Allow representation of "global" resource allocation policies
• Not there yet! Back to the physics…
Needles in LARGE Haystacks
• When protons collide, some events are "interesting" and may tell us about exciting new particles or forces, whereas many others are "ordinary" collisions (often called "background"). The ratio of their relative rates is about 1 interesting event for every 10 million background events. One of our key needs is to separate the interesting events from the ordinary ones.
• Furthermore, the information must be sufficiently detailed and precise to allow eventual recognition of certain "events" that may only occur at a rate of one in a million-million (10^12) collisions: a very small fraction of the recorded events, which are themselves a very small fraction of all events.
• I will outline the steps ATLAS takes to get to these interesting particles. (The rough event counts implied by these ratios are worked out below.)
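Put into numbers, the ratios above imply the following rough event counts per year of running. The 10^9 collisions per second and the 1-in-10^7 / 1-in-10^12 ratios come from the text; the ~10^7 live seconds per year is an assumed rule of thumb, not a number from the slides.

```python
# Rough counting implied by the ratios above.

collision_rate = 1e9        # collisions per second in ATLAS (from the text)
live_seconds   = 1e7        # assumed running time per year

collisions_per_year = collision_rate * live_seconds
interesting         = collisions_per_year / 1e7    # ~1 in 10 million
very_rare           = collisions_per_year / 1e12   # ~1 in a million-million

print("collisions per year:         %.1e" % collisions_per_year)
print("'interesting' events/year:   %.1e" % interesting)
print("very rare signal events/yr:  %.1e" % very_rare)
# ~1e16 collisions -> ~1e9 'interesting' and only ~1e4 very rare events,
# which is why the trigger and offline selection have to be so efficient.
```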
HEP Data Analysis
• Raw data
  – hits, pulse heights
• Reconstructed data (ESD)
  – tracks, clusters, …
• Analysis Objects (AOD)
  – Physics objects
  – Summarized
  – Organized by physics topic
• Ntuples, histograms, statistical data
Production Analysis
(Data-flow diagram: the Trigger System and Data Acquisition (Level-3 trigger) produce Raw Data and Trigger Tags; Reconstruction, using Calibration Data and Run Conditions, turns Raw Data into Event Summary Data (ESD) and Event Tags. On the simulation side, Physics Models produce Monte Carlo Truth Data; Detector Simulation produces MC Raw Data, which Reconstruction turns into MC Event Summary Data and MC Event Tags. Coordination is required at the collaboration and group levels.)
Physics Analysis
(Data-flow diagram: Event Selection uses Event Tags; Analysis Processing, using Calibration Data, turns Raw Data and ESD into Analysis Objects; Physics Analysis then produces Physics Objects and Statistical Objects. ESD and analysis processing live at Tier 0/1 (collaboration-wide), Analysis Objects at Tier 2 (analysis groups), and Physics/Stat Objects at Tiers 3 and 4 (individual physicists).)
LHC pp Running: Data Sizes
Experiment | SIM    | SIM ESD | RAW    | Trigger | ESD    | AOD    | TAG
ALICE      | 400 KB | 40 KB   | 1 MB   | 100 Hz  | 200 KB | 50 KB  | 10 KB
ATLAS      | 2 MB   | 500 KB  | 1.6 MB | 200 Hz  | 500 KB | 100 KB | 1 KB
CMS        | 2 MB   | 400 KB  | 1.5 MB | 150 Hz  | 250 KB | 50 KB  | 10 KB
LHCb       | 400 KB | -       | 25 KB  | 2 kHz   | 75 KB  | 25 KB  | 1 KB
(The yearly raw-data volumes these rates imply are sketched below.)
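The trigger rates and RAW event sizes in the table translate directly into raw-data volume per year. A short Python sketch, assuming ~10^7 live seconds of running per year (that assumption is mine, not part of the table):

```python
# Raw-data volume per year implied by the table above:
#   volume = trigger rate * RAW event size * live seconds.
# The ~1e7 live seconds per year is an assumed rule of thumb.

LIVE_SECONDS = 1e7

experiments = {            # (trigger rate in Hz, RAW size in bytes)
    "ALICE": (100,   1.0e6),
    "ATLAS": (200,   1.6e6),
    "CMS":   (150,   1.5e6),
    "LHCb":  (2000,  25e3),
}

for name, (rate_hz, raw_bytes) in experiments.items():
    petabytes = rate_hz * raw_bytes * LIVE_SECONDS / 1e15
    print("%-6s ~%.1f PB of raw data per year" % (name, petabytes))
# ATLAS: 200 Hz * 1.6 MB * 1e7 s ~ 3.2 PB/year of raw data alone, before
# ESD, AOD and simulation are added on top.
```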
Data Flow Analysis by V. Lindenstruth
Data Estimates From LHC
Data sizes from the LHC along with some estimates about the tiered resources envisioned
Example of (Simulated) Data Sizes
• In advance of getting real data we have very sophisticated simulation codes which attempt to model collisions of particles and the corresponding response of the ATLAS detector.
• These simulations are critical to understanding our detector design and our analysis codes
• The next slide will show some information about how much computer time each relevant step takes and how much data is involved as an example of a small research group’s requirements
Case Study: Simulating Some ATLAS Physics Process
Step           | Storage  | CPU Time
Generation     | 36 MB    | seconds
Simulation     | 845 MB   | 55 hours
Digitization   | 1520 MB  | 9 hours
Reconstruction | 15 MB    | 10 hours
Running 1000 Z -> μμ events (at Michigan): this totals ~2.4 GB and 74 CPU hours on a 2 GHz P4 processor with 1 GB of RAM. Unfortunately, in this study we need approximately 1 million such events, which means we must have 2.4 TB of storage and require 3000 CPU-DAYS of processing time. (The scaling is worked out below.)
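Scaling the 1000-event case study to the one million events the analysis needs is straightforward. The sketch below reproduces the ~2.4 TB and ~3000 CPU-day totals from the per-step figures in the table.

```python
# Scale the 1000-event case study above to ~1,000,000 events.
# Per-1000-event figures are taken from the table.

per_1000_events = {            # (storage in MB, CPU time in hours)
    "Generation":     (36,    0.01),   # "seconds" -> negligible CPU
    "Simulation":     (845,   55),
    "Digitization":   (1520,  9),
    "Reconstruction": (15,    10),
}

scale = 1_000_000 / 1000       # one million events

storage_gb = sum(mb for mb, _ in per_1000_events.values()) * scale / 1000
cpu_days   = sum(hours for _, hours in per_1000_events.values()) * scale / 24

print("storage needed: ~%.1f TB" % (storage_gb / 1000))
print("CPU needed:     ~%.0f days on one 2 GHz P4" % cpu_days)
# ~2.4 TB and ~3000 CPU-days, which is exactly why a single group needs
# grid resources rather than a handful of local machines.
```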
Virtual (and Meta) Data
(A very important concept for LHC Physics Infrastructure)
Programs as Community Resources: Data Derivation and Provenance
• Most [scientific] data are not simple "measurements"; essentially all are:
  – Computationally corrected/reconstructed
  – And/or produced by numerical simulation
• And thus, as data and computers become ever larger and more expensive:
  – Programs are significant community resources
  – So are the executions of those programs
• Management of the transformations that map between datasets is an important problem
Motivations (1)
(Provenance diagram: data is generated by the execution of a transformation; a derivation records which data were consumed and which were generated, so every data product is linked to the program run that created it.)
• "I've detected a calibration error in an instrument and want to know which derived data to recompute."
• "I've come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes."
• "I want to search an ATLAS event database for events with certain characteristics. If a program that performs this analysis exists, I won't have to write one from scratch."
• "I want to apply a jet analysis program to millions of events. If the results already exist, I'll save weeks of computation."
Motivations (2)
• Data track-ability and result audit-ability
  – Universally sought by GriPhyN applications
• Repair and correction of data
  – Rebuild data products (cf. "make")
• Workflow management
  – A new, structured paradigm for organizing, locating, specifying, and requesting data products
• Performance optimizations
  – Ability to re-create data rather than move it
• And others, some we haven't thought of
Virtual Data in Action
• A data request may
  – Compute locally
  – Compute remotely
  – Access local data
  – Access remote data
• Scheduling is based on
  – Local policies
  – Global policies
  – Cost
(Diagram: an item is fetched first from local facilities and caches, then regional facilities and caches, then major facilities and archives.)
Chimera Application: Sloan Digital Sky Survey Analysis
(Plot: number of clusters versus number of galaxies, showing the galaxy-cluster size distribution obtained for the question "Size distribution of galaxy clusters?", computed with the Chimera Virtual Data System + iVDGL Data Grid (many CPUs).)
Virtual Data Queries
• A query for events implies:
  – Really, asking whether an input data sample corresponding to a set of calibrations, methods, and perhaps Monte Carlo history matches a set of criteria
• It is vital to know, for example:
  – What data sets already exist, and in which formats (ESD, AOD, Physics Objects)? If a data set does not exist, can it be materialized?
  – Was this data calibrated optimally?
  – If I want to recalibrate a detector, what is required?
• Methods:
  – Virtual data catalogs and APIs
  – Data signatures
• Interface to the Event Selector Service
Virtual Data Scenario
• A physicist issues a query for events
  – Issues:
    • How expressive is this query?
    • What is the nature of the query?
    • What language (syntax) will be supported for the query?
  – Algorithms are already available in local shared libraries
  – For ATLAS, an Athena service consults an ATLAS Virtual Data Catalog or Registry Service
• Three possibilities:
  – The file exists on the local machine
    • Analyze it
  – The file exists in a remote store
    • Copy the file, then analyze it
  – The file does not exist
    • Generate, reconstruct, analyze; possibly done remotely, then copied
(This three-way decision is sketched in code below.)
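The three possibilities above amount to a simple decision procedure. The Python sketch below illustrates only the control flow; the file paths, catalog structure and helper logic are invented placeholders standing in for the real Athena / virtual-data-catalog and GridFTP services, not actual ATLAS interfaces.

```python
# Control-flow sketch of the virtual-data scenario above: satisfy a query
# by using a local file, copying a remote replica, or regenerating the
# data from its recorded derivation.  All helpers are placeholders.

import os
import shutil

def resolve(dataset, catalog, local_dir="/tmp/vds-cache"):
    os.makedirs(local_dir, exist_ok=True)
    local_path = os.path.join(local_dir, dataset["name"])

    if os.path.exists(local_path):                   # 1. file exists locally
        return local_path, "analyzed local copy"

    for remote_path in dataset.get("replicas", []):  # 2. remote replica
        if os.path.exists(remote_path):              # stand-in for GridFTP
            shutil.copy(remote_path, local_path)
            return local_path, "copied remote replica, then analyzed"

    # 3. no copy anywhere: re-run the recorded transformation (derivation)
    with open(local_path, "w") as f:
        f.write(catalog[dataset["name"]]["recipe"])  # placeholder "generation"
    return local_path, "regenerated from its derivation, then analyzed"

catalog = {"zmumu.aod": {"recipe": "generate -> simulate -> reconstruct"}}
dataset = {"name": "zmumu.aod", "replicas": ["/net/tier2/zmumu.aod"]}
print(resolve(dataset, catalog))
```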
Virtual Data Summary
• The concept of virtual data is an important one for the LHC computing
• Having the ability to either utilize a local copy, move a remote copy or regenerate the dataset (locally or remotely) is very powerful in helping to optimize the overall infrastructure supporting LHC physics.
The Future of Data-Intensive e-Science…
Distributed Computing Problem Evolution
• Past-present: O(10^2) high-end systems; Mb/s networks; centralized (or entirely local) control
  – I-WAY (1995): 17 sites, week-long; 155 Mb/s
  – GUSTO (1998): 80 sites, long-term experiment
  – NASA IPG, NSF NTG: O(10) sites, production
• Present: O(10^4-10^6) data systems, computers; Gb/s networks; scaling, decentralized control
  – Scalable resource discovery; restricted delegation; community policy; Data Grid: 100s of sites, O(10^4) computers; complex policies
• Future: O(10^6-10^9) data, sensors, computers; Tb/s networks; highly flexible policy, control
A "Grid" (Globus) View of the Future: All Software is Network-Centric
• We don't build or buy "computers" anymore; we borrow or lease required resources
  – When I walk into a room, need to solve a problem, need to communicate
• A "computer" is a dynamically, often collaboratively constructed collection of processors, data sources, sensors, networks
  – Similar observations apply for software
• Pervasive, extremely high-performance networks provide location-independent access to huge datasets
Major Issues for Grids and eScience
• The vision outlined in the previous slide assumes a level of capability way beyond current grid technology:– Current grids allow access to distributed resources in a
secure (authenticated/authorized) way
– However, the grid users are faced with a very limited and detached view of their “virtual computer”
• Current grid technology and middleware requires the next level of integration and functionality to deliver an effective system for e-Science…
The Needed Grid Enhancements
• We need to provide users with the SAME type of capabilities which exist on their local workstations and operating systems:
  – File "browsing"
  – Task debugging
  – System monitoring
  – Process prioritization and management
  – Accounting and auditing
  – Fine-grained access control
  – Storage access and management
  – Error handling/resiliency
• The network has become the virtual bus of our grid virtual computer… we now need the equivalent of a "grid operating system" to enable easy, transparent access to our virtual machine
• This is difficult but very necessary…
Future of the Grid for LHC?
• Grid Optimist
  – Best thing since the WWW. Don't worry, the grid will solve all our computational and data problems! Just click "Install"
• Grid Pessimist
  – The grid is "merely an excuse by computer scientists to milk the political system for more research grants so they can write yet more lines of useless code" [The Economist, June 21, 2001]
  – "A distraction from getting real science done" [McCubbin]
• Grid Realist
  – The grid can solve our problems, because we design it to! We must work closely with the developers as it evolves, providing our requirements and testing their deliverables in our environment.
Conclusions
• We have a significant amount of data to manage for the LHC
• Networks are a central component of future e-Science
• LHC physics will depend heavily on globally distributed resources => the NETWORK is critical!
• There are many very interesting projects and concepts in Grids and Networks working toward dealing with the massive amounts of distributed data we expect.
• We have a few more years to see how well it will all work!
For More Information…
• The ATLAS Project– atlas.web.cern.ch/Atlas/
• Grid Forum– www.gridforum.org
• HENP Internet2 SIG– henp.internet2.edu
• OSG– www.opensciencegrid.org/
• Questions?
Questions?