
The GriPhyN Project (Grid Physics Network)

Paul Avery
University of Florida
http://www.phys.ufl.edu/~avery/
[email protected]

Internet 2 Workshop, Atlanta

Nov. 1, 2000


Motivation: Data Intensive Science
Scientific discovery increasingly driven by IT

Computationally intensive analyses
Massive data collections
Rapid access to large subsets
Data distributed across networks of varying capability

Dominant factor: data growth
~0.5 Petabyte in 2000 (BaBar)
~10 Petabytes by 2005
~100 Petabytes by 2010
~1000 Petabytes by 2015?
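Taking the order-of-magnitude milestones above at face value, a short sketch shows the implied annual growth factor:

```python
# Implied annual growth factor between the data-volume milestones above
# (illustrative only; the figures are order-of-magnitude estimates).
milestones = {2000: 0.5, 2005: 10, 2010: 100, 2015: 1000}  # petabytes

years = sorted(milestones)
for start, end in zip(years, years[1:]):
    factor = (milestones[end] / milestones[start]) ** (1 / (end - start))
    print(f"{start}-{end}: ~{factor:.2f}x per year")
# 2000-2005: ~1.82x per year; the later intervals are ~1.58x per year
```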

Robust IT infrastructure essential for science
Provide rapid turnaround
Coordinate and manage the limited computing, data handling, and network resources effectively


Grids as IT Infrastructure
Grid: geographically distributed IT resources configured to allow coordinated use

Physical resources & networks provide raw capability

“Middleware” services tie it together


Data Grid Hierarchy (LHC Example)

[Figure: hierarchical diagram with Tier 0 (CERN) at the top, feeding Tier 1 national centers, several Tier 2 regional centers, and many Tier 3 and Tier 4 sites below]

Tier 0: CERN
Tier 1: National Lab
Tier 2: Regional Center at University
Tier 3: University workgroup
Tier 4: Workstation

GriPhyN: R&D, Tier 2 centers, unify all IT resources
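A minimal sketch of this tiered structure as a simple tree, with hypothetical site names and no capacities or policies:

```python
# Minimal sketch of the tier hierarchy as a tree (illustrative only;
# the site names below are placeholders, not GriPhyN design choices).
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    tier: int                       # 0 = CERN, 1 = national lab, 2 = regional center, ...
    children: list = field(default_factory=list)

    def add(self, child: "Site") -> "Site":
        self.children.append(child)
        return child

def walk(site: Site, indent: int = 0) -> None:
    print("  " * indent + f"Tier {site.tier}: {site.name}")
    for child in site.children:
        walk(child, indent + 1)

cern = Site("CERN", tier=0)
lab = cern.add(Site("national lab", tier=1))              # hypothetical Tier 1
t2 = lab.add(Site("university regional center", tier=2))  # hypothetical Tier 2
t2.add(Site("workgroup cluster", tier=3))
walk(cern)
```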


Why a Data Grid: Physical
Unified system: all computing resources part of grid

Efficient resource use (manage scarcity)
Resource discovery / scheduling / coordination truly possible
“The whole is greater than the sum of its parts”

Optimal data distribution and proximity
Labs are close to the (raw) data they need
Users are close to the (subset) data they need
Minimize bottlenecks

Efficient network use: local > regional > national > oceanic
No choke points

Scalable growth


Why a Data Grid: Demographic
Central lab cannot manage / help 1000s of users

Easier to leverage resources, maintain control, assert priorities at regional / local level

Cleanly separates functionality
Different resource types in different Tiers
Organization vs. flexibility
Funding complementarity (NSF vs. DOE), targeted initiatives

New IT resources can be added “naturally”
Matching resources at Tier 2 universities
Larger institutes can join, bringing their own resources
Tap into new resources opened by IT “revolution”

Broaden community of scientists and students
Training and education
Vitality of field depends on University / Lab partnership


GriPhyN = Applications + CS + Grids
Several scientific disciplines

US-CMS: High Energy Physics
US-ATLAS: High Energy Physics
LIGO/LSC: Gravity wave research
SDSS: Sloan Digital Sky Survey

Strong partnership with computer scientists
Design and implement production-scale grids

Maximize effectiveness of large, disparate resources
Develop common infrastructure, tools and services
Build on foundations: Globus, PPDG, MONARC, Condor, …
Integrate and extend existing facilities

$70M total cost, NSF(?)
$12M R&D
$39M Tier 2 center hardware, personnel, operations
$19M? Networking


Particle Physics Data Grid (PPDG)

First Round Goal: optimized cached read access to 10-100 Gbytes drawn from a total data set of 0.1 to ~1 Petabyte

Matchmaking, co-scheduling: SRB, Condor, Globus services; HRM, NWS

Participants: ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS

[Figure: Site-to-Site Data Replication Service at 100 Mbytes/sec between a primary site (data acquisition, CPU, disk, tape robot) and a secondary site (CPU, disk, tape robot)]

[Figure: Multi-Site Cached File Access Service linking a primary site (DAQ, tape, CPU, disk, robot), satellite sites (tape, CPU, disk, robot), and university sites (CPU, disk, users)]
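The cached file access pattern above can be sketched roughly as follows; this is a minimal illustration in which fetch_from_primary() and CACHE_DIR are hypothetical placeholders, not PPDG interfaces:

```python
# Minimal sketch of multi-site cached read access (illustrative only;
# fetch_from_primary() and CACHE_DIR are hypothetical, not PPDG interfaces).
import os

CACHE_DIR = "/tmp/ppdg_cache"       # hypothetical local cache location

def fetch_from_primary(logical_name: str, dest: str) -> None:
    """Stand-in for a wide-area transfer from the primary site."""
    raise NotImplementedError("replace with an actual transfer mechanism")

def cached_open(logical_name: str):
    """Return a file object, reading from the local cache when possible."""
    local_path = os.path.join(CACHE_DIR, logical_name)
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        fetch_from_primary(logical_name, local_path)   # cache miss: pull once
    return open(local_path, "rb")                      # subsequent reads stay local
```

The point is simply that repeated reads of the same 10-100 Gbyte working set hit the local cache rather than the wide-area network.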


Grid Data Management Prototype (GDMP)

Distributed Job Execution and Data Handling: Goals

Transparency, Performance, Security, Fault Tolerance, Automation

[Figure: three sites (A, B, C); a job is submitted at one site, executes locally or remotely, writes its data locally, and the data is then replicated to the other sites]

Jobs are executed locally or remotely
Data is always written locally
Data is replicated to remote sites

GDMP V1.0: Caltech + EU DataGrid WP2; for CMS “HLT” Production in October
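A minimal sketch of the write-locally-then-replicate flow, with replicate_to() and the site names as hypothetical stand-ins rather than the GDMP API:

```python
# Minimal sketch of the write-locally-then-replicate flow (illustrative only;
# replicate_to() and the site names are hypothetical stand-ins, not the GDMP API).
from pathlib import Path

LOCAL_STORE = Path("/tmp/gdmp_local")       # hypothetical local storage area
REMOTE_SITES = ["siteB", "siteC"]           # hypothetical remote sites

def replicate_to(site: str, path: Path) -> None:
    """Stand-in for a wide-area replication transfer to another site."""
    print(f"replicating {path.name} to {site}")

def run_job(job_id: str, output: bytes) -> Path:
    out = LOCAL_STORE / f"{job_id}.dat"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(output)                 # data is always written locally
    for site in REMOTE_SITES:
        replicate_to(site, out)             # then pushed to the remote sites
    return out

run_job("cms_hlt_0001", b"simulated event data")
```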


GriPhyN R&D Funded!
NSF/ITR results announced Sep. 13

$11.9M from Information Technology Research Program
$1.4M in matching from universities
Largest of all ITR awards
Excellent reviews emphasizing importance of work
Joint NSF oversight from CISE and MPS

Scope of ITR funding
Major costs for people, esp. students and postdocs
No hardware or professional staff for operations!
2/3 CS + 1/3 application science
Industry partnerships being developed: Microsoft, Intel, IBM, Sun, HP, SGI, Compaq, Cisco

Still require funding for implementation and operation of Tier 2 centers


GriPhyN Institutions
U Florida
U Chicago
Caltech
U Wisconsin, Madison
USC/ISI
Harvard
Indiana
Johns Hopkins
Northwestern
Stanford
Boston U
U Illinois at Chicago
U Penn
U Texas, Brownsville
U Wisconsin, Milwaukee
UC Berkeley
UC San Diego
San Diego Supercomputer Center
Lawrence Berkeley Lab
Argonne
Fermilab
Brookhaven


Fundamental IT Challenge

“Scientific communities of thousands, distributed globally, and served by networks with bandwidths varying by orders of magnitude, need to extract small signals from enormous backgrounds via computationally demanding (Teraflops-Petaflops) analysis of datasets that will grow by at least 3 orders of magnitude over the next decade: from the 100 Terabyte to the 100 Petabyte scale.”


GriPhyN Research Agenda
Virtual Data technologies

Derived data, calculable via algorithm (e.g., 90% of HEP data)
Instantiated 0, 1, or many times
Fetch vs. execute algorithm (see the sketch below)
Very complex (versions, consistency, cost calculation, etc.)
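A minimal sketch of that fetch-vs.-execute decision, assuming a purely time-based cost model; the real planner must also handle versions, consistency, and policy constraints:

```python
# Minimal sketch of the fetch-vs.-execute decision for virtual (derived) data
# (illustrative only; the time-based cost model below is a placeholder).
def plan_request(replica_size_gb, net_gb_per_s, cpu_hours, replica_exists):
    """Choose whether to fetch an existing replica or re-derive the data."""
    fetch_hours = replica_size_gb / net_gb_per_s / 3600 if replica_exists else float("inf")
    execute_hours = cpu_hours                  # time to re-run the derivation
    return "fetch" if fetch_hours < execute_hours else "execute"

# e.g. a 500 GB derived dataset over a 0.1 GB/s link vs. a 4 CPU-hour re-derivation
print(plan_request(500, 0.1, 4, replica_exists=True))   # -> "fetch" (~1.4 h < 4 h)
```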

Planning and scheduling
User requirements (time vs. cost)
Global and local policies + resource availability
Complexity of scheduling in dynamic environment (hierarchy)
Optimization and ordering of multiple scenarios
Requires simulation tools, e.g. MONARC
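A toy illustration of policy-constrained site selection, with placeholder sites and a single optimization criterion (queue time):

```python
# Toy example of policy-constrained site selection (illustrative only;
# the sites, policies, and single "queue time" criterion are placeholders).
sites = [
    {"name": "Tier1-lab", "queue_hours": 6.0, "allowed": True},
    {"name": "Tier2-uf",  "queue_hours": 1.5, "allowed": True},
    {"name": "Tier2-xyz", "queue_hours": 0.5, "allowed": False},  # blocked by local policy
]

def schedule(candidates):
    usable = [s for s in candidates if s["allowed"]]      # apply global/local policies
    return min(usable, key=lambda s: s["queue_hours"])    # then optimize (here: time)

print(schedule(sites)["name"])   # -> "Tier2-uf"
```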


Virtual Data in Action

[Figure: data requests flow among major archive facilities, network caches & regional centers, and local sites]

Data request may: compute locally, compute remotely, access local data, access remote data
Scheduling based on: local policies, global policies
Local autonomy


Research Agenda (cont.)
Execution management

Co-allocation of resources (CPU, storage, network transfers)
Fault tolerance, error reporting (see the retry sketch below)
Agents (co-allocation, execution)
Reliable event service across Grid
Interaction, feedback to planning
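A minimal sketch of fault-tolerant remote execution with error reporting; submit_remote() and TransientGridError are hypothetical stand-ins for whatever submission service and failure modes the grid exposes:

```python
# Minimal sketch of fault-tolerant remote execution with error reporting
# (illustrative only; submit_remote() and TransientGridError are hypothetical).
import time

class TransientGridError(Exception):
    """Placeholder for a recoverable failure reported by a remote site."""

def submit_remote(site: str, job: str) -> str:
    """Stand-in for submitting a job to a remote site and returning its result."""
    raise NotImplementedError("replace with an actual submission service")

def run_with_retries(site: str, job: str, attempts: int = 3, backoff_s: float = 5.0) -> str:
    for attempt in range(1, attempts + 1):
        try:
            return submit_remote(site, job)
        except TransientGridError as err:
            print(f"attempt {attempt} failed at {site}: {err}")   # error reporting
            time.sleep(backoff_s * attempt)                       # simple backoff
    raise RuntimeError(f"job {job!r} failed after {attempts} attempts at {site}")
```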

Performance analysis (new)
Instrumentation and measurement of all grid components
Understand and optimize grid performance

Virtual Data Toolkit (VDT)
VDT = virtual data services + virtual data tools
One of the primary deliverables of R&D effort
Ongoing activity + feedback from experiments (5 year plan)
Technology transfer mechanism to other scientific domains


GriPhyN: PetaScale Virtual Data Grids

[Figure: PetaScale Virtual Data Grid architecture. Production teams, individual investigators, and workgroups use interactive user tools; requests flow through virtual data tools, request planning & scheduling tools, and request execution & management tools; resource management, security and policy, and other grid services mediate access to the distributed resources (code, storage, computers, and network), raw data sources, and transforms]


LHC Vision (e.g., CMS Hierarchy)

[Figure: CMS computing hierarchy. The online system sends ~100 MBytes/sec (from a raw detector rate of ~PBytes/sec) to the offline farm / CERN computer center (Tier 0 + 1, > 20 TIPS); Tier 1 national centers (FNAL, France, Italy, UK) connect at ~2.4 Gbits/sec (0.6 - 2.5 Gbits/sec + air freight); Tier 2 regional centers connect at ~622 Mbits/sec; Tier 3 institutes (~0.25 TIPS each, with physics data caches) connect at 100 - 1000 Mbits/sec; Tier 4 is workstations and other portals]

Bunch crossing every 25 nsec
100 triggers per second
Each event is ~1 MByte in size
Physicists work on analysis “channels”
Each institute has ~10 physicists working on one or more channels

GriPhyN: FOCUS on university-based Tier 2 centers
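A quick consistency check of the trigger numbers quoted above (illustrative arithmetic only):

```python
# Quick consistency check of the trigger numbers quoted above (illustrative only).
trigger_rate_hz = 100            # events accepted per second
event_size_mb = 1.0              # ~1 MByte per event
print(trigger_rate_hz * event_size_mb)   # -> 100.0 MBytes/sec, matching the link to the offline farm
```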


SDSS Vision

Three main functions:

Main data processing (FNAL)
Processing of raw data on a grid
Rapid turnaround with multiple-TB data
Accessible storage of all imaging data

Fast science analysis environment (JHU)
Combined data access and analysis of calibrated data
Shared by the whole collaboration
Distributed I/O layer and processing layer
Connected via redundant network paths

Public data access
Provide the SDSS data for the NVO (National Virtual Observatory)
Complex query engine for the public
SDSS data browsing for astronomers, and outreach


LIGO Vision

Principal areas of GriPhyN applicability:

Main data processing (Caltech/CACR)
Enable computationally limited searches (periodic sources)
Access to LIGO deep archive
Access to observatories

Science analysis environment for LSC (LIGO Scientific Collaboration)
Tier 2 centers: shared LSC resource
Exploratory algorithms, astrophysics research with LIGO reduced data sets
Distributed I/O layer and processing layer builds on existing APIs
Data mining of LIGO (event) metadatabases
LIGO data browsing for LSC members, outreach

[Figure: LIGO data grid sites at Hanford, Livingston, Caltech, and MIT connected over Internet2/Abilene; a Tier 1 center serves LSC Tier 2 sites over OC3 - OC48 links]

LIGO I Science Run: 2002 – 2004+
LIGO II Upgrade: 2005 – 20xx (MRE to NSF 10/2000)


GriPhyN Cost Breakdown (May 31)

[Bar chart, costs in $M for FY00 - FY05 and FY06, by category (Hardware, Personnel, Networking, ITR R&D). Approximate FY00 - FY05 totals: Hardware 24.5, Personnel 13.8, Networking 19.2, ITR R&D 12.5; FY06 adds roughly 4 - 5 per category, with none for ITR R&D]


Number of Tier2 Sites vs Time (May 31)

[Chart: projected number of Tier 2 sites per fiscal year (FY00 - FY06) for ATLAS, CMS, LIGO, and SDSS; the vertical axis runs from 0 to 20 sites]


LHC Tier 2 Architecture and Cost

Linux Farm of 128 Nodes (256 CPUs + disk)   $350K
Data Server with RAID Array                 $150K
Tape Library                                $50K
Tape Media and Consumables                  $40K
LAN Switch                                  $60K
Collaborative Tools & Infrastructure        $50K
Installation & Infrastructure               $50K
Net Connect to WAN (Abilene)                $300K
Staff (Ops and System Support)              $200K

Total Estimated Cost (First Year)           $1,250K

Average Yearly Cost (including evolution, upgrade and operations)   $750K

1.5 – 2 FTE support required per Tier2

Assumes 3 year hardware replacement
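A quick check that the first-year line items above sum to the quoted total (amounts in $K, taken directly from the list):

```python
# Quick check that the first-year line items sum to the quoted total
# (amounts in $K, taken directly from the list above).
items = {
    "Linux farm (128 nodes)": 350,
    "Data server with RAID array": 150,
    "Tape library": 50,
    "Tape media and consumables": 40,
    "LAN switch": 60,
    "Collaborative tools & infrastructure": 50,
    "Installation & infrastructure": 50,
    "Net connect to WAN (Abilene)": 300,
    "Staff (ops and system support)": 200,
}
print(sum(items.values()))   # -> 1250, i.e. the $1,250K first-year total
```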


Tier 2 Evolution (LHC Example)

                               2001              2006
Linux Farm:                    5,000 SI95        50,000 SI95
Disks on CPUs:                 4 TB              40 TB
RAID Array:                    2 TB              20 TB
Tape Library:                  4 TB              50 - 100 TB
LAN Speed:                     0.1 - 1 Gbps      10 - 100 Gbps
WAN Speed:                     155 - 622 Mbps    2.5 - 10 Gbps
Collaborative Infrastructure:  MPEG2 VGA         Realtime HDTV
                               (1.5 - 4 Mbps)    (10 - 20 Mbps)

RAID disk used for “higher availability” data

Reflects lower Tier2 component costs due to less demanding usage, e.g. simulation.


Current Grid Developments
EU “DataGrid” initiative

Approved by EU in August (3 years, $9M)
Exploits GriPhyN and related (Globus, PPDG) R&D
Collaboration with GriPhyN (tools, boards, interoperability, some common infrastructure)
http://grid.web.cern.ch/grid/

Rapidly increasing interest in Grids
Nuclear physics
Advanced Photon Source (APS)
Earthquake simulations (http://www.neesgrid.org/)
Biology (genomics, proteomics, brain scans, medicine)
“Virtual” Observatories (NVO, GVO, …)
Simulations of epidemics (Global Virtual Population Lab)

GriPhyN continues to seek funds to implement vision


The management problem


PPDG

[Figure: PPDG participants and connections. Experiment data-management efforts (BaBar, D0, CDF, Nuclear Physics, CMS, ATLAS) and their user communities link to the Globus, Condor, and SRB teams and user groups, and to HENPGC users]


Philosophical Issues
Recognition that IT industry is not standing still

Can we use tools that industry will develop?
Does industry care about what we are doing?
Many funding possibilities

Time scale
5 years is looooong in Internet time
Exponential growth is not uniform; relative costs will change
Cheap networking? (10 GigE standards)


Impact of New Technologies
Network technologies

10 Gigabit Ethernet: 10GigE
DWDM-Wavelength (OC-192)
Optical switching networks
Wireless broadband (from ca. 2003)

Internet information software technologies
Global information “broadcast” architecture
E.g., Multipoint Information Distribution Protocol: [email protected]
Coordinated agent architectures
E.g., Mobile Agent Reactive Spaces (MARS), Cabri et al.

Interactive monitoring and control of Grid resources
By authorized groups and individuals
By autonomous agents

Use of shared resources
E.g., CPU, storage, bandwidth on demand


Grids In 2000
Grids will change the way we do science and engineering

Computation, large-scale data
Distributed communities

The present
Key services, concepts have been identified
Development has started
Transition of services, applications to production use

The future
Sophisticated integrated services and toolsets (Inter- and IntraGrids+) could drive advances in many fields of science & engineering

Major IT challenges remain
An opportunity & obligation for collaboration