US ATLAS Computing Operations. Kaushik De, University of Texas at Arlington. U.S. ATLAS Tier 2/Tier 3 Workshop, UTA, November 10, 2009.


Page 1: US ATLAS Computing Operations

US ATLAS Computing Operations

Kaushik De

University of Texas at Arlington

U.S. ATLAS Tier 2/Tier 3 Workshop, UTA

November 10, 2009

Page 2: US ATLAS Computing Operations

Overview

We expect the LHC to start in a few weeks

ATLAS is ready – after 15 years of preparations

As soon as collisions start, the focus will be on physics

The distributed computing infrastructure must perform
  US facilities are required to provide about one quarter of ATLAS computing (though historically we have often provided one third)
  US primarily responsible for PanDA software used ATLAS-wide
  We have done many readiness exercises during the past couple of years – with excellent success, learning from each exercise
  But the stress on the system will be far greater when data arrives
  We have to adapt quickly to circumstances as they arise


Page 3: US ATLAS Computing Operations

Facilities Organization

See Michael Ernst’s talk for overview

Integration program covered in Rob Gardner’s talk

Operations activity started 1.5 years ago


Page 4: US ATLAS Computing Operations

Operations Checklist

Data production – MC, reprocessing

Data management – storage, distribution

User analysis

All three common areas rely on smooth site operations


Page 5: US ATLAS Computing Operations

US Production - Steady


Page 6: US ATLAS Computing Operations


U.S. Production Shares

US production Q1, 2009

Page 7: US ATLAS Computing Operations

Jobs sent to BNL only


Page 8: US ATLAS Computing Operations


Includes extra jobs sent during first week only to BNL

Page 9: US ATLAS Computing Operations


Job Error Summary From All Sites

Top 10 errors only

Page 10: US ATLAS Computing Operations


BNL errors include large error rate from early jobs (~11% without first week)

Page 11: US ATLAS Computing Operations


Page 12: US ATLAS Computing Operations

ADCoS (ADC Operations Shifts)

ADCoS combined shifts started January 28th, 2008
  Coordinated by K. De and Xavier Espinal (PIC/IFAE)

ADCoS Goals
  World-wide (distributed/remote) shifts
  To monitor all ATLAS distributed computing resources
  To provide Quality of Service (QoS) for all data processing

Organization
  Senior/Trainee: 2 day shifts, Expert: 7 day shifts
  Three shift times (in CERN time zone):
    o ASIA/Pacific: 0h - 8h
    o EU-ME: 8h - 16h
    o Americas: 16h - 24h

U.S. shift team
  In operation long before ADCoS was started
  Yuri Smirnov (captain), Mark Sosebee, Wensheng Deng, Barry Spurlock, Armen Vartapetian, Rupam Das
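The three shift windows can be sketched as a simple lookup from CERN local hour to shift zone. This is only an illustration: the function name `shift_zone` is hypothetical, and treating each window as half-open (8h belongs to EU-ME, not ASIA/Pacific) is an assumption, since the slide does not say which team covers the boundary hour.

```python
# Shift windows from the slide, as [start_hour, end_hour) in CERN local time.
# Half-open boundaries are an assumption, not stated on the slide.
SHIFT_WINDOWS = [
    ("ASIA/Pacific", 0, 8),
    ("EU-ME", 8, 16),
    ("Americas", 16, 24),
]


def shift_zone(cern_hour: int) -> str:
    """Return the ADCoS shift zone covering the given CERN hour (0-23)."""
    if not 0 <= cern_hour < 24:
        raise ValueError("cern_hour must be in 0..23")
    for zone, start, end in SHIFT_WINDOWS:
        if start <= cern_hour < end:
            return zone
    # Unreachable as long as the windows tile the full day.
    raise AssertionError("shift windows do not cover the full day")


if __name__ == "__main__":
    for hour in (3, 8, 15, 16, 23):
        print(hour, shift_zone(hour))
```

The three 8-hour windows tile the 24-hour day with no gaps, which is what allows round-the-clock monitoring from remote shifters.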


Page 13: US ATLAS Computing Operations


Page 14: US ATLAS Computing Operations

Storage Issues

US will have ~10 PB by Q1 2010
  Already have 4-5 PB deployed
  Fast ramp-up needed

Space token management
  Each site must provide 6-10 different storage partitions (tokens)
  This is quite labor intensive – ADC trying to automate
  Need to decide soon about group data placement and policies
  Good management of space tokens is essential to physics analysis


Page 15: US ATLAS Computing Operations

Storage Tokens (~mid Sep09)

Site     HOTDISK  DATADISK          MCDISK    PRODDISK  USERDISK  SCRATCHDISK  GROUPDISK  LOCALGROUPDISK
BNL      5 TB     767/935 TB        953/1078  No        0/16      3/16         4/23       2/8
AGLT2    No       43/110 TB         127/138   17/23     16/18     0/21         1/17       No
MWT2-UC  No       52/100 TB         127/181   27/40     12/30     0/2          2/10       No
NET2     No       Total 173/245 TB  Yes       Yes       Yes       Yes          Yes        No
SLACT2   Yes      Total 203/230 TB  Yes       Yes       Yes       No           Yes        No
SWT2     No       Total 191/229 TB  Yes       Yes       Yes       No           Yes        No
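A minimal sketch of how the table cells could be turned into fill fractions, assuming each "a/b" cell reads as used/total in TB (the slide does not state this explicitly). The helper name `occupancy` is hypothetical; the numbers are copied from the table.

```python
def occupancy(cell: str) -> float:
    """Parse a cell such as '767/935 TB' or 'Total 173/245 TB'.

    Returns used/total, assuming the cell is 'used/total' (an assumption).
    """
    value = cell.replace("Total", "").replace("TB", "").strip()
    used, total = (float(part) for part in value.split("/"))
    return used / total


# DATADISK cells for sites that report per-token numbers, and the
# site-wide totals for sites that only report a total.
cells = {
    "BNL DATADISK": "767/935 TB",
    "AGLT2 DATADISK": "43/110 TB",
    "MWT2-UC DATADISK": "52/100 TB",
    "NET2 total": "Total 173/245 TB",
    "SLACT2 total": "Total 203/230 TB",
    "SWT2 total": "Total 191/229 TB",
}

for name, cell in cells.items():
    print(f"{name}: {occupancy(cell):.0%}")
```

Under that reading, a per-token fill fraction is the kind of number a site would watch when deciding how to rebalance space between tokens.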


Page 16: US ATLAS Computing Operations

File Transfers - Steady


Page 17: US ATLAS Computing Operations

User Analysis

U.S. ATLAS has an excellent track record of supporting users
  US is the most active cloud in ATLAS for user analysis
  Analysis sites have been in continuous and heavy use for >2 years
  We have regularly scaled up resources to match user needs
  UAT09 was very important as a readiness exercise
  Tier 3 issues will be discussed tomorrow


Page 18: US ATLAS Computing Operations

Analysis Usage Growing


Page 19: US ATLAS Computing Operations

Growing Tier 2 Activity


Page 20: US ATLAS Computing Operations

User Analysis Test - UAT09

520 M events, ~75 TB of AODs, generated during summer 2009 in US cloud, SM sample with jet Pt cut

Distributed to all clouds

Intensively analyzed by ~100 users worldwide, during a 3 day period

More details in Nurcan’s talk tomorrow
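As a rough cross-check of the sample size, the two quoted numbers imply an average AOD size per event. The sketch below assumes decimal terabytes (1 TB = 10^12 bytes) and treats the approximate "~75 TB" as exact; the function name `avg_event_size_kb` is hypothetical.

```python
def avg_event_size_kb(total_tb: float, n_events: float) -> float:
    """Average size per event in kB, assuming 1 TB = 1e12 bytes."""
    total_bytes = total_tb * 1e12
    return total_bytes / n_events / 1e3


# Numbers from the slide: ~75 TB of AODs for 520 M events.
size_kb = avg_event_size_kb(total_tb=75, n_events=520e6)
print(f"average AOD size: about {size_kb:.0f} kB per event")
```

With these inputs the average comes out at roughly 144 kB per event.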


Page 21: US ATLAS Computing Operations

UAT09 – Pathena Jobs Worldwide


Page 22: US ATLAS Computing Operations

UAT09 – US Sites


Page 23: US ATLAS Computing Operations


Page 24: US ATLAS Computing Operations


More Jobs

Page 25: US ATLAS Computing Operations


Page 26: US ATLAS Computing Operations


More Jobs

Page 27: US ATLAS Computing Operations

Distributed Analysis Shift Team – DAST

User analysis support has been provided by AtlasDAST (Atlas Distributed Analysis Shift Team) since September 29, 2008. Previously, user support was provided on a best effort basis by the Panda and Ganga software developers.

Nurcan Ozturk (UTA) and Daniel van der Ster (CERN) are coordinating this effort.

DAST currently organizes shifts in two time zones – US and CERN. One person from each zone is on shift for 7 hours a day, 5 days a week, together covering 9am-11pm CERN time.

Please contact Nurcan to join this effort


Page 28: US ATLAS Computing Operations

Conclusion

Waiting for collisions!
