Canada’s national laboratory for particle and nuclear physics and accelerator-based science Canadian Tier-1 Status and Evolution Reda Tafirout TRIUMF GDB meeting, CERN, May 10 2017


Canada’s national laboratory for particle and nuclear physics and accelerator-based science

Canadian Tier-1 Status and Evolution

Reda Tafirout, TRIUMF

GDB meeting, CERN, May 10 2017


Outline

● Canadian WLCG Scene

– The bigger picture & organization

● Tier-1 centre brief overview

– Historical context & funding history

● Current status and plans

– Computing needs and funding opportunities

– Tier-1 relocation planning & transition activities

● Future outlook

– Remaining activities for 2017 & timeline


Canadian WLCG Scene

● Tier-1: dedicated facility located at TRIUMF

– Managed and operated by ATLAS-Canada, Simon Fraser U. and TRIUMF

● Tier-2's: shared facilities located at Compute Canada centres

– National organization serving all research communities

– Management structure and operations are more complex

– Roughly every year, ATLAS-Canada submits a proposal to the National Resource Allocation Committee (NRAC) to secure resources.

– Two WLCG federations across 4 sites (was 5 prior to 2013).

● Same funding mechanism: Canada Foundation for Innovation (CFI) & provincial partners for matching

– Tier-1: very successful in securing own funding since 2006

– Compute Canada is refreshing all of its infrastructure and aging equipment (going from ~27 to 4-6 larger centres)

– CFI would like the Tier-1 to be integrated within Compute Canada, to minimize infrastructure and operating costs.


Canadian Tier-1 centre

● TRIUMF Tier-1 is well established with an excellent track record in several key areas:

– Availability & Reliability

– Scalability & Performance

– Customer service attitude & user support

– Provision of resources and high level services (under WLCG MOU)

– ~10 years of stable 24x7 operations

● Serving ATLAS VO only

– Providing 10% of Tier-1 resources

● Dedicated facility and personnel

● High visibility project for Canada and TRIUMF

● MOU signed by TRIUMF in 2006 once initial CFI funding secured

[Figure label: ~99.7%]


Funding History & Current Status

● Important component of the current TRIUMF five-year plan funding cycle and prior proposals (2005-2010, 2010-2015)

● Successful and critical funding secured throughout the years:

– Significant prototyping since early 2002 helped secure funding

– 2006: CFI Exceptional Opportunities Fund

● $8M CFI + $2.5M IOF, $4M BCKDF (provincial match)

● In-kind: significant vendor discounts, TRIUMF, CANARIE, BCNET

– 2011: $3.3M from CFI for operating (through March '15)

– 2012: CFI LEF $2.5M project cost (40% CFI, 40% BCKDF) + in-kind.

● All CFI proposals led by Simon Fraser University (SFU) on behalf of ATLAS-Canada universities consortium

● Strategic procurement and expansions in line with the ATLAS physics program

● Hardware resources: 4830 cores, 7.8 PB disk, 12 PB tape, 85 servers

● Human resources: 9 FTEs

● TRIUMF presently covering full operations costs (since 2015)


TRIUMF Tier-1 Physical Infrastructure

● Current physical layout as deployed at TRIUMF

● Usage for power & cooling (relative to capacity):

– Power: ~75% of UPS capacity (225 kVA, dual feeds); ~45% of regular power capacity (112 kVA transformer)

– Cooling: ~70% of total capacity (320 kW design)

[Figure: physical layout at TRIUMF, with labels 2007+2008, 2009+2010, 2011+2012 and 2014]


TRIUMF Tier-1 Network Topology

● Current network topology as deployed at TRIUMF


TRIUMF Tier-1 Cluster Diagram

● Current cluster architecture as deployed at TRIUMF


Resources Needs & New Funding

● Computing needs are increasing substantially due to excellent LHC performance in 2016, which is expected to continue for 2017 & 2018

● ATLAS requests are reviewed yearly by the WLCG Computing Resources Scrutiny Group (C-RSG) and approved by CERN Resources Review Board.

TRIUMF Tier-1 Required Resources

               2017     2018     2019     2020
CPU (cores)   7,236    7,456    8,948   10,737
Disk (PB)       7.5      8.1      9.4     10.8
Tape (PB)      20.7     23.2     26.7     30.7
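The required-resources figures above imply a growth rate that is easy to make explicit. The following is a minimal sketch, not part of the original slides: the numbers are copied from the table, while the function names `yearly_growth` and `total_growth` are illustrative choices.

```python
# Required TRIUMF Tier-1 resources, copied from the table above.
required = {
    "CPU (cores)": {2017: 7236, 2018: 7456, 2019: 8948, 2020: 10737},
    "Disk (PB)": {2017: 7.5, 2018: 8.1, 2019: 9.4, 2020: 10.8},
    "Tape (PB)": {2017: 20.7, 2018: 23.2, 2019: 26.7, 2020: 30.7},
}


def yearly_growth(series):
    """Return {year: percent increase over the previous year}."""
    years = sorted(series)
    return {
        year: 100.0 * (series[year] - series[prev]) / series[prev]
        for prev, year in zip(years, years[1:])
    }


def total_growth(series):
    """Return the percent increase from the first to the last year."""
    years = sorted(series)
    first, last = series[years[0]], series[years[-1]]
    return 100.0 * (last - first) / first


if __name__ == "__main__":
    for name, series in required.items():
        steps = ", ".join(
            f"{year}: {pct:+.1f}%" for year, pct in yearly_growth(series).items()
        )
        print(f"{name}: {steps}; 2017 to 2020: {total_growth(series):+.1f}%")
```

Under these figures, each resource type grows by a little under half between 2017 and 2020.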

● New CFI cyber-infrastructure funding competition announced in late 2014:

– Capital for equipment: only Compute Canada can apply (shared resources)

● Several discussions followed between TRIUMF, SFU, CFI, Compute Canada and ATLAS-Canada (during 2015 & 2016):

– Decision was made to integrate new Tier-1 resources into the new Compute Canada centre at SFU and leverage its infrastructure; this was also a CFI condition

– CFI proposal submitted for the Innovation Fund in October 2016, led by SFU on behalf of ATLAS-Canada (decision expected in June)


Tier-1 relocation plans

● The great majority of TRIUMF Tier-1 equipment reaches 5 years of age in 2017:

– Warranties & support contracts extended until early 2018 (CFI LEF funds)

– Hardware refresh required by then

● TRIUMF infrastructure is aging (~10 years) and floor space limited

● New data centre at Simon Fraser University:

– 2 x 0.5 MW UPS capacity, backed up by generator (HA)

– Large floor space (new building recently renovated)

– Allows for future expansion (10 MW power)

● For April 2017, need additional Tier-1 capacity as per MoU commitments

– However, limited capital available (remaining CFI LEF)

– Borrowing equipment from Compute Canada for additional tape capacity

– Leveraging the Compute Canada procurement process whenever possible

● Goal is to minimize Tier-1 downtime during the transition


Role of TRIUMF unchanged

● TRIUMF Tier-1 personnel are still responsible for operations at SFU, with control and line management structure unchanged. Activities will be coordinated by the ATLAS Tier-1 Group Leader at TRIUMF

● New data centre infrastructure aspects are the responsibility of SFU

● Drafts of MoU & SLA exist and will be finalized in the coming months


TRIUMF & SFU Locations

● Distance between TRIUMF and SFU: ~28 km with ~1 ms RTT

● New location: SFU_WTB (Simon Fraser University Water Tower Building)

[Map: TRIUMF, Simon Fraser U., and BCNET TX (CANARIE)]
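As a rough sanity check on the quoted ~28 km and ~1 ms RTT, the propagation delay over fibre can be estimated as below. This is a sketch under stated assumptions, not a measurement: the refractive index of 1.47 is an assumed typical value for silica fibre, and the slide does not say whether 28 km is the geographic distance or the fibre path length.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBRE_REFRACTIVE_INDEX = 1.47      # assumption: typical silica fibre


def fibre_rtt_ms(path_km, refractive_index=FIBRE_REFRACTIVE_INDEX):
    """Lower bound on round-trip time (ms) from propagation delay alone."""
    speed_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return 2.0 * path_km / speed_km_s * 1000.0


if __name__ == "__main__":
    print(f"Propagation-only RTT over 28 km: {fibre_rtt_ms(28):.2f} ms")
```

The propagation-only bound comes out well under 1 ms, so the quoted ~1 ms is plausible once a longer real fibre route and switching and routing delays are included.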


New Tier-1 deployment plans

● Implement a distributed Tier-1 centre during the transition phase

– Tier-1 resources at TRIUMF and SFU locations seen as one from ATLAS

● Distributed dCache (similar to NDGF); other services should be OK

● Phase 0: pre-production of initial services and testing (Q1 '17)

– Install necessary equipment (core switch, HSM servers, admin nodes, SAN)

– Network configuration (new address space, LHCOPN, DNS, etc.)

– New tape library commissioning and distributed Tier-1 testing

● Phase 1: production at smaller scale & capacity (Q2 '17)

– Production with additional tape capacity and related services

– Install additional disk and CPU capacity (for 2017 WLCG pledges)

● Phase 2: production at larger scale & capacity (Q4 '17 – Q1 '18)

– Hardware refresh and expansion (with new CFI funding)

– Data migration from TRIUMF site to SFU site


Phase 0 related work

● Intense activities during Q1 of 2017

– Physical installation, configuration, commissioning and testing

● Collaborative effort between TRIUMF, BCNET, CANARIE, SFU

– Network fully implemented with necessary topology (spare core switch from TRIUMF)

● All necessary equipment in place

– Using existing CFI funds for HSM servers, admin nodes, SAN for tape buffer

– Tape library borrowed from Compute Canada (drives and cartridges with logical partitioning)

● New cluster ready for production

[Figure: network diagram. Labels: Tier-1 SFU/GP2; Tier-1 TRIUMF; BCNET LHCONE VRF; BCNET LHCOPN VRF; BCNET R&E VRF; Canarie; TRIUMF ASN 36391; 10G and 20G links; GRE IP-IP VPN; to T0, T1, T2; Q4 2017: 2x100G to T0, T1, T2]


DNS & IP address space

● TRIUMF delegated DNS for the lcg.triumf.ca and t1dev.triumf.ca sub-domains to the Tier-1

– June 2014:

● IPv4: 206.12.1.0/24 and 142.90.144.0/23 at TRIUMF

– Jan 2017:

● IPv4: 206.12.9.128/25, 206.12.9.112/28 at SFU_WTB

– Mar 2017:

● IPv6: 2607:f8f0:660:1::/64 at TRIUMF

● IPv6: 2607:f8f0:660:3::/64 at SFU_WTB

● Hidden DNS master model ansible-ized and put into production in Feb 2017

– 2 public slaves, 3rd one soon at SFU_WTB site

– 4 private slaves (which are also TRIUMF slaves), 2 at TRIUMF and 2 at SFU_WTB site

– # ansible-playbook dns/update/public.yml

– # ansible-playbook dns/push/public.yml -e "target='dns_slaves'"
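The address blocks listed on this slide can be checked and queried with Python's standard `ipaddress` module. The following is a minimal sketch: the prefixes are copied from the slide, while the helper names `site_of` and `ipv4_address_count` and the sample addresses are hypothetical and used only for illustration.

```python
import ipaddress

# Prefixes as listed on the slide, grouped by site.
TIER1_NETWORKS = {
    "TRIUMF": ["206.12.1.0/24", "142.90.144.0/23", "2607:f8f0:660:1::/64"],
    "SFU_WTB": ["206.12.9.128/25", "206.12.9.112/28", "2607:f8f0:660:3::/64"],
}


def site_of(address, networks=TIER1_NETWORKS):
    """Return the site whose prefixes contain the address, or None."""
    ip = ipaddress.ip_address(address)
    for site, prefixes in networks.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            if ip.version == net.version and ip in net:
                return site
    return None


def ipv4_address_count(site, networks=TIER1_NETWORKS):
    """Total number of IPv4 addresses in a site's prefixes."""
    nets = (ipaddress.ip_network(p) for p in networks[site])
    return sum(n.num_addresses for n in nets if n.version == 4)


if __name__ == "__main__":
    # Hypothetical sample addresses, not real hosts.
    for sample in ("206.12.1.10", "206.12.9.120", "2607:f8f0:660:3::1"):
        print(sample, "->", site_of(sample))
    for site in TIER1_NETWORKS:
        print(site, "IPv4 addresses:", ipv4_address_count(site))
```

`ipaddress.ip_network` raises `ValueError` if a prefix has host bits set, so simply loading the table also validates the notation.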


DNS Master / Slaves


IPv6 Status (Network)

● Almost fully implemented: two aspects remaining

– For the TRIUMF site: work required from core computing services to implement IPv6 on the R&E network; needs further coordination

– For the SFU_WTB site: IPv6 on the commercial/commodity network not implemented


IPv6 Status (WLCG dual stack services)

● Focus has been on storage services (initial Tier-1 WLCG requirement/expectation)

● Implemented in the Middleware Readiness (MW) dCache instance for the moment (pre-production phase)

– IPv4/6 dual stack setup

– MW dCache instance: 1 head node, 1 pool node, SL7, Java 8, 1 Gb network interface

– Straightforward, normal dCache setup procedure:

● dCache listens on any protocol; configure the IPv6 protocol in the Pool Manager and remove the hostname definition from /etc/hosts

– iptables: open only necessary ports to the outside; internal storage nodes are trusted.

– ip6tables: open only necessary ports to any node.

– IPv6 is now the primary protocol for data transfer in MW readiness

● Completed and tested very recently
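One simple way to confirm that a service name is published as dual stack is to check that it resolves to both IPv4 and IPv6 addresses. The sketch below uses only Python's standard `socket` module; it is an illustration and not the procedure used at the Tier-1, and the helper names `families` and `is_dual_stack` are hypothetical.

```python
import socket

FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def families(addrinfo):
    """Return the set of IP families present in getaddrinfo-style results."""
    return {FAMILY_NAMES[entry[0]] for entry in addrinfo if entry[0] in FAMILY_NAMES}


def is_dual_stack(hostname, port=None):
    """True if the name resolves to at least one IPv4 and one IPv6 address."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return families(infos) == {"IPv4", "IPv6"}


if __name__ == "__main__":
    # "localhost" is used only so the example runs without external DNS.
    print("localhost dual stack:", is_dual_stack("localhost"))
```

A name resolving in both families shows only that the DNS records exist; whether transfers actually prefer IPv6 depends on the service and client configuration.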


Remaining Activities for 2017

● Phase 1: full production with additional capacity at SFU

– Tape: finalize monitoring aspects for 24x7 operations (now); tape infrastructure borrowed from Compute Canada

– Disk & CPU: procurement process ongoing (expect delivery in June), using remaining CFI funds

– Network: VPN between the two sites for WN access

● Phase 2: large-scale procurement and Tier-1 hardware refresh

– Exact timeline highly dependent on CFI funding decision (June), application for matching funds, lifting of conditions and capital readiness for spending (no later than Q4 '17)

● Finalize MOU & SLA between TRIUMF, SFU and Compute Canada

● Tier-1 personnel: the number of staff who need to spend a significant amount of time at the SFU_WTB location, and their exact time fraction, are TBD. A new operations model is being developed (to be finalized in early 2018)


Canada’s national laboratory for particle and nuclear physics and accelerator-based science

TRIUMF: Alberta | British Columbia | Calgary | Carleton | Guelph | Manitoba | McGill | McMaster | Montréal | Northern British Columbia | Queen’s | Regina | Saint Mary’s | Simon Fraser | Toronto | Victoria | Western | Winnipeg | York

Thank you! Merci!

Follow us at TRIUMFLab