
Page 1

Apprenticeship: 60 years of computing experience / Ben Segal, October 2019 1

Apprenticeship: 60 years of computing experience

Ben Segal / CERN, Geneva / b.segal@cern.ch / www.cern.ch/ben

CERN Computing Colloquia
October 3rd and 11th, 2019

Page 2

Second talk (CERN from 1989 to 2018):

• No more mainframes at CERN
• Launching Grid computing
• Volunteer computing and virtualisation
• Reflections of a retiree
• Conclusions


Page 3


1989-93 CN-SW-DC Section

=> Around 1989 LR stepped down as SW Group Leader and created the SW-DC Section to concentrate on Distributed Computing:

=> SW-DC Section Leader: Les Robertson (with an initial team of two: F. Hemmer and myself)

Our activities soon led to the:

SHIFT Project… and by 1993 we had become the largest Group in the Computing and Networking (CN) Division:

PDP (Physics Data Processing) Group

Page 4


1989-91 SHIFT Project begins

IDEA: To provide CERN mainframe services on networked clusters of RISC and UNIX-based nodes:

“ Scalable Heterogeneous Integrated FaciliTy ”

• Initial prototype “HOPE”: a single Apollo DN10000
• Used Cray to stage tape data
• Connected to accounting system: 25% of CPU for CERN!

• For the real system, used: Disk, Tape and CPU servers

Page 5

SHIFT Architecture

Mainframe Services from Gigabit-Networked Workstations (Baud et al.)

…Packard and OPAL, a large physics collaboration based at CERN.

The current configuration for the centrally operated RISC-based workstation batch services is given in Table 2, and also indicated in Figure 1.

Project Goals

The goal was to develop an architecture which could be used for general purpose scientific computing, could be implemented to provide systems with excellent price/performance when compared with mainframe solutions, and could be scaled up to provide very large integrated facilities, or down to provide a system suitable for small university departments. The resulting systems should present a familiar and unified system image to their users, including access to many Gigabytes of disk data and to Terabytes of tape data: this is what we imply by the word integrated.

The goals of the SHIFT development were as follows.

• Provide an INTEGRATED system of CPU, disk and tape servers capable of supporting a large-scale general-purpose batch service

• Construct the system from heterogeneous components conforming to OPEN standards to retain flexibility towards new technology and products

• The system must be SCALABLE, both to small sizes for individual collaborations/small institutes, and upwards to at least twice the current size of the CERN computer centre

• The batch service quality should be at least as good as mainframe batch quality, operate in a distributed environment, and have a unified priority scheduling scheme

• Provide automatic control of disk file space, integrated with a tape staging service

• Provide support for IBM 3480-compatible cartridge tapes, Exabyte 8mm tapes, and other developing tape technologies, with access to CERN's automatic cartridge-mounting robots

• System operation and accounting to be integrated into the CERN central computer services

• The architecture should also be capable of supporting interactive scientific applications

SHIFT Architecture and Development

The SHIFT system has been outlined in earlier papers [1,2,3]. A prime goal of the SHIFT project was to build facilities which could scale in capacity from relatively small systems up to several times that of the combined power of the CERN central mainframes. To achieve this, an architecture was chosen which encouraged separation of functionality. This allowed modular extensibility, flexibility, and optimization of each component for its specific function. Figure 2 shows this schematic architecture.

The principal elements of SHIFT are logically divided into CPU servers, disk servers and tape servers, with distributed software which is…
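This separation of function can be sketched in miniature. Everything below (class names, the staging flow) is an illustrative model, not the actual SHIFT software:

```python
# Minimal model of SHIFT's separation of function: a tape server stages
# a file onto a disk server's pool, and a CPU server's batch job then
# reads it from disk. Names and flow are illustrative only.

class TapeServer:
    def __init__(self, tapes):
        self.tapes = tapes  # tape label -> file bytes

    def stage(self, label, disk):
        disk.files[label] = self.tapes[label]  # copy tape data to the disk pool

class DiskServer:
    def __init__(self):
        self.files = {}  # staged files, keyed by label

class CpuServer:
    @staticmethod
    def run_job(disk, label):
        data = disk.files[label]  # batch job reads its input from disk
        return len(data)          # stand-in for real event processing

tape = TapeServer({"RUN001": b"event data"})
disk = DiskServer()
tape.stage("RUN001", disk)
print(CpuServer.run_job(disk, "RUN001"))  # -> 10
```

Because each role is a separate component, any one of them can be scaled or replaced independently, which is the modular extensibility the paper describes.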

[Figure 2: SHIFT Architecture, showing CPU servers and disk/tape servers connected by the network backplane]

Summer '92 USENIX, June 8-June 12, 1992, San Antonio, TX


Page 6


The SHIFT Backplane (1)
(This was my responsibility)

Very high performance network backplane needed

Calculations / simulations showed requirements for a 100 CERN unit system (1/2 Computer Centre):

- 6 MBytes/s sustained / 15 MBytes/s peak
- Peak server interface speed: 3-5 MBytes/s
- Big problem was network CPU consumption!
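A quick back-of-the-envelope check of these figures (the numbers come from the slide; the "concurrent streams" conclusion is simple division, not a claim about the actual 1991 design):

```python
# With per-server interfaces of 3-5 MB/s, meeting the aggregate targets
# requires several servers streaming concurrently over the backplane.
import math

sustained_mb_s = 6.0             # aggregate sustained requirement
peak_mb_s = 15.0                 # aggregate peak requirement
iface_min, iface_max = 3.0, 5.0  # per-server interface speed (MB/s)

def streams_needed(aggregate, per_stream):
    """Minimum number of concurrent streams to reach an aggregate rate."""
    return math.ceil(aggregate / per_stream)

print(streams_needed(sustained_mb_s, iface_max))  # -> 2 (best case)
print(streams_needed(peak_mb_s, iface_min))       # -> 5 (worst case)
```

So even the sustained target could not be met by a single server interface, which is why backplane bandwidth and, above all, per-byte network CPU cost dominated the design.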

Page 7


The SHIFT Backplane (2)

=> Found and purchased UltraNet :

• Solved CPU consumption problem for streaming I/O

• Could use a reasonable number of powerful servers

• Took DL to visit the UltraNet company to approve it

Page 8


The SHIFT Backplane (3)

As SHIFT grew, we developed a hybrid backplane (using multi-homing):

- UltraNet
- HiPPI: 800 Mb/s
- FDDI: 100 Mb/s (later Fast Ethernet)
- Ethernet: 10 Mb/s

• Final iteration used simply Gigabit Ethernet
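Multi-homing here means each server had interfaces on several of these networks at once, so a transfer could use the fastest network both endpoints shared. A minimal sketch of that selection idea (names, speeds as nominal figures, and the policy are all illustrative, not the real SHIFT I/O layer):

```python
# Hypothetical sketch of multi-homed interface selection: a host with
# several network interfaces picks the fastest network it shares with
# the peer. Illustrative only, not the actual SHIFT code.

NETWORKS = {  # network name -> nominal speed in Mbit/s
    "ultranet": 1000,
    "hippi": 800,
    "fddi": 100,
    "ethernet": 10,
}

def pick_interface(local_nets, peer_nets):
    """Choose the fastest network both hosts are homed on."""
    common = set(local_nets) & set(peer_nets)
    if not common:
        raise RuntimeError("no common network with peer")
    return max(common, key=lambda n: NETWORKS[n])

# A disk server on all four networks talking to a server on FDDI + Ethernet:
print(pick_interface(["ultranet", "hippi", "fddi", "ethernet"],
                     ["fddi", "ethernet"]))  # -> fddi
```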

Page 9

The CORE System in 1992


…outside the vault but in active use. A robot with a capacity for 18,000 3480 cartridges handles approximately 20% of the mount requests. Round-the-clock manual mounts are the responsibility of operations staff.

Into this environment, a batch project based on RISC workstations was initiated two years ago. Beginning with a single APOLLO DN10040, the project has grown substantially and now forms an operational service which exceeds the total deliverable CPU capacity of the central mainframes. The service is collectively known as the Centrally Operated RISC Environment or CORE, and has three components: SHIFT, CSF and HOPE.

SHIFT  The SHIFT system forms the subject of the present paper. It is a general purpose facility for jobs with a broad range of I/O requirements and which require access to many Gigabytes of online data. SHIFT workstations are networked via both Ethernet and UltraNet. The SHIFT CPU and disk servers are currently SGI Power Series 340 workstations and the tape servers are SUN 4/330s.

CSF  The Central Simulation Facility or CSF is a platform for CPU-intensive work with low I/O requirements. The service runs on 16 HP 9000/720 machines which are networked via Ethernet and which have full access to the SHIFT tape service. To the end user, CSF systems are seen as a single batch facility.

HOPE  The HOPE service is an earlier system based on 3 APOLLO DN10040 machines. It is for CPU-intensive, low I/O work and it will be phased out during the course of 1992 as HOPE workload is taken over by CSF. HOPE is a joint project between Hewlett-…

Service | CPU (CU) | Disk (GB) | 3480 Tapes          | 8mm Tapes
SHIFT   | 100      | 150       | 6 manual, 2 robotic | 2 manual
HOPE    | 50       | 10        | -                   | -
CSF     | 150      | 10        | -                   | -

Table 2: CERN - Central RISC Services

[Figure 1: CERN - Centrally Operated RISC Environment; diagram showing the SHIFT disk, tape and robot servers, the Analysis Facility, and the CSF Simulation Facilities (16 HP 9000/720s)]


Page 10

SHIFT Developers
1992 USENIX paper authors (only 12)


Page 11


1991-99 SHIFT Production

• First OPAL production system used SGI Power Series

• Later added Unix nodes by HP, IBM, DEC, Sun

=> All four LEP collaborations adopted SHIFT

• Final iteration used commodity PC’s and Linux

=> Mainframes were all replaced by 1997

Page 12

Les Robertson in June 2001 accepting the Computerworld Honors Award


Page 13

The SHIFT Team in 2001 with the Computerworld Honors Award


Page 14

The Grid Idea, 1999


The Grid: Blueprint for a New Computing Infrastructure
Ian Foster, Carl Kesselman

Morgan Kaufmann Publishers, 1999 - 677 pages

The grid promises to fundamentally change the way we think about and use computing. This infrastructure will connect multiple regional and national computational grids, creating a universal source of pervasive and dependable computing power that supports dramatically new classes of applications…

Page 15

The Grid at CERN

• “After the Web, the Grid”…
• … and why this is nonsense …
• … it should have been …
• “After SHIFT, the Grid”…

• The Hype factor – Foster et al. …
• Overall was good for CERN (and IT)


Page 16

The Grid at CERN
(my recollections)

• Globus story
  • pragmatic choice, but an inverted pyramid

• EDG
  • EU structure: 22 partners, coordination …?

• “Development” vs “production” pressure
  • WP2 – Grid Data Management work package
  • I kept the development line …


Page 17


Middleware: WP 1 - WP 3: wide area

Workload Management WP 1
• …

Data management WP 2
• Manage and share PetaByte-scale information volumes in high-throughput production-quality grid environments.
• Replication/caching; Metadata mgmt.; Authentication; Query optimization.
• High speed WAN data access; interface to Mass Storage Mgmt. systems.

Application monitoring WP 3
• …
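The WP2 replication idea above (one logical file, several physical copies, pick a replica near the job) can be illustrated with a toy lookup. All names, paths and the selection policy here are hypothetical, not EDG's actual catalogue API:

```python
# Toy illustration of grid replica management: a logical file name (LFN)
# maps to several physical replicas at different sites, and a job prefers
# the replica at its own site. Hypothetical data and policy, not EDG code.

REPLICA_CATALOGUE = {
    "lfn:/lep/opal/run1234.dst": [
        ("cern.ch", "/castor/opal/run1234.dst"),
        ("ral.ac.uk", "/store/opal/run1234.dst"),
    ],
}

def best_replica(lfn, site):
    """Prefer a replica at the local site, else fall back to the first listed."""
    replicas = REPLICA_CATALOGUE[lfn]
    for host, path in replicas:
        if host == site:
            return host, path
    return replicas[0]

# A job running at RAL reads its local copy instead of pulling from CERN:
print(best_replica("lfn:/lep/opal/run1234.dst", "ral.ac.uk"))
# -> ('ral.ac.uk', '/store/opal/run1234.dst')
```

The real work packages layered authentication, caching, metadata and mass-storage interfaces on top of exactly this kind of mapping.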

Page 18

The Grid Worldwide

• WLCG to the rescue
  • Les Robertson again… pragmatism!

• LHC starting in 2005? 2008? 2010 …
  • … but the Grid was ready

• CERN-IT on the map at last
  • … and experiments now get computing budgets


Page 19

Retirement, 2002

• Becoming an “Honorary Member of the Personnel”:
  • Invited by DL but said no…
  • At my farewell drink, given another chance…

• … and I have been around since!

• It’s a privilege and a challenge:
  • (To do enough but not too much…)


Page 20

BOINC at CERN

• Co-started (with F. Grey) the BOINC project “LHC@home”

• CERN’s 50th Anniversary 2004 - computing Challenge?
• Telephoned SETI@home (David Anderson) => BOINC

• Simple BOINC for Sixtrack (beam stability simulations)
  • FORTRAN only, using BOINC library
  • 2 Masters students for 6 months
  • Running from 2004 to today: over 200,000 volunteers…
  • Windows, Linux, Mac supported


Page 21

BOINC at CERN

• 2005: asked by DL to run “Real Physics”…
• MUCH HARDER: needed a full Linux environment …
  … but most volunteers were using Windows …
  => Virtualisation chosen
• 2006-2007 showed feasibility but image too big (n x GB’s)
  => CernVM launched by Predrag Buncic in 2008
• 2008-2010: BOINC-VM with PH-SFT and many students …
• 2011 production for Theory Dept (5 trillion events, still running)
• LHC experiments joined later (ATLAS, CMS, LHCb)
• CPU worth many millions of CHF has been volunteered


Page 22

Citizen Cyberscience
(Outside CERN, with F. Grey et al.)

• Began 2005 with Malaria.net
  • Collaboration with UniGE and Swiss Tropical Institute

… then AIMS in S. Africa => Africa@home

• Continued in Taiwan, Beijing => Asia@home

• 2009: created CCC with CERN, UniGE, UNOSat
  => Citizen Cyberlab: CERN, UniGE, UNITAR


Page 23

Profit from hindsight!
(recalling some IT technology choices at CERN):

• No Intel CPU’s (only Motorola) - for much of 1980’s
• No IBM PC’s (so no Microsoft) - until late 1980’s
• No C programming - until mid 1980’s
• No TCP/IP outside CERN - until late 1988
• No UNIX - until mid 1980’s
• No LINUX - until mid 1990’s
• No NeXT machines (except 1 or 2 ...) - ever
• No SGI machines - until early 1990’s
• No cisco routers (except 2 ...) - until early 1990’s
• No VMs on the LHC Grid - until early 2010’s


Page 24

Some Conclusions

• Find a good boss!!
• Value your mentors – and be a mentor too
• Beware of the “SHEEP EFFECT”
• Watch out for “career-based” people

• Enjoy your work – if it’s not fun, complain!
