  • AMS Computing and Ground Centers (AMS TIM, CERN, Jul 23, 2004)

    Alexei Klimentov [email protected]


  • AMS Computing and Ground Data Centers
    - AMS-02 Ground Centers
    - AMS centers at JSC
    - Ground data transfer
    - Science Operation Center prototype
    - Hardware and Software evaluation
    - Implementation plan
    - AMS/CERN computing and manpower issues
    MC Production Status
    - AMS-02 MC (2004A)
    - Open questions: plans for Y2005, AMS-01 MC


  • AMS-02 Ground Support Systems
    - Payload Operations Control Center (POCC) at CERN (first 2-3 months in Houston), CERN Bldg.892 wing A:
      control room and usual source of commands; receives Health & Status (H&S), monitoring and science data in real time;
      receives NASA video; voice communication with NASA flight operations
    - Backup Control Station at JSC (TBD)
    - Monitor Station in MIT: backup of the control room; receives Health & Status (H&S) and monitoring data in real time;
      voice communication with NASA flight operations
    - Science Operations Center (SOC) at CERN (first 2-3 months in Houston), CERN Bldg.892 wing A:
      receives the complete copy of ALL data; data processing and science analysis;
      data archiving and distribution to Universities and Laboratories
    - Ground Support Computers (GSC) at Marshall Space Flight Center:
      receives data from NASA -> buffer -> retransmit to the Science Center (see the sketch below)
    - Regional Centers (Madrid, MIT, Yale, Bologna, Milan, Aachen, Karlsruhe, Lyon, Taipei, Nanjing, Shanghai, ...):
      analysis facilities to support geographically close Universities
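    The GSC role described above is essentially store-and-forward: accept the telemetry stream coming from NASA, buffer it on local disk, and retransmit it to the Science Center. The sketch below illustrates only that pattern; the host name, ports and framing are placeholders, not the actual AMS ground-segment interfaces.

```python
# Minimal store-and-forward sketch (illustrative only): accept a telemetry
# stream, buffer each chunk to local disk, then retransmit it downstream.
# Host names, ports and the framing are placeholders, not AMS interfaces.
import socket
import pathlib

BUFFER_DIR = pathlib.Path("gsc_buffer")   # local disk buffer at the GSC
CHUNK_SIZE = 65536                        # arbitrary read size for the sketch

def relay(listen_port: int, soc_host: str, soc_port: int) -> None:
    BUFFER_DIR.mkdir(exist_ok=True)
    with socket.create_server(("", listen_port)) as server:
        upstream, _ = server.accept()                 # stream from the NASA side
        downstream = socket.create_connection((soc_host, soc_port))
        with upstream, downstream, open(BUFFER_DIR / "telemetry.dat", "ab") as buf:
            while True:
                chunk = upstream.recv(CHUNK_SIZE)
                if not chunk:                         # upstream closed
                    break
                buf.write(chunk)                      # 1) buffer to disk
                buf.flush()
                downstream.sendall(chunk)             # 2) retransmit to the SOC

if __name__ == "__main__":
    # Placeholder endpoints; the real GSC/SOC addresses are not given here.
    relay(listen_port=9000, soc_host="soc.example.org", soc_port=9001)
```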


  • AMS facilities / NASA facilities (diagram)


  • AMS Ground Centers at JSC
    - Requirements to AMS Ground Systems at JSC
    - Define AMS GS HW and SW components
    - Computing facilities: ACOP flight, AMS pre-flight, AMS flight, after 3 months
    - Data storage
    - Data transmission
    Discussed with NASA in Feb 2004: http://ams.cern.ch/Computing/pocc_JSC.pdf


  • AMS-02 Computing facilities at JSC

    Center        | Location         | Function(s)                                          | Computers (qty)
    POCC          | Bldg.30, Rm 212  | Commanding, telemetry monitoring, on-line processing | Pentium MS Win (4), Pentium Linux (28), 19" monitors (19), networking switches (8), terminal console (2), MCC WS (2)
    SOC           | Bldg.30, Rm 3301 | Data processing, data analysis, data/Web/News servers, data archiving | Pentium Linux (35), IBM LTO tape drives (2), networking switches (10), 17" color monitors (5), terminal console (2)
    Terminal room | tbd              |                                                      | Notebooks, desktops (100)
    AMS CSR       | Bldg.30M, Rm 236 | Monitoring                                           | Pentium Linux (2), 19" color monitor (2), MCC WS (1)


  • AMS Computing at JSC (TBD)
    LR = launch-ready date (Sep 2007); L = AMS-02 launch date

    Year             | Responsible                                                        | Actions
    LR-8 months      | N.Bornas, P.Dennett, A.Klimentov, A.Lebedev, B.Robichaux, G.Carosi | Set up at JSC the basic version of the POCC; conduct tests with ACOP for commanding and data transmission
    LR-6 months      | P.Dennett, A.Eline, P.Fisher, A.Klimentov, A.Lebedev, Finns (?)    | Set up the POCC basic version at CERN; set up the AMS monitoring station in MIT; conduct tests with ACOP/MSFC/JSC commanding and data transmission
    LR               | A.Klimentov, B.Robichaux                                           | Set up the POCC flight configuration at JSC
    LR to L+2 weeks  | V.Choutko, A.Eline, A.Klimentov, B.Robichaux, A.Lebedev, P.Dennett | Set up the SOC flight configuration at JSC; set up the terminal room and AMS CSR; commanding and data transmission verification
    L+2 months (tbd) | A.Klimentov                                                        | Set up the POCC flight configuration at CERN; move part of the SOC computers from JSC to CERN; set up the SOC flight configuration at CERN
    L+3 months (tbd) | A.Klimentov, A.Lebedev, A.Eline, V.Choutko                         | Activate the AMS POCC at CERN; move all SOC equipment to CERN; set up the AMS POCC basic version at JSC


  • Data Transmission
    - Will AMS need a dedicated line to send data from MSFC to the ground centers, or can the public Internet be used?
    - What software (SW) must be used for bulk data transfer, and how reliable is it?
    - What data transfer performance can be achieved?
    G.Carosi, A.Eline, P.Fisher, A.Klimentov
    High-rate data transfer between MSFC (AL) and POCC/SOC, POCC and SOC, and SOC and the Regional Centers will become of paramount importance


  • Global Network Topology


  • A.Elin, A.Klimentov, K.Scholberg and J.Gong: amsbbftp tests CERN/MIT & CERN/SEU, Jan/Feb 2003


  • Data Transmission Tests (conclusions)
    - In its current configuration the Internet provides sufficient bandwidth to transmit AMS data from MSFC (AL) to the AMS ground centers at a rate approaching 9.5 Mbit/sec
    - We are able to transfer and store data on a high-end PC reliably, with no data loss
    - Data transmission performance is comparable to what is achieved with network monitoring tools
    - We can transmit data simultaneously to multiple sites
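    As a quick sanity check on these numbers, the measured rate can be compared with the AMS-02 raw data volume quoted later in this talk (~24 GB/day). The short calculation below only converts figures already given on the slides; it is not a new measurement.

```python
# Compare the measured transfer rate with the expected AMS-02 raw data volume
# (both figures are taken from the slides).
MEASURED_RATE_MBIT_S = 9.5          # MSFC -> ground centers, test result
RAW_DATA_GB_PER_DAY = 24.0          # expected AMS-02 raw data transfer

SECONDS_PER_DAY = 86400
measured_gb_per_day = MEASURED_RATE_MBIT_S * 1e6 / 8 / 1e9 * SECONDS_PER_DAY
required_mbit_s = RAW_DATA_GB_PER_DAY * 1e9 * 8 / 1e6 / SECONDS_PER_DAY

print(f"{MEASURED_RATE_MBIT_S} Mbit/s  ~= {measured_gb_per_day:.0f} GB/day")
print(f"{RAW_DATA_GB_PER_DAY} GB/day ~= {required_mbit_s:.1f} Mbit/s sustained")
# -> roughly 100 GB/day available vs ~2.2 Mbit/s needed for raw data alone.
```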


  • Data and Computation for Physics Analysis (data-flow diagram)
    Diagram showing how raw data from the detector and from event simulation flow through event reconstruction to processed data (event summary data, ESD/DST), and through the event filter (selection & reconstruction) to analysis objects (extracted by physics topic) used in batch and interactive physics analysis, with event tag data alongside.
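    To make the stages of the diagram concrete, here is a minimal sketch of the same chain written as plain functions. The data classes, the toy calibration and the selection cut are invented for illustration and are not AMS data formats or algorithms.

```python
# Illustrative skeleton of the processing chain in the diagram above.
# The dataclasses and the selection cut are placeholders, not AMS software.
from dataclasses import dataclass
from typing import List

@dataclass
class RawEvent:          # as written by the detector or the event simulation
    event_id: int
    adc_counts: List[int]

@dataclass
class ESDEvent:          # "event summary data" produced by reconstruction
    event_id: int
    energy: float
    tags: List[str]

def event_reconstruction(raw: RawEvent) -> ESDEvent:
    energy = 0.01 * sum(raw.adc_counts)            # toy calibration
    tags = ["high_energy"] if energy > 10 else []  # toy event tag
    return ESDEvent(raw.event_id, energy, tags)

def event_filter(esd: List[ESDEvent], tag: str) -> List[ESDEvent]:
    # selection by physics topic -> "analysis objects"
    return [e for e in esd if tag in e.tags]

# batch processing of a toy run, followed by an interactive-style selection
raw_run = [RawEvent(i, [i * 100, 500]) for i in range(10)]
esd_run = [event_reconstruction(r) for r in raw_run]
analysis_objects = event_filter(esd_run, "high_energy")
print(len(analysis_objects), "events selected for interactive analysis")
```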


  • Symmetric Multi-Processor (SMP) Model (diagram: experiment, tape storage, TeraBytes of disks)


  • AMS SOC (Data Production requirements)
    Requirements:
    - Reliability: high (24h/day, 7 days/week)
    - Performance goal: process data quasi-online (with a typical delay < 1 day)
    - Disk space: 12 months of data online
    - Minimal human intervention (automatic data handling, job control and book-keeping; see the sketch below)
    - System stability: months
    - Scalability
    - Price/performance
    A complex system consisting of computing components including I/O nodes, worker nodes, data storage and networking switches; it should perform as a single system.
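    The "minimal human intervention" requirement amounts to an automatic book-keeping and job-control loop: detect new raw data, launch processing, and record the outcome so nothing is processed twice or lost. The sketch below shows such a loop in its simplest form; the directory layout, the SQLite catalogue and the echo command are stand-ins, not the AMS production database or executables.

```python
# Minimal book-keeping / job-control loop (illustrative only).
# Directory names, the SQLite catalogue and the processing command are
# placeholders, not the AMS production system.
import sqlite3
import subprocess
from pathlib import Path

RAW_DIR = Path("raw")                      # where new raw runs appear
CATALOG = sqlite3.connect("bookkeeping.db")
CATALOG.execute("CREATE TABLE IF NOT EXISTS runs (name TEXT PRIMARY KEY, status TEXT)")

def unprocessed_runs():
    done = {row[0] for row in CATALOG.execute("SELECT name FROM runs")}
    return sorted(p for p in RAW_DIR.glob("run_*.raw") if p.name not in done)

def process(run: Path) -> None:
    CATALOG.execute("INSERT INTO runs VALUES (?, 'running')", (run.name,))
    CATALOG.commit()
    # stand-in for submitting the real reconstruction job
    result = subprocess.run(["echo", "reconstructing", str(run)])
    status = "done" if result.returncode == 0 else "failed"
    CATALOG.execute("UPDATE runs SET status=? WHERE name=?", (status, run.name))
    CATALOG.commit()

if __name__ == "__main__":
    RAW_DIR.mkdir(exist_ok=True)
    for run in unprocessed_runs():         # one pass; a daemon would loop and sleep
        process(run)
```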


  • Production Farm Hardware Evaluation

    Processing node:
    Processor                              | Intel P IV 3.4+ GHz, HT
    Memory                                 | 1 GB
    System disk and transient data storage | 400 GB, IDE disk
    Ethernet cards                         | 2 x 1 GBit
    Estimated cost                         | 2500 CHF

    Disk server:
    Processor      | Intel Pentium dual-CPU Xeon 3.2+ GHz
    Memory         | 2 GB
    System disk    | SCSI 18 GB, double redundant
    Disk storage   | 3x10x400 GB RAID 5 array or 4x8x400 GB RAID 5 array; effective disk volume 11.6 TB
    Ethernet cards | 3 x 1 GBit
    Estimated cost | 33000 CHF (or 2.85 CHF/GB)
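    The quoted price/performance figure follows directly from the disk-server numbers above; the one-liner below just reproduces it from the 33000 CHF cost and the 11.6 TB effective volume.

```python
# Reproduce the disk-server price/performance figure from the table above.
cost_chf = 33000
effective_volume_gb = 11.6 * 1000          # 11.6 TB effective RAID-5 volume
print(f"{cost_chf / effective_volume_gb:.1f} CHF/GB")   # ~2.8 CHF/GB, i.e. the quoted ~2.85 CHF/GB
```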


  • AMS-02 Ground Centers. Science Operations Center. Computing Facilities (diagram)
    - CERN/AMS network
    - Central Data Services: shared disk servers (25 TeraByte disk, 6 PC-based servers); shared tape servers (tape robots, tape drives, LTO, DLT)
    - Analysis facilities (Linux cluster): 10-20 dual-processor PCs, 5 PC servers; interactive and batch physics analysis
    - Batch data processing
    - AMS Regional Centers


  • AMS Science Operation Center Computing Facilities (diagram)
    - Archiving and staging (CERN CASTOR)
    - Analysis facilities and data server: PC Linux 3.4+ GHz nodes, PC Linux server 2x3.4+ GHz (RAID 5, 10 TB), disk servers, Gigabit switches (1 Gbit/sec)
    - Data flows: AMS data, NASA data, metadata; simulated data to the MC data server
    - Production farm: cells #1-#7, each with PC Linux 3.4+ GHz processing nodes and a PC Linux server 2x3.4+ GHz (RAID 5, 10 TB) behind a Gigabit switch
    - Web, News, Production and DB servers; AFS server
    - Legend: tested, prototype in production / not tested and no prototype yet

  • AMS-02 Science Operations Center, Year 2004
    - MC production (18 AMS Universities and Labs)
      SW: data processing, central DB, data mining, servers; AMS-02 ESD format
    - Networking (A.Eline, Wu Hua, A.Klimentov): Gbit private segment and monitoring SW in production since April
    - Disk servers and data processing (V.Choutko, A.Eline, A.Klimentov):
      dual-CPU Xeon 3.06 GHz with 4.5 TB disk space, in production since Jan;
      2nd server, dual-CPU Xeon 3.2 GHz with 9.5 TB, will be installed in Aug (3 CHF/GB);
      data processing node, P IV single-CPU 3.4 GHz in Hyper-Threading mode, in production since Jan
    - Data transfer station (Milano group: M.Boschini, D.Grandi, E.Micelotta and A.Eline):
      data transfer to/from CERN (used for MC production); station prototype installed in May; SW in production since January
    Status report at the next AMS TIM


  • AMS-02 Science Operations Center, Year 2005
    - Q1: SOC infrastructure setup, Bldg.892 wing A (false floor, cooling, electricity)
    - Mar 2005: set up production cell prototype (6 processing nodes + 1 disk server with private Gbit Ethernet)
    - LR-24 months (LR = launch-ready date), Sep 2005: 40% production farm prototype (1st bulk computer purchase); database servers; data transmission tests between MSFC AL and CERN


  • AMS-02 Computing Facilities (*Ready = operational; bulk of CPU and disk purchasing at LR-9 months)

    Function                            | Computer                                             | Qty | Disks (TBytes) and tapes                | Ready* (LR-months)
    GSC@MSFC                            | Intel (AMD) dual-CPU, 2.5+ GHz                       | 3   | 3x0.5 TB RAID array                     | LR-2
    POCC (POCC prototype@JSC)           | Intel and AMD, dual-CPU, 2.8+ GHz                    | 45  | 6 TB RAID array                         | LR
    Monitor Station in MIT              | Intel and AMD, dual-CPU, 2.8+ GHz                    | 5   | 1 TB RAID array                         | LR-6
    SOC: Production Farm                | Intel and AMD, dual-CPU, 2.8+ GHz                    | 50  | 10 TB RAID array                        | LR-2
    SOC: Database Servers               | dual-CPU 2.8+ GHz Intel or Sun SMP                   | 2   | 0.5 TB                                  | LR-3
    SOC: Event Storage and Archiving    | Disk servers, dual-CPU Intel 2.8+ GHz                | 6   | 50 TB RAID array; tape library (250 TB) | LR
    SOC: Interactive and Batch Analysis | SMP computer, 4 GB RAM, 300 SpecInt95, or Linux farm | 10  | 1 TB RAID array                         | LR-1


  • People and Tasks (my incomplete list) 1/4: AMS-02 GSC@MSFC
    - Architecture: A.Mujunen, J.Ritakari, P.Fisher, A.Klimentov
    - POIC/GSC SW and HW: A.Mujunen, J.Ritakari
    - GSC/SOC data transmission SW: A.Klimentov, A.Elin
    - GSC installation: MIT, HUT
    - GSC maintenance: MIT
    Status: the concept was discussed with MSFC reps; MSFC/CERN and MSFC/MIT data transmission tests are done; HUT has no funding for Y2004-2005


  • People and Tasks (my incomplete list) 2/4: AMS-02 POCC
    Tasks: architecture; TReKGate, AMS Cmd Station; commanding SW and concept; voice and video; monitoring; data validation and online processing; HW and SW maintenance
    People: P.Fisher, A.Klimentov, M.Pohl; P.Dennett, A.Lebedev, G.Carosi, A.Klimentov, A.Lebedev; G.Carosi; V.Choutko, A.Lebedev; V.Choutko, A.Klimentov
    More manpower will be needed starting at LR-4 months


  • People and Tasks (my incomplete list) 3/4: AMS-02 SOC
    Tasks: architecture; data processing and analysis; system SW and HEP applications; book-keeping and database; HW and SW maintenance
    People: V.Choutko, A.Klimentov, M.Pohl; V.Choutko, A.Klimentov; A.Elin, V.Choutko, A.Klimentov; M.Boschini et al., A.Klimentov
    More manpower will be needed starting from LR-4 months
    Status: SOC prototyping is in progress; SW debugging during MC production; implementation plan and milestones are fulfilled


  • People and Tasks (my incomplete list) 4/4: AMS-02 Regional Centers
    - INFN Italy: P.G. Rancoita et al.
    - IN2P3 France: G.Coignet and C.Goy
    - SEU China: J.Gong
    - Academia Sinica: Z.Ren
    - RWTH Aachen: T.Siedenburg
    - AMS@CERN: M.Pohl, A.Klimentov
    Status: the proposal prepared by the INFN groups for IGS and by J.Gong/A.Klimentov for CGS can be used by other Universities. Successful tests of distributed MC production and data transmission between AMS@CERN and 18 Universities. Data transmission, book-keeping and process communication SW (M.Boschini, V.Choutko, A.Elin and A.Klimentov) released.


  • AMS/CERN computing and manpower issues
    - AMS Computing and Networking requirements are summarized in a Memo
    - Nov 2005: AMS will provide a detailed SOC and POCC implementation plan
    - AMS will continue to use its own computing facilities for data processing and analysis, Web and News services
    - There is no request to IT for support of AMS POCC HW or SW; first-line SW/HW expertise will be provided by AMS personnel
    - Y2005-2010: AMS will have guaranteed bandwidth on the USA/Europe line; CERN IT-CS support in case of USA/Europe line problems
    - Data storage: AMS-specific requirements will be defined on an annual basis
    - CERN support of mail, printing and CERN AFS as for the LHC experiments; any license fees will be paid by the AMS collaboration according to IT specs
    - IT-DB and IT-CS may be called on for consultancy within the limits of available manpower
    - Starting from LR-12 months the Collaboration will need more people to run the computing facilities


  • Year 2004 MC Production
    - Started Jan 15, 2004
    - Central MC Database
    - Distributed MC Production (see the sketch below)
    - Central MC storage and archiving
    - Distributed access (under test)
    - SEU Nanjing, IAC Tenerife and CNAF Italy have joined production since Apr 2004
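    In this distributed scheme a remote centre pulls job definitions from the central MC database, runs the simulation locally, ships the output to AMS@CERN and reports back to the book-keeping. The sketch below shows that interaction in miniature; every function in it is a stand-in, and the job fields and site name are invented for illustration rather than taken from the actual AMS production interfaces.

```python
# Remote-site production loop (illustrative only): pull a job definition from
# the central MC database, run it, ship the output to CERN, report back.
# All functions below are stand-ins; the real AMS book-keeping interfaces differ.
import subprocess
from pathlib import Path

SITE = "example-site"                      # placeholder centre name

def next_job_from_central_db(site: str):
    """Stand-in for a query to the central MC database."""
    return {"job_id": 42, "particle": "protons", "events": 100000}

def transfer_to_cern(output: Path) -> None:
    """Stand-in for the bulk data transfer step (e.g. a bbftp-style copy)."""
    print(f"would transfer {output} to AMS@CERN")

def report_done(job_id: int, output: Path) -> None:
    """Stand-in for updating the central book-keeping record."""
    print(f"job {job_id} done, produced {output}")

job = next_job_from_central_db(SITE)
output = Path(f"mc_{job['job_id']}.dat")
# stand-in for launching the actual simulation executable
subprocess.run(["echo", "simulating", job["particle"], str(job["events"])])
output.write_bytes(b"...")                 # placeholder output file
transfer_to_cern(output)
report_done(job["job_id"], output)
```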


  • Y2004 MC production centers

    MC Center                                      | Responsible                                                         | GB   | %
    CIEMAT                                         | J.Casuas                                                            | 2045 | 24.3
    CERN                                           | V.Choutko, A.Eline, A.Klimentov                                     | 1438 | 17.1
    Yale                                           | E.Finch                                                             | 1268 | 15.1
    Academia Sinica                                | Z.Ren, Y.Lei                                                        | 1162 | 13.8
    LAPP/Lyon                                      | C.Goy, J.Jacquemier                                                 | 825  | 9.8
    INFN Milano                                    | M.Boschini, D.Grandi                                                | 528  | 6.2
    CNAF & INFN Bologna                            | D.Casadei                                                           | 441  | 5.2
    UMD                                            | A.Malinine                                                          | 210  | 2.5
    EKP, Karlsruhe                                 | V.Zhukov                                                            | 202  | 2.4
    GAM, Montpellier                               | J.Bolmont, M.Sapinski                                               | 141  | 1.6
    INFN Siena & Perugia, ITEP, LIP, IAC, SEU, KNU | P.Zuccon, P.Maestro, Y.Lyublev, F.Barao, C.Delgado, Ye Wei, J.Shin  | 135  | 1.6
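    The per-centre volumes in this table add up to the total quoted on the next slide (about 8.4 TB), and each percentage is that centre's share of the sum; the short check below reproduces both.

```python
# Cross-check of the production-centre table: total volume and shares.
volumes_gb = {
    "CIEMAT": 2045, "CERN": 1438, "Yale": 1268, "Academia Sinica": 1162,
    "LAPP/Lyon": 825, "INFN Milano": 528, "CNAF & INFN Bologna": 441,
    "UMD": 210, "EKP, Karlsruhe": 202, "GAM, Montpellier": 141,
    "INFN Siena & Perugia, ITEP, LIP, IAC, SEU, KNU": 135,
}
total = sum(volumes_gb.values())
print(f"total: {total} GB (~{total / 1000:.1f} TB)")          # ~8.4 TB
for centre, gb in volumes_gb.items():
    print(f"{centre:<48} {100 * gb / total:5.1f} %")          # matches the % column to within rounding
```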


  • MC Production Statistics
    - 97% of MC production done; will finish by the end of July
    - URL: pcamss0.cern.ch/mm.html
    - 185 days, 1196 computers
    - 8.4 TB produced; daily CPU use equivalent to 250 P III 1 GHz computers

    Particle        | Million Events | % of Total
    protons         | 7630           | 99.9
    helium          | 3750           | 99.6
    electrons       | 1280           | 99.7
    positrons       | 1280           | 100
    deuterons       | 250            | 100
    anti-protons    | 352.5          | 100
    carbon          | 291.5          | 97.2
    photons         | 128            | 100
    Nuclei (Z 3-28) | 856.2          | 85


  • Y2004 MC Production Highlights
    - Data are generated at remote sites, transmitted to AMS@CERN and available for analysis (only 20% of the data was generated at CERN)
    - Transmission, process communication and book-keeping programs have been debugged; the same approach will be used for AMS-02 data handling
    - 185 days of running (~97% stability)
    - 18 Universities & Labs
    - 8.4 TBytes of data produced, stored and archived
    - Peak rate 130 GB/day (12 Mbit/sec), average 55 GB/day (AMS-02 raw data transfer ~24 GB/day)
    - 1196 computers; daily CPU use equivalent to 250 1 GHz CPUs running 24h/day for 184 days
    - Good simulation of AMS-02 data processing and analysis
    Not tested yet: remote access to CASTOR; access to ESD from personal desktops
    TBD: AMS-01 MC production, MC production in Y2005


  • AMS-01 MC Production
    - Send requests to [email protected]
    - Meeting in Sep; the target date to start AMS-01 MC production is October 1st


    Slide notes:

    Last year's conclusion that network use is not limited by available bandwidth would seem to be true again this year. And, although it is not an innovative use, bulk data transfer via the network is increasing. The transfer of BaBar data from SLAC to IN2P3 is an interesting case: no direct second copy of the raw data is made on tape at SLAC, but a backup copy is made as a byproduct of transferring the data to IN2P3 for analysis. This major change in usage is one factor that prevents any comparison of the 2000 traffic breakdown by protocol with those of 1998 and 1999. Even if we could leave BaBar data transfer aside, though, there is another change since 1999 that prevents direct comparison: the increasing use of encryption. ssh, the encrypted version of the rsh remote shell protocol, is now widely used in the HEP world. Some labs, SLAC and Fermilab for example, even insist that only ssh is used, in order to prevent cleartext passwords being sent across the network. In principle, this isn't a problem: ssh traffic should be counted as interactive traffic. However, if we do this then interactive traffic increases by factors of 10-15 between 1999 and 2000. Perhaps this is what has happened, but it is more likely that other traffic is hidden in the encrypted transfers. For example, traffic from both scp (secure copy) and rsync (a utility to mirror filesystems) is recorded as ssh traffic by the routers. [2001: Not updated this time round. See notes for slide 30.]

    Taking such a data-centric point of view, computers in AMS exist to: generate data (Monte Carlo simulations); collect data; process data; analyse data; present data; exchange and communicate data. Each function is clearly tied to the different stages in moving from the raw data produced by an experiment to the final result, a published physics paper. (Tony Cass, lecture to CERN summer students. Graphics courtesy of Les Robertson.)

    Symmetric Multi-Processor (SMP) systems are, essentially, mainframe computers. These systems combine CPU, memory and high I/O capacity in one box, with all resources (including external storage devices) equally available to all processors (and hence transparently available to all processes). SMP systems can meet all the requirements of the reconstruction and batch analysis steps and even have some advantages. In particular, management of a single system is relatively easy, although one must be careful about taking on board a proprietary operating system, which leads to dedicated support staff for each type of machine. Unfortunately, SMP systems are expensive. Close integration of CPUs and I/O devices with the appropriate bandwidth takes effort, and system costs reflect this. Also, few applications elsewhere require this integration, and so the development costs contribute significantly to the purchase price. Another important point is that I/O device support is often limited to those devices supplied by the SMP vendor. Even if 3rd-party devices can be used, support is often patchy. Supporting storage devices through external servers (as illustrated) limits I/O throughput to external network rates rather than internal bandwidths. (Graphics courtesy of Bernd Panzer-Steindel.)

    Graphics courtesy of Les Robertson and Tim Smith. Last year's test cluster has been formalised as the lxshare cluster. Here, instead of resources being allocated by a job scheduler, nodes are timeshared between experiments to provide resources for major tests.