ATLAS OpenLab Ideas
Mario Lassnig (CERN), Sami Kama (SMU), Eric Lancon (CEA Saclay), Simone Campana (CERN), Graeme Stewart (U Glasgow), Walter Lampl (U Arizona)
[email protected], [email protected]
Disclaimer
● These slides represent
  … a collection of preliminary ideas
  … by a limited number of people
  … from various ATLAS activities
● We consider these topics
  … long-term,
  … research-driven,
  … and disruptive
Introduction
As you probably already know, upgrades to the LHC will lead to
● More collisions and more data in the next run periods
● Which means more computing and more storage requirements
● Which can be addressed by three things
The challenge is to achieve more computing and storage with a budget similar to or smaller than today's
Topics
● Computing on detector
● Platform independence
  ○ New hardware
● Software tools
  ○ Training
  ○ Profiling
  ○ Checking/validation
● Networking
● Machine learning
Computing on Detector
● Future (and some current) detectors contain embedded compute resources based on commodity chips (such as FPGAs) with custom designs
  ○ Custom programs
  ○ Hard to debug/design/change
● Would it be possible to use radiation-hard/-tolerant, extremely low-power generic chips that support common computing standards?
  ○ IoT chips?
  ○ Fast interconnects
● Use the detector itself as a massively parallel computer?
Platform independence
● Most LHC code is bound to the x86 architecture, which leaves out
  ○ GPGPUs
  ○ ARM
  ○ PowerPC
  ○ Other architectures?
● Multitude of APIs, both open and proprietary
  ○ Vulkan, OpenCL, OpenMP, CUDA, Mantle, …
● Porting software to such platforms will enable more compute resources — see the sketch below
  ○ Geant4 on Xeon Phis
  ○ Many-core ARM racks
  ○ Backfill mode on HPCs
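As a minimal illustration of platform-portable code (Python is used for all sketches in this deck; the variable names are hypothetical), the same array kernel can run on any CPU architecture that NumPy supports, or on NVIDIA GPUs via CuPy, because the two libraries share one array API:

```python
# A sketch of architecture-portable array code: the same kernel runs on
# x86/ARM/POWER CPUs via NumPy, or on NVIDIA GPUs via CuPy, because the two
# libraries share the same array API. The inputs are hypothetical
# placeholders, not real ATLAS data structures.
import numpy as np

try:
    import cupy as xp  # GPU backend, if available
except ImportError:
    xp = np            # CPU fallback on any architecture NumPy supports

def transverse_momentum(px, py):
    """Element-wise pT = sqrt(px^2 + py^2), vectorised by the backend."""
    return xp.sqrt(px * px + py * py)

px = xp.asarray(np.random.rand(1_000_000))
py = xp.asarray(np.random.rand(1_000_000))
pt = transverse_momentum(px, py)
```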
Software tools
● Better utilization of existing platforms, including training
  ○ Vectorization
  ○ Threading
● Profilers/tools to guide design choices — see the profiling sketch below
  ○ Intel tools, GooDa, OProfile, …
  ○ Others?
● Compilers
  ○ Clang, Intel
  ○ Others?
● Static checkers (Coverity), continuous builds, dead/obsolete-code identification, memory profilers?
● Version control, code review, …
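A minimal sketch of profiling to guide design choices, using Python's built-in cProfile as a stand-in for the heavier tools listed above; reconstruct_events is a hypothetical placeholder for any expensive workload:

```python
# Profile a hot function with the standard-library cProfile to find where
# time is actually spent before optimising. reconstruct_events is a
# hypothetical stand-in for an expensive workload.
import cProfile
import pstats

def reconstruct_events(n):
    # Deliberately naive loop; a profiler would flag it as the hotspot.
    total = 0.0
    for i in range(n):
        total += (i % 97) ** 0.5
    return total

profiler = cProfile.Profile()
profiler.enable()
reconstruct_events(1_000_000)
profiler.disable()

# Print the ten most expensive call sites, sorted by cumulative time.
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(10)
```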
Networking
● The network is currently mostly used to move data between storage systems
  ○ Jobs run where the data is located
  ○ Failover to retrieve data from remote sites is possible, though not the primary access path
● Two potential improvements that go hand in hand
  ○ Improve the distribution of data to free up network resources
  ○ Make extensive use of the now-available network resources for remote data access
● Two focus areas
  ○ Long flows: content-delivery networks (CDNs)
  ○ Short flows: edge computing
Networking — CDNs
● Multimedia companies are the incumbent users and innovators of CDNs
  ○ YouTube/Netflix (video on demand), Steam/Origin (videogames), Twitch/NVIDIA GRID (video broadcasting), …
  ○ Data volumes are comparable to our datasets, and sometimes even exceed them!
  ○ But we don't have a billion-USD computing budget …
● Learn from industry best practices
  ○ Geographically dispersed caching strategies based on social metrics, not technical ones
  ○ Multicast of datasets to multiple destinations — feasible over the WAN? LAN?
  ○ Approximate correctness — how to deal with partially available / in-flight files?
● Why don't we just do X like company Y?
  ○ Our macro access patterns do not follow the common assumptions — neither a power law nor Zipf's law
  ○ Large-scale LRU-style caching is thus not effective — see the sketch below
    ■ Data delivery is strongly dependent on analysis workflows
    ■ If data has been read recently, it is most likely not needed again for some time
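A toy sketch of the intuition above: when "recently read" implies "not needed again soon", evicting the most recently used entry (MRU) can outperform classic LRU. This is for illustration only, under the stated access-pattern assumption, and is not the production placement/popularity system:

```python
# Toy cache that evicts the *most* recently used entry instead of the least
# recently used one, matching workloads where recently read data is unlikely
# to be re-read soon.
from collections import OrderedDict

class MRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest-accessed first, newest last

    def get(self, dataset):
        if dataset not in self.entries:
            return None
        self.entries.move_to_end(dataset)  # mark as most recently used
        return self.entries[dataset]

    def put(self, dataset, replica):
        if dataset in self.entries:
            self.entries.move_to_end(dataset)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=True)  # evict the MRU entry, not the LRU one
        self.entries[dataset] = replica
```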
Networking — Edge computing 1/2
● Moving towards a remote-IO world requires an even better network
  ○ Make the network aware of the storage and computation demands
  ○ For many data access paths, we only need parts of the file
● All our data access software has intelligence built in
  ○ Usually, data is serialised hierarchically using ROOT trees
  ○ Access-path libraries are highly optimised — only read selected branches or single events (see the sketch below)
● Can we offload some of this intelligence into the hardware?
  ○ Reduce latency and increase bandwidth for random IO
  ○ Bring the data closer to the NIC — cache the file on/close to the NIC via 3D NAND
  ○ Programmable NICs — match the structure of the 3D NAND to the layout of the serialised bytestream
  ○ Edge computing — no need to go "into" the datacentre anymore for the duration of file interaction
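A sketch of the "only read what you need" access pattern described above, using the uproot library purely as a present-day illustration; the URL, tree name, and branch names are hypothetical placeholders:

```python
# uproot reads ROOT trees over local or remote (HTTP/XRootD) paths and
# fetches only the requested branches/entries, rather than the whole file.
# File URL, tree name, and branch names below are hypothetical.
import uproot

with uproot.open("root://some.site.example//data/events.root") as f:
    tree = f["Events"]
    # Read two branches for the first 1000 events only — the library
    # requests just those byte ranges instead of the full bytestream.
    arrays = tree.arrays(["Muon_pt", "Muon_eta"], entry_start=0, entry_stop=1000)
```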
Networking — Edge computing 2/2
● We are not benefitting from software-defined networks (SDNs) at the moment
● Our WANs are optimised for long flows with large MTUs
● Short flows for remote random IO are disruptive
  ○ IP fragmentation
  ○ Packet reordering
● The result: decreased bandwidth and increased latency
● Could a NIC/switch/NIC infrastructure automatically respond to short flows? (see the sketch below)
  ○ Traffic shaping
  ○ VLAN tagging
  ○ IETF Geneve — automatic network virtualisation
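One traffic-shaping hook already available from the host side, as a minimal sketch: marking a flow's packets with a DSCP value via the IP TOS byte, so that switches configured for DSCP-based queueing can treat short flows differently. Whether the WAN actually honours the marking is an assumption, and the endpoint is a placeholder:

```python
# Mark a TCP flow with a DSCP value (here AF21, as an example) so that
# DSCP-aware switches can shape it. Host and port are placeholders.
import socket

DSCP_AF21 = 0x12   # DSCP 18; the TOS byte carries DSCP in its upper 6 bits
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_AF21 << 2)
sock.connect(("storage.example.org", 1094))  # all packets on this flow now carry the mark
```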
Machine Learning
● Machine Learning for physics analysis
  ○ E.g., track and vertex reconstruction, physics-process and entity classification, …
  ○ Follow the activities here: http://iml.cern.ch
  ○ Next workshop: http://indico.cern.ch/event/395374/
● Machine Learning for computing
  ○ Improve resource usage to reduce costs
  ○ "Easy" to do in constrained environments — global heterogeneous systems are a different story
  ○ Three main areas
    ■ Storage — dynamic data placement
    ■ Network — data access paths
    ■ Human — operational support 24/7
Machine Learning — Storage and network
● Optimum placement of files given a storage-volume constraint
  ○ Solved theoretically some time in the 1960s
  ○ … but why is it still difficult?
● We do not have full system-state knowledge for optimal point-in-time decisions
  ○ Our current dynamic data placement / popularity system does not handle all of our cases
  ○ Approximate a "good" solution — unfortunately, there are orthogonal notions of "good"
● How can machine learning help? — see the sketch below
  ○ Classification — e.g., which data should be replicated more often, which data should be deleted?
  ○ Regression — e.g., what is the expected bandwidth for a link? The expected IO rate for a storage system?
  ○ Clustering — e.g., which data should be kept together for a particular analysis?
  ○ Reduction — e.g., which infrastructure metrics influence the placement policy the most?
● Eventually, a hybrid ML model using all these results should converge towards
  ○ the least amount of data that has to be stored resident,
  ○ so we can use the remaining capacity for cached/on-demand compute
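A sketch of the classification idea above: predict whether a dataset should get an extra replica from simple popularity/infrastructure features. The feature names and training data are entirely hypothetical; in practice they would come from the popularity and transfer systems:

```python
# Classify datasets as "replicate more" vs "candidate for deletion" from
# hypothetical features. Data and labels below are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Features per dataset: [accesses last week, size in TB, age in days]
X_train = np.array([[120, 1.5,  10],
                    [  2, 8.0, 400],
                    [ 45, 0.3,  30],
                    [  0, 2.2, 900]])
y_train = np.array([1, 0, 1, 0])   # 1 = replicate more, 0 = candidate for deletion

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

X_new = np.array([[80, 0.9, 15]])
print(model.predict(X_new))         # -> [1], i.e. replicate
print(model.predict_proba(X_new))   # class probabilities, for a softer policy
```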
Machine Learning — Human
● Emergent properties are our daily business — the result of complex systems
  ○ System A degrades, System B fails as a result, System C tries to recover B's failure but causes …
● Manual post-facto analysis, post-mortems, and operational procedures take time
● Through supervised or reinforcement learning, teach the system to
  ○ Automate anomaly detection — see the sketch below
  ○ Escalate preventive measures through anomaly prediction
  ○ Submit early warnings to shifters/operators
● Through unsupervised learning
  ○ Detect correlations between system behaviours
  ○ Classify categories for reactive operations
● Large overlap with the data mining community — cooperation will be very welcome
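An unsupervised starting point for the anomaly detection mentioned above, as a minimal sketch: flag outlying time bins in operational metrics with an isolation forest. The metrics matrix is hypothetical (e.g. transfer rate, failure rate, queue depth per time bin):

```python
# Flag anomalous time bins in (hypothetical) operational metrics with an
# isolation forest; flagged bins could feed early warnings to shifters.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 500 normal time bins of [transfer rate, failure rate, queue depth]
normal = rng.normal(loc=[100.0, 0.02, 50.0], scale=[10.0, 0.005, 5.0], size=(500, 3))
incident = np.array([[20.0, 0.40, 300.0]])   # degraded system, queues piling up
metrics = np.vstack([normal, incident])

detector = IsolationForest(contamination=0.01, random_state=0)
labels = detector.fit_predict(metrics)       # -1 = anomaly, 1 = normal
print(np.where(labels == -1)[0])             # indices of flagged time bins
```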
THANK YOU