21
DAQ For Water Cherenkov Detectors Dr Benjamin Richards ([email protected])

DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

DAQ For Water Cherenkov Detectors

Dr Benjamin Richards ([email protected])

Page 2: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

DAQ specifics for water Cherenkov

DAQ software can sometimes be a low priority in experiment design

Many aspects of DAQ are the same no matter the experiment

However in water Cherenkov detectors:

• Our data is less dense

• More physically separated

• Triggering is more complex and further from hardware

• Reliability is very important

• Large data buffers are required

TMEX 2018 2

Detectors FEE DAQ Disk

Page 3: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

My Prerequisites

Find a DAQ framework that could provide the specific qualities for Cherenkov detectors as well as being modular, easily scalable with large customizability and fault tolerance

• Hardware test stands (few PMTs)

• ANNIE experiment (30-200 PMTs)

• E61 Experiment (~7,500 PMTs [365 MPMTs])

• Hyper-K experiment (~40,000 PMTs)

Unfortunately I couldn’t find a good fit.

TMEX 2018 3

Page 4: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

ToolDAQToolDAQ is an open source DAQ Framework developed in the UK.

It was deigned to incorporate the best features of other DAQ whilst:

1. Being very easy and fast to develop DAQ implementations in a very modular way.

2. Including dynamic service discovery and scalable network infrastructure to allow its use on large scale experiments.

Features• Pure C++• Fast Development• Very Lightweight• Modular • Highly Customisable / Hot swappable modules• Scalable (built in service discovery and control)

• Fault tolerant (dynamic connectivity, discovery, message caching)

• Underlying transport mechanisms ZMQ (Multilanguage Bindings)

• JSON formatted message passing• Few external dependencies (Boost, ZMQ)

4TMEX 2018

Page 5: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Points of focus

• Modularity / Customisability

• Scalability / Dynamisms

• Fault Tolerance / Error Correcting

TMEX 2018 5

Page 6: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

ToolDAQ Modularity and Customisability In Structure

Tool = Modular classes that make up your program

ToolChain = Class that holds the modular Tools

DataModel = Shared / transient data class. Any object/variable/instance in the DataModel class is shared between all tools

DataModel

Tool 1

ToolChain

Tool 2 Tool 3

TMEX 2018 6

Page 7: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Tool 2

Tools

DataModel

Tool 1

ToolChain

Tool 2 Tool 3

Tool 1

ToolChain

Tool 2 Tool 3

Tool 1(Thread 0)

Thread 1

Thread 2

Thread 3 Socket Communication

TMEX 2018

7

Page 8: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Dynamic Service Discovery

The core Framework has multiple threads that run both in the ToolChains and NodeDaemons that take care of all the control systems, service discovery, etc…

Dynamic service discovery lets every single Node Daemon, ToolChain and service know about each other via use of multicast beacons.

This is how remote control is achieved anywhere on the network

Node

Node Node Node

NodeNode

Multicast

[ UUID, Name , IP, Service, Port, Status, Timestamp ]

TMEX 2018 8

Page 9: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Distributed Node Management & Hot SwappingMost DAQ systems will require multiple distributed nodes

Each can have multiple ToolChains running on them

So ToolDAQ has a node control and monitoring system

Node

Node Daemon

ToolChain 1

ToolChain 2

ToolChain 3

Node

Node

Node

Node

Node

Node

Node

Node

And allows connections and data flow to rerouted dynamically allowing for hot swapping

Multicast

TCP

TMEX 2018 9

Page 10: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Fault Tolerance And Error Correcting

• ToolDAQ makes use of ZMQ to provide a fault tolerant scalable messaging

• Use of ZMQ, message buffering and Service discovery allows for creation of DAQ that can not just be fault tolerant but handle errors.

Simple Difficult

None Fault Tolerant

Error Handling

Difficulty

Network Error behaviour

TMEX 2018 10

Page 11: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

ANNIE

• Multiple asynchronous data sources MRD, Veto, PMTs LAPPDs • ADC Koto Boards• Camac TDC• PSec LAPPDs• Trigger stream

• Fault tolerant• Flexible to changes in data and

trigger

• Also used for analysis/reconstruction

TMEX 2018 11

TDC

Waveform ADC

Waveform ADC

VME FEE Crates

Veto

MRD

Water PMTs

Camac FEE Crates

Trigger Card

Disc

2

TDC

Waveform ADC

VMECPU

ANNIEDAQ01

Page 12: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Phase 2 for the DAQ software

TMEX 2018 12

Slack output

PostgreSQL

TriggerNetwork Receive

DataMonitor

Data Recorder

VME Trigger Sender

Board Reader

Network Send Data

Trigger Lecroy

PostgreSQL

Server

File on Disk

VME ToolChain

MRD ToolChain

Main DAQ ToolChain

MRDHVInput

Variables

VME Trigger Sender

Board Reader

Network Send Data

VME ToolChain

Lecroy

Trigger Lecroy Root output

LecroyLecroy

Data sync

MonitorReceive

Data

Trigger Lecroy Root output

LecroyLecroy Data Calibrate

Plot Producer

Monitor ToolChain

PSEC

Trigger outputPSEC

Reader

PSEC ToolChain

Page 13: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Phase 2 for the DAQ software

TMEX 2018 13

Slack output

PostgreSQL

TriggerNetwork Receive

DataMonitor

Data Recorder

VME Trigger Sender

Board Reader

Network Send Data

Trigger Lecroy

PostgreSQL

Server

File on Disk

VME ToolChain

MRD ToolChain

Main DAQ ToolChain

MRDHVInput

Variables

VME Trigger Sender

Board Reader

Network Send Data

VME ToolChain

Lecroy

Trigger Lecroy Root output

LecroyLecroy

Data sync

MonitorReceive

Data

Trigger Lecroy Root output

LecroyLecroy Data Calibrate

Plot Producer

Monitor ToolChain

PSEC

Trigger outputPSEC

Reader

PSEC ToolChain

Very Modular

Page 14: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Phase 2 for the DAQ software

TMEX 2018 14

Slack output

PostgreSQL

TriggerNetwork Receive

DataMonitor

Data Recorder

VME Trigger Sender

Board Reader

Network Send Data

Trigger Lecroy

PostgreSQL

Server

File on Disk

VME ToolChain

MRD ToolChain

Main DAQ ToolChain

MRDHVInput

Variables

VME Trigger Sender

Board Reader

Network Send Data

VME ToolChain

Lecroy

Trigger Lecroy Root output

LecroyLecroy

Data sync

MonitorReceive

Data

Trigger Lecroy Root output

LecroyLecroy Data Calibrate

Plot Producer

Monitor ToolChain

PSEC

Trigger outputPSEC

Reader

PSEC ToolChain

Nice features

1. Integrated HV slow control

2. Self correcting SQL database

3. Slack alert system

Page 15: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Phase 2 for the DAQ software

TMEX 2018 15

Slack output

PostgreSQL

TriggerNetwork Receive

DataMonitor

Data Recorder

VME Trigger Sender

Board Reader

Network Send Data

Trigger Lecroy

PostgreSQL

Server

File on Disk

VME ToolChain

MRD ToolChain

Main DAQ ToolChain

MRDHVInput

Variables

VME Trigger Sender

Board Reader

Network Send Data

VME ToolChain

Lecroy

Trigger Lecroy Root output

LecroyLecroy

Data sync

MonitorReceive

Data

Trigger Lecroy Root output

LecroyLecroy Data Calibrate

Plot Producer

Monitor ToolChain

PSEC

Trigger outputPSEC

Reader

PSEC ToolChain

1. Dynamically Scalable

2. Independent3. asynchronous

Page 16: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Phase 2 for the DAQ software

TMEX 2018 16

Slack output

PostgreSQL

TriggerNetwork Receive

DataMonitor

Data Recorder

VME Trigger Sender

Board Reader

Network Send Data

Trigger Lecroy

PostgreSQL

Server

File on Disk

VME ToolChain

MRD ToolChain

Main DAQ ToolChain

MRDHVInput

Variables

VME Trigger Sender

Board Reader

Network Send Data

VME ToolChain

Lecroy

Trigger Lecroy Root output

LecroyLecroy

Data sync

MonitorReceive

Data

Trigger Lecroy Root output

LecroyLecroy Data Calibrate

Plot Producer

Monitor ToolChain

PSEC

Trigger outputPSEC

Reader

PSEC ToolChain

Fault tolerant Trigger and data synchronisation

Page 17: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Phase 2 for the DAQ software

TMEX 2018 17

Slack output

PostgreSQL

TriggerNetwork Receive

DataMonitor

Data Recorder

VME Trigger Sender

Board Reader

Network Send Data

Trigger Lecroy

PostgreSQL

Server

File on Disk

VME ToolChain

MRD ToolChain

Main DAQ ToolChain

MRDHVInput

Variables

VME Trigger Sender

Board Reader

Network Send Data

VME ToolChain

Lecroy

Trigger Lecroy Root output

LecroyLecroy

Data sync

MonitorReceive

Data

Trigger Lecroy Root output

LecroyLecroy Data Calibrate

Plot Producer

Monitor ToolChain

PSEC

Trigger outputPSEC

Reader

PSEC ToolChainANNIE-DAQ01

VME-CPU

VME-CPU

Monitor node

Server redundancy

Page 18: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Hyper-K

• Larger Beast• 40,000 channels• 2000 FEEs• 150 computing nodes• Dead timeless• Separate GPU Trigger farm• Readout buffering • Event Building

• Self maintaining

• Highly fault tolerant

TPU

TPU

SN TPU

Broker

EBUEBU

Broker

FEE

FEE

FEE

swit

ch

RBU

FEE

FEE

FEE

swit

ch

RBU

Offsite backup/ processing

Run control / monitoring

SN TPU

Node finderBroker Trigger

Sorter

Processor Thread

TPU Sender

Job distributor

Decision Sender

Negative trigger decisions sender

Decision processor

EBU trigger decisions sender

Fake decisionFake decisionFake decisionFake decisionFake decisionTPU communicator

RBU TPU EBU Master/Slave Thread

finder

1) Node finder Thread updates/ maintains connections to TPUs RBUs EBUs adding new ones when they appear and updating lists

It also maintains master slave relationship between brokers

3) Generates trigger job windows based on timestamp

(if external SN trigger) makes job window of snwatch window and triggers SN mode

(not sure if need a separate thread for this)

Run control / possible run

finder from sql

2) Activates run turn on and off, also possibly finds and distributes run number (initialise only with listening for run start stop on execute with noblock

(external SN mode trigger) receive SN watch messages

4) Jobs are distributed to next available TPU in order and results in sibtreads posted to an output list

(SN mode) no longer distribute job windows as handed over to EBU.Also possibly dedicated SN TPU for continuous Nhits search.

5) Based on trigger ouputlist messages are either sent to Available EBU list (same sas tpu with conformation) or if SN trigger SN mode. Where job windows distributed to EBU

TMEX 2018 18

Page 19: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Current HK DAQ Design

Features:

• FEEs can be rerouted to be read out by any RBU

• Broker assisted communication between RBU TPU and EBU

• Master slave Broker redundancy

• Dynamic routing and fault tolerance (hot swapable)

• Ability to switch on and off parts of the detector from the data stream

• TPU farm (GPU based)

• Supernova buffer/Readout

• Centralised run control and monitoring/logging

TPU

TPU

TPU

Broker

EBUEBU

Broker

FEE

FEE

FEE

swit

ch

RBU

FEE

FEE

FEEsw

itch

RBU

EBU

Offsite backup/ processing

Run control / monitoring

19

SN TPU

X ~2500

X ~70X ~26

X 4

TMEX 2018

Page 20: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Hyper-K Design

TMEX 2018 20

TPU

TPU

TPU

Broker

EBUEBU

Broker

FEE

FEE

FEE

swit

ch

RBU

FEE

FEE

FEE

swit

ch

RBU

EBU

Offsite backup/ processing

X ~2500

X ~70

X ~26

SN TPU

Page 21: DAQ For Water Cherenkov Detectors€¦ · ch RBU Offsite backup/ processing Run control / monitoring 19 SN TPU X ~2500 X ~70 X ~26 X 4 TMEX 2018 . Hyper-K Design TMEX 2018 20 TPU

Summary

• Watcher Cherenkov detectors have specific challenges to overcome

• Distributed buffering to solve large data buffering issues

• Intelligent distributed triggering (see next talk for algorithms)

• Reliability can be mitigated with redundancy and fault tolerant communications

• Scalability can be addressed with dynamic service discovery and scalable network layers TMEX 2018 21