Run II DZERO DAQ / Level 3 Trigger System
Ron Rechenmacher, Fermilab
The DØ DAQ Group (Brown/FNAL-CD/U. of Washington)
CHEP03
3/24/2003 Ron Rechenmacher, Fermilab Slide 2
What to Expect
DAQ/trigger system overview
Hardware, software description
Performance
Lessons learned
Scaling to the future
D0 at Fermilab
DØ
D0 DAQ at Fermilab
Collision rate of 7.6 MHz
3-level trigger system
L1/L2 reduce the rate to 1 kHz into L3
250 MB/s average event data rate into L3
50 Hz and 12.5 MB/s output from L3
[Figure: trigger data flow. 7.5 MHz → L1/L2 (custom hardware) → 1 kHz → L3/DAQ (commodity HW/SW) → 50 Hz → data tape]
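Each data rate on this slide is the accepted-event rate multiplied by the average event size. The following is a minimal Python check. It assumes decimal units (1 KB = 10^3 bytes, 1 MB = 10^6 bytes). It also assumes that events written to tape have the same 250 KB average size as events entering L3.

```python
def bandwidth_mb_per_s(rate_hz, event_kb):
    """Average data rate in MB/s for an event rate (Hz) and event size (KB), decimal units."""
    return rate_hz * event_kb / 1000.0

EVENT_KB = 250       # average event size quoted on the slide
L3_INPUT_HZ = 1000   # L1/L2 accept rate into L3
L3_OUTPUT_HZ = 50    # L3 accept rate written to tape

into_l3 = bandwidth_mb_per_s(L3_INPUT_HZ, EVENT_KB)
out_of_l3 = bandwidth_mb_per_s(L3_OUTPUT_HZ, EVENT_KB)
l3_reduction = L3_INPUT_HZ / L3_OUTPUT_HZ

print(f"into L3: {into_l3} MB/s")            # into L3: 250.0 MB/s
print(f"out of L3: {out_of_l3} MB/s")        # out of L3: 12.5 MB/s
print(f"L3 reduction: {l3_reduction:.0f}x")  # L3 reduction: 20x
```

The same function reproduces the later performance figure: `bandwidth_mb_per_s(400, 300)` gives 120.0 MB/s.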
Commodity DAQ/L3 System
We chose a good mix of hardware and software and built a system that easily met the requirement of 250 KB events at 1 kHz (= 250 MB/s).
Commodity hardware
Commodity software development environment
• Great depth of software development tools and methodologies
DAQ/trigger System Overview
Commodity Single Board Computer (“SBC”)
• 933 MHz CPU
• 128 MB flash ROM
• 128 MB RAM
• “PMC” slot (filled with BVM I/O module)
• VME-to-PCI bridge (Universe II)
• Dual 100 Mb Ethernet
• J3 connector
• SBC front-panel connections and status lights
• Mechanically supported in the crate by a custom 9U “Extender” board
Hardware Description - Switches
6509 (single central switch)
• 16 GB/s backplane
• 9 module slots
• 8-port Gb (fiber or copper)
• 48-port 100 Mb/s
• 112 MB/48 ports for output buffering
2948G (currently 5 of these in the system)
• 48 100 Mb/s ports and 2 Gb ports
• “Concentrator” switch
• Combines data from up to 20 100 Mb/s inputs into 2 Gb outputs
• No packet loss possible if limited to 20 inputs
Capacity well exceeds D0 requirements
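The 20-input limit is a line-rate argument. 20 × 100 Mb/s = 2 Gb/s, which is exactly the capacity of the two gigabit uplinks, so the outputs are never offered more than they can carry. The following is a minimal Python check of that arithmetic. It assumes traffic can be spread over both uplinks and it ignores switch-internal effects, so it illustrates the reasoning rather than modelling the 2948G.

```python
def is_oversubscribed(inputs, input_mbps, uplinks, uplink_mbps):
    """True if the summed input line rate exceeds the aggregate uplink line rate."""
    return inputs * input_mbps > uplinks * uplink_mbps

# 2948G used as a concentrator: 100 Mb/s inputs feeding 2 x 1 Gb/s uplinks.
for n in (20, 21, 48):
    print(n, "inputs, oversubscribed:", is_oversubscribed(n, 100, 2, 1000))
```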
Hardware Description - Nodes
82 nodes in total, currently
Dual CPU
1 GB RAM
1 GHz PIII / AMD Athlon 2000 CPUs
Dual Ethernet
Cost effective
Developed Software Description
Multiple runs can be configured simultaneously
Connections to monitor server (talk on Thursday)
All connections TCP
• Auto re-connection
Application buffer tracking
Components of the system can be restarted ‘on the fly’
[Figure: software components (D0 Run Control, DAQ Supervisor, Routing Master, SBCs, Nodes, Monitoring) and the traffic between them: configuration, routing, crate lists, buffer info, event data, and selected events to tape]
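The slide says only that all connections are TCP and re-connect automatically. It does not describe the retry policy. The Python sketch below shows one common way to implement automatic re-connection, using exponential back-off. The back-off schedule and the `connect_with_retry` name are assumptions, not the DØ code.

```python
import time

def connect_with_retry(connect, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds, doubling the pause after each failure.

    connect is any zero-argument callable that returns a connection or raises
    OSError, for example lambda: socket.create_connection((host, port))."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller report the failure
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Passing `connect` and `sleep` in as parameters keeps the policy independent of the transport and makes it testable without a network.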
Software Infrastructure
Linux 2.4 kernel
• Modifiable
• One ARP patch
• Easy development
• Kernel debugging: KGDB
Tcpdump
Fermi Linux Trace
• Complete system: kernel <-> user-space interaction
Rgang
• Single executable
• Parallel remote execution and file copy for “farms”
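Rgang's own options are not described here. The Python sketch below illustrates only the parallel fan-out idea behind running one command across a farm. It uses `ssh` through `subprocess` as an assumed transport, and the function names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_over_ssh(node, command):
    """Run one shell command on one node; return (exit status, stdout)."""
    proc = subprocess.run(["ssh", node, command], capture_output=True, text=True)
    return proc.returncode, proc.stdout

def fan_out(nodes, command, runner=run_over_ssh, max_parallel=32):
    """Run command on every node concurrently; return {node: runner result}."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {node: pool.submit(runner, node, command) for node in nodes}
        return {node: future.result() for node, future in futures.items()}
```

Running the nodes concurrently makes total time scale with the slowest node rather than the sum over all nodes, which is what makes this kind of tool practical for an 80-node farm.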
Software TRACE
root@d0sbc001b:/proc/trace> cat buffer | head -20
count timeStamp         PID  TraceName CPU lvl message
------------------------------------------------------------------------
    1 1048198375378446 1620 KERNEL    0   30  exit do_softirq
    2 1048198375378425 1620 KERNEL    0   30  enter do_softirq
    3 1048198375378422 1620 KERNEL    0   30  exit handle_IRQ_event irq=5
    4 1048198375378411 1620 KERNEL    0   30  enter handle_IRQ_event irq=5
    5 1048198375377887 1339 KERNEL    0   31  sched: prev=1339 next=1620
    6 1048198375377788 1339 KERNEL    0   30  exit do_softirq
    7 1048198375377780 1339 KERNEL    0   30  enter do_softirq
    8 1048198375377779 1339 KERNEL    0   30  exit handle_IRQ_event irq=5
    9 1048198375377771 1339 KERNEL    0   30  enter handle_IRQ_event irq=5
   10 1048198375377716 1339 l3xetg    0    8  Node idx 55: total=3 delta=1 latency=0
   11 1048198375377688 1339 KERNEL    0   30  exit do_softirq
   12 1048198375377686 1339 KERNEL    0   30  enter do_softirq
   13 1048198375377684 1339 KERNEL    0   30  exit handle_IRQ_event irq=5
   14 1048198375377680 1339 KERNEL    0   30  enter handle_IRQ_event irq=5
   15 1048198375377668 1620 KERNEL    0   31  sched: prev=1620 next=1339
   16 1048198375377646 1339 KERNEL    0   31  sched: prev=1339 next=1620
   17 1048198375377615 1339 KERNEL    0   30  exit do_softirq
root@d0sbc001b:/proc/trace> echo KERNEL=0x0fffffff >|level
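Every TRACE entry carries a timestamp, so paired enter/exit messages give handler durations directly. The Python sketch below parses lines in the format of the listing above and pairs each enter with its exit. It relies on two assumptions:
• The timeStamp column is in microseconds. The values are consistent with microseconds since the Unix epoch in March 2003.
• The buffer is printed newest-first, as in the listing.

The helper names are hypothetical.

```python
from collections import defaultdict

# Eight entries copied from the listing above, newest first as TRACE prints them.
SAMPLE = """\
    1 1048198375378446 1620 KERNEL 0 30 exit do_softirq
    2 1048198375378425 1620 KERNEL 0 30 enter do_softirq
    3 1048198375378422 1620 KERNEL 0 30 exit handle_IRQ_event irq=5
    4 1048198375378411 1620 KERNEL 0 30 enter handle_IRQ_event irq=5
    6 1048198375377788 1339 KERNEL 0 30 exit do_softirq
    7 1048198375377780 1339 KERNEL 0 30 enter do_softirq
    8 1048198375377779 1339 KERNEL 0 30 exit handle_IRQ_event irq=5
    9 1048198375377771 1339 KERNEL 0 30 enter handle_IRQ_event irq=5
"""

def parse_trace(text):
    """Return (timestamp, pid, trace_name, level, message) for each data line."""
    entries = []
    for line in text.splitlines():
        fields = line.split(None, 6)  # count timeStamp PID TraceName CPU lvl message
        if len(fields) < 7 or not fields[0].isdigit():
            continue  # skip the header, the dashed separator and shell prompts
        _count, ts, pid, name, _cpu, lvl, msg = fields
        entries.append((int(ts), int(pid), name, int(lvl), msg))
    return entries

def enter_exit_durations(entries):
    """Pair 'enter X' with the next 'exit X' in time order; return [(X, duration)]."""
    pending = defaultdict(list)  # function name -> stack of enter timestamps
    durations = []
    for ts, _pid, _name, _lvl, msg in sorted(entries):  # oldest entry first
        words = msg.split()
        if len(words) < 2 or words[0] not in ("enter", "exit"):
            continue
        func = words[1]
        if words[0] == "enter":
            pending[func].append(ts)
        elif pending[func]:  # ignore an exit whose enter scrolled out of the buffer
            durations.append((func, ts - pending[func].pop()))
    return durations

for func, dt in enter_exit_durations(parse_trace(SAMPLE)):
    print(f"{func}: {dt} us")
```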
Software Logic Analyzer
Control
DØ Run Control sends configuration commands to Level 3
• Level 3 is a black box to the rest of DØ
[Figure: DZERO Run Control, Level 3/DAQ Supervisor, Routing Master, and Nodes]
Level 3 Supervisor configures the L3/DAQ system
• Allows configuration of multiple runs
All components can crash or reboot at any time
• The system will automatically reconfigure without contacting Run Control
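One way to reconfigure without going back to Run Control is for the supervisor to keep the desired configuration of every run. It can then replay that configuration to any component that reconnects. The Python sketch below shows this approach under that assumption. The class, component names and configuration strings are hypothetical, and `send` stands in for the real TCP message.

```python
from dataclasses import dataclass, field

@dataclass
class Supervisor:
    """Keeps the desired configuration of each run so that a component that
    reconnects after a crash or reboot can be reconfigured locally."""
    runs: dict = field(default_factory=dict)  # run name -> {component: config}
    sent: list = field(default_factory=list)  # (component, run, config) messages sent

    def configure_run(self, run, assignments):
        """Called once when Run Control configures a run."""
        self.runs[run] = dict(assignments)
        for component, config in assignments.items():
            self.send(component, run, config)

    def on_reconnect(self, component):
        """Replay the stored configuration for every run that uses this component."""
        for run, assignments in self.runs.items():
            if component in assignments:
                self.send(component, run, assignments[component])

    def send(self, component, run, config):
        # Stand-in for the real message to the component; here it is only recorded.
        self.sent.append((component, run, config))
```

Because the stored state covers all runs, a rebooted node that belongs to several simultaneous runs gets every one of its configurations back.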
Monitoring
Example use: a status display in the Control Room (or your living room!)
All components of the DAQ are clients
The server caches recent queries to limit the load on clients
There are many displays, each serving a specialized purpose (uMon, l3xqt, history, systray, and web pages)
Based on TCP/IP, ACE, and XML
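Caching recent queries means that many displays asking for the same item cost the DAQ client one query, not one query per display. The real server is built on ACE and XML. This Python sketch shows only the caching idea. The 5 s maximum age and all names are assumptions.

```python
import time

class CachingMonitor:
    """Serve display queries from a cache so that many displays asking for the
    same item trigger at most one query to the DAQ client per max_age seconds."""

    def __init__(self, fetch, max_age=5.0, clock=time.monotonic):
        self.fetch = fetch        # fetch(client, item): asks the DAQ component directly
        self.max_age = max_age    # seconds a cached answer stays valid (assumed value)
        self.clock = clock
        self._cache = {}          # (client, item) -> (time fetched, value)

    def query(self, client, item):
        key = (client, item)
        now = self.clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self.max_age:
            return cached[1]      # fresh enough: the DAQ client is not disturbed
        value = self.fetch(client, item)
        self._cache[key] = (now, value)
        return value
```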
Real-time “Trace”
Example use: see the event numbers that were in an SBC’s buffers just before some glitch occurred
Combines low-level debugging information and log-file entries in a single real-time circular buffer
The buffer can be “frozen” by either software or hardware triggers
A system-wide display has been demonstrated and is under development
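A circular buffer keeps only the most recent entries. Freezing it when a glitch is detected preserves the history leading up to the glitch. The Python sketch below shows that behaviour. It is not the TRACE implementation, and the class name is hypothetical.

```python
from collections import deque

class TraceBuffer:
    """Fixed-size circular buffer: new entries overwrite the oldest ones until
    the buffer is frozen, after which its contents are kept for inspection."""

    def __init__(self, size):
        self.entries = deque(maxlen=size)  # deque drops the oldest entry when full
        self.frozen = False

    def log(self, message):
        if not self.frozen:
            self.entries.append(message)

    def freeze(self):
        # In the system described above, software or a hardware trigger can do this.
        self.frozen = True

    def snapshot(self):
        return list(self.entries)
```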
Log-files
Example use: understand why a node was not connected to an SBC yesterday
Centralized, accessible, and time-stamped
Errors go to SES
Able to control the amount written to log files
[Figure: clients → XML server → displays and web pages]
Talk on this topic on Thursday
Performance
Current rate is 400 Hz with 300 KB events
• 120 MB/s
Subset at 2 kHz with smaller events
Subset utilizing dual Ethernet with large events
Performance
“Yearly” graph
[Figure: percent backplane utilization]
Lessons Learned
R&D (systems analysis) goes a long way
(VME) systems integration expertise goes a long way
• Transcend sub-system boundaries
TCP expertise needed
• 200 ms dropped-packet problem
• TCP is not tuned for ‘real-time’ applications by default
• TCP_RTO_MIN parameter and others need tuning
• Understanding of the Linux kernel and TCP tools
Track all software/configuration
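Why a single dropped packet cost about 200 ms: when a loss cannot be repaired by fast retransmit, the sender waits for the retransmission timeout (RTO). Linux enforces a floor of TCP_RTO_MIN on the RTO, which is 200 ms (HZ/5). On a switched LAN the measured round-trip time is far below that, so the floor dominates. The sketch below uses the standard RFC 6298 estimator with a configurable floor. Linux's internal RTT code differs in detail. The 1 ms clock granularity and the 200 µs round trip are assumed values.

```python
class RtoEstimator:
    """Retransmission-timeout estimator in the style of RFC 6298, with a
    configurable floor standing in for Linux's TCP_RTO_MIN."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation
    K = 4

    def __init__(self, min_rto, granularity=0.001):
        self.min_rto = min_rto          # seconds; 0.2 models the 200 ms Linux default
        self.granularity = granularity  # assumed clock granularity G, seconds
        self.srtt = None
        self.rttvar = None

    def sample(self, rtt):
        """Feed one RTT measurement (seconds) and return the new RTO (seconds)."""
        if self.srtt is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            # RFC 6298 updates RTTVAR using the old SRTT, then updates SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return max(self.min_rto, self.srtt + max(self.granularity, self.K * self.rttvar))

LAN_RTT = 200e-6  # assumed round trip on the switched network: 200 microseconds
for floor in (0.200, 0.002):
    rto = RtoEstimator(min_rto=floor).sample(LAN_RTT)
    print(f"floor {floor * 1000:.0f} ms -> RTO {rto * 1000:.1f} ms = {rto / LAN_RTT:.0f} RTTs")
```

With the 200 ms floor, one unrecovered loss stalls the connection for about a thousand round trips. This is why the floor, and not the measured RTT, needed tuning.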
R&D
Significant upfront analysis/investigation
“To the metal” understanding/expertise
Basis for smooth integration
R&D - VME SBCs
Universe II
• VMETRO studies
Linux interrupt latency measurements
VMETRO VBT-325C VME Trace, Sampling: STATE at Middle (VME#1)
Entry  Time       AM  Address  Data  Size  Cycle  Stat
-17    1.35us     39  21A004   0001  WORD  RD     OK
-16    47.41us    39  224012   0500  WORD  RD     OK
-15    49.97us    39  22C012   0500  WORD  RD     OK
-14    46.61us    39  234012   0500  WORD  RD     OK
-13    165.04ms   39  21A004   0001  WORD  RD     OK
-12    1.15us     39  21A004   0001  WORD  RD     OK
-11    1.15us     39  21A004   0001  WORD  RD     OK
-10    1.35us     39  21A004   0011  WORD  RD     OK
-9     46.93us    39  224012   0500  WORD  RD     OK
-8     49.81us    39  22C012   0500  WORD  RD     OK
-7     46.61us    39  234012   0500  WORD  RD     OK
-6     166.35ms   39  21A004   0001  WORD  RD     OK
-5     1.15us     39  21A004   0001  WORD  RD     OK
-4     1.15us     39  21A004   0001  WORD  RD     OK
-3     1.33us     39  21A004   0001  WORD  RD     OK
-2     46.45us    39  224012   0500  WORD  RD     OK
-1     50.77us    39  22C012   0500  WORD  RD     OK
HALT   47.09us    39  234012   0500  WORD  RD     OK
(The BgL, IRQ7:1 and Iack columns are empty in every row.)
Efficient OS access
• writev
• Limit memory copying
Linux RT scheduling
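`writev()` hands the kernel a list of buffers in one system call. A header and a payload that live in different places therefore do not need to be copied into one contiguous buffer first. The following is a Python sketch of the idea; `os.writev` is available on Unix. The 8-byte header layout and `send_fragment` are hypothetical.

```python
import os
import struct

HEADER = struct.Struct("!II")  # hypothetical header: event number, payload length

def send_fragment(fd, event_number, payload):
    """Send header + payload with a single writev() call.

    The kernel gathers both buffers itself, so the payload is never copied into
    a combined user-space buffer first. On sockets writev may write fewer bytes
    than requested; real code must loop on the returned count."""
    header = HEADER.pack(event_number, len(payload))
    return os.writev(fd, [header, memoryview(payload)])
```

On a connected socket the call would be `send_fragment(sock.fileno(), event_number, data)`.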
R&D – Switches/ethernet
VME to Ethernet
• Rate and CPU
We analyzed the architecture of the 6509
• Buffering was increased at the last minute and turned out not to be an issue
• We were prepared to control the required buffering by controlling the TCP window size
Round-trip message-passing timings
Tests done under Linux
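Controlling switch buffering through the TCP window can be made concrete. Suppose N senders converge on one output port and each may have at most W unacknowledged bytes in flight. Then the port never needs more than N × W bytes of buffer. The sketch below rests on two assumptions:
• The receiver's SO_RCVBUF is what bounds each connection's window.
• The per-port buffer size is known.

The function names are hypothetical.

```python
import socket

def max_window_bytes(port_buffer_bytes, senders):
    """Largest per-connection window for which the worst case (every sender's
    full window arriving at one output port at once) still fits in the buffer."""
    return port_buffer_bytes // senders

def limit_receive_window(sock, window_bytes):
    """Cap the receive buffer, which bounds the window this receiver advertises.
    Do this before connect()/listen(): window scaling is fixed at connection setup.
    Linux doubles the requested value internally for bookkeeping overhead."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, window_bytes)
```

As an illustration, assume the 6509's 112 MB is shared evenly by 48 ports; this is one reading of “112MB/48 ports”. If all 63 sources send to one event node, `max_window_bytes(112 * 2**20 // 48, 63)` gives 38836 bytes, a window of roughly 38 KB per connection.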
Expanding
Room to grow
• This system could easily double
[Figure: utilization indicator]
Scaling
[Figure: Routing Masters, each serving a group of DAQ nodes (SBCs); the Emperor; and Event Node Groups, each with a Node Master and several Event Nodes]
Routing Master
• Get info from the Emperor and pass it to the SBCs
Node Master
• Tell DAQ nodes which event node to use
• Advertise total free buffers to the Emperor
Emperor, for each event:
• Pick an Event Node Group (ENG) with the most free buffers
• Inform the NM and RMs of the choice
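The Emperor's per-event rule can be sketched in a few lines of Python. The rule itself comes from the slide. Three details are assumptions:
• Ties go to the group that advertised first.
• One buffer is debited per routed event until the next advertisement.
• When no group has free buffers, `route` returns `None`.

The notification callbacks stand in for the messages to the Node Masters and Routing Masters.

```python
class Emperor:
    """Per-event choice of Event Node Group (ENG): pick the group advertising
    the most free buffers, then tell its Node Master and the Routing Masters."""

    def __init__(self, notify_node_master, notify_routing_masters):
        self.free = {}  # ENG name -> free buffers last advertised by its Node Master
        self.notify_nm = notify_node_master
        self.notify_rms = notify_routing_masters

    def advertise(self, eng, free_buffers):
        """Called when a Node Master reports its group's total free buffers."""
        self.free[eng] = free_buffers

    def route(self, event_number):
        if not self.free or max(self.free.values()) <= 0:
            return None  # no free buffers anywhere: the caller must hold off
        eng = max(self.free, key=self.free.get)  # first-advertised group wins ties
        self.free[eng] -= 1  # assume one buffer is used until the next advertisement
        self.notify_nm(eng, event_number)
        self.notify_rms(eng, event_number)
        return eng
```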
Summary
Commodity-based Ethernet DAQ built for D0
• 250 MB/s: 1 kHz of 250 KB events
• 63 sources and >80 targets
Commodity (Ethernet) systems
• Wow, a lot of stuff can show up!
You need a TCP/IP expert or two
People who can transcend boundaries
“To the metal” understanding
Infrastructure
References / Additional Information
Fermitools
• http://fermitools.fnal.gov
Buffering
• http://www-d0.fnal.gov/cgi-bin/cvsweb.cgi/~checkout~/l3xsbc/doc/buffering/index.html?rev=HEAD&content-type=text/html
DAQ Scaling, DAQ Overview, sci2002
• http://d0.phys.washington.edu/~haas/d0/L3/
L3DAQ Homepage
• http://www-d0online.fnal.gov/www/groups/l3daq/default.html
L3 switch backplane load
• http://m-d0-mrtg.fnal.gov/s-d0-dab2cr-l3/s-d0-dab2cr-l3.backp2.html
The D0 Experiment
• http://www-d0.fnal.gov/
D0 Run II Operations
• http://www-d0.fnal.gov/runcoor/
Extra Slides Follow
Monitoring
Centralized, caching monitor server
Based on TCP/IP, ACE, and XML
Supports many displays and clients
• 40 displays simultaneously
• 200 data sources
Talk and poster on this topic on Thursday
Performance
“Weekly” graph (2-hour average)
First trace: max 5-min. 17.0 %, average 5-min. 4.0 %, current 5-min. 0.0 %
Second trace: max 5-min. 1.0 %, average 5-min. 0.0 %, current 5-min. 0.0 %
Software Description - RM
Event routing (Routing Master)
• Receives “run” information from the supervisor
• Farm node list and crate list per trigger bit
• Gets the trigger bits fired, by event number, from the TFW (Trigger Framework)
• Receives the number of free buffers from each farm node
• Decides which nodes receive which events
• Sends routing info by event number to the SBCs
• Sends the crate list by event number to the farm nodes
• Disables triggers when necessary
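The Routing Master's per-event decision can be sketched in Python from the list above. Several details are assumptions, not the DØ implementation:
• One trigger bit decides each event.
• The eligible node with the most free buffers wins, with ties going to the first node listed.
• Triggers are re-enabled as soon as any node reports a free buffer.

The class and node names are hypothetical.

```python
class RoutingMaster:
    """Per-event routing decision following the list above (names hypothetical)."""

    def __init__(self, nodes_per_bit, crates_per_bit):
        self.nodes_per_bit = nodes_per_bit    # trigger bit -> farm nodes configured for it
        self.crates_per_bit = crates_per_bit  # trigger bit -> crates to read out
        self.free = {}                        # farm node -> free buffers last reported
        self.routed = {}                      # event number -> chosen farm node
        self.triggers_enabled = True

    def report_free(self, node, free_buffers):
        """A farm node reports how many event buffers it has free."""
        self.free[node] = free_buffers
        if free_buffers > 0:
            self.triggers_enabled = True      # room again: triggers may resume

    def route(self, event_number, bit):
        """Choose a node for this event; return (node, crates) or None if all are full."""
        candidates = [n for n in self.nodes_per_bit[bit] if self.free.get(n, 0) > 0]
        if not candidates:
            self.triggers_enabled = False     # every eligible node is full
            return None
        node = max(candidates, key=lambda n: self.free[n])
        self.free[node] -= 1                  # one buffer is now reserved for this event
        self.routed[event_number] = node
        # The real RM sends the node choice to the SBCs and the crate list to the
        # node, both keyed by event number; this sketch just returns them.
        return node, self.crates_per_bit[bit]
```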