Interfacing to Front-End Links
Muon Week, June 2015
L. Levinson, for the FELIX group
FELIX
Interface several GBT links to a high-bandwidth industry-standard network technology – today: Ethernet (40 Gb/s) or Infiniband (56 Gb/s) – then route logical data flows to/from different off-detector endpoints.
• Separates GBT technology into a standard, fixed, but configurable, building block for several detectors.
• FELIX does not need to know the running state of the experiment or detailed data formats – high reliability: always running.
• Enables different logical data flows to be handled by different off-detector endpoints – in the past this was done with separate physical interconnects, e.g. CANbus.
• Does not replace the ROD:
  – Except for GBT link handling, all ROD functions remain
  – ROD complexity is reduced: hope for a SW ROD
• Reduces the amount of custom HW in ATLAS in favour of COTS HW and SW.
FELIX data paths
• BUSY
  – Backpressure from network
  – BUSY signal from FE (optional)
• Data
  – Configurable packet routing
  – Data multiplexing with sampling
• DCS
  – Support HDLC for the SCA
• TTC
  – Fixed latency
  – GBT clock derived from LHC clock
• Commands
  – Able to address each E-link individually
(Diagram: FELIX sits between the on-detector "L1DDC" GBT links and the off-detector "ROD" endpoints.)
Guiding principles
• Goal: a GBT-to-LAN "switch" with as little awareness of detector specifics as possible
  – No user code in FELIX: use configuration params for options
• Separate data movement from data processing
• Configurable data paths from E-link to network endpoint
• Scalable at Front-end, Back-end, and switch levels
• Do as much as possible in software
• COTS FPGA PCIe board (at least for the demonstrator)
• Readout, control and monitoring data, so latency is not important, but provide low-latency links for trigger
• Future-proof by following the COTS market, with separately upgradeable components: GBT, FPGA PCIe board, server PC, network cards
• Routing is for E-links or "streams"; they are the logical connections
What FELIX is, what it is not
What it is
• Its main function is to route data received on serial FE links to different destinations (DAQ, DCS, Monitoring, L1 Trigger processors, …) and conversely to route data from different sources (including TTC) to the FE links
• It should be imagined mainly as a switch with heterogeneous I/O
• It is detector independent
• It is quasi-stateless
  – It will be re-configurable (mainly for its routing algorithms)
  – BUT it will not be tied to any ATLAS state machine state
• Except for HW failures it will always be on, like any other switch
  – Hence it will allow access to FE modules even outside of an ATLAS run

What it is not
• It is not a complete replacement of the detector-specific RODs
  – All detector-specific data processing or control functionality will be implemented downstream of FELIX, in code or HW under the responsibility of detector experts
• FELIX will not touch the detector data
  – It will only recognise the different streams and fragments in order to route them to the relevant destinations
GBT in one slide
• GBT is NOT intended as a point-to-point multi-gigabit link: its goal is to aggregate many (up to 42) low-speed links (as low as 80 Mb/s), called E-links, onto one 4.8 Gb/s optical link with strong error correction for a high-radiation environment (payload is 3.2 Gb/s of the 4.8 Gb/s line speed)
  – i.e. data from many front-end ASICs are collected onto one link
  – As well as event data, links can be used for DCS, TTC, calibration, configuration
• Fixed latency
• The GBTx (rad-tol) ASIC is intended for the Front end; compatible FPGA code is provided for the Back end
• A set of rad-tol opto-electrical converters is provided for interfacing to fibers
• E-links: bidirectional serial links to/from the GBTx ASIC at 80, 160 or 320 Mb/s; each uses 2, 4 or 8 data bits of the 84- (or 116-) bit GBT word
  – The GBT word is clocked at 40 MHz
• Three modes:
  – Normal mode: 84-bit word, 4-bit header, 32 bits of error-correcting code
  – Wide mode: 116-bit word, 4-bit header, NO error correction (not used)
  – "8b/10b" mode: 88-bit data word, 8b/10b-encoded to 110 bits + 10-bit "comma"
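The bandwidth arithmetic on this slide can be checked with a short back-of-envelope sketch (my own illustration, not FELIX code; the split of the 84-bit data field into 80 payload bits plus 4 reserved slow-control bits is an assumption used to reproduce the quoted 3.2 Gb/s figure):

```python
# Back-of-envelope check of the GBT numbers quoted above.

LINE_RATE_GBPS = 4.8          # optical line speed
FRAME_RATE_MHZ = 40           # one GBT word per 40 MHz LHC clock tick
FRAME_BITS = round(LINE_RATE_GBPS * 1000 / FRAME_RATE_MHZ)  # 120 bits/frame

# Normal mode: 4-bit header + 84-bit data field + 32-bit FEC = 120 bits.
# Assumption: 4 of the 84 data bits are reserved, leaving 80 payload bits,
# which reproduces the "3.2G of 4.8G" payload figure on the slide.
payload_bits = 80
payload_gbps = payload_bits * FRAME_RATE_MHZ / 1000.0

# E-links take 2, 4 or 8 bits of each 40 MHz word:
elink_rates_mbps = {bits: bits * FRAME_RATE_MHZ for bits in (2, 4, 8)}

print(FRAME_BITS)        # 120
print(payload_gbps)      # 3.2
print(elink_rates_mbps)  # {2: 80, 4: 160, 8: 320}
```

Note how the 2/4/8-bit allocations directly give the 80/160/320 Mb/s E-link rates quoted above, and 42 two-bit E-links fill the 84-bit word.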
Routing
• Simple/fixed routing:
  – FELIX port # / E-link range → IP address / port
  – or → Infiniband local identifier / queue pair
• E-link packet protocol: allows several logical streams to use the same E-link, with a header carrying stream-id + packet length (if not fixed-length or framed packets)
  – FELIX port # / stream-id → IP address / port …
• Configured via a TCP control port on the main or a separate network; FELIX will run as "infrastructure" in ATLAS Run Control
• Stream-ID can include low bits of the L1ID for round-robin load balancing
• Cloning, broadcasts / multicasts possible, in either direction
• Reliable transport from FELIX to the network endpoint via TCP/IP
• Configurable Quality of Service per logical connection
  – Less-than-best-effort for event data for monitoring
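The static routing and the round-robin variant described above can be sketched as a simple table lookup (an illustration only, not FELIX code; the table entries, addresses and helper names are invented):

```python
# Static routing: (FELIX port, stream-id) -> network endpoint.
routing_table = {
    (0, 0): ("10.0.0.5", 5000),   # e.g. event data -> SW ROD host
    (0, 1): ("10.0.0.9", 6000),   # e.g. DCS traffic -> DCS server
}

def route(felix_port, stream_id):
    """Look up the configured destination for a stream; streams,
    not physical links, are the logical connections being routed."""
    return routing_table[(felix_port, stream_id)]

def route_round_robin(felix_port, l1id, endpoints):
    """If the stream-id includes low bits of the L1ID, consecutive
    L1As spread round-robin over several endpoints."""
    return endpoints[l1id % len(endpoints)]

print(route(0, 1))                                  # ('10.0.0.9', 6000)
print(route_round_robin(0, 1234, ["A", "B", "C"]))  # 1234 % 3 = 1 -> 'B'
```

Reconfiguring the table (via the TCP control port, per the slide) is all that is needed to redirect a stream to a spare or additional endpoint.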
Every stream has an IP address:port
ROD functions
• ROD functions like event formatting, monitoring and detector control will NOT be implemented in FELIX HW
• FELIX will forward data sent by the detector FE to separate network-connected units (either PCs or custom systems) running detector-specific code
• Notable differences with the present approach:
  – ROD FE input is Ethernet or Infiniband; this enables a SW ROD
  – Different ROD functions can be implemented in different HW boxes
    • E.g. one could have a single controlling workstation for a complete TTC partition while having several units for data handling
  – Data may be aggregated into data-handling units by L1ID rather than by geographical origin
    • E.g. unlike present RODs, data-handling units may receive all fragments for a certain L1ID but not all L1IDs
TTC format for the Front End
• FELIX will interface to TTC and copy TTC information into an E-link
  – Sent to all E-links with the "TTC" attribute, including E-links to the network
  – Fixed latency for E-links to the Front End (not fixed to network endpoints)
• TTC E-link options for Phase 1:

  option | width  | bit 0 | bit 1  | bit 2 | bit 3    | bit 4    | bit 5    | bit 6    | bit 7
  0      | 2 bits | L1A   | B-chan |       |          |          |          |          |
  1      | 4 bits | L1A   | BCR    | ECR   | B-chan   |          |          |          |
  2      | 4 bits | L1A   | BCR    | ECR   | Brcst[2] |          |          |          |
  3      | 8 bits | L1A   | BCR    | ECR   | Brcst[2] | Brcst[3] | Brcst[4] | B-chan   |
  4      | 8 bits | L1A   | BCR    | ECR   | Brcst[2] | Brcst[3] | Brcst[4] | Brcst[5] | Brcst[6]

• For options 0, 1 & 3, the destination (ASIC) must decode the B-channel
• The wider fields may be defined as multiple 2- or 4-bit E-links at lower bit rates
• Note: the E-link clock can be 40 MHz, but, for example, the 4-bit field can be received at 160 Mb/s if the receiver generates a ×4 multiple of the 40 MHz E-link clock
• Phase 2 requirements under discussion. Caveat: L1A no longer has fixed latency; L1A is defined by a BCID or L0ID which must be sent as a message. L0A does have fixed latency
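As a hedged sketch of how a front-end ASIC might pick apart one of these options, here is an illustration of option 1 (4-bit field carrying L1A, BCR, ECR and one B-channel bit per 40 MHz word); the bit ordering is my assumption for illustration, not defined by the slide:

```python
def decode_ttc_option1(word4):
    """Decode one 4-bit TTC word (option 1).
    Assumed layout: bit 0 = L1A, bit 1 = BCR, bit 2 = ECR,
    bit 3 = one B-channel bit (the B-channel message itself is
    deserialised over many words by the receiving ASIC)."""
    return {
        "L1A":   bool(word4 & 0x1),
        "BCR":   bool(word4 & 0x2),
        "ECR":   bool(word4 & 0x4),
        "Bchan": bool(word4 & 0x8),
    }

# A word with L1A and BCR set:
print(decode_ttc_option1(0b0011))
# {'L1A': True, 'BCR': True, 'ECR': False, 'Bchan': False}
```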
ROD BUSY
FELIX will interface to present and future BUSY networks.
• FELIX will assert RODBUSY if any of its input buffers becomes almost full
• FE boards can raise RODBUSY via E-links if they choose to do so:
  – Front-end devices may send a BUSY-ON or BUSY-OFF command on a dedicated stream with the BUSY attribute, or embed BUSY-ON/OFF 8b/10b symbols in any 8b/10b stream
• Devices downstream from FELIX (e.g. a ROD) are still able to assert the usual ATLAS hardware RODBUSY, if desired
  – Note that the ROD–FELIX connection is TCP/IP, which has flow control, and that FELIX buffers are in PC memory, which is large
• Other devices may request, via TCP, that FELIX assert its RODBUSY for a defined time, or until a de-assert message is sent
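The three BUSY sources above (almost-full input buffers, FE BUSY-ON/OFF, TCP requests) combine as a simple OR; a minimal sketch, with all names and the 90% threshold invented for illustration:

```python
class BusyAggregator:
    """Assert RODBUSY if any buffer is almost full, any FE has an
    outstanding BUSY-ON, or any network client has requested BUSY."""

    def __init__(self, almost_full_threshold=0.9):
        self.threshold = almost_full_threshold
        self.fe_busy = set()      # E-links with BUSY-ON and no BUSY-OFF yet
        self.requested = set()    # network clients requesting BUSY via TCP
        self.buffer_fill = {}     # buffer id -> fill fraction (0.0 .. 1.0)

    def busy(self):
        buffers_full = any(f >= self.threshold
                           for f in self.buffer_fill.values())
        return buffers_full or bool(self.fe_busy) or bool(self.requested)

agg = BusyAggregator()
agg.buffer_fill["gbt0"] = 0.5
print(agg.busy())           # False: nothing is busy
agg.fe_busy.add("elink-7")  # FE sends BUSY-ON on a BUSY-attribute stream
print(agg.busy())           # True
agg.fe_busy.discard("elink-7")
agg.buffer_fill["gbt0"] = 0.95
print(agg.busy())           # True: input buffer almost full
```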
FELIX block diagram
Split the routing problem:
• Receive and separate GBT E-link streams in an FPGA
• Route the streams to the network in a PC
FELIX Demonstrator
Test results presented in the Preliminary Design Review on 29 May.

Hardware
• Commercial FPGA PCIe board: HiTech Global HTG-710 (Virtex-7 690T)
  – Custom TTC receiver and clock & data recovery mezzanine card
• Intel server PC
• Commercial network card: Mellanox dual-port 40 Gb/s Ethernet or Infiniband

Firmware
• GBT and TTC firmware
• Central Router
• FPGA PCIe DMA core
• Configuration, status and housekeeping

Software
• PCIe driver based on the new ROBINnp; other ROBIN SW
• Software pipelines for building packets and routing them to the desired network endpoints
Virtex-7 PCIe board
From HiTech Global: PCIe gen 3, 8 lanes; Virtex-7 690T; 2×12 bidirectional CXP connectors; FMC
TTC FMC: fiber in, clock & data recovery, jitter cleaner, BUSY out
Demonstrator status
• For Phase 1 requirements only
  – No low latency, no scheduled transfers, so far no full-mode
  – Includes TTC distribution
• The following components were tested:
  – 16 GBT cores built with the proper clock structure
  – Central E-link router from FE to PCIe
    • Fake data generator to drive data from the Router through to the PC
  – PCIe DMA for data, and programmed transfers for config (now in OpenCores)
  – Linux PCIe driver for the HiTech Global board, based on ROBINnp
  – FELIX "application": E-link data capture and transmission out via TCP
    • Work done on optimizing the reading of E-link buffers and transmit to network
  – First integration with ATLAS run control
• Successful tests with the 4-GBT design (16 coming soon):
  – Full path from fiber input to the FELIX PC
  – L1 Accept from TTCvi to FELIX and then, via fiber, to a Front-end emulator FPGA
  – From the GBT source emulator in the FPGA to a PC on the network
Test of the data path
• GBT Driver inside the FELIX FPGA
• Loop-back inside the emulated FE
• Route through the Central Router and DMA to host memory
• Transfer over the network to a data-sink PC
Next:
• Scale the system
• Optimize performance
Reception of L1A by Front End emulator
PCIe Packet Processing in the PC
Work in progress: Integration with TDAQ framework
FELIX development team
Argonne
• John Anderson
• Soo Ryu
• Jinlong Zhang

Brookhaven
• Hucheng Chen
• Kai Chen
• Francesco Lanni

CERN
• Benedetto Gorini
• Markus Joos
• Giovanna Lehmann
• Jorn Schumacher
• Wainer Vandelli

Nikhef
• Andrea Borga
• Henk Boterenbrood
• Frans Schreuder
• Jos Vermeulen

Weizmann
• Lorne Levinson
• Julia Narevicius
• Alex Roich
FELIX schedule
• FELIX Preliminary Design Review — 29 May 2015
• Definition of strategy for final PCIe board for Phase 1 users — Summer 2015
• FELIX demonstrator ready for tests with detectors, incl. NSW Vertical Slice — Oct 2015
• FELIX Phase-1 FDR — May 2016
Backup slides
Integrating GBTs to a LAN
• FELIX separates data movement from data processing
  – Data movement: by common hardware
  – Data processing: software specific to each sub-detector
• Processing functions can now be separated, physically or just logically, and often implemented as software
• COTS, scalable, flexible, easily upgradeable, …
GBT data flow
• The GBT chipset: many logical data paths are combined on a single bidirectional link
  – event data, DCS, TTC, configuration and alignment
• Advantage: reduces the complexity of connections to the on-detector electronics
• Disadvantage: the different logical streams end up at a single off-detector point
  – This point is then required to understand which data belongs to which off-detector entity and to route it accordingly
• This encourages the bad practice of combining all the logically separate services into one off-detector hardware device, e.g. the ROD
  – and forces the ROD to meet the highest availability and reliability requirements
"Full" mode
• This is NOT one of the GBTx modes (normal, wide, 8b/10b)
• Need to handle single links at full bandwidth, up to 9.6 Gb/s
  – e.g. links from FPGAs in USA15
• "Full" mode is a single E-link at up to 9.6 Gb/s link speed:
  – uses 8b/10b encoding
  – Start-of-Packet, End-of-Packet and stream IDs define packets and their routing; like normal E-links, streams can be routed to different network endpoints
  – supports BUSY-ON, BUSY-OFF symbols embedded in the data flow
• The opposite direction can be standard GBT links with several E-links
  – For L1Calo, 24 E-links in one downlink will carry XON/XOFF for 24 uplinks
• Note: the GBTx ASIC "8b/10b" mode allows only data; command symbols, e.g. Start-of-Packet, End-of-Packet, BUSY-ON, BUSY-OFF, cannot be inserted into the data flow
  – So far, FELIX does not support GBTx "8b/10b" mode
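The framing described above (SOP/EOP control symbols delimiting packets, a stream ID for routing, BUSY symbols allowed anywhere in the flow) can be sketched symbolically. This is an illustration only: real full-mode uses 8b/10b K-characters, whereas here plain tokens stand in for already-decoded symbols, and the "first byte after SOP is the stream-id" layout is my assumption:

```python
def parse_fullmode(tokens):
    """tokens: mix of control strings ('SOP', 'EOP', 'BUSY-ON',
    'BUSY-OFF') and data bytes. Returns (packets, busy_events);
    each packet records its stream-id so it can be routed to its
    own network endpoint."""
    packets, busy_events, current = [], [], None
    for t in tokens:
        if t == "SOP":
            current = {"stream": None, "data": []}
        elif t == "EOP":
            packets.append(current)
            current = None
        elif t in ("BUSY-ON", "BUSY-OFF"):
            busy_events.append(t)       # may be embedded anywhere
        elif current is not None:
            if current["stream"] is None:
                current["stream"] = t   # first byte after SOP
            else:
                current["data"].append(t)
    return packets, busy_events

pkts, busy = parse_fullmode(["SOP", 3, 0xAA, 0xBB, "EOP", "BUSY-ON"])
print(pkts)   # [{'stream': 3, 'data': [170, 187]}]
print(busy)   # ['BUSY-ON']
```

This also makes the last bullet concrete: in GBTx "8b/10b" mode there is no place for the SOP/EOP/BUSY symbols, which is why FELIX does not support it so far.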
Low latency path to Level-1 trigger
• The meaning of "low latency" must come from the requirements
• Simple data copying from one input fiber to an adjacent output fiber would take little more than the de-serialization/serialization stages
  – About 10 clocks
• More complex functionality like data selection, extraction and/or data aggregation would imply longer latency
  – Actual numbers need to be evaluated once the functionality is defined
• Data aggregation would necessarily introduce jitter in the latency
  – Fixed latency cannot be guaranteed unless fixed word slots in the output are dedicated to each input stream, i.e. pre-allocated bandwidth
• Note that FELIX does not buffer data and should not be expected to time-align data from different FE links
• To maintain FELIX detector-independence, any data selection must be based only on information contained in the fragment header
Scheduled data transfers
Feedback on requirements from Pixels is requested on this feature. It is by no means final.
• There is a need (calibration scenarios) to transmit to front ends a series of data items with guaranteed delays (in BCs) between the data items, e.g. a test pulse followed by a readout after a specific delay
• FELIX will accept lists of [data, relative delay], with 0 ≤ delay < 2^12 BCs
• The list is activated either by command via the network, or by one of the system bits in the 8-bit TTC broadcast packet
  – The latter provides synchronicity of all FELIX boxes, or with the TTC system
• One list per GBT; all are activated at the same BC
• (It is required to set up FE-ASIC-specific parameters prior to list activation)
• Streams can be selectively enabled for a particular activation by a mask
• Waiting at a point in the list could be implemented, but is it required, instead of waiting for a specified delay before the next item is sent?
  – i.e. is the worst-case delay short enough?
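The [data, relative delay] lists above can be sketched as follows (an illustration under stated assumptions, not FELIX code: I assume each delay is relative to the previous item, and the function and item names are invented):

```python
MAX_DELAY_BC = 2 ** 12  # each relative delay must fit in 12 bits

def schedule(items, activation_bc=0):
    """items: list of (data, relative_delay_bc) pairs.
    Expands relative delays into absolute bunch-crossing numbers,
    counted from the activation BC (set by a network command or a
    system bit in the 8-bit TTC broadcast)."""
    out, bc = [], activation_bc
    for data, delay in items:
        assert 0 <= delay < MAX_DELAY_BC, "delay must fit in 12 bits"
        bc += delay
        out.append((bc, data))
    return out

# A test pulse at activation, then a readout command 100 BCs later:
print(schedule([("test-pulse", 0), ("readout-cmd", 100)],
               activation_bc=5000))
# [(5000, 'test-pulse'), (5100, 'readout-cmd')]
```

Since all lists are activated at the same BC, every GBT sees the same absolute schedule, which is what gives the synchronicity claimed above.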
Note: E-link packets are NOT network packets; TCP is a byte stream.
We can consider 66b/64b encoding: reserve, e.g., 2 bits for framing: start, end, middle, both.
FELIX block diagram
(Block diagram showing the Firmware, Software and TTCfx HW components.)
Error handling
• Just beginning the considerable thought that needs to be given here.
• Two issues:
  – Discovery of corrupted data
  – Surviving errors without losing event boundaries
    • This requires some effort on the front ends
• Corruption within a GBT 80-bit word is detected by the GBT's FEC
• It is hard to believe E-links can be mixed into each other, so expect errors within E-link channels: repeated clocks, lost clocks
• If E-links use 8b/10b with packet framing codes, fixed length, or variable length with counts, then errors (though not all) can be detected
• FELIX can flag these errors in the header it sends with a stream's data
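A hedged sketch of the simplest of these checks: with fixed-length packets on an E-link, a payload whose length is not a whole number of packets reveals lost or repeated clocks (this catches many, but by no means all, errors; the function is invented for illustration):

```python
def check_fixed_length(payload_bits, packet_len_bits):
    """Return (n_complete_packets, error_flag). A remainder means
    bits were lost or duplicated somewhere on the E-link; FELIX
    would flag this in the header it sends with the stream's data."""
    n, rem = divmod(payload_bits, packet_len_bits)
    return n, rem != 0

print(check_fixed_length(320, 32))  # (10, False): clean framing
print(check_fixed_length(322, 32))  # (10, True): 2 stray bits, flag it
```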
FELIX Application
(Pipeline diagram: Read FELIX data → Block Decode → Statistics / Extract Meta Information → Route → Send To Network, with Load Balancing and Error Handling; the FELIX Card exchanges Configuration, Commands, Data and Monitoring with the application.)
• Interfaces FELIX with the external world: configuration, monitoring, DCS, …
• Primary goal: efficiently process and route detector data
(The pipeline shown is the final pipeline.)
FELIX Application: current dataflow
(Pipeline diagram, current: Read Blocks from File → Descriptors → Block Decode → Statistics / Set Fixed Destination → Send to network. Network layer options: epoll, boost::asio, asyncmsg, FELIX netio — a wrapper that can be extended (e.g. Infiniband support) — over the kernel and the Mellanox NIC.)
• Developing functional building blocks for the baseline pipeline:
  – data decoding
  – simple routing
  – network I/O
• Simple benchmarking so far
  – performance reach on the incoming hardware
• Looking forward to the integration with the FELIX card
Advantages
• Allows the end-point builder to concentrate on the end-point functionality
• The off-detector endpoints need not implement GBT hardware interfaces; this means they may not need to include FPGAs
• A significant part of the ROD logic would be shared and maintained centrally
• Rerouting data from a failed off-detector endpoint to a spare, or load sharing by adding off-detector endpoints, is easily done by reconfiguring the routing tables in FELIX
  – Enables an easily scalable readout system
• Off-detector endpoints do not need to be mapped one-to-one to the geographical areas serviced by a GBT
• Can upgrade to a faster GBT or a faster COTS network independently of the ROD
• Any GBT can be connected to any FELIX: just reconfigure
GBT Streams
• Goal: accommodate different kinds of data packets on the same GBT E-link
  – i.e. different "kinds" can be routed to different end-points
  – An E-link is typically a dedicated physical link to a Front End chip and needs to carry different kinds of traffic, e.g. event data, monitoring values, configuration
• Eliminate, or at least drastically reduce, the need for detector-specific (custom) firmware
• Requires a minimal protocol for GBT input data
  – Hopefully we can find a protocol with a reasonable set of options acceptable to many detectors
  – Document (first attempt): https://twiki.cern.ch/twiki/pub/Atlas/GBT2LAN/MappingGBT_DataToNetworkEndpoints_V3.pdf
• Added concepts:
  – channels = "bonded" E-links
  – streams = a series of packets for a specific destination; packets from several streams can flow in the same channel
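The stream concept above amounts to demultiplexing by stream-id: packets from several streams arrive interleaved on one channel and are separated so each stream can go to its own endpoint. A minimal sketch (packet layout and names invented for illustration):

```python
def demux(channel_packets):
    """channel_packets: list of (stream_id, payload) in arrival
    order on one channel. Returns {stream_id: [payloads]},
    preserving per-stream ordering."""
    streams = {}
    for sid, payload in channel_packets:
        streams.setdefault(sid, []).append(payload)
    return streams

# Event data (stream 1) interleaved with DCS traffic (stream 2):
mixed = [(1, "evt-a"), (2, "dcs-x"), (1, "evt-b")]
print(demux(mixed))  # {1: ['evt-a', 'evt-b'], 2: ['dcs-x']}
```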
ITK GBT E-link packet
• 26-bit header, 34-bit payload
• Type field specifies the type of payload:
  – R3
  – L1 data: 1-BC packet
  – L1 data: 3-BC packet
  – Register data
Dynamic routing: Load balancing an event streamby means of multiple end-points
• The baseline does not offer this possibility, but it could be done if justified.
• To clarify whether dynamic routing on the basis of L1ID is needed or desirable, first consider a subdetector that produces fragments all of about the same size; the calorimeter pre-processor probably will do this. Then the amount of data to be transferred per E-link per L1A is about fixed, and the variability in the amount of data transferred per unit of time is determined only by the variability of the L1A rate. This is the same for all E-links if everything is working correctly, so to my mind there is no advantage in using dynamic routing to achieve load balancing of the receiving nodes, provided the work to be done per fragment does not depend on its contents: the load balancing can be done entirely on the basis of a configurable static assignment of E-links to server nodes.
• For variable-size fragments the amount of data to be transferred per E-link will fluctuate, but fluctuations in the average amount of data per unit of time again will be determined only by the variability of the L1A rate, which is the same for all E-links if everything is working correctly. With adequate buffering, instantaneous fluctuations in fragment size can be smoothed, so load balancing for variable-size fragments can also be done entirely on the basis of a configurable static assignment of E-links to server nodes.
FELIX
• FELIX was approved by the Upgrade Steering Group as the readout architecture for the ATLAS Phase-2 Upgrade
• It also approved its use in Phase-1, for:
  – New Small Wheel
  – Level-1 Calo trigger
  – Liquid Argon trigger
• Working Group reports:
  – Recommendations of the ATLAS Readout Upgrade Working Group for upgrade of the Readout Architecture: https://edms.cern.ch/document/1311772/3
  – Software ROD: Report of the New Small Wheel ROD Working Group: https://edms.cern.ch/document/1354354/1
END
Thanks to Dave Sankey