Running monitoring applications on accelerated capture engines
Running Monitoring Applications on Accelerated Capture Engines
Nicola Bonelli
N. Bonelli, R.G. Garroppo, L. Gazzarrini, S. Giordano, G. Procissi, F. Russo, G. Volpi
Agenda
• Capture engines overview
• What's new in PFQ (2.0)
• Accelerated pcap library
  – PF_RING, PF_RING+DNA, NETMAP, PFQ
• Pcap-perf: a tool for benchmarking pcap apps
• Experimental results
Speed matters…
Accelerated Capture Engine
• Linux is provided with a default capture engine: the PF_PACKET socket
• Because of speed, other capture engines emerged:
  – 2004: PF_RING
    • designed for a single core, better performance than the PF_PACKET of the time
  – 2011: PFQ
    • first to address multi-core architectures and multi-queue NICs (Best Paper Award @PAM2012)
  – 2012: PF_RING-DNA
    • accelerated drivers (Intel)
  – 2012: NetMap
    • accelerated drivers (Intel, Broadcom) (Best Paper Award @Usenix ATC'12)
… but what happens on these tracks?
What's new in PFQ 2.0
• From capture engine to monitoring framework…
• Improved performance
  – ~14.8 Mpps with a single user-space thread
• Improved features:
  – compliant with a plethora of NICs: pfq-omatic
  – monitoring groups and classes
  – in-kernel extensible engine for packet steering: dispatching, copying, cloning, filtering
  – native bindings: C, C++11, Haskell (more to come)
  – per-group filtering: BPF, vlan (un-tagging)
  – pcap library
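To make the idea of packet steering concrete, here is a minimal user-space sketch of dispatching packets to one of several sockets by IPv4 address. It is not PFQ's in-kernel code: the `ipv4_pair` type, the `steer_by_ipv4` function, and the choice of a symmetric XOR key with a MurmurHash3-style bit mixer are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical packet key: only the two IPv4 addresses, in host byte order.
struct ipv4_pair
{
    std::uint32_t src;
    std::uint32_t dst;
};

// Bit mixer (the 32-bit finalizer used by MurmurHash3), so that nearby
// addresses do not all land on the same socket.
inline std::uint32_t mix32(std::uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

// Choose the index of the socket that receives this packet.
// XOR-ing the addresses makes the key symmetric: both directions of a
// conversation are steered to the same socket (an assumption of this sketch).
inline std::size_t steer_by_ipv4(ipv4_pair p, std::size_t n_sockets)
{
    assert(n_sockets > 0);
    return mix32(p.src ^ p.dst) % n_sockets;
}
```

With four sockets in a group, `steer_by_ipv4({a, b}, 4)` and `steer_by_ipv4({b, a}, 4)` return the same index, which keeps per-conversation state on a single consumer thread.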
Feature comparison
Feature | PF_PACKET | PF_RING 5.x | PF_RING-DNA | NETMAP - 0813 | PFQ 2.0
NIC | * | *, PF-AWARE (Intel, Broadcom) | only Intel 1/10G | Intel 1/10G, forcedeth | * accelerated
Driver compat. | * | yes, non accel. | no | no | yes, dynamic
multi-core | - | Hardware (RSS) | Hardware (RSS) | Hardware (RSS) | Hw RSS + soft
multi-queue | yes (poor) | yes | yes | yes | yes
native binding | C | C | C | C | C, C++11, Haskell, Java, Python
groups | - | - | - | - | yes
class | - | - | - | - | yes
concurrent mon. | yes | yes | commercial? | - | yes
clustering | - | yes | - | - | yes (MT, group)
steering | - | - | commercial | - | yes (MT, group)
STM state | - | - | - | - | work in progress
Feature comparison
Feature | PF_PACKET | PF_RING 5.x | PF_RING-DNA | NETMAP - 0813 | PFQ 2.0
Pcap library | yes | yes | yes | buggy/incomplete | yes
BPF (filters) | yes (MT) | yes (MT) | yes (user-space) | - | yes (MT, group)
vlan filters | - | yes | yes (hw Intel) | - | yes (MT, group)
vlan untagging | - | - | - | - | yes (MT, soft.)
Intel hw filters | - | yes | yes | - | no
bloom filters | - | - | - | - | work in progress
Accelerated PCAP library
• The pcap library is the de facto standard interface for packet capture
• Accelerated capture engines provide their own pcap library:
  – Both PF_RING and PF_RING-DNA provide a complete accelerated version
  – NetMap provides experimental and incomplete pcap support
    • BPF is missing
• PFQ provides a complete implementation
  – PFQ C-API mapped over the pcap interface wherever possible, implemented as environment variables otherwise
  – Clustering is enabled by specifying multiple NICs in colon-separated fashion, steering by means of the PFQ_STEER variable

PFQ_GROUP=10 PFQ_STEER=ipv4-addr tcpdump -n -i eth2:eth3
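The command above passes options that have no pcap equivalent through the environment and encodes several NICs in one device string. A minimal sketch of how a pcap back end might decode both is shown below; `split_devices` and `env_or` are hypothetical helpers written for this illustration and are not taken from the PFQ sources.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Split a device string such as "eth2:eth3" into {"eth2", "eth3"}.
// Empty fields (e.g. from "eth2::eth3") are skipped.
inline std::vector<std::string> split_devices(const std::string &spec)
{
    std::vector<std::string> devs;
    std::istringstream in(spec);
    std::string dev;
    while (std::getline(in, dev, ':'))
        if (!dev.empty())
            devs.push_back(dev);
    return devs;
}

// Read an environment variable (e.g. PFQ_GROUP or PFQ_STEER), returning
// the given fallback when the variable is not set.
inline std::string env_or(const char *name, const std::string &fallback)
{
    assert(name != nullptr);
    const char *value = std::getenv(name);
    return value ? std::string(value) : fallback;
}
```

Under these assumptions, a wrapper would call `split_devices("eth2:eth3")` to obtain the NICs to cluster and `env_or("PFQ_STEER", "")` to decide whether steering was requested.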
Pcap-perf

• Pcap-perf is a C++11 application designed for benchmarking capture engines through pcap interfaces
• Support for multiple threads, BPF filters and plug-ins:

plug-in | kind
Null | packet counter
IP checksum | light CPU computation
MD5 | CPU computation
SHA256 | heavy CPU computation
Bloom Filter | memory (linear)
Protocol Classification | memory tree
TCP/UDP flow counter | memory (std::unordered_set)
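As an illustration of the memory-bound plug-in in the last row, the sketch below counts distinct TCP/UDP flows with a `std::unordered_set`. The `flow_key` layout, the hash, and the `flow_counter` class are assumptions made for this example; the actual pcap-perf plug-in interface is not shown in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

constexpr std::uint8_t proto_tcp = 6;   // IANA protocol number for TCP
constexpr std::uint8_t proto_udp = 17;  // IANA protocol number for UDP

// Hypothetical 5-tuple identifying a flow.
struct flow_key
{
    std::uint32_t src_ip, dst_ip;
    std::uint16_t src_port, dst_port;
    std::uint8_t proto;

    bool operator==(const flow_key &o) const
    {
        return src_ip == o.src_ip && dst_ip == o.dst_ip &&
               src_port == o.src_port && dst_port == o.dst_port &&
               proto == o.proto;
    }
};

// Hash the 5-tuple by packing it into two 64-bit words and combining them.
struct flow_hash
{
    std::size_t operator()(const flow_key &k) const
    {
        const std::uint64_t a = (std::uint64_t(k.src_ip) << 32) | k.dst_ip;
        const std::uint64_t b = (std::uint64_t(k.src_port) << 24) |
                                (std::uint64_t(k.dst_port) << 8) | k.proto;
        std::size_t h = std::hash<std::uint64_t>{}(a);
        h ^= std::hash<std::uint64_t>{}(b) + 0x9e3779b9u + (h << 6) + (h >> 2);
        return h;
    }
};

class flow_counter
{
    std::unordered_set<flow_key, flow_hash> flows_;

public:
    // Record one packet. Callers are expected to pass only TCP or UDP packets.
    void on_packet(const flow_key &k)
    {
        assert(k.proto == proto_tcp || k.proto == proto_udp);
        flows_.insert(k);  // no effect if the flow is already known
    }

    // Number of distinct flows seen so far.
    std::size_t flows() const { return flows_.size(); }
};
```

Each packet costs one hash and one lookup in a table that grows with the number of flows, which is why this kind of workload stresses memory rather than the CPU.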
Test-bed and measurements

• 6-core Intel Xeon X5650 @ 2.67 GHz, 16 GB RAM, Intel 82599 10G NIC (Debian Wheezy)
• Accelerated drivers
  – PF_RING: ixgbe 3.11.33 PF_RING-aware
  – PF_RING-DNA: ixgbe 3.10.16-DNA driver
  – Netmap: ixgbe driver shipped with the netmap package
  – PFQ: Intel ixgbe 3.11.33 vanilla, recompiled through pfq-omatic
• Best interrupt affinity (MSI-X)
  – 4 or 5 kernel threads (NAPI) bound to fixed cores (RSS), 1 or 2 user-space threads bound to other core(s)
• Traffic is generated with randomized IP addresses, 64/128-byte UDP packets
  – using both PF_DIRECT and PF_RING-DNA
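For reference, the packet rates that such traffic can reach on a 10 Gb link follow from simple arithmetic. The sketch below assumes that the 64 and 128 bytes are Ethernet frame sizes including the FCS, and adds the standard per-frame wire overhead; the function name is chosen for this example.

```cpp
#include <cassert>
#include <cstdint>

// Per-frame overhead on the wire: 7-byte preamble, 1-byte start-of-frame
// delimiter and 12-byte inter-frame gap.
constexpr std::uint64_t wire_overhead_bytes = 7 + 1 + 12;

// Maximum number of frames per second on an Ethernet link, given the link
// speed in bits per second and the frame size in bytes (FCS included).
inline std::uint64_t max_frames_per_second(std::uint64_t link_bps,
                                           std::uint64_t frame_bytes)
{
    assert(frame_bytes > 0);
    return link_bps / ((frame_bytes + wire_overhead_bytes) * 8);
}

// max_frames_per_second(10000000000, 64)  -> 14880952 (about 14.88 Mpps)
// max_frames_per_second(10000000000, 128) ->  8445945 (about 8.45 Mpps)
```

Under this assumption, the ~14.8 Mpps quoted for a single PFQ user-space thread corresponds to line rate with minimum-size frames.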
10 Gb link
mascara monsters
Counting packets is useless
(native speed)

    uint64_t counter = 0;
    for (;;)
    {
        counter++;
    }
1 thread user-space (Intel 10G)
pcap library
Pcap library, 1 thread counter
Pcap, 1 thread counter, BPF=udp
Pcap, 1 thread counter, BPF=http || udp
pcap-perf
pcap-perf with BPF = udp
pcap-perf (2 threads)
tcpdump
tcpdump -s 64 -i dev -w /ramdisk/dump.pcap ([email protected])
tcpdump -s 138 -i dev -w /ramdisk/dump.pcap (100M@~8Mpps)
tcpdump -i dev -w /ramdisk/dump.pcap vlan (5 Gbps)
tcpdump -i dev -w /ramdisk/dump.pcap ip host 192.168.0.10 (voip call)
Thanks for the attention!