Running Monitoring Applications on Accelerated Capture Engines

Nicola Bonelli

N. Bonelli, R.G. Garroppo, L. Gazzarrini, S. Giordano, G. Procissi, F. Russo, G. Volpi

PFQ@ 10th Italian Networking Workshop (Bormio)


Agenda

• Capture engines overview
• What's new in PFQ (2.0)
• Accelerated pcap library
  – PF_RING, PF_RING+DNA, NETMAP, PFQ
• Pcap-perf: a tool for benchmarking pcap apps
• Experimental results

Speed matters…

Accelerated Capture Engine

• Linux is provided with a default capture engine
  – the PF_PACKET socket
• For the sake of speed, other capture engines emerged:
  – 2004: PF_RING
    • designed for single core, better performance than the PF_PACKET of the time
  – 2011: PFQ
    • first to address multi-core architectures and multi-queue NICs (Best Paper Award @PAM2012)
  – 2012: PF_RING-DNA
    • accelerated drivers (Intel)
  – 2012: NetMap
    • accelerated drivers (Intel, Broadcom) (Best Paper Award @Usenix ATC'12)

… but what happens on these tracks?

What's new in PFQ 2.0

• From capture engine to monitoring framework…
• Improved performance
  – ~14.8 Mpps with a single user-space thread
• Improved features:
  – compliant with a plethora of NICs: pfq-omatic
  – monitoring groups and classes
  – in-kernel extensible engine for packet steering: dispatching, copying, cloning, filtering
  – native bindings: C, C++11, Haskell (more to come)
  – per-group filtering: BPF, vlan (un-tagging)
  – pcap library
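
The packet steering mentioned in this list can be pictured as a function that maps each packet to one of the sockets of a monitoring group. The following is a minimal C++11 sketch, assuming a policy keyed on the IPv4 addresses; the `Packet` type and the `steer_ipv4_addr` function are hypothetical names used for illustration and are not part of the PFQ API (the real engine runs in-kernel).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified view of a packet: only the fields the sketch needs.
struct Packet
{
    std::uint32_t src_ip;   // IPv4 source address, host byte order
    std::uint32_t dst_ip;   // IPv4 destination address, host byte order
};

// Choose the index of the receiving socket among `n_sockets` (must be > 0).
// XOR is commutative, so both directions of a conversation
// (A->B and B->A) are steered to the same socket.
inline std::size_t steer_ipv4_addr(const Packet &p, std::size_t n_sockets)
{
    std::uint32_t h = p.src_ip ^ p.dst_ip;
    h ^= h >> 16;           // fold the high bits into the low ones
    h *= 0x45d9f3bU;        // multiplicative mixing step
    h ^= h >> 16;
    return static_cast<std::size_t>(h % n_sockets);
}
```

The symmetric hash is a design choice for this sketch: a flow-oriented monitoring application usually wants to see both directions of a connection on the same thread.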

Feature comparison

Feature          | PF_PACKET  | PF_RING 5.x                   | PF_RING-DNA      | NETMAP - 0813          | PFQ 2.0
NIC              | *          | *, PF-AWARE (Intel, Broadcom) | only Intel 1/10G | Intel 1/10G, forcedeth | * accelerated
Driver compat.   | *          | yes, non accel.               | no               | no                     | yes, dynamic
multi-core       | -          | Hardware (RSS)                | Hardware (RSS)   | Hardware (RSS)         | Hw RSS + soft
multi-queue      | yes (poor) | yes                           | yes              | yes                    | yes
native binding   | C          | C                             | C                | C                      | C, C++11, Haskell, Java, Python
groups           | -          | -                             | -                | -                      | yes
class            | -          | -                             | -                | -                      | yes
concurrent mon.  | yes        | yes                           | commercial ?     | -                      | yes
clustering       | -          | yes                           | -                | -                      | yes (MT, group)
steering         | -          | -                             | commercial       | -                      | yes (MT, group)
STM state        | -          | -                             | -                | -                      | work in progress

Feature comparison

Feature          | PF_PACKET | PF_RING 5.x | PF_RING-DNA      | NETMAP - 0813    | PFQ 2.0
Pcap library     | yes       | yes         | yes              | buggy/incomplete | yes
BPF (filters)    | yes (MT)  | yes (MT)    | yes (user-space) | -                | yes (MT, group)
vlan filters     | -         | yes         | yes (hw Intel)   | -                | yes (MT, group)
vlan untagging   | -         | -           | -                | -                | yes (MT, soft.)
Intel hw filters | -         | yes         | yes              | -                | no
bloom filters    | -         | -           | -                | -                | work in progress

Accelerated PCAP library

• The pcap library is the de facto standard interface for packet capture
• Accelerated capture engines provide their own pcap library:
  – Both PF_RING and PF_RING-DNA provide a complete accelerated version
  – NetMap provides experimental and incomplete pcap support
    • BPF is missing
• PFQ provides a complete implementation
  – PFQ C-API mapped over the pcap interface wherever possible, implemented as environment variables otherwise
  – Clustering is enabled by specifying multiple NICs in colon-separated fashion, steering by means of the PFQ_STEER variable

PFQ_GROUP=10 PFQ_STEER=ipv4-addr tcpdump -n -i eth2:eth3
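
To make the mapping concrete, here is a minimal C++11 sketch of how a pcap wrapper might interpret a colon-separated device string together with the two environment variables used in the command above. `PFQ_GROUP` and `PFQ_STEER` come from the slide; the `CaptureConfig` struct, the helper names and the default values are assumptions made for illustration, not the actual PFQ pcap implementation.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical configuration assembled from the device string and the environment.
struct CaptureConfig
{
    std::vector<std::string> devices;  // e.g. {"eth2", "eth3"}
    int group;                         // value of PFQ_GROUP, -1 if unset
    std::string steer;                 // value of PFQ_STEER, empty if unset
};

// Split "eth2:eth3" into {"eth2", "eth3"}; empty fields are skipped.
inline std::vector<std::string> split_devices(const std::string &spec)
{
    std::vector<std::string> out;
    std::istringstream in(spec);
    std::string dev;
    while (std::getline(in, dev, ':'))
        if (!dev.empty())
            out.push_back(dev);
    return out;
}

// Build the configuration the way a pcap wrapper could when a device is opened.
inline CaptureConfig make_config(const std::string &device_spec)
{
    CaptureConfig cfg;
    cfg.devices = split_devices(device_spec);

    const char *group = std::getenv("PFQ_GROUP");
    cfg.group = group ? std::atoi(group) : -1;

    const char *steer = std::getenv("PFQ_STEER");
    cfg.steer = steer ? steer : "";
    return cfg;
}
```

Passing options through the environment keeps existing pcap applications such as tcpdump unmodified, which is the point of the approach described on the slide.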

Pcap-perf

• Pcap-perf is a C++11 application designed for benchmarking capture engines through pcap interfaces
• Support for multiple threads, BPF filters and plug-ins:

plug-in                 | kind
Null                    | packet counter
IP checksum             | light CPU computation
MD5                     | CPU computation
SHA256                  | heavy CPU computation
Bloom Filter            | memory (linear)
Protocol Classification | memory tree
TCP/UDP flow counter    | memory (std::unordered_set)
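
As an illustration of the "light CPU computation" class, the IP checksum plug-in can be thought of as recomputing the IPv4 header checksum (the ones' complement sum of RFC 1071) for each packet. This is a minimal sketch; the function name `ipv4_checksum` is hypothetical, and pcap-perf's actual plug-in interface is not shown in the slides.

```cpp
#include <cstddef>
#include <cstdint>

// Ones' complement checksum (RFC 1071) over `len` bytes.
// For an IPv4 header, `len` is the header length (20 bytes without options).
// With the checksum field zeroed, the result is the value to store in the header;
// over a header that already carries a valid checksum, the result is 0.
inline std::uint16_t ipv4_checksum(const std::uint8_t *data, std::size_t len)
{
    std::uint32_t sum = 0;

    // Add the data as big-endian 16-bit words.
    for (std::size_t i = 0; i + 1 < len; i += 2)
        sum += (static_cast<std::uint32_t>(data[i]) << 8) | data[i + 1];

    // A trailing odd byte is padded with a zero byte.
    if (len & 1)
        sum += static_cast<std::uint32_t>(data[len - 1]) << 8;

    // Fold the carries back into the low 16 bits.
    while (sum >> 16)
        sum = (sum & 0xFFFFU) + (sum >> 16);

    return static_cast<std::uint16_t>(~sum & 0xFFFFU);
}
```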

Test-bed and measurements

• Intel Xeon 6 cores X5650 @2.67GHz, 16 GB RAM + Intel 82599 10G (Debian Wheezy)
• Accelerated drivers
  – PF_RING: ixgbe 3.11.33 PF_RING-aware
  – PF_RING-DNA: ixgbe 3.10.16-DNA driver
  – Netmap: ixgbe driver shipped with the netmap package
  – PFQ: Intel ixgbe 3.11.33 vanilla, recompiled through pfq-omatic
• Best interrupt affinity (MSI-X)
  – 4 or 5 kernel threads (NAPI) bound to fixed cores (RSS), 1 or 2 user-space threads bound to other core(s)
• Traffic is generated with randomized IP addresses, 64/128-byte UDP packets
  – using both PF_DIRECT and PF_RING-DNA

[Figure: test-bed diagram; labels: mascara, monsters, 10 Gb link]
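
For reference, the theoretical packet rate of the 10 Gb link at the two packet sizes follows from simple arithmetic: every Ethernet frame occupies an extra 20 bytes on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap). A small sketch, assuming the 64/128 bytes quoted above are frame sizes including the FCS; the function name is hypothetical.

```cpp
#include <cstdint>

// Maximum frames per second on an Ethernet link.
// link_bps:    link speed in bits per second (10 Gb/s = 10000000000)
// frame_bytes: frame size including the 4-byte FCS (minimum 64)
inline std::uint64_t line_rate_pps(std::uint64_t link_bps, std::uint64_t frame_bytes)
{
    const std::uint64_t overhead = 7 + 1 + 12;   // preamble + SFD + inter-frame gap
    return link_bps / ((frame_bytes + overhead) * 8);
}
```

Under these assumptions, `line_rate_pps(10000000000ULL, 64)` gives 14,880,952 packets per second (about 14.88 Mpps) and `line_rate_pps(10000000000ULL, 128)` gives 8,445,945 (about 8.4 Mpps).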

Counting packets is useless

(native speed)

    uint64_t counter = 0;
    for (;;)
    {
        counter++;
    }

1 thread user-space (Intel 10G)

pcap library

Pcap library, 1 thread counter

Pcap, 1 thread counter, BPF=udp

Pcap, 1 thread counter, BPF=http || udp

pcap-perf

pcap-perf

pcap-perf with BPF = udp

pcap-perf (2 threads)

tcpdump

tcpdump -s 64 -i dev -w /ramdisk/dump.pcap ([email protected])

tcpdump -s 138 -i dev -w /ramdisk/dump.pcap (100M@~8Mpps)

tcpdump -i dev -w /ramdisk/dump.pcap vlan (5 Gbps)

tcpdump -i dev -w /ramdisk/dump.pcap ip host 192.168.0.10 (voip call)

Thanks for the attention!

[email protected]