1 Raffaello Secchi SPECTS 2005 – July 27, 2005
BRUTE: A High Performance and Extensible Traffic Generator
Nicola Bonelli, Stefano Giordano, Gregorio Procissi, Raffaello Secchi
Department of Information Engineering, University of Pisa
Outline
● Motivations
● BRUTE Features
● Architecture design and internals
  ● Implementing issues
  ● Extensibility and modularity
    ● Script language
    ● Application Programming Interface (API)
      ● Programming library
      ● Macros
    ● Traffic modules
● Performance evaluation
  ● Fast Ethernet Scenario
  ● Gigabit Ethernet Scenario
● Conclusions
Motivations & Requirements
● Current open-source software tools are not suited to high-speed networks:
  ● poor performance in terms of generated frames per second
  ● poor timing/rate accuracy in traffic generation
● Requirements:
  ● high performance and precision
  ● extensibility
  ● configurability
  ● RFC 2544 compliance
● We developed a tool that:
  ● generates high-speed flows over Fast and Gigabit Ethernet
  ● is extensible through a modular architecture
  ● is configurable through an ad hoc script language
  ● is IP-version independent: IPv4 and IPv6
BRUTE Features
● What is BRUTE?
  ● BRUTE is a Linux user-space real-time traffic engine operating at layer 2 and layer 3
● High performance
  ● Saturates a Fast Ethernet link with short (64-byte) frames
  ● Saturates a Gigabit Ethernet link with 128-byte frames
● Configuration
  ● Flexible script language that lets the user define customized traffic patterns
● Extensible design
  ● Traffic modules (written in C)
  ● API (library functions and macros)
    ● Frame building, memory allocation, socket handling
    ● IP checksum
    ● Reliable statistical distributions
    ● Timing resources
Implementing Choices (1/3)
Timing issues: temporal accuracy
– A traffic generator deals with packets and inter-departure times…
  • busy-wait polling versus the system-call sleep mechanism
– gettimeofday features:
  • low resolution (1 μs)
  • high latency due to the time evaluation (500 CPU cycles)
  • system-call interrupt mechanism
– Reading the CPU time-stamp counter (TSC)…
  • higher resolution (1 ns with a 1 GHz CPU clock)
  • lower latency, around 32 CPU cycles (Intel Pentium)
  • no interrupt (no system call)
[Figure: Timers comparison, rdtsc vs. gettimeofday]
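The busy-wait approach described above can be sketched as follows. This is a minimal illustration, not BRUTE's own code: the names `read_cycles`, `busy_wait_until` and `pace` are hypothetical, the x86 branch uses the compiler intrinsic `__rdtsc()` rather than hand-written assembly, and other architectures fall back to `clock_gettime`, which is an assumption made only so the sketch stays portable.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the CPU time-stamp counter: no system call is involved. */
static inline uint64_t read_cycles(void) { return __rdtsc(); }
#else
/* Portable fallback (assumption): nanoseconds from the monotonic clock. */
static inline uint64_t read_cycles(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Spin until the counter reaches `deadline`; returns the value that ended the wait. */
static inline uint64_t busy_wait_until(uint64_t deadline)
{
    uint64_t now;
    while ((now = read_cycles()) < deadline)
        ;   /* busy-wait: the process never yields the CPU */
    return now;
}

/* Pace `count` departures `gap` ticks apart; returns the last scheduled tick. */
uint64_t pace(unsigned count, uint64_t gap)
{
    uint64_t next = read_cycles();
    for (unsigned i = 0; i < count; i++) {
        next += gap;            /* absolute deadlines avoid accumulating drift */
        busy_wait_until(next);
        /* a real generator would hand one frame to the socket here */
    }
    return next;
}
```

Scheduling each departure against an absolute deadline, rather than sleeping for a relative gap, keeps a late frame from delaying all the frames after it.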
Internals of Linux Kernel 2.4
Implementing Choices (2/3)
Socket family
– The computational load of sendto differs according to the socket family
– The PF_PACKET family avoids routing and header building
– PF_PACKET bypasses the Linux Netfilter framework
– PF_PACKET allows customizing the Ethernet frame
  • RFC 2544 suggests some tests using random MAC addresses
[Figure: Socket latency comparison, PF_PACKET vs. PF_INET]
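As an illustration of why PF_PACKET makes random MAC addresses possible, the sketch below opens a layer-2 socket and writes an Ethernet header by hand. The function names are hypothetical and the choice of a locally administered unicast address is an assumption; the slides do not say how BRUTE draws its random addresses. Opening the socket requires the CAP_NET_RAW capability.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>      /* htons */
#include <sys/socket.h>     /* socket, PF_PACKET, SOCK_RAW */
#include <net/ethernet.h>   /* struct ether_header, ETH_ALEN, ETH_P_ALL, ETHERTYPE_IP */

/* Open a layer-2 socket; returns -1 on failure (e.g. missing CAP_NET_RAW). */
int open_packet_socket(void)
{
    return socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
}

/* Fill `mac` with a random unicast, locally administered address. */
void random_mac(uint8_t mac[ETH_ALEN])
{
    for (int i = 0; i < ETH_ALEN; i++)
        mac[i] = (uint8_t)(rand() & 0xff);
    mac[0] = (uint8_t)((mac[0] & 0xfe) | 0x02);  /* clear multicast bit, set local bit */
}

/* Write an Ethernet II header for an IPv4 payload at the start of `frame`.
 * Returns the header length in bytes. */
size_t build_eth_header(uint8_t *frame, const uint8_t src[ETH_ALEN],
                        const uint8_t dst[ETH_ALEN])
{
    struct ether_header hdr;
    memcpy(hdr.ether_dhost, dst, ETH_ALEN);
    memcpy(hdr.ether_shost, src, ETH_ALEN);
    hdr.ether_type = htons(ETHERTYPE_IP);
    memcpy(frame, &hdr, sizeof hdr);
    return sizeof hdr;
}
```

With PF_INET the kernel builds these headers itself, so the application has no way to vary the destination MAC per frame.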
Implementing Choices (3/3)
Scheduling policy
● Real-time requirements
  ● A traffic generator is a typical real-time application
● Linux soft real-time SCHED_FIFO policy
  ● control over the order of execution of processes
  ● static priority assigned to the process
  ● preemption of any normal process
  ● no time slicing
● Memory locking avoids paging delays
  ● mlockall is used to disable paging
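A minimal sketch of the two calls named above, assuming a process with sufficient privileges. `clamp_fifo_priority` and `enter_realtime` are hypothetical helper names, and the priority BRUTE actually requests is not stated in the slides.

```c
#include <sched.h>
#include <sys/mman.h>

/* Clamp a requested priority into the valid SCHED_FIFO range. */
int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    if (prio < lo) return lo;
    if (prio > hi) return hi;
    return prio;
}

/* Switch the calling process to SCHED_FIFO and lock its memory.
 * Returns 0 on success, -1 if either call fails (typically lack of privileges). */
int enter_realtime(int prio)
{
    struct sched_param sp = { .sched_priority = clamp_fifo_priority(prio) };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -1;
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)   /* current and future pages */
        return -1;
    return 0;
}
```

Because a SCHED_FIFO process that busy-waits is never time-sliced, it can monopolize a CPU; this is the intended behavior for a generator but deserves care on single-core machines.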
Overall Architecture
● The modular design involves a distributed parser algorithm
  ● The core parser handles the grammar and part of the lexical tasks
  ● Micro-parsers distributed in the traffic modules complete the lexical parsing
● The traffic engine executes the micro-engine code in order to generate the traffic pattern
Extensibility: T-module
• A module implements a traffic class: the T-module
  • only a few lines of C code define a fully customizable traffic pattern
• A T-module consists of:
  • The structure module_descriptor
    • links the BRUTE core and the module
  • The structure mod_line
    • defines the parameters of a specific traffic class
  • The micro-parser handler
    • implements the ad hoc lexical parser
  • The micro-engine handler
    • in charge of generating traffic
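The four parts of a T-module could fit together roughly as below. Only the names `module_descriptor` and `mod_line` come from the slides; every field, the handler signatures, and the toy `cbr_*` functions are assumptions made for illustration and do not reproduce BRUTE's actual definitions.

```c
#include <stdlib.h>
#include <string.h>

/* Parameters of one traffic-class statement (hypothetical fields). */
struct mod_line {
    unsigned msec;   /* flow duration in milliseconds */
    unsigned rate;   /* frames per second */
};

/* Link between the core and the module (hypothetical layout). */
struct module_descriptor {
    const char *command;   /* script keyword that selects this module */
    int (*parser)(const char *tok, const char *val, struct mod_line *ml);
    unsigned long (*engine)(const struct mod_line *ml);
};

/* Micro-parser: accepts only the tokens this module understands. */
int cbr_parser(const char *tok, const char *val, struct mod_line *ml)
{
    if (strcmp(tok, "msec") == 0) { ml->msec = (unsigned)strtoul(val, NULL, 10); return 0; }
    if (strcmp(tok, "rate") == 0) { ml->rate = (unsigned)strtoul(val, NULL, 10); return 0; }
    return -1;   /* unknown token */
}

/* Micro-engine: a real one would send frames; this one only counts
 * how many frames the statement asks for. */
unsigned long cbr_engine(const struct mod_line *ml)
{
    return (unsigned long)ml->rate * ml->msec / 1000;
}

/* The descriptor the core would look up when it meets the "cbr" command. */
const struct module_descriptor cbr_module = { "cbr", cbr_parser, cbr_engine };
```

Keeping the token handling inside the module is what lets a new traffic class add its own parameters without touching the core parser.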
Brute script language
<label:> command tok_1 <+->=val; tok_2 <+->=val; …
● A statement consists of:
  ● a label
  ● a command identifier
  ● a sequence of semicolon-terminated atoms
● An atom consists of:
  ● a token identifier (l-value)
  ● numbers, functions and variables (r-values)
cbr msec=1000; saddr=192.168.0.1; daddr=192.168.0.2; \
    rate=1000; len=udp_data(18); sport=1024; dport=1024;
lab: cbr msec=1000; rate +=1000;
loop times=10; label=lab;
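To make the atom syntax concrete, here is a small sketch that parses one atom of the form `tok <+->=val` and applies it to a token's current value, as in `rate +=1000`. It is not BRUTE's parser: `struct atom`, `parse_atom` and `apply_atom` are hypothetical names, and only numeric r-values are handled (functions such as `udp_data(18)` and addresses are rejected).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct atom {
    char name[32];   /* token identifier (l-value) */
    char op;         /* '=' assign, '+' for +=, '-' for -= */
    long value;      /* numeric r-value */
};

/* Parse one atom such as "rate +=1000" (without the trailing ';').
 * Returns 0 on success, -1 on a syntax error. */
int parse_atom(const char *s, struct atom *out)
{
    size_t n = 0;
    char *end;

    while (isspace((unsigned char)*s)) s++;
    while ((isalnum((unsigned char)*s) || *s == '_') && n < sizeof out->name - 1)
        out->name[n++] = *s++;
    out->name[n] = '\0';
    if (n == 0) return -1;

    while (isspace((unsigned char)*s)) s++;
    out->op = '=';
    if (*s == '+' || *s == '-') out->op = *s++;   /* optional increment/decrement */
    if (*s++ != '=') return -1;

    out->value = strtol(s, &end, 10);
    if (end == s) return -1;                      /* no digits found */
    while (isspace((unsigned char)*end)) end++;
    return *end == '\0' ? 0 : -1;                 /* reject trailing garbage */
}

/* Apply an atom to the current value of its token. */
long apply_atom(long current, const struct atom *a)
{
    if (a->op == '+') return current + a->value;
    if (a->op == '-') return current - a->value;
    return a->value;
}
```

Under this reading, the `lab:` statement combined with `loop times=10; label=lab;` would raise the rate by 1000 on each pass.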
API (1/2)
● Memory management
  ● Allocate and free the memory required to hold the frame
  ● Set up the frame headers according to the parameters specified in the configuration file, or using a random destination (MAC or IP) when specified on the command line
  ● The UDP data is filled as specified in RFC 2544
● Timing management
  ● Read the CPU's TSC register using architecture-dependent assembly instructions (get_cycles)
  ● Busy-wait routine in charge of introducing inter-departure times between packets
● Frame management
  ● Update the frame with the changes required to obtain the next one: modify the IP id and checksum fields, and the destination IP or MAC according to the command-line options
  ● Forward the frame to the network device driver
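The per-frame update of the IP id and checksum can be sketched with the standard Internet checksum (RFC 1071). The function names are hypothetical and the code recomputes the checksum from scratch for clarity; whether BRUTE does this or uses an incremental update is not stated in the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum (RFC 1071) over `len` bytes. */
uint16_t ip_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)data[i] << 8 | data[i + 1];   /* big-endian 16-bit words */
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;           /* pad an odd trailing byte */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);            /* fold the carries back in */
    return (uint16_t)~sum;
}

/* Set the id field of a 20-byte IPv4 header and refresh its checksum. */
void ip_set_id(uint8_t hdr[20], uint16_t id)
{
    hdr[4] = (uint8_t)(id >> 8);
    hdr[5] = (uint8_t)id;
    hdr[10] = hdr[11] = 0;          /* checksum field counts as zero while summing */
    uint16_t c = ip_checksum(hdr, 20);
    hdr[10] = (uint8_t)(c >> 8);
    hdr[11] = (uint8_t)c;
}
```

A header with a correct checksum sums to zero when `ip_checksum` is run over all 20 bytes, which gives a cheap self-check after each update.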
API (2/2)
● Random number generation
  ● Implements the Mersenne Twister algorithm
    ● Quasi-infinite period (2^19937-1)
    ● ~100 CPU cycles (fast enough to be executed at run time)
    ● Good statistical properties
● Statistical distributions
  ● Functions that generate several statistical distributions (uniform, exponential, Pareto, …)
Algorithm      CPU cycles   Period       Lifetime    Entropy    Chi2     Correlation
Linux rand     109          16(2^31-1)   9.5 hours   7.95421    0.01%    -0.04935
/dev/urandom   20100        -            -           7.99996    90.00%   -0.00016
TT800          94           2^800-1      ∞           7.356743   0.01%    0.139006
Mersenne T.    100          2^19937-1    ∞           7.99995    50.00%   0.00028
Implemented Traffic Modules (2/4)
• Poisson process
  • constant packet length
  • exponential inter-departure time
parameters: msec, saddr, daddr, sport, dport, len, tos, ttl, lambda
Poisson Arrival of Bursts
• Poisson Arrival of Bursts (PAB) process: R(t) = R·N(t), where R is a constant bit rate
• N(t) is the underlying state process
  • N(t) is a superposition of bursts occurring with exponential inter-arrival times and an arbitrary burst-length distribution
  • N(t) is equivalent to the number of busy servers in an M/G/∞ queue with service time B
• For fixed t, N(t) ~ Poisson(λ·E[B]), where λ is the burst arrival rate
• If the B's are Pareto distributed with shape 1 < α < 2, R(t) is long-range dependent with Hurst parameter H = (3 – α)/2
[Figure: sample path of N(t), with burst arrival times T1, T2, T3 and burst lengths B1, B2]
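The mean rate and the Hurst parameter of the PAB process follow from the M/G/∞ analogy. The block below writes them out under the assumption that burst lengths are Pareto distributed with shape α and scale (minimum burst length) θ, and that λ is the burst arrival rate; the slides list alpha, theta and lambda as module parameters but do not define them, so this reading is an assumption.

```latex
\begin{align}
  \mathbb{E}[B] &= \frac{\alpha\,\theta}{\alpha - 1}, \qquad \alpha > 1, \\
  \mathbb{E}[R(t)] &= R\,\mathbb{E}[N(t)] = R\,\lambda\,\mathbb{E}[B]
                    = \frac{R\,\lambda\,\alpha\,\theta}{\alpha - 1}, \\
  H &= \frac{3 - \alpha}{2}, \qquad 1 < \alpha < 2 .
\end{align}
```

For example, α = 1.5 gives E[B] = 3θ and H = 0.75; as α approaches 1 the bursts become heavier-tailed and H approaches 1.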
Implemented Traffic Modules (3/4)
• PAB process
  • constant packet length
  • Poisson inter-arrival of bursts, Pareto burst length
• parameters: msec, saddr, daddr, sport, dport, len, tos, ttl, alpha, theta, lambda
[Figure: empirical distribution of the rate, P(rate) vs. rate, with the Poisson distribution envelope]
[Figure: PAB instantaneous rate, frame/sec vs. sec]
Implemented Traffic Modules (4/4)
• End-to-end delay estimation requirements:
  • Measurement methodology
    • Two hosts with clocks synchronized via GPS
    • One host closed in loopback
  • Packet format
    • RUDE implements a proprietary packet format
    • With a standard RTCP sender report (SR), no specific receiver application is needed (tcpdump, Ethereal, AX4000, …)
  • Transmission delay compensation from the application layer to the device driver
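For reference, an RTCP sender report without reception report blocks is a fixed 28-byte packet (RFC 3550) that carries the sender's NTP timestamp, which is what makes it usable for delay estimation with stock analyzers. The sketch below serializes one; the function names are hypothetical and how BRUTE fills the individual fields is an assumption not covered by the slides.

```c
#include <stddef.h>
#include <stdint.h>

#define RTCP_SR_LEN 28   /* sender report with no reception report blocks */

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Serialize an RTCP SR (RFC 3550, packet type 200) into `buf`.
 * Returns the number of bytes written. */
size_t build_rtcp_sr(uint8_t buf[RTCP_SR_LEN], uint32_t ssrc,
                     uint32_t ntp_sec, uint32_t ntp_frac, uint32_t rtp_ts,
                     uint32_t pkt_count, uint32_t octet_count)
{
    buf[0] = 0x80;                  /* version 2, no padding, zero report blocks */
    buf[1] = 200;                   /* packet type: sender report */
    buf[2] = 0;
    buf[3] = RTCP_SR_LEN / 4 - 1;   /* length in 32-bit words minus one */
    put_be32(buf + 4, ssrc);
    put_be32(buf + 8, ntp_sec);     /* departure time, NTP seconds */
    put_be32(buf + 12, ntp_frac);   /* departure time, NTP fraction */
    put_be32(buf + 16, rtp_ts);
    put_be32(buf + 20, pkt_count);
    put_be32(buf + 24, octet_count);
    return RTCP_SR_LEN;
}
```

A receiver with a synchronized clock subtracts the NTP timestamp in the packet from its own arrival time to estimate the one-way delay.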
Performance Measurement
● Internal measurement (non-invasive)
  ● A vector allocated in the device driver stores the packets' timestamps (using get_cycles)
  ● A kernel module dumps the timestamps off-line through a /proc entry
● Wire-line measurement
  ● Over Fast and Gigabit Ethernet (copper line and optical fiber)
● Hardware employed
  ● Genuine Intel Pentium 4 2.40 GHz, 512 MB RAM, ASUS P4PE motherboard, 3Com 3c905C-TX Tornado Fast Ethernet adapter
  ● Dual Genuine Intel Xeon 2.66 GHz, 1 GB RAM, SuperMicro X5DPE-G2 motherboard, Intel PRO/1000LX Gigabit Ethernet controller (fiber)
  ● Spirent AX4000 traffic analyzer
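Once the timestamp vector has been dumped, the achieved rate and its relative error against the requested rate can be computed off-line. The sketch below shows one plausible way to do it; the function names and the exact definition of the error (measured minus requested, as a percentage of requested) are assumptions, since the slides do not give the formula.

```c
#include <stddef.h>
#include <stdint.h>

/* Mean frame rate (frames/sec) from n departure timestamps in CPU cycles. */
double mean_rate(const uint64_t *ts, size_t n, double cpu_hz)
{
    if (n < 2 || ts[n - 1] <= ts[0])
        return 0.0;   /* not enough data to define a rate */
    return (double)(n - 1) * cpu_hz / (double)(ts[n - 1] - ts[0]);
}

/* Relative rate error in percent; positive means faster than requested. */
double rate_bias_percent(double measured_fps, double target_fps)
{
    return (measured_fps - target_fps) / target_fps * 100.0;
}
```

For instance, five timestamps spaced 1000 cycles apart on a 1 GHz clock correspond to 10^6 frames/sec, which is 25% above a requested 800000 frames/sec.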
Maximal Rate Test Comparisons
• Fast Ethernet scenario
  • Adapter: 3Com 3c905C-TX Tornado Fast Ethernet
  • Frame length: 64 bytes
  • BRUTE saturates the link capacity
[Figure: generated rate (fps) vs. time (sec) for brute, rude, udpgen and mgen]
Throughput vs. frame length
• Fast Ethernet scenario
  • Adapter: 3Com 3c905C-TX Tornado Fast Ethernet
  • BRUTE matches the ideal rate curve at each frame length
[Figure: throughput (frame/sec) vs. frame length (byte/frame) for brute, rude, udpgen and mgen]
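The ideal rate curve for Ethernet follows from the frame length plus the fixed per-frame overhead on the wire (preamble, start-of-frame delimiter and inter-frame gap, 20 bytes in total). The helper below, with the hypothetical name `ideal_fps`, computes it; the frame length is assumed to include the 4-byte FCS.

```c
#include <stdint.h>

/* Per-frame overhead on the wire: 7-byte preamble + 1-byte SFD + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20u

/* Theoretical maximum frames per second (rounded down) for frames of
 * `frame_len` bytes, FCS included, on a link of `link_bps` bits per second. */
uint64_t ideal_fps(uint64_t link_bps, unsigned frame_len)
{
    return link_bps / (8u * (uint64_t)(frame_len + ETH_WIRE_OVERHEAD));
}
```

This gives about 148809 frames/sec for 64-byte frames on Fast Ethernet and about 844594 frames/sec for 128-byte frames on Gigabit Ethernet, in line with the axis ranges of the plots.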
Rate Bias Comparison
[Figure: rate error (err %) vs. frame rate (frame/sec) for brute, rude, udpgen and mgen]
• Fast Ethernet scenario
  • Adapter: 3Com 3c905C-TX Tornado Fast Ethernet
  • Error rate averaged over 10^6 frames
  • The throughput of BRUTE is unbiased at each frame rate
Standard Deviation of Rate Comparison
[Figure: standard deviation of the rate vs. frame rate (frame/sec) for brute, rude, udpgen and mgen]
• Fast Ethernet scenario
  • Adapter: 3Com 3c905C-TX Tornado Fast Ethernet
  • Average performed over a window of 100 frames
  • The standard deviation of BRUTE's rate grows linearly
Maximal Rate Test Comparison
• Gigabit Ethernet scenario
  • Adapter: Intel PRO/1000LX Gigabit Ethernet Controller
[Figure: generated rate (frame/sec) vs. time (sec) for brute, rude, udpgen and mgen]
Bias Error Comparison
[Figure: rate error (err %) vs. frame rate (frame/sec) for brute, rude, udpgen and mgen]
• Gigabit Ethernet scenario
  • Adapter: Intel PRO/1000LX Gigabit Ethernet Controller
  • Average over 10^6 frames
Standard Deviation Comparison
[Figure: standard deviation of the rate (logarithmic scale) vs. frame rate (frame/sec) for brute, rude, udpgen and mgen]
• Gigabit Ethernet scenario
  • Adapter: Intel PRO/1000LX Gigabit Ethernet Controller
  • Average performed over a window of 10^3 frames
Conclusions
● BRUTE is a real-time, extensible traffic generator:
  ● Flexible architecture and extensible design
  ● Comes with several traffic modules that generate different patterns of Ethernet traffic
● High performance and a high level of precision, suitable for network benchmarking
  ● Use of timing paradigms to better satisfy real-time requirements
  ● Capability to generate workloads at wire speed in order to stress network devices