1
An Information-theoretic Approach to Network Measurement and
Monitoring
Yong Liu, Don Towsley, Tao Ye, Jean Bolot
2
Outline
motivation
background
flow-based network model
full packet trace compression
  marginal/joint
coarser granularity
  NetFlow and SNMP
future work
3
Motivation
network monitoring: sensing a network
  traffic engineering, anomaly detection, …
  single point vs. distributed
different granularities
  full traffic trace: packet headers
  flow-level record: timing, volume
  summary statistics: byte/packet counts
challenges
  growing scales: high-speed links, large topologies
  constrained resources: processing, storage, transmission
  30G headers/hour at UMass gateway
solutions
  sampling: temporal/spatial
  compression: marginal/distributed
4
Questions
how much can we compress monitoring traces?
how much information is captured at different monitoring granularities?
  packet trace / NetFlow / SNMP
how much joint information is there in multiple monitors?
  joint compression
  trace aggregation
  monitor placement
5
Our Contribution
flow-based network models
  explore temporal/spatial correlation in network traces
  projection to different granularities
information-theoretic framework
  entropy: bound/guideline on trace compression
  quantitative approach for more general problems
validation against measurements from an operational network
6
Entropy & Compression
Shannon entropy of a discrete r.v. X: $H(X) = -\sum_x p(x) \log_2 p(x)$ bits
compression of M i.i.d. symbols by coding
  expected code length: at least $M \cdot H(X)$ bits
  info.-theoretic bound on compression ratio: $H(X)$ divided by the raw number of bits per symbol
Shannon/Huffman coding
  assign short codewords to frequent outcomes
  achieve the H(X) bound
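As a small worked illustration of the bound on this slide, the sketch below computes the entropy of a made-up four-valued field and the expected codeword length of a binary Huffman code for it. The distribution, the symbol names, and the 2-bit fixed-width baseline are assumptions chosen for the example; they are not taken from the traces.

```python
import heapq
import math


def shannon_entropy(probs):
    """Entropy in bits of a distribution given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def huffman_code_lengths(probs):
    """Return {symbol: codeword length in bits} for a binary Huffman code."""
    if len(probs) == 1:
        return {next(iter(probs)): 1}
    # Heap entries: (probability, unique tie-breaker, {symbol: depth so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical distribution of a low-entropy header field (assumption).
probs = {"TCP": 0.5, "UDP": 0.25, "ICMP": 0.125, "other": 0.125}

H = shannon_entropy(probs)                      # 1.75 bits per symbol
lengths = huffman_code_lengths(probs)           # TCP: 1, UDP: 2, ICMP: 3, other: 3
expected_length = sum(probs[s] * lengths[s] for s in probs)
fixed_length = math.ceil(math.log2(len(probs)))  # 2 bits without compression
ratio_bound = H / fixed_length                   # 0.875
```

Because every probability here is a power of 1/2, the Huffman code meets the entropy exactly; for general distributions its expected length lies within one bit of H(X).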
7
Entropy & Correlation
joint entropy
entropy rate of stochastic process
  exploit temporal correlation
  Lempel-Ziv coding (LZ77, gzip, WinZip): asymptotically achieves the bound for stationary processes
joint entropy rate of correlated processes
  exploit spatial correlation
  Slepian-Wolf coding (distributed compression): encode each process individually, achieve the joint entropy rate in the limit
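To make the gain from correlation concrete, here is a minimal sketch with two toy sources: a two-state Markov chain (temporal correlation) and a pair of correlated bits (spatial correlation). The transition probabilities and the joint distribution are illustrative assumptions, not measured values.

```python
import math


def entropy(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def markov_entropy_rate(transition, stationary):
    """Entropy rate of a stationary Markov chain: sum_i pi_i * H(row i)."""
    return sum(pi_i * entropy(row) for pi_i, row in zip(stationary, transition))


# Temporal correlation: a "sticky" binary chain (assumed parameters).
transition = [[0.9, 0.1], [0.1, 0.9]]
stationary = [0.5, 0.5]
marginal_bits = entropy(stationary)                      # 1.0 bit/symbol if coded as i.i.d.
rate_bits = markov_entropy_rate(transition, stationary)  # about 0.469 bits/symbol

# Spatial correlation: Y equals X with probability 0.9 (assumed joint pmf).
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
joint_bits = entropy(joint.values())                     # about 1.469 bits per pair
separate_bits = 2 * entropy([0.5, 0.5])                  # 2.0 bits if coded separately
```

In both cases the correlated description needs fewer bits than treating the symbols as independent, which is the gap that Lempel-Ziv and Slepian-Wolf coding aim to close.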
8
Network Trace Compression
naïve way: treat as byte stream, compress with generic tools
  gzip compresses UMass traces by a factor of 2
network traces are highly structured data
  multiple fields per packet
    • diversity in information richness
    • correlation among fields
  multiple packets per flow
    • packets within a flow share information
    • temporal correlation
  multiple monitors traversed by a flow
    • most fields unchanged within the network
    • spatial correlation
network models
  explore correlation structure
  quantify information content of network traces
  serve as lower bounds/guidelines for compression algorithms
9
Packet Header Trace
[Figure: packet header record laid out in 32-bit rows (bit positions 0, 16, 31)]
Timing: time stamp (sec.), time stamp (sub-sec.)
IP Header: vers., HLen, ToS, total length, IPID, flags, fragment offset, TTL, protocol, header checksum, source IP address, destination IP address
TCP Header: source port, destination port, data sequence number, acknowledgment number, Hlen, TCP flags, window size, checksum, urgent pointer
10
Header Field Entropy
[Figure: the same packet header layout (Timing, IP Header, TCP Header), annotated with the labels "flow id" and "time"]
11
Single Point Packet Trace
packet trace: (T0, F0) (T1, F1) (T3, F0) … (Tn, Fn) (Tm, F0)
temporal correlation introduced by flows
  packets from the same flow are closely spaced in time
  they share header information
packet inter-arrival:
# bits per packet:
flow-based trace: (T0, F0) (T3, F0) … (Tm, F0)
flow record: F0 | K | T0 | packet inter-arrivals
  F0: flow ID
  K: flow size
  T0: arrival time
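The regrouping from a per-packet trace into flow records can be sketched as follows. The function name `to_flow_records`, the tuple layout, and the integer timestamps are assumptions made for the example; a real collector would also need flow timeouts, which this sketch ignores.

```python
def to_flow_records(packets):
    """Group (timestamp, flow_id) packets into flow records.

    Each record is (flow_id, flow_size, arrival_time, inter_arrivals),
    mirroring the fields F0, K, T0 and the packet inter-arrivals.
    """
    times_by_flow = {}
    for timestamp, flow_id in packets:
        times_by_flow.setdefault(flow_id, []).append(timestamp)

    records = []
    for flow_id, times in times_by_flow.items():
        times.sort()
        # Gaps between consecutive packets of the same flow.
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        records.append((flow_id, len(times), times[0], gaps))

    # Order records by the arrival time of each flow's first packet.
    records.sort(key=lambda record: record[2])
    return records


# Toy trace with made-up timestamps: flow F0 sends three packets, F1 sends one.
packets = [(0, "F0"), (1, "F1"), (3, "F0"), (7, "F0")]
records = to_flow_records(packets)
# records == [("F0", 3, 0, [3, 4]), ("F1", 1, 1, [])]
```

The flow ID is stored once per flow instead of once per packet, and only the small inter-arrival gaps remain per packet, which is where the compression gain comes from.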
12
Network Models
flow-based model
  flow arrivals follow a Poisson process with a given rate
  flows are classified into independent flow classes according to routing (the set of routers traversed)
  flow i is described by:
    • flow inter-arrival time
    • flow ID
    • flow length
    • packet inter-arrival times within the flow
packet arrival stochastic process:
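A minimal simulation of such a flow-based model might look like the sketch below. Only the Poisson flow arrivals come from the slide; the geometric flow lengths, the exponential packet gaps, and all parameter names and values are assumptions made for illustration.

```python
import random


def simulate_packet_arrivals(rate, mean_flow_len, mean_gap, horizon, seed=0):
    """Generate (timestamp, flow_id) packets from a simple flow-based model.

    rate          -- flow arrival rate (flows per second, Poisson arrivals)
    mean_flow_len -- mean number of packets per flow (geometric, assumption)
    mean_gap      -- mean packet inter-arrival within a flow (exponential, assumption)
    horizon       -- flows starting after this time are not generated
    """
    rng = random.Random(seed)
    packets = []
    flow_start = 0.0
    flow_id = 0
    while True:
        flow_start += rng.expovariate(rate)  # exponential flow inter-arrival time
        if flow_start > horizon:
            break
        # Geometric flow length with at least one packet.
        length = 1
        while rng.random() > 1.0 / mean_flow_len:
            length += 1
        timestamp = flow_start
        packets.append((timestamp, flow_id))
        for _ in range(length - 1):
            timestamp += rng.expovariate(1.0 / mean_gap)
            packets.append((timestamp, flow_id))
        flow_id += 1
    # Interleave the flows into one packet arrival process.
    packets.sort()
    return packets


trace = simulate_packet_arrivals(rate=5.0, mean_flow_len=4.0, mean_gap=0.01, horizon=10.0)
```

Packets of a flow that starts before `horizon` may extend slightly past it; the sketch keeps them so that every generated flow is complete.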
13
Entropy in Flow Record
# bits per flow:
# bits per second:
marginal compression ratio: determined by flow length (in packets) and variability in packet inter-arrival times
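One way to write these quantities down, as a sketch under explicit assumptions rather than the formulas from the original slide: suppose the components of a flow record are mutually independent, the flow has $K \ge 1$ packets, the packet inter-arrivals $\delta$ within a flow are i.i.d. and independent of $K$, and timestamps are quantized to the trace resolution. The symbols $\theta$ (flow inter-arrival time), $F$ (flow ID), $\lambda$ (flow arrival rate), and $b$ (raw bits per packet for the recorded fields) are notation introduced here.

```latex
\begin{align*}
H(\text{record}) &= H(\theta) + H(F) + H(K) + \bigl(E[K] - 1\bigr)\, H(\delta)
  && \text{bits per flow} \\
R &= \lambda \, H(\text{record})
  && \text{bits per second} \\
\rho &= \frac{H(\text{record})}{E[K] \, b}
  && \text{marginal compression ratio}
\end{align*}
```

Under these assumptions the per-flow terms $H(\theta) + H(F) + H(K)$ are amortized over $E[K]$ packets, so for long flows $\rho$ approaches $H(\delta)/b$. This matches the qualitative statement that the ratio is governed by flow length and by the variability of packet inter-arrivals.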
14
Single Point Compression: Results
Trace      H (total)   Model ratio   Compression algorithm ratio
C1-in      706.3772    0.2002        0.6425
BB1-out    736.1722    0.2139        0.6574
BB2-out    689.9066    0.2186        0.6657

the compression-ratio lower bound calculated from entropy is much lower than the ratio achieved by the real compression algorithm
differences in the real compression algorithm:
  records IPID, packet size, TCP/UDP fields
  fixed packet buffer for each flow => many flow records for long flows
[Figure: router with monitored links C1-in, C2-in, BB1-out, BB2-out]
15
Distributed Network Monitoring
single flow recorded by multiple monitors
spatial correlation: traces collected at distributed monitors are correlated
marginal node view: # bits/sec to represent flows seen by one node; bound on single-point compression
network system view: # bits/sec to represent flows across the network; bound on joint compression
joint compression ratio: quantifies the gain of joint compression
16
Baseline Joint Entropy Model
“perfect” network: fixed routes, constant link delay, no packet loss
flow classes based on routes
  flows arrive with rate:
  # of monitors traversed:
  # bits per flow record:
info. rate at node v:
network view info. rate:
joint compression ratio:
  dependence on # of monitors traversed
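The dependence on the number of monitors traversed can be illustrated with a short calculation. The sketch assumes that, in a perfect network, a flow's record at every additional monitor is fully determined by its record at the first one, so the network view pays for each flow once while the marginal views pay once per monitor. The class rates, the 700 bits per flow record, and the dictionary layout are made-up values for the example.

```python
def joint_compression_ratio(flow_classes, monitors):
    """Ratio of the network-view information rate to the sum of node-view rates.

    flow_classes -- list of dicts with keys "rate" (flows/sec),
                    "bits_per_flow" and "route" (set of monitor names)
    monitors     -- the monitors whose traces are compressed jointly
    """
    monitors = set(monitors)
    network_rate = 0.0   # each flow counted once
    marginal_sum = 0.0   # each flow counted once per monitor that sees it
    for flow_class in flow_classes:
        seen_by = len(flow_class["route"] & monitors)
        if seen_by == 0:
            continue
        info_rate = flow_class["rate"] * flow_class["bits_per_flow"]
        network_rate += info_rate
        marginal_sum += seen_by * info_rate
    return network_rate / marginal_sum


# Hypothetical classes: every (ingress, egress) pair carries the same load.
flow_classes = [
    {"rate": 1.0, "bits_per_flow": 700.0, "route": {ingress, egress}}
    for ingress in ("C1-in", "C2-in")
    for egress in ("BB1-out", "BB2-out")
]

all_four = joint_compression_ratio(
    flow_classes, ["C1-in", "C2-in", "BB1-out", "BB2-out"])   # 0.5
one_pair = joint_compression_ratio(flow_classes, ["C1-in", "BB1-out"])  # 0.75
```

With all four monitors every flow is seen exactly twice, so half of the recorded bits are redundant. With only one ingress and one egress monitor, just one of the three visible classes is seen twice, and the joint gain is smaller.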
17
Joint Compression: Results
Set of Traces                        Joint Compression Ratio
{C1-in, BB1-out, C2-in, BB2-out}     0.5
{C1-in, BB1-out}                     0.8649
{C1-in, BB2-out}                     0.8702
{C2-in, BB1-out}                     0.7125
{C2-in, BB2-out}                     0.6679
[Figure: router with monitored links C1-in, C2-in, BB1-out, BB2-out]
18
Coarser Granularity Models
NetFlow model
  similar to flow model
  joint compression result similar to full trace
SNMP model
  any link's SNMP rate process is the sum of the rate processes of all flow classes passing through that link
  traffic rates of flow classes are independent Gaussian
  entropy can be calculated from the covariance of these processes
  information loss due to summation
    small joint information between monitors
    difficult to recover rates of flow classes from SNMP data
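Why summation leaves little joint information can be seen in a two-link toy case. The sketch assumes link A carries a shared flow class plus its own traffic, link B carries the same shared class plus different traffic, and all class rates are independent Gaussians; the variances are illustrative. It uses the standard mutual information of two jointly Gaussian variables, $I = -\tfrac{1}{2}\log_2(1 - \rho^2)$, with $\rho$ the correlation coefficient.

```python
import math


def link_mutual_information(shared_var, only_a_var, only_b_var):
    """Mutual information (bits) between two Gaussian link-rate readings.

    Link A rate = shared class + traffic seen only on A,
    link B rate = shared class + traffic seen only on B,
    with all class rates independent Gaussians of the given variances.
    """
    var_a = shared_var + only_a_var
    var_b = shared_var + only_b_var
    covariance = shared_var  # only the shared class contributes to both links
    rho_squared = covariance ** 2 / (var_a * var_b)
    if rho_squared >= 1.0:
        return math.inf  # identical readings: one link determines the other
    return -0.5 * math.log2(1.0 - rho_squared)


# Shared class is a quarter of the variance on each link (assumed values).
small_overlap = link_mutual_information(shared_var=1.0, only_a_var=3.0, only_b_var=3.0)
# Shared class dominates both links.
large_overlap = link_mutual_information(shared_var=9.0, only_a_var=1.0, only_b_var=1.0)
```

In the first case the two readings share under 0.05 bits per sample, so joint coding of SNMP counters gains almost nothing unless a single flow class dominates both links.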
19
Joint Compression Ratio of Different Granularity
Set of Traces        SNMP      NetFlow   Packet Trace
{C1-in, BB1-out}     1.0021    0.8597    0.8649
{C1-in, BB2-out}     0.9997    0.8782    0.8702
[Figure: router with monitored links C1-in, C2-in, BB1-out, BB2-out]
20
Conclusion
information-theoretic bound on marginal compression ratio: about 20% (time + flow ID; even lower if other low-entropy fields are included)
marginal compression ratio is high (not very compressible) in SNMP, lower in NetFlow, and lowest in the full trace
joint coding is much more useful/necessary in the full-trace case than in SNMP
“More entropy for your buck”
21
Future Work
network impairments
  how many more bits for delay/loss/route change
model NetFlow with sampling
distributed compression algorithms
lossless vs. lossy compression
entropy-based monitor placement
  maximize information under constraints