Evaluation of Header Field Entropy for Hash-Based Packet Selection
Christian Henke, Carsten Schmoll, Tanja Zseby
Fraunhofer Institute FOKUS, Berlin, Germany
Evaluation of Header Field Entropy for Hash-Based Packet Selection
PAM 2008, Cleveland
Outline
1. Introduction Multipoint Sampling
2. Problem Statement
3. Approach
4. Measurement Setup
5. Measurement Results
6. Conclusion
Introduction Multipoint Sampling
Passive Multipoint Measurements
– at each observation point, a packet ID and a timestamp are exported for each packet
– the trace of a packet is observable from the occurrences of its packet ID
– delay = timestamp A - timestamp B for packets with equal ID
[Figure: observation points A, B and C export to a multipoint collector]
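The delay computation can be sketched as follows. This is a minimal illustration, not the collector used in the talk: the record layout `(packet_id, timestamp)` and the function name `delays` are assumptions, and timestamps are taken as integer microseconds.

```python
def delays(records_a, records_b):
    """Match packets by ID and return delay = timestamp A - timestamp B.

    records_a, records_b: iterables of (packet_id, timestamp) tuples as
    exported by two observation points (hypothetical record layout).
    Packets seen at only one point are ignored.
    """
    ts_b = dict(records_b)  # packet_id -> timestamp at point B
    return {
        pid: ts_a - ts_b[pid]
        for pid, ts_a in records_a
        if pid in ts_b
    }


# Point A sees packets 1, 2, 3; point B sees packets 1, 2, 4.
point_a = [(1, 1500), (2, 2600), (3, 3000)]
point_b = [(1, 1000), (2, 2000), (4, 4000)]
matched = delays(point_a, point_b)
```

The sign of the result depends on the direction of travel; here packets are assumed to pass B before A.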
Introduction Multipoint Sampling
Challenge in Passive Multipoint Measurements
– immense amounts of measurement data
– high infrastructure costs: processing, storing, exporting
Random Packet Selection and Estimation
– random sampling (n-out-of-N, probabilistic) is unsuitable: each observation point selects a different, inconsistent sample
– Duffield and Grossglauser propose hash-based packet selection in “Trajectory Sampling for Direct Traffic Observation”
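The inconsistency of independent random sampling can be illustrated with a small simulation. This is only a sketch: the function name `probabilistic_sample`, the sampling probability and the seeds are assumptions for illustration.

```python
import random


def probabilistic_sample(packet_ids, p, rng):
    """Select each packet independently with probability p."""
    return {pid for pid in packet_ids if rng.random() < p}


packet_ids = range(10_000)  # the same packets pass both observation points

# Each observation point draws its own random numbers.
sample_a = probabilistic_sample(packet_ids, 0.1, random.Random(1))
sample_b = probabilistic_sample(packet_ids, 0.1, random.Random(2))

# Only packets selected at BOTH points can be used for delay measurement.
usable = sample_a & sample_b
print(len(sample_a), len(sample_b), len(usable))
```

With a 10 % sampling probability at each point, only about 1 % of all packets are expected in both samples, so most exported records cannot be matched.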
Introduction Multipoint Sampling
Hash-Based Packet Selection
[Figure: packet (IP header, transport header, payload) -> hash input -> hash function -> packet selected / packet not selected]
Hash function $h: D \rightarrow R$, hash input $x \in D$, selection range $S \subset R$
– packet selected if $h(x) \in S$
– packet not selected if $h(x) \notin S$
Consistent selected subset if x, h and S are equal at all observation points
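A minimal sketch of the selection rule, under stated assumptions: CRC32 from Python's `zlib` stands in for the hash function h (the slides do not name one here), R is the 32-bit value range, and S is taken as all hash values below a threshold. The names `is_selected` and `THRESHOLD` are illustrative.

```python
import zlib

HASH_RANGE = 2 ** 32            # R: CRC32 yields values in [0, 2**32)
THRESHOLD = HASH_RANGE // 10    # S = {r in R : r < THRESHOLD}, roughly 10 %


def is_selected(hash_input: bytes, threshold: int = THRESHOLD) -> bool:
    """Select the packet if h(x) falls into the selection range S."""
    return zlib.crc32(hash_input) < threshold


# The same hash inputs seen at two observation points, in different order.
hash_inputs = [i.to_bytes(8, "big") for i in range(1000)]
selected_a = {x for x in hash_inputs if is_selected(x)}
selected_b = {x for x in reversed(hash_inputs) if is_selected(x)}
```

Because the decision depends only on x, h and S, both points select exactly the same subset without exchanging any state.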
Problem Statement
Which packet content to use as hash input?
Requirements for header fields
1. static between network nodes (this excludes IP TTL and header checksum, which change at every hop)
2. variable among packets
Challenge: hash-based selection (HBS) is deterministic, but the goal is to emulate random selection
– the choice of hash input can introduce bias into the selection
Problem Statement
How bias is introduced
– packets in a hash input collision have the same hash input
– their selection decisions are not independent
– the more packets in a collision, the more severe the bias
– using the whole packet is unsuitable because hash value calculation time increases with hash input length
Approach
– packets differ more often in highly variable bytes
– entropy per byte is used to measure variability
Entropy: $H(B) = -\sum_i p_i \log_2 p_i$
Information Efficiency
$p_i$: probability that byte value $i$ occurs
$H(B)$: entropy, which depends on the discrete distribution of byte values
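The per-byte entropy measurement can be sketched as below. This is an illustration under assumptions, not the evaluation code behind the talk: entropy is the standard Shannon entropy in bits, and information efficiency is assumed to be the entropy divided by its maximum of 8 bits for one byte. All function names are hypothetical.

```python
import math
from collections import Counter


def byte_entropy(values):
    """Shannon entropy in bits of the observed byte values."""
    counts = Counter(values)
    total = sum(counts.values())
    # sum of p_i * log2(1 / p_i) is the same as -sum of p_i * log2(p_i)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def information_efficiency(values, max_bits=8.0):
    """Entropy relative to the maximum for one byte (assumed definition)."""
    return byte_entropy(values) / max_bits


def entropy_per_position(packets, length):
    """Entropy of each of the first `length` byte positions over all packets."""
    return [byte_entropy(p[i] for p in packets) for i in range(length)]


# Byte 0 is constant, byte 1 takes four equally likely values.
packets = [bytes([7, i]) for i in range(4)]
per_byte = entropy_per_position(packets, 2)
```

A constant byte yields 0 bits, and a byte that is uniform over all 256 values yields the maximum of 8 bits.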
Measurement Setup
Evaluation depends on the analyzed traces
– 6 IPv4 trace groups, 1 IPv6 trace group
– geographical locations (NZ, AUT, FR, NED, 2 LEO)
– network location (university, peering point, large ISP)
– application mix
Trace Name | Location | Duration | Packets in millions | IP address anonymized | IP version | TCP (%) | UDP (%) | ICMP (%) | others (%) | main applications (%)
NZIX | New Zealand | 30 hours | ~200 | Yes | 4 | 68 | 20 | 9 | 3 | http (50), quake (5)
FH Salzburg | Austria | 3 days | ~110 | Yes | 4 | 99 | 1 | 0 | 0 | http (90)
LEO 1 | - | 3 hours | ~130 | No | 4 | 1 | 90 | 10 | 0 | edonkey (25)
LEO 2 | - | 6 min | ~12 | No | 4 | 33 | 60 | 0 | 7 | tunnel (60), edonkey (10)
Twente | Netherlands | 10 days | ~380 | Yes | 4 | 89 | 7 | 1 | 3 | diversified
Ciril | France | 6 hours | ~100 | No | 4 | - | - | - | - | -
Mawi | WIDE-6Bone | 40 days | ~80 | Yes | 6 | - | - | - | - | -
Measurement Results
[Figure: Entropy IPv4]
Measurement Results
High Entropy Header Fields
– IPv4: Identification, Length LSB, Src/Dst Address 2 LSB
– TCP: Chksum, SeqNo, AckNo, Src/Dst Port 2 LSB
– UDP: Chksum, Length LSB, Src/Dst Port 2 LSB
– ICMP: Chksum, Bytes 12,13,18,19
– IPv6: Length LSB
– more IPv6 traces required for further evaluation
– addresses anonymized and no transport header: only 8 bytes could be evaluated
Recommended 8 byte Configuration
IP ID field + 6 Transport Header Bytes:
– TCP (Checksum, 2 LSB of Seq and AckNo)
– UDP (Checksum, Source Port, LSB Destination Port, LSB Length)
– ICMP (Checksum, Bytes 12,13,18,19)
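How the recommended 8 bytes could be extracted from a raw IPv4 packet is sketched below. The byte offsets follow the standard IPv4, TCP and UDP header layouts; "2 LSB" is read as the two least significant bytes in network byte order. ICMP is left out because the slides do not say what the byte numbers 12, 13, 18, 19 are relative to. The function name `recommended_hash_input` is an assumption.

```python
import struct

TCP, UDP = 6, 17  # IP protocol numbers


def recommended_hash_input(packet: bytes) -> bytes:
    """Return IP ID + 6 transport header bytes for a raw IPv4 packet."""
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (packet[0] & 0x0F) * 4        # IP header length in bytes
    ip_id = packet[4:6]                 # IPv4 Identification field
    t = packet[ihl:]                    # transport header
    if packet[9] == TCP:
        # checksum, 2 LSB of sequence number, 2 LSB of acknowledgement number
        return ip_id + t[16:18] + t[6:8] + t[10:12]
    if packet[9] == UDP:
        # checksum, source port, LSB of destination port, LSB of length
        return ip_id + t[6:8] + t[0:2] + t[3:4] + t[5:6]
    raise ValueError("protocol not covered by this sketch")


def ipv4_header(ident: int, proto: int) -> bytes:
    """Build a 20-byte IPv4 header with dummy addresses (test helper)."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, ident, 0, 64, proto, 0,
                       bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))


tcp = struct.pack("!HHIIBBHHH", 1234, 80, 0xAABBCCDD, 0x11223344,
                  5 << 4, 0x10, 8192, 0xBEEF, 0)
udp = struct.pack("!HHHH", 0xC001, 0x0035, 0x001C, 0xABCD)
tcp_input = recommended_hash_input(ipv4_header(0x1234, TCP) + tcp)
udp_input = recommended_hash_input(ipv4_header(0x1234, UDP) + udp)
```

Both results are exactly 8 bytes long, so the hash function always operates on a short, fixed-length input.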
Measurement Results
Empirical Hash Input Collisions Evaluation
4 configurations used
1. whole IP and transport header (minimum reachable collisions)
2. only IP header (bad configuration)
3. 8 high entropy bytes
4. Molina’s 16 bytes
Metric: sum of packets in the 20 largest collisions of each trace
– Large collision: all-or-none decision for all packets that have the same attributes
– Small collisions: packets are equal within one collision but differ between collisions
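The collision metric can be sketched as follows, assuming that a collision is a group of at least two packets with an identical hash input. The function name `packets_in_largest_collisions` is hypothetical.

```python
from collections import Counter


def packets_in_largest_collisions(hash_inputs, k=20):
    """Sum the number of packets in the k largest hash input collisions."""
    counts = Counter(hash_inputs)
    # Only inputs shared by two or more packets form a collision.
    collision_sizes = sorted((c for c in counts.values() if c > 1),
                             reverse=True)
    return sum(collision_sizes[:k])


# 5 packets share input "a", 3 share "b", 2 share "d"; "c" is unique.
inputs = [b"a"] * 5 + [b"b"] * 3 + [b"c"] + [b"d"] * 2
top2 = packets_in_largest_collisions(inputs, k=2)
top20 = packets_in_largest_collisions(inputs)
```

Applying the function to the hash inputs produced by each of the four configurations gives one value per configuration and trace, which can then be compared directly.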
Measurement Results
Hash Input Collision Comparison
– the recommended 8 bytes are better than Molina’s 16 bytes on NZIX, Twente and LEO 1, and equal on FH Salzburg
– the LEO 2 trace includes a large VPN traffic flow with UDP checksum == 0; in such cases more high entropy bytes should be used
Trace Group | Trace Files | Packets/file in millions | Identical IP + Transport header | Identical IP Header | Recomm. 8 Bytes | Molina’s 16 Bytes
FH Salzburg | 18 | 6 | 3,547 | 238,174 | 3,547 | 3,547
NZIX | 19 | 10 | 484,034 | 1,564,246 | 484,405 | 1,562,066
Twente | 36 | 10 | 13,120 | 475,570 | 16,004 | 49,477
LEO 1 | 12 | 10 | 61,072 | 450,273 | 73,730 | 86,809
LEO 2 | 1 | 10 | 949 | 8,116 | 7,919 | 1,121
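To compare configurations across traces of different size, each value can be related to the minimum reachable collisions (whole IP and transport header). The sketch below uses the table values as given; the normalization itself and the name `relative_to_baseline` are not from the slides.

```python
# Columns: IP + transport header, IP header only,
#          recommended 8 bytes, Molina's 16 bytes
COLLISIONS = {
    "FH Salzburg": (3_547, 238_174, 3_547, 3_547),
    "NZIX": (484_034, 1_564_246, 484_405, 1_562_066),
    "Twente": (13_120, 475_570, 16_004, 49_477),
    "LEO 1": (61_072, 450_273, 73_730, 86_809),
    "LEO 2": (949, 8_116, 7_919, 1_121),
}


def relative_to_baseline(row):
    """Divide every configuration by the whole-header baseline."""
    baseline = row[0]
    return tuple(value / baseline for value in row)


for trace, row in COLLISIONS.items():
    ratios = relative_to_baseline(row)
    print(trace, " ".join(f"{r:.2f}" for r in ratios))
```

A ratio of 1.00 means the configuration reaches the minimum. The LEO 2 row stands out for the recommended 8 bytes, consistent with the VPN flow noted above.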
Conclusion
Outcome
– recommendation of 8 bytes for use as hash input for HBS
– the 8 recommended bytes are sufficient to gain unique hash inputs
Henke, Schmoll, Zseby: “Empirical Evaluation of Hash Functions for Multipoint Measurements”
– hash calculation time increases linearly with input length
– hash functions are able to select a representative subset based on 8 bytes
Future Work
Correlation between Bytes
– correlation between address bytes
– entropy of combined bytes expected to be the average of the individual entropies
IPv6
– entropy evaluation of IPv6 addresses and transport headers