Algorithms for Network Security
George Varghese, UCSD
Network Security Background
Current Approach: When a new attack appears, analysts work for hours (learning) to obtain a signature. Following this, IDS devices screen traffic for the signature (detection).
Problem 1: Slow learning by humans does not scale as attacks get faster. Example: Slammer reached critical mass in 10 minutes.
Problem 2: Detection of signatures at high speeds (10 Gbps or higher) is hard.
This talk: Will describe two proposals, built on interesting algorithms, that rethink the learning and detection problems.
Dealing with Slow Learning by Humans by Automating Signature Extraction
(OSDI 2004, joint with S. Singh, C. Estan, and S. Savage)
Extracting Worm Signatures by Content Sifting
Unsupervised learning: monitor the network and look for strings common to traffic with worm-like behavior.
Signatures can then be used for detection.
PACKET HEADER:
SRC: 11.12.13.14.3920  DST: 132.239.13.24.5000  PROT: TCP
PACKET PAYLOAD (CONTENT):
00F0  90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90  ................
0100  90 90 90 90 90 90 90 90 90 90 90 90 4D 3F E3 77  ............M?.w
0110  90 90 90 90 FF 63 64 90 90 90 90 90 90 90 90 90  .....cd.........
0120  90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90  ................
0130  90 90 90 90 90 90 90 90 EB 10 5A 4A 33 C9 66 B9  ..........ZJ3.f.
0140  66 01 80 34 0A 99 E2 FA EB 05 E8 EB FF FF FF 70  f..4...........p
. . .
Kibvu.B signature captured by EarlyBird on May 14th, 2004
Assumed Characteristics of Worm Behavior we used for Learning
Content Prevalence: the payload of a worm is seen frequently.
Address Dispersion: the payload of a worm is seen traversing between many distinct hosts.
The Basic Algorithm
Detector at Vantage Point: hosts A, B, C, D, and E exchange traffic, including an innocuous request to cnn.com. The detector maintains two tables:
– Prevalence Table: substring -> repeat count
– Address Dispersion Table: substring -> distinct sources and distinct destinations
[Animation over four frames: as the worm payload spreads among the hosts, its prevalence count climbs 1 -> 2 -> 3 and its dispersion entry grows to 3 sources (B, D, E) and 3 destinations (A, B, D), while the one-off cnn.com content stays at prevalence 1 with one source and one destination.]
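The table updates in the animation can be sketched in a few lines. This is an illustrative toy, not the EarlyBird implementation; the traffic tuples below are made up to mirror the slide's scenario.

```python
from collections import defaultdict

# Per-substring repeat count and distinct-endpoint sets, as on the slide.
prevalence = defaultdict(int)                      # substring -> repeat count
dispersion = defaultdict(lambda: (set(), set()))   # substring -> (sources, dests)

def process(src, dst, payload):
    """Update both tables for one observed packet payload."""
    prevalence[payload] += 1
    srcs, dsts = dispersion[payload]
    srcs.add(src)
    dsts.add(dst)

# Worm payload "X" travels A->B, B->D, D->E; the cnn.com request is one-off.
for s, d, p in [("A", "B", "X"), ("A", "cnn.com", "GET /"),
                ("B", "D", "X"), ("D", "E", "X")]:
    process(s, d, p)

assert prevalence["X"] == 3                        # high prevalence
assert len(dispersion["X"][0]) == 3                # 3 distinct sources
assert len(dispersion["X"][1]) == 3                # 3 distinct destinations
assert prevalence["GET /"] == 1                    # innocuous traffic stays low
```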
What are the challenges?
Computation
– We have a total of 12 microseconds of processing time per packet at 1 Gbps line rate.
– Not just talking about processing packet headers: we need deep packet inspection, not for known strings but to learn frequent strings.
State
– On a fully loaded 1 Gbps link, the basic algorithm could generate a 1 GByte table in less than 10 seconds.
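The budget on this slide can be sanity-checked with quick arithmetic (assuming a full-size 1500-byte packet, which the slide does not state):

```python
# Back-of-the-envelope numbers behind the challenge slide (approximate).
per_packet_us = 1500 * 8 / 1e9 * 1e6    # transmission time of one 1500 B packet at 1 Gbps
assert round(per_packet_us) == 12       # the 12 microsecond processing budget

payload_bytes_in_10s = 1e9 / 8 * 10     # bytes carried in 10 s on a loaded 1 Gbps link
assert payload_bytes_in_10s > 1e9       # enough new substrings to fill a ~1 GByte table
```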
Idea 1: Index fixed-length substrings
Approach 1: Index all substrings
– Problem: too many substrings -> too much computation -> too much state
Approach 2: Index each packet as a single string
– Problem: easily evadable (e.g., Witty, email viruses)
Approach 3: Index all contiguous substrings of a fixed length S
– Will track everything that is of length S and larger
[Figure: sliding fixed-length windows over payload A B C D E F G H I J K]
Idea 2: Incremental Hash Functions
Use hashing to reduce state.
– 40-byte strings -> 8-byte hash
Use an incremental hash function to reduce computation.
– Rabin fingerprint: an efficient incremental hash
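The incremental property means each new window's hash costs O(1), not O(S). True Rabin fingerprints work over GF(2) polynomials; the modular polynomial rolling hash below is a simplified stand-in for illustration.

```python
# Rolling hash over every fixed-length window, in the spirit of a Rabin
# fingerprint. BASE/MOD are arbitrary illustrative constants.
BASE, MOD = 257, (1 << 61) - 1

def fingerprints(data: bytes, k: int):
    """Yield the hash of every k-byte window in O(1) amortized per byte."""
    if len(data) < k:
        return
    pow_k = pow(BASE, k - 1, MOD)
    h = 0
    for b in data[:k]:
        h = (h * BASE + b) % MOD
    yield h
    for i in range(k, len(data)):
        # Remove the departing byte, shift, add the arriving byte.
        h = ((h - data[i - k] * pow_k) * BASE + data[i]) % MOD
        yield h

payload = b"ABCDEFGHIJK"
hashes = list(fingerprints(payload, 4))
assert len(hashes) == len(payload) - 4 + 1
# Identical windows hash identically regardless of position in the packet:
assert list(fingerprints(b"XXWORM", 4))[2] == list(fingerprints(b"WORMYY", 4))[0]
```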
Insight 3: Don't need to track every substring
Approach 1: Sub-sample packets
– If we choose 1 in N, it will take N times as long to detect the worm.
Approach 2: Deterministic or random selection of offsets
– Susceptible to simple evasion attacks.
– No guarantee that we will sample the same substring in every packet.
Approach 3: Sample based on the hash of the substring (Manber et al. in Agrep)
– Value Sampling: sample a fingerprint if the last N bits of the fingerprint are equal to the value V.
– The number of bits N can be set dynamically; the value V can be randomized for resiliency.
Implementing Insight 3: Value Sampling
Value Sampling implementation:
– For selecting 1/64 of fingerprints: last 6 bits equal to 0.
Ptrack = probability of selecting at least one substring of length S in an L-byte invariant: Ptrack = 1 - (1 - F)^(L - S + 1).
– For last 6 bits equal to 0, F = 1/64.
– For 40-byte substrings (S = 40), Ptrack = 99.64% for a 400-byte invariant.
[Example over payload A B C D E F G H I J K:]
Fingerprint = 11000000 -> SAMPLE
Fingerprint = 10000000 -> SAMPLE
Fingerprint = 11000001 -> IGNORE
Fingerprint = 11000010 -> IGNORE
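The sampling rule and the tracking probability both fit in a few lines; this sketch reproduces the slide's 99.64% figure under the stated parameters.

```python
# Value sampling: keep a fingerprint iff its last N bits equal V.
N, V = 6, 0
F = 1.0 / (1 << N)          # sampling fraction, 1/64 here

def sampled(fingerprint: int) -> bool:
    return fingerprint & ((1 << N) - 1) == V

assert sampled(0b11000000) and sampled(0b10000000)
assert not sampled(0b11000001) and not sampled(0b11000010)

def p_track(L: int, S: int = 40, F: float = F) -> float:
    """Probability of sampling at least one of the L - S + 1 substrings."""
    return 1.0 - (1.0 - F) ** (L - S + 1)

assert abs(p_track(400) - 0.9964) < 0.001   # matches the 99.64% on the slide
```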
Insight 4: Repeated substrings are uncommon
Can greatly reduce memory by focusing only on the high-frequency content.
Only 1% of the 40-byte substrings repeat more than one time.
[Figure: cumulative fraction of signatures vs. number of repeats, x-axis log-scaled from 1 to 100,000, y-axis from 0.984 to 1.]
Implementing Insight 4: Use an approximate high-pass filter
Multistage Filters use randomized techniques to implement a high-pass filter using low memory and few false positives [EstanVarghese02]. Similar to the approach by Motwani et al.
– Use the content hash as a flow identifier.
– Three orders of magnitude improvement over the naïve approach (1 entry/string).
Multistage Filters
[Diagram: the packet window is hashed by Hash 1, Hash 2, and Hash 3 into counters at Stage 1, Stage 2, and Stage 3; each addressed counter is incremented and a comparator checks it against the threshold. The key is inserted into the dispersion table only if ALL counters are above threshold.]
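The diagram above can be sketched as follows. Stage count, counter count, and threshold are illustrative parameters, and the per-stage hash (salted BLAKE2b) is a stand-in for whatever hardware hash the real filter uses.

```python
import hashlib

# A key reaches the dispersion table only when its counter in EVERY stage
# crosses the threshold, filtering low-frequency content in fixed memory.
STAGES, COUNTERS, THRESHOLD = 3, 1024, 3
counters = [[0] * COUNTERS for _ in range(STAGES)]

def stage_index(key: bytes, stage: int) -> int:
    """Independent hash per stage via a per-stage salt."""
    digest = hashlib.blake2b(key, salt=bytes([stage]) * 16).digest()
    return int.from_bytes(digest[:4], "big") % COUNTERS

def observe(key: bytes) -> bool:
    """Increment the key's counter in each stage; True once all >= threshold."""
    passed = True
    for s in range(STAGES):
        idx = stage_index(key, s)
        counters[s][idx] += 1
        passed &= counters[s][idx] >= THRESHOLD
    return passed

# A prevalent string passes only after THRESHOLD repeats; a one-off does not.
assert not any(observe(b"worm-payload") for _ in range(THRESHOLD - 1))
assert observe(b"worm-payload")
assert not observe(b"one-off content")
```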
Insight 5:Prevalent substrings with high dispersion are rare
[Figure: number of distinct signatures (log scale, 1 to 10,000) vs. time in minutes (0 to 60), comparing the loose filter S>1 AND D>1 against the much rarer S>30 AND D>30.]
Insight 5: Prevalent substrings with high dispersion are rare
A naïve approach would maintain a list of sources (or destinations).
We only care whether dispersion is high:
– Approximate counting suffices.
Scalable Bitmap Counters:
– Sample a larger virtual bitmap; scale up and adjust for error.
– Order of magnitude less memory than the naïve approach, with acceptable error (<30%).
Implementing Insight 5: Scalable Bitmap Counters
Hash: based on source (or destination). Sample: keep only a sample of the bitmap. Estimate: scale up the sampled count. Adapt: periodically increase the scaling factor.
With 3 32-bit bitmaps, error factor = 28.5%.
[Diagram: Hash(Source) sets bits in the sampled bitmap.]
Error factor = 2 / (2^numBitmaps - 1)
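A minimal sketch of the sample-and-scale idea, assuming a single sampled bitmap and a linear-counting style estimator; the real scheme's multiple bitmaps and adaptive rescaling are omitted, and the parameters below are illustrative.

```python
import hashlib, math

# Approximate distinct counting: keep a 1/SCALE slice of the hash space in a
# small physical bitmap, then scale the fill-based estimate back up.
M, SCALE = 32, 4        # 32 physical bits standing in for a larger virtual map
bitmap = 0

def add(source: str):
    global bitmap
    h = int.from_bytes(hashlib.sha256(source.encode()).digest()[:8], "big")
    if h % SCALE == 0:                      # only 1/SCALE of sources are kept
        bitmap |= 1 << ((h // SCALE) % M)

def estimate() -> float:
    """Scale up the sampled count from the bitmap's zero fraction."""
    zeros = M - bin(bitmap).count("1")
    if zeros == 0:
        return float("inf")                 # saturated; a real counter rescales
    return SCALE * M * math.log(M / zeros)

for i in range(100):                        # 100 distinct sources
    add(f"10.0.0.{i}")
assert estimate() > 0                       # coarse estimate of ~100 sources
```

The error the slide quotes, 2 / (2^numBitmaps - 1) = 2/7 = 28.5% for 3 bitmaps, comes from the full multi-bitmap scheme, which this single-bitmap sketch does not reproduce.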
High-Speed Implementation: Practical Content Sifting
Memory ("state") scaling:
– Hash of fixed-size substrings.
– Multistage Filters: allow us to focus on the prevalent substrings; total size is 2 MB.
– Scalable bitmap counters: scalable counting of sources and destinations.
CPU ("computation") scaling:
– Incremental hash functions.
– Value Sampling: 1/64 sampling detects all known worms.
Implementing Content Sifting
[Flowchart for an incoming payload such as "IAMAWORM": compute Key = RabinHash("IAMA") (0.349, 0.037); if the key passes value sampling, look up ADTEntry = Find(Key) (0.021) in the Address Dispersion Table. If an entry is found, update it (0.027). If not, update the Multistage Filter (0.146) and, if the prevalence exceeds the threshold, create and insert a new entry (0.37). Each Address Dispersion Table entry tracks KEY, Repeats, Sources, and Destinations.]
– Multistage Filter (dynamic per-port thresholds)
– Scaling bitmap counters (5 bytes)
0.042 us per byte (in the software implementation), with 1/64 value sampling.
Deployment Experience
1: Large fraction of the UCSD campus traffic.
– Traffic mix: approximately 5,000 end-hosts, dedicated servers for campus-wide services (DNS, Email, NFS, etc.).
– Line rate varies between 100 and 500 Mbps.
2: Fraction of local ISP traffic (DEMO).
– Traffic mix: dialup customers, leased-line customers.
– Line rate is roughly 100 Mbps.
3: Fraction of a second local ISP's traffic.
– Traffic mix: inbound/outbound traffic into a large hosting center.
– Line rate is roughly 300 Mbps.
False Positives we encountered
Common protocol headers:
– Mainly HTTP and SMTP headers.
– Distributed (P2P) system protocol headers.
– Fix: procedural whitelist for a small number of popular protocols.
Non-worm epidemic activity:
– SPAM
Example of a P2P header that triggers a false positive (captured Gnutella handshake):
GNUTELLA.CONNECT/0.6..X-Max-TTL:.3..X-Dynamic-Querying:.0.1..X-Version:.4.0.4..X-Query-Routing:.0.1..User-Agent:.LimeWire/4.0.6..Vendor-Message:.0.1..X-Ultrapeer-Query-Routing:
Other Experience:
Lesson 1: From experience, static whitelisting is still not sufficient for HTTP and P2P; we needed other, more dynamic whitelisting techniques.
Lesson 2: Signature selection is key. From worms like Blaster we get several options; a major delay today in signature release is "vetting" signatures.
Lesson 3: Works better for vulnerability-based mass attacks; does not work for directed attacks or attacks based on social engineering, where the repeat rate is low.
Lesson 4: Major IDS vendors have moved to vulnerability signatures. Automated approaches to this (CMU) are very useful, but automated exploit-signature detection may also be useful as an additional piece of defense in depth for truly zero-day attacks.
Related Work and Issues
Three roughly concurrent pieces of work: Autograph (CMU), Honeycomb (Cambridge), and EarlyBird (us). EarlyBird is only …
Further work at CMU extends Autograph to polymorphic worms (this can be done with EarlyBird in real time as well) and automates vulnerability signatures.
Issues: encryption, P2P false positives like BitTorrent, etc.
Part 2: Detection of Signatures with Minimal Reassembly
(to appear in SIGCOMM 06, joint with F. Bonomi and A. Fingerhut of Cisco Systems)
Membership Check via Bloom Filter
[Diagram: field extraction feeds Hash 1, Hash 2, and Hash 3, which index a bitmap at Stage 1, Stage 2, and Stage 3. On insert, each addressed bit is set; on a membership check, each addressed bit is tested ("Equal to 1?") and an ALERT is raised only if all bits are set.]
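The check in the diagram reduces to a few lines. This is a generic Bloom-filter sketch, not the hardware design; the table size, hash count, and SHA-256-based hashing are illustrative choices.

```python
import hashlib

# k hash functions set k bits on insert; a query "alerts" only if all k
# addressed bits are set. No false negatives; rare false positives.
M, K = 1 << 16, 3
bits = bytearray(M // 8)

def positions(item: bytes):
    """Derive K bit positions from per-stage prefixed hashes."""
    for stage in range(K):
        d = hashlib.sha256(bytes([stage]) + item).digest()
        yield int.from_bytes(d[:4], "big") % M

def insert(item: bytes):
    for p in positions(item):
        bits[p // 8] |= 1 << (p % 8)

def maybe_contains(item: bytes) -> bool:
    return all(bits[p // 8] & (1 << (p % 8)) for p in positions(item))

insert(b"BADSTRING")
assert maybe_contains(b"BADSTRING")        # inserted items always alert
assert not maybe_contains(b"innocuous")    # almost surely a true negative here
```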
Example 1: String Matching (Step 1: Sifting using Anchor Strings)
[Diagram: each string ST0, ST1, ST2, ..., STn in the database of strings to block contributes an anchor string A0, A1, A2, ..., An; the anchors are run through a hash function into a Bloom filter.]
Sushil Singh, G. Varghese, J. Huber, Sumeet Singh, Patent Application
String Matching Step 2: Standard hashing
[Diagram: the anchor strings A0, A1, A2, ..., An for ST0, ST1, ST2, ..., STn are hashed into Hash Bucket-0, Hash Bucket-1, ..., Hash Bucket-m.]
Matching Step 3: Bit Trees instead of chaining
[Diagram: the strings in a single hash bucket (e.g., A2/ST2, A8/ST8, A11/ST11, A17/ST17) are separated by a small binary tree of discriminating bit positions (locations L1, L2, L3). Each internal node tests one bit (0/1) of the input, and the search descends to a single candidate string, which is then compared in full.]
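The bucket bit-tree idea above can be sketched as follows. The greedy choice of the first discriminating bit position is illustrative, not the paper's exact construction, and the bucket contents are made up.

```python
# Disambiguate strings in one hash bucket with a tree of bit tests, then do a
# single full comparison at the leaf, instead of chained full comparisons.

def bit(s: bytes, pos: int) -> int:
    return (s[pos // 8] >> (7 - pos % 8)) & 1

def build(strings):
    """Recursively split the bucket on the first position where strings differ."""
    if len(strings) == 1:
        return strings[0]                      # leaf: the lone candidate
    for pos in range(8 * len(strings[0])):
        zeros = [s for s in strings if bit(s, pos) == 0]
        ones = [s for s in strings if bit(s, pos) == 1]
        if zeros and ones:
            return (pos, build(zeros), build(ones))
    raise ValueError("duplicate strings in bucket")

def lookup(tree, probe: bytes) -> bool:
    while isinstance(tree, tuple):             # walk bit tests to one candidate
        pos, zeros, ones = tree
        tree = ones if bit(probe, pos) else zeros
    return tree == probe                       # single full compare at the leaf

bucket = [b"ATTACK01", b"ATTACK02", b"EXPLOITX", b"SHELLCOD"]
tree = build(bucket)
assert all(lookup(tree, s) for s in bucket)
assert not lookup(tree, b"INNOCENT")
```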
Problem is harder than it may appear
Network IDS devices are beginning to be folded into network devices like switches (Cisco, Force10).
Instead of having appliances that work at 1 or 2 Gbps, we need IDS line cards (or, better still, chips) that scale to 10-20 Gbps.
Because attacks can be fragmented into pieces that can be sent out of order, and even with inconsistent data, the standard approach has been to reassemble the TCP stream and normalize it to remove inconsistency.
Theoretically, normalization requires storing one round-trip delay's worth of data per connection, which at 20 Gbps is huge, not to mention the computation needed to index this state.
Worse, we have to do regular-expression matching, not just exact match (Cristi).
Headache: dealing with Evasions
THE CASE OF THE MISORDERED FRAGMENT: SEQ = 13, DATA = "ACK" arrives before SEQ = 10, DATA = "ATT".
THE CASE OF THE INTERSPERSED CHAFF: SEQ = 10, TTL = 10, "ATT"; SEQ = 13, TTL = 1, "JNK" (chaff that expires before reaching the target); then SEQ = 13, "ACK".
THE CASE OF THE OVERLAPPING SEGMENTS: SEQ = 10, "ATTJNK" overlapped by SEQ = 13, "ACK".
In each case the stream reassembles to "ATTACK" at the target while a naïve monitor sees something else.
Conclusions
Surprising what one can do with network algorithms. At first glance, learning seems much harder than lookups or QoS.
Underlying principle in both algorithms is “sifting”: reducing traffic to be examined to a manageable amount and then doing more cumbersome checks.
Lots of caveats in practice: attacks are a moving target.