Polygraph: Automatically Generating Signatures for Polymorphic Worms
Presented by: Devendra Salvi
Paper by: James Newsome, Brad Karp, Dawn Song
Introduction
Why an automated signature generation technique?
Learning from previous worm detection implementations
What is a polymorphic worm?
Polymorphic Worm design
Characteristics of a polymorphic worm
Invariant bytes
Wildcard bytes
Code bytes
Creating a polymorphic worm
Assumptions
Perfectly obfuscated code
Code obfuscation
Polymorphic Worm design
The two chief sources of invariant content
Exploit framing (reserved key words)
Exploit payload (alter control flow)
Invariant content in polymorphic worm
Apache multiple-host-header vulnerability
Apache-Knacker exploit
Unshaded area = wildcard bytes
Lightly shaded = code bytes
Heavily shaded = invariant content bytes
Invariant content in polymorphic worm (contd.)
BIND TSIG vulnerability
Exploited by the Lion worm.
Unshaded area = wildcard bytes
Lightly shaded = code bytes
Heavily shaded = invariant content bytes
Invariant content in polymorphic worm (contd.)
CodeRed
AdmWorm
Slapper
Clet polymorphic engine
Boxed bytes are found in at least 20% of Clet’s outputs; shaded bytes are found in all of Clet’s outputs.
Polymorphic Signatures
Substring Signatures Insufficient?
Substring signatures rely on two assumptions:
A single invariant substring exists across payload instances for the same worm; that is, the substring is sensitive, in that it will match all worm instances.
The invariant substring is sufficiently long to be specific; that is, the substring does not occur in any non-worm payloads destined for the same IP protocol and port.
Signature Classes for Polymorphic Worms
Conjunction signatures
Token-subsequence signatures
Bayes signatures
Polygraph
Polygraph monitor incorporates the Polygraph signature generator.
Polygraph (contd.)
Polygraph Signature Generator
Signature quality
Efficient signature generation
Efficient signature matching
Generation of small signature sets
Robustness against noise and multiple worms
Robustness against evasion and subversion
Algorithm for signature generation
Preprocessing: Token Extraction
All of the distinct substrings of a minimum length are extracted.
E.g., if there are K occurrences of "http", then "ttp" is not considered a distinct token unless it also appears in at least K occurrences where it is not a substring of "http".
This is the first step of the algorithm; it filters out irrelevant tokens from the suspicious flows.
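The token-extraction rule above can be sketched in a few lines of Python. This is a naive, quadratic illustration and not the paper's implementation; the function names `occurrences` and `extract_tokens` and the parameters `k` and `min_len` are names chosen here for the example. A substring becomes a token only if it occurs in at least `k` samples at a position that is not covered by an already accepted longer token.

```python
def occurrences(sample, token):
    """Start offsets of every (possibly overlapping) occurrence of token."""
    found, i = [], sample.find(token)
    while i != -1:
        found.append(i)
        i = sample.find(token, i + 1)
    return found


def extract_tokens(samples, k, min_len):
    """Naive sketch of distinct-token extraction (illustrative only)."""
    # Count, for each substring of length >= min_len, how many samples hold it.
    counts = {}
    for s in samples:
        subs = {s[i:j] for i in range(len(s))
                for j in range(i + min_len, len(s) + 1)}
        for t in subs:
            counts[t] = counts.get(t, 0) + 1

    # Consider frequent substrings longest first, so that longer tokens
    # can mask the shorter substrings they contain.
    candidates = sorted((t for t, c in counts.items() if c >= k),
                        key=lambda t: (-len(t), t))
    tokens = []
    for t in candidates:
        free_samples = 0
        for s in samples:
            covered = [(i, i + len(u)) for u in tokens
                       for i in occurrences(s, u)]
            # t counts for this sample only if some occurrence is NOT
            # fully inside an occurrence of an accepted longer token.
            if any(not any(a <= i and i + len(t) <= b for a, b in covered)
                   for i in occurrences(s, t)):
                free_samples += 1
        if free_samples >= k:
            tokens.append(t)
    return tokens
```

With the samples `"aa http bb"`, `"http cc"`, `"dd http"` and `k = 3`, only `"http"` is kept, because every `"ttp"` lies inside an `"http"`. If each sample also contains a standalone `"ttp"`, both tokens are kept.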
Algorithm for signature generation (contd.)
Generating single signatures
Generating Conjunction Signatures
Unordered token list
Generating Token-Subsequence Signatures
Ordered token list (regular expression)
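The difference between the two signature classes is easiest to see in how they match a payload. The following is a minimal sketch; the function names `matches_conjunction` and `matches_subsequence` are illustrative and do not come from the paper.

```python
def matches_conjunction(tokens, payload):
    """Conjunction signature: every token must appear, in any order."""
    return all(t in payload for t in tokens)


def matches_subsequence(tokens, payload):
    """Token-subsequence signature: tokens must appear in the given order.

    Equivalent to matching the regular expression .*t1.*t2.*...*tn.*;
    a greedy leftmost scan is sufficient for this check.
    """
    pos = 0
    for t in tokens:
        i = payload.find(t, pos)
        if i == -1:
            return False
        pos = i + len(t)  # the next token must start after this one ends
    return True
```

For the token list `[b"one", b"two"]`, the payload `b"xxonexxtwoxx"` matches both signature classes, while `b"twoxxone"` matches only the conjunction signature because the tokens appear in the wrong order.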
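E.g. “.*one.*two.*”, “.*o.*n.*e.*z.*”
Generating Bayes Signatures
Compare Pr[L(x) = worm | x] and Pr[L(x) = ¬worm | x]:
$$\frac{\Pr[L(x) = \text{worm} \mid x]}{\Pr[L(x) = \neg\text{worm} \mid x]} = \frac{\Pr[L(x) = \text{worm}] \prod_{1 \le i \le n} \Pr[x_i = 1 \mid L(x) = \text{worm}]}{\Pr[L(x) = \neg\text{worm}] \prod_{1 \le i \le n} \Pr[x_i = 1 \mid L(x) = \neg\text{worm}]}$$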
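One way to turn this probability ratio into a matcher is to give each token a log-likelihood score and sum the scores of the tokens present in a flow. The sketch below works under assumptions that are not in the slides: the function names `token_scores` and `bayes_score` are hypothetical, add-one smoothing is used to avoid dividing by zero, and the class priors are left to be absorbed into whatever decision threshold is chosen.

```python
import math


def token_scores(tokens, suspicious, innocuous):
    """Per-token log-likelihood ratio, estimated from the two flow pools."""
    scores = {}
    for t in tokens:
        # Add-one smoothing is an assumption made for this sketch.
        p_worm = (sum(t in f for f in suspicious) + 1) / (len(suspicious) + 2)
        p_benign = (sum(t in f for f in innocuous) + 1) / (len(innocuous) + 2)
        scores[t] = math.log(p_worm / p_benign)
    return scores


def bayes_score(scores, payload):
    """Sum the scores of the tokens that are present in the payload."""
    return sum(s for t, s in scores.items() if t in payload)
```

A flow would then be flagged when `bayes_score` exceeds a threshold chosen to keep false positives acceptably low. Tokens common in worm flows and rare in innocuous flows get positive scores; tokens such as `GET` that are frequent in normal traffic get scores near or below zero.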
Practical signature generation
Generating multiple signatures
The suspicious flow pool could contain more than one type of worm, and could contain innocuous flows.
Bayes algorithm implementation
Conjunction algorithms require clustering
Each cluster contains similar flows
Hierarchical clustering
Practical signature generation
Hierarchical Clustering
Clusters are merged iteratively. Which two clusters to merge is decided by computing what the merged signature would be for each of the O(s^2) pairs of clusters.
The two clusters that result in a signature with the lowest false positive rate are merged.
Example: the initial clusters S1 S2 S3 S4 S5 S6 become S1, S2-S3, S4, S5-S6 after merging.
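The greedy merging described above can be sketched as follows. Several simplifications are assumptions made for illustration: tokens are whitespace-separated words, each cluster's signature is a conjunction signature, and merging stops once the best available merge exceeds a hypothetical `max_fp` false-positive limit. The function names are likewise illustrative.

```python
from itertools import combinations


def conj_signature(flows):
    """Tokens (here: whitespace-separated words) present in every flow."""
    common = set(flows[0].split())
    for f in flows[1:]:
        common &= set(f.split())
    return common


def fp_rate(signature, innocuous):
    """Fraction of innocuous flows that the conjunction signature matches."""
    if not signature:
        return 1.0  # an empty signature matches everything
    return sum(signature <= set(f.split()) for f in innocuous) / len(innocuous)


def cluster(suspicious, innocuous, max_fp=0.0):
    """Greedy hierarchical clustering of suspicious flows (sketch)."""
    clusters = [[f] for f in suspicious]  # start with one flow per cluster
    while len(clusters) > 1:
        # Evaluate the merged signature for every pair of clusters.
        best = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: fp_rate(
                conj_signature(clusters[p[0]] + clusters[p[1]]), innocuous),
        )
        merged = clusters[best[0]] + clusters[best[1]]
        if fp_rate(conj_signature(merged), innocuous) > max_fp:
            break  # no remaining merge yields an acceptable signature
        clusters = [c for i, c in enumerate(clusters) if i not in best]
        clusters.append(merged)
    return clusters
```

Given two flows containing `evil1` and two containing `evil2`, all starting with `GET`, and an innocuous pool of ordinary `GET` requests, the sketch ends with two clusters whose signatures are `{GET, evil1}` and `{GET, evil2}`. Merging them further would leave only `{GET}`, which matches every innocuous flow.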
Performance of each Polygraph signature generation algorithm
Experimental Setup:
The token-extraction threshold is k = 3, the minimum token length is a = 2, and the minimum cluster size is 3.
All experiments were run on desktop machines with 1.4 GHz Intel Pentium III processors, running Linux kernel 2.4.20.
Signatures are generated for polymorphic versions of three real-world exploits:
the Apache-Knacker exploit
the ATPhttpd exploit
the BIND-TSIG exploit
Network traces: several HTTP and DNS network traces are used as input to Polygraph signature generation and to evaluate it.
Results
Single polymorphic worm: Apache-Knacker signatures.
For each algorithm, the correct signature is generated 100% of the time for all experiments where the suspicious pool size is greater than 2, and 0% of the time where the suspicious pool size is only 2.
Results (contd.)
Single polymorphic worm: BIND-TSIG signatures.
These signatures were successfully generated for suspicious pools containing at least 3 worm samples.
Results (contd.)
Single Polymorphic Worm Plus Noise
False Negatives: Signatures generated with clustering produce 0% false negatives, as does the Bayes algorithm until noise grows beyond 80%, at which point its signatures cause 100% false negatives.
False Positives: Figures (a) and (b) show the additional false positives that result from the addition of noise.
Results (contd.)
Multiple Polymorphic Worms Plus Noise
False negatives are similar to the single-polymorphic-worm-plus-noise case.
False positives are very similar to the single-polymorphic-worm-plus-noise case when there is only one type of worm in the suspicious pool.
Potential attacks on Polygraph
Overtraining Attacks
The conjunction and token-subsequence algorithms are designed to extract the most specific signature possible from a worm. An attacker may attempt to exploit this property to prevent the generated signature from being sufficiently general.
Innocuous Pool Poisoning
An attacker could determine what signatures Polygraph would generate for a worm. He could then create otherwise innocuous flows that match these signatures, and try to get them into Polygraph’s innocuous flow pool.
Long-tail Attack
An exploit could have already occurred by the time we see a full signature match.
Strengths
The paper introduces a preventive measure against polymorphic worms.
The signature generation technique is automated.
Since the algorithms work efficiently for a polymorphic worm, as well as in situations where there may be more than one worm present in the data flow, the approach is also practical.
Weaknesses
Any of the signature generation algorithms, when applied individually, can be evaded.
By the time a signature is generated, the vulnerable host might already be infected.
Improvements
All three of the algorithms could be run simultaneously, and the signature with the fewest false positives and false negatives used.