
Polygraph: Automatically Generating Signatures for Polymorphic Worms

James Newsome*, Brad Karp*†, and Dawn Song*

* Carnegie Mellon University
† Intel Research Pittsburgh

James Newsome, May 2005

Internet Worms
• Definition: Malicious code that propagates by exploiting software
• No human interaction needed
• Able to spread very quickly
  • Slammer scanned 90% of Internet in 10 minutes

Proposed Defense Strategy

[Figure: “Worm Detected!” alert]
• Honeycomb [Kreibich2003]
• Autograph [Kim2004]
• Earlybird [Singh2004]

Challenge: Polymorphic Worms
• Polymorphic worms minimize invariant content
  • Encrypted payload
  • Obfuscated decryption routine
• Polymorphic tools are already available
  • Clet, ADMmutate

Do good signatures for polymorphic worms exist?

Can we generate them automatically?

Good News: Still some invariant content

[Figure: layout of a polymorphic HTTP worm request. Labels: GET, URL, HTTP/1.1, Host: Payload Part 1, Host: Payload Part 2, Random Headers (three times), NOP slide, Decryption Routine, Decryption Key, Encrypted Payload, \xff\xbf]

• Protocol framing
  • Needed to make server go down vulnerable code path
• Overwritten return address
  • Needed to redirect execution to worm code
• Decryption routine
  • Needed to decrypt main payload
  • BUT, code obfuscation can eliminate patterns here

Bad News: Previous Approaches Insufficient
• Previous approaches use a common substring
• Longest substring
  • “HTTP/1.1”
  • 93% false positive rate
• Most specific substring
  • “\xff\xbf”
  • .008% false positive rate (10 / 125,301)

What to do?
• No one substring is specific enough
• BUT, there are multiple substrings
  • Protocol framing
  • Value used to overwrite return address
  • (Parts of poorly obfuscated code)
• Our approach: combine the substrings

Outline
• Substring-based signatures insufficient
• Generating signatures
  • Perfect (noiseless) classifier case
    • Signature classes & algorithms
    • Evaluation
  • Imperfect classifier case
    • Clustering extensions
    • Evaluation
• Attacking the system
• Conclusion

Goals
• Identify classes of signatures that can:
  • Accurately describe polymorphic worms
  • Be used to filter a high speed network line
  • Be generated automatically and efficiently
• Design and implement a system to automatically generate signatures of these classes

Polygraph Architecture

[Figure: architecture diagram. Components: Network Tap, Flow Classifier, Suspicious Flow Pool, Innocuous Flow Pool, Signature Generator, Worm Signatures]


Signature Class (I): Conjunction
• Signature is a set of strings (tokens)
• Flow matches signature iff it contains all tokens in the signature
• O(n) time to match (n is flow length)
• Generated signature:
  • “GET” and “HTTP/1.1” and “\r\nHost:” and “\r\nHost:” and “\xff\xbf”
  • .0024% false positive rate (3 / 125,301)
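
A minimal sketch of conjunction matching, assuming a flow is available as a single `bytes` object. The function name and the two example flows are hypothetical. The repeated “\r\nHost:” token collapses to one entry because the sketch stores tokens in a set. Note that this naive version scans the flow once per token; a single-pass multi-pattern matcher would be needed to get the O(n) bound stated on the slide.

```python
def matches_conjunction(flow: bytes, tokens: set[bytes]) -> bool:
    """Return True iff every token occurs somewhere in the flow."""
    return all(token in flow for token in tokens)


# Tokens taken from the generated signature on the slide.
SIGNATURE = {b"GET", b"HTTP/1.1", b"\r\nHost:", b"\xff\xbf"}

# Hypothetical flows, for illustration only.
worm_flow = b"GET /x HTTP/1.1\r\nHost: a\r\nHost: b\xff\xbf"
benign_flow = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"

print(matches_conjunction(worm_flow, SIGNATURE))
print(matches_conjunction(benign_flow, SIGNATURE))
```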


Generating Conjunction Signatures
• Use suffix tree to find set of tokens that:
  • Occur in every sample of suspicious pool
  • Are at least 2 bytes long
• Generation time is linear in total byte size of suspicious pool
• Based on a well-known string processing algorithm [Hui1992]
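
The token criteria above can be illustrated with a brute-force sketch. This is not the suffix-tree algorithm the slide refers to and it is far from linear time; it only shows what "occurs in every sample and is at least 2 bytes long" selects. The function name, the maximality filter, and the sample flows are assumptions made for illustration.

```python
def common_tokens(samples: list[bytes], min_len: int = 2) -> set[bytes]:
    """Maximal substrings of length >= min_len present in every sample."""
    base = min(samples, key=len)  # a common substring must fit in the shortest sample
    found = set()
    for i in range(len(base)):
        for j in range(i + min_len, len(base) + 1):
            candidate = base[i:j]
            if all(candidate in s for s in samples):
                found.add(candidate)
            else:
                break  # any longer extension from i contains this candidate
    # Drop tokens that are contained in a longer common token.
    return {t for t in found if not any(t != u and t in u for u in found)}


# Hypothetical samples: variable URL and filler, fixed framing and return address.
samples = [
    b"GET /a HTTP/1.1\xff\xbf",
    b"GET /bcd HTTP/1.1 zz\xff\xbf",
    b"GET /q?x HTTP/1.1--\xff\xbf",
]
print(common_tokens(samples))
```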

Signature Class (II): Token Subsequence
• Signature is an ordered set of tokens
• Flow matches iff it contains all the tokens in signature, in the given order
• O(n) time to match (n is flow length)
• Generated signature:
  • GET.*HTTP/1.1.*\r\nHost:.*\r\nHost:.*\xff\xbf
  • .0008% false positive rate (1 / 125,301)
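
A sketch of ordered matching, assuming the signature is kept as a list of byte tokens rather than as a compiled regular expression. Each token is searched for starting right after the previous match, which is one simple way to enforce the ordering. The function name and example flows are hypothetical.

```python
def matches_subsequence(flow: bytes, tokens: list[bytes]) -> bool:
    """Return True iff the tokens occur in the flow in order, without overlap."""
    position = 0
    for token in tokens:
        index = flow.find(token, position)
        if index < 0:
            return False
        position = index + len(token)  # next token must start after this one
    return True


# Token order taken from the generated signature on the slide.
TOKENS = [b"GET", b"HTTP/1.1", b"\r\nHost:", b"\r\nHost:", b"\xff\xbf"]

in_order = b"GET /x HTTP/1.1\r\nHost: a\r\nHost: b\xff\xbf"
out_of_order = b"\xff\xbf GET /x HTTP/1.1\r\nHost: a\r\nHost: b"

print(matches_subsequence(in_order, TOKENS))
print(matches_subsequence(out_of_order, TOKENS))
```

The second flow contains every token, so a conjunction signature would match it, but the ordering constraint rejects it.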


Generating Token Subsequence Signatures
• Use dynamic programming to find longest common token subsequence (lcseq) between 2 samples in O(n²) time
  • [SmithWaterman1981]
• Find lcseq of first two samples
• Iteratively find lcseq of intermediate result and next sample
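
The iterative procedure can be sketched with a textbook longest-common-subsequence table. This is a simplification under stated assumptions: it works on individual bytes rather than tokens, and it uses plain LCS rather than the Smith-Waterman alignment cited on the slide. The names `lcs` and `fold_lcs` are hypothetical.

```python
from functools import reduce


def lcs(a: bytes, b: bytes) -> bytes:
    """Longest common subsequence of two byte strings, O(len(a) * len(b))."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover one LCS.
    out = bytearray()
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return bytes(out)


def fold_lcs(samples: list[bytes]) -> bytes:
    """LCS of the first two samples, then of that result and each next sample."""
    return reduce(lcs, samples)


print(fold_lcs([b"GET a HTTP", b"GET bb HTTP", b"GET c HTTP"]))
```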

Experiment: Signature Generation
• How many worm samples do we need?
• Too few samples ⇒ signature is too specific ⇒ false negatives
• Experimental setup
  • Using a 25 day port 80 trace from lab perimeter
  • Innocuous pool: First 5 days (45,111 streams)
  • Suspicious pool:
    • Using Apache exploit described earlier
    • Non-invariant portions filled with random bytes
• Signature evaluation:
  • False positives: Last 10 days (125,301 streams)
  • False negatives: 1000 generated worm samples

Signature Generation Results

# Worm Samples   Conjunction          Subseq
2                100% FN              100% FN
3 to 100         0% FN, .0024% FP     0% FN, .0008% FP

Subsequence signature from 2 samples:
GET .* HTTP/1.1\r\n.*\r\nHost: .*\xee\xb7.*\xb2\x1e.*\r\nHost: .*\xef\xa3.*\x8b\xf4.*\x89\x8b.*E\xeb.*\xff\xbf

Subsequence signature from 3 to 100 samples:
GET .* HTTP/1.1\r\n.*\r\nHost: .*\r\nHost:.*\xff\xbf

Also Works for Binary Protocols
• Created polymorphic version of BIND TSIG exploit used by Li0n Worm
• Single substring signatures:
  • 2 bytes of Ret Address: .001% false positives
  • 3 byte TSIG marker: .067% false positives
• Conjunction: 0% false positives
• Subsequence: 0% false positives
• Evaluated using a 1 million request trace from a DNS server that serves a major university and several ccTLDs


Noise in Suspicious Flow Pool
• What if classifier has false positives?
• 3 worm samples:
  • GET .* HTTP/1.1\r\n.*\r\nHost: .*\r\nHost:.*\xff\xbf
• 3 worm samples + 1 legit GET request:
  • GET .* HTTP/1.1\r\n.*\r\nHost:
• 3 worm samples + a non-HTTP request:
  • .*

Our Approach: Hierarchical Clustering
• Used for multiple sequence alignment in Bioinformatics [Gusfield1997]
• Initialization:
  • Each sample is a cluster
  • Each cluster has a signature matching all samples in that cluster
• Greedily merge clusters
  • Minimize false positive rate, using innocuous pool
• Stop when any further merging results in significant false positives
• Output the signature of each final cluster of sufficient size
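
A rough sketch of the greedy merge loop, under several assumptions that are not from the slides: a cluster's signature is the set of 3-byte substrings shared by all of its samples (a stand-in for the conjunction signature), the stopping rule is a fixed `max_fp` bound, and `min_size` defines "sufficient size". All function and parameter names are hypothetical.

```python
from itertools import combinations


def ngrams(flow: bytes, n: int = 3) -> frozenset:
    """All n-byte substrings of a flow (stand-in for token extraction)."""
    return frozenset(flow[i:i + n] for i in range(len(flow) - n + 1))


def fp_rate(signature: frozenset, innocuous: list[bytes]) -> float:
    """Fraction of innocuous flows that contain every token of the signature."""
    hits = sum(all(t in flow for t in signature) for flow in innocuous)
    return hits / len(innocuous)


def cluster(suspicious, innocuous, max_fp=0.0, min_size=2):
    # Initialization: each sample is its own cluster with its own signature.
    clusters = [([s], ngrams(s)) for s in suspicious]
    while len(clusters) > 1:
        best = None
        for (i, (_, sig_i)), (j, (_, sig_j)) in combinations(enumerate(clusters), 2):
            merged = sig_i & sig_j  # tokens shared by both clusters
            rate = fp_rate(merged, innocuous)
            if best is None or rate < best[0]:
                best = (rate, i, j, merged)
        rate, i, j, merged = best
        if rate > max_fp:
            break  # every remaining merge causes too many false positives
        members = clusters[i][0] + clusters[j][0]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((members, merged))
    return [sig for members, sig in clusters if len(members) >= min_size]


# Hypothetical pools: three worm samples plus one noise flow.
worms = [b"GET /a HTTP/1.1\xff\xbf\xde", b"GET /b HTTP/1.1\xff\xbf\xde",
         b"GET /c HTTP/1.1\xff\xbf\xde"]
noise = [b"GET /n HTTP/1.1\r\n"]
innocuous_pool = [b"GET /i HTTP/1.1\r\n", b"GET /j HTTP/1.1\r\n"]
signatures = cluster(worms + noise, innocuous_pool)
print(len(signatures))
```

In this toy run the noise flow is never merged with the worm cluster, because the tokens they share also appear in every innocuous flow.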

Hierarchical Clustering

[Figure: animation over five samples: Worm Sample 1, Innoc Sample 1, Worm Sample 2, Innoc Sample 2, Worm Sample 3]
• Merge candidate with common substrings HTTP/1.1, GET, …: high false positive rate!
• Merge candidate with common substrings HTTP/1.1, GET, \xff\xbf, \xde\xad: low false positive rate (but high false negative rate)
• Resulting clusters:
  • Cluster: HTTP/1.1, GET, \xff\xbf, \xde\xad
  • Cluster: HTTP/1.1, GET, \xff\xbf

Clustering Evaluation (with noise)
• Suspicious pool consists of:
  • 5 polymorphic worm samples
  • Varying number of noise samples
• Noise samples chosen uniformly at random from evaluation trace
• Clustering uses innocuous pool to estimate false positive rate

Clustering Results

Noise   Conjunction (Fpos Fneg)                     Subseq (Fpos Fneg)
0%      .0024% 0%                                   .0008% 0%
38%     .0024% 0%                                   .0008% 0%
50%     .0024% 0%                                   .0008% 0%
80%     .0024% 0%; .7470% 100%                      .0008% 0%; 1.109% 100%
90%     .0024% 0%; .3384% 100%; .4150% 100%         .0008% 0%; .6903% 100%; 1.716% 100%


Overtraining Attacks
• Conjunction and Subsequence can be tricked into overtraining
• Red herring attack
  • Include extra fixed tokens
  • Remove them over time
  • Result: Have to keep generating new signatures
• Coincidental pattern attack
  • Create ‘coincidental’ patterns given a small set of worm samples
  • Result: more samples needed to generate a low-false-negative signature (50+)

Solution: Threshold matching
• Signature classifies as worm if enough tokens are present
• Implementation: Bayes Signatures
  • Assign each token a score based on Bayes Law
  • Choose highest-acceptable false positive rate
  • Choose threshold that gets at most that rate in innocuous training pool
• Properties:
  • Signatures generated and matched in linear time
  • Not susceptible to overtraining attacks
  • Don’t need clustering
  • You get the false positive rate you specify
  • Currently does not use ordering


Remaining False Positives
• Conjunction signature has 3 false positives
  • 1 of these also matched by subsequence signature
• What is causing these?
• Would it be so bad if 3 legitimate requests were filtered out every 10 days?

The Offending Request
GET /Download/GetPaper.php?paperId=XXX HTTP/1.1
…
Host: nsdi05.cs.washington.edu\r\n
…
POST /Author/UploadPaper.php HTTP/1.1\r\n
…
Host: nsdi05.cs.washington.edu\r\n
…
<binary data containing \xff\xbf>

Possible Fixes
• Use protocol knowledge
  • Match on request level instead of TCP flow level
  • Require \xff\xbf be part of Host header
  • Disadvantage: need protocol knowledge
• Use distance between tokens
  • Makes signatures more specific
  • Disadvantage: risks more overtraining attacks

Future Work
• Defending against overtraining
• Further reducing false positives
  • Could be reduced by learning more features (such as offsets)
  • But this increases risk of overtraining
• Promising solution: semantic analysis
  • Automatically analyze how worm exploit works
  • Only use features that must be present
  • First steps in Newsome05 (NDSS)
  • Currently extending this work (Brumley-Newsome-Song)

Conclusions
• Key observation: Content variability is limited by nature of the software vulnerability
• Have shown that:
  • Accurate signatures can be automatically generated for polymorphic worms
  • Demonstrated low false positives with real exploits, on real traffic traces

Thanks!
• Questions?
• Contact: [email protected]

Coincidental Pattern Attack
• Conjunction & Subsequence may overtrain
• Coincidental pattern attack:
  • For non-invariant bytes, choose ‘a’ or ‘b’
• Result:
  • Suspicious pool has many substrings in common of form: ‘aabba’, ‘babba’…
  • Unseen worm samples will have many of these substrings, but not every one
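
To make the attack concrete, here is a small sketch of how such samples could be produced for experiments. The function name, the filler length, and the choice of `\xff\xbf` as the only invariant content are assumptions for illustration; the slides do not describe the generator used.

```python
import random


def attack_sample(rng: random.Random, invariant: bytes = b"\xff\xbf",
                  filler_len: int = 32) -> bytes:
    """Fill the non-invariant bytes with 'a' or 'b' chosen at random."""
    filler = bytes(rng.choice(b"ab") for _ in range(filler_len))
    return filler + invariant


rng = random.Random(0)  # fixed seed so the run is repeatable
pool = [attack_sample(rng) for _ in range(3)]
for sample in pool:
    print(sample)
```

Because the filler alphabet has only two symbols, short runs such as ‘aabba’ are likely to appear in every sample of a small pool by chance, which is what pushes Conjunction and Subsequence toward overly specific signatures.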

Results with “Coincidental Pattern Attack”

[Figure: false negatives vs. suspicious pool size]

Results: Multiple Worms + Noise

Noise   Conjunction (Fpos Fneg)                 Subseq (Fpos Fneg)                      Bayes (Fpos Fneg)
0%      .0024% 0%                               .0008% 0%                               .008% 0%
38%     .0024% 0%                               .0008% 0%                               .008% 0%
50%     .0024% 0%                               .0008% 0%                               .008% 0%
80%     .0024% 0%; .7470% 100%                  .0008% 0%; 1.109% 100%                  .008% 0%
90%     .0024% 0%; .3384% 100%; .4150% 100%     .0008% 0%; .6903% 100%; 1.716% 100%     10% 100%

The Innocuous Pool
• Used to determine:
  • How often tokens appear in legit traffic
  • Estimated signature false positive rates
• Goals:
  • Representative of current traffic
  • Does not contain worm flows
• Can be generated by:
  • Taking a relatively old trace
  • Filtering out known worms and exploits

Key Algorithm: Token Extraction
• Need to identify useful tokens
  • Substrings that occur in worm samples
• Problem: Find all substrings that:
  • Occur in at least k out of n samples
  • Are at least x bytes long
• Can be solved in time linear in total length of samples using a suffix tree
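
The problem statement can be pinned down with a counting sketch. It is a brute-force stand-in, not the linear-time suffix-tree solution: it enumerates substrings up to an assumed `max_len` cap and counts how many samples contain each one. The function name and the cap are hypothetical.

```python
from collections import Counter


def tokens_in_k_samples(samples: list[bytes], k: int,
                        min_len: int = 2, max_len: int = 16) -> set[bytes]:
    """Substrings of min_len..max_len bytes that occur in at least k samples."""
    counts = Counter()
    for s in samples:
        # Collect distinct substrings so each sample is counted at most once.
        seen = {s[i:i + n]
                for n in range(min_len, max_len + 1)
                for i in range(len(s) - n + 1)}
        counts.update(seen)
    return {token for token, c in counts.items() if c >= k}


print(tokens_in_k_samples([b"abXY", b"cdXY", b"efgh"], k=2))
```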

Signature Class (III): Bayes
• Use a Bayes classifier
• Presence of a token is a feature
• Hence, each token has a score:
• Generated signature:
  • (‘GET’: .0035, ‘Host:’: .0022, ‘HTTP/1.1’: .11, ‘\xff\xbf’: 3.15) Threshold=1.99
  • .008% false positive rate (10 / 125,301)

Generating Bayes Signatures
• Use suffix tree to find tokens that occur in a significant number of samples
• Determine probabilities:
  • Pr(worm) = Pr(~worm) = .5
  • Pr(substring|worm): use suspicious pool
  • Pr(substring|~worm): use innocuous pool
• Set a “certainty threshold” c
  • Signature matches a flow if the Bayes formula identifies it as more than c% likely to be a worm
  • Choose c that results in few (< 5) false positives in innocuous pool
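
One way to sketch these steps in code is shown below. The scoring formula did not survive in this transcript, so the sketch assumes a log-likelihood-ratio score per token with add-one smoothing and sums the scores of the tokens present in a flow. With equal priors this ranks flows the same way as a naive Bayes posterior restricted to present tokens, but it is an assumption, not the formula from the talk. All names are hypothetical.

```python
import math


def token_scores(tokens, suspicious, innocuous):
    """Score each token by log(Pr(token|worm) / Pr(token|~worm)), smoothed."""
    scores = {}
    for t in tokens:
        p_worm = (sum(t in f for f in suspicious) + 1) / (len(suspicious) + 2)
        p_benign = (sum(t in f for f in innocuous) + 1) / (len(innocuous) + 2)
        scores[t] = math.log(p_worm / p_benign)
    return scores


def flow_score(flow: bytes, scores: dict) -> float:
    """Sum the scores of the tokens that are present in the flow."""
    return sum(score for token, score in scores.items() if token in flow)


def pick_threshold(scores, innocuous, max_false_positives=0):
    """Lowest threshold that flags at most max_false_positives innocuous flows.

    A flow is flagged when its score is strictly greater than the threshold.
    """
    ranked = sorted((flow_score(f, scores) for f in innocuous), reverse=True)
    if max_false_positives >= len(ranked):
        return -math.inf
    return ranked[max_false_positives]


# Hypothetical pools for illustration.
suspicious_pool = [b"GET /a\xff\xbf", b"GET /b\xff\xbf", b"GET /c\xff\xbf"]
benign_pool = [b"GET /1", b"GET /2", b"GET /3", b"GET /4"]
scores = token_scores([b"GET", b"\xff\xbf"], suspicious_pool, benign_pool)
threshold = pick_threshold(scores, benign_pool)
print(scores, threshold)
```

In this toy data the common token `GET` gets a score near zero while `\xff\xbf` dominates, which mirrors the shape of the generated signature shown for the Bayes class.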

Innocuous Pool Poisoning
• Before releasing worm:
  • Determine what signature of worm is
  • Flood Internet with innocuous requests that match
  • Eventually included in innocuous training pool
• Release worm
• Polygraph will:
  • Generate signature for worm
  • See that it causes many false positives in innocuous pool
  • Reject signature
• Solution:
  • Use a relatively old trace for innocuous pool
  • Drawback: Hierarchical clustering generates more spurious signatures