Logic-based, data-driven enterprise network security analysis

Xinming (Simon) Ou
Assistant Professor
CIS Department
Kansas State University

COS 598D: Formal Methods in Networking
Princeton University
March 08, 2010


Self Introduction

• Brief Bio
– PhD, Princeton University, 2005
– Post-doc, Purdue CERIAS, Idaho National Laboratory, 2006
– Assistant Professor, Kansas State University, 2006-now
• Research Interests
– Computer and network security, especially formal and quantitative analysis
– Programming languages, formal methods
• Research Group
– Argus: http://people.cis.ksu.edu/~xou/argus/

Overview of the two lectures

• Lecture One
– Datalog model for network attacks
– SLG resolution for Datalog evaluation
– Exhaustive proof generation for Datalog
• Lecture Two
– Formulating the security hardening problem as a SAT solving problem
– Applying MinCostSAT to achieve optimal security configuration
– Open research problems

Cyber Defender’s Life

[Figure: inputs labeled security advisories ("Apache 1.3.4 bug!"), vulnerability reports, network configuration, IDS alerts, and users and data assets; a reasoning system; automated situation awareness]

Multi-step Attacks

[Figure: network diagram with labels Internet, Firewall 1, demilitarized zone (DMZ), Firewall 2, Corporation, webServer, workStation, fileServer, webPages, sharedBinary; attack steps labeled buffer overrun, NFS shell, Trojan horse]

Two Questions

• Are there potential attack paths in the system?
– How can they happen?
– How can they be addressed in an optimal way?
• Are there attacks that are going on or have succeeded in the system?
– How do you know?
– How to counter the attack?

What we are going to focus on

MulVAL

[Figure: MulVAL architecture. Labels: Datalog rules from security experts; vulnerability information (e.g. NIST NVD); vulnerability definition (e.g. OVAL, Nessus Scripting Language); vulnerability scanner; network analyzer; network reachability information; user information; analyzer; the query "Could root be compromised on any of the machines?"; answers]

Ou, Govindavajhala, and Appel. Usenix Security 2005

Network config (firewall analyzer)

Host access-control lists

reachable(internet, webServer, tcp, 80).
reachable(webServer, fileServer, nfs, -).
...

Host config scanner

File permissions

fileOwner(webServer, /bin/apache, root).
fileAttr(webServer, /bin/apache, r,w,x,r,0,0,r,0,0).

Host-based vulnerability scanner

Installed software

vulExists(webServer, 'CVE-2006-3747', httpd).
vulExists(dbServer, 'CVE-2009-2446', mySQL).
… …

US-CERT, NVD

Security advisories (e.g. "Apache 1.3.4 bug!")

vulProperty('CVE-2006-3747', remote, privEscalation).
vulProperty('CVE-2009-2446', remote, privEscalation).
… …

Security expert

Datalog Rules

execCode(Host, PrivilegeLevel) :-
    vulExists(Host, Program, remote, privilegeEscalation),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
    networkAccess(Host, Protocol, Port).

Linux security behavior; Windows security behavior; common attack techniques

The rules are completely independent of any site-specific settings.

Rule for NFS

[Figure: network diagram with labels dmz, corp, webServer, fileServer, webPages, sharedBinary, and an attack step "NFS shell"]

accessFile(Server, Access, Path) :-
    nfsExport(Server, Path, Access, Client),
    reachable(Client, Server, nfs, -),
    execCode(Client, _Perm).

Rule for Trojan Horse

[Figure: network diagram with labels corp, workStation, fileServer, webPages, projectPlan, sharedBinary, and an attack step "Trojan horse"]

execCode(H, User) :-
    accessFile(H, write, Path),
    fileOwner(H, Path, User).

Deducing new facts

execCode(Host, PrivilegeLevel) :-
    vulExists(Host, Program, remote, privilegeEscalation),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
    networkAccess(Host, Protocol, Port).

[Figure: network diagram with labels internet, Firewall 1, dmz, webServer]

vulExists(webServer, httpd, remote, privilegeEscalation).
serviceRunning(webServer, httpd, tcp, 80, apache).
networkAccess(webServer, tcp, 80).

execCode(webServer, apache).    Oops!

[Slide annotations on these facts: "From Vulnerability Scanner & NVD", "From Vulnerability Scanner", "Derived"]
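As a rough illustration of how the rule fires on these facts, the Python sketch below joins the three body predicates by hand. It follows the two-argument head of the rule as written on this slide. It is not how MulVAL or XSB evaluates rules; the tuple encoding of facts and the function name `derive_exec_code` are assumptions made for this example only.

```python
# Facts from the slide, encoded as tuples (an encoding assumed for this sketch).
vul_exists = {("webServer", "httpd", "remote", "privilegeEscalation")}
service_running = {("webServer", "httpd", "tcp", 80, "apache")}
network_access = {("webServer", "tcp", 80)}


def derive_exec_code(vul_exists, service_running, network_access):
    """Apply the execCode rule once by joining its three body predicates."""
    derived = set()
    for host, program, rng, consequence in vul_exists:
        # The rule only matches remotely exploitable privilege escalations.
        if (rng, consequence) != ("remote", "privilegeEscalation"):
            continue
        for s_host, s_program, protocol, port, privilege in service_running:
            # Host and Program are shared variables between the two subgoals.
            if (s_host, s_program) != (host, program):
                continue
            if (host, protocol, port) in network_access:
                derived.add((host, privilege))
    return derived


exec_code = derive_exec_code(vul_exists, service_running, network_access)
print(exec_code)
```

Removing any one of the three input facts leaves the result empty, which mirrors the conjunction in the rule body.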

Advantages of using Prolog

• Prolog’s goal-oriented evaluation is potentially more efficient.

• Prolog provides more programming flexibility.

Can we evaluate Datalog programs in Prolog?


However…

• Prolog as a programming language cannot be directly used to evaluate Datalog

ancestor(X,Y) :- parent(X,Y).

ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).

parent(bill,mary).

parent(mary,john).

?- ancestor(X,Y).


However…

• Prolog as a programming language cannot be directly used to evaluate Datalog

ancestor(X,Y) :- parent(X,Y).

ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).

parent(bill,mary).

parent(mary,john).

?- ancestor(X,Y).


However…

• Prolog as a programming language cannot be directly used to evaluate Datalog

ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).

ancestor(X,Y) :- parent(X,Y).

parent(bill,mary).

parent(mary,john).

?- ancestor(X,Y).


Problem of SLD resolution

ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
parent(bill,mary).
parent(mary,john).

SLD tree for the query ancestor(X, Y):

ancestor(X, Y).
    parent(X,Y).
        X=bill, Y=mary: Success
        X=mary, Y=john: Success
    parent(X,Z), ancestor(Z,Y).
        X=bill, Z=mary: ancestor(mary,Y).
            parent(mary,Y).
                Y=john: Success
            parent(mary,Z2), ancestor(Z2,Y).
                Z2=john: ancestor(john,Y). … Failure
        X=mary, Z=john: ancestor(john,Y). … Failure

Problem of SLD resolution

ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
ancestor(X,Y) :- parent(X,Y).
parent(bill,mary).
parent(mary,john).

Goals generated for the query ancestor(X, Y):

ancestor(X, Y).
ancestor(Z, Y), parent(X, Z).
ancestor(Z1, Y), parent(Z, Z1), parent(X, Z).
ancestor(Z2, Y), parent(Z1, Z2), parent(Z, Z1), parent(X, Z).
…
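The growth of the goal list can be reproduced mechanically. The Python sketch below is only an illustration, not a Prolog engine: it repeatedly replaces the leftmost ancestor goal with the body of the left-recursive clause, introducing a fresh variable each time. The names `unfold_left_recursive` and `show` are hypothetical helpers for this example.

```python
def unfold_left_recursive(steps):
    """Unfold the leftmost ancestor/2 goal with the first clause, `steps` times.

    Clause used: ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
    A fresh variable Z, Z1, Z2, ... is introduced at every step.
    """
    goals = [("ancestor", "X", "Y")]
    history = [list(goals)]
    for i in range(steps):
        pred, x, y = goals[0]
        assert pred == "ancestor"  # the leftmost goal is always ancestor/2
        fresh = "Z" if i == 0 else f"Z{i}"
        # Replace ancestor(x, y) by the clause body: ancestor(fresh, y), parent(x, fresh).
        goals = [("ancestor", fresh, y), ("parent", x, fresh)] + goals[1:]
        history.append(list(goals))
    return history


def show(goals):
    """Render a goal list in the notation used on the slide."""
    return ", ".join(f"{p}({a}, {b})" for p, a, b in goals) + "."


history = unfold_left_recursive(3)
for goals in history:
    print(show(goals))
```

Each step adds one parent subgoal and never resolves the leading ancestor goal against a fact, so the list grows without bound no matter how many steps are taken.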

Problem of SLD resolution

• Under SLD resolution, termination of cyclic Datalog programs depends not only on the logical semantics, but also on the order of the clauses and subgoals.
– This creates problems, since such cyclic rules are commonplace in network security analysis.
    • e.g. after compromising one machine, the attacker can use it as a stepping stone to compromise another.
– Datalog is a declarative language; thus order should not matter.
– A pure Datalog program shall always terminate due to the bound on the number of tuples.

Bottom-up Evaluation

Semi-naïve Evaluation:

Step (1) (base case): ancestor(bill,mary), ancestor(mary,john)
Step (2)
    Iteration 1: ancestor(bill,john)
    Iteration 2: no new tuples (“fixpoint”)

ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
ancestor(X,Y) :- parent(X,Y).
parent(bill,mary).
parent(mary,john).
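The two steps above can be sketched in a few lines of Python. This is a minimal illustration of semi-naive evaluation for this one program, assuming facts are stored as sets of tuples; the function name `semi_naive_ancestor` is made up for the example.

```python
def semi_naive_ancestor(parent):
    """Compute ancestor/2 bottom-up; return (all tuples, new tuples per iteration)."""
    # Step 1 (base case): ancestor(X,Y) :- parent(X,Y).
    ancestor = set(parent)
    delta = set(parent)
    rounds = []
    # Step 2: ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
    # Only tuples that were new in the previous iteration (delta) are joined.
    while delta:
        new = {(x, y) for (z, y) in delta for (x, z2) in parent if z2 == z}
        delta = new - ancestor
        ancestor |= delta
        rounds.append(delta)
    return ancestor, rounds


parent = {("bill", "mary"), ("mary", "john")}
ancestor, rounds = semi_naive_ancestor(parent)
print(sorted(ancestor))
print(rounds)
```

The loop stops as soon as an iteration contributes no new tuple, which is the fixpoint shown on the slide. Clause order plays no role here.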

SLG Resolution

• Goal-oriented evaluation
• Predicates can be “tabled”
– A table stores the evaluation results of a goal.
– The results can be re-used later, i.e. dynamic programming.
– Entering an active table indicates a cycle.
– Fixpoint operation is taken at such tables.
• The XSB system implements SLG resolution
– Developed by Stony Brook (http://xsb.sourceforge.net/).
– Provides full ISO Prolog compatibility.

SLG resolution example

ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
ancestor(X,Y) :- parent(X,Y).
parent(bill,mary).
parent(mary,john).

ancestor(X, Y).    (generator node: new table created for ancestor(X,Y))
    ancestor(Z, Y), parent(X, Z).    (active node: resolve ancestor(Z,Y) against the results in the table for ancestor(X,Y))
        Z=bill, Y=mary: parent(X, bill). Failure
        Z=mary, Y=john: parent(X, mary). X=bill: Success
        Z=bill, Y=john: parent(X, bill). Failure
    parent(X,Y).
        X=bill, Y=mary: Success
        X=mary, Y=john: Success
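The example can be approximated in Python with a single answer table. This is a deliberately simplified sketch of the tabling idea, not XSB's SLG implementation: it collapses the suspension and resumption of the active node into a loop that repeats until the table stops growing, and it handles only the open query ancestor(X,Y). The name `tabled_ancestor` is hypothetical.

```python
PARENT = {("bill", "mary"), ("mary", "john")}


def tabled_ancestor(parent):
    """Evaluate the open query ancestor(X,Y) with one answer table."""
    table = set()   # answers found so far for the goal ancestor(X,Y)
    log = []        # order in which answers enter the table
    changed = True
    while changed:  # fixpoint operation taken at the table
        changed = False
        candidates = []
        # Clause 1: ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
        # The recursive subgoal is a variant of the tabled goal, so it
        # consumes answers from the table instead of being expanded again.
        for (z, y) in sorted(table):
            candidates += [(x, y) for (x, z2) in sorted(parent) if z2 == z]
        # Clause 2: ancestor(X,Y) :- parent(X,Y).
        candidates += sorted(parent)
        for answer in candidates:
            if answer not in table:
                table.add(answer)
                log.append(answer)
                changed = True
    return table, log


table, log = tabled_ancestor(PARENT)
print(log)
```

The left-recursive clause comes first, exactly the ordering that made SLD resolution loop, yet the evaluation terminates because the recursive call reads from the table.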

SLG in MulVAL

netAccess(H2, Protocol, Port) :-
    execCode(H1, User),
    reachable(H1, H2, Protocol, Port).

[Figure: the goal netAccess(…) has a table of its possible instantiations; the first subgoal execCode(…) has a table of its possible instantiations; a third annotation reads "from input tuples"]

SLG complexity for Datalog

• Total time dominated by the rule that has the maximum number of instantiations
– Time for computing one table = computation of the subgoals + retrieving information from input tuples + matching results in the rules' bodies
– Time for computing all tables = retrieving information from input tuples + matching results in the rules' bodies
• See “On the Complexity of Tabled Datalog Programs” http://www.cs.sunysb.edu/~warren/xsbbook/node21.html

MulVAL complexity in SLG

execCode(Attacker, Host, User) :-
    vulExists(Host, _, Program, remote, privilegeEscalation),
    networkService(Host, Program, Protocol, Port, User),
    netAccess(Attacker, Host, Protocol, Port).

Scale with network size: O(N) different instantiations

MulVAL complexity in SLG

netAccess(Attacker, H2, Protocol, Port) :-
    execCode(Attacker, H1, _),
    reachable(H1, H2, Protocol, Port).

Scale with network size: O(N^2) different instantiations (complexity of MulVAL)
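A quick way to see where the O(N) and O(N^2) counts come from is to enumerate the variable bindings that scale with the number of hosts. The Python sketch below does this under strong simplifying assumptions that are not from the slides: one attacker, one vulnerable service per host on a single protocol and port, and reachability between every ordered pair of distinct hosts. The function name `count_instantiations` is hypothetical.

```python
def count_instantiations(n_hosts):
    """Count rule-body instantiations on a fully connected network of n_hosts.

    Assumptions (for illustration only): one attacker, one vulnerable service
    per host, and reachable/4 holding for every ordered pair of distinct hosts.
    """
    hosts = [f"h{i}" for i in range(n_hosts)]
    # execCode rule: the only host-valued variable in the body is Host.
    exec_code = [(h,) for h in hosts]
    # netAccess rule: H1 and H2 both range over hosts, joined through reachable/4.
    net_access = [(h1, h2) for h1 in hosts for h2 in hosts if h1 != h2]
    return len(exec_code), len(net_access)


for n in (10, 100):
    print(n, count_instantiations(n))
```

Going from 10 to 100 hosts multiplies the first count by 10 and the second by roughly 100, which is the linear versus quadratic behaviour described above.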

Datalog proof generation

• In security analysis, we want to know not only what attacks could happen, but also how they can happen
– Thus, we need more than a yes/no answer for queries.
– We need the proofs for the true queries, which in the case of security analysis will be attack paths.
– We also want to know all possible attack paths; thus we need exhaustive proof generation.

An obvious approach

execCode(Host, PrivilegeLevel) :-
    vulExists(Host, Program, remote, privilegeEscalation),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
    networkAccess(Host, Protocol, Port).

execCode(Host, PrivilegeLevel, Pf) :-
    vulExists(Host, Program, remote, privilegeEscalation, Pf1),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel, Pf2),
    networkAccess(Host, Protocol, Port, Pf3),
    Pf = (execCode(Host, PrivilegeLevel), [Pf1, Pf2, Pf3]).

This will break the bounded-term property and result in non-termination for cyclic Datalog programs.
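The effect of the extra proof argument can be seen on a toy cyclic program. The Python sketch below is an illustration under assumed names, not MulVAL code: it uses a hypothetical rule reach(Y) :- reach(X), edge(X,Y) on a two-host cycle and evaluates it bottom-up, once without and once with a nested proof term.

```python
EDGES = {("a", "b"), ("b", "a")}   # a two-host cycle (hypothetical example)
START = "a"


def plain_rounds(max_rounds):
    """reach(Y) :- reach(X), edge(X,Y). Return the round at which a fixpoint is found."""
    reach = {START}
    for rounds in range(1, max_rounds + 1):
        new = {y for (x, y) in EDGES if x in reach} - reach
        if not new:
            return rounds
        reach |= new
    return None  # no fixpoint within max_rounds


def proof_rounds(max_rounds):
    """Same rule with a proof argument: reach(Y, (Y, Pf)) :- reach(X, Pf), edge(X,Y).

    Returns the number of distinct tuples after each round.
    """
    reach = {(START, "given")}
    sizes = [len(reach)]
    for _ in range(max_rounds):
        new = {(y, (y, pf)) for (x, pf) in reach for (x2, y) in EDGES if x2 == x} - reach
        if not new:
            break
        reach |= new
        sizes.append(len(reach))
    return sizes


print(plain_rounds(10))
print(proof_rounds(5))
```

Without the proof argument only two tuples exist, so the evaluation stops quickly. With it, every trip around the cycle yields a strictly larger proof term, hence a new tuple, and the set keeps growing for as many rounds as are allowed.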

MulVAL Attack-Graph Toolkit

[Figure: machine configuration, network configuration, and security advisories, in Datalog representation, together with Datalog rules, go into the XSB reasoning engine; the engine produces Datalog proof steps; a graph builder turns them into the Datalog proof graph]

Ou, Boyer, and McQueen. ACM CCS 2006
Joint work with Idaho National Laboratory

Translated rules

Stage 1: Record Proof Steps

netAccess(H2, Protocol, Port, ProofStep) :-
    execCode(H1, User),
    reachable(H1, H2, Protocol, Port),
    ProofStep = because('multi-hop network access',
                        netAccess(H2, Protocol, Port),
                        [execCode(H1, User), reachable(H1, H2, Protocol, Port)]).    % proof step

Stage 2: Build the Exhaustive Proof

because('multi-hop network access',
        netAccess(fileServer, rpc, 100003),
        [execCode(webServer, apache), reachable(webServer, fileServer, rpc, 100003)])

[Figure: proof graph fragment with nodes numbered 0 to 3: the derived fact netAccess(fileServer, rpc, 100003), the rule node "multi-hop network access", and the two premises execCode(webServer, apache) and reachable(webServer, fileServer, rpc, 100003)]
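The second stage can be sketched as a single pass over the recorded proof steps. The Python below is an illustrative reconstruction, not the MulVAL graph builder: facts are kept as plain strings, each proof step is assumed to be a (rule, conclusion, premises) triple, and the function name `build_proof_graph` and the node numbering scheme are assumptions made for this example.

```python
def build_proof_graph(proof_steps):
    """Turn because(rule, conclusion, premises) records into an AND/OR graph.

    Returns (nodes, edges): nodes maps a node id to its label; an edge
    (src, dst) points from a premise to a rule node, or from a rule node
    to its conclusion.
    """
    nodes, edges, fact_ids = {}, [], {}

    def fact_node(fact):
        # Table lookup: each distinct fact gets exactly one node.
        if fact not in fact_ids:
            fact_ids[fact] = len(nodes)
            nodes[fact_ids[fact]] = fact
        return fact_ids[fact]

    for rule, conclusion, premises in proof_steps:
        head = fact_node(conclusion)
        rule_id = len(nodes)      # one rule (AND) node per proof step
        nodes[rule_id] = rule
        edges.append((rule_id, head))
        for premise in premises:
            edges.append((fact_node(premise), rule_id))
    return nodes, edges


steps = [(
    "multi-hop network access",
    "netAccess(fileServer, rpc, 100003)",
    ("execCode(webServer, apache)",
     "reachable(webServer, fileServer, rpc, 100003)"),
)]
nodes, edges = build_proof_graph(steps)
print(nodes)
print(edges)
```

Because every fact is looked up in a dictionary before a node is created, a fact derived by several rules ends up as one node with several incoming rule nodes, which is what makes the result a graph rather than a set of separate proof trees.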

Complexity of Proof Building

• O(N^2) to complete Datalog evaluation
– With proof steps generated
• O(N^2) to build a proof graph from proof steps
– Need to build O(N^2) graph components
– Building of one component
    • Find the predecessor: table lookup
    • Find the successors: table lookup

Total time: O(N^2), if table lookup is constant time

Logical Attack Graphs

[Figure: logical attack graph with nodes numbered 0 to 6. Legend: OR node, AND node, ground fact.]

Node labels in the figure:
execCode(attacker,workStation,root)
Trojan horse installation
accessFile(attacker,workStation,write,/usr/local/share)
NFS semantics
networkService(webServer,httpd,tcp,80,apache)
vulExists(webServer, CAN-2002-0392, httpd, remoteExploit, privEscalation)
netAccess(attacker,webServer,tcp,80)
Remote exploit
execCode(attacker,webServer,apache)
accessFile(attacker,fileServer,write,/export)
NFS shell

Performance and Scalability

[Figure: CPU time (sec), 0.01 to 10000, versus number of hosts, 1 to 1000, both on logarithmic scales; one curve per topology: fully connected, partitioned, ring, star]

Related Work

• Sheyner’s attack graph tool (CMU)
– Based on model-checking
• Cauldron attack graph tool (GMU)
– Based on graph-search algorithms
• NetSPA attack graph tool (MIT LL)
– Graph-search based on a simple attack model

Advantages of the Logic-programming Approach

• Publishing and incorporation of knowledge/information through well-understood logical semantics

• Efficient and sound analysis by leveraging the reasoning power of well-developed logic-deduction systems


Next Lecture

• How to make use of the proof graph
– Optimizing mitigation measures through SAT solving
• Open problems
– Uncertainty in reasoning