Search-driven String Constraint Solving for Vulnerability Detection

Julian Thomé, Lwin Khin Shar, Domenico Bianculli and Lionel Briand

SVV: software verification & validation (.lu)


Injection vulnerabilities and XSS are serious threats

A vulnerable example program

protected void authenticate() {
    String user = req.getParameter("user");   // SOURCE
    String pin = req.getParameter("pin");     // SOURCE
    String token = req.getParameter("token"); // SOURCE

    Document doc = db.parse(new File("users.xml"));

    if (user.isEmpty() || pin.isEmpty() || !token.matches("[0-9]{8}")) {
        // …
    } else {
        String q = "/users/user[@id='" + ESAPI.encoder().encodeForXPath(user)
                 + "' and @pin=" + ESAPI.encoder().encodeForXPath(pin) + "]";
        // …
        NodeList nl = (NodeList) xpath.evaluate(q); // SINK
        // …
    }
}

The program is vulnerable to XPath Injection attacks

user  = "eve"
pin   = "0 or 1"
query = "/users/user[@id='eve' and @pin=0 or 1]"

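To make the effect of the injected pin concrete, the following sketch evaluates the resulting query with the JDK's built-in XPath engine. It is only an illustration: the XML content, the class name `XPathInjectionDemo` and its methods are hypothetical, and the ESAPI encoder is left out on the assumption (suggested by the resulting query on the slide) that it leaves these particular inputs unchanged.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathInjectionDemo {
    // Hypothetical stand-in for users.xml; the real file is not shown in the slides.
    static final String USERS_XML =
        "<users><user id='eve' pin='1234'/><user id='bob' pin='5678'/></users>";

    // Builds the query the same way as the example program.
    // Assumption: the encoder leaves these inputs unchanged, so it is omitted here.
    static String buildQuery(String user, String pin) {
        return "/users/user[@id='" + user + "' and @pin=" + pin + "]";
    }

    // Returns how many <user> nodes the query selects in USERS_XML.
    static int countMatches(String query) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(USERS_XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a wrong pin such as "9999" the query selects no node, while the pin "0 or 1" selects every user: in XPath, `and` binds more tightly than `or`, so the predicate reads `(@id='eve' and @pin=0) or 1`, and `1` converts to true.
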
Vulnerability Analysis: State-of-the-Art


Program → Symbolic Execution → Path Conditions

Path Conditions + Threat Model → Attack Conditions

Attack Conditions → Constraint Solving → SAT = vulnerable, UNSAT = not vulnerable

Limitation of Constraint Solvers

Only limited support for (complex) string operations provided by the state-of-the-art constraint solvers:


- String replacement and/or sanitisation operations

- String libraries of programming languages provide hundreds of operations (e.g. Java String library, Apache Commons)

Workaround 1: Extending Solver

Constraint Solvers could be extended in order to support new operations

Problems:
- Not trivial and requires expert knowledge
- Not scalable to the size of a complete string library of a modern programming language

Workaround 2: Re-expressing Constraints

Constraints could be re-expressed in terms of constraints which are natively supported by the constraint solver

Problem: Increased complexity of the generated constraints, potentially leading to scalability issues

However, in practice …

Constraint solvers (CVC4, Z3-str2) fail or return an error

Our Approach: Search-driven String Constraint Solving

Search-driven String Constraint Solving

[Figure: Attack Condition → External Solver (CVC4, Z3-str2, …) → constraint with unsupported operations + solutions of constraint with supported operations → Hybrid Constraint Solving → SAT / UNSAT / TIMEOUT]

Hybrid Constraint Solving (ACO-Solver)

1. Automata-based solver solves all constraints it supports and returns a solution for every variable in terms of an FSM

2. Search-based solver searches for paths in the solution automata that make the constraint satisfiable

Automata-based solver reduces the search space

[Figure: Search-driven String Constraint Solving, with the hybrid stage shown as ACO-Solver = Automata-based Solver + Search-based Solver]

Attack Condition Decomposition

Attack condition:

  len(v0user) > 0 ∧
  len(v0pin) > 0 ∧
  matches(v0token, "[0-9]{8}") ∧
  v1user = encodeForXPath(v0user) ∧
  v0q = concat("/users/user[@id='", v1user) ∧
  v1q = concat(v0q, "' and @pin=") ∧
  v1pin = encodeForXPath(v0pin) ∧
  v2q = concat(v1q, v1pin) ∧
  v3q = concat(v2q, "]") ∧
  matches(v1pin, "[0-9]+ [Oo][Rr] 1=1")

Attack condition partition 1:

  len(v0user) > 0 ∧
  len(v0pin) > 0 ∧
  v1user = encodeForXPath(v0user) ∧
  v0q = concat("/users/user[@id='", v1user) ∧
  v1q = concat(v0q, "' and @pin=") ∧
  v1pin = encodeForXPath(v0pin) ∧
  v2q = concat(v1q, v1pin) ∧
  v3q = concat(v2q, "]") ∧
  matches(v1pin, "[0-9]+ [Oo][Rr] 1=1")

Attack condition partition 2:

  matches(v0token, "[0-9]{8}")

Provide every attack condition partition as input to the external solver
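The slides do not spell out the partitioning rule. One plausible reading, consistent with the example, is that constraints sharing a variable (directly or transitively) end up in the same partition. The sketch below implements that assumption with a small union-find; the class `AttackConditionPartitioner`, the `Constraint` representation and `slideExample()` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AttackConditionPartitioner {
    /** A constraint plus the variables it mentions (assumed to be at least one). */
    static final class Constraint {
        final String text;
        final Set<String> vars;
        Constraint(String text, String... vars) {
            this.text = text;
            this.vars = new LinkedHashSet<>(Arrays.asList(vars));
        }
    }

    /** Groups constraints that are connected through shared variables. */
    static List<List<Constraint>> partition(List<Constraint> constraints) {
        Map<String, String> parent = new LinkedHashMap<>();
        for (Constraint c : constraints) {
            String first = null;
            for (String v : c.vars) {
                parent.putIfAbsent(v, v);
                if (first == null) first = v; else union(parent, first, v);
            }
        }
        // Constraints whose variables share a root belong to the same partition.
        Map<String, List<Constraint>> groups = new LinkedHashMap<>();
        for (Constraint c : constraints) {
            String root = find(parent, c.vars.iterator().next());
            groups.computeIfAbsent(root, k -> new ArrayList<>()).add(c);
        }
        return new ArrayList<>(groups.values());
    }

    private static String find(Map<String, String> parent, String v) {
        while (!parent.get(v).equals(v)) v = parent.get(v);
        return v;
    }

    private static void union(Map<String, String> parent, String a, String b) {
        parent.put(find(parent, a), find(parent, b));
    }

    /** The attack condition from the slides, with variables listed by hand. */
    static List<Constraint> slideExample() {
        return Arrays.asList(
            new Constraint("len(v0user) > 0", "v0user"),
            new Constraint("len(v0pin) > 0", "v0pin"),
            new Constraint("matches(v0token, \"[0-9]{8}\")", "v0token"),
            new Constraint("v1user = encodeForXPath(v0user)", "v1user", "v0user"),
            new Constraint("v0q = concat(\"/users/user[@id='\", v1user)", "v0q", "v1user"),
            new Constraint("v1q = concat(v0q, \"' and @pin=\")", "v1q", "v0q"),
            new Constraint("v1pin = encodeForXPath(v0pin)", "v1pin", "v0pin"),
            new Constraint("v2q = concat(v1q, v1pin)", "v2q", "v1q", "v1pin"),
            new Constraint("v3q = concat(v2q, \"]\")", "v3q", "v2q"),
            new Constraint("matches(v1pin, \"[0-9]+ [Oo][Rr] 1=1\")", "v1pin"));
    }
}
```

Applied to `slideExample()`, this grouping yields two partitions: the nine constraints over the user, pin and query variables, and the single constraint on `v0token`.
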


Invoke External Solver

Result per attack condition partition (SAT/UNSAT/Crash):

Attack condition partition 1:

  len(v0user) > 0 ∧
  len(v0pin) > 0 ∧
  v1user = encodeForXPath(v0user) ∧
  v0q = concat("/users/user[@id='", v1user) ∧
  v1q = concat(v0q, "' and @pin=") ∧
  v1pin = encodeForXPath(v0pin) ∧
  v2q = concat(v1q, v1pin) ∧
  v3q = concat(v2q, "]") ∧
  matches(v1pin, "[0-9]+ [Oo][Rr] 1=1")

Result: Crash

Attack condition partition 2:

  matches(v0token, "[0-9]{8}")

Result: SAT

All the Attack Condition partitions with unsupported operations are solved by ACO-Solver
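The hand-over between the two solvers can be sketched as follows. This is a simplified illustration, not the tool's implementation: both solvers are modelled as plain functions from a partition (a string here) to a verdict, the names `SearchDrivenSolving` and `Verdict` are hypothetical, and the rule for combining per-partition verdicts assumes that partitions share no variables.

```java
import java.util.List;
import java.util.function.Function;

final class SearchDrivenSolving {
    enum Verdict { SAT, UNSAT, TIMEOUT, CRASH }

    /**
     * Tries the external solver on every partition and falls back to the
     * second solver when the external one crashes or times out.
     */
    static Verdict solve(List<String> partitions,
                         Function<String, Verdict> externalSolver,
                         Function<String, Verdict> acoSolver) {
        boolean undecided = false;
        for (String partition : partitions) {
            Verdict verdict;
            try {
                verdict = externalSolver.apply(partition);
            } catch (RuntimeException e) {
                verdict = Verdict.CRASH; // treat an exception like a solver crash
            }
            if (verdict == Verdict.CRASH || verdict == Verdict.TIMEOUT) {
                verdict = acoSolver.apply(partition);
            }
            // Independent partitions: one UNSAT partition makes the whole condition UNSAT.
            if (verdict == Verdict.UNSAT) return Verdict.UNSAT;
            if (verdict != Verdict.SAT) undecided = true;
        }
        return undecided ? Verdict.TIMEOUT : Verdict.SAT;
    }
}
```

In the running example, the external solver answers SAT for the token partition and crashes on the other one, so only the latter is passed to the fallback.
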


Search-based Solving

out = foo(i0 … in)

- An unsupported operation (foo) has to be invokable and its output (out) has to be observable
- Search for a set of inputs that generates an output (out) satisfying all the constraints imposed on it

Ant Colony Optimisation (ACO)

- Suited for graph searching problems

- Stochastic approach in nature, which allows for escaping from local optima

- Inherent parallelism

- Inspired by the behaviour of ants (leaving pheromone traces on paths leading to food)


Fitness Function


- Assess the quality of a potential solution

- A lower fitness implies a higher quality of the solution

- Different fitness functions for
  1. Numeric constraints (Korel)
  2. String constraints (Levenshtein)
  3. Regular expressions (Myers and Miller)
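As an illustration of the first two kinds of fitness function, here is a minimal sketch in which 0 means "constraint satisfied" and larger values mean "further away". The numeric function follows the general shape of a Korel-style branch distance (the offset of 1 for an unsatisfied strict inequality is an assumption), and the string function is the standard Levenshtein edit distance. The regular-expression case is not covered. The class and method names are hypothetical.

```java
final class FitnessFunctions {
    /**
     * Distance for the numeric constraint "a > b":
     * 0 when it holds, otherwise the gap that a still has to close.
     */
    static double greaterThan(double a, double b) {
        return a > b ? 0.0 : (b - a) + 1.0;
    }

    /**
     * Levenshtein edit distance between s and t, usable as the distance
     * for a string equality constraint "s = t".
     */
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

For the constraint len(v0user) > 0, an empty string gets `greaterThan(0, 0)`, which is 1, and any non-empty string gets 0.
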

ACO Algorithm

1. Construction of solution
   1.1 Build set of solution components
   1.2 Determine fitness of solution components
   1.3 Select the best solution components
2. Application of local search
3. Update of pheromone values
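The loop above can be sketched in a much reduced form. The following is not ACO-Solver's algorithm: it searches fixed-length strings over a given alphabet instead of paths in a solution automaton, chooses components by pheromone only, skips the local search step, and uses Levenshtein distance between `foo(input)` and a target string as the only fitness. All names and parameter values (evaporation rate 0.9, pheromone floor 0.1) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.UnaryOperator;

final class AcoStringSearch {
    /** Best input found and its fitness (0 means foo(input) equals the target). */
    static final class Result {
        final String input;
        final int fitness;
        Result(String input, int fitness) { this.input = input; this.fitness = fitness; }
    }

    static Result search(UnaryOperator<String> foo, String target, char[] alphabet,
                         int length, int ants, int iterations, long seed) {
        if (ants < 1 || iterations < 1 || alphabet.length == 0) {
            throw new IllegalArgumentException("need at least one ant, iteration and character");
        }
        Random rnd = new Random(seed);
        double[][] pheromone = new double[length][alphabet.length];
        for (double[] row : pheromone) Arrays.fill(row, 1.0);
        Result best = null;
        int[] bestChoice = null;
        for (int it = 0; it < iterations; it++) {
            for (int a = 0; a < ants; a++) {
                // Step 1: construct a solution, one character per position.
                int[] choice = new int[length];
                StringBuilder sb = new StringBuilder();
                for (int pos = 0; pos < length; pos++) {
                    choice[pos] = roulette(pheromone[pos], rnd);
                    sb.append(alphabet[choice[pos]]);
                }
                // Invoke the black-box operation and observe its output.
                int fitness = levenshtein(foo.apply(sb.toString()), target);
                if (best == null || fitness < best.fitness) {
                    best = new Result(sb.toString(), fitness);
                    bestChoice = choice;
                }
            }
            if (best.fitness == 0) break;
            // Step 3: evaporate all pheromone, then reinforce the best path so far.
            for (int pos = 0; pos < length; pos++) {
                for (int c = 0; c < alphabet.length; c++) {
                    pheromone[pos][c] = Math.max(0.1, 0.9 * pheromone[pos][c]);
                }
                pheromone[pos][bestChoice[pos]] += 1.0 / (1 + best.fitness);
            }
        }
        return best;
    }

    /** Picks an index with probability proportional to its weight. */
    private static int roulette(double[] weights, Random rnd) {
        double total = 0;
        for (double w : weights) total += w;
        double r = rnd.nextDouble() * total;
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r < 0) return i;
        }
        return weights.length - 1;
    }

    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

The pheromone floor keeps every character selectable, which is one simple way to preserve the stochastic exploration that lets the search escape local optima.
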


Evaluation


Benchmark and Evaluation Settings


- 43 web programs from 9 Java Web applications/services (1 KLOC - 52 KLOC)

- Attack conditions for 64 vulnerable and 40 non-vulnerable paths with various vulnerability types (SQLi, XMLi, XPathi, LDAPi, XSS)

- The timeout for solving each attack condition was set to 30s

RQ1: Benefit

How does the proposed approach improve the effectiveness of state-of-the-art solvers for solving constraints related to vulnerability detection?

Solver                    ✔     ∆     vuln. detected
Z3-str2                   19    -     3  (4.7 %)
Z3-str2 + ACO-Solver      65    46    46 (71.9 %)
CVC4                      72    -     55 (85.9 %)
CVC4 + ACO-Solver         83    11    64 (100 %)

ACO-Solver significantly improves the recall (# detected vulnerabilities) of Z3-str2/CVC4

- Z3-str2 has some limitations when it comes to symbolic regular expressions

RQ2: Cost

Is the cost of using our technique affordable in practice?

Solver                    time (s)
Z3-str2                   100.28
Z3-str2 + ACO-Solver      1518.33
CVC4                      4.96
CVC4 + ACO-Solver         728.57

The cost of using our technique is affordable, because
- we can detect significantly more vulnerabilities
- vulnerability detection is an offline activity

RQ3: Role of the Automata-based solver

Does the automata-based solver contribute to the effectiveness of the search-based procedure?

Solver                     ✔     ∆     vuln. detected    t (s)
Z3-str2                    19    -     3  (4.7 %)        100.28
Z3-str2 + modACO-Solver    19    0     3  (4.7 %)        2651.66
CVC4                       72    -     55 (85.9 %)       4.69
CVC4 + modACO-Solver       73    1     56 (87.5 %)       927.75

The automata-based solver plays a fundamental role in achieving a higher effectiveness

Conclusion

Additional Information: https://github.com/julianthome/acosolver

Making constraint solving for vulnerability detection practical
