61
Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California, Santa Barbara [email protected] http://www.cs.ucsb.edu/~vlab

Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

  • View
    217

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Eliminating Web Software Vulnerabilities with Automated Verification

Tevfik BultanVerification Lab

Department of Computer ScienceUniversity of California, Santa Barbara

[email protected] http://www.cs.ucsb.edu/~vlab

Page 2: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

University of California at Santa Barbara

Page 3: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Verification Lab (VLab) at UCSB

• Application of automated verification techniques to software – Automated verification of web applications

• Checking input validation, sanitization, string analysis (PHP)• Checking navigation correctness (MVC farmeworks)• Checking data models (Ruby on Rails)

– Automated verification of web services• Modular testing and verification of web services (WSDL)• Formal modeling and analysis of choreography and

orchestration (WS-CDL, WS-BPEL)– Automated verification of access control policies

• Policy composition (XACML)– Automated verification of concurrency

• Analyzing concurrency, deadlock detection (Java)

Page 4: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

VLab Publications on String Analysis

• Relational String Verification Using Multi-Track Automata [Yu, Bultan, Ibarra CIAA’10]

• Stranger: An Automata-based String Analysis Tool for PHP [Yu, Alkhalaf, Bultan TACAS’10]

• Generating Vulnerability Signatures for String Manipulating Programs Using Automata-based Forward and Backward Symbolic Analyses [Yu, Alkhalaf, Bultan ASE’09]

• Symbolic String Verification: Combining String Analysis and Size Analysis [Yu, Bultan, Ibarra TACAS’09]

• Symbolic String Verification: An Automata-based Approach [Yu, Bultan, Cova, Ibarra SPIN’08]

Page 5: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Web software

• Web software is becoming increasingly dominant• Web applications are used extensively in many areas:

– Commerce: online banking, online shopping, …– Entertainment: online music & videos, …– Interaction: social networks

• We will rely on web applications more in the future:– Health records

• Google Health, Microsoft HealthVault– Controlling and monitoring of national infrastructures:

• Google Powermeter• Web software is also rapidly replacing desktop applications

– Could computing + software-as-service• Google Docs, Google …

Page 6: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

One Major Road Block

• Web applications are not trustworthy!

• Web applications are notorious for security vulnerabilities – Their global accessibility makes them a target for many malicious

users

• As web applications are becoming increasingly dominant – and as their use in safety critical areas is increasing

• their trustworthiness is becoming a critical issue

Page 7: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Web applications are not secure

• There are many well-known security vulnerabilities that exist in many web applications. Here are some examples:– Malicious file execution: where a malicious user causes the

server to execute malicious code– SQL injection: where a malicious user executes SQL commands

on the back-end database by providing specially formatted input– Cross site scripting (XSS): causes the attacker to execute a

malicious script at a user’s browser

• These vulnerabilities are typically due to – errors in user input validation or – lack of user input validation

Page 8: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Web Application Vulnerabilities

Page 9: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Web Application Vulnerabilities

• The top two vulnerabilities of the Open Web Application Security Project (OWASP)’s top ten list in 2007– Cross Site Scripting (XSS)– Injection Flaws (such as SQL Injection)

• The top two vulnerabilities of the OWASPs top ten list in 2010– Injection Flaws (such as SQL Injection)– Cross Site Scripting (XSS)

Page 10: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Why are web applications error prone?

• Extensive string manipulation:– Web applications use extensive string manipulation

• To construct html pages, to construct database queries in SQL, etc.

– The user input comes in string form and must be validated and sanitized before it can be used

• This requires the use of complex string manipulation functions such as string-replace

– String manipulation is error prone

Page 11: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

String Related Vulnerabilities String related web application vulnerabilities occur when:

a sensitive function is passed a malicious string input from the user

This input contains an attack It is not properly sanitized before it reaches the sensitive function

String analysis: Discover these vulnerabilities automatically

Page 12: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

XSS Vulnerability A PHP Example:

1:<?php2: $www = $_GET[”www”];3: $l_otherinfo = ”URL”;4: echo ”<td>” . $l_otherinfo . ”: ” . $www . ”</td>”;5:?>

The echo statement in line 4 is a sensitive function It contains a Cross Site Scripting (XSS) vulnerability

<script ...

Page 13: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Is it Vulnerable? A simple taint analysis, e.g., [Huang et al. WWW04], can report this

segment vulnerable using taint propagation

1:<?php2: $www = $_GET[”www”];3: $l_otherinfo = ”URL”;4: echo ”<td>” . $l_otherinfo . ”: ” .$www. ”</td>”;5:?>

echo is tainted → script is vulnerable

tainted

Page 14: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

How to Fix it? To fix the vulnerability we added a sanitization routine at line s Taint analysis will assume that $www is untainted and report that the

segment is NOT vulnerable

1:<?php2: $www = $_GET[”www”];3: $l_otherinfo = ”URL”;s: $www = preg_replace(”[^A-Za-z0-9 .-@://]”,””,$www);4: echo ”<td>” . $l_otherinfo . ”: ” .$www. ”</td>”;5:?>

tainted

untainted

Page 15: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Is It Really Sanitized?

1:<?php2: $www = $_GET[”www”];3: $l_otherinfo = ”URL”;s: $www = preg_replace(”[^A-Za-z0-9 .-@://]”,””,$www);4: echo ”<td>” . $l_otherinfo . ”: ” .$www. ”</td>”;5:?>

<script ...

Page 16: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Sanitization Routines can be Erroneous The sanitization statement is not correct!

preg_replace(”[^A-Za-z0-9 .-@://]”,””,$www); – Removes all characters that are not in {A-Za-z0-9 .-@:/}– “.-@” denotes all characters between “.” and “@” (including

“<” and “>”)– “.-@” should have been “.\-@”

This example is from a buggy sanitization routine used in MyEasyMarket-4.1 (line 218 in file trans.php)

Page 17: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

String Analysis String analysis determines all possible values that a string expression

can take during any program execution

Using string analysis we can identify all possible input values of the sensitive functions

Then we can check if inputs of sensitive functions can contain attack strings

How can we characterize attack strings? Use regular expressions to specify the attack patterns

Attack pattern for XSS: Σ <scriptΣ∗ ∗

Page 18: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

String Analysis If string analysis determines that the intersection of the attack pattern

and possible inputs of the sensitive function is empty then we can conclude that the program is secure

If the intersection is not empty, then we can again use string analysis to generate a vulnerability signature

characterizes all malicious inputs

Given Σ <scriptΣ∗ ∗ as an attack pattern: The vulnerability signature for $_GET[”www”] is

Σ <α sα cα rα iα pα tΣ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗where α∉ {A-Za-z0-9 .-@:/}

Page 19: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Vulnerabilities can be tricky

• Input <!sc+rip!t ...> does not match the attack pattern – but it matches the vulnerability signature – and it can cause an attack

• 1:<?php• 2: $www = <!sc+rip!t ...>;• 3: $l otherinfo = ”URL”;• s: $www = preg replace(”[^A-Za-z0-9 .-@://]”,””, <!sc+rip!t...>);• 4: echo ”<td>” . $l otherinfo . ”: ” . <script ...> .”</td>”;• 5:?>

Page 20: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Automata-based String Analysis

• Finite State Automata can be used to characterize sets of string values

• We use automata based string analysis– Associate each string expression in the program with an automaton– The automaton accepts an over approximation of all possible

values that the string expression can take during program execution

• Using this automata representation we symbolically execute the program, only paying attention to string manipulation operations

Page 21: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Symbolic Automata Representation

• We use the MONA DFA Package for automata manipulation – [Klarlund and Møller, 2001]

• Compact Representation:– The transition relation of the DFA is represented as a multi-terminal

BDD (MBDD)

• Exploits the MBDD structure in the implementation of DFA operations– Union, Intersection, and Emptiness Checking– Projection and Minimization

• Cannot Handle Nondeterminism:– We extended the alphabet with dummy bits to encode

nondeterminism

Page 22: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Symbolic Automata RepresentationExplicit DFArepresentation

Symbolic DFArepresentation

Page 23: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

String Analysis Stages Construct dependency graphs for the PHP programs Combine symbolic forward and backward symbolic reachability

analyses Forward analysis

Assume that the user input can be any string Propagate this information on the dependency graph When a sensitive function is reached, intersect with attack pattern

Backward analysis If the intersection is not empty, propagate the result backwards to

identify which inputs can cause an attack

Front Front EndEnd

ForwardForwardAnalysisAnalysis

BackwardBackwardAnalysisAnalysis

Attackpatterns

PHPProgram

VulnerabilitySignatures

Page 24: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Dependency Graphs

Given a PHP program,

first construct the:

Dependency graph

1:<?php2: $www = $ GET[”www”];3: $l_otherinfo = ”URL”;4: $www = preg_replace(

”[^A-Za-z0-9 .-@://]”,””,$www);

5: echo $l_otherinfo . ”: ” .$www;

6:?>echo, 5

str_concat, 5

$www, 4

“”, 4

preg_replace, 4

[^A-Za-z0-9 .-@://], 4 $www, 2

$_GET[www], 2

“: “, 5$l_otherinfo, 3

“URL”, 3

str_concat, 5

Dependency Graph

Page 25: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Forward Analysis

• Using the dependency graph we conduct vulnerability analysis • Automata-based forward symbolic analysis that identifies the possible

values of each node– Each node in the dependency graph is associated with a DFA

• DFA accepts an over-approximation of the strings values that the string expression represented by that node can take at runtime

• The DFAs for the input nodes accept Σ∗– Intersecting the DFA for the sink nodes with the DFA for the attack

pattern identifies the vulnerabilities• Uses post-image computations of string operations:

– postConcat(M1, M2)

returns M, where M=M1.M2– postReplace(M1, M2, M3) (language-based replacement)

returns M, where M=replace(M1, M2, M3)

Page 26: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Forward Analysis

echo, 5

str_concat, 5

$www, 4

“”, 4

preg_replace, 4

[^A-Za-z0-9 .-@://], 4 $www, 2

$_GET[www], 2

“: “, 5$l_otherinfo, 3

“URL”, 3

str_concat, 5

Forward = URL: [A-Za-z0-9 .-@/]*

Forward = URL: [A-Za-z0-9 .-@/]*

Forward = [A-Za-z0-9 .-@/]*

Forward = [A-Za-z0-9 .-@/]*

Forward = Σ*Forward = εForward = [^A-Za-z0-9 .-@/]

Forward = Σ*

Forward = :

Forward = URL

Forward = URL

Forward = URL:

L(URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*)

Attack Pattern = Σ*<Σ*

≠ Ø

L(URL: [A-Za-z0-9 .-@/]*) =L(Σ*<Σ*)

Page 27: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Intersection Result Automaton

U

R

L

:

Space

<

[A-Za-z0-9 .-;=-@/]

URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*

[A-Za-z0-9 .-@/]

Page 28: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Widening

• String verification problem is undecidable in general– Two string variables, equality checks and concatenation is enough

to get to undecidability

• The forward fixpoint computation is not guaranteed to converge in the presence of loops and recursion

• We compute a sound approximation– During fixpoint we compute an over approximation of the least

fixpoint that corresponds to the reachable states

• We use an automata based widening operation to over-approximate the fixpoint– Widening operation over-approximates the union operation and

accelerates the convergence of the fixpoint computation

Page 29: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Backward Analysis

• A vulnerability signature is a characterization of all malicious inputs that can be used to generate attack strings

• We identify vulnerability signatures using an automata-based backward symbolic analysis starting from the sink node

• Uses pre-image computations on string operations:– preConcatPrefix(M, M2)

returns M1 and where M = M1.M2– preConcatSuffix(M, M1)

returns M2, where M = M1.M2– preReplace(M, M1, M2)

returns M3, where M=replace(M1, M2, M3)

Page 30: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Backward Analysis

echo, 5

str_concat, 5

$www, 4

“”, 4

preg_replace, 4

[^A-Za-z0-9 .-@://], 4 $www, 2

$_GET[www], 2

“: “, 5

$l_otherinfo, 3

“URL”, 3

str_concat, 5

Forward = URL: [A-Za-z0-9 .-@/]*

Backward = URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*

Forward = URL: [A-Za-z0-9 .-@/]*

Backward = URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*

Forward = [A-Za-z0-9 .-@/]*

Backward =[A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*

Forward = [A-Za-z0-9 .-@/]*

Backward = [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*

Forward = Σ*

Backward = [^<]*<Σ*

Forward = ε

Backward = Do not care

Forward = [^A-Za-z0-9 .-@/]

Backward = Do not care

Forward = Σ*

Backward = [^<]*<Σ*

Forward = :

Backward = Do not care

Forward = URL

Backward = Do not care

Forward = URL

Backward = Do not care

Forward = URL:

Backward = Do not care

node 3 node 6

node 10

node 11

node 12

Vulnerability Signature = [^<]*<Σ*Vulnerability Signature = [^<]*<Σ*

Page 31: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Vulnerability Signature Automaton

[^<]*<Σ*

<[^<]

Σ

NotASCII

Page 32: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Vulnerability Signatures

• The vulnerability signature is the result of the input node, which includes all possible malicious inputs

• An input that does not match this signature cannot exploit the vulnerability

• After generating the vulnerability signature– Can we generate a patch based on the vulnerability signature?

The vulnerability signature automaton for the running example

[^<]

<

Σ

Page 33: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Patches from Vulnerability Signatures

• Main idea: – Given a vulnerability signature automaton, find a cut that separates

initial and accepting states– Remove the characters in the cut from the user input to sanitize

• This means, that if we just delete “<“ from the user input, then the vulnerability can be removed

[^<]

<

Σ

min-cut is {<}

Page 34: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Patches from Vulnerability Signatures

• Ideally, we want to modify the input (as little as possible) so that it does not match the vulnerability signature

• Given a DFA, an alphabet cut is – a set of characters that after ”removing” the edges that are

associated with the characters in the set, the modified DFA does not accept any non-empty string

• Finding a minimal alphabet cut of a DFA is an NP-hard problem (one can reduce the vertex cover problem to this problem)– We use a min-cut algorithm instead – The set of characters that are associated with the edges of the min

cut is an alphabet cut • but not necessarily the minimum alphabet cut

Page 35: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Automatically Generated Patch

Automatically generated patch will make sure that no string that matches the attack pattern reaches the sensitive function

<?php

if (preg match(’/[^ <]*<.*/’,$ GET[”www”]))

$ GET[”www”] = preg replace(<,””,$ GET[”www”]); $www = $_GET[”www”]; $l_otherinfo = ”URL”; $www = preg_replace(”[^A-Za-z0-9 .-@://]”,””,$www); echo ”<td>” . $l_otherinfo . ”: ” .$www. ”</td>”;?>

Page 36: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Experiments

• We evaluated our approach on five vulnerabilities from three open source web applications:

(1) MyEasyMarket-4.1: A shopping cart program

(2) BloggIT-1.0: A blog engine

(3) proManager-0.72: A project management system

• We used the following XSS attack pattern:

Σ∗<scriptΣ∗

Page 37: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Forward Analysis Results

• The dependency graphs of these benchmarks are reduced based on the sinks– Unrelated parts are removed

Input Results

#nodes #edges #sinks #inputs Time(s) Mem (kb) #states/#bdds

21 20 1 1 0.08 2599 23/219

29 29 1 1 0.53 13633 48/495

25 25 1 2 0.12 1955 125/1200

23 22 1 1 0.12 4022 133/1222

25 25 1 1 0.12 3387 125/1200

Page 38: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Backward Analysis Results

• We use the backward analysis to generate the vulnerability signatures– Backward analysis starts from the vulnerable sinks identified during

forward analysis

Input Results

#nodes #edges #sinks #inputs Time(s) Mem (kb) #states/#bdds

21 20 1 1 0.46 2963 9/199

29 29 1 1 41.03 1859767 811/8389

25 25 1 2 2.35 5673 20/302,20/302

23 22 1 1 2.33 32035 91/1127

25 25 1 1 5.02 14958 20/302

Page 39: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Alphabet Cuts

• We generate alphabet cuts from the vulnerability signatures using a min-cut algorithm

• Problem: When there are two user inputs the patch will block everything and delete everything– Cannot interpret the relations among input variables (e.g. only block

when the concatenation of two inputs contains <script)

Input Results

#nodes #edges #sinks #inputs AlphabetCut

21 20 1 1 {<}

29 29 1 1 {s,’,”}

25 25 1 2 Σ , Σ

23 22 1 1 {<,’,”}

25 25 1 1 {<,’,”}

Vulnerabilitysignaturedepends ontwo inputs

Page 40: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Relational String Analysis

• Instead of using multiple single-track DFAs use one multi-track DFA– Each track represents the values of one string variable

• Using multi-track DFAs:– Identifies the relations among string variables– Generates relational vulnerability signatures for multiple user inputs

of a vulnerable application– Improves the precision of string analysis– Proves properties that depend on relations among string variables,

e.g., $file = $usr.txt

Page 41: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Multi-track Automata

• Let X (the first track), Y (the second track), be two string variables• λ is a padding symbol• A multi-track automaton that encodes X = Y.txt

(a,a), (b,b) …

(λ,t) (λ,x) (λ,t)

Page 42: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Relational Vulnerability Signature

• Performs forward analysis using multi-track automata to generate relational vulnerability signatures

• Each track represents one user input– An auxiliary track represents the values of the current node– Intersects the auxiliary track with the attack pattern upon

termination

Page 43: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Relational Vulnerability Signature

• Consider a simple example having multiple user inputs

<?php

1: $www = $_GET[”www”];

2: $url = $_GET[”url”];

3: echo $url. $www;

?>

• Let the attack pattern be Σ < Σ ∗ ∗

Page 44: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Relational Vulnerability Signature

• A multi-track automaton: ($url, $www, aux)• Identifies the fact that the concatenation of two inputs contains <

(a,λ,a),(b,λ,b),… (<,λ,<)

(λ,a,a),(λ,b,b),…

(λ,a,a),(λ,b,b),…

(λ,<,<)

(λ,<,<)

(λ,a,a),(λ,b,b),…

(λ,a,a),(λ,b,b),…

(a,λ,a),(b,λ,b),…

Page 45: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Relational Vulnerability Signature

• Project away the auxiliary variable• Find the min-cut• This min-cut identifies the alphabet cuts {<} for the first track ($url) and

{<} for the second track ($www)

(a,λ),(b,λ),… (<,λ)

(λ,a),(λ,b),…

(λ,a),(λ,b),…

(λ,<)

(λ,<)

(λ,a),(λ,b),…

(λ,a),(λ,b),…

(a,λ),(b,λ),…

min-cut is {<},{<}

Page 46: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Patch for Multiple Inputs

• Patch: If the inputs match the signature, delete its alphabet cut

<?php

if (preg match(’/[^ <]*<.*/’, $ GET[”url”].$ GET[”www”]))

{

$ GET[”url”] = preg replace(<,””,$ GET[”url”]);

$ GET[”www”] = preg replace(<,””,$ GET[”www”]);

}

1: $www = $ GET[”www”];

2: $url = $ GET[”url”];

3: echo $url. $www;

?>

Page 47: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Technical Issues

• To conduct relational string analysis, we need to compute ”intersection” of multi-track automata– Intersection is closed under aligned multi-track automata

• λs are right justified in all tracks, e.g., abλλ instead of aλbλ– However, there exist unaligned multi-track automata that are not

describable by aligned ones– We proposed an alignment algorithm that constructs aligned

automata which over or under approximate unaligned ones

Page 48: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Other Technical Issues

• Modeling Word Equations:– Intractability of X = cZ

• The number of states of the corresponding aligned multi-track DFA is exponential to the length of c.

– Irregularity of X = YZ • X = YZ is not describable by an aligned multi-track automata

• We proposed a conservative analysis– Constructs multi-track automata that over or under-approximate the

word equations

Page 49: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Composite Analysis

• What I have talked about so far focuses only on string contents– It does not handle constraints on string lengths– It cannot handle comparisons among integer variables and string

lengths

• We extended our string analysis techniques to analyze systems that have unbounded string and integer variables

• We proposed a composite static analysis approach that combines string analysis and size analysis

Page 50: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Size Analysis

• Size Analysis: The goal of size analysis is to provide properties about string lengths– It can be used to discover buffer overflow vulnerabilities

• Integer Analysis: At each program point, statically compute the possible states of the values of all integer variables.– These infinite states are symbolically over-approximated as linear

arithmetic constraints that can be represented as an arithmetic automaton

• Integer analysis can be used to perform size analysis by representing lengths of string variables as integer variables.

Page 51: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

An Example

• Consider the following segment:

1: <?php

2: $www = $ GET[”www”];

3: $l otherinfo = ”URL”;

4: $www = preg replace(”[^A-Za-z0-9 ./-@://]”,””,$www);

5: if(strlen($www) < $limit)

6: echo ”<td>” . $l otherinfo . ”: ” . $www . ”</td>”;

7:?>

• If we perform size analysis solely, after line 4, we do not know the length of $www

• If we perform string analysis solely, at line 5, we cannot check/enforce the branch condition.

Page 52: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Composite Analysis

• We need a composite analysis that combines string analysis with size analysis.– Challenge: How to transfer information between string automata

and arithmetic automata?

• A string automaton is a single-track DFA that accepts a regular language, whose length forms a semi-linear set– For example: {4, 6} {2 + 3k | k ≥ 0}∪

• The unary encoding of a semi-linear set is uniquely identified by a unary automaton

• The unary automaton can be constructed by replacing the alphabet of a string automaton with a unary alphabet

Page 53: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Arithmetic Automata

• An arithmetic automaton is a multi-track DFA, where each track represents the value of one variable over a binary alphabet

• If the language of an arithmetic automaton satisfies a Presburger formula, the value of each variable forms a semi-linear set

• The semi-linear set is accepted by the binary automaton that projects away all other tracks from the arithmetic automaton

Page 54: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Connecting the Dots

• We developed novel algorithms to convert unary automata to binary automata and vice versa

• Using these conversion algorithms we can conduct a composite analysis that subsumes size analysis and string analysis

StringAutomata

Unary LengthAutomata

Binary LengthAutomata

ArithmeticAutomata

Page 55: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Stranger: A String Analysis Tool

– Uses Pixy [Jovanovic et al., 2006] as a PHP front end– Uses MONA [Klarlund and Møller, 2001] automata package for

automata manipulation

ParserParser

DependencyDependencyAnalyzerAnalyzer

StringString AnalyzerAnalyzer

MONA Automata MONA Automata PackagePackage

Automata BasedAutomata BasedString ManipulationString Manipulation

LibraryLibrary

CFG

DependencyGraphs

Symbolic String AnalysisSymbolic String Analysis

DFAs

Pixy Front EndPixy Front EndString/Automata

Operations

Stranger Automata

String Analysis Report

(Vulnerability Signatures)

PHP program

Attackpatterns

Stranger is available at:www.cs.ucsb.edu/~vlab/stranger

Page 56: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Case Study Schoolmate 1.5.4

Number of PHP files: 63 Lines of code: 8181

Forward Analysis results

After manual inspection we found the following:

Actual Vulnerabilities False Positives

105 48

Time Memory Number of XSS sensitive sinks

Number of XSS Vulnerabilities

22 minutes 281 MB 898 153

Page 57: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Case Study – False Positives Why false positives?

– Path insensitivity: 39 Path to vulnerable program point is not feasible

– Un-modeled built in PHP functions : 6– Unfound user written functions: 3

– PHP programs have more than one execution entry point

– We can remove all these false positives by extending our analysis to a path sensitive analysis and modeling more PHP functions

Page 58: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Case Study - Sanitization We patched all actual vulnerabilities by adding sanitization routines

We ran stranger the second time– Stranger proved that our patches are correct with respect to the

attack pattern we are using

Page 59: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Related Work: String Analysis

• String analysis based on context free grammars: [Christensen et al., SAS’03] [Minamide, WWW’05]

• Application of string analysis to web applications: [Wassermann and Su, PLDI’07, ICSE’08] [Halfond and Orso, ASE’05, ICSE’06]

• Automata based string analysis: [Xiang et al., COMPSAC’07] [Shannon et al., MUTATION’07]

• String analysis based on symbolic/concolic execution: [Bjorner et al., TACAS’09]

• Bounded string analysis : [Kiezun et al., ISSTA’09]

Page 60: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

Related Work

• Size Analysis– Size analysis: [Hughes et al., POPL’96] [Chin et al., ICSE’05] [Yu et

al., FSE’07] [Yang et al., CAV’08]– Composite analysis: [Bultan et al., TOSEM’00] [Xu et al., ISSTA’08]

[Gulwani et al., POPL’08] [Halbwachs et al., PLDI’08]

• Vulnerability Signature Generation– Test input/Attack generation: [Wassermann et al., ISSTA’08]

[Kiezun et al., ICSE’09]– Vulnerability signature generation: [Brumley et al., S&P’06]

[Brumley et al., CSF’07] [Costa et al., SOSP’07]

Page 61: Eliminating Web Software Vulnerabilities with Automated Verification Tevfik Bultan Verification Lab Department of Computer Science University of California,

THE END