OWASP BENCHMARK COMPARISON - TECHNICAL WHITE PAPER

www.juliasoft.com

OWASP Benchmark Comparison
Why Does Semantic Static Analysis Permit a Higher Vulnerability Detection Rate?


Background

The relevance and impact of software security attacks justifies the interest of researchers and companies in finding ways of assessing that software does not contain such threats. In general, such attacks are due to unexpected flows of data (injections) from input sources to specific sinks, i.e., program points that might execute dangerous or restricted operations.

For instance, SQL injection is the consequence of an unrestricted information flow from a user input or web service entry point into a routine that performs a database query. Tracking all possible information flows is, however, quite challenging: flows can be direct (e.g., through local variable assignments), but they can also occur by side-effect (e.g., by calling a method that assigns a field of an object in a data structure). Data might also flow through arrays and object fields, rather than just local variables.
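To make the two kinds of flow concrete, the following sketch (illustrative only, not taken from the OWASP Benchmark; all class and method names are hypothetical) shows user input reaching a SQL query sink both through a direct assignment and through a side-effecting method that writes a field:

```java
// Illustrative sketch (not an OWASP Benchmark test; names are hypothetical):
// user input reaches a SQL sink directly and by side-effect.
import java.sql.Connection;
import java.sql.Statement;

public class FlowExample {

    // A data structure whose field gets tainted by side-effect.
    static class Holder {
        String value;
    }

    // The taint enters the heap through this field update.
    static void store(Holder h, String s) {
        h.value = s;
    }

    static void run(Connection conn, String userInput) throws Exception {
        // Direct flow: a local variable assignment.
        String name = userInput;

        // Flow by side-effect: a called method writes the tainted value into a field.
        Holder holder = new Holder();
        store(holder, name);

        // Sink: the tainted value is concatenated into a database query.
        Statement stmt = conn.createStatement();
        stmt.executeQuery("SELECT * FROM users WHERE name = '" + holder.value + "'");
    }
}
```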

This complexity hinders both the soundness, or coverage, of an analysis tool (finding all threats of a given category) and its precision, or accuracy (issuing a limited number of false alarms). Consequently, existing tools are typically unsound (missing threats) and/or imprecise (issuing too many false alarms). For instance, penetration testers try to generate probable attack patterns, with no guarantee of completeness but full guarantee of precision.

Static analysis tools are typically based on pattern matching or other unsound techniques. This means that they inevitably miss some vulnerabilities, as it is not possible to encode every variation of every type of vulnerability in the tool. This incompleteness has contributed to some prejudice against the usefulness of static analysis tools.

In this White Paper we illustrate the practical advantages of using a static analyzer based on a sound technique and how this enables a high vulnerability detection rate.

The OWASP Benchmark to Compare Tools' Efficiency

Establishing a metric to compare different tools is problematic, since there is no general agreement about which factors (e.g., soundness or precision) one should take into account in the evaluation, and why. In addition, it is not always clear what constitutes a threat, and how some attacks should be formally defined.

The Open Web Application Security Project (OWASP) is an independent organization focused on improving the security of software. Among the many projects carried out by its different groups, OWASP provides a Benchmark test suite designed to measure the quality of code analyzers, thus making it possible to compare the tools to each other.

As far as we know, the OWASP Benchmark Project represents the most relevant attempt to establish a universal security benchmark, i.e., a suite of thousands of small Java programs containing security threats. According to its web page1:

The OWASP Benchmark for Security Automation is a free and open test suite designed to evaluate the speed, coverage, and accuracy of automated software vulnerability detection tools and services. Without the ability to measure these tools, it is difficult to understand their strengths and weaknesses, and compare them to each other.

The benchmark has benefited from the critical contribution of many organizations, so that it has also served as a way of clarifying the actual nature of the threats. Over the years, it has emerged as the reference for the comparison of security analysis tools, and it is nowadays a must-do for any tool that claims to find software security vulnerabilities in Java code.

1 https://www.owasp.org/index.php/Benchmark

Most tests of the OWASP benchmark are servlets that might allow unconstrained information flow from their inputs to dangerous routines. A few tests are not related to injections, but rather to unsafe cookie exchange or to the use of inadequate cryptographic algorithms, hash functions or random number generators. Injection attacks are, however, the most complex to spot and are of greater scientific interest.
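As an illustration of the non-injection categories (this is our own simplified sketch, not an actual benchmark test), the weaknesses typically look like the use of a broken hash function or of a predictable random number generator:

```java
// Illustrative sketch (not an actual benchmark test) of two non-injection
// weaknesses: a weak hash algorithm and a predictable random number generator.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Random;

public class WeakPrimitives {
    public static void main(String[] args) throws Exception {
        // Weak hash algorithm: MD5 is no longer considered secure.
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest("secret".getBytes(StandardCharsets.UTF_8));

        // Weak random number generator: java.util.Random is predictable;
        // security-sensitive code should use java.security.SecureRandom instead.
        Random rng = new Random();
        long token = rng.nextLong();

        System.out.println(digest.length + " bytes, token " + token);
    }
}
```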

The benchmark also sets traps for tools, i.e., it contains harmless servlets that, on a superficial analysis, seem to feature security threats. In this way, the benchmark measures the number of true positives (that is, real vulnerabilities reported by the tool) and false positives (that is, vulnerabilities reported by the tool that are not real issues).

These tests represent a deep and wide stress test: a perfect analyzer should not be caught in a trap while still reporting all the real vulnerabilities. In an ideal world, a tool would achieve a 100% true positive rate and 0% false positives. However, to achieve such a result one would have to explore all possible executions of a program; therefore, existing tools make a compromise between soundness, precision, and efficiency of the analysis.

An important feature of this benchmark is the automatic generation of reports to compare different tools: several scripts plot the coverage and accuracy of the tools in comparative scorecards. This gives an immediate graphical picture of the relative positioning of the tools. Free tools are plotted explicitly in the scorecards, while commercial tools are anonymized into their overall average.

Taint Analysis of the OWASP Benchmark with Julia

Julia currently features 45 checkers, including the Injection checker, based on a sound taint analysis, and the Cookie, Random and Cryptography checkers that cover the other kinds of threats considered in the OWASP benchmark. In particular, the Injection checker performs a wide range of checks covering various kinds of injections (e.g., SQL, command, log forging, etc.). Julia's CWE coverage claim2 reports the complete list of software weaknesses identified by our analyses with respect to the Common Weakness Enumeration standard3.

2 https://juliasoft.com/resources/cwe-coverage/
3 https://cwe.mitre.org/

The main idea of Julia's taint analysis is to model explicit information flows through Boolean flags. Such a flag is set to true if the value at a given program point might be injected with user input, and the flags are a sound overapproximation of all taint behaviors for the variables in scope at that program point. Instructions that might have side-effects (field updates, array writes and method calls) need some approximation of the heap in order to model the possible effects of the updates. Similarly, the analysis relies on an overapproximation of the call graph representing how different methods might call each other. The analysis rests on a formal mathematical foundation that proves the soundness of the approach. For full formal details of the analysis, please refer to (Ernst, Lovato et al., Boolean Formulas for the Static Identification of Injection Attacks in Java, 2015).
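The following minimal sketch (our own illustration, not Julia's implementation) conveys the idea of Boolean taint flags on a toy straight-line representation: each variable is mapped to a flag that is set when the variable may hold user input, flags are copied along assignments, and a warning is raised when a flagged variable reaches a sink. A real analysis additionally over-approximates the heap and the call graph, so that field updates and method calls propagate taint conservatively.

```java
// Minimal sketch of Boolean-flag taint propagation (not Julia's implementation).
// Each variable is mapped to a Boolean flag that is true when the variable
// might hold user-controlled data; flags are propagated along assignments.
import java.util.HashMap;
import java.util.Map;

public class TaintSketch {

    // Toy instructions of the form "x = source()", "x = y" and "sink(x)".
    static void analyze(String[] instructions) {
        Map<String, Boolean> tainted = new HashMap<>();
        for (String insn : instructions) {
            if (insn.matches("\\w+ = source\\(\\)")) {
                // A source introduces taint.
                tainted.put(insn.split(" ")[0], true);
            } else if (insn.matches("\\w+ = \\w+")) {
                // An assignment copies the taint flag of the right-hand side.
                String[] parts = insn.split(" = ");
                tainted.put(parts[0], tainted.getOrDefault(parts[1], false));
            } else if (insn.matches("sink\\(\\w+\\)")) {
                // A sink is flagged when it may receive a tainted value.
                String arg = insn.substring(5, insn.length() - 1);
                if (tainted.getOrDefault(arg, false)) {
                    System.out.println("warning: tainted value reaches sink: " + insn);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Reports a warning for sink(b) only: b is a copy of the tainted a.
        analyze(new String[] { "a = source()", "b = a", "sink(b)", "sink(c)" });
    }
}
```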

Overall Results

The rigorous mathematical foundation puts Julia's analyses in a position to largely outperform the other commercial static analyzers. In particular, Figure 1 shows the scorecard generated by the OWASP benchmark, which compares Julia to other free (explicitly) and commercial (anonymously) static analyzers. Scorecards report soundness on the vertical axis and precision on the horizontal axis; hence, a perfect (i.e., sound and precise) tool would sit in the top left corner of the scorecard.

Figure 1 shows that Julia is very close to that corner, much closer than all the free analyzers and than the average of the anonymous commercial analyzers. Figure 2, at the bottom, reports the results of Julia for the eleven categories of threats considered by the OWASP benchmark. Julia is always close to the top left corner of the scorecard and always finds all threats, since it is the only sound analyzer in this comparison. Hence its results lie on the 100% line for soundness (true positive rate).

Figure 2 also reports the number of false negatives (FN), true positives (TP), and false positives (FP) obtained by Julia on the OWASP benchmark. For all categories Julia obtained zero false negatives: this demonstrates the (practical) soundness of the analysis, meaning that Julia is always able to spot the security vulnerabilities that the programs contain. In addition, the number of false positives is always a small percentage (below 20%) of the number of warnings produced: this demonstrates the (practical) precision of the analysis, meaning that a developer using Julia to identify vulnerabilities will need to discard (at least on the OWASP benchmark) at most one warning out of six.

Figure 2: Julia's results per OWASP Benchmark category (TP = true positives reported, TPR = true positive rate, FP = false positives reported, FPR = false positive rate, Score = TPR - FPR).

Category                    TP   TPR    FP   FPR      Score
Command Injection          126   100%   20   16.00%   84.00%
Cross-Site Scripting       246   100%   19    9.09%   90.91%
Insecure Cookie             36   100%    0    0.00%  100.00%
LDAP Injection              27   100%    4   12.50%   87.50%
Path Traversal             133   100%   22   16.30%   83.70%
SQL Injection              272   100%   36   15.52%   84.48%
Trust Boundary Violation    83   100%   12   27.91%   72.09%
Weak Encryption Algorithm  130   100%    0    0.00%  100.00%
Weak Hash Algorithm        129   100%    0    0.00%  100.00%
Weak Random Number         218   100%    0    0.00%  100.00%
XPath Injection             15   100%    3   15.00%   85.00%
Overall Results                  100%        10.21%   89.79%

Some Representative Test Cases

We now discuss three representative examples where Julia correctly catches an injection vulnerability, correctly identifies safe code, and produces a false alarm.


True positive: A (simplified) example of an OWASP benchmark test is shown in Figure 3. Julia warns about a possible XSS attack at the last line, since the bar parameter of format() (at line 14) is tainted. In fact, this parameter is built from the content of the local variable param (line 9), which received (line 5) an input that the user can control (a header of the connection). Note that Julia correctly spots the information flow through the constructor of StringBuilder and the call to replace(). This result is possible because Julia's Injection checker follows the flow of the user input from line 5 to line 14, going through the assignments at lines 6, 7, 9 and 10. A syntactic analyzer could produce a false negative here, since it has no local evidence that the value of bar might contain some unsanitized user input; in general, a syntactic analyzer is not in a position to identify that a user input enters at line 5.
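For readers without access to Figure 3, the sketch below reconstructs the shape of such a test (the names, intermediate steps and line layout are illustrative, not the actual benchmark code): the tainted header value flows through local assignments, a StringBuilder constructor and a replace() call before reaching the response writer.

```java
// Illustrative reconstruction of the shape of such a test (not the exact
// Figure 3 code; names and intermediate steps are approximate).
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class XssTruePositiveSketch extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        String param = request.getHeader("BenchmarkTest");      // source: user-controlled header
        String a = param;                                        // intermediate assignments that a
        String b = a + "_suffix";                                // flow-sensitive checker follows
        StringBuilder builder = new StringBuilder(b);            // taint flows through the constructor
        String bar = builder.toString().replace("_suffix", "");  // ... and through replace()
        // Sink: the tainted value reaches the HTTP response unescaped (possible XSS).
        response.getWriter().format("Result: %s%n", bar);
    }
}
```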

True negative: Now consider the test in Figure 4. Julia does not issue a warning here, and indeed no XSS attack is possible this time, since the variable param (tainted at line 6) is sanitized into bar by the Spring method htmlEscape() (line 8). Julia uses a dictionary of sanitizing methods, and others can be specified by the user. Here a syntactic analyzer might produce a warning at line 10, since it could locally check that bar is the result of some computation, but not that this value has been sanitized at line 8, because it is not in a position to follow the complete flow of the user input.
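Again as an illustration of the shape of such a test (not the actual Figure 4 code), the sanitized variant might look as follows; the call to Spring's HtmlUtils.htmlEscape() removes the taint before the value reaches the sink.

```java
// Illustrative reconstruction of the sanitized variant (not the exact Figure 4 code).
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.util.HtmlUtils;

public class XssTrueNegativeSketch extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        String param = request.getHeader("BenchmarkTest");  // source: user-controlled header
        // Sanitizer: HTML meta-characters are escaped, so the value is no longer tainted.
        String bar = HtmlUtils.htmlEscape(param);
        // The sink receives sanitized data: no XSS is possible here.
        response.getWriter().format("Result: %s%n", bar);
    }
}
```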


False positive: Finally, consider the test in Figure 5. This time Julia falls into the trap and issues a spurious warning about a potential XSS attack at the call to format(), since it thinks that bar (and hence obj) is tainted. This is not actually the case, since the test manipulates a valueList in such a way that the value finally stored into bar is untainted. This list manipulation is too complex for Julia's taint analysis, which cannot distinguish the single elements of the list and conservatively assumes all of them to be tainted.
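The sketch below illustrates the kind of trap described (it is not the actual Figure 5 code): only a constant element of the list is ever used, so the sink receives untainted data, but an analysis that does not distinguish individual list elements conservatively reports it.

```java
// Illustrative reconstruction of the trap (not the exact Figure 5 code): only the
// constant element of the list is used, so the sink actually receives untainted data.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class XssFalsePositiveSketch extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        String param = request.getHeader("BenchmarkTest");  // source: user-controlled header

        List<String> valueList = new ArrayList<>();
        valueList.add("constant");   // untainted element
        valueList.add(param);        // tainted element

        // Only the constant element is retrieved, so bar is untainted in every execution ...
        String bar = valueList.get(0);

        // ... but an analysis that cannot distinguish single list elements conservatively
        // treats bar (and hence obj) as tainted and reports the sink below as a potential XSS.
        Object obj = bar;
        response.getWriter().format("Result: %s%n", obj);
    }
}
```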

Such a false positive is the result of a sound flow analysis (which, as shown in the previous examples, allows Julia to track how user input might flow inside the program) that needs to scale up to industrial software and thus cannot rely on a very precise model of the heap. As discussed for Figure 1, this leads to about 10% false alarms on the OWASP Benchmark, which is a reasonable compromise to achieve the 100% true positive rate.

Conclusions

The continuous growth of cyber attacks makes reducing application security risk a top priority for most organizations. Injections and similar vulnerabilities are among the main causes of data breach incidents, and their efficient removal is fundamental to securing a software system.

Static analysis is an industry best practice for identifying security vulnerabilities in software code, but the limited precision of static analysis tools may reduce the willingness to use them. Using an inefficient static analyzer has two consequences. First, the reduction of risk is insufficient, because there is little guarantee that the vulnerabilities have been found. Second, its use is resource-consuming and results in a low ROI.

For this reason, the efficiency of a static analyzer is an important factor, and efforts are made to find ways to compare the different tools in order to make informed decisions. The OWASP Benchmark is the most well-known independent comparison of the capabilities of static analysis technologies. The Benchmark suite includes more than 2,000 test cases, and the results are verified by OWASP itself.

The results of the benchmark show a large gap in the precision of the different tools: Julia, a static analyzer based on a formal method, achieves a 100% true positive rate (it finds all vulnerabilities) and an efficiency of about 90%.

To reduce vulnerability risk to a minimum and achieve a high return on investment, make sure that your decision to adopt a static analysis solution takes its efficiency into account.

Contact us to find out how JuliaSoft solutions can support your application security effort, or take a free trial at portal.juliasoft.com

Bibliography

Michael D. Ernst, Alberto Lovato, Damiano Macedonio, Ciprian Spiridon, Fausto Spoto: "Boolean Formulas for the Static Identification of Injection Attacks in Java". Proceedings of the 20th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (LPAR-20), Suva, Fiji, November 24-28, 2015.

https://www.springer.com/it/book/9783662488980


Appendix

Running the Analysis of the OWASP Benchmark

If you would like to examine Julia's warnings to better understand the results of sound static analysis, we offer the possibility to re-run the OWASP Benchmark using our online analysis service. Proceed as follows.

Running Julia on the OWASP Benchmark

We assume that you have an Eclipse project for the OWASP Benchmark 1.2 that compiles with no errors, and that you have run mvn eclipse:eclipse inside that project (Maven 3 is needed for that). Then follow these steps:

1. Register to Julia's online analysis service at https://portal.juliasoft.com;
2. Install Julia's Eclipse plugin and configure it with your credential information (instructions and credentials are available at login time to our service and under your user profile);
3. Set the maximum number of warnings shown to 5000;
4. Make sure you have enough credit to run the analysis (150,000 credits); otherwise please contact us at [email protected];
5. Select the OWASP benchmark project in Eclipse and analyze it with Julia's Eclipse plugin: in the first screen, flag both options Only main and Include .properties files. The latter is needed because the benchmark assumes that the analyzer has access to property files, which is normally turned off for privacy;
6. In the next screen of Julia's Eclipse plugin, select only the Basic checkers Cryptography, Cookie and Random and the Advanced checker Injection;
7. Click Finish and wait until the analysis terminates. This should take a few minutes, unless there are other analyses in the queue. You can check the progress of the analysis in the console view of Eclipse;
8. Once the analysis has terminated, you can see the warnings in Julia's Eclipse view and export the XML file of the warnings with the icon of that view that looks like a downwards arrow.

After these steps, you will be able to navigate the results of all the test cases of the OWASP Benchmark.

https://juliasoft.com/resources/technical/ contains the user manuals as well as short tutorials of the Eclipse plugin and the web interface.

JuliaSoft advanced code analysis solutions help companies ensure the security, quality and privacy of their software applications. For more information, please visit our website at www.juliasoft.com or contact us at:

JuliaSoft Srl - Lungadige Galtarossa, 21 - 37133 Verona, Italy
Tel +39 045 2081901 - [email protected]