12
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 11, NO. 4, APRIL 2016 799 Seeing the Unseen: Revealing Mobile Malware Hidden Communications via Energy Consumption and Artificial Intelligence Luca Caviglione, Mauro Gaggero, Jean-François Lalande, Wojciech Mazurczyk, Senior Member, IEEE, and Marcin Urba´ nski Abstract—Modern malware uses advanced techniques to hide from static and dynamic analysis tools. To achieve stealthiness when attacking a mobile device, an effective approach is the use of a covert channel built by two colluding applications to exchange data locally. Since this process is tightly coupled with the used hiding method, its detection is a challenging task, also worsened by the very low transmission rates. As a consequence, it is important to investigate how to reveal the presence of malicious software using general indicators, such as the energy consumed by the device. In this perspective, this paper aims to spot malware covertly exchanging data using two detection methods based on artificial intelligence tools, such as neural networks and decision trees. To verify their effectiveness, seven covert channels have been implemented and tested over a measurement framework using Android devices. Experimental results show the feasibility and effectiveness of the proposed approach to detect the hidden data exchange between colluding applications. Index Terms— Energy-based malware detection, covert channels, colluding applications, neural networks, decision trees. I. I NTRODUCTION M ODERN MALWARE uses advanced techniques to defeat static analysis tools or live detection systems. Even if designing a malware is nowadays considered quite common [1], the most advanced programmers try to hide malicious behaviors by using different techniques, such as the repackaging of legitimate applications or the obfusca- tion/ciphering of code. Besides, by automating such mech- anisms, a single attacker can add malicious code to several applications that may be sent to alternative markets. As a Manuscript received October 8, 2015; revised December 15, 2015; accepted December 15, 2015. Date of publication December 22, 2015; date of current version February 1, 2016. This work was supported by the Polish National Science Center under Grant 2015/18/E/ST7/00227. The associate editor coor- dinating the review of this manuscript and approving it for publication was Prof. Husrev T. Sencar. L. Caviglione and M. Gaggero are with the Institute of Intelligent Systems for Automation, National Research Council of Italy, Genoa 16149, Italy (e-mail: [email protected]; [email protected]). J.-F. Lalande is with the Institut National des Sciences Appliquées Centre Val de Loire, Bourges 18020, France, and also with the CIDRE Team, CentraleSupélec/Institut National de Recherche en Informatique et en Automa- tique, Rennes 35065, France (e-mail: [email protected]). W. Mazurczyk and M. Urba´ nski are with the Institute of Telecommuni- cations, Warsaw University of Technology, Warsaw 00-665, Poland (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIFS.2015.2510825 consequence, classical signature-based methods have limited results [2]. One of the most advanced mechanisms used by malware to exfiltrate information or to bypass the security frameworks of mobile devices relies upon information-hiding techniques to exchange data between different processes. Especially, as in the case of smartphones, a local covert channel can be used to setup a communication path between two colluding applications to extract personal information [3], [4]. As it has been observed in [5], mobile devices are particularly prone to hidden-communication attacks due to their variety of hard- ware resources, as they incorporate cameras, GPS, WLAN, Bluetooth, cellular networks, and many sensors. Moreover, malware developers turned a significant portion of their atten- tion to mobile devices, leading to an increase of 1800% in mobile malware over the past two years [6]. Thus, there is an urge for research efforts to design original countermeasures and enable early prevention. Unfortunately, this is very difficult since the detection strictly depends on the type of covert channel. For instance, exploiting electromagnetic signals to covertly transmit data is very different from manipulating the statistics of the available RAM to embed secrets [5]. Addi- tionally, covert channels typically achieve limited bandwidths, thus increasing the complexity of finding out whether a hidden exchange is ongoing. In this perspective, a promising approach aims at exploit- ing general information to detect covert channels. A recent debate has emerged about the possibility of using the power consumption as an indicator to identify malicious activities. Despite [7] claims that malware cannot be detected by high- level applications measuring energy consumption of processes, other works demonstrate that proper power measurements can reveal some threats [8]–[10]. In this paper, we show the feasibility of using measure- ments of the energy consumed by a device to detect malware exploiting a covert channel. To this aim, we have implemented five popular covert channels available in the literature targeting the Android platform [4], [11], together with two new ones. Further, we have developed an experimental setup to quantify the energy consumption of the software components running on a mobile device. In more details, we have used measure- ments provided by the high-level model of PowerTutor [8] together with values available in the /sys portion of the file system [10], [12]. 1556-6013 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

IEEE TRANSACTIONS ON INFORMATION FORENSICS …cygnus.tele.pw.edu.pl/~wmazurczyk/art/TIFS-energy.pdf · IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 11, NO. 4, APRIL

  • Upload
    hatuong

  • View
    219

  • Download
    1

Embed Size (px)

Citation preview

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 11, NO. 4, APRIL 2016 799

Seeing the Unseen: Revealing Mobile MalwareHidden Communications via Energy Consumption

and Artificial IntelligenceLuca Caviglione, Mauro Gaggero, Jean-François Lalande, Wojciech Mazurczyk, Senior Member, IEEE,

and Marcin Urbanski

Abstract— Modern malware uses advanced techniques to hidefrom static and dynamic analysis tools. To achieve stealthinesswhen attacking a mobile device, an effective approach is theuse of a covert channel built by two colluding applications toexchange data locally. Since this process is tightly coupled withthe used hiding method, its detection is a challenging task, alsoworsened by the very low transmission rates. As a consequence, itis important to investigate how to reveal the presence of malicioussoftware using general indicators, such as the energy consumedby the device. In this perspective, this paper aims to spot malwarecovertly exchanging data using two detection methods based onartificial intelligence tools, such as neural networks and decisiontrees. To verify their effectiveness, seven covert channels havebeen implemented and tested over a measurement frameworkusing Android devices. Experimental results show the feasibilityand effectiveness of the proposed approach to detect the hiddendata exchange between colluding applications.

Index Terms— Energy-based malware detection, covertchannels, colluding applications, neural networks, decisiontrees.

I. INTRODUCTION

MODERN MALWARE uses advanced techniques todefeat static analysis tools or live detection systems.

Even if designing a malware is nowadays considered quitecommon [1], the most advanced programmers try to hidemalicious behaviors by using different techniques, such asthe repackaging of legitimate applications or the obfusca-tion/ciphering of code. Besides, by automating such mech-anisms, a single attacker can add malicious code to severalapplications that may be sent to alternative markets. As a

Manuscript received October 8, 2015; revised December 15, 2015; acceptedDecember 15, 2015. Date of publication December 22, 2015; date of currentversion February 1, 2016. This work was supported by the Polish NationalScience Center under Grant 2015/18/E/ST7/00227. The associate editor coor-dinating the review of this manuscript and approving it for publication wasProf. Husrev T. Sencar.

L. Caviglione and M. Gaggero are with the Institute of Intelligent Systemsfor Automation, National Research Council of Italy, Genoa 16149, Italy(e-mail: [email protected]; [email protected]).

J.-F. Lalande is with the Institut National des Sciences Appliquées CentreVal de Loire, Bourges 18020, France, and also with the CIDRE Team,CentraleSupélec/Institut National de Recherche en Informatique et en Automa-tique, Rennes 35065, France (e-mail: [email protected]).

W. Mazurczyk and M. Urbanski are with the Institute of Telecommuni-cations, Warsaw University of Technology, Warsaw 00-665, Poland (e-mail:[email protected]; [email protected]).

Color versions of one or more of the figures in this paper are availableonline at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIFS.2015.2510825

consequence, classical signature-based methods have limitedresults [2].

One of the most advanced mechanisms used by malwareto exfiltrate information or to bypass the security frameworksof mobile devices relies upon information-hiding techniquesto exchange data between different processes. Especially,as in the case of smartphones, a local covert channel can beused to setup a communication path between two colludingapplications to extract personal information [3], [4]. As it hasbeen observed in [5], mobile devices are particularly proneto hidden-communication attacks due to their variety of hard-ware resources, as they incorporate cameras, GPS, WLAN,Bluetooth, cellular networks, and many sensors. Moreover,malware developers turned a significant portion of their atten-tion to mobile devices, leading to an increase of 1800% inmobile malware over the past two years [6]. Thus, there is anurge for research efforts to design original countermeasuresand enable early prevention. Unfortunately, this is very difficultsince the detection strictly depends on the type of covertchannel. For instance, exploiting electromagnetic signals tocovertly transmit data is very different from manipulating thestatistics of the available RAM to embed secrets [5]. Addi-tionally, covert channels typically achieve limited bandwidths,thus increasing the complexity of finding out whether a hiddenexchange is ongoing.

In this perspective, a promising approach aims at exploit-ing general information to detect covert channels. A recentdebate has emerged about the possibility of using the powerconsumption as an indicator to identify malicious activities.Despite [7] claims that malware cannot be detected by high-level applications measuring energy consumption of processes,other works demonstrate that proper power measurements canreveal some threats [8]–[10].

In this paper, we show the feasibility of using measure-ments of the energy consumed by a device to detect malwareexploiting a covert channel. To this aim, we have implementedfive popular covert channels available in the literature targetingthe Android platform [4], [11], together with two new ones.Further, we have developed an experimental setup to quantifythe energy consumption of the software components runningon a mobile device. In more details, we have used measure-ments provided by the high-level model of PowerTutor [8]together with values available in the /sys portion of the filesystem [10], [12].

1556-6013 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

800 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 11, NO. 4, APRIL 2016

To perform the detection, we have developed an approachbased on two well-known artificial intelligence tools, i.e.,neural networks [13] and decision trees [14]. They are ableto learn from a set of past collected energy measurementswhether hidden communication is present to reveal threatsat runtime. Specifically, two detection methods have beendeveloped, each one using both neural networks and decisiontrees. The first approach requires the solution of a regressionproblem to predict the future behavior of energy consumption.A hidden communication is spotted if the difference betweenthe actual and predicted consumption exceeds a certain thresh-old. The second method is based on a classification problem,and provides information on hidden communications by usinga set of features characterizing the energy behavior of thedevice. The considered artificial intelligence tools have beenalready used to reveal malicious code [15], [16] or to preventthe execution of hazardous software on Android devices [17].However, no previous works using energy consumption to spotinformation-hiding-capable malware exist in the literature.

To summarize, the main contributions of this paper are:(i) showcase the feasibility of using energy footprints forthe detection of malware implementing information-hidingtechniques, (ii) the creation of an ad-hoc testbed and a hybridmeasurement method to characterize seven covert channels,including two new implementations, and (iii) the developmentof two “intelligent” frameworks to perform the detection withlow computational requirements.

The rest of this paper is structured as follows. Section IIreviews the literature dealing with the detection of malware byusing energy measurements. Section III details the referencescenario and the considered covert channels, while Section IVdescribes the methodology used for the measurements ofpower consumption. Section V introduces the two detectionmethods based on neural networks and decision trees, andSection VI discusses experimental results. Finally, Section VIIconcludes the paper.

II. RELATED WORKS

Anomaly detection using energy footprints has been par-tially investigated in the literature, primarily for malware andnetwork attacks. However, at best of the authors’ knowledge,it has never been applied to covert channels.

In general, energy-based anomaly detection methods aregrouped according to how measurements are performed.In more details, we have the following approaches.

1) System-Based [7], [18]–[20]: an energy footprint iscreated by considering the whole consumption of thedevice or some specific sets of applications and/orhardware parts. The obtained data represents the “cleansystem” that serves as a baseline for malware discovery.

2) Application-Based [21]: similarly to the previous case,an energy footprint is created for a well-defined poolof applications (e.g., games), and each one is measuredseparately. The collected traces are then compared atruntime against the data obtained with a single-processgranularity.

3) User-Based [22]–[24]: an energy footprint is created byanalyzing the typical behavior of users and the related

power consumption. This also includes, for instance,the applications and device features that are active, aswell as their timing statistics.

4) Attack-Based [9], [10], [25]–[27]: measurements aredone while real attacks or malicious malware are target-ing a controlled environment. The acquired traces forma database of energy signatures used for the detection.

It is worth noting that methods belonging to the first threegroups can potentially deal with unknown threats, while thelast one only allows to recognize attacks for which signaturesare available.

Concerning system-based methods, Jacoby and Davis [19]demonstrated how to reveal network attacks by using abattery-based Intrusion Detection System (IDS) analyzingthe power consumption and the utilization levels of somecritical hardware/software components like the CPU. Later,Nash et al. [20] proposed to estimate the energy footprint of adesktop computer by using a multiple linear regression modelconsidering the CPU load, read/write accesses to the storageunit, and network transmissions. To evaluate the presence ofmalicious activities, the measured parameters are combinedwith performance data counters available in the operatingsystem, and the results are compared with different thresholds.Such approach was proven to be effective, but its maindrawback is the complexity to compute proper numericalvalues for the needed thresholds. Liu et al. [18] proposedthe VirusMeter tool for Symbian smartphones that computesthe energy profile for the whole “clean” system. The useris alerted by using a heuristic to compare the actual energyconsumption with a reference value. Lastly, Hoffman et al. [7]performed comprehensive experiments with both artificial andreal-world malware using observation windows of differentlengths. They created energy footprints in a controlled settingfor the IEEE 802.11 and 3G hardware of a smartphone, andthe resulting power consumption was treated as a baseline fordetection. Even if such an approach is effective, the maincontribution of the work is about the noise level of toolsused for measurements: the additional power consumed by amalware is often too small to be detectable with the resolutionof many measurement software.

Among the techniques using application-based methods, themost notable work has been proposed by Kim et al. [21],where the power consumption is monitored to detect malwarein Windows Mobile devices. Their proof-of-concept solutionis based on the energy footprints of the applications runningon the smartphone, which are compared against “clean”consumption templates.

Concerning methods explicitly considering the behavior ofthe user, Dixon et al. [22], [24] and Dixon and Mishra [23]showed that there is a strong correlation between the batterydrain of a mobile device and the user’s location. This canbe exploited to determine the average power consumption fordifferent locations and make the detection of abnormalitiesmore efficient. In fact, instead of considering the requiredpower only as a function of time, location-specific energyprofiles are used as additional indicators.

Regarding attack-based approaches, Buennemeyer et al. [26]investigated the energy signatures of some network threats

CAVIGLIONE et al.: REVEALING MOBILE MALWARE HIDDEN COMMUNICATIONS 801

against mobile devices. In more details, the power consump-tion of a device is correlated with IEEE 802.11 activities.If an irregularity is discovered, it is compared with existingsignatures to perform the detection of the attack. Then, eachmobile device exchanges alerts with peers, thus implement-ing a distributed network IDS. This work has been furtherextended by modifying the rates at which the battery statusis polled, and by considering the activity of the Bluetoothair interface to increase the performance in terms of correctdetections [27]. Moreover, Caviglione and Merlo [9] focusedon how antivirus and network attacks such as port scan andping floods impact over the battery depletion of differentsmartphones. They state the need for “green security” mech-anisms to effectively develop consumption-based malwaredetection systems [28]. Curti et al. [25] studied the energyfootprints for benign applications like Skype or YouTube andalso for network attacks like Denial of Service. They also pro-vided a power consumption model for the hardware involvedin IEEE 802.11 communications allowing to distinguish anormal traffic pattern from a network threat. This work hasbeen further extended by Merlo et al. [10] by analyzingthe feasibility of porting the two aforementioned approacheson Android devices with the aim of developing a malwaredetection framework. Unfortunately, the proposed solutionsturned out to be unsuitable, mainly due to implementationissues, which can be overcome by introducing the directobservation of power consumption from the battery hardwarewithout the need to dwell deeply into the drivers.

Literature also indicates that future research directionsshould consider hybrid approaches, i.e., the power consump-tion should be enriched with additional information such asthe memory usage. Even if hybrid approaches are possible,they are typically implemented in a standalone fashion [19]or as a part of a larger network-based IDS (see, e.g., [26]).Furthermore, the technology evolves very dynamically, thusstudies performed even few years ago could quickly becomeobsolete as functionality and capabilities of modern devicessignificantly outrun the old ones.

III. COVERT CHANNELS

In this section, we describe how a prototypical malwareexploiting a local covert channel to secretly leak sensitivedata has been developed, and how we have studied its maincharacteristics.

A. Reference Scenario

We consider the typical scenario depicted in Figure 1, wherea malware composed of two colluding applications exchangesdata through a local covert channel built within the device inorder to exfiltrate sensitive information [3], [4], [11]. In moredetails, the process CCSender has access to the data but hasnot the permission to use the network. Instead, the colludingapplication CCReceiver has access to the network, hence itis able to exfiltrate the received data to an external serveror a Command & Control (C&C) facility. Obviously, thecommunications of the CCReceiver towards the C&C couldbe detected, for instance, by inspecting the traffic produced

Fig. 1. Typical communication scenario of two colluding parts of a malware:CCSender and CCReceiver exchange data through a local covert channel.

by the device. Thus, it is common to use other information-hiding methods to build a network covert channel withinthe produced flow. For instance, some malware uses a TORclient to reach the server anonymously, making classic trafficanalysis ineffective [29]. Consequently, analyzing an encryptedflow of information produced by the CCReceiver does nothelp to reveal the presence of a malware, and this is whythis paper focuses on the detection of the local covert channelitself.

Moreover, we assume that the malware monitors the oper-ations performed by the user in order to transmit data whenhe/she is not active [30]. Indeed, waking up the covert channelduring user’s activity could degrade the performances of thedevice and reveal the presence of the threat. To this aim, manymodern malware delay their activation in order to be invisibleor not rise attention. Thus, even if the triggering of the malwaremay occur at any time, it is more likely to happen whenthe user is not active. For example, the families of malwareDroidKungFu 1 and 2 include a time bomb mechanism thattriggers the malicious behavior after a predefined period oftime [31]. With such a protection, a malware would be runningstatistically when the smartphone is idle. Therefore, we focuson such low attention-raising hazards operating when thedevice is idle since they are the preferred choice to performattacks and to avoid an easy detection [3].

B. Implemented Covert Channels

For experimental purposes we implemented on the Androidplatform seven local covert channels between the processesCCSender and CCReceiver. Five of them have beenalready proposed in the literature and they are listed as follows.

1) Type of Intent [11]: the secret receiver registers256 types of intent listeners and the secret senderencodes data by choosing and sending intents of theproper type.

2) File Lock [32]: the secret sender communicates bylocking a file. The secret receiver also tries to lock thesame file and, if it succeeds, a 0 is inferred. Otherwise,an exception is raised, meaning that the secret sender haslocked the file before the secret receiver. In this case, a1 is received.

3) System Load [11]: the secret sender sends a 1 byburdening the CPU of the device. The secret receiverchecks how many clock ticks the sender has got since theprevious iteration. If the value is greater than a certainthreshold, a 1 is inferred, a 0 otherwise.

4) Volume Settings [32]: the secret data is encoded into theringtone volume level of the device. If the sender can

802 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 11, NO. 4, APRIL 2016

TABLE I

MEASURED BANDWIDTHS (IN BITS PER SECOND) FOR THE CONSIDEREDCOVERT CHANNELS COMPARED TO THE THEORETICAL VALUES

use eight levels of volume, it can send up to three bitsper iteration.

5) Unix Socket Discovery [11]: the secret data is sent byencoding the information within the state of a socket.Specifically, a closed socket is equal to 1, while anopened one is equal to 0.

Additionally, we propose the following two new covertchannels that were only tested in a theoretical fashion.

6) File Size: the secret sender sets the size of a shared fileand the secret receiver interprets it as a byte.

7) Memory Load: the secret receiver acquires the initialmemory load of the secret sender. Then, the secretsender inflates the allocated data to send a 1 or releasesmemory to send a 0.

Before investigating the related energy footprints,we conducted a performance evaluation to assess thecorrectness of the implementation, also in the perspectiveof removing possible power-hungry bugs that can voidmeasurements. The literature provides theoretical limitsand/or performance evaluations in a very mixed set oftestbeds and with early implementations of the Androidoperating system and outdated smartphones. Thus, weinvestigated the data throughput to understand whether theavailable reference values are still valid. For this round oftests, we used a Samsung Galaxy SIII smartphone. Table Ireports the mean bitrate achieved by each covert channelaveraged over 100 repeated trials together with the estimatedcapacities as provided in the related references. In moredetails, the measured values highly differ from those providedin the literature. This is mainly due to changes in the Androidplatform and in the device drivers, hence confirming thehigh variability of performances as the result of a tightcoupling between covert channels and the hardware/softwarearchitecture. Besides, the best obtained throughput is equal to74.62 bit/s for the case of the Type of Intent covert channel,while the lowest is 6.43 bit/s for the System Load method.At least four methods have similar maximum bandwidths,which suggests that we have reached some limits within theAndroid platform.

IV. MEASUREMENT METHODOLOGY

In this section we present the methodology used to measurethe power consumption of the malware based on colludingapplications exchanging data through covert channels. To havereliable data, we first investigated the literature to find the

Fig. 2. Box diagram of the energy measurement architecture.

most suitable ways to gather information on the power usedby applications running on Android devices. As pointed outin [10], the energy can be measured at high level using theAndroid APIs or at low level by probing the battery driver.Performing measures at low level is a difficult task, as itrequires to patch the battery driver to get access to fine-grained data. Unfortunately, using high-level APIs may leadto values with a poor degree of reliability. For example,Merlo et al. [10], [12] report that high-level data is not accurateenough to detect some form of network attacks, and alsoproposes an alternative mechanism defined as “middle-level”exploiting information stored in the /sys portion of the filesystem such as the instantaneous voltage of the battery.

Based on these results, we decided not to collect low-level energy values for two main reasons: this techniquerequires deep changes in the operating system, thus lackingof portability, and has a non-negligible consumption that canmask the one of the covert channel used by the colludingapplications. Instead, to quantify the power depletion we reliedupon: (i) high-level energy consumption measures of eachprocess running on the system, for which we used a modifiedversion of PowerTutor [8], i.e., we developed a patch to enablethe tool to send the consumption of processes to a properdata collector via an Android intent; (ii) middle-level energyconsumption information acquired by gathering current andvoltage values stored in /sys. We point out that since ourobjective is the detection of a threat without recurring to datarelated to the power consumed by the network subsystem, oureffort of using a mixed high- and middle-level methodologyis novel [33].

We developed a measurement framework both for collectingdata and automatize the experiments. Figure 2 depicts thebox diagram and the major interactions among the differentsubsystems. In particular, the EnergyCollector is the controllerof the experiments and is in charge of running repeated trialsand collecting data measured by PowerTutor from APIs and/sys. The latter receives gathered data each second throughan intent. Obviously, the communication flow between theEnergyCollector and the CCsender/CCreceiver wouldnot exist on a real device, and is only used to repeat theexperiments. Nevertheless, after measuring the overhead, wecan assume the impact of intents in terms of additional poweras negligible.

For each covert channel, we conducted repeated trialsaccording to the following flow:

1) the EnergyCollector waits a random time;2) the EnergyCollector sends an intent to the CCsender

to start a covert transmission;

CAVIGLIONE et al.: REVEALING MOBILE MALWARE HIDDEN COMMUNICATIONS 803

3) CCsender and CCreceiver exchange a message;4) CCreceiver sends an intent to notify the EnergyCol-

lector that the transmission is finished;5) the experiment is repeated.The random time inserted in the step 1) ensures that the

beginning of the covert channel transmission is not knowna priori and that two different exchanges are not triggeredwithin a given timeframe. This enables to have an unpre-dictable behavior, which is more coherent with real-worlduse cases (a discussion on malware randomly activating isprovided in [34] and [35]). The duration of the transmissiondepends on the type of the covert channel since each one hasa different bandwidth, as discussed in Section III.

V. ENERGY CONSUMPTION AND BLACK-BOX MODELING

The basic idea of our approach is to detect covert commu-nications among colluding applications by using the energyrequirements of the processes running on a device as amarker. The main benefit is the decoupling of the detectionof the covert channel from its implementation. In fact, hiddencommunications are tightly coupled with the adopted carrierand are characterized by a very low bitrate, thus making theirspotting very difficult and poorly generalizable.

To detect covert channels, two general problems have tobe addressed: (i) developing an approximate model for thepower consumption of a process, and (ii) using the obtainedinformation to recognize whether a covert channel is present.Achieving such goals is challenging, especially due to thedifferent sources of power drains, e.g., transmissions over airinterfaces, memory operations, CPU- or I/O-bound behaviors,and user to kernel space switches [36]. Thus, we proposetwo general methods based on the black-box modelingapproach.

The first technique, denoted as regression-based detec-tion (RBD), uses a regression fed with past values of theconsumption in a “clean” system to predict its future behavior.Then, if the expected energy footprint deviates too muchfrom the real one, hidden communication is assumed present.The second approach, denoted as classification-based detec-tion (CBD), exploits a reduced set of features describing theenergetic behavior of the device. More specifically, a classi-fication problem is solved to detect covert communicationsbetween colluding applications.

To this aim, well-known artificial intelligence tools, suchas one-hidden-layer feedforward neural networks and binarydecision trees, are used to model the power consumption. TheRBD and CBD approaches are composed by two steps, asshown in Figure 3. First, a model of the power consumptionis constructed based on a set of past collected measurements.This requires the solution of an optimization problem tofind the best values of the parameters the model dependsupon. Such a step is called “training”, and the collected mea-sures constitute the training set. This may be computationallydemanding, but it is usually performed offline and not on themobile device. Second, the optimized model is used to detectcovert communications based on the new, actual measurementsof power consumption. This step can be performed online, i.e.,at runtime on the device.

Fig. 3. Two-step procedure for the detection of covert channels.

A. Neural Networks and Decision Trees

In this section, we introduce the tools used to constructthe approximate models of power consumption. Both areemployed in the literature to approximate unknown relation-ships between input and output variables without any a-prioriknowledge on the underlying dynamics. They can be used tosolve either regression or classification problems [37]. Regres-sion models map the input space into a real-valued domain.Instead, classifiers map the input space into predefined classes.The relationships between inputs and outputs are learned fromthe collected data and then used to associate an output tonew, unseen inputs. Neural networks belong to the familyof “parametric” approximators, as their output significantlydepends on the parameters that define the structure of themodel, whereas decision trees are usually referred to as “non-parametric” approximators, i.e., they strongly depend from theavailable data and, less significantly, from a reduced set ofparameters.

As said, the goal is to approximate an input/output mappingxi �→ y

i , where xi ∈ Rn is the input and y

i ∈ Rm is the

corresponding output. The unknown functional relationshipbetween inputs and outputs is approximated by a function γbelonging to a family � of parametrized functions, i.e.,

yi = γ (xi , α) (1)

where α ∈ Rp is a vector of parameters and yi ∈ R

m is theestimated value of y

i .In recent years, the literature of learning from data has

mainly concentrated on the use of nonlinear models for theapproximation of complex systems. In fact, it has been provedfrom both theoretical and numerical viewpoints, that classicallinear models may be computationally intractable for theapproximation of complex functions, especially in the case ofhigh dimensionality of the input variables (see, e.g., [38]–[40]and the references therein). On the contrary, it has beenshown that nonlinear structures have a greater flexibility andguarantee good approximations with a smaller number ofparameters.

One-hidden-layer feedforward neural networks are a verypopular nonlinear architecture already used to model a varietyof systems (see, e.g., [41], [42] and references therein). In thispaper we focus on neural networks with sigmoidal activationfunctions. They enjoy the universal approximation property,

804 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 11, NO. 4, APRIL 2016

i.e., the ability to approximate any well-behaved function witharbitrary precision [43], [44]. The function γ is given by alinear combination of parametrized basis functions, with acertain number of free parameters to be optimized inside. Forscalar outputs, we have

γ (x, α) =ν∑

k=1

ckσ

( q∑

j=1

akj x j + bk

)+ c0

where σ(·) is the activation function, ν is the number of basisfunctions (i.e., the neurons) and α � col

(akj , bk, ck, c0

) ∈ Rp

is the vector of free parameters.Concerning binary decision trees, they are widely-used

models for learning (see, e.g., [14], [37], [45], [46] and thereferences therein). The output of the model is computed via abinary partition of the input space into smaller subsets called“leaves”, with sides parallel to the coordinate axes, in whichthe values of the output are constant. Each split pursues certainperformance goals and is repeated until a stopping criterion ismet. The partition is called a “tree” since the leaves are addedrecursively with binary splits to form a tree structure. Twodifferent types of trees exist depending on whether they areused to solve regression or classifications problems. In thecase of regression trees, a constant output value is assignedto each leaf, corresponding to the average of the observedoutput values therein. By contrast, a certain label is assignedto the terminal leaves in classification trees. In both cases,to avoid overfitting, the obtained tree is usually “pruned”according to some regularization costs [37]. In this paper,we focus on binary trees, in which each step of a predictioninvolves checking the value of only one variable at a time. Thefunctional relation γ in (1) is then approximated by means ofa piecewise-constant function over the various partitions.

After choosing the family of functions �, we need to findan element γ ∗ ∈ � capable of reproducing at best the systembehavior. This corresponds to a training procedure to optimizethe parameters on the basis of the available data. To thispurpose, a suitable index is used to measure the differencebetween the estimated model output yi for a certain value ofthe parameters and the real measured output values y

i . Themost used metric is a mean squared error (MSE) criterion:the optimal parameter vector α∗ is the one that minimizes thequadratic difference between the real output variables and theestimated ones, i.e.,

α∗ � minα∈R

p

{1

N

N∑

i=1

(yi − γ (xi , α))2

}

where N is the number of samples used for the training.In general, the training phase may be computationally

demanding, especially for large values of N . However, manyad-hoc-developed procedures are available in the literature(see, e.g., [47], [48]). Usually, they are implemented in effi-cient software packages, allowing to solve the training problemin a reduced amount of time.

B. Regression-Based Detection

The RBD method is composed of two steps: the firstconsists in modeling the power consumed in a “clean” system,

i.e., when no colluding applications performing hidden dataexchanges are present in the device. To this end, a collectionof past values of the consumption are used to predict the futureones. The second step is based on a comparison between theforecast consumption and the actual one. Hidden communi-cation is spotted if the expected energy footprint deviates toomuch from the real one.

In more details, we define the following quantities at eachsampling time t = 0, 1, . . .. Let pt ∈ R+ be the powerconsumed by a process at time t . Such a quantity is measuredand can be considered as an input. Let wt+1 ∈ R+ be theprediction of the power needed by the process itself at thenext time instant t + 1, i.e., it is an output.

At least in principle, the consumption of a process at timet + 1 may depend on the energy requirements at the previoustime instants, from 0 to t . However, to avoid dealing withvectors of increasing dimension as the time grows, we considera “fading memory” assumption, which consists in assumingthat the output of the system (i.e., the power consumption ofa process at time t + 1) depends only on a finite number q ofpast inputs [49]. In this perspective, we define a regressionvector (also called regressor) as the collection of the pastinput variables from time t − q + 1 up to time t , i.e., p

t �col

(pt−q+1, pt−q+2, . . . , pt

) ∈ Rq , where q is a positive

constant value.Thus, the training set for the creation of the model has the

form of a set of input/output pairs

�N �{

pt , wt+1

}Nt=1

where N is the total number of available measures. The goalis to find a model that is able to capture, at each time t , thefunctional relationship between past and future consumption,i.e., the mapping p

t �→ wt+1. Thus, equation (1) becomes

wt+1 = γ (pt , α)

where wt+1 is the estimated output at time t + 1 andγ is a function belonging to a family � of one-hidden-layer feedforward neural networks or binary decision trees.Notice that, when the regressor is made up by long series ofpast observations, the dimensionality of the problem rapidlyincreases, hence the need of efficient approximators arises.

Once the best model within the class � has been foundby the offline training procedure, the second step of the RBDmethod requires the definition of a detection rule. Specifically,the detection is performed online by feeding at each timestep t the trained model with the past q measurements ofthe consumption collected into the regressor p

t−1. Then, theestimated consumption wt is compared with the real measuredvalue wt in a time window moving over time. The sameprocedure is repeated at the next time steps, and a predictionerror et is defined as follows:

et � 1

τ

t∑

k=t−τ+1

|wk − wk | (2)

where τ is a given time horizon. We assume that a covert com-munication among two colluding applications is present if theprediction error is greater than a certain positive threshold ξ .

CAVIGLIONE et al.: REVEALING MOBILE MALWARE HIDDEN COMMUNICATIONS 805

Otherwise, it is considered absent. The rationale is that theapproximate model, if properly trained, is able to predict thefuture behavior of the energy consumption of the “clean”system with a good level of accuracy. Hence, severe deviationsreveals the presence of two processes covertly exchangingdata.

Clearly, the performance of the RBD depends on the qualityof the approximating model and on the parameters q , τ , and ξ ,which have to be properly tuned. We will observe their impacton the quality of detection in a real scenario in Section VI.Notice that the same model γ can be used to detect covertcommunications with all the covert channels described inSection III since it has been obtained using measurementsobtained in a “clean” system.

C. Classification-Based Detection

The CBD method consists in solving a classification prob-lem starting from a set of measurements both in the presenceand in the absence of colluding applications. It requiresthe definition of a set of “features” representing the powerconsumption of the device in a concise, effective manner.

More specifically, we focus on three different featurescharacterizing the energetic behavior of a process at eachtime t , collected into the vector f

t∈ R

3: (i) the average

power consumption from time t −λ+1 to time t , where λ is apositive constant defining a window of past measurements, (ii)the total variation of the power consumption from time t−λ+1to time t , and (iii) the instantaneous consumption at time t .Thus, we focus on the following vector of features at eachtime t :

ft� col

(t∑

l=t−λ+1

pl,

t∑

l=t−λ+1

|pl |, pt

)∈ R

3 (3)

Each vector ft

refers to a single measurement and isassociated to a certain class k among two possible ones,corresponding to the cases in which covert channels are usedto exchanging data between colluding applications (k = 1)and no covert channels are established (k = 0).

The training set for the creation of the model takes on theform of a set of N input/output pairs

�N �{

ft, gt

}N

t=1

where the scalar output gt is equal to k if the input vector ft

belongs to the class k.The goal is to find a model able to recognize the class

containing a given input vector that is not among the N usedfor the training. As in the case of the RBD method, we relyupon models belonging to a certain family � of one-hidden-layer feedforward neural networks and binary decision trees,i.e., equation (1) becomes:

g j = γ ( fj, α)

where g j is the class assigned to the input vector fj

by themodel. The output g j must be one of the two possible classes.

Clearly, the goal of the classification is to ensure that theassignment of the model is correct, i.e., the difference between

TABLE II

PARAMETERS OF THE RBD AND CBD DETECTION METHODS

g j and g j is as small as possible. To this end, as in the caseof the RBD, a suitable training phase is performed offline tofind the optimal values of the parameter vector α. Once thebest model within the family � has been found, at each time tthe detection whether a covert channel is present is performedonline by feeding the trained model with the vector f

tof the

current features and analyzing the value of the output gt .Differently from the RBD method, where the same approx-

imate model is used to detect covert communications withall the seven techniques introduced in Section III, the CBDapproach requires the training of a different model for eachcovert channel.

We point out that the accuracy of the CBD depends onthe quality of the approximating model and on the parame-ter λ used to construct the vector of features. Section VIwill discuss its impact on the quality of detection in a realscenario. Notice that the CBD requires the tuning of onlythis parameter, whereas the RBD relies upon three parameters.Table II summarizes the main parameters of the two considereddetection methods and their meaning.

VI. NUMERICAL RESULTS

To evaluate the effectiveness of the proposed approach, weconducted experimental trials using two different smartphones,i.e., a Samsung Galaxy SIII and a LG Optimus 4X HD P880.We tested the RBD and CBD methods using Matlab with theNeural Networks and Statistics Toolboxes. All the experimentswere done on a computer with a 2.5 GHz Intel i7 CPU and4 GB of RAM.

A dataset containing the consumption of all the softwarecomponents running on the smartphones was generated usingthe methodology presented in Section IV. In more details,the consumption was measured with a per-process granu-larity, both in a “clean” set-up and when a local covertchannel among two processes is created. Preliminary inves-tigations showcased that the overall power consumption ofthe smartphone is well represented by the energy requiredby processes with OS-wide dynamics. More specifically, theenergy consumption of the process System gave us enoughcondensed information to capture the general trend. Indeed,the System process is the core of all mechanisms requiredto access the low-level drivers of Android. It is used to handlenotifications, display, audio, alarms, and telephony, just fornaming a few. Therefore, for testing our methods, we reliedupon the information provided solely by the process System.

As said, we focused on a scenario in which information-hiding-capable malware increases its stealthiness by actingwhen the device is idle, i.e., there is no user activity involved.As regards the exchanged messages, we set a fixed sizeof 1000 bytes, which is large enough to represent the exfil-

806 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 11, NO. 4, APRIL 2016

Fig. 4. Battery drain for each type of covert channel during 3 hours ofrepeated experiments.

tration of sensitive information such as a bank account or acollection of contacts stored in the address book. Each covertchannel was tested by performing 100 data transmissionsstarting at random time instants. To have a proper statisticalrelevance, each trial was repeated 10 times.

Figure 4 shows the trend of the battery drain for a period of3 hours in the presence of colluding applications transmittingdata with the seven implemented covert channels. Resultsindicate that the most consuming methods are the VolumeSettings, the Memory Load, and the System Load, whereasthe lowest one is the Type of Intent. Even if it is one of themost consuming, the System Load covert channel is also theone with the smallest bitrate.

The performances of the proposed detection methods werequantified by means of the percentage of correct detection ofhidden communications, defined as follows:

d = 1

T

T −1∑

t=0

1 − |yt − yt | (4)

where T is the length of each trial (dependent on the typeof used covert channel) and yt and yt denote the actual andspotted hidden communications at time t , respectively. In moredetails, yt = 1 indicates that two colluding applications arecovertly exchanging data, whereas yt = 0 represents theabsence of hidden communication.

A. RBD MethodA training set composed of 5000 energy consumption

measurements was used to optimize the parameters of neuralnetworks and decision trees for the RBD method. Such sam-ples were obtained in a clean system without hidden com-munication between colluding applications. The training wasperformed using the Levenberg-Marquardt algorithm [47] forneural networks and by minimizing the MSE of the predictionscompared to the training data for decision trees, using theGini’s diversity index as split criterion [50]. As pointed outin Section V-B, the same approximator was used to detect allthe seven implemented covert channels.

In order to determine the best values of the parameters q ,ξ , τ , and ν, a thorough simulation campaign was performed.Concerning neural networks, Figure 5(a) reports the percent-age of correct detection d of each covert channel defined asin equation (4), averaged over 10 different trials, when ν isvaried from 5 to 50 and the other parameters q , ξ , and τ arefixed to 20, 30, and 20, respectively. Figures 5(b) and 5(c)

Fig. 5. Average percentage of correct detection for each covert channel usingthe RBD when varying the parameters of neural networks (a, b, d, and f) anddecision trees (c, e, and g).

show the behavior of the average d obtained with neuralnetworks and decision trees, respectively, when varying thelength q of the regressor from 5 to 30 and with the otherparameters fixed. Figures 5(d) and 5(e) depict the average dusing neural networks and decision trees, respectively, as afunction of the threshold ξ used for the detection rule, whereasthe other parameters are fixed. Lastly, Figures 5(f) and 5(g)showcase the average d obtained with neural networks anddecision trees, respectively, as a function of the length τ ofthe time window used for the detection rule, with the otherparameters fixed.

From the obtained results, it turns out that the percentageof correct detection for neural networks increases with ν up toν = 40, which is then the best number of activation functions.For larger values, the phenomenon of overfitting is experi-

CAVIGLIONE et al.: REVEALING MOBILE MALWARE HIDDEN COMMUNICATIONS 807

Fig. 6. Boxplots of the percentages of correct detection for each covertchannel using the RBD with neural networks (a) and decision trees (b).

Fig. 7. Comparison of real and estimated power consumption of the Systemprocess for the Volume Settings by using the RBD with neural networks (a)and decision trees (b).

enced, i.e., the number of basis functions is too large for theavailable data, and minor fluctuations in the energy measuresmay be overemphasized, thus resulting into bad detection rates.Concerning the length q of the regressor, the best value turnsout to be q = 20. The behavior of neural networks is moreaffected by the chosen value if compared to decision trees,for which all the values of q guarantee almost the sameresults. Instead, neural networks and decision trees exhibit thesame behavior when varying the threshold ξ for the predictionerror (2), for which the best value appears to be ξ = 30. Lastly,the percentage of correct detection grows if the time horizon τused in (2) increases up to τ = 20, and then remains almostconstant. Thus, in the perspective of saving computationaltime, τ = 20 is the best choice.

Figure 6 shows the boxplots of the percentages of correctdetection for each information-hiding technique computedover 10 different trials by using neural networks and decisiontrees with the best values of their parameters, i.e., ν = 40,q = 20, ξ = 30, and τ = 20. We conclude that the perfor-mance of neural networks and decision trees are comparable,i.e., on the average the accuracy of the detection is similar inboth cases. The most easily detectable method appears to bethe System Load covert channel, whereas the method that isthe least detectable is the File Size.

Figure 7 depicts the measured trend of the consumption ofthe System process compared with its estimation providedby neural networks and decision trees when using the VolumeSettings covert channel. The presence or absence of hiddencommunication is denoted by high or low values of the binarysignal at the bottom of each figure. As it can be seen, theprediction of the energy consumption is more accurate when

Fig. 8. Average percentage of correct detection for each covert channelusing the CBD when varying the parameters of neural networks (a and b)and decision trees (c).

no covert communication is active, whereas the predictionis not accurate in the presence of colluding applications.The “bad” prediction when covert channels are present isfundamental to spot hidden communications. More specifi-cally, neural networks underestimate the power consumption,whereas decision trees saturate to a certain value. This is notsurprising since the models have been built using a “clean”system without colluding applications.

B. CBD Method

To test the effectiveness of the CBD method, we used againa training set made up of 5000 energy samples. Differentlyfrom the RBD, the training was performed both when the col-luding applications are active and inactive. Moreover, differentapproximators were trained for each of the seven implementedcovert channels.

Since we had to solve a classification problem, the real-valued output of the neural networks was rounded eitherto 1 or 0 depending on whether hidden communication isspotted or not. Concerning decision trees, we adopted theso-called classification trees, whose output is directly oneof the classes defined during the training. The training ofneural networks was performed again by using theLevenberg-Marquardt algorithm, whereas classification treeswere trained by minimizing the MSE of the predictionscompared with the trained data and using the Gini’s diversityindex as the split criterion.

For the case of neural networks, we varied both the numberof neurons ν and the number of time instants λ for thecomputation of the features as in (3), in order to investigatetheir influence on the accuracy of the detection. Figure 8(a)depicts the percentage of correct detection d of each covertchannel, averaged over 10 different trials, when ν is variedfrom 5 to 30 and λ is fixed to 10. Figure 8(b) presentsthe average d for each information-hiding method when λranges from 5 to 30 and ν equals to 10. It turns out that

808 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 11, NO. 4, APRIL 2016

Fig. 9. Boxplots of the percentages of correct detection for each covertchannel using the CBD with neural networks (a) and decision trees (b).

Fig. 10. Comparison of the true covert channel activity over time against theestimated one for the Volume Settings by using the CBD with neural networks(a) and decision trees (b).

the number of neurons affects the accuracy of the detectiononly marginally. Therefore, to save memory and computationaltime, one might choose ν = 5 or ν = 10. As regards theeffect of λ, the best choices are λ = 5 or λ = 10 since asmall decay of performance is experienced for large values.Concerning decision trees, we investigated the effect of theparameter λ on the accuracy of detection. The results arereported in Figure 8(c) for the average d . Also in this case, λvaries from 5 to 30, and the percentage of detection decreasesif λ increases. Hence, optimal values are again λ = 5 orλ = 10.

Figure 9 depicts the boxplots of the percentages of correctdetection for each covert channel computed over 10 differenttrials using the approximate models with the best values oftheir parameters, i.e., ν = 10 and λ = 10 for neural networksand λ = 10 for decision trees. In general, neural networksguarantee better performances compared with decision trees,i.e., on the average the accuracy of the detection is higher andwith a lower variance. In all cases, the most easily detectablecovert channels are the File Lock and the Volume Settings.Instead, the Memory Load method is the most difficult todetect despite its high consumption. Instead, the System Loadis characterized by the largest dispersion.

Figure 10 portraits the estimated covert channel activitycompared to the real one for the Volume Settings method.As shown, neural networks and decision trees are able tocorrectly spot the channel activity most of the time, thusshowcasing their effectiveness for runtime or static analysispurposes within the security framework of the device.

C. Comparison Between RBD and CBD

To sum up, Table III reports the percentage of correctdetection averaged over 10 trials for all the implemented covert

TABLE III

AVERAGE PERCENTAGES OF CORRECT DETECTION FOR THE DIFFERENTDETECTION METHODS AND COVERT CHANNELS

channels for the RBD and CBD methods. In both cases, theSystem Load and the Volume Settings are the most easilydetectable covert channels. This may be ascribed to the factthat such methods are also the most power-consuming, i.e.,their energy footprint is more evident. In this case, the hiddencommunication is correctly spotted 9 times over 10 on theaverage. The most difficult methods to be detected are the FileSize and the Memory Load. However, even if lower than theone of the best-performing methods, their average percentageof correct detection is about 65% when using the RBD and85% for the CBD, which is quite a satisfactory result. TheMemory Load covert channel seems the most difficult to bespotted. This behavior is due to the absence of code in theSystem process that allocates memory: this task is done athigh level by the Dalvik virtual machine and at low level bythe Linux kernel.

According to the obtained results, in general the CBD out-performs the RBD in terms of percentage of correct detection.Moreover, it is worth noting that the RBD has three parametersto be tuned, i.e., q , τ , and ξ , instead of only one for the CBD,i.e., λ. As a consequence, the implementation of the CBD ina production-quality tool should be preferred, both in terms ofcomplexity and performance.

Concerning the computational effort, the average time forthe training was equal to 82.5 seconds for the RBD withν = 40 and q = 20 and to 15.6 seconds for the CBD withν = 10 and λ = 10 when using neural networks. The trainingtimes of the two detection methods when using decision treeswith q = 20 and λ = 10 were equal to 1.92 and 0.27 seconds,respectively. The higher times of the RBD are mainly dueto the greater dimension of the input vector compared to theCBD. In fact, in the first case the dimension of the input isequal to the length q = 20 of the regressor, whereas in thesecond one it is equal to the number of features, i.e., 3. Thisalso requires a larger number of neurons to obtain satisfactoryapproximations. In general, neural networks appears to bemore computationally demanding compared to decision trees.Nevertheless, the RBD requires the training of only one model,whereas an approximate model for each covert channel isrequired by the CBD, thus resorting to seven different trainingprocedures.

The average time to spot the presence of a covert channelfor the RBD method with the best values of the parameters isequal to 0.01 and 0.001 seconds, depending on whether neuralnetworks or decision trees are used, respectively. As regardsthe CBD approach, such times when using the best values

CAVIGLIONE et al.: REVEALING MOBILE MALWARE HIDDEN COMMUNICATIONS 809

of the parameters are again equal to 0.01 and 0.001 secondsfor neural networks and decision trees, respectively. In allcases, the online computational effort is very small. Thus,the proposed methods appear to be well-suited to beingimplemented in an online detection framework directlyrunning on a mobile device.

VII. CONCLUSIONS AND FUTURE WORKS

In this paper we have presented a framework based onartificial intelligence tools, such as neural networks anddecision trees, to detect the presence of malware usinginformation-hiding techniques exploiting power measure-ments. Specifically, we have focused on the colluding applica-tion scenario, which is characterized by two processes trying tocommunicate outside their sandboxes for malicious purposes,for instance, for sensitive data exfiltration. Two detection meth-ods have been developed, requiring the solution of regressionand classification problems. To verify their effectiveness, wehave implemented seven local covert channels on the Androidplatform, and we have performed an experimental measure-ment and detection campaign. The obtained results indicatethat both methods are characterized by a good detectionperformance and can be used as an accurate IDS softwareon a modern smartphone to reveal the presence of hazardsexploiting information hiding.

Future works aim at making our detection framework moreeffective, for instance by developing proper metrics to recog-nize at runtime the pair of colluding applications. Moreover,part of our ongoing research is devoted to understand ifusing additional information (e.g., activity correlation) couldincrease the accuracy of the approach. At the same time, wealso work to extend the energy-based detection approach toother threats exploiting information hiding. On the overall,such improvements should lead to the development of anapplication directly running on a mobile device to spot thepresence of covert communications in real time.

ACKNOWLEDGMENT

The source code of the covert channels and of the measure-ment framework described in this paper is available online athttp://steganocc.gforge.inria.fr.

REFERENCES

[1] K. Allix, Q. Jerome, T. F. Bissyandé, J. Klein, R. State, and Y. Le Traon,“A forensic analysis of Android malware—How is malware written andhow it could be detected?” in Proc. 38th Annu. Comput. Softw. Appl.Conf., 2014, pp. 384–393.

[2] V. Rastogi, Y. Chen, and X. Jiang, “Catch me if you can: EvaluatingAndroid anti-malware against transformation attacks,” IEEE Trans. Inf.Forensics Security, vol. 9, no. 1, pp. 99–108, Jan. 2014.

[3] J.-F. Lalande and S. Wendzel, “Hiding privacy leaks in Android appli-cations using low-attention raising covert channels,” in Proc. 8th Int.Conf. Availab., Rel. Secur., 2013, pp. 701–710.

[4] W. Mazurczyk and L. Caviglione, “Steganography in modern smart-phones and mitigation techniques,” IEEE Commun. Surveys Tuts.,vol. 17, no. 1, pp. 334–357, First Quarter 2015.

[5] W. Mazurczyk and L. Caviglione, “Information hiding as a challenge formalware detection,” IEEE Security Privacy, vol. 13, no. 2, pp. 89–93,Mar./Apr. 2015.

[6] McAfeeLabs. McAfee Labs Threat Report August 2014. [Online].Available: http://www.mcafee.com/uk/resources/reports/rp-quarterly-threat-q2-2014.pdf, accessed Dec. 18, 2015.

[7] J. Hoffmann, S. Neumann, and T. Holz, “Mobile malware detec-tion based on energy fingerprints—A dead end?” in Research inAttacks, Intrusions, and Defenses. Berlin, Germany: Springer, 2013,pp. 348–368.

[8] L. Zhang et al., “Accurate online power estimation and automaticbattery behavior based power model generation for smartphones,” inProc. IEEE/ACM/IFIP Int. Conf. Hardw./Softw. Codesign Syst. Synthesis,Oct. 2010, pp. 105–114.

[9] L. Caviglione and A. Merlo, “The energy impact of security mechanismsin modern mobile devices,” Netw. Secur., vol. 2012, no. 2, pp. 11–14,2012.

[10] A. Merlo, M. Migliardi, and P. Fontanelli, “On energy-based profilingof malware in Android,” in Proc. Int. Conf. High Performa Comput.Simulation, 2014, pp. 535–542.

[11] C. Marforio, H. Ritzdorf, A. Francillon, and S. Capkun, “Analy-sis of the communication between colluding applications on modernsmartphones,” in Proc. 28th Annu. Comput. Secur. Appl. Conf., 2012,pp. 51–60.

[12] A. Merlo, M. Migliardi, and P. Fontanelli, “Measuring and estimatingpower consumption in Android to support energy-based intrusion detec-tion,” J. Comput. Secur., vol. 23, no. 5, pp. 611–637, 2015.

[13] S. Haykin, Neural Networks: A Comprehensive Foundation.Englewood Cliffs, NJ, USA: Prentice Hall, 1999.

[14] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classificationand Regression Trees. London, U.K.: Chapman & Hall, 1984.

[15] A. Bose, X. Hu, K. G. Shin, and T. Park, “Behavioral detection ofmalware on mobile handsets,” in Proc. 6th Int. Conf. Mobile Syst., Appl.,Services, 2008, pp. 225–238.

[16] A. Shabtai, R. Moskovitch, Y. Elovici, and C. Glezer, “Detectionof malicious code by applying machine learning classifiers on staticfeatures: A state-of-the-art survey,” Inf. Secur. Tech. Rep., vol. 14, no. 1,pp. 16–29, 2009.

[17] A. Shabtai, U. Kanonov, Y. Elovici, C. Glezer, and Y. Weiss, “‘Andro-maly’: A behavioral malware detection framework for Android devices,”J. Intell. Inf. Syst., vol. 38, no. 1, pp. 161–190, 2012.

[18] L. Liu, G. Yan, X. Zhang, and S. Chen, “VirusMeter: Preventing yourcellphone from spies,” in Recent Advances in Intrusion Detection. Berlin,Germany: Springer, 2009, pp. 244–264.

[19] G. A. Jacoby and N. J. Davis, “Battery-based intrusion detection,”in Proc. IEEE Global Telecommun. Conf., vol. 4. Nov./Dec. 2004,pp. 2250–2255.

[20] D. C. Nash, T. L. Martin, D. S. Ha, and M. S. Hsiao, “Towards anintrusion detection system for battery exhaustion attacks on mobilecomputing devices,” in Proc. 3rd IEEE Int. Conf. Pervasive Comput.Commun. Workshops, Mar. 2005, pp. 141–145.

[21] H. Kim, J. Smith, and K. G. Shin, “Detecting energy-greedy anomaliesand mobile malware variants,” in Proc. 6th Int. Conf. Mobile Syst., Appl.,Services, 2008, pp. 239–252.

[22] B. Dixon, Y. Jiang, A. Jaiantilal, and S. Mishra, “Location based poweranalysis to detect malicious code in smartphones,” in Proc. 1st WorkshopSecur. Privacy Smartphones Mobile Devices, 2011, pp. 27–32.

[23] B. Dixon and S. Mishra, “Power based malicious code detectiontechniques for smartphones,” in Proc. 12th IEEE Int. Conf. Trust, Secur.,Privacy Comput. Commun., Jul. 2013, pp. 142–149.

[24] B. Dixon, S. Mishra, and J. Pepin, “Time and location power basedmalicious code detection techniques for smartphones,” in Proc. IEEE13th Int. Symp. Netw. Comput. Appl., Aug. 2014, pp. 261–268.

[25] M. Curti, A. Merlo, M. Migliardi, and S. Schiappacasse, “Towardsenergy-aware intrusion detection systems on mobile devices,” in Proc.Int. Conf. High Perform. Comput. Simulation, 2013, pp. 289–296.

[26] T. K. Buennemeyer, G. A. Jacoby, W. G. Chiang, R. C. Marchany, andJ. G. Tront, “Battery-sensing intrusion protection system,” in Proc. IEEEInf. Assurance Workshop, Jun. 2006, pp. 176–183.

[27] T. K. Buennemeyer, T. M. Nelson, M. A. Gora, R. C. Marchany, andJ. G. Tront, “Battery polling and trace determination for bluetoothattack detection in mobile devices,” in Proc. IEEE Inf. Assurance Secur.Workshop, Jun. 2007, pp. 135–142.

[28] L. Caviglione, A. Merlo, and M. Migliardi, “What is greensecurity?” in Proc. 7th Int. Conf. Inf. Assurance Secur., 2011,pp. 366–371.

[29] F.-E. Kioupakis and E. Serrelis, “Preparing for malware that uses covertcommunication channels: The case of Tor-based Android malware,” inProc. Int. Conf. Inf. Secur. Digital Forensics, 2014, pp. 85–96.

[30] P. Faruki, V. Ganmoor, V. Laxmi, M. S. Gaur, and A. Bharmal,“AndroSimilar: Robust statistical feature signature for Android malwaredetection,” in Proc. 6th Int. Conf. Secur. Inf. Netw., 2013, pp. 152–159.

810 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 11, NO. 4, APRIL 2016

[31] R. Andriatsimandefitra and V. V. T. Tong, “Detection and identificationof Android malware based on information flow monitoring,” in Proc.2nd Int. Conf. Cyber Secur. Cloud Comput., 2015, pp. 1–4.

[32] R. Schlegel, K. Zhang, X. Zhou, M. Intwala, A. Kapadia, andX. Wang, “Soundcomber: A stealthy and context-aware sound trojanfor smartphones,” in Proc. NDSS, vol. 11. 2011, pp. 17–33.

[33] A. Merlo, M. Migliardi, and L. Caviglione, “A survey on energy-awaresecurity mechanisms,” Pervasive Mobile Comput., vol. 24, pp. 77–90,Dec. 2015.

[34] A. Armando, A. Merlo, M. Migliardi, and L. Verderame, “Breakingand fixing the Android launching flow,” Comput. Secur., vol. 39,pp. 104–115, Nov. 2013.

[35] A. Armando, A. Merlo, and L. Verderame, “An empirical evaluation ofthe Android security framework,” in Security and Privacy Protectionin Information Processing Systems (IFIP Advances in Information andCommunication Technology), vol. 405, L. J. Janczewski, H. B. Wolfe,and S. Shenoi, Eds. Berlin, Germany: Springer, 2013, pp. 176–189.

[36] S. Lee, W. Jung, Y. Chon, and H. Cha, “EnTrack: A system facility foranalyzing energy consumption of Android system services,” in Proc. Int.Joint Conf. Pervasive Ubiquitous Comput., 2015, pp. 191–202.

[37] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of StatisticalLearning, 2nd ed. New York, NY, USA: Springer, 2009.

[38] R. Zoppoli, M. Sanguineti, and T. Parisini, “Approximating networksand extended Ritz method for the solution of functional optimizationproblems,” J. Optim. Theory Appl., vol. 112, no. 2, pp. 403–440, 2002.

[39] M. Gaggero, G. Gnecco, and M. Sanguineti, “Dynamic programmingand value-function approximation in sequential decision problems: Erroranalysis and numerical results,” J. Optim. Theory Appl., vol. 156, no. 2,pp. 380–416, 2013.

[40] M. Gaggero, G. Gnecco, and M. Sanguineti, “Approximate dynamicprogramming for stochastic N -stage optimization with application tooptimal consumption under uncertainty,” Comput. Optim. Appl., vol. 58,no. 1, pp. 31–85, 2014.

[41] L. Caviglione, “Enabling cooperation of consumer devices throughpeer-to-peer overlays,” IEEE Trans. Consum. Electron., vol. 55, no. 2,pp. 414–421, May 2009.

[42] A. Alessandri, C. Cervellera, and M. Gaggero, “Predictive control ofcontainer flows in maritime intermodal terminals,” IEEE Trans. ControlSyst. Technol., vol. 21, no. 4, pp. 1423–1431, Jul. 2013.

[43] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforwardnetworks are universal approximators,” Neural Netw., vol. 2, no. 5,pp. 359–366, 1989.

[44] A. R. Barron, “Universal approximation bounds for superpositionsof a sigmoidal function,” IEEE Trans. Inf. Theory, vol. 39, no. 3,pp. 930–945, May 1993.

[45] R. A. Berk, Statistical Learning From a Regression Perspective.New York, NY, USA: Springer-Verlag, 2008.

[46] C. Cervellera, M. Gaggero, and D. Macciò, “An analysis based onF-discrepancy for sampling in regression tree learning,” in Proc. Int.Joint Conf. Neural Netw., 2014, pp. 1115–1121.

[47] D. W. Marquardt, “An algorithm for least-squares estimation of nonlinearparameters,” J. Soc. Ind. Appl. Math., vol. 11, no. 2, pp. 431–441, 1963.

[48] P. J. Werbos, The Roots of Backpropagation: From Ordered Derivativesto Neural Networks and Political Forecasting. New York, NY, USA:Wiley, 1994.

[49] A. Sen and M. Srivastava, Regression Analysis—Theory, Methods, andApplications. New York, NY, USA: Springer-Verlag, 2011.

[50] T. M. Cover and J. A. Thomas, Elements of Information Theory.New York, NY, USA: Wiley, 1991.

Luca Caviglione received the Ph.D. degree inelectronics and computer engineering from theUniversity of Genoa, Genoa, Italy. He has beeninvolved in research projects funded by ESA, EU,and MIUR. He is currently a Research Scientistwith the Institute of Intelligent Systems for Automa-tion, National Research Council of Italy, Genoa.He is a Work Group Leader of the Italian IPv6Task Force, a Contract Professor, and a Profes-sional Engineer. He has authored or coauthored over90 academic publications, and several patents. His

current research interests include P2P systems, wireless communications,cloud architectures, and network security. He is involved in the TechnicalProgram Committee of many international conferences and regularly servesas a Reviewer for the major international journals. Since 2011, he has beenan Associate Editor of the Transactions on Emerging TelecommunicationsTechnologies (Wiley).

Mauro Gaggero received the B.Sc. andM.Sc. degrees in electronics engineering andthe Ph.D. degree in mathematical engineeringfrom the University of Genoa, Genoa, Italy,in 2003, 2005, and 2010, respectively. He was aPostdoctoral Fellow with the Faculty of Engineering,University of Genoa, in 2010. Since 2011,he has been a Research Scientist with the Instituteof Intelligent Systems for Automation, NationalResearch Council of Italy, Genoa. His currentresearch interests include control and optimization

of nonlinear systems, distributed parameter systems, neural networks, andlearning from data. He is an Associate Editor of the European ControlAssociation Conference Editorial Board and the IEEE Control SystemsSociety Conference Editorial Board.

Jean-François Lalande received the Ph.D. degreein computer science from INRIA, Sophia-Antipolis,France, in 2004, within the Mascotte Project(CNRS/INRIA/UNSA). He is currently an Asso-ciate Professor with the INSA Centre Val de Loire,in the Laboratoire d’Informatique de l’Universitéd’Orléans. He is also temporarily associated withINRIA in the CIDRE team. During his Ph.D. studies,his research interests focused on the combinatorialoptimization for optical and satellite networks. Since2005, he has been been involved in the security of

operating systems, C-embedded software (including smart cards), and Androidapplications. His current research interests include mobile software security.He actively participates in the release of open-source software in order tomake security experiments reproducible.

Wojciech Mazurczyk (SM’13) received theB.Sc., M.Sc., Ph.D. (Hons.), and D.Sc. (Habilitation)degrees in telecommunications from the WarsawUniversity of Technology (WUT), Warsaw, Poland,in 2003, 2004, 2009, and 2014, respectively.He is currently an Associate Professor with theInstitute of Telecommunications, WUT, where heis the Head of the Bio-Inspired Security ResearchGroup (bsrg.tele.pw.edu.pl). His research interestsinclude bioinspired cybersecurity and networking,information hiding, and network security. He is

involved in the technical program committee of many internationalconferences, including the IEEE INFOCOM, the IEEE GLOBECOM, theIEEE ICC, and ACSAC. He also serves as a reviewer for major internationalmagazines and journals. Since 2013, he has been an Associate TechnicalEditor of the IEEE Communications Magazine (IEEE Comsoc).

Marcin Urbanski received the B.Eng. degree incomputer science from the Warsaw University ofTechnology, Warsaw, Poland, in 2015. He is cur-rently with the Norwegian University of Scienceand Technology, Trondheim, Norway, where he isimplementing applications for interactive lectures.His research interests include steganography andmobile software security.