IoTPredict: Collaborative QoS Prediction in IoT

Gary White, Andrei Palade, Christian Cabrera, Siobhán Clarke

School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland

whiteg5, paladea, cabrerac, [email protected]

Abstract: Internet of Things (IoT) applications can be built from a number of heterogeneous services provided by a range of devices, which are potentially resource constrained and/or mobile. As these services and applications continue to be more widespread, a key research question is how to predict user-side quality of service (QoS), to ensure the optimal selection, composition and adaptation of IoT services. The exponential growth in the number of these services means that it is not practical to invoke all candidate services to test their QoS, especially during runtime service adaptation. QoS can vary by time and location, which makes it difficult for service providers to give accurate estimates of how the service will perform for users located in changing network topologies. We propose IoTPredict, a novel neighbourhood-based prediction approach for the IoT, which uses an alternative similarity computation mechanism. Our collaborative approach requires no additional invocation of services, which is a key requirement for resource constrained devices in the IoT. We evaluate our algorithm on a QoS dataset and show that it achieves higher QoS prediction accuracy than other state of the art approaches.

I. INTRODUCTION

The Internet of Things (IoT) has developed from recent advancements in Internet technologies, Wireless Sensor Networks (WSN) and a new generation of cheaper and smaller wireless devices. The number of devices connected to the IoT is predicted to grow at an exponential rate, with latest forecasts predicting that there will be between 26 and 50 billion connected devices by 2020 [1], [2]. The management of these devices is recognised as one of the most important areas of future technology and has gained wide attention from research institutes and industry in a number of different domains including transportation, healthcare, industrial automation and emergency response [3], [4].

The devices that provide software services in these domains are usually resource-constrained and/or mobile, which can lead to the QoS degrading or the service not responding [5]. A middleware can be used to handle these challenges, choosing the optimal service at design time and conducting a dynamic service adaptation if the QoS begins to degrade [6]. Applications can have a range of QoS requirements based on the domain and sensitivity of the information. With additional services available from a variety of devices, many providers offer services that provide equivalent functionalities. Such redundant services can be utilised for service adaptation by replacing the current working services with a candidate service in response to unexpected QoS changes (e.g. unacceptable response time). To achieve this, knowledge about QoS values of the services is required to make timely and accurate adaptation decisions, such as when to trigger an adaptation action, which working services to replace and which candidate service to choose. For applications in critical domains such as healthcare and emergency response, it is especially important for a middleware to react to degrading service quality and handle runtime service adaptation by selecting suitable replacement services.

Selecting the best IoT service for a user is a difficult task, as the QoS of a service takes into account a number of factors, such as response time, location and cost, which may be influenced by user preference [7]. There are also a number of QoS factors that are difficult to quantify, such as the security and functional stability of a service [8]. Quantifiable QoS factors, such as response time and throughput, are not stable and vary by time and location of invocation [9]. Traditional approaches have focused on time-series analysis to make predictions of future QoS values [10], [11], [12]. These approaches rely on the user executing the service to generate the values used for time-series prediction, but they make no estimates for uninvoked services. This is a problem in the IoT, as there will be a large number of candidate services and it would be too time consuming to invoke even a subset of the functionally equivalent services. As the services may be charged, invoking them can also increase the cost to the service user. The invocation of services just to obtain their QoS values is a waste of network resources, especially in a resource-constrained environment.

An alternative approach to predicting QoS is inspired by recommender systems, where the QoS information of similar users is used to make predictions about the QoS of possible services, using collaborative filtering [13], [14]. These collaborative filtering approaches use matrix factorisation (MF) to allow the user to receive QoS predictions for services that they have not invoked, based on QoS values from similar users. Using the QoS from similar users gives more information about candidate services, which can be chosen either at design time or during runtime service execution. These approaches have traditionally been used only in cloud based services, but recent results have shown that they can be used in an IoT environment, although they require further changes to improve accuracy [15], [16]. They have the additional benefit that they can gather QoS information about possible candidate services in the service composition without harming the performance of the network, by using predicted values based on other users rather than invoking the services directly. They are also able to scale well through the use of distributed gradient descent, which makes them suitable for handling the large number of users and services that we would expect in an IoT environment [17].

In this paper we outline a collaborative framework for the IoT to support collaborative QoS prediction and show how it can be used in a small-scale example. We also propose a novel algorithm, which uses an alternative similarity comparison to reduce the error in the predicted QoS values compared to state of the art approaches. The remainder of the paper is organised as follows: Section II describes the collaborative framework in IoT and a middleware, which can implement this framework. Section III shows how this framework can be used in the IoT to predict QoS and introduces the alternative similarity computation method. Section IV describes the experiments that were conducted and Section V presents the results of these experiments. Section VI outlines the related work and Section VII concludes and presents future work.

Fig. 1: Small-Scale Demonstration. [Diagram labels: User Request; Gateways GW1 to GW6 (192.168.2.1 to 192.168.2.6); Guidepost; Service Providers WS1: Best Route Service, WS2: Flood Prediction Service, WS3: Flood Prediction Service, WS3: Air Pollution Service, WSN1: Water Level Service, ASP1: Noise Service, ASP2: Possible Routes Service, ASP3: Best Route Service, ASP4: Air Pollution Service.]

II. COLLABORATIVE FRAMEWORK IN IOT

Figure 1 shows a small-scale IoT scenario, with services provided from different service types including web services (WS), residing on resource rich devices, wireless sensor networks (WSN), which may be resource-constrained and controlled by a software defined network, and autonomous service providers (ASP), who are independent mobile users in the environment with intermittent availability. Even in a small-scale scenario such as this, traditional time-series approaches can cause a number of additional service requests for each user, which can flood the network [18]. A collaborative framework allows users to share local QoS usage experiences and combine this information to get a global view of the QoS of the services in the environment. Each user keeps a record of their QoS experiences in the service registry. If a service consumer would like to get personalised QoS predictions for alternative services, they can allow their QoS records to be used for other users' predictions, which encourages collaboration. We can then use our algorithm to make predictions of QoS values for new users based on local information from similar users and forward the recommended services to the service composition and execution engine.

Fig. 2: Middleware Architecture. [Diagram labels: User Request; Request Handler; Service Registration Engine; Service Discovery Engine; QoS Monitor with Monitoring Engine and Prediction Engine; Service Composition & Execution Engine; Service Registry; Service Interface; Asynchronous Request; Publish/Subscribe MQTT; Web Service Provider; Autonomous Service Provider; WSN.]

The services provided by devices in the environment can be used in a number of applications in a variety of different domains. One way to categorise these applications is by their QoS requirements, which are based on the sensitivity and criticality of the application. IoT applications can typically be categorised as best effort (no QoS), differentiated services (soft QoS) and guaranteed services (hard QoS) [8]. In the hard QoS case, there are strict hard real-time QoS guarantees. This would be for safety critical applications such as monitoring patients in a hospital or collision avoidance in a self-driving car system. Soft QoS does not require hard real-time guarantees but needs to be able to reconfigure and replace services that fail. This could be a routing application, which uses air quality, flooding and noise information to provide the best route through the city. If one of the services is about to fail, the middleware should re-compose the application using suitable replacement services. The final case is best effort, where there are no guarantees when a service fails. The collaborative framework that we define can provide soft QoS guarantees by using a collaborative filtering process to select suitable replacement services based on the QoS experiences of other users in the environment.

Figure 2 shows our middleware architecture, including the QoS monitor used to collect QoS information from a number of different user requests. The middleware is implemented in the gateways to register, compose, monitor and execute the services in the environment. The services have a range of different QoS properties including response time, location and energy consumption. The communication links for invoking these services are diverse, which will affect the personal QoS experience of the users, so they are monitored using the monitoring engine. These devices are resource constrained and mobile, which adds additional complexity compared to traditional web services that are static and more reliable. The monitored results are used by the prediction engine to provide personalised QoS predictions to service consumers, through our prediction algorithm.

The main components of the middleware are the Request Handler (RH), the Service Registration Engine (SRE), the Service Discovery Engine (SDE), the QoS Monitor and the Service Composition & Execution Engine (SCEE). The RH establishes a request/response communication channel with the user and forwards the request to the other middleware components. The SRE registers the available services in the environment. The SDE uses the backward-planning algorithm to identify the concrete services that can be used to satisfy the request and sends this list of services to the SCEE. The QoS monitor is used to monitor these services and, using the prediction engine, can predict possible candidate services to switch to if one of the services begins to degrade. The prediction engine is the main focus of this paper and the design is discussed in detail in Section III. The SCEE uses these services and the predicted QoS values to create a response for the request.

The SCEE is responsible for the composition and execution of services discovered by the SDE. Figure 3 illustrates a list of available services in the environment identified to satisfy a user request, which was received from the RH. The SCEE creates a list of service flows based on the concrete service providers received from the SDE. The flows are then merged based on the service description. If two or more services in the flow have the same input, the SCEE creates a guidepost to enable the invocation of one of these services based on QoS requirements. As some of the QoS values can be missing from the registries, the goal of the collaborative framework is to make predictions for the missing values.

An execution guidepost $G = \langle R_{id}, D \rangle$ is a split-choice control element of the composition process and maintains a set of execution directions D for a composition request $R_{id}$. These execution directions will be referred to as branches. Each element in the set D is defined as $d_j = \langle id, w, q \rangle$ where $j \le |D|$. The set w represents the services in the branch and id represents the identifier of the branch. The value q reflects the branch's aggregated QoS values [19], which can be calculated according to predefined formulas [20]. The branch that maximises/minimises an objective function will be selected by the guidepost during execution. This objective function can contain a number of different non-functional QoS factors that can be specified by the user in the service request.

As an example we consider the response time for each branch. The formula in Equation 1 calculates the response time by aggregating the response time value of each component service in a sequential flow [20]. In this formula, $rt_i$ is the response time of service i. However, it is possible that this value could not be calculated because of a missing QoS value from an individual service candidate in the flow, or the value being out-of-date.

$$\text{Response Time } (RT) = \sum_{i=1}^{n} rt_i \tag{1}$$

Fig. 3: Service Composition Flows. (a) Service flow without QoS Prediction: Branch 1 (WS2 + WS1) = 0.68s, Branch 2 (WSN1 + WS1) = ?, Branch 3 (ASP1 + ASP3) = ?. (b) Service flow with QoS Prediction: Branch 1 (WS2 + WS1) = 0.68s, Branch 2 (WSN1 + WS1) = 0.56s, Branch 3 (ASP1 + ASP3) = 1.43s.

To address this problem, the QoS monitor uses QoS prediction to predict QoS values across each branch stored in the guidepost. Figure 3 shows the flows created by the SCEE for User 4 (U4) in Figure 4. The response time values recorded during the service discovery phase were 0.34s for service provider WS2, 0.34s for WS1 and 0.23s for ASP1. The response time values for WSN1 and ASP3 were not recorded. In Figure 3a, when the execution reaches Guidepost G, we can only aggregate the response time values for Branch 1, which is not optimal. If the composition selects the branch with the lowest reported response time, it will select Branch 3, which is also not optimal. Only when we use the predicted values for the missing service QoS does the composition choose the optimal Branch 2, which can be seen in Figure 3b.
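The branch selection described above can be illustrated with a minimal Python sketch. It applies Eq. 1 to the three branches of Figure 3, using the recorded values from the text and the predicted values read from Figure 3b (0.22s for WSN1 and 1.20s for ASP3). The function names and the dictionary layout are assumptions made for illustration and are not part of the middleware.

```python
# Response times (seconds) recorded during service discovery; None marks a missing value.
observed = {"WS1": 0.34, "WS2": 0.34, "ASP1": 0.23, "WSN1": None, "ASP3": None}
# Predicted values for the missing entries, as read from Fig. 3b.
predicted = {"WSN1": 0.22, "ASP3": 1.20}

# Branch id -> services invoked sequentially in that branch.
branches = {1: ["WS2", "WS1"], 2: ["WSN1", "WS1"], 3: ["ASP1", "ASP3"]}


def branch_response_time(services, qos):
    """Eq. 1: sum the response times of a sequential flow; None if any value is missing."""
    values = [qos.get(name) for name in services]
    if any(value is None for value in values):
        return None
    return sum(values)


def select_branch(branches, qos):
    """Return the branch with the lowest fully known response time, plus all totals."""
    totals = {bid: branch_response_time(svcs, qos) for bid, svcs in branches.items()}
    known = {bid: total for bid, total in totals.items() if total is not None}
    return min(known, key=known.get), totals


# Without prediction only Branch 1 can be aggregated.
without_prediction, totals_a = select_branch(branches, observed)

# With prediction every branch has a total and the lowest one can be chosen.
with_prediction, totals_b = select_branch(branches, {**observed, **predicted})
```

In this sketch the guidepost falls back to the only complete branch when values are missing, which mirrors the situation in Figure 3a; once the predicted values are merged in, all three totals are comparable.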

This example shows that we have been able to make a number of predictions for services without directly invoking them, which reduces the load on the network. The effectiveness of the example relies on accurate QoS predictions to ensure that the correct branch is selected. Recent studies have shown some accuracy issues with current state of the art collaborative filtering algorithms for reliable service composition, with some instances where the composition was worse using predicted values for the flows [16]. This motivated the need to improve the accuracy of current state of the art approaches.

III. COLLABORATIVE QOS PREDICTION

A. Small-Scale Demonstration

To provide QoS values on m IoT services for n users, at least n × m service invocations are needed. This is almost impossible in an IoT environment where we expect a large number of services and users. Without this QoS information, the service composition and execution engine cannot select the optimal components based on their QoS and must make a choice based on whatever QoS information is available. This leads to choosing potentially non-optimal services, which can cause service degradation at runtime and execution errors.

The QoS value of IoT service s observed by user u can be predicted by exploring the QoS experiences of users similar to u. A user is similar to u if they share similar characteristics, which can be extracted from their QoS experiences with different services. By sharing local QoS experience among users, these approaches can predict the QoS value of a range of IoT services including ASPs, web services and WSNs, even if the user u has never invoked the service s before. Figure 1 can be modelled as a bipartite graph $G = (U \cup S, E)$, such that each edge in E connects a vertex in U to a vertex in S. Let $U = \{u_1, u_2, \ldots, u_4\}$ be the set of component users, $S = \{ASP1, ASP2, \ldots, WSN2\}$ denote the set of IoT services and E (solid lines) represent the set of invocations between U and S. Given a pair (i, j), with $u_i \in U$ and $c_j \in S$, edge $e_{ij}$ corresponds to the QoS value of that invocation. Given the set E, the task is to predict the weight of potential invocations (broken lines).

We visualise the process of matrix factorisation for the flows generated in Figure 3 in Figure 4b, where each table entry shows an observed weight from Figure 4a. The task can be framed as a matrix completion problem, where we want to fill in the remaining values to obtain a fully completed matrix, as shown in Figure 4c. There are a number of steps to make these predictions, which are discussed in detail in the following sections. In the remainder of this section we present the overall framework used to generate the predicted values at a higher level, based on a specific example.

One of the problems that emerges due to the large number of services in the IoT is that there are a number of functionally similar services for users to choose from, which reduces the likelihood of users having commonly invoked services. In the small-scale example in Figure 4b, with half the values reported, there are few services common to users. For example, U1 has only one service in common with U2 and U3 and two services in common with U4. The values come from the public dataset released by Zheng et al. [9], which consists of a matrix of the response time and throughput of 339 users for 5,825 web services. As this dataset is for web services, we use the HetHetNets traffic model to add heterogeneous IoT traffic data to the existing dataset [21], which is described in more detail in Section IV-A. The original dataset only has the user-service matrix as features, which makes it difficult to calculate the similarity between users with few commonly invoked services.

The first step in the algorithm, discussed in more detail in Section III-B, is to factorise the sparse user-service matrix and then use $V^T H$ to approximate the original matrix, where the low-dimensional matrix V denotes the user latent feature space and the low-dimensional matrix H represents the service latent feature space, using the latent factor model [22]. The additional features in the latent feature space represent the underlying structure in the data, computed from the original dataset using matrix factorisation. As these matrices are dense, they allow for similarity computation between all the users and services in the matrix, which solves the original problem of having sparse matrices.

Once we have access to the low-dimensional dense matrices, we can then compute the similarity between different users and services using their latent features (Section III-C). Traditional approaches for QoS prediction have used the Pearson Correlation Coefficient (PCC) to calculate the similarity between users and services. However, after conducting experiments on the IoT data, we found that a number of the assumptions that PCC makes are not satisfied, such as having no outliers and the variables being approximately normally distributed. We propose an alternative non-parametric similarity computation mechanism that does not make these assumptions, called Kendall's tau.

Once we have computed the similarity between users and services, we can then make predictions for the missing values by using the Top-K largest tau values for the users and services (Section III-D). The predictions are weighted by the similarity of the neighbouring users and services, and the user-based and service-based predictions are combined with equal weighting.

B. Latent Features Learning

To learn the latent features of the user-service matrix we employ matrix factorisation, which fits a model to the user-service matrix from the original dataset. The original QoS matrix is factorised into two low-rank matrices V and H. The QoS usage experience of a user is typically determined by a small number of factors such as the network load, location of invocation and provider resources. The latent feature model ensures an accurate and low-dimensional representation of the original matrix.

Let Ω be the set of all pairs (i, j) and Λ be the set of all known pairs (i, j) in Ω. Consider the matrix $W \in \mathbb{R}^{m \times n}$ consisting of m users and n services. Let $V \in \mathbb{R}^{l \times m}$ and $H \in \mathbb{R}^{l \times n}$ be the latent user and service feature matrices. Each column in V represents the l-dimensional user-specific latent feature vector of a user and each column in H represents the l-dimensional service-specific latent feature vector of a service. We employ an approximating matrix $\hat{W} = V^T H$ to learn the user-service relationship W [23]:

$$w_{ij} \approx \hat{w}_{ij} = \sum_{k=1}^{l} v_{ki} h_{kj} \tag{2}$$

Fig. 4: Demonstration of QoS Prediction in IoT. (a) Invocation Graph between users U1 to U4 and services ASP1, ASP2, ASP3, WS1, WS2, WSN1 and WSN2; (b) User-service Matrix; (c) Predicted Matrix.

To learn matrices V and H from the obtained QoS values in the original matrix W, we need to construct a cost function to evaluate the accuracy of the approximation. We use the standard Euclidean distance between the two matrices as the cost function.

$$F(W, \hat{W}) = \|W - \hat{W}\|_F^2 = \sum_{ij} (w_{ij} - \hat{w}_{ij})^2 \tag{3}$$

where $\|\cdot\|_F$ denotes the Frobenius norm.

The optimisation problem can then be solved by using the optimisation objective function in [23]:

$$\min_{V,H} f(V,H) = \sum_{(i,j)\in\Lambda} \left[\hat{w}_{ij} - w_{ij}\log\hat{w}_{ij}\right], \quad \text{s.t. } \hat{w}_{ij} = \sum_{k=1}^{l} v_{ki}h_{kj},\ V \ge 0,\ H \ge 0. \tag{4}$$

The objective function in Eq. 4 can then be minimised using incremental gradient descent to find a local minimum, where one gradient descent step aims to decrease the prediction error term of only one rating, that is $\hat{w}_{ij} - w_{ij}\log\hat{w}_{ij}$. We update V and H in the direction opposite to the gradient in each iteration [13].
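A minimal NumPy sketch of this latent feature learning step is given below. It performs incremental gradient descent on the per-entry term of Eq. 4 and keeps V and H non-negative by clipping after each step. The clipping, the learning rate, the number of epochs, the random initialisation and the small example matrix are all assumptions made for illustration; the update rules used in [13] and [23] may differ.

```python
import numpy as np


def factorise(W, mask, l=2, lr=0.01, epochs=200, seed=0):
    """Learn non-negative V (l x m) and H (l x n) so that V.T @ H approximates W
    on the observed entries, using one gradient step per observed entry."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    V = rng.uniform(0.1, 1.0, size=(l, m))
    H = rng.uniform(0.1, 1.0, size=(l, n))
    known = np.argwhere(mask)  # index pairs (i, j) of observed entries
    eps = 1e-9                 # lower bound that keeps the factors positive
    for _ in range(epochs):
        rng.shuffle(known)     # visit observed entries in random order
        for i, j in known:
            w_hat = max(float(V[:, i] @ H[:, j]), eps)
            # Derivative of (w_hat - w * log(w_hat)) with respect to w_hat.
            grad = 1.0 - W[i, j] / w_hat
            v_old = V[:, i].copy()
            V[:, i] = np.maximum(V[:, i] - lr * grad * H[:, j], eps)
            H[:, j] = np.maximum(H[:, j] - lr * grad * v_old, eps)
    return V, H


# Hypothetical 4 x 5 response-time matrix (seconds); np.nan marks uninvoked services.
W = np.array([
    [0.90, np.nan, 0.43, np.nan, 0.23],
    [np.nan, 0.33, np.nan, 0.34, 0.23],
    [0.30, 0.37, np.nan, 0.12, np.nan],
    [0.23, np.nan, 0.34, np.nan, 0.24],
])
mask = ~np.isnan(W)
V, H = factorise(W, mask)
W_hat = V.T @ H  # dense approximation, including the previously missing entries
```

The dense factors V and H are what the similarity computation in the next subsection operates on; the completed matrix W_hat is shown here only to make the approximation in Eq. 2 concrete.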

C. Similarity Computation

The similarities of the latent features for different users and services in matrices V and H can be calculated using a correlation coefficient. The Pearson Correlation Coefficient (PCC) is widely used for correlation computation in collaborative filtering [13], [24], [25]. The correlation between users $u_i$ and $u_j$ is defined by performing PCC computation on their l-dimensional latent feature vectors $V_i$ and $V_j$ with the following equation [24]:

$$S(u_i, u_j) = \frac{\sum_{k=1}^{l}(v_{ik} - \bar{v}_i)(v_{jk} - \bar{v}_j)}{\sqrt{\sum_{k=1}^{l}(v_{ik} - \bar{v}_i)^2}\,\sqrt{\sum_{k=1}^{l}(v_{jk} - \bar{v}_j)^2}} \tag{5}$$

where $v_i = (v_{i1}, v_{i2}, \ldots, v_{il})$ is the latent feature vector of user $u_i$, $v_{ik}$ is the weight on the kth feature and $\bar{v}_i$ is the average weight on the l-dimensional latent features for user $u_i$. The similarity between two users $S(u_i, u_j)$ falls into the interval [-1, 1], where a larger value indicates higher similarity.

Fig. 5: Latent Feature Histogram. (a) User Latent Feature Histogram; (b) Service Latent Feature Histogram. Both panels plot Count against Latent Feature Value.

PCC is also employed to compute the similarity between service $s_i$ and service $s_j$ as follows:

$$S(s_i, s_j) = \frac{\sum_{k=1}^{l}(h_{ik} - \bar{h}_i)(h_{jk} - \bar{h}_j)}{\sqrt{\sum_{k=1}^{l}(h_{ik} - \bar{h}_i)^2}\,\sqrt{\sum_{k=1}^{l}(h_{jk} - \bar{h}_j)^2}} \tag{6}$$

where $h_i = (h_{i1}, h_{i2}, \ldots, h_{il})$ is the latent feature vector of service $s_i$, $h_{ik}$ is the weight on the kth feature and $\bar{h}_i$ is the average weight on the l-dimensional latent features for service $s_i$.

PCC makes a number of assumptions that must be taken into consideration [26]. The variables that are being correlated must be approximately normally distributed. However, as can be seen in Figure 5, which shows the histogram of latent feature values for the users and services after conducting matrix factorisation, the distribution is clearly not normal. PCC also assumes that outliers are kept to a minimum or removed entirely. Figure 6 is a scatter plot that shows the correlation between two users and between two services. The figure shows that there can be a number of outliers in the latent features, which affects PCC as it assumes a linear relationship.
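The sensitivity of PCC to outliers can be seen in a small Python sketch of Eq. 5. The latent feature vectors below are hypothetical values chosen for illustration, and the function name is an assumption.

```python
import math


def pcc(x, y):
    """Pearson Correlation Coefficient between two equal-length vectors (Eq. 5 / Eq. 6)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = (math.sqrt(sum((a - mean_x) ** 2 for a in x))
                   * math.sqrt(sum((b - mean_y) ** 2 for b in y)))
    return numerator / denominator


# Hypothetical latent feature vectors for three users.
v_a = [0.1, 0.2, 0.3, 0.4, 0.5]
v_b = [0.2, 0.4, 0.6, 0.8, 1.0]    # perfectly linear in v_a
v_c = [0.2, 0.4, 0.6, 0.8, -20.0]  # same as v_b except for one outlying feature

similarity_clean = pcc(v_a, v_b)
similarity_outlier = pcc(v_a, v_c)
```

With these values a single outlying feature is enough to turn a perfect positive correlation into a negative one, even though four of the five features still increase together.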

As it is difficult to justify removing the latent outliers, we can use an alternative approach, in particular a non-parametric correlation coefficient such as Kendall's tau, which is much less sensitive to outliers [27]. Kendall's tau measures the degree of monotonic relationship between variables and calculates the dependence between ranked variables, which makes it feasible for non-normally distributed data. The similarity between two users $u_i$ and $u_j$ is defined by performing Kendall's tau on their l-dimensional latent feature vectors $V_i$ and $V_j$. Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of observations from $V_i$ and $V_j$. Any pair of observations $(x_i, y_i)$ and $(x_j, y_j)$ is said to be concordant if the ranks for both elements agree, i.e. if both $x_i < x_j$ and $y_i < y_j$, or if both $x_i > x_j$ and $y_i > y_j$. The pair is said to be discordant if $x_i > x_j$ and $y_i < y_j$, or if $x_i < x_j$ and $y_i > y_j$. If $x_i = x_j$ or $y_i = y_j$, the pair is a tie [27].

Fig. 6: Latent Feature Correlation. (a) User Latent Feature (User 1 against User 5); (b) Service Latent Feature (Service 1 against Service 5).

$$\tau = \frac{c - d}{c + d} = \frac{S}{\binom{n}{2}} = \frac{2S}{n(n-1)} \tag{7}$$

where
c = the number of concordant pairs,
d = the number of discordant pairs,
and S = c − d.

If ties are present among the two ranked variables, the following equation shall be used instead:

$$\tau = \frac{S}{\sqrt{n(n-1)/2 - T}\,\sqrt{n(n-1)/2 - U}} \tag{8}$$

$$T = \sum_{t} t(t-1)/2 \tag{9}$$

$$U = \sum_{u} u(u-1)/2 \tag{10}$$

where
t = number of observations of variable x that are tied,
u = number of observations of variable y that are tied.

The same formula can also be used to compute the similarity between the services through the use of concordant and discordant pairs.
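The pair counting behind Eq. 7 to Eq. 10 can be sketched in plain Python as follows. The sketch enumerates all pairs directly, which is quadratic in the vector length and is intended only as an illustration; the function name and the example vectors are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau with the tie correction of Eq. 8 (reduces to Eq. 7 without ties)."""
    n = len(x)
    s = 0  # S = concordant pairs minus discordant pairs
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        product = (xi - xj) * (yi - yj)
        if product > 0:
            s += 1      # concordant pair
        elif product < 0:
            s -= 1      # discordant pair
        # product == 0 is a tie and contributes nothing to S
    n_pairs = n * (n - 1) / 2
    t_corr = sum(t * (t - 1) / 2 for t in Counter(x).values())  # Eq. 9
    u_corr = sum(u * (u - 1) / 2 for u in Counter(y).values())  # Eq. 10
    denominator = math.sqrt((n_pairs - t_corr) * (n_pairs - u_corr))
    return s / denominator if denominator > 0 else 0.0


# Hypothetical latent feature vectors; v_c contains one outlying feature.
v_a = [0.1, 0.2, 0.3, 0.4, 0.5]
v_c = [0.2, 0.4, 0.6, 0.8, -20.0]

tau_outlier = kendall_tau(v_a, v_c)
tau_identical = kendall_tau(v_a, v_a)
```

For the vector with the outlier, six of the ten pairs are concordant and four are discordant, so tau stays positive at 0.2. Only the rank of the outlying feature matters, not its magnitude.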

D. Missing QoS Value Prediction

The similarity values can then be used to identify similar neighbours to the current user by ordering the values. Kendall's tau falls into the interval [-1, 1], where a positive value denotes similarity and a negative value denotes dissimilarity. QoS usage experience from dissimilar users negatively affects the prediction accuracy, so users with a negative tau value are excluded from the similarity set. We employ only the QoS usage experience of users with the Top-K largest tau values for predicting the QoS of the user. The Top-K similar users for user $u_i$ are defined as $\Psi_i$:

$$\Psi_i = \{u_k \mid S(u_i, u_k) > 0,\ rank_i(k) \le K,\ k \ne i\} \tag{11}$$

where $rank_i(k)$ is the ranking position of user $u_k$ based on user $u_i$ and K denotes the size of set $\Psi_i$. We can similarly denote the Top-K IoT services for service $s_j$ as $\Phi_j$ by:

$$\Phi_j = \{s_k \mid S(s_j, s_k) > 0,\ rank_j(k) \le K,\ k \ne j\} \tag{12}$$

where $rank_j(k)$ is the ranking position of service $s_k$ based on service $s_j$ and K denotes the size of set $\Phi_j$. To predict missing values for $w_{ij}$ in the user-service matrix, user-based approaches use the values from the Top-K similar users as follows:

$$\hat{w}_{ij} = \bar{w}_i + \sum_{k\in\Psi_i} \frac{S(u_i,u_k)}{\sum_{a\in\Psi_i} S(u_i,u_a)} (w_{kj} - \bar{w}_k) \tag{13}$$

where $\bar{w}_i$ and $\bar{w}_k$ are the average observed QoS values of different services by users $u_i$ and $u_k$ respectively. For service-based approaches, entry values of Top-K similar services are employed for predicting the missing entry $w_{ij}$ in a similar way:

$$\hat{w}_{ij} = \bar{w}_j + \sum_{k\in\Phi_j} \frac{S(s_j,s_k)}{\sum_{a\in\Phi_j} S(s_j,s_a)} (w_{ik} - \bar{w}_k) \tag{14}$$

where $\bar{w}_j$ and $\bar{w}_k$ are the average observed QoS values of services $s_j$ and $s_k$ by different users respectively. The predicted values using Eq. 13 and Eq. 14 are combined for a more precise prediction in the following equation [28]:

$$w^*_{ij} = \lambda \times \hat{w}^u_{ij} + (1-\lambda) \times \hat{w}^s_{ij} \tag{15}$$

where $\hat{w}^u_{ij}$ denotes the value predicted by the user-based approach and $\hat{w}^s_{ij}$ denotes the value predicted by the service-based approach. The parameter λ controls how much the hybrid prediction relies on the user-based or the service-based approach. We summarise the proposed algorithm in Algorithm 1.
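The Top-K selection and the hybrid prediction of Eq. 11 to Eq. 15 can be sketched with NumPy as below. The similarity matrices are taken as given inputs, and the small matrices are hypothetical values. One point is an assumption of this sketch rather than something the equations state: a neighbour that has no observed value for the required entry is skipped, and the weights are renormalised over the remaining neighbours.

```python
import numpy as np


def top_k(similarities, self_index, k):
    """Indices of the K most similar neighbours with positive similarity (Eq. 11 / Eq. 12)."""
    order = np.argsort(-similarities)
    return [int(n) for n in order if n != self_index and similarities[n] > 0][:k]


def weighted_deviation(base, neighbours, similarities, deviations):
    """Similarity-weighted average of neighbour deviations added to a base mean.
    Assumption: neighbours whose deviation is unobserved (NaN) are skipped."""
    pairs = [(similarities[n], deviations[n]) for n in neighbours
             if not np.isnan(deviations[n])]
    total = sum(s for s, _ in pairs)
    if total == 0:
        return base
    return base + sum(s * d for s, d in pairs) / total


def predict(W, sim_users, sim_services, i, j, k=2, lam=0.5):
    """Hybrid prediction for the missing entry (i, j); W uses np.nan for unobserved values."""
    user_mean = np.nanmean(W, axis=1)     # average observed QoS per user
    service_mean = np.nanmean(W, axis=0)  # average observed QoS per service
    w_user = weighted_deviation(user_mean[i], top_k(sim_users[i], i, k),
                                sim_users[i], W[:, j] - user_mean)        # Eq. 13
    w_service = weighted_deviation(service_mean[j], top_k(sim_services[j], j, k),
                                   sim_services[j], W[i, :] - service_mean)  # Eq. 14
    return lam * w_user + (1 - lam) * w_service                            # Eq. 15


# Hypothetical data: 3 users x 3 services, with entry (0, 2) missing.
W = np.array([[1.0, 2.0, np.nan],
              [1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0]])
sim_users = np.array([[1.0, 0.8, 0.4], [0.8, 1.0, 0.5], [0.4, 0.5, 1.0]])
sim_services = np.array([[1.0, 0.6, 0.2], [0.6, 1.0, 0.9], [0.2, 0.9, 1.0]])

prediction = predict(W, sim_users, sim_services, i=0, j=2)
```

In this example the user-based estimate is 2.5 and the service-based estimate is 3.5 − 1/3, so with λ = 0.5 the hybrid value is their mean.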

IV. EXPERIMENTAL SETUP

A. Dataset Description

To test our algorithm, we use an established dataset to control for any differences in QoS that can be caused by invoking services at different times. We use the dataset released by Zheng et al. [9], which consists of a matrix of the response time and throughput of 339 users for 5,825 web services. Response time stands for the time duration between a user sending a request and receiving a response, while throughput denotes the data transmission rate (e.g. kbps) of a user invoking a service. The dataset has 1,974,675 observations for both the response time and throughput dataset, which allows for a comprehensive evaluation of the algorithms. We use a dataset instead of a testbed because of the time-varying nature of QoS, which can change due to congestion on the network. On a testbed, this would mean that the algorithms would be evaluated using different values. The dataset also allows more users and services to be evaluated, as there are no available testbeds that we know of with access to 339 users and 5,825 services. An established dataset not only allows for a comprehensive evaluation, but also allows future algorithms to easily be compared to the existing state of the art.

Algorithm 1 IoTPredict Algorithm
Input: W, l, λ
Output: W*
Learn V and H by applying latent feature learning on W
for all ($u_i$, $u_j$) ∈ U × U do
    calculate the similarity S($u_i$, $u_j$) by Eq. 7
end for
for all ($s_i$, $s_j$) ∈ S × S do
    calculate the similarity S($s_i$, $s_j$) by Eq. 7
end for
for all (i, j) ∈ Λ do
    construct similar set $\Psi_i$ by Eq. 11
    construct similar set $\Phi_j$ by Eq. 12
end for
for all (i, j) ∈ Ω − Λ do
    calculate $\hat{w}^u_{ij}$ by Eq. 13
    calculate $\hat{w}^s_{ij}$ by Eq. 14
    $w^*_{ij} = \lambda \times \hat{w}^u_{ij} + (1 - \lambda) \times \hat{w}^s_{ij}$
end for

As this dataset is for web services, which are usually deployed in the cloud, they have better response times than might be expected from low power devices. We use the HetHetNets traffic model to add heterogeneous traffic data to the existing dataset [21]. The model provides a realistic and manageable traffic model that can be applied in many contexts such as Wi-Fi, ad-hoc and sensor networks. We add the traffic model to the original dataset and set the parameter to λ = 2, which is standard practice for modelling network traffic in sensor networks and the IoT. Table I shows the descriptive statistics of the dataset, which show the difference in scale of the datasets, with the response time ranging from 0.002s to 27.285s and the throughput ranging from 0.004kbps to 1000kbps. The two datasets are skewed, which can be seen from the skewness and kurtosis results.

As we have access to the full dataset, we can remove a percentage of the values and use the remaining values as the training set. We then evaluate the algorithms by making predictions for the missing values in the dataset. This allows us to evaluate how the algorithms perform for different matrix densities from 5% to 35% in our experiments using standard error metrics, which are introduced in the following section.
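One way to produce a training matrix at a chosen density is sketched below with NumPy. It assumes that density is measured against the total number of matrix entries and that the retained entries are sampled uniformly at random; the function name, the seed handling and the random example matrix are illustrative assumptions rather than the exact procedure used in the experiments.

```python
import numpy as np


def split_by_density(W, density, seed=0):
    """Keep a random `density` fraction of the matrix entries for training
    and hold out the remaining observed entries for evaluation."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(W)
    observed_idx = np.flatnonzero(observed)  # flat indices of observed entries
    n_train = min(int(round(density * W.size)), observed_idx.size)
    train_idx = rng.choice(observed_idx, size=n_train, replace=False)
    train_mask = np.zeros(W.size, dtype=bool)
    train_mask[train_idx] = True
    train_mask = train_mask.reshape(W.shape)
    test_mask = observed & ~train_mask
    train = np.where(train_mask, W, np.nan)  # held-out entries become missing
    return train, train_mask, test_mask


# Hypothetical fully observed 20 x 30 response-time matrix.
W = np.random.default_rng(1).uniform(0.1, 5.0, size=(20, 30))
train, train_mask, test_mask = split_by_density(W, density=0.10)
```

The held-out entries in test_mask provide the ground truth against which the predicted values are scored.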

B. Metrics

To evaluate the prediction accuracy of the proposed algorithms, we use standard error metrics: the Mean Absolute Error (MAE), the Mean Relative Error (MRE), the Root Mean Square Error (RMSE) and the Ninety-percentile Relative Error (NPRE). The NPRE is the 90th percentile of all the pairwise relative errors.

TABLE I: Descriptive Statistics of Dataset

Statistics               Response Time   Throughput
Number of Observations   1,974,675       1,974,675
Number of Invocations    1,873,838       1,831,253
Number of Users          339             339
Number of Services       5,825           5,825
Min                      0.002 s         0.004 kbps
Max                      27.285 s        1000 kbps
Mean                     2.9 s           47.56 kbps
Variance                 5.886 s^2       12276 kbps^2
Skewness                 2.61            4.96
Kurtosis                 11.03           28.58

MAE is defined as:

MAE = \frac{1}{N} \sum_{i,j} |w_{i,j} - w^*_{i,j}|    (16)

MRE is defined as:

MRE = \operatorname{median}_{I_{ij}(t)=0} \frac{|w_{i,j}(t) - w^*_{i,j}(t)|}{w_{i,j}(t)}    (17)

and RMSE is defined as:

RMSE = \sqrt{\frac{1}{N} \sum_{i,j} (w_{i,j} - w^*_{i,j})^2}    (18)

where w_{i,j} is the QoS value of service c_j observed by user u_i, w^*_{i,j} denotes the predicted QoS value of service c_j for user u_i, and N is the number of predicted QoS values. MAE gives equal weight to individual differences, whereas RMSE gives greater weight to extreme errors because of the squaring term.
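The four metrics can be computed together, as in the sketch below. It assumes paired lists of observed and predicted values with strictly positive observed values, takes MRE as the median of the relative errors as in Eq. 17, and uses a nearest-rank rule for the 90th percentile. The percentile rule and the function name `qos_errors` are assumptions, since the text does not fix them.

```python
import math
import statistics


def qos_errors(actual, predicted):
    """Return MAE, MRE (median relative error), RMSE and NPRE."""
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    # Relative errors require strictly positive observed values.
    rel_err = sorted(e / a for e, a in zip(abs_err, actual))
    n = len(abs_err)
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    mre = statistics.median(rel_err)
    # Nearest-rank 90th percentile of the sorted relative errors.
    npre = rel_err[max(0, math.ceil(0.9 * n) - 1)]
    return {"MAE": mae, "MRE": mre, "RMSE": rmse, "NPRE": npre}


# Hypothetical observed and predicted response times in seconds.
print(qos_errors([1.0, 2.0, 4.0, 5.0], [1.5, 2.0, 3.0, 5.0]))
```

Because RMSE squares each error before averaging, the single error of 1.0 in this example raises RMSE well above MAE.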

C. Performance Comparison

We compare the performance of our algorithm against two state of the art approaches: non-negative matrix factorisation (NMF) [23] and personalised neighbourhood matrix factorisation (PMF) [25]. These methods serve as a good comparison for our algorithm based on the results that were reported in their papers. These algorithms use PCC for similarity computation, which allows us to test the effect of the Kendall’s Tau similarity measure.
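To make the contrast with PCC concrete, the sketch below computes the standard tau-a form of Kendall’s Tau, which depends only on the relative ordering of the values. It is an illustration of the rank correlation in general and is not intended to reproduce the exact similarity definition used by IoTPredict; the function name `kendall_tau` and the example vectors are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            score += 1  # both sequences order this pair the same way
        elif product < 0:
            score -= 1  # the sequences order this pair in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


# One extreme value does not change the ranking, so tau stays at 1.0.
print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 1000]))  # 1.0
```

A rank-based measure of this kind is insensitive to the magnitude of outliers, which is relevant for skewed QoS data such as the response time and throughput values in Table I.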

V. RESULTS

A. Impact of Matrix Density

Matrix density has an obvious effect on accuracy, with the error decreasing as the density of the matrix increases. Figures 7a and 7b show the results on the response time dataset for all algorithms. Figure 7a shows that, in terms of MRE, our algorithm performs slightly worse for low matrix densities (<= 10%) but shows a large improvement for higher densities. Figure 7b shows the RMSE for the algorithms, which assigns more weight to outlying errors. In this figure, we can see that our algorithm has marginally lower error than the other approaches for all of the matrix densities.

Figures 7c and 7d show the results for the throughput dataset, where we can see a greater difference in the results of the algorithms. Figure 7c shows a clear improvement in MRE for the IoTPredict algorithm compared to the two other approaches for all of the matrix densities. Figure 7d shows the NPRE for the algorithms, and we can see that the IoTPredict algorithm has worse performance for the lower matrix densities. As the matrix density increases, the error of the IoTPredict algorithm reduces dramatically and it performs better than the competing approaches.

Fig. 7: Impact of Matrix Density on Response Time and Throughput. Panels: (a) MRE, Response Time; (b) RMSE, Response Time; (c) MRE, Throughput; (d) NPRE, Throughput. Each panel plots error against matrix density (5% to 35%) for IoTPredict, NMF and PMF.

B. Impact of Dimensionality

Dimensionality is the number of latent features used by the algorithm. In this section we investigate the impact this parameter has on the prediction error. Figures 8a and 8b show the impact of the number of latent features at 10% density on the response time dataset. The error is minimised using 10 latent features, and as more latent features are added the error increases, indicating an overfitting problem.

Figures 8c and 8d show the impact of the number of latent features at 90% density. In this case, the MAE and RMSE are minimised with around 20-30 latent features. The results in Section V-A were obtained with 20 latent features, which reduces error for denser matrices, as shown by these results. The results also show that the number of latent features can be tuned to improve performance if the matrix density is already known, to avoid the under/overfitting problem. In this section we focus on a detailed description of our own model parameters rather than a comparison against existing approaches.

C. Impact of Top-K

The Top-K is the number of similar users selected to make the predictions for the individual user, which can impact the prediction accuracy of the approaches. Figure 9 shows how the error changes for the 10% and 90% matrix densities as the number of Top-K users is increased from 2 to 50, with the dimensionality set to 20 latent features. Figures 9a and 9b show the impact of the number of Top-K users chosen on the response time dataset. In this case, we see that the error drops considerably as K increases from 2 to 10 in both the MAE and RMSE. As K continues to increase, the error either remains constant or increases slightly. This can be seen at both 10% and 90% matrix density.

In the throughput dataset, the response is slightly different. The error for the 10% matrix density remains almost constant even as the number of Top-K users changes. The results for the 90% matrix density are interesting and show that the error increases with the number of Top-K users. The increased error is caused by using values from an increased number of users that are not similar, which decreases the prediction accuracy. Figure 9 also shows the problem with optimising the algorithm for one QoS factor, such as throughput: it can make the algorithm less accurate for other QoS factors, such as response time.
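The effect of K can be illustrated with a small neighbourhood sketch. It assumes a similarity-weighted average over the K most similar users who have observed the service and keeps only positive similarities. This is a common neighbourhood rule and not necessarily the one defined by Eqs. 11 to 14; the function name `neighbour_prediction` and all values are hypothetical.

```python
import heapq


def neighbour_prediction(similarities, observed, k):
    """Similarity-weighted average over the k most similar users.

    `similarities` maps a user id to its similarity with the target user;
    `observed` maps a user id to the QoS value that user observed.
    """
    candidates = [(sim, uid) for uid, sim in similarities.items()
                  if sim > 0 and uid in observed]
    top = heapq.nlargest(k, candidates)
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no usable neighbour for this entry
    return sum(sim * observed[uid] for sim, uid in top) / total


sims = {"u1": 0.9, "u2": 0.8, "u3": 0.1}
vals = {"u1": 1.0, "u2": 2.0, "u3": 10.0}
print(neighbour_prediction(sims, vals, k=2))
print(neighbour_prediction(sims, vals, k=3))
```

In this toy case, raising K from 2 to 3 admits the weakly similar user `u3`, whose very different value pulls the prediction away from the two close neighbours.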

D. Overhead Introduced

The end-to-end response time experienced by the user needs to take into account the overhead of the algorithm as well as the prediction accuracy. In this section we discuss the overhead introduced by the algorithms. For each of the matrix densities we measure the time to train the model and to generate the predictions for the missing values in the matrix. The experiment is repeated 20 times and the average value is taken. We use the same parameters for the overhead experiments as for the matrix density results in Figure 7.
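The measurement procedure can be sketched as a small timing harness. This assumes wall-clock time is the quantity of interest; `average_runtime` and the placeholder workload are hypothetical and stand in for the real train-and-predict step.

```python
import time
from statistics import mean


def average_runtime(run_once, repeats=20):
    """Call `run_once` `repeats` times and return the mean wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        durations.append(time.perf_counter() - start)
    return mean(durations)


# Placeholder workload standing in for "train the model and predict".
avg_seconds = average_runtime(lambda: sum(i * i for i in range(10_000)))
print(f"average over 20 runs: {avg_seconds:.6f} s")
```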

The results for the response time (RT) dataset are presented in Figure 10a and the results for the throughput (TP) dataset are presented in Figure 10b. Both datasets follow similar patterns, with NMF introducing the most overhead in both the RT and TP datasets. The overhead for NMF and PMF increases with density, whereas the overhead for IoTPredict remains constant. There is a trade-off between PMF and IoTPredict, with PMF introducing less overhead for matrices with densities between 5% and 25% and IoTPredict introducing less overhead for matrices with densities between 25% and 35%.

VI. RELATED WORK

A key research question in the IoT is how to increase the reliability of applications that use the services provided in the environment [5]. A number of approaches have been proposed for dynamic service composition [19], service selection [29], run-time QoS evaluation [30] and service ranking prediction [31]. These approaches assume that the QoS values are already known; in reality, however, user-side QoS may vary significantly, given unpredictable communication links, mobile service providers and resource constrained devices.

Fig. 8: Impact of Dimensionality on Response Time at 10% and 90% density. Panels: (a) MAE, 10% density; (b) RMSE, 10% density; (c) MAE, 90% density; (d) RMSE, 90% density. Each panel plots error against dimensionality (10 to 50 latent features).

Fig. 9: Impact of Top-K on Response Time and Throughput. Panels: (a) MAE, Response Time; (b) RMSE, Response Time; (c) MAE, Throughput; (d) RMSE, Throughput. Each panel plots error against Top-K (0 to 50).

Fig. 10: Overhead Introduced. Panels: (a) Overhead by RT; (b) Overhead by TP. Each panel plots time (s) against matrix density (5% to 35%) for IoTPredict, NMF and PMF.

Approaches such as collaborative filtering have been used in other domains such as commercial recommender systems [24], [32], [33]. Collaborative filtering approaches for predicting values can typically be classified as either memory based or model based. Memory based approaches store the training data in memory, and in the prediction phase similar users are sorted based on the current user. There are a number of approaches that use neighbourhood based collaborative filtering, including user-based approaches [34], item-based approaches [35] and their combination [32]. VSS [34] and PCC [24] are often used for similarity computation; in this paper, we have justified the use of Kendall’s Tau, which gives improved results against these approaches.

Model-based approaches, which employ a machine learning technique to train a predefined model from the training datasets, have become increasingly popular. Several approaches have been studied, including clustering models [36], latent factor models [22] and aspect models [37]. Latent factor models create a low-dimensional factor model, on the premise that there are only a small number of factors influencing the QoS [24]. The latent factor model has been extended to include neighbourhood integrated matrix factorisation, which fuses neighbourhood-based and model-based approaches to achieve higher accuracy [38].

Both approaches can be combined to predict missing QoS values by using global information from MF and local information from similar users and items [13], [14]. Previous approaches have focused on web services deployed in the cloud [39], [25], [13], [38], [40]. However, these make up only one part of the IoT, and we need to collect QoS information and make predictions for all service types to create reliable applications that use all the services provided in the IoT.

VII. CONCLUSION AND FUTURE WORK

In the IoT, QoS predictions for individual users can be made by exploring past usage experience from the individual user and similar users in the environment. The collaborative filtering approach is useful in an IoT environment as it requires no additional invocations to generate the extra QoS information, which reduces the load on the network, a key challenge in the IoT as the devices are usually resource constrained. We have proposed a framework and middleware to allow users to contribute QoS information to enable collaborative filtering. The values reported by users can then be used by our IoTPredict algorithm to make QoS predictions for candidate services. The prediction accuracy is higher than that of other state of the art approaches, while maintaining a low overhead. The algorithm can be used by a number of alternative service composition frameworks to compose recommended services, or by the goal oriented approach shown in Section II.

The IoT is highly dynamic as it includes a number of mobile devices, which may be resource and power constrained. In the current experiments the users and services are static, which is one of the threats to validity. In future work, we will add mobility to both the users and services for a more robust evaluation.

ACKNOWLEDGMENTS

This work was funded by Science Foundation Ireland (SFI) under grant 13/IA/1885.

REFERENCES

[1] H. Bauer, M. Patel, and J. Veira, “The internet of things: Sizing up the opportunity,” McKinsey, 2014.

[2] D. Evans, “The internet of things: How the next evolution of the internet is changing everything,” CISCO white paper, vol. 1, p. 14, 2011.

[3] I. Lee and K. Lee, “The internet of things (iot): Applications, investments, and challenges for enterprises,” Business Horizons, vol. 58, no. 4, pp. 431–440, 2015.

[4] A. Al-Fuqaha et al., “Internet of things: A survey on enabling technologies, protocols, and applications,” IEEE Comm. Surveys Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.

[5] D. Miorandi et al., “Internet of things: Vision, applications and research challenges,” Ad Hoc Networks, vol. 10, no. 7, pp. 1497–1516, 2012.

[6] L. Zeng et al., “Qos-aware middleware for web services composition,” IEEE Transactions on Software Engineering, vol. 30, no. 5, pp. 311–327, May 2004.

[7] D. A. Menascé et al., “On optimal service selection in service oriented architectures,” Performance Evaluation, vol. 67, no. 8, pp. 659–675, 2010.

[8] G. White, V. Nallur, and S. Clarke, “Quality of service approaches in iot: A systematic mapping,” Journal of Systems and Software, vol. 132, pp. 186–203, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S016412121730105X

[9] Z. Zheng, Y. Zhang, and M. R. Lyu, “Investigating qos of real-world web services,” IEEE Transactions on Services Computing, vol. 7, no. 1, pp. 32–39, 2014.

[10] Z. Ye et al., “Long-term qos-aware cloud service composition using multivariate time series analysis,” IEEE Trans. Serv. Comp., vol. 9, no. 3, pp. 382–393, 2016.

[11] A. Amin et al., “An automated approach to forecasting qos attributes based on linear and non-linear time series modeling,” in 2012 Proceedings of the 27th IEEE/ACM ASE, 2012, pp. 130–139.

[12] A. Amin and A. Colman, “An approach to forecasting qos attributes of web services based on arima and garch models,” in IEEE 19th ICWS, 2012, pp. 74–81.

[13] Y. Zhang, Z. Zheng, and M. R. Lyu, “Exploring latent features for memory-based qos prediction in cloud computing,” in 2011 IEEE 30th International Symposium on Reliable Distributed Systems, 2011, pp. 1–10.

[14] W. Lo et al., “An extended matrix factorization approach for qos prediction in service selection,” in 2012 IEEE Ninth International Conference on Services Computing, 2012, pp. 162–169.

[15] G. White, A. Palade, C. Cabrera, and S. Clarke, “Quantitative evaluation of qos prediction in iot,” in 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), June 2017, pp. 61–66.

[16] G. White, A. Palade, and S. Clarke, “Qos prediction for reliable service composition in iot,” in Service-Oriented Computing: 15th International Conference, ICSOC 2017, Workshop (ISyCC), Malaga, Spain, November 13-16, 2017, Proceedings. Springer International Publishing, 2017.

[17] R. Gemulla, E. Nijkamp, P. J. Haas, and Y. Sismanis, “Large-scale matrix factorization with distributed stochastic gradient descent,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’11. New York, NY, USA: ACM, 2011, pp. 69–77. [Online]. Available: http://doi.acm.org/10.1145/2020408.2020426

[18] J. Zhu, Z. Zheng, and M. R. Lyu, “Context-aware reliability prediction of black-box services,” CoRR, vol. abs/1503.00102, 2015. [Online]. Available: http://arxiv.org/abs/1503.00102

[19] N. Chen, N. Cardozo, and S. Clarke, “Goal-driven service composition in mobile and pervasive computing,” IEEE Transactions on Services Computing, vol. PP, no. 99, pp. 1–1, 2016.

[20] N. B. Mabrouk, S. Beauche, E. Kuznetsova, N. Georgantas, and V. Issarny, “Qos-aware service composition in dynamic service oriented environments,” in ACM/IFIP/USENIX International Conference on Distributed Systems Platforms and Open Distributed Processing. Springer, 2009, pp. 123–142.

[21] M. Mirahsan, R. Schoenen, and H. Yanikomeroglu, “Hethetnets: Heterogeneous traffic distribution in heterogeneous wireless cellular networks,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 10, pp. 2252–2265, 2015.

[22] R. Salakhutdinov and A. Mnih, “Probabilistic matrix factorization.” in Nips, vol. 1, no. 1, 2007, pp. 2–1.

[23] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788–791, 1999.

[24] P. Resnick et al., “Grouplens: An open architecture for collaborative filtering of netnews,” in Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work. ACM, 1994, pp. 175–186.

[25] Z. Zheng and M. R. Lyu, “Personalized reliability prediction of web services,” ACM Trans. Softw. Eng. Methodol., vol. 22, no. 2, pp. 12:1–12:25, 2013.

[26] O. Hryniewicz and J. Karpinski, “Prediction of reliability–the pitfalls of using Pearson’s correlation,” Eksploatacja i Niezawodnosc, vol. 16, 2014.

[27] M. G. Kendall, “Rank correlation methods.” 1948.

[28] Z. Zheng, Y. Zhang, and M. R. Lyu, “Cloudrank: A qos-driven component ranking framework for cloud computing,” in Proceedings of the 2010 29th IEEE Symposium on Reliable Distributed Systems, ser. SRDS ’10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 184–193.

[29] A. Yachir et al., “Event-aware framework for dynamic services discovery and selection in the context of ambient intelligence and internet of things,” IEEE Transactions on Automation Science and Engineering, vol. 13, no. 1, pp. 85–102, 2016.

[30] G. Su, D. S. Rosenblum, and G. Tamburrelli, “Reliability of run-time quality-of-service evaluation using parametric model checking,” in Proceedings of the 38th International Conference on Software Engineering. ACM, 2016, pp. 73–84.

[31] Y. Huang et al., “Time-aware service ranking prediction in the internet of things environment,” Sensors, vol. 17, no. 5, p. 974, 2017.

[32] H. Ma, I. King, and M. R. Lyu, “Effective missing data prediction for collaborative filtering,” in Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2007, pp. 39–46.

[33] R. Burke, “Hybrid recommender systems: Survey and experiments,” User Modeling and User-Adapted Interaction, vol. 12, no. 4, pp. 331–370, 2002.

[34] J. S. Breese, D. Heckerman, and C. Kadie, “Empirical analysis of predictive algorithms for collaborative filtering,” in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, ser. UAI’98, 1998, pp. 43–52.

[35] M. Deshpande and G. Karypis, “Item-based top-n recommendation algorithms,” ACM Trans. Inf. Syst., vol. 22, no. 1, pp. 143–177, Jan. 2004.

[36] G.-R. Xue et al., “Scalable collaborative filtering using cluster-based smoothing,” in Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2005, pp. 114–121.

[37] P. Singla and M. Richardson, “Yes, there is a correlation: - from social networks to personal behavior on the web,” in Proceedings of the 17th International Conference on World Wide Web. ACM, 2008, pp. 655–664.

[38] Z. Zheng et al., “Collaborative web service qos prediction via neighborhood integrated matrix factorization,” IEEE Transactions on Services Computing, vol. 6, no. 3, pp. 289–299, July 2013.

[39] D. Geebelen et al., “Qos prediction for web service compositions using kernel-based quantile estimation with online adaptation of the constant offset,” Information Sciences, vol. 268, pp. 397–424, 2014.

[40] J. Xu, Z. Zheng, and M. R. Lyu, “Web service personalized quality of service prediction via reputation-based matrix factorization,” IEEE Transactions on Reliability, vol. 65, no. 1, pp. 28–37, March 2016.
