

978-1-4673-5828-6/13/$31.00 ©2013 IEEE

Traffic Adaptive Channel Switching With Time Slice Based Predictors

Géza Szabó∗, Gergely Pongrácz∗, Mathias Sintorn†

∗TrafficLab, Ericsson Research, Budapest, Hungary
†Ericsson, Stockholm, Sweden

E-mail: [{geza.szabo, gergely.pongracz, mathias.sintorn}@ericsson.com]

Abstract: Channel switching in HSPA networks is used to reduce the channel occupancy when there is no data transfer for the given user, thereby reducing battery consumption. This paper is the first to deal with another important aspect: the CPU load on the radio network controller (RNC) caused by channel switching. This is an optimization task, in which both switching channels and staying on the high-bandwidth channel have costs. In this paper we propose a system that minimizes the costs by applying a predictor based method which uses time slice based features in order to reduce the high variance in the feature values. The proposed system is evaluated and compared to other state-of-the-art methods.

Keywords: deep packet inspection, channel switching, machine learning, optimization

I. INTRODUCTION

In current High Speed Packet Access (HSPA) networks the usage of the highest bandwidth high speed (HS) channel (DCH) [1] is power and radio resource consuming, so when a user does not have an active burst (IP packet exchange) it is switched down to a less power consuming channel (FACH or URA PCH [1]) (see Figure 1). The switch between channels is done using inactivity timers, e.g., if there was no traffic for a given user equipment (UE) in the last 500 ms, that UE is switched down from DCH to FACH. When traffic increases, the UE is switched back to DCH.

In state-of-the-art papers [2], [3] the UE is in focus and the authors propose methods to optimize battery consumption. Our paper is the first to deal with another important aspect: the CPU load on the radio network controller (RNC) caused by channel switching. Switching channels has a certain cost on the RNC, so a switch to a lower channel that is soon followed by another burst is a waste of resources. In the other direction, it would also be useful to foresee the time of the next burst in order to switch accordingly. Current timer based solutions use a static limit to control channel switching. This involves a tradeoff: as switching consumes resources in the radio network, it should not happen too frequently, which calls for a long downswitch timer. On the other hand, being on the HS channel also consumes radio resources and causes RNC load, so after a burst ends it would be best to switch down immediately. These contradictory requirements could only be met if the end of the burst could be found precisely.

Our goal is to create a system capable of improving the current RNC capabilities (e.g., an increased number of concurrently served users) by extending the radio network controller (RNC) functionality with a smart channel switching method.

[Figure 1: channel state diagram. Recoverable labels: connected and disconnected states; battery saving; cost: more radio resources, more battery needed; cost: processing power when switching.]

Fig. 1. Channel switching model

The following requirements have to be fulfilled by our system:

• The proposed method has to be at least as efficient in terms of CPU capacity saving as the current timer based channel switching solution.

• The proposed method may only introduce limited calculation complexity to the system.

In this paper we propose to use prediction algorithms to obtain robust estimates of the traffic intensity in the next traffic bursts. The proposed method can decide whether to switch down or stay on the high-bandwidth channel to optimize RNC load. We also consider the calculation complexity of the prediction algorithms to make sure that the CPU load gain is much higher than the calculation load that the prediction algorithm introduces to the system.

The main contributions of the paper are as follows.

• Proposal of a framework for dynamic DCH to URA PCH downswitch timer estimation.

• Detailed evaluation of the proposed framework and its components by comparing the proposed methods to state-of-the-art solutions emulated on the real-world traffic of an operational mobile broadband operator.

The paper is structured as follows. Section II gives a brief overview of state-of-the-art channel releasing techniques for energy saving. Section III-A discusses the basic approach, which is refined in Section III-B. In Section III-B6 we evaluate the proposed system from several important perspectives. Finally, Section IV concludes the paper.


II. RELATED WORK

[4] investigated the UMTS discontinuous reception (DRX) mechanism for mobile station power saving. The DRX mechanism is controlled by two parameters: the inactivity timer threshold tI and the DRX cycle tD. The author proposes an M/G/1 queuing model for UMTS DRX, and uses an analytic analysis and a simulation model to study the optimal tI and tD selections that maximize the power saving under a given mean packet waiting time constraint. The proposed method adjusts the timers in tiny steps and examines the effects on the power saving. The results in the paper depend on the applied traffic model. The author presumes stable traffic mixes for approx. 20 minutes. We experienced that the traffic mix in our measurements changed in the order of seconds, thus such a method would not be agile enough to react to traffic mix changes, resulting in negligible power gain.

[5] applies an adaptive method to adjust the channel release timeouts for each user in a given cell to minimize the blocked calls in a UMTS system. From the traffic characteristic point of view, call admission is a simpler problem than modeling elastic internet traffic because, apart from the call request intensity and sojourn time, the traffic is a data stream with a fixed rate. Further, their method could manipulate the number of users in the system by blocking calls or dropping existing ones. This is a different use case from ours.

It is a matter of philosophy which node in the network is responsible for radio resource management. Our point of view is that the network side is the main coordinator, but there are a couple of interesting papers which deal with terminal side approaches. Several papers propose to use the DCH-idle empty period efficiently by scheduling delay-tolerant traffic into it [6], [7] or to make the traffic more bursty to make space for energy saving [8]. [9] proposes an approach to estimate the inactivity timers by feeding decision trees with feature vectors constructed from the function calls of the terminals during the execution of a specific application. Our aim was to make a terminal independent system and prove that deep packet inspection (DPI) deployed in the RNC can efficiently substitute such detailed information, which is available only if a modified terminal is applied.

A practical approach is described in [2], [3], in which power meters were used to measure the energy consumption of smartphones while several applications were in use, and the channel state was logged as well. In [2] several parameter settings for the 3G state machine were evaluated by analyzing real network traces, and a simple and practical improvement was proposed to improve energy saving during the reception of streaming traffic.

What all of these papers have in common is that they aim to optimize UE side battery consumption. Our goal is to consider the network side aspect, the CPU load on the RNC caused by channel switching, thus the above methods cannot be applied directly.

III. THE CONCEPT OF THE SYSTEM

Our overall goal is to construct a system on the RNC which is capable of making a binary decision: whether a user should leave the DCH channel due to the lack of transmitted data in the near future, or whether it is advisable to stay on. An unnecessary switch down followed soon by a switch back up wastes CPU cycles instead of yielding a gain.

curr/next state idle URA PCH FACH DCH
idle 0 11.5 11.5
URA PCH 0.000108 4.7
FACH 1 0.0303 3.4
DCH 3.4 4.7 0.7667

TABLE I. USED RNC COST PARAMETERS [CPU LOAD % / USER / SEC]

A straightforward approach was to experiment with estimating the interarrival time between packets with various methods. This approach very soon turned out to be too expensive in terms of performance due to the frequent arrival of packets, so we looked for ideas with less calculation complexity.

A. Inter-burst time predictor based solutions

To reduce the number of data segments to process we constructed bursts from the packets. A burst is a set of consecutive packets of a user in which each packet interarrival time is less than 500 msec.
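The burst definition above can be sketched as follows. This is a minimal illustration, assuming that the packet log of one user is available as a list of arrival timestamps in seconds; the function names are hypothetical and not taken from the paper.

```python
from typing import List

BURST_GAP_S = 0.5  # packets closer than 500 ms belong to the same burst


def build_bursts(timestamps: List[float], gap_s: float = BURST_GAP_S) -> List[List[float]]:
    """Group the packet timestamps (seconds) of one user into bursts."""
    bursts: List[List[float]] = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] < gap_s:
            bursts[-1].append(ts)   # gap below the limit: same burst
        else:
            bursts.append([ts])     # gap reached the limit: start a new burst
    return bursts


def inter_burst_times(bursts: List[List[float]]) -> List[float]:
    """Time from the last packet of a burst to the first packet of the next one."""
    return [nxt[0] - cur[-1] for cur, nxt in zip(bursts, bursts[1:])]
```

Working on bursts rather than packets means the later predictors are queried once per burst instead of once per packet.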

1) Fixed down switch timer (ds10): For the cost of the channel switches and of maintaining a specific channel, we used the RNC load parameters of the simulator used in [10], applied to an HSPA network. The RNC load parameters in Table I show an average CPU load ratio per user per second interval for one CPU of an RNC in an average HS network. The radio transmission and the encoding and decoding of the data to be transmitted require CPU capacity, thus both maintaining a certain channel (the diagonal in Table I) and switching between channels have costs in terms of CPU capacity consumption in the RNC. The threshold of the switchdown decision can be calculated from Table I. The measured power consumption values show that the cost of a DCH to URA PCH downswitch plus a URA PCH to DCH upswitch is equal to the cost of staying on the DCH channel for approx. 10 seconds.
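The break-even reasoning can be written down as a short calculation. The sketch below is illustrative only: the function name is hypothetical, the holding costs are the diagonal entries of Table I, and the two switch costs (3.4 and 4.7) are one assumed reading of the off-diagonal entries, since the extracted table does not show unambiguously which cell each value belongs to.

```python
def break_even_seconds(down_cost, up_cost, high_cost_per_s, low_cost_per_s=0.0):
    """Idle time after which switching down and back up is cheaper than staying."""
    saving_per_s = high_cost_per_s - low_cost_per_s
    if saving_per_s <= 0:
        raise ValueError("the low state must be cheaper to hold than the high state")
    return (down_cost + up_cost) / saving_per_s


# Diagonal entries of Table I: cost of holding a state for one second.
DCH_HOLD = 0.7667
URA_PCH_HOLD = 0.000108

# ASSUMPTION: 3.4 and 4.7 are taken as the downswitch and upswitch costs.
threshold = break_even_seconds(3.4, 4.7, DCH_HOLD, URA_PCH_HOLD)
print(round(threshold, 1))  # about 10.6 with these assumed values
```

With these assumed values the break-even idle time is close to the 10 seconds quoted above; a different assignment of the off-diagonal cells would shift the result.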

2) Ideal case: In Section III-A1 we learned that a 10 second timeout minimizes the power consumption of the timeout based method. A more optimistic theoretical assumption is a method that foresees the events of the next 10 seconds and thus knows whether any data will be transmitted. The evaluations in the paper use a 3 second timeout as the baseline, because the operator used the 3 second timeout setup in the measurement we recorded. This baseline is also supported by [11], in which the authors measured the timeout values at 4 different operators and found that they ranged from 2 to 3 secs. The measurement used for evaluation was recorded in the 3G network of a mobile operator. The busy hour was recorded and we sampled the user base to obtain approx. 10k users, which is statistically representative but still manageable in terms of the calculation complexity of the emulation. For each user a packet log was recorded containing the timestamp and the type of traffic transmitted, as recognized by DPI.
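To make the comparison between a fixed timer and the ideal case concrete, the following sketch emulates the idle-period cost of one user under both policies. It is a simplified model, not the emulator used in the paper: only the idle gaps between bursts are charged (the cost of active time on DCH is the same for every method), and all function and parameter names are hypothetical.

```python
def timer_cost(gaps, timer_s, high, low, down, up):
    """Idle-period cost with a fixed downswitch timer of `timer_s` seconds."""
    total = 0.0
    for g in gaps:
        if g <= timer_s:
            total += g * high                       # next burst arrives first: stay up
        else:
            # wait for the timer on the high channel, then switch down and back up
            total += timer_s * high + down + (g - timer_s) * low + up
    return total


def oracle_cost(gaps, lookahead_s, high, low, down, up):
    """Idle-period cost when the next `lookahead_s` seconds are known in advance."""
    total = 0.0
    for g in gaps:
        if g > lookahead_s:
            total += down + g * low + up            # no traffic ahead: switch at once
        else:
            total += g * high                       # traffic is coming: stay up
    return total
```

Calling `timer_cost` with a 3 second and a 10 second timer on the same list of idle gaps gives the DS3 and DS10 style costs, while `oracle_cost` with a 10 second look-ahead corresponds to the ideal case described above.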

Figure 3 shows the achievable power saving ratio in different use cases. The 'Cost ratio comp to baseline (DS3)' column in Figure 3 relates the cost of a specific method to the 3 sec timeout case (DS3); the resulting gain ('Gain' column, one minus the cost ratio) is calculated by

$$\frac{\sum_{user_i \in allusers} powercost_{ds3}(user_i) - \sum_{user_i \in allusers} powercost_{methodeval}(user_i)}{\sum_{user_i \in allusers} powercost_{ds3}(user_i)}$$

The ideal case can reduce the power consumption of the DS3 case to 76% of its value. This is a 24% gain over the DS3 case ('Gain' column) and is considered the achievable maximum gain ('Gain ratio comp to ideal' column). The 'DS10' row shows that a 10 sec timeout results in a 13% gain compared to the DS3 case, which is about half of the theoretical maximum.
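The gain formula and the 'Gain ratio comp to ideal' column can be reproduced with a few lines. This is a minimal sketch assuming per-user cost lists as input; the function names are hypothetical.

```python
def gain_vs_baseline(baseline_costs, method_costs):
    """Relative gain of a method over the baseline, summed over all users."""
    base = sum(baseline_costs)
    return (base - sum(method_costs)) / base


def gain_ratio_to_ideal(method_gain, ideal_gain):
    """Share of the ideal (maximum achievable) gain reached by a method, in percent."""
    return 100.0 * method_gain / ideal_gain
```

For example, a method whose summed cost is 76% of the DS3 cost has a gain of 0.24, and a method with half of that gain reaches 50% of the ideal.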

Figure 6 shows that about 10% of the users (the rightmost segment of the green line showing the possible gain) have no chance for power saving as they are practically always active. Note that the above average gains include these users as well. Out of interest we examined these users, checked their activity and found the following:

• 2% of users: constant update (every 1 sec) of economic data and share price diagrams; the application refreshes the diagram with a new TCP flow every few seconds

• 10% of users: system messages originating from a misconfigured operating system; either misconfiguration or malware activity, but the OS is in host scanning state

• 1% of users: net radio; very low bandwidth activity but always on

• 10% of users: social hubs; instant messenger applications and social networking sites, e.g., Facebook, where the user is actually active for a long time

• 1% of users: software update in the background; users with plenty of apps

• 35% of users: file sharing; constantly high bandwidth demand

• 5% of users: video playback; the user stays on YouTube and watches videos one after the other

• 2% of users: web browsing; very active web browsing with very few idle periods, which can be the result of AJAX sites (like eBay) triggering on-demand object requests

• The rest is not so exact or not known from DPI

3) Interarrival time between bursts estimation: To establish a method that comes near to the theoretical maximum CPU capacity saving we involved methods from the field of machine learning (ML). One of our approaches was to estimate the exact value of the interarrival time between bursts (ITB) and then decide, based on a simple threshold, whether a downswitch is necessary or not. In our channel switching model we switched down directly to URA PCH; the predictor does not use the FACH state as a possibility for a limited load gain in certain situations. Only network controlled methods are used in the model, thus terminal side battery saving methods, e.g., fast dormancy¹, are not considered. In our experiment we used the linear regression module of weka [12]. Note that in all the ML-based experiments we ran the evaluation with several parameter setups and the best performing one is presented in the paper. The threshold was the calculated 10 seconds (see Section III-A1). For each burst we collected the following information elements: transferred bytes, up/dn byte ratio, ITB, first packet size, last packet size, burst duration, up/dn packet ratio, parallel flow number, DPI flags. The above information elements define training vectors for ML-based approaches to be used for the next ITB estimation. Figure 2 shows the main concept of the ITB based method. If the estimation from the linear regression is bigger than 10 secs, a downswitch is decided.

¹ Fast dormancy in particular is likely to cause high signaling load in the network, thus it would have a negative impact on reducing network node load.

[Figure 2: two panels plotting transferred data over time. Inter-arrival time prediction: predict the time to the next burst. Time-slice based prediction: predict activity until the threshold time, using time slices.]

Fig. 2. Comparison of the basic concepts of the interarrival time between bursts and time slice based methods
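A possible way to assemble part of the per-burst training vector and to apply the threshold rule is sketched below. The `Packet` record, the field names and the handling of bursts without downlink traffic are assumptions made for illustration; the regression model itself (weka in the paper) is left out, and only its output is consumed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Packet:
    ts: float      # arrival time in seconds
    size: int      # payload size in bytes
    uplink: bool   # True for uplink, False for downlink


def burst_features(burst: List[Packet]) -> Dict[str, float]:
    """Counter based information elements of one burst (DPI flags omitted)."""
    up_bytes = sum(p.size for p in burst if p.uplink)
    dn_bytes = sum(p.size for p in burst if not p.uplink)
    up_pkts = sum(1 for p in burst if p.uplink)
    dn_pkts = len(burst) - up_pkts
    return {
        "bytes": up_bytes + dn_bytes,
        # ASSUMPTION: a burst without downlink traffic gets an infinite ratio
        "up_dn_byte_ratio": up_bytes / dn_bytes if dn_bytes else float("inf"),
        "first_packet_size": burst[0].size,
        "last_packet_size": burst[-1].size,
        "duration": burst[-1].ts - burst[0].ts,
        "up_dn_packet_ratio": up_pkts / dn_pkts if dn_pkts else float("inf"),
    }


def should_downswitch(predicted_itb_s: float, threshold_s: float = 10.0) -> bool:
    """Switch down only if the predicted gap exceeds the break-even threshold."""
    return predicted_itb_s > threshold_s
```

In practice an infinite ratio would have to be capped or encoded differently before it is fed to a regression model.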

The 'ITB estimation' row in Figure 3 shows the best gain we achieved with the linear regression module using various parameters. The 32% loss compared to the simple DS3 case showed us that the ITB is a highly variable parameter and difficult to estimate.

4) Binary decision based on ITB estimation: Another approach to relax the issue of the high variance of the ITB is to reduce the expected estimation output to a binary decision instead of deriving information from a continuous output as in the linear regression case. Supervised classification methods can also be applied to perform this. We applied [13] with the following parameters: $1 --learnertype TreeLearner --baselearnertype SingleStumpLearner 7, which means that we used a tree based strong learner and a stump based weak learner with a default depth setup (7); the 'Model size' column in Figure 3 holds the applied model size. The model size is the number of boosting iterations applied on the training data. In our dataset 10 represents a limited model size that is capable of generalizing from the training data while still avoiding overfitting. (The ideal model size can be selected by tracking the evolution of the error rate during training. If the decrease of the error starts to shrink, we have reached the ideal model size. Increasing the number of boosting iterations further can result in perfect learning of the training data set, i.e., absolute overfitting.) The advantage of ensuring the generalization capability of the model is that we no longer need to follow the usual 1:1 training:testing ratio when feeding the booster. Our dataset is a time series representing user behavior that changes in time.
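To illustrate what "model size" means in boosting, the following is a generic discrete AdaBoost sketch with single-feature decision stumps. It is not the implementation of [13] and does not reproduce its tree learner; the label convention (+1 for one class, -1 for the other) and all names are assumptions for illustration.

```python
import math


def train_stump(X, y, w):
    """Find the single-feature threshold rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] > thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost(X, y, iterations=10):
    """Discrete AdaBoost; labels must be +1 or -1. `iterations` is the model size."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(iterations):
        err, j, thr, sign = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, sign))
        # Re-weight the samples: misclassified ones gain weight.
        w = [wi * math.exp(-alpha * yi * (sign if row[j] > thr else -sign))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Weighted vote of all stumps in the model."""
    score = sum(a * (s if row[j] > t else -s) for a, j, t, s in model)
    return 1 if score >= 0 else -1
```

Each boosting iteration adds one weak learner, so a model size of 10 corresponds to `iterations=10` in this sketch.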

We need coherent data to be able to calculate power cost. This excludes random sampling of the data. It is also clear that, e.g., the first 5 minutes of a user are not representative of the whole lifetime of the user, as various applications can be used in various network conditions. We trained one booster per user on the whole dataset of that user and tested on the same data. In this way we demonstrated the learnability of the dataset and the achievable best estimation. The 'Binary decision based on ITB estimation' row in Figure 3 shows the achieved gain, which is still a 9% loss compared to the DS3 baseline.

| Group | Test scenario | Estimator (ITB / bandwidth, active, inactive period length) | Final decision method | Model size | Training data ratio of total data | Cost ratio comp to baseline (DS3) | Gain (1-x) | Gain ratio comp to ideal |
|---|---|---|---|---|---|---|---|---|
| | ideal | | | | | 0.76 | 0.24 | 100.00 |
| | DS 10 | | | | | 0.89 | 0.11 | 45.27 |
| ITB based methods | ITB estimation | weka.LinearRegression | simple limit | | | 1.32 | -0.32 | -133.33 |
| ITB based methods | binary decision based on ITB estimation | weka.LinearRegression | multiboost | 10 | 100% | 1.09 | -0.09 | -37.45 |
| Time slice based methods | rnn estimation | matlab.ESN_toolbox | simple limit | | | 0.90 | 0.10 | 41.15 |
| Time slice based methods | binary decision based on basic time slice information | | multiboost | 10 | 100% | 1.07 | -0.07 | -28.81 |
| Time slice based methods | training with channel state simulation | | multiboost | 10 | 100% | 0.87 | 0.14 | 55.56 |
| Time slice based methods | binary decision based on rnn estimation | matlab.ESN_toolbox | multiboost | 10 | 100% | 0.85 | 0.15 | 62.55 |
| Time slice based methods | without DPI | matlab.ESN_toolbox | multiboost | 10 | 100% | 0.85 | 0.15 | 62.55 |
| Time slice based methods | exp smoothing | exp.sm. | simple limit | | | 0.92 | 0.08 | 31.69 |
| Time slice based methods | binary decision based on exp smoothing estimation | exp.sm. | multiboost | 10 | 100% | 0.85 | 0.16 | 63.79 |
| Time slice based methods | big model | matlab.ESN_toolbox | multiboost | 50 | 100% | 0.84 | 0.16 | 66.26 |
| Time slice based methods | big model, 1:1 train:test | matlab.ESN_toolbox | multiboost | 50 | 50% | 0.85 | 0.15 | 63.37 |
| Time slice based methods | adaptive tree | matlab.ESN_toolbox | moa.OzaBagAdwin | 10 | 100% | 0.94 | 0.06 | 24.69 |

Fig. 3. Summary of test scenario results

B. Time sliced based prediction

The experiments with the ITB based method showed that the data is difficult to learn. Further, the goal of the learner is to minimize the error by estimating the proper class (whether to downswitch or to stay on the DCH channel), but the effect of the decision is only weighted later by the cost functions and the channel state machine, thus fine-tuning of the method was necessary. We decided to smooth the input data of the learner by defining time slices and collecting the statistics within the time slices.

1) Rnn estimation: Our first approach with the time slice based prediction was to obtain similar information from the system as in the ITB based approach, i.e., to estimate an ITB for the next burst and make a simple threshold based decision upon it. We applied a recurrent neural network (RNN) approach [14]. RNN provides an online training mode, and its internal states make the method stateful, which is useful for time series estimation. The input vectors consisted of the following features:

• Counter-based features referring to a specific time slice: transferred bytes, transferred packets, number of parallel flows

• DPI information fed as flags: a flag is set to 1 if the current time slice contains that kind of traffic

• TCP activity flags: if an active TCP transmission is ongoing, the flag signals this with value 1

The expected outcome of the system was to estimate, in every time slice, whether there is any traffic in the next 10 seconds. If no data is estimated, a downswitch is done. There is no upswitch until the arrival of the next packet.
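The counter based part of the input vector and the expected output can be sketched as follows. This is an illustration under stated assumptions: packets are given as (timestamp, size, flow id) tuples, the number of parallel flows is approximated by the number of distinct flow ids seen in a slice, and the DPI and TCP flags are omitted. The names are hypothetical.

```python
SLICE_S = 0.5          # slice length (500 ms)
HORIZON_SLICES = 20    # 10 s look-ahead expressed in slices


def slice_counters(packets, n_slices, slice_s=SLICE_S):
    """Per-slice (bytes, packets, parallel flows) from (ts, size, flow_id) tuples."""
    rows = [{"bytes": 0, "packets": 0, "flows": set()} for _ in range(n_slices)]
    for ts, size, flow in packets:
        idx = int(ts // slice_s)
        if 0 <= idx < n_slices:
            rows[idx]["bytes"] += size
            rows[idx]["packets"] += 1
            rows[idx]["flows"].add(flow)
    return [(r["bytes"], r["packets"], len(r["flows"])) for r in rows]


def activity_labels(counters, horizon=HORIZON_SLICES):
    """Label slice i with 1 if any of the next `horizon` slices carries traffic."""
    active = [1 if c[0] > 0 else 0 for c in counters]
    return [1 if any(active[i + 1:i + 1 + horizon]) else 0
            for i in range(len(active))]
```

The labels are only available in hindsight, so they serve as training targets; at run time the predictor has to estimate them from the features of the slices seen so far.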

The 'Rnn estimation' row in Figure 3 shows the best gain achieved with various settings (different reservoir sizes, reservoir connectivity, etc.). We also extended the estimation with a 10 sec timeout, meaning that after 20 time slices of 500 msec a switchdown is done. A 10% gain is achieved with this method over the DS3 case. This is an achievement, however the DS10 case still performs better without any change to the current system.
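The combination of the per-slice prediction and the 10 sec fallback timeout can be sketched as a small decision loop. The function below is a hypothetical illustration; in particular, restricting the predictor driven downswitch to idle slices is an assumption of this sketch.

```python
def downswitch_slice(predictions, activity, timeout_slices=20):
    """Return the index of the slice at whose end the UE is switched down, or None.

    predictions[i] is True when the predictor expects traffic in the next window;
    activity[i] is True when slice i carried traffic.
    """
    idle_run = 0
    for i, (pred_traffic, active) in enumerate(zip(predictions, activity)):
        idle_run = 0 if active else idle_run + 1
        if not active and not pred_traffic:
            return i            # predictor sees no traffic ahead: switch down
        if idle_run >= timeout_slices:
            return i            # fallback: 20 idle slices of 500 ms = 10 s timeout
    return None
```

The fallback bounds the cost of a predictor that keeps expecting traffic that never arrives.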

Note that the periodic checking of the system is a significant advantage over the ITB based method. Compared to the ITB based solution, the time slice based approach, besides making prediction easier, decreases the impact of a false decision. It can happen that there is not enough information at the beginning of the ITB to decide well. The ITB based method queries the predictor only once, at the beginning of the ITB. If that decision is false, then the prediction for the whole ITB will be false. The time slice based method can become sure whether a channel switch is necessary after 1 to 3 slices.

2) Binary decision based on basic time slice information: We also experimented with boosting on the same training vectors as in the RNN case. The boosting was set up with the same parameters as in Section III-A. The 'Binary decision based on basic time slice information' row in Figure 3 shows that a 7% loss occurred compared to the DS3 case. Examining the cause of the loss we found that the learner can be trained with different strategies. The selection of the input vectors affects the constructed model significantly.

a) Feed DT with all the timeslices as they are: In this case the input represents the distribution of active/inactive periods. Either the active or the inactive time slices dominated the training data, while the number of those events which triggered a necessary decision was low. As a result, the learner made a clear distinction between the active and inactive periods of the user simply by the existence of transmitted data. The learner neglected the transient periods due to their small negative effect on the overall error ratio. This made it clear that we need to emulate the channel state machine during the training as well, and that we should only feed the learner with data on non-trivial states.

b) Feed DT with only those timeslices where a decision is necessary: This approach is similar to the ITB method. The estimation is done in the first slice of inactive periods. The drawback of the strategy is that the input will not include regular data traffic. This strategy results in a learner that is less willing to keep the channel up.

[Figure 4: example input vectors, one row per time slice. Each row holds the features of the previous slice, the features of the current slice, and the class label; rows are marked as belonging to active or inactive periods.]

Columns, previous slice (22): prev_rnn, prev_in/active prediction, prev_block counter, prev_tcpactive, prev_byte, prev_packet, prev_parallel, prev_last_byte, prev_last_packet, prev_advertisement, prev_email, prev_file download, prev_instant messaging, prev_media playback, prev_social networking, prev_software update, prev_system, prev_video playback, prev_web browsing, prev_tcp, prev_udp, prev_other

Columns, current slice (20) and label: rnn, in/active prediction, block counter, tcp_active, byte, packet, parallel, advertisement, email, file download, instant messaging, media playback, social networking, software update, system, video playback, web browsing, tcp, udp, other, class

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8.01 1 1 0 6.57 2.08 4 0 0 0 0 0 0 0 1 0 0 0 1 1 d
8.01 1 1 0 6.57 2.08 4 5.8 1.1 0 0 0 0 0 0 0 1 0 0 0 1 1 8.23 0 2 0 5.54 1.61 2 0 0 0 0 0 0 0 1 0 0 0 1 1 d
8.23 0 2 0 5.54 1.61 2 6.57 2.08 0 0 0 0 0 0 0 1 0 0 0 1 1 6.55 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 d
6.55 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5.65 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 d
5.65 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6.31 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 d
6.31 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8.71 3 1 0 5.8 1.1 1 0 0 0 0 0 0 0 1 0 0 0 1 0 d
8.71 3 1 0 5.8 1.1 1 5.54 1.61 0 0 0 0 0 0 0 1 0 0 0 1 0 8.1 2 2 0 5.09 1.1 1 0 0 0 0 0 0 0 1 0 0 0 1 0 d
8.1 2 2 0 5.09 1.1 1 5.8 1.1 0 0 0 0 0 0 0 1 0 0 0 1 0 9.28 1 3 0 5.09 1.61 2 0 0 0 0 0 0 0 0 0 0 1 1 0 d
9.28 1 3 0 5.09 1.61 2 5.09 1.1 0 0 0 0 0 0 0 0 0 0 1 1 0 7.57 0 4 0 5.35 1.79 1 0 0 0 0 0 0 0 0 0 0 1 0 0 d
7.57 0 4 0 5.35 1.79 1 5.09 1.61 0 0 0 0 0 0 0 0 0 0 1 0 0 8.15 0 5 0 5.55 1.95 2 0 0 0 0 0 0 0 0 0 0 1 1 0 d
8.15 0 5 0 5.55 1.95 2 5.35 1.79 0 0 0 0 0 0 0 0 0 0 1 1 0 7.13 0 6 0 3.83 1.1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 d
7.13 0 6 0 3.83 1.1 1 5.55 1.95 0 0 0 0 0 0 0 0 0 0 0 1 0 6.11 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 d
6.11 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9.03 6 1 0 5.6 1.61 2 0 0 0 0 0 0 0 1 0 0 0 1 0 d

Fig. 4. An example of the input vector of the proposed system

3) c) Training with channel state simulation: Our proposal is to simulate the channel state machine and feed the DT accordingly. This approach simulates the effect of the regular downswitch timeout. It avoids feeding the training with a lot of inactive slices when the user gets detached for, e.g., half a day. The other important feature is that the training data should not contain input which can be observed only in the ideal case: the expected output of the prediction is the state of the system in the next 10 secs. In the ideal case there is no hint that there will be data transmission, but the expected output turns into indicating an active period. In practice no system can provide valuable output in this case, thus if the system is in idle state it is made to ignore the training input. This approach is the most effective in training the final decision tree.

The 'Training with channel state simulation' row in Figure 3 shows that with this latest strategy a 14% gain is achieved over the DS3 case, which is higher than the gain in the DS10 case.
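The selection of training slices with a simulated channel state machine can be sketched as below. This is a simplified two-state model (on DCH or not) with hypothetical names; it only captures the idea that slices seen after the simulated downswitch are not fed to the learner.

```python
def select_training_slices(activity, timeout_slices=20):
    """Indices of the slices to feed to the learner.

    A simulated UE is on DCH from an active slice until `timeout_slices`
    consecutive idle slices have passed; slices seen in the low state are skipped.
    """
    selected = []
    on_dch = False
    idle_run = 0
    for i, active in enumerate(activity):
        if active:
            on_dch, idle_run = True, 0      # traffic brings the UE (back) up
        elif on_dch:
            idle_run += 1
        if on_dch:
            selected.append(i)
            if idle_run >= timeout_slices:
                on_dch = False              # simulated downswitch by the timer
    return selected
```

A user who is detached for half a day therefore contributes at most `timeout_slices` idle slices per idle period instead of tens of thousands.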

4) Binary decision based on RNN estimation: We intended to exploit the statefulness of the RNN based method of Section III-B1 to further improve the power saving gain. We defined the following 4 parameters as input to the learner.

• Bandwidth estimation by an RNN based on the basic features and DPI, TCP activity, block counter. The real value is fed back to the system immediately in the next time slice.

• Active period estimation: At the beginning of each active period an RNN is queried to estimate the length of the current active period based on the previous observations. The information is updated from time slice to time slice (practically decreasing the estimation).

• Inactive period estimation: At the beginning of each inactive period an RNN is queried to estimate the length of the current inactive period based on the previous observations. The information is updated from time slice to time slice (practically decreasing the estimation).

• Block counter: shows the elapsed time since the start of the current block of consecutive active or inactive slices

As boosting is completely stateless by default, we fed it input vectors containing the features of the current time slice and of the one before. This puts the input vector into a more stateful context, making the learner more aware of whether there was a switch from/to active/inactive states. The summary of the features can be seen in Figure 4.
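The block counter and the pairing of each slice with its predecessor can be sketched as follows. This is a minimal illustration with hypothetical function names; padding the first slice with zeros for the missing predecessor is an assumption.

```python
def block_counters(activity):
    """Number of consecutive slices spent in the current active or inactive block."""
    counters, prev, run = [], None, 0
    for active in activity:
        run = run + 1 if active == prev else 1   # restart at 1 when activity flips
        counters.append(run)
        prev = active
    return counters


def stack_with_previous(rows):
    """Concatenate each slice's feature list with that of the slice before it."""
    if not rows:
        return []
    zero = [0] * len(rows[0])                    # no predecessor for the first slice
    return [prev + cur for prev, cur in zip([zero] + rows[:-1], rows)]
```

Stacking two consecutive slices lets a stateless learner see transitions, for instance an inactive slice that directly follows an active one.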

5) Summary of the proposed system: Figure 5 shows the modular view of our proposed multilayer method. At each packet, measurements run to collect basic data for the traffic features (like measuring the packet arrival time for the average inter-arrival time feature). At the end of each time slice (currently 500 ms) the timer fires and the feature calculator starts calculating the features from the data collected by the measurement module. Note that this is a parallel activity: the measurement module does not stop for the calculation period. Also note that the timer runs per user; this way we avoid bursts in feature generation. When the feature calculator is done, the basic features are sent to the layer 1 predictor (which is in our case a recurrent neural network, but other implementations are also possible) and also directly to the layer 2 predictor. The layer 1 predictor calculates three constructed features only for the next time slice: burst length, silence length and bandwidth. From the basic features coming from the feature calculator and the constructed features coming from the layer 1 predictor, the final decision is made in the layer 2 predictor (which is in our case a dynamic decision tree). Here the final decision is made on the traffic adaptive channel switching window level, meaning that we aim to predict whether there is any traffic in the next window. The window is defined as the time which must be spent in the lower power level state in order to make a channel switch worthwhile.
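The module structure described above can be sketched as a small per-user pipeline. This is not the paper's implementation: exponential smoothing stands in for the layer 1 RNN, only a bandwidth estimate is constructed, layer 2 is an arbitrary callable supplied by the caller, and all class and method names are hypothetical.

```python
class ExpSmoothing:
    """Stand-in layer 1 estimator (the paper's layer 1 is an RNN)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.estimate = 0.0

    def update(self, value):
        # Blend the new observation into the running estimate.
        self.estimate = self.alpha * value + (1 - self.alpha) * self.estimate
        return self.estimate


class SliceWorker:
    """Per-user pipeline: measure at every packet, decide once per time slice."""

    def __init__(self, layer2, alpha=0.5):
        self.bytes = 0
        self.packets = 0
        self.layer1 = ExpSmoothing(alpha)
        self.layer2 = layer2          # callable(features) -> True to switch down

    def on_packet(self, size):
        # Measurement module: cheap counters updated at every arriving packet.
        self.bytes += size
        self.packets += 1

    def on_slice_end(self):
        # Feature calculator: snapshot and reset the counters.
        basic = [self.bytes, self.packets]
        self.bytes = self.packets = 0
        # Layer 1: constructed feature (bandwidth estimate for the next slice).
        constructed = [self.layer1.update(basic[0])]
        # Layer 2: final decision from the basic plus the constructed features.
        return self.layer2(basic + constructed)
```

As a usage example, `SliceWorker(lambda f: f[2] < 10.0)` switches down once the smoothed byte count per slice drops below 10; after a single 100 byte packet this happens at the end of the fourth slice.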

6) Discussion on the performance of the proposed system: The 'Binary decision based on RNN estimation' row in Figure 3 shows that a 16% gain can be achieved compared to the DS3 scenario, and approx. 63% of the gain of the ideal case is achieved.

Fig. 5. The proposed system. [Block diagram: packet; measurement module (runs at each arriving packet); feature calculator module (runs at the end of each timeslice); layer 1 predictor (burst length, silence length, bandwidth; predicts for next slice); layer 2 predictor (final decision; predicts for next slice).]

Figure 6 shows the per user gain ratio with the proposed system and the ideal case. The X axis is the user ID, while the Y axis represents the gain compared to the DS3 case. The green line is the optimal case ((powercost_ideal − powercost_ds3)/powercost_ds3), the blue dots are the per user gains ((powercost_pred − powercost_ds3)/powercost_ds3) with the proposed timeslice based traffic adaptive channel switching method. Note that the users are ordered by their optimal gain.

The total CPU load gain of the RNC is the difference between the load gain achieved by our proposed system and the calculation load of the algorithm. We estimated the CPU and memory controller (MC) load of the algorithms based on [15]. The final proposed system contains both an RNN and a decision tree, thus their loads need to be added. Considering an average 3G setup, the CPU load of the RNC caused by our method is 1.4%. This can vary based on the RNC setup and load. The load gain of our proposed system depends only on user behavior, thus it can be considered fixed in this calculation. In summary, a 14.6% load gain (16% − 1.4%) can be achieved on an average RNC with our proposed system.
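The net figure follows from a simple subtraction, shown here as a small check. The function name `net_load_gain` is a hypothetical helper; the two inputs are the 16% gross gain and the 1.4% predictor load reported in the text.

```python
def net_load_gain(gross_gain_pct: float, predictor_load_pct: float) -> float:
    """Net RNC CPU load gain: gain from fewer switches minus predictor cost."""
    return gross_gain_pct - predictor_load_pct


# RNN plus decision tree: 16% gross gain, 1.4% estimated predictor load.
rnn_net_gain = net_load_gain(16.0, 1.4)
```

Because the predictor load depends on the RNC setup while the gross gain depends on user behavior, the two terms can be re-estimated independently for a different deployment.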

Another aspect that is important from the operator's point of view is the number of parallel users on the DCH channel. Network nodes are usually sold with licenses for a certain capacity, e.g., the number of parallel users in the system. The downswitch timer has a significant impact on these numbers. Figure 7 shows the number of parallel users on the DCH channel in our measurement, in which a user is considered active if it has a PDP context and occupies the DCH channel due to its recent data transmission activity. It is clearly visible that DS3 ensures the lowest number of parallel users, which operators prefer in their current setups. The DS10 case, which would result in the lowest RNC load, yields the highest number of parallel users. The optimal solution both minimizes the RNC load and ensures a low number of parallel users. Our proposed method ('RNN' line in Figure 7) offers a trade-off, with a lower number of parallel users than the DS10 case and a lower RNC load than the DS3 case.

C. Further test scenarios

We experimented with further test scenarios to increase the power gain further, either with more accurate estimation or by applying estimators of lower calculation complexity.

First we removed the DPI information from the whole system. We were convinced that DPI provides important information to the ML algorithms: one useful property of boosting is that it practically performs feature selection during learning, as the features in every learning iteration are selected based on their information content, and DPI flags were always present among the most important features. On the other hand, when the cost gain was calculated with the DPI flags removed from the system, the overall gain over the DS3 case was also 15% ('without DPI' row in Figure 3). The lesson learned is that DPI is useful for the ML algorithms, but there are other features in the system that can effectively substitute for it. Without DPI the RNC can save CPU usage.²

Fig. 6. Per user gain ratio with the proposed system. The green line shows the gain with the ideal solution; the blue dots show the predictor based gain.

We made experiments to substitute the complex RNN algorithm with a simpler one, e.g., exponential smoothing. First, bandwidth estimation was done with exponential smoothing and a simple threshold (as in Section III-B1). The overall gain was 8% ('exp smoothing' row in Figure 3) compared to the DS3 case, which is only slightly worse than the RNN case ('rnn estimation', 10%). Second, the RNN in the final proposed system was substituted by exponential smoothing algorithms, and the tests showed that the gain compared to the DS3 case dropped by only 1%, from 16% to 15%. The 'shaper' line in Figure 7 shows that the load saving is paired with a decreased number of parallel users on the DCH channel, which can be considered a further advantage of this simple algorithm besides its low calculation complexity. The calculation load of the shaper based algorithm is practically that of executing the decision tree, with an estimated 0.1% CPU load. In total, a 14.9% load gain (15% − 0.1%) can be realized. It is also important from an industrial point of view that the calculation and implementation complexity of exponential smoothing is significantly lower than that of an RNN.
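Exponential smoothing with a simple threshold can be sketched as follows. This is an illustration of the general technique, not the estimator used in the measurements: the smoothing factor `alpha`, the threshold value and the example byte counts are assumptions chosen for readability.

```python
from typing import Iterable, List


def exp_smooth(values: Iterable[float], alpha: float) -> List[float]:
    """Exponentially smoothed estimate after each observed time slice."""
    estimate = None
    estimates = []
    for v in values:
        # The first observation initializes the estimate.
        estimate = v if estimate is None else alpha * v + (1 - alpha) * estimate
        estimates.append(estimate)
    return estimates


def stay_on_dch(estimate: float, threshold: float) -> bool:
    """Simple threshold rule: stay up while the estimate is high enough."""
    return estimate >= threshold


# Hypothetical bytes per slice: one burst followed by silence.
bytes_per_slice = [1000.0, 0.0, 0.0, 0.0]
estimates = exp_smooth(bytes_per_slice, alpha=0.5)
decisions = [stay_on_dch(e, threshold=200.0) for e in estimates]
```

During silence the estimate decays geometrically, so the threshold effectively acts as an adaptive downswitch delay: a larger burst keeps the user on the high-bandwidth channel for more slices than a small one.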

We also experimented with increasing the model size of the learner. It is important to note that such an increase of the model size would result in overfitted models. The 'big model' row in Figure 3 shows a 1% gain increase compared to the 'binary decision based on RNN estimation' case, resulting in a 16% gain in total compared to the DS3 case. We also checked what happens if the training and testing data are used in a 1:1 ratio. The 'big model, 1:1 train:test' row shows a 1% drop compared to the 'big model' case, which shows clearly that both overfitting of the model and changes in user behavior have a negative effect on the overall performance.

² Note that although we considered the calculation cost of DPI to be zero in the paper, as DPI can either be supported by hardware elements or the DPI information can be gathered from other nodes of the network, DPI still has some calculation cost.

[Figure 7: line plot of the number of users on DCH (Y axis, 0 to 7000) over time in seconds (X axis, 1 to 3497) for the ds3, ds10, shaper, rnn and opt cases.]

Method    Avg. user#   Gain (x/DS3 ratio)
DS 3      4477.97      1
DS 10     5633.83      0.2581
shaper    5030.96      0.1234
rnn       5429.11      0.2124
optimal   4720.74      0.0542

Fig. 7. Parallel number of users on DCH channel

Finally, we tested online decision tree builder algorithms to see how a system without a training phase could perform. We used the MOA package [16] with the following parameters: '-javaagent:sizeofag.jar moa.DoTask "EvaluatePrequential -l OzaBagAdwin -s (ArffFileStream -f input) -f 100 -q 100"'. The 'Adaptive tree' row in Figure 3 shows a 6% gain over the DS3 case, which is a big drop in accuracy compared to the 15% achieved with the same architecture.
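Prequential evaluation means that every instance of the stream is first used to test the current model and only then to train it, so no separate training phase is needed. The sketch below illustrates this test-then-train loop in plain Python; it does not call MOA, and the `MajorityClassLearner` is a deliberately trivial stand-in for illustration, not the OzaBagAdwin ensemble used above.

```python
from collections import Counter
from typing import Any, Iterable, Tuple


class MajorityClassLearner:
    """Toy online learner: always predicts the most frequent label seen so far."""

    def __init__(self, default_label: int = 0) -> None:
        self.counts: Counter = Counter()
        self.default_label = default_label

    def predict(self, x: Any) -> int:
        if not self.counts:
            return self.default_label
        return self.counts.most_common(1)[0][0]

    def learn(self, x: Any, y: int) -> None:
        self.counts[y] += 1


def prequential_accuracy(stream: Iterable[Tuple[Any, int]], learner) -> float:
    """Interleaved test-then-train accuracy over a labeled stream."""
    correct = 0
    total = 0
    for x, y in stream:
        correct += int(learner.predict(x) == y)  # test on the instance first
        learner.learn(x, y)                      # then train on it
        total += 1
    return correct / total if total else 0.0


# Hypothetical stream: label 1 means "traffic in the next window".
stream = [(None, 1), (None, 1), (None, 0), (None, 1)]
accuracy = prequential_accuracy(stream, MajorityClassLearner())
```

Since early predictions are made by a barely trained model, prequential scores are typically pessimistic at the start of the stream, which is one plausible contributor to the lower gain of a system without a training phase.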

IV. CONCLUSION

In this paper we proposed a system to decrease the CPU load of an RNC by applying a predictor based method to estimate an adaptive downswitch timer for the DCH to URA PCH switches. This is an optimization task, where both the channel switch and staying on the high-bandwidth channel have costs. Our proposed system works in a multilayer predictor architecture. After the traffic measurement, the basic features are forwarded to the layer 1 predictor, which is in our case an RNN, and also directly to the layer 2 predictor. The layer 1 predictor calculates three constructed features for the next timeslice: burst length, silence length and bandwidth. The layer 2 predictor is responsible for making the final decision based on the basic features coming from the feature calculator and the constructed features coming from the layer 1 predictor. The granularity of the final decision is the size of the traffic adaptive channel switching window. The window is defined as the time that must be spent in the lower power level state in order to make a channel switch worthwhile. The proposed system predicts whether there is any traffic in the forthcoming window. The proposed system includes a training setup to maximize the performance of the ML-based methods by simulating the DCH state machine and feeding the training accordingly.

The proposed method was evaluated with an operational mobile network measurement, and the results showed that the achievable CPU load gain in the RNC is approx. 15% compared to the CPU load of the default 3 sec downswitch case, which is 63% of the possible gain of the ideal case. The proposed method also ensures a low number of parallel users on the DCH channel, which is important for the operators.

REFERENCES

[1] “3GPP,” http://www.3gpp.org/.

[2] F. Qian, Z. Wang, A. Gerber, Z. M. Mao, S. Sen, and O. Spatscheck, “Characterizing Radio Resource Allocation For 3G Networks,” in Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, ser. IMC ’10. New York, NY, USA: ACM, 2010, pp. 137–150. [Online]. Available: http://doi.acm.org/10.1145/1879141.1879159

[3] L. Zhang, B. Tiwana, R. Dick, Z. Qian, Z. Mao, Z. Wang, and L. Yang, “Accurate Online Power Estimation And Automatic Battery Behavior Based Power Model Generation For Smartphones,” in Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2010 IEEE/ACM/IFIP International Conference on, Oct. 2010, pp. 105–114.

[4] S.-R. Yang, “Dynamic Power Saving Mechanism for 3G UMTS System,” Mob. Netw. Appl., vol. 12, no. 1, pp. 5–14, Jan. 2007. [Online]. Available: http://dx.doi.org/10.1007/s11036-006-0002-0

[5] F. Liers and A. Mitschele-Thiel, “UMTS data capacity improvements employing dynamic RRC timeouts,” in Personal, Indoor and Mobile Radio Communications, 2005. PIMRC 2005. IEEE 16th International Symposium on, vol. 4, Sept. 2005, pp. 2186–2190.

[6] H. A. Lagar-Cavilla, K. Joshi, A. Varshavsky, J. Bickford, and D. Parra, “Traffic backfilling: Subsidizing lunch for delay-tolerant applications in UMTS networks,” in Proceedings of the 3rd ACM SOSP Workshop on Networking, Systems, and Applications on Mobile Handhelds, ser. MobiHeld ’11. New York, NY, USA: ACM, 2011, pp. 11:1–11:5. [Online]. Available: http://doi.acm.org/10.1145/2043106.2043117

[7] A. Schulman, V. Navda, R. Ramjee, N. Spring, P. Deshpande, C. Grunewald, K. Jain, and V. N. Padmanabhan, “Bartendr: A practical approach to energy-aware cellular data scheduling,” in Proceedings of the sixteenth annual international conference on Mobile computing and networking, ser. MobiCom ’10. New York, NY, USA: ACM, 2010, pp. 85–96. [Online]. Available: http://doi.acm.org/10.1145/1859995.1860006

[8] V. Looga, X. Yu, O. Zhonghong, and A. Yla-Jaaski, “Exploiting traffic scheduling mechanisms to reduce transmission cost on mobile devices,” in Wireless Communications and Networking Conference (WCNC), 2012 IEEE, April 2012, pp. 1766–1770.

[9] “Z. Zhao, J. Zhao: Cut the Tail: Mobile Energy Saving Using Radio Tail Prediction, for EECS589, Project Report,” retrieved: Jan, 2013. [Online]. Available: http://www-personal.umich.edu/~zhezhao/papers/EECS589.pdf

[10] N. Reider, A. Racz, and G. Fodor, “On scheduling and power control in multi-cell coordinated clusters,” in Global Telecommunications Conference, 2009. GLOBECOM 2009. IEEE, Nov. 30–Dec. 4, 2009, pp. 1–7.

[11] P. H. J. Perala, A. Barbuzzi, G. Boggia, and K. Pentikousis, “Theory and practice of RRC state transitions in UMTS networks,” in Proc. of 5th IEEE Broadband Wireless Access Workshop co-located with IEEE Globecom, BW-WAWS, Honolulu, HI, USA, Nov. 2009.

[12] “Weka 3: Data Mining Software in Java,” retrieved: Oct, 2011. [Online]. Available: http://www.cs.waikato.ac.nz/ml/weka/

[13] “MultiBoost: Data Mining Software in Java,” retrieved: Jun, 2012. [Online]. Available: http://mloss.org/software/view/246/

[14] “Echo State Network Toolbox,” retrieved: Jun, 2012. [Online]. Available: http://sourceforge.net/projects/esnbox/

[15] “Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs,” retrieved: March, 2013. [Online]. Available: http://www.agner.org/optimize/instruction_tables.pdf

[16] “MOA Massive Online Analysis: Real Time Analytics for Data Streams,” retrieved: Jun, 2012. [Online]. Available: http://moa.cs.waikato.ac.nz/