Technical Report - University of Minnesota · Understanding the complexity of 3G UMTS network performance - Modeling, Prediction, and Diagnosis Technical Report Department of Computer

Understanding the complexity of 3G UMTS network performance - Modeling, Prediction, and Diagnosis

Technical Report

Department of Computer Science

and Engineering

University of Minnesota

4-192 Keller Hall

200 Union Street SE

Minneapolis, MN 55455-0159 USA

TR 12-019

Understanding the complexity of 3G UMTS network performance -

Modeling, Prediction, and Diagnosis

Yingying Chen, Nick Duffield, Patrick Haffner, Wen-ling Hsu, Guy

Jacobson, Yu Jin, Subhabrata Sen, Shobha Venkataraman, and Zhi-li

Zhang

July 30, 2012

Understanding the Complexity of 3G UMTS NetworkPerformance - Modeling, Prediction, and Diagnosis

Yingying Chen2, Nicholas Duffield1, Patrick Haffner1, Wen-Ling Hsu1, Guy Jacobson1, Yu Jin1, Subhabrata Sen1,

Shobha Venkataraman1, and Zhi-Li Zhang2

1AT&T Labs Research2University of Minnesota-Twin Cities

ABSTRACTWith rapid growth in smart phones and mobile data, effec-tively managingcellular datanetworksis important in meet-ing user performance expectations. However, thescale, com-plexity and dynamics of a large 3G cellular network makeit a challenging task to understand the diverse factors thataffect its performance. In this paper we study the RNC(Radio Network Controller)-level performancein oneof thelargest cellular network carriers in US. Using large amountof datasetscollected fromvarious sourcesacrossthenetworkand over time, we investigate the key factors that influencethe network performance in terms of the round-trip timesand lossrates (averaged over an hourly timescale). We startby performing the “first-order” property analysis to analyzethe correlationandimpact of each factor onthenetwork per-formance. We then apply RuleFit – a powerful supervisedmachine learning tool that combines linear regression anddecision trees – to develop models and analyze the relativeimportance of various factors in estimating and predictingthe network performance. Our analysis culminates with thedetection and diagnosis of both “ transient” and “persistent”performance anomalies, with discussion onthe complex in-teractions and differing effects of the various factors thatmay influencethe3G UMTSnetwork performance.

1. INTRODUCTIONThe wide adoption of smart phones and other mo-

bile devices such as smart tablets and e-readers hasspurred rapid growth in mobile data. In order to meetuser performance expectations and enhance user expe-riences, effectively managing cellular data networks isimperative: for example, quickly trouble-shooting per-formance issues as they arise, or adding additional ca-pacity where the network elements are overloaded. Dueto the sheer scale, complexity and dynamics of a typi-cal large scale cellular data network, there is a myriadof diverse factors that may affect the performance of acellular data network, from the types, geographical lo-cations and coverage of various network elements (e.g.,cell towers/base stations, radio network controllers, IP

gateways or routers) within the network infrastructure,to the number of users served by each network element,the amount of traffic generated by the users, user us-age patterns and behaviors (e.g., mixtures of dominantapplications) across different times of days and weeks,to user handset side as well as the (application) serverside issues.In this paper, utilizing the massive amount of per-

formance and other data collected in one of the largestcellular network carriers in US, we set out to analyzeand understand the various key factors that may influ-ence the network performance in a large UMTS cellulardata network. For instance, we are interested in un-derstanding how the geographical coverages of nodeBs(i.e.,, base stations; please refer to Section 2 for a quickoverview of the UMTS architecture and its basic ter-minologies) and their distances to corresponding RNCsand/or SGSNs in the radio access network subsystemsin general affect the (average) round trip delays experi-enced by users, how the overall system utilization (e.g.,in terms of overall packet, byte or flow counts), thenumber of subscribers and their usage patterns (e.g.,the application mix) may have an impact on the (aver-age) loss rates across the network, how the time-of-dayor day-of-week effect correlate with the overall networkperformance over time, and so forth. Furthermore, weare interested in dissecting how the confluence of vari-ous factors may interact or counteract with each otherin different parts of the network, thus having a differingeffect on the overall network performance across the net-work. For these purposes, we aim to build macroscopicmodels that can help identify and access the various ma-jor factors that may (or may not) influence the networkperformance, and predict their effects across the net-work and over time. (This is as opposed to microscopicmodels that target specific network elements or users,e.g., to trouble-shoot (transient or persistent) perfor-mance issues experienced by certain network elements(e.g., cell towers) or users. Building effective micro-scopic models requires far more detailed, fine-grained

1

(and perhaps also lower-level) measurement data.) Ourgoal is to provide network operators with a “big pic-ture” understanding of the major factors that influencethe overall network performance across different partsof the network and over longer time periods.To understand the diverse factors that may influence

the network performance, we gather and collect infor-mation and data from a variety of sources in the large3G UMTS cellular network. These include “static” in-formation regarding the UMTS infrastructure, such asthe GPS location of each NodeB, the number of sectorsper NodeB, the corresponding RNCs that NodeBs areassociated with, the SGSNs that RNCs are connectedto, the equipment type and vendor of each RNC, andso forth. We also collect IP flow-level data at eachSGSN/GGSN, where each TCP/UDP flow can be at-tributed to one of the many RNCs connecting to anSGSN. From the flow-level data, we can obtain impor-tant usage-related factors such as the packet counts,byte counts, and flow counts as well as the average num-ber of bytes per flow, e.g., for each RNC in every hour.In addition, we classify flows based on the known ap-plications such as web, email, VoIP, streaming video,MMS (multimedia messaging service), Jabber (anothermessaging service), Appstore downloads, etc.; the appli-cation mix therefore reflects the user behavior at vari-ous locales. Clearly, the factors obtained using the flow-level data are time-varying, and thus dynamic in nature.Both to reduce the amount of data and to provide a“bigpicture”view of the large cellular network, we aggregatethe flow-level statistics at the RNC-level and computethe averages on an hourly basis. Last but not the least,as two important indicators of the overall network per-formance, we consider round-trip times (RTTs) and lossrates, averaged on an hourly basis and aggregated at theRNC-level. The RNC-level, hourly averaged RTTs andloss rates are calcuated based on the average RTT andloss rate of each TCP flow belonging to an RNC [24].Clearly, one may expect that certain factors (e.g., dis-

tance of nodeBs to an RNC or SGSN) are more im-portant in terms of their contribution to the overalluser delay performance, while others (e.g., heavy us-age of stream videos) can be more critical in terms oftheir impact on the loss rate performance. Unfortu-nately, due to its vast diversity and complexity (for in-stance, the numbers of NodeBs and sectors as well asthe geographical coverage can vary significantly fromone RNC to another, and the usage patterns and ap-plication mixes differ also markedly across the networkand over time), teasing out each factor individually isnot an easy task. Not only are there a vast array ofdiverse factors, but many of these factors are also inter-twined, effecting disparate influences in different partsof the network. Furthermore, there may be latent fac-tors that are not explicitly accounted for, or captured

at all, in the information and data we have collected.To circumvent these challenges, our approach is to

first take several major classes of factors and performnetwork-wide correlation and other “first-order” analy-sis to understand how significantly each of these majorfactors individually influence the overall RNC-level net-work performance (in terms of hourly RTTs and lossrates). This provides us with a baseline understand-ing of the individudal effect (or lack thereof) that eachof the major classes of factors has on the overall net-work performance at the RNC-level. Next, to examinethe collective effect of various factors on the networkperformance and sort out their relative contributions,we apply RuleFit [12] – a powerful predictive learningmethod via rule-emsembles, combining both linear re-gression and decision trees. RuleFit provides both pre-dictive and interpretative capabilities that are neededfor our analysis. By aggregating as well as dividing thedatasets along different dimensions (e.g., based on geo-graphical locations such as states or along time such asdays or weeks), we intelligently apply RuleFit to buildmacroscopic models to assess and dissect the variousmajor factors that affect part or the whole network, andpredict their impact on the network performance overtime. For example, by analyzing the network perfor-mance predictions over time and comparing the modelobtaining using the datasets from all RNCs with thoseobtained from the RNCs located within certain geo-graphical regions, we can identify the major factors thathave persistent network-wide effect as well as uncoverthose factors that have marked impact at certain localesor at certain times. Furthermore, by examining the levelof the overall contributions of all factors as well as therelative importance of each factor and how they differacross locales or over times, we can also infer whetherthere are potential latent factors in play, affecting thenetwork performance in a way that cannot be quantifiedusing the factors explicitly included in the models. Sig-nificant deviations from the model predictions may alsosignal anomalous events, and can be used for diagnosispurposes (see Sections 6, 7 for details).In summary, the models and insights obtained from

our study provide the network operators with a “bigpicture” understanding of the overall RNC-level RTTand loss rate performance in a large 3G cellular net-work and the various factors that may influence suchperformance, thereby the user experiences. Such a “bigpicture”understanding can help network operators bet-ter cope with the management challenges posed by thescale and complexity of a large cellular network. Forexample, our relative importance analysis reveals thatthe placement of the network facilities has a major effecton the RTT performance; whereas loss rate performancemay hinge on the collective effect of the type of equip-ment being used, the usage patterns such as the appli-

2

cation mix, and flow size, etc. The factors identifiedby models suggest that the network operators can re-duce latency by improving the network coverage in someparts of the network, or add additional network capacityin other parts of the network to meet the user demandsdue to the increasing usage of bandwidth-intensive ap-plications. While our study is focused primarily onmacroscopic models, as the results in Section 7 illus-trate, they may also help diagnose and trouble-shoot thenetwork performance issues associated with certain partof the network or specific network elements by detectinganomalies and pointing to potential latent factors thatare in play. Finally, we believe that the methodologydeveloped in this paper can be applied to other 3G net-work architectures, and possibly also the emerging 4G(LTE) cellular networks.

2. BACKGROUND AND DATASETSIn this section, we give a brief overview of the ar-

chitecture of the typical 3G UMTS network, and thedatasets we collected for our study.

2.1 UMTS Network Overview

Figure 1: UMTS network architecture

As illustrated in Figure 1, a typical UMTS networkconsists of two major entities, the UMTS Terrestrial Ra-dio Access Network (UTRAN) and the core network.UTRAN consists of base stations (NodeB), and Ra-dio Network Controllers (RNC) which are connectedto and control a number of NodeBs. Each NodeB istypically configured with multiple sectors, e.g., 3 sec-tors (the common configuration) in three different di-rections, each covering 120 degree range. If there aremore than three sectors associated with one NodeB,multiple sectors are overlapped on each direction, andare distinguished using different frequencies. Moreover,NodeB usually supports multi-carrier technology. Viasoftware configuration, each sector can support multi-ple carriers in order to further increase its capacity. Thecore network is comprised of the Serving GPRS SupportNodes (SGSN) and the Gateway GPRS Support Nodes(GGSN). To connect to, say, a web server located in theoutside Internet, a User Equipment (UE) will first con-tact the nearest NodeB. After receiving the user data

access requests on one of its sectors, NodeB will han-dover the user requests to its upstream RNC, whichfurther forwards the data service requests to an SGSN.In the core network, the SGSN establishes a tunnel witha GGSN, using the GPRS Tunneling Protocol (GTP).The data is carried as IP packets in the tunnel and fi-nally reaches the web server in the external Internet.

2.2 DatasetsFor this study, we combine datasets collected from

a variety of sources in order to gain a more compre-hensive view of the various factors that may influencethe network performance. As mentioned in the intro-duction, these data sources include “static” informa-tion regarding the UMTS infrastructure, such as theGPS location (and zipcode) of each NodeB, the num-ber of sectors (and carriers) per NodeB, the correspond-ing RNCs that NodeBs are associated with, the SGSNsthat RNCs are connected to, the equipment type andvendor of each RNC, and so forth. To gain a senseof the population density and demographics in eachbase station/RNC coverage area, we also utilized the2000 census data1, which contains the land area cover-age of each zip code. From these static data sources,we can estimate the geographic areas that are coveredby each NodeB, RNC and SGSN, and get a sense ofthe population density and other demographic informa-tion (e.g., rural vs. small city vs. large metro, etc.).Therefore, these static factors can help us dissect andunderstand the impact from the geographical coverage,regional variations, the placement of the networking fa-cilities, and other infrastructure-related issues.There are two major sources of dynamic data that are

used in our study. One is the IP traffic data collectedperiodically at each GGSN. Because of the large volumeof the traffic, all the measurements were computed atthe RNC level, i.e. aggregated for all the users served bythe same RNC, and at the granularity of an hour. (Thetime mentioned in this paper always refers to the localtime of the RNC location.) From the IP traffic data, weobtain the RTT and loss rate performance measurementdata and the usage related statistics such as the numberof bytes, flows, packets, average flow sizes, and so forth,as well as the application classifications and mixes (e.g.,email, VoIP, streaming video, MMS, Jabber, Appstoredownloads) – traffic that cannot be classified is labelledunknown. The other dynamic data source contains morelower level information, such as the total number of Ra-dio Resource Control (RRC) [1] attempts served by eachRNC at every hour. An RRC attempt indicates a con-nection establishment attempt between the UEs and the1While the census data is almost 12 years old, the landarea coverage per zip code does not change drastically overthe years. Moreover, we do not make any direct conclusionbased on the census data, but rather combine it with otherdata and techniques for our study.

3

UTRAN, and therefore the #RRC attempts can be agood approximation of the #requests served at eachRNC. The datasets collected span more than 6 months.However, as representative examples and for illustrativepurposes, we will focus on the datasets collected duringa two-week period in September 2011. To adhere to theconfidentiality under which we had access to the data,at places, we present normalized views of our resultswhile retaining the scientifically relevant bits.

3. PROBLEM SETTING & ILL USTRATION

Elapsed time (hr)

RN

C in

dex

50 100 150 200 250 300

Low

High

Figure 2: RTT time series across all RNCs.

Due to the sheer scale, complexity and dynamics ofthe large 3G UMTS cellular network, there are a myriadof complex factors that may potentially affect or influ-ence the overall network performance. Some of thesefactors may depend on other factors, and interact witheach other differently in different parts of the networks.To identify the major factors and tease out the effect ofeach factor – and understand how they affect the overallnetwork performance in different parts of the networkand over time – are extremely challenging tasks. In thispaper, we consider RTT and loss rate (aggregated at theRNC-level and averaged hourly) as two key performanceindicators. In Figures 2, we plot the RTTs of all RNCsin the network over a two-week time period. The x-axisis the elapsed time in hours since 12AM on Sep.2nd,2011. The y-axis is the index of all the RNCs, whichare grouped based on the state they (primarily) serve.The colorbar on the right is the indicator of the RTTperformance level. 2 As we observe from the figure, notonly that RNCs have large variability across times (thetemporal variability), e.g., a strong diurnal pattern, butalso across different RNCs (the spatial variability). Forloss rate, we observe similar trends. Moreover, the RTTand loss rate time series are positively correlated in aconsiderable manner. The correlation coefficient of theaverage RTT and loss rate (averaged across all RNCs)is 0.7 (see Table 1, which also includes the pairwise cor-relation coefficients of RTT, loss rate and other usage

2Given the sensitivity of the data, and the privacy of theusers, the abosolute values of the metrics throughout ourstudy are anonymized.

statistics: flow size – average flow size, #bytes – thebytecount, # flows – the flow count at each RNC perhour, averaged across all RNCs ). This suggests thattime dependent factors are possibly one of the maindrivers for the performance variation over time.On the other hand, the spatial variability of RTTs

and loss rates cannot be attributed to time dependentfactors. Clearly some geograpphic conditions and location-dependent factors (e..g., varying usage patterns or dif-ferent mixtures of applications served by different RNCs)may come into play here. In terms of their spatial vari-ability, the RTT and loss rate have very weak correla-tion (in most hours), with values generally smaller than0.4 and sometimes negatively correlated (see Figure 3).In other words, suppose RNC A has larger RTT thanRNC B at a particular hour, it does not necessarily im-ply that A is also likely to have a larger loss rate than Bin the same hour. This suggests that there probably ex-ists two separate sets of factors which contribute to thespatial diversity in the RTT vs. loss rate performance.

1 25 49 73 97 121 145 168

−0.2

0

0.2

0.4

0.6

Elapsed hours

Cor

rela

tion

coef

ficie

nt

Figure 3: Diurnal pattern in the correlations be-

tween RTT and loss rate spatial variability.

It is aslo interesting to observe that although theirspatial variability is weakly correlated, there exhibits adistinct diurnal pattern, as clearly shown in Figure 3.This also holds true in terms of the spatial variabil-ity of other usage metrics, such as the #bytes, #flows,flowsize. These observations indicate that the overallnetwork performance, whether RTT or loss rate, canbe affected by a combination of many factors. Theirrelative importance or contribution to the network per-formance may also differ across the network, and varyover time.

4. FIRST-ORDER PROPERTY ANA LYSISIn this section we start by looking into several major

classes of factors that are expected to have likely influ-ence on the RNC-level network performance. These in-clude various user usage-related factors, infrastructure-related factors (e.g., locations, placement and cover-age of network facilities) as well as device factors (type

4

Table 1: Correlation coefficient of all pairwise

temporal variability.

RTT loss flowsize #bytes #flows

RTT 1 0.7017 0.2528 0.9763 0.9746

loss 0.7 1 -0.3794 0.6582 0.7606

flowsize 0.2528 -0.3794 1 0.3269 0.1619

#bytes 0.9763 0.6582 0.3269 1 0.9839#flows 0.9746 0.7606 0.1619 0.9839 1

and vendor of network equipment). For each of them,we present metrics for characterization and estimation(where needed), and perform network-wide correlationand other “first-order” analysis to get a better sense ofhow they positively (or negatively) influence the perfor-mance. This provides us with a baseline understandingof the individudal effect (or lack thereof) that each ofthe major classes of factors has on the overall networkperformance at the RNC-level. Clearly, the complexityand diversity of the network making it almost impossi-ble to exhaust all possible factors and perform similaranalysis.

4.1 UsageFactors

0

0.2

0.4

0.6

0.8

1

Number of bytes

CD

F

Figure 4: Distribution of traffic load on all RNCs

at 11AM.

User behaviors can affect the performance in a num-ber of ways due to the varying traffic load and appli-cation mix over different time periods (e.g., the timeof the day or the day of the week) and across differ-ent parts of the network (as represented by the RNCs).Foremost, it directly shapes the diurnal patterns seenin the performance. As shown in Table 1, the num-ber of bytes or flows served by each RNC has strongcorrelation with the performance, in particular, RTT.Besides, the traffic load also has a large variation acrossdifferent RNCs, The cumulative distribution of the totalnumber of bytes are depicted in Figure 4. The largestbyte counts can be 10 times the smallest byte countsper hour. A similar distribution is also observed for thenumber of RRC establishments seen on these RNCs perhour, which can be a good approximation of the num-ber of active subscribers and their data accesses at thathour. This large variation of traffic load across RNCs

is an illustration of the huge diversity (in terms of usersand their data access activity) that exists in the largeUMTS network.

RNC index

Per

cent

age

of a

pplic

atio

ns

web

stream

phone

email

appstore

unknown

Figure 5: Fractions of major applications on all

RNCs at 11AM.

In addition to the traffic load, application mix also ex-hibits large variation and diversity over time and acrossthe RNCs. There are altogether 16 known categories ofapplications, and one unknown category. Most RNCssee a significant increase in streaming and appstore traf-fic during the nighttime, whereas during the daytime,web, email and other traffic tend to dominate. Thebreakdowns of the major applications such as web, stream-ing, smart phone, email, appstore, as well as the un-known traffic across different RNCs are shown in Fig-ure 5. The x-axis is the index for all the RNCs, wherethe RNCs within the same state are grouped closer toeach other. The composition of the traffic exihibits aclear geographical distinction. While most RNCs con-tain a large portion of streaming traffic, several clus-ters of RNCs contain much larger portions of unknowntraffic. A closer look at the dataset reveals that theyrepresent RNCs coming from 14 different states. Aswill be discussed later, these states are also among thosethat tend to persistently suffer the worst network per-formance in terms of loss rate.

4.2 Infrastructural Factors

Number of RABs

Num

ber

of s

ecto

rs

Number of RABs

Num

ber

of N

odeB

s

Figure 6: Correlation of user requests and #sec-

tors deployed.

To meet the diversified user demands at different RNCs,the number of NodeBs, sectors, and carriers associatedwith each RNC also vary accordingly, as manifested inFigure 6. For those regions with a large or an increas-ing number of UMTS subsribers, multiple NodeBs are

5

RNC index

Cov

erag

e pe

r N

odeB

RNC index

RT

T

Figure 7: The effect of NodeB coverage on RTT.

Flows per sector

RT

T

Flows per sector

Loss

rat

e

Figure 8: The impact of the number of flows per

sector on performance.

likely deployed in the same geographical location. Boththe number of devices being placed and the geographi-cal placement strategies used can play a crucial role ininfluencing the network performance experienced by theusers in each region.First, the more sectors or carriers are deployed, the

less load will incur on each carrier or sector. The impactof the load per sector or per carrier, as characterized bythe number of flows per sector, on the network perfor-mance is shown in Figure 8 using a one-day dataset.The basic trend indicates that larger flows per sectortends to incur larger RTT and loss rate. Note that itis not a perfect linear curve, as the number of flows persector is not the sole factor affecting the network perfor-mance. Other factors such as the varying behaviors ofusers at different hours also come into play. Other met-rics such as the number of bytes per sector, users persector, as well as the load on the carrier level exhibitsimilar impact on the network performance.Second, an increasing number of NodeBs within a ge-

ographical area of a fixed size can potentially improvethe RTT performance in that it leads to a decrease inthe coverage area per NodeB, and therefore smaller“lastmile” network latency between UEs and NodeBs. Toconfirm and verify this intuition, we estimate the cov-erage area of each nodeB (which is not directly accessi-ble from our data) using a combination of informationfrom the census data and inference using the VoronoiDiagram (based on the GPS locations of NodeBs). Thecensus data provides us with information such as theland area covered by each zip code. Along with the zipcode information for each NodeB, we can approximatelycompute the coverage area per NodeB. However, sincemultiple NodeBs can be deployed even within the same

zip code area, as such this approximation may over-estimate the real coverage area of a NodeB. To over-come this problem, we also apply the Voronoi Diagramto demarcate the serving area per NodeB, based onthe GPS locations of the NodeBs. The limitation withthis method is that not all US land areas (e.g., desertsand other uninhabitable areas) may be covered by theNodeBs. (Another technicality due to the use of theVoronoi Diagram is that the NodeBs on the US bound-ary are considered to have an“infinite”area – this effectcan be mitigated by explicitly including the contour ofthe US boundary into the estimation process.) There-fore, both methodologies may overestimate the coveragearea in some way. For this reason, we infer the coveragearea per NodeB by taking the minimum of the resultsobtained by both methods. In Figure 7, the impact ofour estimated NodeB coverage area on the RTT per-formance is plotted. As expected, RNCs with NodeBsthat have smaller coverage areas tend to have smalleraverage RTT performance.Similar to the location and placement of NodeBs,

the number of RNCs deployed in a geographical areaand the coverage areas of RNCs are likely driven bythe user/subscriber population and the load they gen-erate. The same also holds true for SGSNs, but to alesser degree. In general, due to their smaller numbers,RNCs and SGSNs are not as geographically spreadedas NodeBs. For instance, RNCs and SGSNs may beplaced in certain selected locations for ease of manage-ment. Overall, the distance between RNCs and SGSNsare usually stable, and there is far less variability in thedistance between RNCs and SGSNs than there is be-tween NodeBs and RNCs. The latter depends highlyon the geograghical coverage of each RNC, and variesfrom region to region. Nonetheless, both the distancebetween NodeBs and RNCs and the distance betweenRNCs and SGSNs can potentially have a significant im-pact on the network performance, especially the RTTperformance.

4.3 DeviceFactorsIn addition to the aforementioned factors, the choice

of vendor for the networking devices such as NodeBsand RNCs may also have an effect on the network per-formance, as illustrated below. This may be due tothe fact that different vendors can have fairly differenthardware specifications [6], resulting in different dataprocessing speeds and capacities.In the large cellular network studied in this paper, two

major types of devices are used, vendor-x and vendor-y.The type of NodeB is usually chosen to be the same asthe RNC it connects to, most likely due to the hard-ware level compatibility issue. The comparison of theRTT and loss rate performance distributions of thesetwo types of RNCs at two different hours is shown in

6

0

0.2

0.4

0.6

0.8

1

RTT

CD

FRTT (4AM)

0

0.2

0.4

0.6

0.8

1

RTT

CD

F

RTT (12PM)

0

0.2

0.4

0.6

0.8

1

Loss Rate

CD

F

Loss rate (4AM)

0

0.2

0.4

0.6

0.8

1

Loss Rate

CD

F

Loss rate (12PM)

vendor−yvendor−x




Figure 9: Comparing the performance for two major types of devices.

Figure 9, one for 12PM and the other for 4AM. Whilethe RTT performance distribution is quite similar forboth types for most of the 24 hours, all RNCs of vendor-x outperform those from vendor-y in terms of their lossrate. Moreover, such contrasting performance is per-sistently observed across all 24 hours. More detailedinvestigation reaveals that the maximum capacity forNodeBs of vendor-x is 6 sectors and 12 carriers, whereasNodeBs of vendor-y only have 3 sectors and 6 carriers.To increase the capacity, additional carriers are config-ured on vendor-y’s NodeBs through software configura-tion. However, as the overall data processing capacityof the device is still constrained by the hardware ca-pacity, NodeBs of vendor-y tend to incur more packetlosses, especially during high loads.

5. PERFORMANCE MODELINGTo understand the complex interaction of various fac-

tors and assess their relative importance and contribu-tion to the network performance, we apply RuleFit [4,12] – a powerful predictive learning method via rule-emsembles – to build macroscopic models to assess anddissect the various major factors that affect part or thewhole network, and predict their impact on the networkperformance over time. RuleFit provides both predic-tive and interpretative capabilities that are useful forour study in a number of ways. Rulefit is useful for ourmeasurement analysis in a number of ways. First, itprovides better modeling accuracy, especially for sucha large pool of factors by combining linear regressionand decision tree. Second, Rulefit ranks the factorsbased on their importance to the output performance,thereby enabling the relative important analysis. Third,by analyzing the fitting accuracy of the model acrossstates and hours, or comparing the model obtained us-ing the datasets from all RNCs with those obtainedfrom the RNCs located within certain geographical re-gions, we can identify the major factors that have per-sistent network-wide effect as well as uncover those fac-tors that have marked impact at certain locales or atcertain times. Furthermore, by examining the level ofthe overall contributions of all factors as well as therelative importance of each factor and how they differacross locales or over times, we can also infer whether

there are potential latent factors in play, affecting thenetwork performance in a way that cannot be quanti-fied using the factors explicitly included in the models.Significant deviations from the usual model predictionsmay also signal anomalous events, and can be used fordiagnosis purposes.In this section, we start with an overview of Rulefit.

Next, we describe the training and prediction datasetsand the list of features (factors) we use as input to Rule-fit as well as their mutual interactions. By focusingon a generalized model generated using one-day data,important factors associated with the performance arediscussed. Interpretation of the rules generated usingthe model is also presented. The fitting accuracy acrossstates and hours as well as using it as a mechanism todetect persistent performance anomalies is further dis-cussed.

5.1 Rulefit OverviewRulefit is a supervised rule-based ensemble learning

technique. Given a set of input variables x, the ensem-ble prediction is a linear combination of the predictionsof each ensemble members, which takes the form,

F (x) = a0 +

m∑

i=0

amfm(x) (1)

where ai is the linear combination parameters, and fm(x)is a set of base learners. They are different functions ofthe input variables possibly derived from different para-metric families. We consider base learners being a setof simple rules, along with a set of linear functions ofthe input variables. The rules are generated using de-cision tree. Each rule base learner is in the form of aconjunctive statement,

r(x) =∏

j

I(xj ∈ sj) (2)

where sj is a subset of all the possible values of theinput variables. I(.) is an indication function, and therule takes value 1 only if all the arguments hold true.The parameters ai of the linear regression model areestimated with a lasso penalty. The importance ofrule rk is measured using

Ik = |ak|√

sk(1− sk) (3)

7

where sk is the support of the rule on the training data.An input variable is considered important if it definesimportant rule (or linear) predictors. The importanceof an input variable xl is defined as

Jl = Il +∑

xl∈rk

Ik/mk (4)

where Il is the importance of the linear predictor, andmk is the total number of input variables that definethe rule rk.

5.2 Predictor Var iablesTo characterize the impact from all possible factors,

and apply them to Rulefit as the input predictor vari-ables to predict RNC performance, we summarize theminto following 35 such metrics,Usage factors: 1) nRRC : number of RRC establish-ment attempts, 2) nByte : number of bytes, 3) nF lows :number of flow, 4) Bpflow : number of bytes per flow,5) - 21) fractions of 17 application categories, includingmisc, web, ftp, instant messaging (IM), P2P, navigation(Nav), VPN, email, Voip, game, Appstore, video op-timization (opt), multimedia messaging service (mms),push notification (push), smartphone applications, stream-ing, and unknown category which is not able to be iden-tified as any of the 16 categories.Infrastructural factors: 22) B2RDist : distance be-tween NodeB and RNC, 23) R2SDist : distance be-tween RNC and SGSN, 24) nSector : number of sec-tors, 25) nCarrier : number of carriers, 26) RRCpsec :number of RRCs per sector, 27) RRCpcar : number ofRRCs per carrier, 28) nNodeB : number of NodeBs,29) CovpNB : land coverage per NodeB, 30) Bpsec :number of bytes per sector, 31) Bpcar : number of bytesper carrier, 32) Fpsec : number of flows per sector, 33)Fpcar : number of flows per carrier.Device factor: 34) device : device type.Beside, we also include geographical state as one of

the input variables to capture any state specific features.All features are considered as numerical variables, ex-cept device and state, which is in the form of an un-orderable categorical variable.Mutual information among features: To bettercharacterize the features that may affect the networkperformance and give as much information to the learn-ing algorithm as possible, we present a rich list of fea-tures as input predictor variables. Some may share sim-ilar information, while others are independent from eachother. Rulefit itself provides powerful learning algo-rithm using decision tree to perform the feature selction.Therefore, the existence of multicollinearity [3] amongthese features does not affect the accuracy of the model.However, when it comes to compute the relative im-portance of each factor, Rulefit may consider a featureas unimportant if another feature that is highly corre-lated with it is selected to build the model. Instead of

5 10 15 20 25 30 35

statenRRC

B2RDistR2SDistnSectornCarrier

RRCpsecRRCpcarnNodeBCovpNB

devicenByteBpsecBpcar

nFlowsFpsecFpcar

Bpflowmiscweb

ftpIM

unknownP2PNav

VPNemailVoip

gameAppstore

optmmspush

smartphonestreaming

2

4

6

8

10

12

Figure 10: Pairwise mutual information among

all features.

taking the risk of losing the accuracy of the predictivemodel [15] or losing the intuitive interpretation power ofthe model by applying a separate feature selection tech-nique on the dataset before applying Rulefit, we providemore insights of the interactions among these featuresusing the concept mutual information [8]. The mutualinformation of two variables x and y takes the form,

Ixy =∑

x

∑

y

p(x, y)logp(x, y)

p(x)p(y)(5)

The idea of mutual information is similar to Pearsoncorrelation except that the former one also takes cate-gorical variables. Ixy measures the amount of informa-tion that one variable contains about another, which isthe reduction in the uncertainty of one variable due tothe knowledge of the other. Ixy = 0 if and only if xand y are independent from each other, meaning know-ing one of them give no information about the other.If all the information contained in x is also containedin y, then Ixy is the same as the entropy of x. In thecase when x and y are identical, Ixy is the same as theentropy of x or y.The pairwise mutual information of all the features

are shown in Figure 10. The colorbar on the right indi-cates the mutual information level. The y-axis lists allthe features from top to bottom, and x-axis indicatesthe same set of features in the same order from left toright, i.e., the value in the ith row from the top andthe ith column from the left is the entropy value of theith variable. As expected, the entropy values on the di-agonal of the matrix suggests that the categorical vari-ables such as state and device have less uncertainty, asthey only have limited number of possible values associ-ated with them. Second to the categorical variables, theinfrastural related variables have less uncertainty thanall the application mix related variables. In particular,

8

R2SDist, the distance between RNCs and their SGSNsare more stable due to the fact that only few selectedlocations are used to place RNCs and SGSNs comparedto the span of NodeBs. Among all the applications, ftpand game are two of the applications that have smallerentropy values than the rest of applications, most likelydue to the fact that there are fewer users use ftp andgame via their mobile UEs.The values other than the diagonal entries also shown

interesting observations. All the values on ith row orith column indicates the mutual information betweenthe ith variable and all the other variables. The 2ndrow or column implies that the number of requests ascharacterized by nRRC share pretty large amount ofinformation with the traffic load per sector or carrieras well as the usage related variables such as nBytes,Bpsec, etc. Meanwhile, nRRC also contains much ofthe information as contained in such application mixrelated variables as misc, web, email, Voip, Appstore,mms, push notification, smartphone apps, and stream-ing, most likely due to the fact that they are widely usedamong people, thereby leading to a high correlation be-tween the nRRC and the fractions of these applications.

5.3 Training & Prediction SetsFor the varying purpose of modeling, prediction, and

diagnosing the network performance, we organize ourwhole datasets into subsets of data in three differentways.The most intuitive way to organize the data is based

on the date they were collected. Each subset of thedatasets is a one-day data, containing 24 hourly aggre-gated measurements for all RNCs. This will facilitateour modeling and prediction of the datasets on a dailybasis. Starting from Sep.2nd, 2011, each such datasetis labeled as TS-D-{mm/dd}. However, this organiza-tion of the datasets mixes the measurements performedat different hours as well as from RNCs within differ-ent states. To better understand the modeling accu-racy at specific hours or at particular RNCs, as well ashelp diagnose the problem specific to certains hours orRNCs, we further reorganize the datasets based on theirhour TS-H-{hour}, and state information. For simplic-ity, we perform this finer granularity analysis only onthe datasets collected at the first 10 days which observeless anomalies compared to the rest of the days. Bycomparing the model and important factors from differ-ent angles, we are potentially able to diagnose the per-formance and narrow down the issues to certain days,hours, or states.

5.4 Generalized ModelTo get better a understanding of the important fac-

tors associated with the performance, a generalized modelis created and discussed using the first day data TS-D-

09/02. The model is generalized in the sense that itis able to model the major performance behaviors forRNCs within different states, across different hours, aswell as predict the performance in the near future.Important factors: As one of the output of our model,input predictor variables are ranked based on their rel-ative importance to the performance. The top 10 mostimportant factors contributing to RTT and loss rate forthe generalized model are listed in Table 2, 3, respec-tively. The number below each factor is the relative im-portance level of that factor, where 100 is most impor-tant and 0 is least important. Both RTT and loss rateperformance prove to be highly state dependent, whichmight due to the varying geographical conditions, lo-calized user interest or application mix, as well as otherstate specific latent factors.Beyond the state factor, other important factors sug-

gest that RTT is mainly dependent on such architec-tural factors as the coverage of NodeB, the distancebetween NodeB and RNC, etc. To better improve theRTT performance, a better placement strategy of thenetworking devices is critical. For instance, replacingNodeBs covering huge areas with larger number of NodeBscovering smaller area and transmitting at a lower powermight be more cost-effective. Moreover, instead of us-ing a few selected locations for the placement of RNCs,spreading them out may help reduce the distance be-tween NodeB and RNC.On the other hand, loss rate is more dependent on

such factors as the device type, the application mix,and flow size (Bpflow). As we discussed earlier, devicesof vendor-x exhibit persistently better loss rate perfor-mance than those of vendor-y. Application mix and flowsize are highly dependent though. The streaming appli-cation is ranked as the most important factor is due tothe fact that its flow size is much larger than all otherapplications.Another observation is that the metrics associated

with NodeBs, such as, flows per sector or flows per car-rier play a more important role than those associatedwith RNCs, such as the load on RNCs, as characterizedby the total number of bytes or RRC establishmentsserved on them. That implies the network performanceis more constrained by the NodeB performance thanRNC. Some problems can possibly get fixed by simplydeploying more NodeBs or more sectors on NodeBs, oreven configuring more carriers on each sector, which isfar more cost-effective than adding more RNCs.Rules interpretation: The model generated by Rule-fit takes the form of a linear combination of a set ofeasy-to-interpret rules. The coefficients and the sup-port of a rule defines its importance as to how muchimprovement or degradation it brings to the overall per-formance. One of the important rules of the generalizedmodel suggests that reducing CovpNB to lower than

9

Table 2: Top 10 factors and their relative importance to RTT performance.PPPPPP

TSRank

1 2 3 4 5 6 7 8 9 10

state CovpNB mms B2RDist jabber Fpsec nFlow email Bpflow BpcarTS-D-09/02 100 39.72 37.04 15.98 13.20 5.96 4.65 4.41 4.24 4.07

state nByte CovpNB nFlow Fpcar B2RDist Bpcar mms Fpsec RRCpsecTS-D-09/12 100 77.24 53.19 33.28 22.64 19.06 17.85 13.81 12.58 9.34

State CovpNB B2RDist R2SDist Bpflow jabber RRCpsec nNodeB nCarrier mmsTS-H-10 100 43.30 17.02 10.73 8.29 6.33 5.53 4.00 3.76 3.74

Table 3: Top 10 factors and their relative importance to loss rate performance.PPPPPP

TSRank

1 2 3 4 5 6 7 8 9 10

state stream device Bpflow Appstore mms Voip CovpNB R2SDist webTS-D-09/02 100 61.74 46.30 36.43 28.57 19.13 15.75 9.84 9.09 7.91

Bpflow state unknown smartphone web nByte email nFlow RRCpcar FpcarTS-D-09/12 100 71.45 45.72 22.23 21.08 18.09 15.25 13.82 11.97 11.94

Bpflow state device stream email misc jabber Appstore mms unknownTS-H-10 100 93.09 43.57 20.68 15.79 14.57 12.20 11.77 8.18 7.01

12.35miles2 in 11 of the states , RTT can be improvedby 7.3ms. The support of this rule is 0.33, which is thefraction of data points falling into this category. An-other such important rule states that if more than 78sectors are deployed for RNCs in 16 of the states , theirRTT can be improved by 6.69ms.Similarly, for loss rate, the generated rules also sug-

gest a list of ways that can potentially improve the per-formance. For instance, if the number of bytes per flowgets larger than 6115 by combining smaller flows, theloss rate can potentially get improved by around 0.16percent within 13 of the states. On the other hand, forthree of the states , if the distance between RNC andSGSN is larger than 270.9 miles, the loss rate will bepenalized for 0.28 percent.Fitting accuracy and detecting persistent perfor-

mance anomalies: As discussed in Section 5 and 4,the performance, as well as the traffic load and appli-cation mix have large variations across different RNCsand hours. Our generalized model may not fit equallywell for all these scenarios. As depicted in Figure 11,the performance at early morning hours, from 2AM to4AM, behave fairly different from the rest of the hours.In fact, it improves during those hours, possibly due tothe decrease of the overall traffic load and the changeof application mix. While most traffic originates fromweb, email, etc. during daytime, entertainment trafficsuch as streaming, Appstore becomes more prevalantduring nighttime.In addition, as mentioned in previous section, both

RTT and loss rate are highly dependent on the statefactor itself. Not surprisingly, the fitting error acrossdifferent states using the same generalized model alsoobserves a large variation, as shown in Figure 12. Thex-axis is the state index. In particular, the performanceobserved in states labeled as 29, 30, 31 is much worse

than other states. These performance anomalies (im-proving or regressing) exist persistently, not just for oneor two days. By comparing the fitting accuracy acrossstates and hours, or even individual RNC, Rulefit isable to help us detect them and narrow down to spe-cific hours or states.Moreover, loss rate is observed to have persistent

worse fitting accuracy than RTT, which suggests thepossible missing latent factors in our model. Loss rateperformance is therefore more complicated than we ex-pected and could depend on lots of other factors.

6. PREDICTIONOne important application of the performance mod-

eling is to help predict future network performance fromthe past. A model that can provide powerful predictionaccuracy will help the operators gain more insights ofthe future needs, and therefore proactively take mea-sures to prevent any predictable performance issues.Any transient performance issue will also be easily de-tected as a result of that.

09/02 09/09 09/16 09/23 09/30 10/07 10/14 10/210

20

40

60

80

100

120

Date

Nor

mal

ized

RM

SE

(%

)

RTT

loss rate

Figure 13: Prediction of a two-month duration.

For this purpose, we conduct a prediction for a two-month duration using the observations in the first week(i.e., Sep.2nd to Sep.8th). Considering the day of week

10

0 5 10 15 200

5

10

15

20

25

30

Time (hour)

Nor

mal

ized

RM

SE

(%

)

RTTloss rate

Figure 11: Hourly fitting accuracy

on Sep.2nd.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 310

10

20

30

40

State index

Nor

mal

ized

RM

SE

(%

)

RTT

loss rate

Figure 12: Fitting accuracy across states on Sep.2nd.

pattern of the overall performance, we use the sameday within the first week to predict the same day per-formance of furture weeks. The daily prediction errorsas characterized by normalized root mean square er-ros(NRMSE) are shown in Figure 13. Note that theprediction error for the first week is the indication ofthe fitting accuracy of the model generated by Rulefitfor the real data.Our observations are two-fold. First, similar to the

fitting accuracy across different states and hours of thegeneralized model, the prediction error is much largerin loss rate than in RTT, partially implying the “com-plexity” of loss rate performance. On the other hand,most of the future RTT performance is quite predictablemainly due to the fact that it is more dependent oninfrastructure related factors. Second, there are morevariations in terms of the loss rate performance as it ismore susceptible to the change of user related factors.Though the long term trend (even beyond 2-month pe-riod) suggests that the loss rate performance is mostlypredictable, there are quite a few days showing quiteabnormal behaviors. One of the anomaly happens onSep.12th and 13th, as was displayed in Figure 2. An-other anomaly happens the week after, where we ob-serve an increase of loss rate (but not RTT) on quitea number of RNCs. Several other anomalies also hap-pened in October.In real system, in order to provide better prediction

accuracy and alleviate the impact from anomalies in thehistorical data, as well as be more responsive to morerecent trend variations, we can train the model usingmore (than just one week) historical data and give moreweights towards recent data.

7. PERFORMANCE DIAGNOSISBy adopting Rulefit, two types of preformance regres-

sions were unveiled, transient and persistent. However,Rulefit is more than just a tool to detect performance is-sue. Rather, it is also helpful in diagnosing these issues.In this section, we present our approach to tackling withthese two types of performance regressions and reveal-

ing the possible existence of latent factors using Rulefit.

7.1 Transient PerformanceRegressionFor diagnosing transient performance regression, we

take a step-by-step approach to solve this problem. First,we extract the key factors associated with the anamalyby applying the relative importance analysis. Next, wefurther narrow down the issue to the specific group ofstates, hours, or even RNCs by drilling down into thefitting accuracy across different states or RNCs as wellas over time. In addition, with the macroscopic infor-mation suggested by Rulefit, we need to conduct a moredetailed or even lower-level diagnosis on the originaldata to provide a better understanding of the under-lying behavioral change associated with the factors assuggested by Rulefit.Using the performance regression on Sep.12th as one

case study, we gain insights of the possible underlyingfactors attributing to the performance issue by compar-ing the important factors generated from Rulefit usingTS-D-09/02 and TS-D-09/12, respectively. As summa-rized in Table 2, 3, the number of bytes starts to play amore critical role in RTT performance, while the num-ber of bytes per flow also replaces the state factor andbecomes the most important, both suggesting the pos-sible issue related to the change of traffic load on thatday.To further investigate whether the performance degra-

dation on that day was only observed on particularstates or all states, we compare the fitting accuracyacross states of these two models. As indicated in Fig-ure 14, it was a problem associcated with a subset ofstates, as much larger fitting errors are observed in thosestates. The state index is the same as used in Figure 12.Similarly, the fitting accuracy across hours suggests thatthe performance issue occurs in all hours, not just forparticular hours.With the heuristics obtained from Rulefit, we specu-

late that the problem is most likely due to the drasticchange of the traffic load in those states. To verify ourhypothesis, we compare the number of bytes served on

11

3 13 4 14 17 21 15 6 16 8 10 19 22 5 23 2 28 31 30 27 18 24 20 9 29 7 1 12 11 26 250

50

100

150

State index

Nor

mal

ized

RM

SE

(%

)

RTT

loss rate

Figure 14: Prediction error across states on

Sep.12th.

3 13 4 14 17 21 15 6 16 8 10 19 22 5 23 2 28 31 30 27 18 24 20 9 29 7 1 12 11 26 250

1

2

3

4

5

State index

Tra

ffic

ratio

Figure 15: Traffic ratio of Sep.2nd to Sep.12th.

Sep.2nd and Sep.12th for each individual state. Theirrespective ratio of the number of bytes on Sep.2nd toSep.12th is shown in Figure 15. As we can see, thosestates with larger fitting errors also see a drastic dropof traffic on that particular day. Therefore, our as-sumption is indeed borne out by our investigation ofthe measurement data itself. A more detailed analysisreveals that the drop of traffic load not only happenedto a subset of RNCs within those states, but rather, allRNCs within those states. Moreover, these states areby chance all located on the east coast. Based on theseinduction, we believe that the degradation of the per-formance was not attributed to holidays or particularevents, which normally incur more traffic than usual.Rather, it could be an issue associated with the net-work carrier itself, such as the failure of SGSN or GGSNthat can have a large impact on all the RNCs on eastcoast. Similar strategies can also be applied to figureout the underlying reasons leading to the other transientanomalies observed in the weeks after that as shown inFigure 13.

7.2 Persistent PerformanceAnomaliesTo diagnose the performance regression persists within

a subset of problematic states as discussed earlier, weintend to adopt the same methodology in diagnosingthe transient performance issue. Similarly, we applyRulefit to the training sets collected in different states.We might have expected that the predictive model willachieve better accuracy if we apply Rulefit to each statesindividually. Surprisingly, the predictive loss rate per-formance model generated for those states with poorperformance is rather inaccurate, whose explained vari-ance is only around 50% or even lower. On the otherhand, the explained variance of the loss rate model im-proves for those states with good performance. Thissuggests that the factors presented in our study are notcomplete enough to fully capture the loss rate behavior.Rather, there may exist possible latent factors that playa critical role but was missing in our feature list. Thisphenomenon is also partly revealed by our previous ob-servation as shown in Figure 5, where proportions ofthe ”unknown” category of traffic in those problematicstates are much larger than the rest of the states. Themissing latent factors might be quite correlated with

the unknown type of traffic, and therefore suggesting afiner-granularity investigation of this type of traffic bythe operators.Besides the latent factors coming from the traffic it-

self, there may also exist other latent factors associatedwith the specific geographical and demographical condi-tions. Using one of the biggest and busiest states as oneexample, the numerous tall buildings can lead to verypoor performance that are not considered in our model.Besides, the large user density in that state leads tothe placement of many microcells, i.e. base stations ofsmaller sizes and transmitting at a lower power [13]. Asa result, users traverse more quickly through differentcells. The poor handoff process therefore may also leadto the interuption of service.The anomaly behavior in the early morning hours

on the other hand is not due to its performance regres-sion, but a large performance improvement compared toother daytime hours after 7AM. To get a better under-standing of why early morning hours have much betterperformance, or in other words, why other hours havemuch worse performance, we similarly apply Rulefit tothe training sets collected at different hours. The im-portant factors for one of the daytime hours TS-H-10are shown in Table 2, and 3. While the important fac-tors for RTT is similar to our generalized model, Bpflowbecomes the most important factor during the daytimehours. This is due to the fact that most applicationsprevalant during those hours have smaller flow sizes, re-sulting in a large number of smaller flows, and thereforedegraded loss rate performance. On the other hand, theflow size become much larger during the early morninghours due to the increase of streaming traffic. Simi-lar to the problem with the states, the model gener-ated for early morning hours also has very poor predic-tive power, having only around 40% explained variance,whereas the model improves for the rest of the daytimehours. This also suggests the missing latent factors inthose hours. The existence of the latent factors bothduring early morning hours and within the problematicstates illustrate the challenges coming from the“compli-cated”nature of the UMTS network performance. Withthe help of measurement datasets at a finer granular-ity, i.e. NodeB or even UE level, we believe the opera-tors can incorporate some of the missing latent factors,

12

and perform more accurate diagnosis using the similarmethodologies discussed here.

8. RELATED WORKThere have been quite a few efforts regarding the

evaluation or improvement of celluar network perfor-mance, such as [18] which studied the important fac-tors affecting the network performance as well as theweb browsering performance, and MobiPerf [17] whichdesigned a measurement tool widely installed on mo-bile end users in order to collect the client side mea-surement data and provide them with a set of real timenetwork related information, Online tools, e.g., [2, 5],are also available for measuring the performance. How-ever, their measurement studies are all based on clientside factors, whereas the datasets in our study were col-lected inside the carrier, containing more insights of thefactors pertaining to the carrier network itself. Besides,unlike previous studies, which either consider the per-formance indicator for a specific set of applications suchas Ping, HTTP, FTP, P2P, VoIP and streaming perfor-mance [16, 20, 22, 27], or consider the performance im-pact from only a few factors [10,14,25,26], we evaluatethe RTT and loss rate performance that are critical tomost of the applications and try to catch up as manyfactors as we can in modeling the performance. Otherperformance evaluation works include a study of theimpact of variable 3G wireless link latency and band-width on the TCP/IP performance [7], the design of abetter handoff decision algorithm using the mobile ter-minal location and area information [21], performanceevaluation of cellular networks using more realistic as-sumptions [11], and a remodeling of handoff interarrivaltime [9]. However, they are all simulation based work,and the studies were performed under certain restrictedassumptions. Similarly, in [23], authors observed a sig-nificant temporal and spatial variations in terms of thenetwork resource usage and subscriber behavior. Nosystematic approach was proposed to model the perfor-mance and understand the underlying important fac-tors. In terms of methodology, a comparative study ofvarious analysis methods of network performance wasperformed in [19]. However, none of them was appliedto real data and verified to be an effective approach indetecting and diagnosing the performance issue. To ourbest knowledge, our work is the first attempt to iden-tify a large set of factors from both user’s and carrier’sperspective via a large-scale collection of the datasetsinside the carrier. A systematic approach is presentedto build a macroscopic model of the cellular networkperformance, as well as perform prediction, and diag-nosis.

9. CONCLUSIONSIn this paper we have studied the RNC-level perfor-

mance in a large UMTS cellular network, with the goalto provide a ”big picture” understanding of the variousmajor factors that may influence the overall networkperformance across the network and over time. The ma-jor contributions and findings of our study are the fol-lowing. i) Large scale data collection: Our study utilizesmassive data periodically collected from diverse sourcesover more than six months within one of the largestUMTS cellular network carriers, whose coverage spansthe whole United States. The combination of differentdatasets provides us a quite comprehensive view of thenetwork performance in the network. The conclusionsdrawn in our study will therefore not be restricted toany particular group of users or devices. ii) Identifica-tion of a rich set of factors: As the network performanceis not a simple function of a few factors, we identifya rich set of features or factors along different dimen-sions, including usage patterns, user behavior or appli-cation mixes, network coverage, geographical regions,hardware and other factors associated with the networkinfrastucture. While the features gathered in our studyare clearly not exhaustive, new factors can be easilyaccommodated in our models. iii) Macroscopic mod-els and relative importance analysis: The macroscopicmodels developed in our study have both predictive andinterpretative capabilities. They provide network oper-ators with a “big picture” view of the overall networkperformance and a high-level understanding how vari-ous factors may have a differing effect on the networkperformance across the network or over time. iv) Un-veiling potential latent factors and detecting anomalies:Our models can also help unveil potential latent factorsthat are not directly captured in our datasets, or arehard to be captured using either numerical values ornon-orderable categorical values. The effect of poten-tial latent factors can be inferred, for example, by com-paring the prediction results produced by the modelsfeeding on different (sub)sets of data and by perform-ing the relative importance analysis. Anomalies mayalso be detected via significant deviations from usualperformance predictions.In a nutshell, the models and insights obtained from

our study provide the network operators with a “bigpicture” understanding of the overall RNC-level RTTand loss rate performance in a large 3G cellular net-work and the various factors that may influence suchperformance, thereby the user experiences. Such a “bigpicture”understanding can help network operators bet-ter cope with the management challenges posed by thescale and complexity of a large cellular network. Whilethe paper focuses only on a 3G UMTS network, we be-lieve that our methodology can also be applied to other3G (and perhaps also 4G) network architectures.

10. REFERENCES

13

[1] 3gpp specification: 25.331. http://www.3gpp.org/ftp/Specs/html-info/25331.htm.

[2] Consumer broadband test. http://www.broadband.gov/qualitytest/.

[3] Multicollinearity. http://en.wikipedia.org/wiki/Multicollinearity.

[4] Rulefit. http://www-stat.stanford.edu/~jhf/R-RuleFit.html.

[5] Speedtest.net. http://speedtest.net/.[6] Umts/3g nodeb/bts comparison. http://www.

umtsworld.com/technology/ran/ran.htm.[7] M. Chan and R. Ramjee. Tcp/ip performance

over 3g wireless links with rate and delayvariation. Wireless Networks, 11(1-2):81–97, 2005.

[8] T. Cover, J. Thomas, J. Wiley, et al. Elements ofinformation theory, volume 6. Wiley OnlineLibrary, 1991.

[9] S. Dharmaraja, K. Trivedi, and D. Logothetis.Performance analysis of cellular networks withgenerally distributed handoff interarrival times. InSymposium on Performance Evaluation ofComputer and Telecommunication Systems(SPECTS 2002), 2002.

[10] J. Fajardo, F. Liberal, and N. Bilbao. Impact ofthe video slice size on the visual quality for h. 264over 3g umts services. In BroadbandCommunications, Networks, and Systems, 2009.BROADNETS 2009. Sixth InternationalConference on, pages 1–8. IEEE, 2009.

[11] Y. Fang. Performance evaluation of wirelesscellular networks under more realisticassumptions. Wireless Communications andMobile Computing, 5(8):867–885, 2005.

[12] J. Friedman. Rulefit with r, 2005.[13] A. Goldsmith. Wireless communications.

Cambridge Univ Pr, 2005.[14] A. Haque, M. Kyum, M. Al Sadi, M. Kar, and

M. Hossain. Performance analysis of umts cellularnetwork using sectorization based on capacity andcoverage. International Journal, 2011.

[15] A. Haury, P. Gestraud, and J. Vert. The influenceof feature selection methods on accuracy. Stabilityand Interpretability of Molecular, 2011.

[16] T. Hoßfeld, A. Binzenhofer, M. Fiedler, andK. Tutschku. Measurement and analysis of skypevoip traffic in 3g umts systems. Proc. ofIPS-MoMe, pages 52–61, 2006.

[17] J. Huang, C. Chen, Y. Pei, Z. Wang, Z. Qian,F. Qian, B. Tiwana, Q. Xu, Z. Mao, M. Zhang,et al. Mobiperf: Mobile network measurementsystem. Technical report, Technical report).University of Michigan and Microsoft Research,2011.

[18] J. Huang, Q. Xu, B. Tiwana, Z. Mao, M. Zhang,

and P. Bahl. Anatomizing applicationperformance differences on smartphones. InProceedings of the 8th international conference onMobile systems, applications, and services, pages165–178. ACM, 2010.

[19] P. Lehtimaki. Data Analysis Methods for CellularNetwork Performance Optimization. PhD thesis,Helsinki University of Technology, 2008.

[20] S. Liu, W. Jiang, and J. Li. Architecture andperformance evaluation for p2p application in 3gmobile cellular systems. In WirelessCommunications, Networking and MobileComputing, 2007. WiCom 2007. InternationalConference on, pages 914–917. IEEE, 2007.

[21] A. Markopoulos, P. Pissaris, S. Kyriazakos, andE. Sykas. Cellular network performance analysis:handoff algorithms based on mobile location andarea information. Wireless PersonalCommunications, 30(2):97–117, 2004.

[22] B. Orstad and E. Reizer. End-to-end keyperformance indicators in cellular networks. PhDthesis, MasteraAZs thesis, Information andCommunication Technology, Agder UniversityCollege, Faculty of Engineering and Science,Grimstad, Norway, 2006.

[23] U. Paul, A. Subramanian, M. Buddhikot, andS. Das. Understanding traffic dynamics in cellulardata networks. In INFOCOM, 2011 ProceedingsIEEE, pages 882–890. IEEE, 2011.

[24] F. Qian, Z. Wang, A. Gerber, Z. Mao, S. Sen, andO. Spatscheck. Characterizing radio resourceallocation for 3g networks. In Proceedings of the10th annual conference on Internet measurement,pages 137–150. ACM, 2010.

[25] M. Rossi, L. Scaranari, and M. Zorzi. On the umtsrlc parameters setting and their impact on higherlayers performance. In Vehicular TechnologyConference, 2003. VTC 2003-Fall. 2003 IEEE58th, volume 3, pages 1827–1832. IEEE, 2003.

[26] P. Svoboda, F. Ricciato, W. Keim, and M. Rupp.Measured web performance in gprs, edge, umtsand hsdpa with and without caching. In World ofWireless, Mobile and Multimedia Networks, 2007.WoWMoM 2007. IEEE International Symposiumon a, pages 1–6. IEEE, 2007.

[27] Q. Zhang, W. Zhu, and Y. Zhang.Network-adaptive scalable video streaming over3g wireless network. In Image Processing, 2001.Proceedings. 2001 International Conference on,volume 3, pages 579–582. IEEE, 2001.

14

Documents

Technical Report - University of Minnesota · Understanding the complexity of 3G UMTS network performance - Modeling, Prediction, and Diagnosis Technical Report Department of Computer