Research Article

A Reinforcement Learning Approach to Access Management in Wireless Cellular Networks

Jihun Moon¹ and Yujin Lim²

¹Department of Computer Science, University of Suwon, San 2-2, Wau-ri, Bongdam-eup, Hwaseong, Gyeonggi-do 445-743, Republic of Korea
²Department of Information Technology Engineering, Sookmyung Women's University, 100 Cheongpa-ro 47-gil, Yongsan-gu, Seoul 04310, Republic of Korea

Correspondence should be addressed to Yujin Lim; yujin91@sookmyung.ac.kr

Received 25 February 2017; Accepted 9 April 2017; Published 14 May 2017

Academic Editor: Syed Hassan Ahmed

Copyright © 2017 Jihun Moon and Yujin Lim. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In smart city applications, huge numbers of devices need to be connected in an autonomous manner. The 3rd Generation Partnership Project (3GPP) specifies that Machine Type Communication (MTC) should be used to handle data transmission among a large number of devices. However, the data transmission rates are highly variable, and this brings about a congestion problem. To tackle this problem, the use of Access Class Barring (ACB) is recommended to restrict the number of access attempts allowed in data transmission by utilizing strategic parameters. In this paper, we model the problem of determining the strategic parameters with a reinforcement learning algorithm. In our model, the system evolves to minimize both the collision rate and the access delay. The experimental results show that our scheme improves system performance in terms of the access success rate, the failure rate, the collision rate, and the access delay.

1 Introduction

In smart city applications, many smart and mobile devices connect with one another and operate in an adaptive manner. The devices generate relevant city data in bursts, unpredictably, and on a massive scale. Because Long Term Evolution-Advanced (LTE-A) cellular networks provide wide coverage and low latency, LTE-A is considered to be one of the most promising communication infrastructures for smart city applications. However, radio resources in this communication infrastructure are too limited to serve a large amount of data. The 3rd Generation Partnership Project (3GPP) specifies that Machine Type Communication (MTC) should be used to handle the congestion problem caused by small amounts of data being transmitted from a large number of devices within a short period of time [1–3]. MTC traffic includes periodic-update traffic and event-driven traffic. Event-driven triggering brings about bursty and unpredictable traffic flows, and these add to the congestion problem [4].

When a device has data to send, it needs to synchronize with a base station, that is, an Evolved Node B (eNB), and to reserve a Random Access Channel (RACH). The RACH is a sequence of physical radio resources (RA slots). To do this, a device follows a four-step random access procedure [5, 6]. First, a device with data to send randomly selects a random access preamble as a digital signature from a predefined set of preambles and sends the preamble to the eNB. Then, the eNB responds with a Random Access Response (RAR) message to synchronize subsequent uplink transmission. However, if multiple devices send the same preamble in the same RA slot at the first step, a collision occurs and they will receive the same RAR message. After receiving the RAR message, a device sends a connection request message along with a scheduling request. Finally, the eNB acknowledges the connection request message. If a device receives the acknowledgement message successfully, it proceeds to data transmission. If a device encounters a preamble collision, it does not receive the acknowledgement message from the eNB and will initiate a new RA procedure after a fixed backoff time. When the number of unsuccessful attempts of a device reaches the predefined maximum value (Maximum Number of Preamble Transmission [1]), the device finally fails the RA process.
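The contention in the first step of the procedure can be sketched in a few lines of Python. This is a minimal sketch and not the authors' simulator: the function name `contend` and the default of 54 contention preambles are assumptions made for illustration, and only the preamble choice and the collision rule described above are modeled.

```python
import random
from collections import Counter

def contend(num_devices, num_preambles=54, rng=None):
    """Simulate step 1 of the RA procedure for a single RA slot.

    Each device picks one preamble uniformly at random. A device succeeds
    only if no other device picked the same preamble in this slot.
    Returns (num_successful, num_collided).
    """
    rng = rng or random.Random()
    # One preamble index per contending device (num_preambles is an assumption).
    choices = [rng.randrange(num_preambles) for _ in range(num_devices)]
    counts = Counter(choices)
    # A preamble chosen exactly once is free of collision.
    successes = sum(1 for c in choices if counts[c] == 1)
    return successes, num_devices - successes
```

With many more devices than preambles, almost every preamble is chosen more than once, which is the congestion situation the rest of the paper addresses.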

Hindawi, Wireless Communications and Mobile Computing, Volume 2017, Article ID 6474768, 7 pages, https://doi.org/10.1155/2017/6474768


As the number of devices attempting random access in the same RA slot increases, the number of preamble collisions and the access delay increase as well. The access delay is the time between the generation of an access request and the completion of the random access procedure. As a result of such delays, the congestion becomes heavier. 3GPP specifies the use of the Access Class Barring (ACB) scheme to tackle the congestion problem [7]. ACB is a well-known scheme that restricts RA attempts. ACB operates on two strategic parameters: the barring factor and the barring duration. Based on the current congestion status, the eNB regulates the RA attempts of MTC devices using these two parameters. Thus, the control of the parameters is vital in protecting the system from the excessive connectivity of a large number of devices. However, 3GPP does not specify how to control the parameters dynamically.

In this paper, we model the problem of determining ACB parameter values by using a reinforcement learning algorithm. This algorithm is able to follow unexpected changes and traffic bursts rapidly. Through the learning algorithm, we propose a scheme for dynamically and autonomously controlling the parameters. The experimental results show that our scheme mitigates the congestion problem in terms of access success rate, failure rate, collision rate, and access delay.

The rest of the paper is organized as follows. In Section 2, we discuss related studies. In Section 3, we propose an access management scheme that uses a reinforcement learning algorithm. In Section 4, we evaluate the performance of our scheme, and in Section 5, we conclude the paper with our plans for future research.

2 Related Work

Several proposals for tackling the RA congestion problem have been discussed. In ACB [7], the eNB broadcasts the barring factor $p$ ($0 \le p \le 1$) and the barring duration to its cell based on the current congestion level. A device with data to send generates a random number $q$ ($0 \le q \le 1$). If $q \le p$, the device gets permission to access the RACH. Otherwise, the RA attempt is barred for the barring duration. However, there is a tradeoff with respect to the barring factor $p$. If severe congestion occurs in a cell, the eNB sets $p$ to an extremely low value and most devices are barred. This results in an unacceptable access delay. On the other hand, if the eNB sets $p$ to an extremely high value, most of the preambles encounter collisions. This results in an unacceptable data transmission failure rate. Thus, the barring factor is an important factor in determining system performance.
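The ACB check described above can be illustrated with a short Python sketch. The function names `acb_check` and `admitted_devices` are hypothetical, and the barring duration is left out; the sketch only shows how the barring factor thins out the set of contending devices.

```python
import random

def acb_check(p, rng=None):
    """Return True if a device passes the ACB check for barring factor p.

    The device draws q uniformly from [0, 1) and is admitted when q <= p.
    """
    rng = rng or random.Random()
    q = rng.random()
    return q <= p

def admitted_devices(num_devices, p, rng=None):
    """Count how many of num_devices pass the ACB check in one RA slot."""
    rng = rng or random.Random()
    return sum(acb_check(p, rng) for _ in range(num_devices))
```

On average a fraction $p$ of the devices is admitted, so a very low $p$ bars almost everyone (long delays) while a very high $p$ admits almost everyone (many collisions).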

3GPP specifies the use of Extended ACB (EAB) as well as ACB. In EAB [8], devices are grouped into a set of Access Classes (ACs). The eNB broadcasts a barring bitmap for the ACs periodically. A device with data to send compares its AC with the bitmap. If the bit that corresponds to the AC of the device is set, the device is barred from transmitting data until the bit changes. In this case, the scheduling policy among the ACs is a factor determining system performance. In current cellular networks, each eNB alone determines the ACB barring factor to stabilize its own cell. In [9], a cooperative mechanism is proposed to control congestion globally over multiple cells. The barring factor of each eNB is decided cooperatively among all eNBs. This is done for global stabilization and for access load sharing.
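The EAB bitmap check described above reduces to a table lookup. The sketch below is only an illustration: the function name `may_transmit`, the ten-entry bitmap, and the device names are all hypothetical.

```python
def may_transmit(access_class, barring_bitmap):
    """Return True when the bit for this access class is not set."""
    return barring_bitmap[access_class] == 0

# Hypothetical bitmap for ten access classes (AC 0 to AC 9); 1 means barred.
bitmap = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

# Hypothetical devices and their access classes.
device_classes = {"meter-a": 0, "meter-b": 1, "meter-c": 3}

# Only devices whose class bit is clear may start the RA procedure.
allowed = sorted(name for name, ac in device_classes.items()
                 if may_transmit(ac, bitmap))
```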

In order to maintain a high service quality for Human Type Communication (HTC), 3GPP specifies the use of two different schemes: the MTC-specific backoff scheme and the separate RA resources scheme [10]. In the MTC-specific backoff scheme, a dedicated backoff parameter is set for the MTC devices. The backoff scheme discourages the devices from attempting random access for a certain duration of time. The backoff value for HTC devices is shorter than it is for MTC devices. In the separate RA resources scheme, RA slots are allocated to HTC and MTC devices separately. Both schemes focus on reducing the impact of RACH congestion on HTC devices. Thus, MTC devices may experience serious congestion because the amount of resources available to them is reduced.

In addition to the solutions specified by 3GPP, various other congestion solutions have been proposed. In [11], a congestion-aware admission control scheme is proposed. It rejects RA requests from MTC devices selectively according to the congestion level, which is derived directly from the incoming packet processing delay at the application layer. In [12], RACH resources are preallocated to different MTC classes using class-dependent backoff procedures to prevent a large number of simultaneous RACH access attempts. Dynamic access barring according to the traffic load level is proposed for collision avoidance. Under this barring, the access attempts of devices transmitting for the first time are delayed.

3 Proposed Scheme

To tackle the congestion problem, we adopt the Q-learning (QL) algorithm. The algorithm utilizes a form of reinforcement learning to solve Markovian decision problems without possessing complete information [13, 14]. Because QL finds solutions through the experience of interacting with an environment, we use it to model the ACB barring factor [15]. In other words, we control the ACB barring factor adaptively with QL.

Let $S$ denote a finite set of possible environment states and let $A$ denote a finite set of admissible actions to be taken. At RA slot $t$, the eNB perceives the current state $s_t = s \in S$ of the environment and takes an action $a_t = a \in A$ based on both the perceived state and its past experience. The action $a_t$ changes the environmental state from $s_t$ to $s_{t+1} = s' \in S$. When that happens, the system receives the reward $r_t$.

Figure 1: A performance comparison for various numbers of admissible actions. (a) The success rate. (b) The failure rate. (c) The collision rate. (d) The access delay (ms). Each panel plots Data 1 to Data 5 for Num of actions = 3 and Num of actions = 5.

The goal of the QL algorithm is to find an optimal policy for state $s$ that optimizes the rewards over the long run. The algorithm estimates the $Q$-value $Q(s, a)$ as the cumulative discounted reward. Using the $Q$-values, the algorithm finds the optimal $Q$-value $Q^{*}(s, a)$ in a greedy manner. The $Q$-value is updated as

$$Q(s, a) = Q(s, a) + \alpha \cdot \Delta Q(s, a), \tag{1}$$

where $\alpha$ ($0 \le \alpha \le 1$) is the learning rate. When $\alpha$ is 0, the $Q$-value is not updated. When $\alpha$ is high, learning occurs quickly. The increment $\Delta Q(s, a)$ is given by

$$\Delta Q(s, a) = r + \gamma \cdot \max_{a' \in A} Q(s', a') - Q(s, a), \tag{2}$$

where $\gamma$ ($0 \le \gamma \le 1$) is the discount factor that weighs immediate rewards more heavily than future rewards.
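Equations (1) and (2) amount to a single in-place update of one table entry. The following sketch is a generic tabular Q-learning step, assuming a dictionary-backed table; the names `q_update`, `Q`, and the action labels are hypothetical, and the default values of `alpha` and `gamma` are placeholders rather than prescribed settings.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.5):
    """Apply equations (1) and (2) once and return the new Q(s, a).

    Q maps (state, action) pairs to values; missing entries default to 0.
    """
    # Best value reachable from the next state: max over a' of Q(s', a').
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    # Equation (2): temporal-difference increment.
    delta = r + gamma * best_next - Q[(s, a)]
    # Equation (1): move Q(s, a) toward the target by the learning rate.
    Q[(s, a)] += alpha * delta
    return Q[(s, a)]

Q = defaultdict(float)
actions = ("decrease", "keep", "increase")
first = q_update(Q, 0, "keep", 1.0, 1, actions)    # 0 + 0.5 * (1 - 0)
second = q_update(Q, 0, "keep", 1.0, 1, actions)   # 0.5 + 0.5 * (1 - 0.5)
```

With `alpha=0` the entry never changes, matching the remark that the $Q$-value is not updated when the learning rate is 0.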

In this paper, we model a QL algorithm to control the barring factor in order to minimize both the number of RACH collisions and the access delay. A collision occurs when two or more devices transmit the same preamble in the same time slot. We define a set of possible states, a set of admissible actions, and the rewards. First, we define the set of states $S$ in terms of the access success rate $R^{\mathrm{succ}}_t$ ($0 \le R^{\mathrm{succ}}_t \le 1$). The access success rate is defined as the number of devices that successfully access the RACH divided by the number of devices contending in a given RA slot. The range of $R^{\mathrm{succ}}_t$ is divided evenly into $|S|$ states. Each state $s$ has three possible actions: increasing or decreasing the $p$ value by $\delta_i \in \Delta$, or maintaining the current $p$ value. Here, $\Delta$ denotes a finite set of unit values for $p$. To balance the exploration and exploitation of learning, an $\epsilon$-greedy method [16] is applied to our QL algorithm. In other words, we select a random action with probability $\epsilon$, or, with probability $1 - \epsilon$, we select the action that gives the optimal $Q(s, a)$ in the state $s$. We define the reward in order to minimize the collision rate $R^{\mathrm{col}}_t$ ($0 < R^{\mathrm{col}}_t < 1$) and the access delay $\mathrm{delay}_t$ ($0 < \mathrm{delay}_t < \mathrm{delay}_{\max}$). $R^{\mathrm{col}}_t$ represents the number of colliding devices divided by the number of contending devices. The reward given when action $a$ is taken at state $s$ in RA slot $t$ is

$$r_t(s, a) = \omega \cdot \frac{1}{R^{\mathrm{col}}_t} + (1 - \omega) \cdot \frac{\mathrm{delay}_{\max}}{\mathrm{delay}_t}, \tag{3}$$

where $\mathrm{delay}_{\max}$ is the maximum access delay that the system allows and $\omega$ is a smoothing factor ($0 \le \omega \le 1$).

4 Performance Analysis

In this paper, we extend the work of our previous study [15] and evaluate the performance of our access management scheme in terms of access success rate, collision rate, failure rate, and access delay.

Figure 2: A performance comparison for various values of $\omega$. The panels plot the success rate, the failure rate, the collision rate, and the access delay (ms) for $\omega = 0.3$, $\omega = 0.5$, and $\omega = 0.7$.

We adopted the traffic model for a smart metering application as an experimental scenario in which a large number of devices access RACHs in a highly synchronized manner [1]. Smart metering is one of the smart city applications. In the model, the housing density of an urban area of London located within a single cell was used as the density of meters. We set the number of meters $N$ to 35670. Each meter requested one data transmission per reading cycle, whose length is the reading frequency $T$. We set the reading frequency to 5 min. 3GPP defines two different traffic models for smart metering applications. We adopted the Beta-distribution-based model for our experiments. The number of meters that start the RA procedure in the $t$th RA slot is defined as

$$n_t = N \int_{t}^{t+1} p(t)\, dt, \tag{4}$$

where $p(t)$ follows the Beta distribution. The density $p(t)$ is defined as

$$p(t) = \frac{t^{\alpha - 1} (T - t)^{\beta - 1}}{T^{\alpha + \beta - 1} \cdot \mathrm{Beta}(\alpha, \beta)}, \tag{5}$$

where $\mathrm{Beta}(\alpha, \beta)$ is the Beta function with $\alpha = 3$ and $\beta = 4$. These shape parameters are distinct from the learning rate $\alpha$ in (1).
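The arrival model in (4) and (5) can be evaluated numerically as follows. This is a sketch under stated assumptions: it uses the standard Beta density normalised over $[0, T]$ (exponents $\alpha - 1$ and $\beta - 1$), a simple midpoint rule for the integral, and a hypothetical slot grid of `num_slots` equal slots over $T$ in place of the unit-width slots of (4). The function names and the choice of 60 slots are illustrative only.

```python
import math

def beta_pdf(t, T, a=3.0, b=4.0):
    """Beta-shaped arrival density p(t) on 0 <= t <= T, as in equation (5)."""
    beta_fn = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    # Exponents a-1 and b-1 make p(t) integrate to 1 over [0, T].
    return t ** (a - 1) * (T - t) ** (b - 1) / (T ** (a + b - 1) * beta_fn)

def arrivals_per_slot(N, T, num_slots, steps=200):
    """Approximate equation (4) for every slot with the midpoint rule.

    Slot i covers [i * T / num_slots, (i + 1) * T / num_slots].
    Returns a list with the expected number of new arrivals per slot.
    """
    width = T / num_slots
    h = width / steps
    arrivals = []
    for i in range(num_slots):
        start = i * width
        integral = sum(beta_pdf(start + (k + 0.5) * h, T)
                       for k in range(steps)) * h
        arrivals.append(N * integral)
    return arrivals

# N = 35670 meters and T = 5 min (300 s) follow the text; 60 slots is assumed.
arrivals = arrivals_per_slot(N=35670, T=300.0, num_slots=60)
```

The per-slot values sum to approximately $N$, and they peak around $t = 0.4\,T$, the mode of the Beta density for $\alpha = 3$ and $\beta = 4$, which is where the burst of access requests is heaviest.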

In our QL model, we divided $R^{\mathrm{succ}}_t$ evenly into 4 states: $s_1$ for $0 \le R^{\mathrm{succ}}_t < 0.25$, $s_2$ for $0.25 \le R^{\mathrm{succ}}_t < 0.5$, $s_3$ for $0.5 \le R^{\mathrm{succ}}_t < 0.75$, and $s_4$ for $0.75 \le R^{\mathrm{succ}}_t \le 1$. $\mathrm{delay}_{\max}$ was set to the size of an RA slot multiplied by the Maximum Number of Preamble Transmissions. We set the learning rate $\alpha$ in (1) to 0.9. For the $\epsilon$-greedy method, we set $\epsilon$ to 0.01. The basic RACH capacity parameters for LTE FDD networks followed [1].
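The state mapping, the $\epsilon$-greedy selection, and the reward in (3) can be sketched together as follows. This is an illustrative sketch, not the authors' implementation: all function names are hypothetical, the step size `DELTA` and the default `omega` are placeholder values, and clamping $p$ to $[0, 1]$ is an assumption.

```python
import random

NUM_STATES = 4
DELTA = 0.2                      # illustrative step size for p
ACTIONS = (-DELTA, 0.0, +DELTA)  # decrease p, keep p, increase p

def state_of(r_succ):
    """Map a success rate in [0, 1] to a state index 0..3 (s1..s4)."""
    return min(int(r_succ * NUM_STATES), NUM_STATES - 1)

def choose_action(Q, s, epsilon=0.01, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[s][a])

def reward(r_col, delay, delay_max, omega=0.5):
    """Equation (3); requires r_col > 0 and delay > 0 as in the text."""
    return omega / r_col + (1.0 - omega) * delay_max / delay

def apply_action(p, a):
    """Shift the barring factor and clamp it to [0, 1] (an assumption)."""
    return min(1.0, max(0.0, p + ACTIONS[a]))

# Q[s][a] table with one row per state and one column per action.
Q = [[0.0] * len(ACTIONS) for _ in range(NUM_STATES)]
```

A lower collision rate or a shorter delay both raise the reward, so the greedy choice drifts toward barring factors that keep both small, with $\omega$ deciding which of the two terms dominates.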

Our scheme was trained using five different datasets. Each dataset included four training sets and one test set. The training sets contained the RA requests generated by the meters over a series of four reading frequencies (reading cycles). After the training, we measured the performance metrics on the test set. In the figures, the plotted values indicate the averages of the values measured from the five datasets.

Figure 1 shows the scheme's performance with respect to various numbers of admissible actions. We defined the set $\Delta = \{\delta_1 = 0.2, \delta_2 = 0.1\}$ of step sizes to be used in actions. We used only $\delta_1$ when there were three admissible actions: increasing $p$ by $\delta_1$, decreasing $p$ by $\delta_1$, and maintaining the current value of $p$. We used both $\delta_1$ and $\delta_2$ when there were five admissible actions: increasing $p$ by $\delta_1$ or $\delta_2$, decreasing $p$ by $\delta_1$ or $\delta_2$, and maintaining the current value of $p$. For the reward in (3), we set $\omega$ to 0.5, and we set the discount factor $\gamma$ in (2) to 0.5. For the access success rate, the failure rate, and the access delay, the scheme with three admissible actions performed about 78%, 25%, and 26% better, respectively, than the scheme with five admissible actions. The failure rate was calculated by taking the number of devices that ultimately failed their RA attempts because the preamble transmission counter had reached the Maximum Number of Preamble Transmission and dividing it by the number of contending devices. The scheme with three admissible actions showed a collision rate of about 4 times that of the scheme with five admissible actions. The performance with respect to the number of admissible actions is mainly influenced by the granularity of $\delta$. When the granularity is suitably coarse (e.g., $\delta = 0.2$), the barring factor copes swiftly with the variance in the number of meters trying to access the RACH. However, when the granularity is too fine (e.g., $\delta = 0.1$), the barring factor does not respond promptly to the variance of RA requests. How to determine the appropriate level of granularity remains an open issue.

Figure 3: The success rate, the failure rate, the collision rate, and the access delay (ms) for $\gamma = 0.3$, $\gamma = 0.5$, and $\gamma = 0.7$.

Figure 2 shows the scheme's performance with respect to various values of $\omega$, which influenced the rewards as shown in (3). We considered three admissible actions involving $\delta_1$ with $\gamma = 0.5$. For the access success and failure rates, the scheme with $\omega = 0.5$ performed about 30% and 16% better than the scheme with $\omega = 0.3$. Moreover, the scheme with $\omega = 0.5$ performed about 3 times and 35% better than the scheme with $\omega = 0.7$. For the collision rate, the scheme with $\omega = 0.7$ performed about 11 times and 24 times better than the schemes with $\omega = 0.3$ and $\omega = 0.5$, respectively. This is because the rewards added more weight to the collision rate when $\omega = 0.7$. For the access delay, the scheme with $\omega = 0.5$ performed about 25% and 73% better than the schemes with $\omega = 0.3$ and $\omega = 0.7$, respectively. In these cases, the rewards added weight to the access delay, and the scheme with $\omega = 0.3$ performed about 38% better than the scheme with $\omega = 0.7$. As shown in the figure, the performance is significantly influenced by $\omega$. When more weight is given to the collision rate, the collision rate improves. When more weight is given to the access delay, the access delay improves. A mechanism that dynamically controls the smoothing factor is needed to improve the performance further, and we leave it for our future research.

Figure 4: A performance comparison with the original ACB. The panels plot the success rate, the failure rate, the collision rate, and the access delay (ms) over Data 1 to Data 5 for the original ACB and our scheme.

Figure 3 shows the scheme's performance with respect to various values of the discount factor $\gamma$ in (2). We used three admissible actions involving $\delta_1$ with $\omega = 0.5$. When $\gamma$ was low, more weight was given to immediate rewards. For the success rate, failure rate, and access delay, the scheme with $\gamma = 0.5$ performed about 37%, 22%, and 25% better, respectively, than the scheme with $\gamma = 0.3$. Compared to the scheme with $\gamma = 0.7$, the scheme with $\gamma = 0.5$ performed about 26%, 17%, and 22% better, respectively. For the collision rate, the scheme with $\gamma = 0.3$ performed about 88% and 68% better than the schemes with $\gamma = 0.5$ and $\gamma = 0.7$, respectively.

To evaluate the scheme's performance, we compared our scheme to the original ACB [7]. In the original ACB, when congestion is detected, the eNB regulates the meters' RA attempts by setting the barring factor to 0.1. For our scheme, we used three admissible actions involving $\delta_1$ with $\omega = 0.5$ and $\gamma = 0.5$. In Figure 4, our scheme shows a success rate about 5 times higher than that of the original ACB. In terms of the failure rate and access delay, our scheme performed about 70% and 50% better, respectively. For the collision rate, the original ACB showed a very low value compared to that of our scheme. In the original ACB, because most meters are restricted from accessing the RACH and the competition for channel resources is reduced, the success and collision rates decrease, but both the failure rate and the access delay increase.

5 Conclusion

To tackle the RA congestion problem of MTC in LTE-A networks, we modelled the ACB barring factor decision problem using a Q-learning algorithm. The algorithm was able to follow unexpected and bursty traffic changes rapidly. The goals of our model were to minimize both the RACH collision rate and the access delay. To meet these goals, we defined the set of possible states using the access success rate and the set of admissible actions using the unit values that change the barring factor. To evaluate the performance of our scheme, we adopted the traffic model for smart metering applications. The results show that our scheme improves system performance in terms of the access success rate, the failure rate, the collision rate, and the access delay.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research was supported by the Sookmyung Women's University Research Grants (1-1703-2030) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2015R1D1A1A09057141).

References

[1] 3rd Generation Partnership Project, "Technical specification group radio access network; study on RAN improvements for machine-type communications," 3GPP TR 37.868 V11.0.0, 2011.

[2] 3rd Generation Partnership Project, "Evolved universal terrestrial radio access (E-UTRA); radio resource control (RRC); protocol specification," 3GPP TS 36.331 V12.7.0, 2015.

[3] M. S. Ali, E. Hossain, and D. I. Kim, "LTE/LTE-A random access for massive machine-type communications in smart cities," IEEE Communications Magazine, vol. 55, no. 1, pp. 76–83, 2017.

[4] P. Jain, P. Hedman, and H. Zisimopoulos, "Machine type communications in 3GPP systems," IEEE Communications Magazine, vol. 50, no. 11, pp. 28–35, 2012.

[5] M.-Y. Cheng, G.-Y. Lin, H.-Y. Wei, and A. C.-C. Hsu, "Overload control for machine-type-communications in LTE-advanced system," IEEE Communications Magazine, vol. 50, no. 6, pp. 38–45, 2012.

[6] A. Laya, L. Alonso, and J. Alonso-Zarate, "Is the random access channel of LTE and LTE-A suitable for M2M communications? A survey of alternatives," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 4–16, 2014.

[7] M. Hasan, E. Hossain, and D. Niyato, "Random access for machine-to-machine communication in LTE-advanced networks: issues and approaches," IEEE Communications Magazine, vol. 51, no. 6, pp. 86–93, 2013.

[8] Intel Corporation, "Further performance evaluation of EAB information update mechanisms," 3GPP R2-120270, RAN WG2 Meeting 77, 2012.

[9] S.-Y. Lien, T.-H. Liau, C.-Y. Kao, and K.-C. Chen, "Cooperative access class barring for machine-to-machine communications," IEEE Transactions on Wireless Communications, vol. 11, no. 1, pp. 27–32, 2012.

[10] 3rd Generation Partnership Project, "RACH overload solutions," 3GPP R2-103742, RAN WG2 70bis, 2010.

[11] A. Ksentini, Y. Hadjadj-Aoul, and T. Taleb, "Cellular-based machine-to-machine: overload control," IEEE Network, vol. 26, no. 6, pp. 54–60, 2012.

[12] J.-P. Cheng, C.-H. Lee, and T.-M. Lin, "Prioritized Random Access with dynamic access barring for RAN overload in 3GPP LTE-A networks," in Proceedings of the IEEE GLOBECOM Workshops (GC Wkshps '11), pp. 368–372, Houston, Tex, USA, December 2011.

[13] R. S. Sutton, A. G. Barto, and R. J. Williams, "Reinforcement Learning is Direct Adaptive Optimal Control," IEEE Control Systems, vol. 12, no. 2, pp. 19–22, 1992.

[14] A. G. Barto, S. J. Bradtke, and S. P. Singh, "Learning to act using real-time dynamic programming," Artificial Intelligence, vol. 72, no. 1-2, pp. 81–138, 1995.

[15] J. Moon and Y. Lim, "Access control of MTC devices using reinforcement learning approach," in Proceedings of the IEEE International Conference on Information Networking (ICOIN '17), pp. 641–643, 2017.

[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.


As the number of devices attempting random accessin the same RA slot increases the numbers of preamblecollisions and access delays increase as well The access delayis the time between the generation of access request and thecompletion of the random access procedure As a result ofsuch a delay the congestion becomes heavy 3GPP specifiesthe use of the Access Class Barring (ACB) scheme to tacklethe congestion problem [7] ACB is a well-known schemethat restricts RA attempts ACB operates on two strategicparameters the barring factor and the barring durationBased on the current congestion status eNB regulates theRA attempts of MTC devices using these two parametersThus the control of the parameters is vital in protecting thesystem from the excessive connectivity of a large number ofdevices However 3GPP does not specify how to control theparameters dynamically

In this paper we model the problem of determining ACBparameter values by using a reinforcement learning algo-rithm This algorithm is able to follow unexpected changesand traffic bursts rapidly Through the learning algorithmwe propose a scheme for dynamically and autonomouslycontrolling the parameters The experimental results showthat the use of our scheme is sufficient to resolve thecongestion problem in terms of access success rate failurerate collision rate and access delay

The rest of the paper is organized as follows In Section 2we discuss related studies In Section 3 we propose an accessmanagement scheme that uses a reinforcement learningalgorithm In Sections 4 and 5 we evaluate the performanceof our scheme and conclude the paper with our plans forfuture research

2 Related Work

Several proposals for tackling the RA congestion problemare discussed In ACB [7] eNB broadcasts the barring factor(0 le 119901 le 1) and barring duration to its cell based on thecurrent congestion level A device with data to send generatesa random number (0 le 119902 le 1) If 119902 le 119901 the device getspermission to access RACH Otherwise the RA attempt isbarred for the barring duration However there is tradeoffwith respect to barring factor119901 If severe congestion occurs ina cell eNB sets 119901 to an extremely low value and most devicesare barred This results in an unacceptable access delay Onthe other hand if eNB sets 119901 to an extremely high value mostof the preambles encounter collisions This results in unac-ceptable data transmission failure Thus the barring factor isan important factor in determining system performance

3GPP specifies the use of Extended ACB (EAB) as wellas ACB In EAB [8] devices are grouped into a set of ACs(Access Classes) eNB broadcasts a barring bitmap for theACs periodically A device with data to send compares its ACwith the bitmap If the bit that corresponds to the AC of thedevice is set the device is barred from transmitting data untilthe bit changes In this case the scheduling policy among theACs is a factor determining system performance In currentcellular networks eNB alone determines the ACB barringfactor to stabilize each cell In [9] a cooperative mechanismis proposed to control congestion globally over multiple cells

The barring factor of each eNB is decided cooperativelyamong all eNBs This is done for global stabilization and foraccess load sharing

In order to maintain a high service quality for HTC(Human Type Communication) 3GPP specifies the use oftwo different schemes the MTC specific backoff scheme andthe separate RA resources scheme [10] In the MTC specificbackoff scheme a dedicated backoff parameter is set for theMTC devices The backoff scheme discourages the devices toattempt randomaccess for certain duration of timeThe back-off value forHTCdevices is shorter than it is forMTCdevicesIn the separate RA resources scheme RA slots are allocatedto HTC and MTC devices separately Both of the schemesfocus on reducing the impact of RACH congestion on HTCdevices Thus MTC devices may experience serious conges-tion because the amount of resources available is reduced

In addition to the solutions specified by 3GPP variousother congestion solutions are proposed In [11] a congestion-aware admission control scheme is proposed It rejects RArequests from MTC devices selectively according to thecongestion level which is directly induced from the incomingpacket processing delay at the application layer In [12]RACH resources are preallocated to different MTC classesusing class-dependent backoff procedures to prevent a largenumber of simultaneous RACH access attempts Dynamicaccess barring according to the traffic load level is pro-posed for collision avoidance Under this barring the accessattempts of devices transmitting for the first time are delayed

3 Proposed Scheme

To tackle the congestion problem we adopt Q-learning (QL)algorithm The algorithm utilizes a form of reinforcementlearning to solve Markovian decision problems withoutpossessing complete information [13 14] Because QL findssolutions through the experience of interacting with anenvironment we use it to model the ACB barring factor [15]In other words we control the ACB barring factor adaptivelywith QL

Let 119878 denote a finite set of possible environment statesand let119860 denote a finite set of admissible actions to be takenAt RA slot 119905 eNB perceives the current state 119904119905 = 119904 isin 119878 ofthe environment and takes an action 119886119905 = 119886 isin 119860 based onboth the perceived state and its past experience The action119886119905 changes the environmental state from 119904119905 to 119904119905+1 = 1199041015840 isin 119878When that happens the system receives the reward 119903119905

The goal of the QL algorithm is to find an optimal policyfor state 119904 that optimizes the rewards over the long run Thealgorithm estimates the 119876-value 119876(119904 119886) as the cumulativediscounted reward Using the 119876-values the algorithm findsthe optimal119876-value119876lowast(119904 119886) in a greedymannerThe119876-valueis updated as

$$Q(s, a) = Q(s, a) + \alpha \cdot \Delta Q(s, a), \tag{1}$$

where $\alpha$ ($0 \le \alpha \le 1$) is the learning rate. When $\alpha$ is 0, the $Q$-value is not updated. When $\alpha$ is high, learning occurs quickly. The update term $\Delta Q(s, a)$ is given by

$$\Delta Q(s, a) = r + \gamma \cdot \max_{a' \in A} Q(s', a') - Q(s, a). \tag{2}$$
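As a minimal sketch of how the update in (1) and (2) could be coded, the following keeps the $Q$-values in a table indexed by (state, action). The action set `ACTIONS` (candidate barring factors), the integer state labels, the default values of `alpha` and `gamma`, and the epsilon-greedy rule in `choose_action` are illustrative assumptions, not details taken from the scheme itself; here `gamma` plays the role of the discount weight $\gamma$ in (2).

```python
import random
from collections import defaultdict

# Hypothetical action set: candidate ACB barring factors (an assumption).
ACTIONS = [0.1, 0.3, 0.5, 0.7, 0.9]


def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.7):
    """Apply Eqs. (1)-(2) once and return the new Q(s, a)."""
    # Largest Q-value reachable from the next state s'.
    best_next = max(q[(s_next, a2)] for a2 in ACTIONS)
    delta = r + gamma * best_next - q[(s, a)]  # Eq. (2)
    q[(s, a)] += alpha * delta                 # Eq. (1)
    return q[(s, a)]


def choose_action(q, s, epsilon=0.1, rng=random):
    """Epsilon-greedy selection (an assumed exploration rule)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(s, a)])


q = defaultdict(float)  # every Q-value starts at 0
q_update(q, s=0, a=0.5, r=1.0, s_next=1)  # Q(0, 0.5) becomes 0.5
```

With `alpha=0` the call leaves the table unchanged, which matches the remark that the $Q$-value is not updated when $\alpha$ is 0.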



[Figure: four panels plotting success rate, failure rate, collision rate, and access delay (ms).]
$$r_t(s, a) = \omega \cdot \frac{1}{R^{\mathrm{col}}_t} \tag{3}$$





[Figure: success rate, failure rate, collision rate, and access delay (ms) for $\omega$ = 0.3, 0.5, and 0.7.]
$$n_t = N \int_{t}^{t+1} p(t)\, dt \tag{4}$$
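To illustrate how (4) can be evaluated numerically, the sketch below integrates an arrival density over each unit-length slot with Simpson's rule. The density is an assumption made purely for illustration: a Beta(3, 4) shape stretched over `T` slots, which is one common way to model bursty MTC arrivals. The values of `N` and `T` and the function names are likewise hypothetical.

```python
def beta_pdf(t, T):
    """Assumed density: Beta(3, 4) rescaled to [0, T]; B(3, 4) = 1/60."""
    if t < 0 or t > T:
        return 0.0
    return 60.0 * t**2 * (T - t)**3 / T**6


def expected_arrivals(N, t, T, steps=100):
    """Eq. (4): N times the integral of p over [t, t + 1] (Simpson's rule)."""
    h = 1.0 / steps  # steps must be even for Simpson's rule
    total = beta_pdf(t, T) + beta_pdf(t + 1, T)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * beta_pdf(t + k * h, T)
    return N * total * h / 3.0


N, T = 30000, 10  # hypothetical device count and number of slots
arrivals = [expected_arrivals(N, t, T) for t in range(T)]
```

Because the assumed density integrates to 1 over $[0, T]$, the per-slot values in `arrivals` add up to approximately `N`, which is a quick sanity check on the integration.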











[Figure: success rate, failure rate, collision rate, and access delay (ms) for $\gamma$ = 0.3, 0.5, and 0.7.]
[Figure: success rate, failure rate, collision rate, and access delay (ms) versus the number of data sets.]
5 Conclusion






Acknowledgments


References

















