13
Research Article A Reinforcement Learning-Based Maximum Power Point Tracking Method for Photovoltaic Array Roy Chaoming Hsu, 1 Cheng-Ting Liu, 2 Wen-Yen Chen, 2 Hung-I Hsieh, 1 and Hao-Li Wang 2 1 Department of Electrical Engineering, National Chiayi University, Chiayi City 60004, Taiwan 2 Department of Computer Science and Information Engineering, National Chiayi University, Chiayi City 60004, Taiwan Correspondence should be addressed to Hung-I Hsieh; [email protected] and Hao-Li Wang; [email protected] Received 28 November 2014; Accepted 20 March 2015 Academic Editor: Marcelo C. Cavalcanti Copyright © 2015 Roy Chaoming Hsu et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. A reinforcement learning-based maximum power point tracking (RLMPPT) method is proposed for photovoltaic (PV) array. By utilizing the developed system model of PV array and configuring the environment for the reinforcement learning, the proposed RLMPPT method is able to observe the environment state of the PV array in the learning process and to autonomously adjust the perturbation to the operating voltage of the PV array in obtaining the best MPP. Simulations of the proposed RLMPPT for a PV array are conducted. Experimental results demonstrate that, in comparison to an existing MPPT method, the RLMPPT not only achieves better efficiency factor for both simulated weather data and real weather data but also adapts to the environment much fast with very short learning time. 1. Introduction e U.S. Energy Information Administration (EIA) estimates that the primary sources of energy consisted of petroleum, coal, and natural gas, amounting to over 85% share for fossil fuels in primary energy consumption in the modern world. Yet, recent years’ over exploitation and consumption, and the expectation of depletion of fossil fuel, bring energy crisis to the modern world. Besides, awareness of environmental protection and sustainability in burning of fossil fuel and its products as the primary energy source are also arising. Many environment researchers and environmentalists advocated energy conservation and carbon dioxide (CO 2 ) reduction for the well-being of earth creatures and humans as well. As such, many alternatives for energy, such as energy generated from geothermal, solar, tidal, wind, and waste, are suggested. Among these, solar energy is the most used and promising alternative energy with a fast growing energy market share in the world’s energy industry due to the following advantages. (i) e sunlight and heat for the generation of solar energy is inexhaustible. (ii) Sunlight is easy to access for its irradiance covering the most of the land. (iii) ere is no noise or pollution in the generation of solar energy. (iv) Solar energy is considered as safe energy without burning any material. Owing to the above advantages, many countries in the world started to establish energy policy and develop the related industries for solar energy since 70s. Solar energy is normally generated by utilizing a pho- tovoltaic electrical device, called solar cell or photovoltaic cell, in converting the energy of sunlight into electrical energy. Solar cells may be integrated to form modules or panels, and large photovoltaic arrays may also be formed from the panels. e performance of a photovoltaic (PV) array system depends on the solar cell and array design quality and on the operating conditions as well. e output voltage, current, and power of PV array vary as functions of solar irradiation level, temperature, and load current. Hence, in the design of PV arrays, the PV array output to the load/utility should not be adversely affected by the change in temperature and solar irradiation levels. On the other hand, improvement of the conversion efficiency of the PV array is an issue worth exploring. Generally speaking, there Hindawi Publishing Corporation International Journal of Photoenergy Volume 2015, Article ID 496401, 12 pages http://dx.doi.org/10.1155/2015/496401

Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

Research ArticleA Reinforcement Learning-Based Maximum Power PointTracking Method for Photovoltaic Array

Roy Chaoming Hsu1 Cheng-Ting Liu2 Wen-Yen Chen2 Hung-I Hsieh1 and Hao-Li Wang2

1Department of Electrical Engineering National Chiayi University Chiayi City 60004 Taiwan2Department of Computer Science and Information Engineering National Chiayi University Chiayi City 60004 Taiwan

Correspondence should be addressed to Hung-I Hsieh hihsiehmailncyuedutw and Hao-Li Wang haolimailncyuedutw

Received 28 November 2014 Accepted 20 March 2015

Academic Editor Marcelo C Cavalcanti

Copyright copy 2015 Roy Chaoming Hsu et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

A reinforcement learning-based maximum power point tracking (RLMPPT) method is proposed for photovoltaic (PV) array Byutilizing the developed system model of PV array and configuring the environment for the reinforcement learning the proposedRLMPPT method is able to observe the environment state of the PV array in the learning process and to autonomously adjust theperturbation to the operating voltage of the PV array in obtaining the best MPP Simulations of the proposed RLMPPT for a PVarray are conducted Experimental results demonstrate that in comparison to an existing MPPT method the RLMPPT not onlyachieves better efficiency factor for both simulated weather data and real weather data but also adapts to the environment muchfast with very short learning time

1 Introduction

TheUS Energy Information Administration (EIA) estimatesthat the primary sources of energy consisted of petroleumcoal and natural gas amounting to over 85 share for fossilfuels in primary energy consumption in the modern worldYet recent yearsrsquo over exploitation and consumption andthe expectation of depletion of fossil fuel bring energy crisisto the modern world Besides awareness of environmentalprotection and sustainability in burning of fossil fuel and itsproducts as the primary energy source are also arising Manyenvironment researchers and environmentalists advocatedenergy conservation and carbon dioxide (CO

2) reduction for

the well-being of earth creatures and humans as well Assuch many alternatives for energy such as energy generatedfrom geothermal solar tidal wind and waste are suggestedAmong these solar energy is the most used and promisingalternative energy with a fast growing energy market share inthe worldrsquos energy industry due to the following advantages

(i) The sunlight and heat for the generation of solarenergy is inexhaustible

(ii) Sunlight is easy to access for its irradiance coveringthe most of the land

(iii) There is no noise or pollution in the generation ofsolar energy

(iv) Solar energy is considered as safe energy withoutburning any material

Owing to the above advantages many countries in the worldstarted to establish energy policy and develop the relatedindustries for solar energy since 70s

Solar energy is normally generated by utilizing a pho-tovoltaic electrical device called solar cell or photovoltaiccell in converting the energy of sunlight into electricalenergy Solar cells may be integrated to form modules orpanels and large photovoltaic arrays may also be formedfrom the panels The performance of a photovoltaic (PV)array system depends on the solar cell and array designquality and on the operating conditions as well The outputvoltage current and power of PV array vary as functions ofsolar irradiation level temperature and load current Hencein the design of PV arrays the PV array output to theloadutility should not be adversely affected by the changein temperature and solar irradiation levels On the otherhand improvement of the conversion efficiency of the PVarray is an issue worth exploring Generally speaking there

Hindawi Publishing CorporationInternational Journal of PhotoenergyVolume 2015 Article ID 496401 12 pageshttpdxdoiorg1011552015496401

2 International Journal of Photoenergy

are three means to improve the efficiency of photoelectricconversion (1) increasing the photoelectric conversion effi-ciency of photovoltaic diode components (2) increasing thefrequency of direct light and (3) improving the maximumpower point tracking (MPPT) for the PV array The first andsecond methods are to improve the hardware devices yet thethird one is to improve the conversion efficiency by utilizingthe internal software embedded in the PV array systemwhich attractsmany attentions HencemanyMPPTmethodshave been proposed [1] like perturbation and observationmethod [2ndash4] the open-circuit voltage method [5] theswarm intelligence method [6] and so on

In this paper a reinforcement learning-based MPPT(RLMPPT) method is proposed to solve the MPPT problemfor the PV array In the RLMPPT after observing theenvironmental conditions of the PV array the learning agentof the RLMPPT determines the perturbation to the operatingvoltage of the PV array that is the action and receives areward by the rewarding function By receiving rewards theRLMPPT is encouraged to select (state action) pairs withpositive rewards Hence a series of actions with receivedpositive rewards is generated iteratively such that a (stateaction) pair selection strategy is gradually achieved in theso-called ldquolearningrdquo process Once the agent of the RLMPPTlearned the strategy it is able to autonomously adjust theperturbation to the operating voltage of the PV array toobtain the maximum power for tracking theMPPT of the PVarray Research contributions of this study are summarized asfollows

(i) The proposed RLMPPT solves the MPPT problem ofPV array with reinforcement leaning method whichis novel to the best of our knowledge to the area ofMPPT of a PV system

(ii) Reward function constructed from the early MPPknowledge of a PV array experienced from pastweather data is employed in the learning processwithout predetermined parameters required by cer-tain MPPT techniques

(iii) Comprehensive experimental results exhibit theadvantage of the RLMPPT in self-learning and self-adapting to varied weather conditions for trackingthe maximum power point of the PV array

The rest of the paper is organized as follows In Section 2we present the concept of MPPT for PV systems Section 3introduces the proposed RLMPPT for the PV array Theexperimental configurations are described in Section 4 InSection 5 the results are illustrated with figures and tablesFinally Section 6 concludes the paper

2 Concepts of MPPT for PV Systems

21 Review of Operating Characteristics of Solar Cell Solarcells are typically fabricated from semiconductor deviceswhich produce DC electrical power when they are exposed tosunlight of adequate energy When the cells are illuminatedby solar photons the incident photons can break the bondsof ground-state (valence-band at a lower energy level)

electrons so that the valence electrons can then be pumpedby those photons from the ground-state to the excited-state(conduction-band at a higher energy level) Therefore thefree mobile electrons are driven to the external load togenerate the electrical power via a wire and then are returnedto the ground-state at a lower energy level Basically an idealsolar cell can be modeled by a current source in parallelwith a diode however in practice a real solar cell is morecomplicated and contains a shunt and series resistances 119877shand 119877s Figure 1(a) shows an equivalent circuit model of asolar cell including the parasitic shunt and series elementsin which a typical characteristic of practical solar cell withneglecting the 119877sh can be described by [7ndash10]

119868pv = 119868ph minus 119868pvo exp [119902

119860119896119879(119881pv + 119868pv119877s)] minus 1 (1)

119881pv =119902

119860119896119879ln(

119868ph minus 119868pv + 119868pvo

119868pvo) minus 119868pv119877s (2)

where 119868ph is the light-generated current 119868pvo is the darksaturation current 119868pv is the PV electric current119881pv is the PVvoltage 119877s is the series resistance 119860 is the nonideality factor119896 is the Boltzmann constant119879 is the temperature and 119902 is theelectron charge The output power from PV cell can then begiven by

119875pv = 119881pv119868pv = 119868pv 119902

119860119896119879ln(

119868ph minus 119868pv + 119868pvo

119868pvo) minus 119868pv119877s

(3)

The above equations can be applied to simulate the char-acteristics of a PV array provided that the parameters inthe equations are known to the user Figure 1(b) illustratesthe current-voltage (119868-119881) characteristic of the open-circuitvoltage (119881oc) the short-circuit current (119868sc) and the poweroperation of a typical silicon solar cell

As can be seen in the figure the parasitic element 119877shhas no effect on the current 119868sc but it decreases the voltage119881oc in turn the parasitic element 119877s has no effect on thevoltage 119881oc but it decreases the current 119868sc According to(1)ndash(3) a more accurate representation of a solar cell underdifferent irradiances that is the current-to-voltage (119868pv-119881pv)and power-to-voltage (119875pv-119881pv) curves can be described inthe same way with different levels as shown in Figure 2 Themaximum power point (MPP) in this manner occurs whenthe derivative of the power 119875pv to voltage 119881pv is zero where

119889119875pv

119889119881pv= 0 (4)

The resulting 119868-119881 and 119875-119881 curves presented in such way areshown in Figure 2

22 Review of MPPT Methods The well-known perturba-tion and observation (PampO) method for PV MPP tracking[2ndash4] has been extensively used in practical applicationsbecause the idea and implementation are simple Howeveras reported by [11 12] the PampO method is not able to track

International Journal of Photoenergy 3

Iph

D Rsh

RsIpv

Vpv

+

minus

Ish

(a)

Voltage

Curr

ent

Pow

er

Voc

Isc

Pmax

P-V curve

I-V curve

(b)

Figure 1 (a) Equivalent circuit model of a solar cell (b) Solar cell 119868-119881 and power operation curve with the characteristic of 119881oc and 119868sc

9876543210

2702402101801501209060300

0 5 10 15 20 25 30 35 40 45Vpv (V)

I pv

(A) s

olid

line

1000Wm2

600Wm2

300Wm2

MPPdPdV = 0

Ppv

(W) d

ash

lineI-V curve

P-V curve

Figure 2 The 119868pv-119881pv and 119875pv-119881pv characteristics of different irradi-ance levels The maximum power point (MPP) occurring when thederivative dPdV is zero

peak power conditions during periods of varying insolationBasically PampOmethod is amechanism tomove the operatingpoint toward themaximumpower point (MPP) by increasingor decreasing the PV array voltage in a tracking period How-ever the PampO control always deviates from the MPP duringthe tracking whose behavior results in oscillation aroundMPP in case of constant or slowly varying atmosphericconditions Although this issue can be improved by furtherdecreasing the perturbation step the tracking response willbecome slower Under rapidly changing atmospheric condi-tions for PV array the PampOmethodmay sometimesmake thetracking point far from the MPP [13 14]

23 Estimation of MPP by Using119881oc and 119868sc The open-circuitvoltage 119881oc and short-circuit current 119868sc of the PV panel canbe measured respectively when the terminal of the PV panelis open or short In reality both 119881oc and 119868sc are seriouslydependent on the solar insolation However the maximumpower point (MPP) is always located around the roll-offportion of the 119868-119881 characteristic curve in any insolationInterestingly there at MPP appears certain relation betweentheMPP set (119868mpp 119881mpp) and the set (119868sc 119881oc) which is worthyof studying Further the mentioned relation by empirical

estimation seems always to hold and not to be subject tothe insolation variation It can be presumed from commonlyknowledge of PV array that in open-circuit mode therelation of 119881mpp and 119881oc will be

119881mpp = 1198961119881oc (5)

and in the short-circuit mode the relation of 119868mpp and 119868sc willbe

119868mpp = 1198962119868sc (6)

where 1198961and 119896

2are constant factors between 0 and 1 From

(5) and (6) we have the maximum power at MPP that is

119875mpp = 119881mpp119868mpp = 11989611198962119881oc119868sc (7)

Even the 119875mpp for learning is point estimation it should begiven by satisfying the MPP criteria in (4) For learning theempirical result shows that the initial factor 119896

1for 119881mpp is

around 08 and the 1198962for 119868mpp is around 09

3 The Proposed RLMPPT for PV Array

31 Reinforcement Learning (RL) RL [15ndash17] is a heuristiclearning method that has been widely used in many fields ofapplication In the reinforcement learning a learning agentlearns to achieve the predefined goal mainly by constantlyinteractingwith the environment and exploring the appropri-ate actions in the state the agent situatesThe generalmodel ofreinforcement learning is shown in Figure 3 which includesthe agent environment state action and reward

The reinforcement learning is modeled by the Markovdecision process (MDP) where a RL learner referred to asan agent consistently and autonomously interacts with theMDP environment by exercising its predefined behaviors AMDP environment consists of a predefined set of states aset of controllable actions and a state transition model Ingeneral the first order MDP is considered in the RL wherethe next state is only affected by the current state and actionFor the cases where all the parameters of a state transitionmodel are known the optimal decision can be obtained byusing dynamic programming However in some real world

4 International Journal of Photoenergy

Agent

Environment

States Reward Action

Figure 3 Model of the reinforcement learning

cases the model parameters are absent and unknown to theusers hence the agent of the RL explores the environmentand obtains reward from the environment by try-and-errorinteraction The agent then maintains the rewardrsquos runningaverage value for a certain state-action pair According tothe reward value the next action can be decided by someexploration-exploitation strategies such as the 120576-greedy orsoftmax [15 16]

Q-learning is a useful and compact reinforcement learn-ing method for handling andmaintaining running average ofreward [17] Assume an action 119886 is applied to the environmentby the agent and the state goes to 119904

1015840 from 119904 and receives areward 1199031015840 the Q-learning update rule is then given by

119876 (119904 119886) larr997888 119876 (119904 119886) + 120578Δ119876 (1199041015840 119886) (8)

where 120578 is the learning rate for weighting the update value toassure the convergence of the learning the119876(sdot) is the rewardfunction and the delta-term Δ119876(sdot) is represented by

Δ119876 (119904 119886) = [1199031015840+ 120574119876lowast(1199041015840)] minus 119876 (119904 119886) (9)

where 119903 is the immediate reward 120574 is the discount rate toadjust the weight of current optimal value119876lowast(sdot) whose valueis computed by

119876lowast(1199041015840) = maxforall119887isinA

119876(1199041015840 119887) (10)

In (10) A is the set of all candidate actions The learningparameters of 120578 and 120574 in (7) and (8) respectively are usuallyset with value ranges between 0 and 1 Once the RL agentsuccessfully reaches the new state 1199041015840 it will receive a reward1199031015840 and update the Q-value then the 1199041015840 is substituted by thenext state that is 119904 larr 1199041015840 and the upcoming action isthen determined according to the predefined exploration-exploitation strategy such as the 120576-greedy of this study Andthe latest Q-value of the state is applied to the environmentfrom one state to the other

32 State Action and Reward Definition of the RLMPPT Inthe RLMPPT the agent receives the observable environmen-tal signal pair of (119881pv(119894) 119875pv(119894)) whichwill be subtracted from

020406080

100120140160180

0 5 10 15 20 25 30 35 40 45

Pow

er (W

)

Voltage (V)

S3

S0 S1

S2

Figure 4 Four states of the RLMPPT

the previous signal pair in obtaining the (Δ119881pv(119894) Δ119875pv(119894))pair Δ119881pv(119894) and Δ119875pv(119894) each takes positive or negative signsto constitute a state vector S with four states The agentthen adaptively decides and executes the desired pertur-bation ΔV(119894) characterized as action to the 119881pv After theselected action is executed a reward signal 119903(119894) is calculatedand granted to the agent and accordingly the agent thenevaluates the performance of the state-action interactionBy receiving the rewards the agent is encouraged to selectthe action with the best reward This leads to a series ofactions with the best rewards being iteratively generated suchthat MPPT tracking with better performance is graduallyachieved after the learning phase The state action andreward of the RLMPPT for the PV array are sequentiallydefined in the following

(i) States In RLMPPT the state vector is denoted as

119878 = [1198780 1198781 1198782 1198783] sube S (11)

where S is the space of all possible environmental state vectorswith elements transformed from the observable environmentvariables Δ119881pv(119894) and Δ119875pv(119894) where 119878

0 1198781 1198782 and 119878

3

respectively represent at any sensing time slot the state ofgoing toward theMPP from the left the state of going towardthe MPP from the right the state of leaving from the MPP tothe left and the state of leaving from theMPP to the rightThefour states of the RLMPPT can be shown in Figure 4 where1198780indicates that Δ119881pv(119894) and Δ119875pv(119894) have all the positive

sign 1198781indicates that Δ119881pv(119894) is negative and the Δ119875pv(119894) is

positive sign 1198782indicates that Δ119881pv(119894) is positive and Δ119875pv(119894)

is negative sign and finally 1198783indicates that Δ119881pv(119894) and

Δ119875pv(119894) are all the negative sign

(ii) ActionsThe action of the RLMPPT agent is defined as thecontrollable variable of the desired perturbation ΔV(119894) to the119881pv(119894) and the state of the agentrsquos action 119860per is denoted by

119860per isin A = 1198890 1198891 119889

119873 (12)

where A is a set of all the agentrsquos controllable perturbationsΔV(119894) for adding to the119881pv(119894) in obtaining the power from thePV array

(iii) Rewards In RLMPPT rewards are incorporated toaccomplish the goals of obtaining the MPP of the PV array

International Journal of Photoenergy 5

Assess

Select

Adjust

Set

Update

End

False

True

Observe

Initialize

(1) states and actions according to the given mapping range(2) Q table as an 2-D array Q(middot) with size |S| times |A|(3) iteration count i = 0 all Q value in Q(s a) = 0discount rate 120574 = 08 learning rate 120578 = 05and maximum sensing time slot number MTS

i le MTS

Vpv(i) for i and obtain Ppv(i)

i = i + 1 for next sensing time slot

action Δ(i) according to Q table using 120576-greedy algorithm

reward r by (13)

Q value by giving s a s and r into (8)ndash(10)

the next state s

998400

998400

998400998400

Figure 5 The flowchart of the proposed RLMPPT for the PV array

Intuitively the simplest but effective derivation of rewardcould be a hit-or-miss type of function that is once theobservable signal pair (119881pv(119894) 119875pv(119894)) hits the sweetest spotthat is the true MPP of (119881mpp(119894) 119875mpp(119894)) a positive rewardis given to the RL agent otherwise zero reward is given tothe RL agent The hit-or-miss type of reward function couldintuitively be defined as

119903119894+1

= 120575 ((119881pv (119894) 119875pv (119894)) ℎ 119911 (119894)) (13)

where 120575(sdot) represents the Kronecker delta function andℎ 119911(119894) is the hitting spot defined for the 119894th sensing timeslot In defining (13) a negative reward explicitly representspunishment for the agentrsquos failure that is missing the hittingspot could achieve better learning results in comparison witha zero reward in the tracking of MPP of a PV array Inreality the possibility that the observable signal pair (119881pv(119894)119875pv(119894)) leaded by the agentrsquos action of perturbation ΔV(119894) tothe 119881pv(119894) exactly hits the sweetest spot of MPP is very lowin any sensing time slot 119894 Besides it is also very difficult for

the environmentrsquos judge to define a hitting spot for everysensing time slot Hence the hitting spot ℎ 119911(119894) in (13) canbe relaxed where a required hitting zone is defined on theprevious environmental knowledge on a diurnal basis andwhere a positivenegative reward will be given if (119881pv(119894)119875pv(119894)) falls intooutside the predefined hitting zone in anytime slot Hence the reward function can be formulated as

119903119894+1

=

1198881 (119881pv (119894) 119875pv (119894)) isin Hitting Zone

minus1198882 otherwise

(14)

where 1198881and 119888

2are positive values denoting weighting

factors in maximizing the difference between reward andpunishment for better learning effect In this study the 120576-greedy algorithm is used in selecting the RLMPPT agentrsquosactions to avoid repeatedly selection of the same action Theflowchart of the proposed RLMPPT of this study is shown inFigure 5

6 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36

Pmax

(W)

Vmax

(V)

(b)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(c)

Figure 6 MPP distribution under simulated environment (a) Gaussian distribution function generated environment data (b) the set in (a)added with sun occultation by clouds and (c) MPP distribution obtained using real weather data

4 Configurations of Experiments

Experiments of MPPT of PV array utilizing the RLMPPTare conducted by simulation on computer and the results arecompared with existing MPPT methods for PV array

41 Environment Simulation and Configuration for PV ArrayIn this study the PV array used for simulation is SY-175Mmanufactured by Tangshan Shaiyang Solar TechnologyCo Ltd China The power rating of the simulated PVarray was 175W with open-circuit voltage 44V and short-circuit current 52 A Validation of RLMPPTrsquos effectivenessin MPPT was conducted via three experiments using twosimulated and one real weather data sets A basic experimentwas first performed to determine whether the RLMPPTcould achieve the task of MPPT by using a set of Gaussiandistribution function generated temperature and irradianceAnd the effect of sun occultation by clouds is added on theGaussian distribution function generated set of temperatureand irradiance as the second experiment Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA obtained fromNational RenewalEnergy Laboratory (NREL) database provided an empiricaldata set to test the RLMPPT under real weather conditionConfigurations of the experiments are described in thefollowing

Assume the PV array is located at the Subtropics area(Chiayi City in the south of Taiwan) during 1000 and1400 in summertime where the temperature and irradiance

respectively are simulated using Gaussian distribution func-tion with mean value of 30∘C and 800Wm2 and standarddeviation of 4∘C and 50Wm2 According to (1) the simu-lated set of temperature and irradiance produces the MPPvoltage (119881mpp) and the calculated MPP (119875mpp) as shown inFigure 6(a) where the 119881mpp and the 119875mpp respectively lay at119909-axis and 119910-axis Figures 6(b) and 6(c) respectively showthe (119881mpp 119875mpp) plots of the Gaussian generated temperatureand irradiance with sun occultation effect and the realweather data recorded on April 1 2014 at LoyolaMarymountUniversity

42 Reward Function and State Action Arrangement Inapplying the RLMPPT to solve the problem of the MPPTfor PV array the reward function plays an important rolebecause not only a good reward function definition couldachieve the right feedback on every execution of learningand tracking but also it could enhance the efficiency ofthe learning algorithm In this study a hitting zone withelliptical shape for (119881mpp 119875mpp) is defined such that a positivereward value is given to the agent whenever it obtained(119881mpp 119875mpp) falls into the hitting zone in the sensing timeslot Figure 7 shows the different size of elliptical hittingzone superimposed on the plot of simulated (119881mpp 119875mpp)of Figure 6(a) The red elliptical circle in Figures 7(a) 7(b)and 7(c) respectively represents that the hitting zone covers372 685 and 855 of the total (119881mpp 119875mpp) pointsobtained from the simulated data of the previous day

In realizing the RLMPPT the state vector is defined as119878 = [119878

0 1198781 1198782 1198783] sube S while the meaning of each state is

International Journal of Photoenergy 7

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(b)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(c)

Figure 7 Different size of elliptical hitting zone superimposed on the plot of simulated (119881mpp 119875mpp) for the first set of simulated environmentdata (a) 372 of the MPPs involved (b) 685 of the MPPs involved and (c) 855 of the MPPs involved

explained in the previous section and shown in Figure 5 Sixperturbations of ΔV(119894) to the 119881pv(119894) are defined as the set ofaction as follows

A = 119886119894| 1198860= minus5V 119886

1= minus2V 119886

2= minus05V

1198863= +05V 119886

4= +2V 119886

5= +5V

(15)

The 120576-greedy is used in choosing agentrsquos actions in theRLMPPT such that agent repeatedly selects the same actionis preventedThe rewarding policy of the hitting zone rewardfunction is to give a reward value of 10 and 0 respectivelyto the agent whenever it obtained (119881mpp 119875mpp) in any sensingtime slot falls-in and falls-out the hitting zone

5 Experimental Results

51 Results of the Gaussian Distribution Function GeneratedEnvironment Data In this study experiments of RLMPPTin testing the Gaussian generated and real environment dataare conducted and the results are compared with those of thePampO method and the open-circuit voltage method

For the Gaussian distribution function generated envi-ronment simulation the percentages of each RL agent choos-ing actions in early phase (0sim25 minutes) middle phase(100sim125 minutes) and final phase (215sim240 minutes) of theRLMPPT are shown in the second third and fourth rowrespectively of Table 1 within two-hour period It can be seenthat in the early phase of the simulation the RL agent isin fast learning stage such that the action chosen by agentis concentrated on the plusmn5V and plusmn2V actions However inthe middle and final phase of the simulation the learning

Table 1 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 16 20 8 12 24 20Middle 100sim125 4 28 24 16 20 8Final 215sim240 4 8 36 40 8 4

is completed and the agent exploited what it has learnedhence the percentage of choosing fine tuning actions that isplusmn05 V is increased from 20 of the early phase to 40 and76 respectively for the middle and final phase It can beconcluded that the learning agent fast learned the strategy inselecting the appropriate action toward reaching the MPPTand hence the goal of tracking the MPP is achieved by the RLagent

Experimental results of the offsets between the calculatedand the tracking MPP by the RLMPPT and comparingmethods at the sensing time are shown in Figure 8 Figures8(a) 8(b) and 8(c) respectively show the offsets between thecalculated MPP and the tracking MPP by the PampO methodthe open-circuit voltagemethod and the RLMPPTmethod atthe sensing time One can see that among the experimentalresults of the three comparingmethods the open-circuit volt-age method obtained the largest offset which is concentratedaround 15W Even though the offsets obtained by the PampOmethod fall largely below 10W however large portion ofoffset obtained by the PampO method also randomly scatteredbetween 1 and 10W On the other hand the proposedRLMPPT method achieves the least and condenses offsets

8 International Journal of Photoenergy

020406080

100120140

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(b)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(c)

Figure 8 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

Figure 9Thedefinition of hitting zone for the 2nd set of experimentdata

below 5W and only small portions fall outside of 5W evenin the early learning phase

52 Results of the Gaussian Distribution Function GeneratedEnvironment Data with Sun Occultation by Clouds In thisexperiment the simulated data are obtained by adding a 30chance of sun occultation by clouds to the test data in thefirst experiment such that the temperature will fall down0 to 3∘C and the irradiance will decrease 0 to 300Wm2The experiment is conducted to illustrate the capability ofRLMPPT in tracking the MPP of PV array under variedweather condition Figure 9 shows the hitting zone definition

Table 2 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 28 8 12 8 12 32Middle 100sim125 8 16 24 20 20 12Final 215sim240 4 12 40 32 8 4

for the simulated data with added sun occultation by cloudsto the Gaussian distribution function generated environmentdata The red elliptical shape in Figure 9 covers the 902 ofthe total (119881mpp 119875mpp) points obtained from simulated data ofthe previous day Table 2 shows the percentage of selectingaction that is the perturbation ΔV(119894) to the 119881pv(119894) by theRLMPPT The percentages of each RL agent choosing actionin early phase middle phase and final phase of the RLMPPTmethod are shown in the second third and fourth rowrespectively of Table 2

It can be seen from Table 2 that the percentages ofselecting fine tuning actions that is the perturbation ΔV(119894)is plusmn05 V increase from 20 to 44 and finally to 72respectively for the early phase middle phase and finalphase of MPP tracking via the RL agent This table againillustrated the fact that the learning agent fast learned thestrategy in selecting the appropriate action toward reachingthe MPPT and hence the goal of tracking the MPP of thePV array is achieved by the RL agent However due tothe varied weather condition on sun occultation by clouds

International Journal of Photoenergy 9

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0

5

10

15

20

25

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 10 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

the percentage of selecting fine-tuning actions is somewhatvaried a little bit in comparison with the results obtainedin Table 1 whose simulated data are generated by Gaussiandistribution function without sun occultation effect

Experimental results of the offsets between the calculatedand the tracking MPP by the comparing methods of PampOthe open-circuit voltage and the RLMPPT at the samesensing time are shown in Figures 10(a) 10(b) and 10(c)respectively Experiment data from Figure 11 again exhibitedthat among the three comparing methods the open-circuitvoltage method obtained the largest and sparsely distributedoffsets data which are concentrated around 15W Eventhough the offsets obtained by the PampO method fall largelybetween 0 and 5W however large portion of offset obtainedby the PampOmethod scattered around 1 to 40W before the 50minutes of the experiment On the other hand the proposedRLMPPT method achieves the least and condensed offsetsbelow 5W andmostly close to 3W in the final tracking phaseafter 200 minutes

53 Results of the Real Environment Data Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA is obtained online fromNationalRenewal Energy Laboratory (NREL) database for testing theRLMPPTmethod under real environment dataThe databaseis selected because the geographical location of the sensing

Table 3The percentage of choosing action in different phase for theexperiment with real weather data

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 24 16 8 12 12 28Middle 100sim125 16 16 12 24 24 8Final 215sim240 4 16 32 32 12 4

station is also located in the subtropical area A period ofrecorded data from 1000 to 1400 for 5 consecutive days isshown in Figure 6(c) and the hitting zone reward functionfor the data from earlier days is shown in Figure 11(a) Thered elliptical shape in Figure 11(a) covers the 934of the total(119881mpp 119875mpp) points obtained from the previous 5 consecutivedays of the NREL real weather data for testing Figures11(b) and 11(c) respectively show the real temperature andirradiance data recorded at 04012014 for generating the testdata

Table 3 shows the percentage of selecting action bythe RLMPPT The percentages of each RL agent choosingaction in early phase (0sim25 minutes) middle phase (100sim125minutes) and final phase (215sim240 minutes) of the RLMPPTmethod are shown in the second third and fourth rowsrespectively of Table 3

10 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

15

155

16

165

17

175

0 50 100 150 200 250Time (min)

Tem

pera

ture

(∘C)

(b)

900

910

920

930

940

950

960

970

0 50 100 150 200 250Time (min)

Irra

dian

ce (W

m2)

(c)

Figure 11 (a) The definition of hitting zone for the experiment using real weather data (b) the real temperature data recorded in 04012014and used for generating the test data and (c) the real irradiance data recorded at 04012014 and used for generating the test data

In Table 3 one can see that the percentage of selectingfine tuning actions that is the perturbation ΔV(119894) is plusmn05 Vincrease from 20 to 36 and finally to 64 respectivelyfor the early phase middle phase and final phase of MPPtracking via the RL agent Even though the percentage ofselecting fine tuning actions in this real data experiment hasthe least value among the three experiments it exhibits thatthe RLMPPT learns to exercise the appropriate action of theperturbationΔV(119894) in tracking theMPP of the PV array underreal weather data This table again illustrated the fact thatthe learning agent fast learned the strategy in selecting theappropriate action toward reaching the MPP and hence thegoal of tracking the MPP of the PV array is achieved by theRL agent

Experimental results of the offsets between the MPPTand the predicted MPPT by the comparing methods at thesensing time for the simulation data generated by the realweather data are shown in Figure 12 Figures 12(a) 12(b) and12(c) respectively show the offsets between the calculatedMPPT and the tracking MPPT by the PampO method theopen-circuit voltage method and the RLMPPT methodExperiment data from Figure 12 again shows that among thethree comparing methods the open-circuit voltage methodobtained the largest and sparsely distributed offsets datawhose distributions are concentrated within a band cover bytwoGaussian distribution functionswith themaximumoffset

value of 197W The offsets obtained by the PampO methodfall largely around 5 and 25W however large portion ofoffset obtained by the PampO method sharply decreased from1 to 40W in the early 70 minutes of the experiment By theobservation of Figure 12(c) the proposed RLMPPT methodachieves the least and condenses offsets near 1 or 2W andnone of the offsets is higher than 5W after 5 simulationminutes from the beginning

54 Performance Comparison with Efficiency Factor In orderto validate whether the RLMPPTmethod is effective or not intracking MPP of a PV array the efficiency factor 120578 is used tocompare the performance of other existing methods for thethree experiments The 120578 is defined as follows

120578mppt =int119905

0119875actual (119905) 119889119905

int119905

0119875max (119905) 119889119905

(16)

where 119875actual(119905) and 119875max(119905) respectively represent the track-ing MPP of the MPPT method and the calculated MPPTable 4 shows that for the three test data sets the open-circuit voltage method has the least efficiency factor amongthe three comparing methods and the RLMPPT method hasthe best efficiency factor which is slightly better than that ofPampO method The advantage of the RLMPPT method overthe PampOmethod is shown in Figures 10 and 12where not only

International Journal of Photoenergy 11

020406080

100120140160

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

186187188189

19191192193194195196

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0

50

100

150

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 12 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

Table 4 Comparison of efficiency factor for the three comparingmethods

Learning phase (min) MethodsOpen-circuit voltage PampO RLMPPT

Early 1000sim1025 869 835 892Middle 1140sim1205 875 978 994Final 1335sim1400 878 977 993Overall 1000sim1400 876 954 985

the RLMPPT method significantly improves the efficiencyfactor in comparing that of the PampO but also the learningagent of the RLMPPT fast learned the strategy in selecting theappropriate action toward reaching theMPPTwhich is muchfaster than the PampO method in exhibiting a slow adaptivephase Hence the RLMPPT not only improves the efficiencyfactor in tracking the MPP of the PV array but also has thefast learning capability in achieving the task of MPPT of thePV array

6 Conclusions

In this study a reinforcement learning-based maximumpower point tracking (RLMPPT) method is proposed forPV arrayThe RLMPPTmethod monitors the environmental

state of the PV array and adjusts the perturbation to theoperating voltage of the PV array in achieving the best MPPSimulations of the proposed RLMPPT for a PV array areconducted on three kinds of data set which are simulatedGaussian weather data simulated Gaussian weather dataadded with sun occultation effect and real weather data fromNREL database Experimental results demonstrate that incomparison to the existing PampO method the RLMPPT notonly achieves better efficiency factor for both simulated andreal weather data sets but also adapts to the environmentmuch fast with very short learning time Further the rein-forcement learning-basedMPPTmethodwould be employedin the real PV array to validate the effectiveness of theproposed novel MPPT method

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] T Esram andP L Chapman ldquoComparison of photovoltaic arraymaximum power point tracking techniquesrdquo IEEE Transactionson Energy Conversion vol 22 no 2 pp 439ndash449 2007

12 International Journal of Photoenergy

[2] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithmsrdquo Progress in PhotovoltaicsResearch and Applications vol 11 no 1 pp 47ndash62 2003

[3] N Femia G Petrone G Spagnuolo and M Vitelli ldquoOptimiza-tion of perturb and observe maximum power point trackingmethodrdquo IEEE Transactions on Power Electronics vol 20 no4 pp 963ndash973 2005

[4] J S Kumari D C S Babu andA K Babu ldquoDesign and analysisof PO and IPO MPPT technique for photovoltaic systemrdquoInternational Journal ofModern Engineering Research vol 2 no4 pp 2174ndash2180 2012

[5] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithms using an experimental pro-grammable maximum power point tracking test bedrdquo in Pro-ceedings of the Conference Record of the IEEE 28th PhotovoltaicSpecialists Conference pp 1699ndash1702 2000

[6] L-R Chen C-H Tsai Y-L Lin and Y-S Lai ldquoA biologicalswarm chasing algorithm for tracking the PVmaximum powerpointrdquo IEEE Transactions on Energy Conversion vol 25 no 2pp 484ndash493 2010

[7] T L Kottas Y S Boutalis and A D Karlis ldquoNew maximumpower point tracker for PV arrays using fuzzy controller in closecooperation with fuzzy cognitive networksrdquo IEEE Transactionson Energy Conversion vol 21 no 3 pp 793ndash803 2006

[8] N Mutoh M Ohno and T Inoue ldquoA method for MPPT con-trol while searching for parameters corresponding to weatherconditions for PV generation systemsrdquo IEEE Transactions onIndustrial Electronics vol 53 no 4 pp 1055ndash1065 2006

[9] G C Hsieh H I Hsieh C Y Tsai and C H Wang ldquoPhotovol-taic power-increment-aided incremental-conductance MPPTwith two-phased trackingrdquo IEEE Transactions on Power Elec-tronics vol 28 no 6 pp 2895ndash2911 2013

[10] R L Mueller M T Wallace and P Iles ldquoScaling nominal solarcell impedances for array designrdquo in Proceedings of the IEEE 1stWorld Conference on Photovoltaic Energy Conversion vol 2 pp2034ndash2037 December 1994

[11] N Femia G Petrone G Spagnuolo andMVitelli ldquoOptimizingsampling rate of PampO MPPT techniquerdquo in Proceedings ofthe IEEE 35th Annual Power Electronics Specialists Conference(PESC rsquo04) vol 3 pp 1945ndash1949 June 2004

[12] N Femia G PetroneG Spagnuolo andMVitelli ldquoA techniquefor improving PampOMPPT performances of double-stage grid-connected photovoltaic systemsrdquo IEEE Transactions on Indus-trial Electronics vol 56 no 11 pp 4473ndash4482 2009

[13] K H Hussein I Muta T Hoshino and M Osakada ldquoMax-imum photovoltaic power tracking an algorithm for rapidlychanging atmospheric conditionsrdquo IEE ProceedingsmdashGenera-tion Transmission and Distribution vol 142 no 1 pp 59ndash641995

[14] C Hua J Lin and C Shen ldquoImplementation of a DSP-controlled photovoltaic systemwith peak power trackingrdquo IEEETransactions on Industrial Electronics vol 45 no 1 pp 99ndash1071998

[15] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996

[16] A G Barto Reinforcement Learning An Introduction MITPress 1998

[17] C JWatkins and PDayan ldquoQ-learningrdquoMachine Learning vol8 no 3-4 pp 279ndash292 1992

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Inorganic ChemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Carbohydrate Chemistry

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Physical Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2014

Bioinorganic Chemistry and ApplicationsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

SpectroscopyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Medicinal ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chromatography Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Theoretical ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Spectroscopy

Analytical ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Quantum Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Organic Chemistry International

ElectrochemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CatalystsJournal of

Page 2: Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

2 International Journal of Photoenergy

are three means to improve the efficiency of photoelectricconversion (1) increasing the photoelectric conversion effi-ciency of photovoltaic diode components (2) increasing thefrequency of direct light and (3) improving the maximumpower point tracking (MPPT) for the PV array The first andsecond methods are to improve the hardware devices yet thethird one is to improve the conversion efficiency by utilizingthe internal software embedded in the PV array systemwhich attractsmany attentions HencemanyMPPTmethodshave been proposed [1] like perturbation and observationmethod [2ndash4] the open-circuit voltage method [5] theswarm intelligence method [6] and so on

In this paper a reinforcement learning-based MPPT(RLMPPT) method is proposed to solve the MPPT problemfor the PV array In the RLMPPT after observing theenvironmental conditions of the PV array the learning agentof the RLMPPT determines the perturbation to the operatingvoltage of the PV array that is the action and receives areward by the rewarding function By receiving rewards theRLMPPT is encouraged to select (state action) pairs withpositive rewards Hence a series of actions with receivedpositive rewards is generated iteratively such that a (stateaction) pair selection strategy is gradually achieved in theso-called ldquolearningrdquo process Once the agent of the RLMPPTlearned the strategy it is able to autonomously adjust theperturbation to the operating voltage of the PV array toobtain the maximum power for tracking theMPPT of the PVarray Research contributions of this study are summarized asfollows

(i) The proposed RLMPPT solves the MPPT problem ofPV array with reinforcement leaning method whichis novel to the best of our knowledge to the area ofMPPT of a PV system

(ii) Reward function constructed from the early MPPknowledge of a PV array experienced from pastweather data is employed in the learning processwithout predetermined parameters required by cer-tain MPPT techniques

(iii) Comprehensive experimental results exhibit theadvantage of the RLMPPT in self-learning and self-adapting to varied weather conditions for trackingthe maximum power point of the PV array

The rest of the paper is organized as follows In Section 2we present the concept of MPPT for PV systems Section 3introduces the proposed RLMPPT for the PV array Theexperimental configurations are described in Section 4 InSection 5 the results are illustrated with figures and tablesFinally Section 6 concludes the paper

2 Concepts of MPPT for PV Systems

21 Review of Operating Characteristics of Solar Cell Solarcells are typically fabricated from semiconductor deviceswhich produce DC electrical power when they are exposed tosunlight of adequate energy When the cells are illuminatedby solar photons the incident photons can break the bondsof ground-state (valence-band at a lower energy level)

electrons so that the valence electrons can then be pumpedby those photons from the ground-state to the excited-state(conduction-band at a higher energy level) Therefore thefree mobile electrons are driven to the external load togenerate the electrical power via a wire and then are returnedto the ground-state at a lower energy level Basically an idealsolar cell can be modeled by a current source in parallelwith a diode however in practice a real solar cell is morecomplicated and contains a shunt and series resistances 119877shand 119877s Figure 1(a) shows an equivalent circuit model of asolar cell including the parasitic shunt and series elementsin which a typical characteristic of practical solar cell withneglecting the 119877sh can be described by [7ndash10]

119868pv = 119868ph minus 119868pvo exp [119902

119860119896119879(119881pv + 119868pv119877s)] minus 1 (1)

119881pv =119902

119860119896119879ln(

119868ph minus 119868pv + 119868pvo

119868pvo) minus 119868pv119877s (2)

where 119868ph is the light-generated current 119868pvo is the darksaturation current 119868pv is the PV electric current119881pv is the PVvoltage 119877s is the series resistance 119860 is the nonideality factor119896 is the Boltzmann constant119879 is the temperature and 119902 is theelectron charge The output power from PV cell can then begiven by

119875pv = 119881pv119868pv = 119868pv 119902

119860119896119879ln(

119868ph minus 119868pv + 119868pvo

119868pvo) minus 119868pv119877s

(3)

The above equations can be applied to simulate the char-acteristics of a PV array provided that the parameters inthe equations are known to the user Figure 1(b) illustratesthe current-voltage (119868-119881) characteristic of the open-circuitvoltage (119881oc) the short-circuit current (119868sc) and the poweroperation of a typical silicon solar cell

As can be seen in the figure the parasitic element 119877shhas no effect on the current 119868sc but it decreases the voltage119881oc in turn the parasitic element 119877s has no effect on thevoltage 119881oc but it decreases the current 119868sc According to(1)ndash(3) a more accurate representation of a solar cell underdifferent irradiances that is the current-to-voltage (119868pv-119881pv)and power-to-voltage (119875pv-119881pv) curves can be described inthe same way with different levels as shown in Figure 2 Themaximum power point (MPP) in this manner occurs whenthe derivative of the power 119875pv to voltage 119881pv is zero where

119889119875pv

119889119881pv= 0 (4)

The resulting 119868-119881 and 119875-119881 curves presented in such way areshown in Figure 2

22 Review of MPPT Methods The well-known perturba-tion and observation (PampO) method for PV MPP tracking[2ndash4] has been extensively used in practical applicationsbecause the idea and implementation are simple Howeveras reported by [11 12] the PampO method is not able to track

International Journal of Photoenergy 3

Iph

D Rsh

RsIpv

Vpv

+

minus

Ish

(a)

Voltage

Curr

ent

Pow

er

Voc

Isc

Pmax

P-V curve

I-V curve

(b)

Figure 1 (a) Equivalent circuit model of a solar cell (b) Solar cell 119868-119881 and power operation curve with the characteristic of 119881oc and 119868sc

9876543210

2702402101801501209060300

0 5 10 15 20 25 30 35 40 45Vpv (V)

I pv

(A) s

olid

line

1000Wm2

600Wm2

300Wm2

MPPdPdV = 0

Ppv

(W) d

ash

lineI-V curve

P-V curve

Figure 2 The 119868pv-119881pv and 119875pv-119881pv characteristics of different irradi-ance levels The maximum power point (MPP) occurring when thederivative dPdV is zero

peak power conditions during periods of varying insolationBasically PampOmethod is amechanism tomove the operatingpoint toward themaximumpower point (MPP) by increasingor decreasing the PV array voltage in a tracking period How-ever the PampO control always deviates from the MPP duringthe tracking whose behavior results in oscillation aroundMPP in case of constant or slowly varying atmosphericconditions Although this issue can be improved by furtherdecreasing the perturbation step the tracking response willbecome slower Under rapidly changing atmospheric condi-tions for PV array the PampOmethodmay sometimesmake thetracking point far from the MPP [13 14]

23 Estimation of MPP by Using119881oc and 119868sc The open-circuitvoltage 119881oc and short-circuit current 119868sc of the PV panel canbe measured respectively when the terminal of the PV panelis open or short In reality both 119881oc and 119868sc are seriouslydependent on the solar insolation However the maximumpower point (MPP) is always located around the roll-offportion of the 119868-119881 characteristic curve in any insolationInterestingly there at MPP appears certain relation betweentheMPP set (119868mpp 119881mpp) and the set (119868sc 119881oc) which is worthyof studying Further the mentioned relation by empirical

estimation seems always to hold and not to be subject tothe insolation variation It can be presumed from commonlyknowledge of PV array that in open-circuit mode therelation of 119881mpp and 119881oc will be

119881mpp = 1198961119881oc (5)

and in the short-circuit mode the relation of 119868mpp and 119868sc willbe

119868mpp = 1198962119868sc (6)

where 1198961and 119896

2are constant factors between 0 and 1 From

(5) and (6) we have the maximum power at MPP that is

119875mpp = 119881mpp119868mpp = 11989611198962119881oc119868sc (7)

Even the 119875mpp for learning is point estimation it should begiven by satisfying the MPP criteria in (4) For learning theempirical result shows that the initial factor 119896

1for 119881mpp is

around 08 and the 1198962for 119868mpp is around 09

3 The Proposed RLMPPT for PV Array

31 Reinforcement Learning (RL) RL [15ndash17] is a heuristiclearning method that has been widely used in many fields ofapplication In the reinforcement learning a learning agentlearns to achieve the predefined goal mainly by constantlyinteractingwith the environment and exploring the appropri-ate actions in the state the agent situatesThe generalmodel ofreinforcement learning is shown in Figure 3 which includesthe agent environment state action and reward

The reinforcement learning is modeled by the Markovdecision process (MDP) where a RL learner referred to asan agent consistently and autonomously interacts with theMDP environment by exercising its predefined behaviors AMDP environment consists of a predefined set of states aset of controllable actions and a state transition model Ingeneral the first order MDP is considered in the RL wherethe next state is only affected by the current state and actionFor the cases where all the parameters of a state transitionmodel are known the optimal decision can be obtained byusing dynamic programming However in some real world

4 International Journal of Photoenergy

Agent

Environment

States Reward Action

Figure 3 Model of the reinforcement learning

cases the model parameters are absent and unknown to theusers hence the agent of the RL explores the environmentand obtains reward from the environment by try-and-errorinteraction The agent then maintains the rewardrsquos runningaverage value for a certain state-action pair According tothe reward value the next action can be decided by someexploration-exploitation strategies such as the 120576-greedy orsoftmax [15 16]

Q-learning is a useful and compact reinforcement learn-ing method for handling andmaintaining running average ofreward [17] Assume an action 119886 is applied to the environmentby the agent and the state goes to 119904

1015840 from 119904 and receives areward 1199031015840 the Q-learning update rule is then given by

119876 (119904 119886) larr997888 119876 (119904 119886) + 120578Δ119876 (1199041015840 119886) (8)

where 120578 is the learning rate for weighting the update value toassure the convergence of the learning the119876(sdot) is the rewardfunction and the delta-term Δ119876(sdot) is represented by

Δ119876 (119904 119886) = [1199031015840+ 120574119876lowast(1199041015840)] minus 119876 (119904 119886) (9)

where 119903 is the immediate reward 120574 is the discount rate toadjust the weight of current optimal value119876lowast(sdot) whose valueis computed by

119876lowast(1199041015840) = maxforall119887isinA

119876(1199041015840 119887) (10)

In (10) A is the set of all candidate actions The learningparameters of 120578 and 120574 in (7) and (8) respectively are usuallyset with value ranges between 0 and 1 Once the RL agentsuccessfully reaches the new state 1199041015840 it will receive a reward1199031015840 and update the Q-value then the 1199041015840 is substituted by thenext state that is 119904 larr 1199041015840 and the upcoming action isthen determined according to the predefined exploration-exploitation strategy such as the 120576-greedy of this study Andthe latest Q-value of the state is applied to the environmentfrom one state to the other

32 State Action and Reward Definition of the RLMPPT Inthe RLMPPT the agent receives the observable environmen-tal signal pair of (119881pv(119894) 119875pv(119894)) whichwill be subtracted from

020406080

100120140160180

0 5 10 15 20 25 30 35 40 45

Pow

er (W

)

Voltage (V)

S3

S0 S1

S2

Figure 4 Four states of the RLMPPT

the previous signal pair in obtaining the (Δ119881pv(119894) Δ119875pv(119894))pair Δ119881pv(119894) and Δ119875pv(119894) each takes positive or negative signsto constitute a state vector S with four states The agentthen adaptively decides and executes the desired pertur-bation ΔV(119894) characterized as action to the 119881pv After theselected action is executed a reward signal 119903(119894) is calculatedand granted to the agent and accordingly the agent thenevaluates the performance of the state-action interactionBy receiving the rewards the agent is encouraged to selectthe action with the best reward This leads to a series ofactions with the best rewards being iteratively generated suchthat MPPT tracking with better performance is graduallyachieved after the learning phase The state action andreward of the RLMPPT for the PV array are sequentiallydefined in the following

(i) States In RLMPPT the state vector is denoted as

119878 = [1198780 1198781 1198782 1198783] sube S (11)

where S is the space of all possible environmental state vectorswith elements transformed from the observable environmentvariables Δ119881pv(119894) and Δ119875pv(119894) where 119878

0 1198781 1198782 and 119878

3

respectively represent at any sensing time slot the state ofgoing toward theMPP from the left the state of going towardthe MPP from the right the state of leaving from the MPP tothe left and the state of leaving from theMPP to the rightThefour states of the RLMPPT can be shown in Figure 4 where1198780indicates that Δ119881pv(119894) and Δ119875pv(119894) have all the positive

sign 1198781indicates that Δ119881pv(119894) is negative and the Δ119875pv(119894) is

positive sign 1198782indicates that Δ119881pv(119894) is positive and Δ119875pv(119894)

is negative sign and finally 1198783indicates that Δ119881pv(119894) and

Δ119875pv(119894) are all the negative sign

(ii) ActionsThe action of the RLMPPT agent is defined as thecontrollable variable of the desired perturbation ΔV(119894) to the119881pv(119894) and the state of the agentrsquos action 119860per is denoted by

119860per isin A = 1198890 1198891 119889

119873 (12)

where A is a set of all the agentrsquos controllable perturbationsΔV(119894) for adding to the119881pv(119894) in obtaining the power from thePV array

(iii) Rewards In RLMPPT rewards are incorporated toaccomplish the goals of obtaining the MPP of the PV array

International Journal of Photoenergy 5

Assess

Select

Adjust

Set

Update

End

False

True

Observe

Initialize

(1) states and actions according to the given mapping range(2) Q table as an 2-D array Q(middot) with size |S| times |A|(3) iteration count i = 0 all Q value in Q(s a) = 0discount rate 120574 = 08 learning rate 120578 = 05and maximum sensing time slot number MTS

i le MTS

Vpv(i) for i and obtain Ppv(i)

i = i + 1 for next sensing time slot

action Δ(i) according to Q table using 120576-greedy algorithm

reward r by (13)

Q value by giving s a s and r into (8)ndash(10)

the next state s

998400

998400

998400998400

Figure 5 The flowchart of the proposed RLMPPT for the PV array

Intuitively the simplest but effective derivation of rewardcould be a hit-or-miss type of function that is once theobservable signal pair (119881pv(119894) 119875pv(119894)) hits the sweetest spotthat is the true MPP of (119881mpp(119894) 119875mpp(119894)) a positive rewardis given to the RL agent otherwise zero reward is given tothe RL agent The hit-or-miss type of reward function couldintuitively be defined as

119903119894+1

= 120575 ((119881pv (119894) 119875pv (119894)) ℎ 119911 (119894)) (13)

where 120575(sdot) represents the Kronecker delta function andℎ 119911(119894) is the hitting spot defined for the 119894th sensing timeslot In defining (13) a negative reward explicitly representspunishment for the agentrsquos failure that is missing the hittingspot could achieve better learning results in comparison witha zero reward in the tracking of MPP of a PV array Inreality the possibility that the observable signal pair (119881pv(119894)119875pv(119894)) leaded by the agentrsquos action of perturbation ΔV(119894) tothe 119881pv(119894) exactly hits the sweetest spot of MPP is very lowin any sensing time slot 119894 Besides it is also very difficult for

the environmentrsquos judge to define a hitting spot for everysensing time slot Hence the hitting spot ℎ 119911(119894) in (13) canbe relaxed where a required hitting zone is defined on theprevious environmental knowledge on a diurnal basis andwhere a positivenegative reward will be given if (119881pv(119894)119875pv(119894)) falls intooutside the predefined hitting zone in anytime slot Hence the reward function can be formulated as

119903119894+1

=

1198881 (119881pv (119894) 119875pv (119894)) isin Hitting Zone

minus1198882 otherwise

(14)

where 1198881and 119888

2are positive values denoting weighting

factors in maximizing the difference between reward andpunishment for better learning effect In this study the 120576-greedy algorithm is used in selecting the RLMPPT agentrsquosactions to avoid repeatedly selection of the same action Theflowchart of the proposed RLMPPT of this study is shown inFigure 5

6 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36

Pmax

(W)

Vmax

(V)

(b)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(c)

Figure 6 MPP distribution under simulated environment (a) Gaussian distribution function generated environment data (b) the set in (a)added with sun occultation by clouds and (c) MPP distribution obtained using real weather data

4 Configurations of Experiments

Experiments of MPPT of PV array utilizing the RLMPPTare conducted by simulation on computer and the results arecompared with existing MPPT methods for PV array

41 Environment Simulation and Configuration for PV ArrayIn this study the PV array used for simulation is SY-175Mmanufactured by Tangshan Shaiyang Solar TechnologyCo Ltd China The power rating of the simulated PVarray was 175W with open-circuit voltage 44V and short-circuit current 52 A Validation of RLMPPTrsquos effectivenessin MPPT was conducted via three experiments using twosimulated and one real weather data sets A basic experimentwas first performed to determine whether the RLMPPTcould achieve the task of MPPT by using a set of Gaussiandistribution function generated temperature and irradianceAnd the effect of sun occultation by clouds is added on theGaussian distribution function generated set of temperatureand irradiance as the second experiment Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA obtained fromNational RenewalEnergy Laboratory (NREL) database provided an empiricaldata set to test the RLMPPT under real weather conditionConfigurations of the experiments are described in thefollowing

Assume the PV array is located at the Subtropics area(Chiayi City in the south of Taiwan) during 1000 and1400 in summertime where the temperature and irradiance

respectively are simulated using Gaussian distribution func-tion with mean value of 30∘C and 800Wm2 and standarddeviation of 4∘C and 50Wm2 According to (1) the simu-lated set of temperature and irradiance produces the MPPvoltage (119881mpp) and the calculated MPP (119875mpp) as shown inFigure 6(a) where the 119881mpp and the 119875mpp respectively lay at119909-axis and 119910-axis Figures 6(b) and 6(c) respectively showthe (119881mpp 119875mpp) plots of the Gaussian generated temperatureand irradiance with sun occultation effect and the realweather data recorded on April 1 2014 at LoyolaMarymountUniversity

42 Reward Function and State Action Arrangement Inapplying the RLMPPT to solve the problem of the MPPTfor PV array the reward function plays an important rolebecause not only a good reward function definition couldachieve the right feedback on every execution of learningand tracking but also it could enhance the efficiency ofthe learning algorithm In this study a hitting zone withelliptical shape for (119881mpp 119875mpp) is defined such that a positivereward value is given to the agent whenever it obtained(119881mpp 119875mpp) falls into the hitting zone in the sensing timeslot Figure 7 shows the different size of elliptical hittingzone superimposed on the plot of simulated (119881mpp 119875mpp)of Figure 6(a) The red elliptical circle in Figures 7(a) 7(b)and 7(c) respectively represents that the hitting zone covers372 685 and 855 of the total (119881mpp 119875mpp) pointsobtained from the simulated data of the previous day

In realizing the RLMPPT the state vector is defined as119878 = [119878

0 1198781 1198782 1198783] sube S while the meaning of each state is

International Journal of Photoenergy 7

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(b)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(c)

Figure 7 Different size of elliptical hitting zone superimposed on the plot of simulated (119881mpp 119875mpp) for the first set of simulated environmentdata (a) 372 of the MPPs involved (b) 685 of the MPPs involved and (c) 855 of the MPPs involved

explained in the previous section and shown in Figure 5 Sixperturbations of ΔV(119894) to the 119881pv(119894) are defined as the set ofaction as follows

A = 119886119894| 1198860= minus5V 119886

1= minus2V 119886

2= minus05V

1198863= +05V 119886

4= +2V 119886

5= +5V

(15)

The 120576-greedy is used in choosing agentrsquos actions in theRLMPPT such that agent repeatedly selects the same actionis preventedThe rewarding policy of the hitting zone rewardfunction is to give a reward value of 10 and 0 respectivelyto the agent whenever it obtained (119881mpp 119875mpp) in any sensingtime slot falls-in and falls-out the hitting zone

5 Experimental Results

51 Results of the Gaussian Distribution Function GeneratedEnvironment Data In this study experiments of RLMPPTin testing the Gaussian generated and real environment dataare conducted and the results are compared with those of thePampO method and the open-circuit voltage method

For the Gaussian distribution function generated envi-ronment simulation the percentages of each RL agent choos-ing actions in early phase (0sim25 minutes) middle phase(100sim125 minutes) and final phase (215sim240 minutes) of theRLMPPT are shown in the second third and fourth rowrespectively of Table 1 within two-hour period It can be seenthat in the early phase of the simulation the RL agent isin fast learning stage such that the action chosen by agentis concentrated on the plusmn5V and plusmn2V actions However inthe middle and final phase of the simulation the learning

Table 1 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 16 20 8 12 24 20Middle 100sim125 4 28 24 16 20 8Final 215sim240 4 8 36 40 8 4

is completed and the agent exploited what it has learnedhence the percentage of choosing fine tuning actions that isplusmn05 V is increased from 20 of the early phase to 40 and76 respectively for the middle and final phase It can beconcluded that the learning agent fast learned the strategy inselecting the appropriate action toward reaching the MPPTand hence the goal of tracking the MPP is achieved by the RLagent

Experimental results of the offsets between the calculatedand the tracking MPP by the RLMPPT and comparingmethods at the sensing time are shown in Figure 8 Figures8(a) 8(b) and 8(c) respectively show the offsets between thecalculated MPP and the tracking MPP by the PampO methodthe open-circuit voltagemethod and the RLMPPTmethod atthe sensing time One can see that among the experimentalresults of the three comparingmethods the open-circuit volt-age method obtained the largest offset which is concentratedaround 15W Even though the offsets obtained by the PampOmethod fall largely below 10W however large portion ofoffset obtained by the PampO method also randomly scatteredbetween 1 and 10W On the other hand the proposedRLMPPT method achieves the least and condenses offsets

8 International Journal of Photoenergy

020406080

100120140

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(b)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(c)

Figure 8 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

Figure 9Thedefinition of hitting zone for the 2nd set of experimentdata

below 5W and only small portions fall outside of 5W evenin the early learning phase

52 Results of the Gaussian Distribution Function GeneratedEnvironment Data with Sun Occultation by Clouds In thisexperiment the simulated data are obtained by adding a 30chance of sun occultation by clouds to the test data in thefirst experiment such that the temperature will fall down0 to 3∘C and the irradiance will decrease 0 to 300Wm2The experiment is conducted to illustrate the capability ofRLMPPT in tracking the MPP of PV array under variedweather condition Figure 9 shows the hitting zone definition

Table 2 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 28 8 12 8 12 32Middle 100sim125 8 16 24 20 20 12Final 215sim240 4 12 40 32 8 4

for the simulated data with added sun occultation by cloudsto the Gaussian distribution function generated environmentdata The red elliptical shape in Figure 9 covers the 902 ofthe total (119881mpp 119875mpp) points obtained from simulated data ofthe previous day Table 2 shows the percentage of selectingaction that is the perturbation ΔV(119894) to the 119881pv(119894) by theRLMPPT The percentages of each RL agent choosing actionin early phase middle phase and final phase of the RLMPPTmethod are shown in the second third and fourth rowrespectively of Table 2

It can be seen from Table 2 that the percentages ofselecting fine tuning actions that is the perturbation ΔV(119894)is plusmn05 V increase from 20 to 44 and finally to 72respectively for the early phase middle phase and finalphase of MPP tracking via the RL agent This table againillustrated the fact that the learning agent fast learned thestrategy in selecting the appropriate action toward reachingthe MPPT and hence the goal of tracking the MPP of thePV array is achieved by the RL agent However due tothe varied weather condition on sun occultation by clouds

International Journal of Photoenergy 9

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0

5

10

15

20

25

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 10 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

the percentage of selecting fine-tuning actions is somewhatvaried a little bit in comparison with the results obtainedin Table 1 whose simulated data are generated by Gaussiandistribution function without sun occultation effect

Experimental results of the offsets between the calculatedand the tracking MPP by the comparing methods of PampOthe open-circuit voltage and the RLMPPT at the samesensing time are shown in Figures 10(a) 10(b) and 10(c)respectively Experiment data from Figure 11 again exhibitedthat among the three comparing methods the open-circuitvoltage method obtained the largest and sparsely distributedoffsets data which are concentrated around 15W Eventhough the offsets obtained by the PampO method fall largelybetween 0 and 5W however large portion of offset obtainedby the PampOmethod scattered around 1 to 40W before the 50minutes of the experiment On the other hand the proposedRLMPPT method achieves the least and condensed offsetsbelow 5W andmostly close to 3W in the final tracking phaseafter 200 minutes

53 Results of the Real Environment Data Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA is obtained online fromNationalRenewal Energy Laboratory (NREL) database for testing theRLMPPTmethod under real environment dataThe databaseis selected because the geographical location of the sensing

Table 3The percentage of choosing action in different phase for theexperiment with real weather data

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 24 16 8 12 12 28Middle 100sim125 16 16 12 24 24 8Final 215sim240 4 16 32 32 12 4

station is also located in the subtropical area A period ofrecorded data from 1000 to 1400 for 5 consecutive days isshown in Figure 6(c) and the hitting zone reward functionfor the data from earlier days is shown in Figure 11(a) Thered elliptical shape in Figure 11(a) covers the 934of the total(119881mpp 119875mpp) points obtained from the previous 5 consecutivedays of the NREL real weather data for testing Figures11(b) and 11(c) respectively show the real temperature andirradiance data recorded at 04012014 for generating the testdata

Table 3 shows the percentage of selecting action bythe RLMPPT The percentages of each RL agent choosingaction in early phase (0sim25 minutes) middle phase (100sim125minutes) and final phase (215sim240 minutes) of the RLMPPTmethod are shown in the second third and fourth rowsrespectively of Table 3

10 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

15

155

16

165

17

175

0 50 100 150 200 250Time (min)

Tem

pera

ture

(∘C)

(b)

900

910

920

930

940

950

960

970

0 50 100 150 200 250Time (min)

Irra

dian

ce (W

m2)

(c)

Figure 11 (a) The definition of hitting zone for the experiment using real weather data (b) the real temperature data recorded in 04012014and used for generating the test data and (c) the real irradiance data recorded at 04012014 and used for generating the test data

In Table 3 one can see that the percentage of selectingfine tuning actions that is the perturbation ΔV(119894) is plusmn05 Vincrease from 20 to 36 and finally to 64 respectivelyfor the early phase middle phase and final phase of MPPtracking via the RL agent Even though the percentage ofselecting fine tuning actions in this real data experiment hasthe least value among the three experiments it exhibits thatthe RLMPPT learns to exercise the appropriate action of theperturbationΔV(119894) in tracking theMPP of the PV array underreal weather data This table again illustrated the fact thatthe learning agent fast learned the strategy in selecting theappropriate action toward reaching the MPP and hence thegoal of tracking the MPP of the PV array is achieved by theRL agent

Experimental results of the offsets between the MPPTand the predicted MPPT by the comparing methods at thesensing time for the simulation data generated by the realweather data are shown in Figure 12 Figures 12(a) 12(b) and12(c) respectively show the offsets between the calculatedMPPT and the tracking MPPT by the PampO method theopen-circuit voltage method and the RLMPPT methodExperiment data from Figure 12 again shows that among thethree comparing methods the open-circuit voltage methodobtained the largest and sparsely distributed offsets datawhose distributions are concentrated within a band cover bytwoGaussian distribution functionswith themaximumoffset

value of 197W The offsets obtained by the PampO methodfall largely around 5 and 25W however large portion ofoffset obtained by the PampO method sharply decreased from1 to 40W in the early 70 minutes of the experiment By theobservation of Figure 12(c) the proposed RLMPPT methodachieves the least and condenses offsets near 1 or 2W andnone of the offsets is higher than 5W after 5 simulationminutes from the beginning

54 Performance Comparison with Efficiency Factor In orderto validate whether the RLMPPTmethod is effective or not intracking MPP of a PV array the efficiency factor 120578 is used tocompare the performance of other existing methods for thethree experiments The 120578 is defined as follows

120578mppt =int119905

0119875actual (119905) 119889119905

int119905

0119875max (119905) 119889119905

(16)

where 119875actual(119905) and 119875max(119905) respectively represent the track-ing MPP of the MPPT method and the calculated MPPTable 4 shows that for the three test data sets the open-circuit voltage method has the least efficiency factor amongthe three comparing methods and the RLMPPT method hasthe best efficiency factor which is slightly better than that ofPampO method The advantage of the RLMPPT method overthe PampOmethod is shown in Figures 10 and 12where not only

International Journal of Photoenergy 11

020406080

100120140160

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

186187188189

19191192193194195196

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0

50

100

150

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 12 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

Table 4 Comparison of efficiency factor for the three comparingmethods

Learning phase (min) MethodsOpen-circuit voltage PampO RLMPPT

Early 1000sim1025 869 835 892Middle 1140sim1205 875 978 994Final 1335sim1400 878 977 993Overall 1000sim1400 876 954 985

the RLMPPT method significantly improves the efficiencyfactor in comparing that of the PampO but also the learningagent of the RLMPPT fast learned the strategy in selecting theappropriate action toward reaching theMPPTwhich is muchfaster than the PampO method in exhibiting a slow adaptivephase Hence the RLMPPT not only improves the efficiencyfactor in tracking the MPP of the PV array but also has thefast learning capability in achieving the task of MPPT of thePV array

6 Conclusions

In this study a reinforcement learning-based maximumpower point tracking (RLMPPT) method is proposed forPV arrayThe RLMPPTmethod monitors the environmental

state of the PV array and adjusts the perturbation to theoperating voltage of the PV array in achieving the best MPPSimulations of the proposed RLMPPT for a PV array areconducted on three kinds of data set which are simulatedGaussian weather data simulated Gaussian weather dataadded with sun occultation effect and real weather data fromNREL database Experimental results demonstrate that incomparison to the existing PampO method the RLMPPT notonly achieves better efficiency factor for both simulated andreal weather data sets but also adapts to the environmentmuch fast with very short learning time Further the rein-forcement learning-basedMPPTmethodwould be employedin the real PV array to validate the effectiveness of theproposed novel MPPT method

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] T Esram andP L Chapman ldquoComparison of photovoltaic arraymaximum power point tracking techniquesrdquo IEEE Transactionson Energy Conversion vol 22 no 2 pp 439ndash449 2007

12 International Journal of Photoenergy

[2] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithmsrdquo Progress in PhotovoltaicsResearch and Applications vol 11 no 1 pp 47ndash62 2003

[3] N Femia G Petrone G Spagnuolo and M Vitelli ldquoOptimiza-tion of perturb and observe maximum power point trackingmethodrdquo IEEE Transactions on Power Electronics vol 20 no4 pp 963ndash973 2005

[4] J S Kumari D C S Babu andA K Babu ldquoDesign and analysisof PO and IPO MPPT technique for photovoltaic systemrdquoInternational Journal ofModern Engineering Research vol 2 no4 pp 2174ndash2180 2012

[5] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithms using an experimental pro-grammable maximum power point tracking test bedrdquo in Pro-ceedings of the Conference Record of the IEEE 28th PhotovoltaicSpecialists Conference pp 1699ndash1702 2000

[6] L-R Chen C-H Tsai Y-L Lin and Y-S Lai ldquoA biologicalswarm chasing algorithm for tracking the PVmaximum powerpointrdquo IEEE Transactions on Energy Conversion vol 25 no 2pp 484ndash493 2010

[7] T L Kottas Y S Boutalis and A D Karlis ldquoNew maximumpower point tracker for PV arrays using fuzzy controller in closecooperation with fuzzy cognitive networksrdquo IEEE Transactionson Energy Conversion vol 21 no 3 pp 793ndash803 2006

[8] N Mutoh M Ohno and T Inoue ldquoA method for MPPT con-trol while searching for parameters corresponding to weatherconditions for PV generation systemsrdquo IEEE Transactions onIndustrial Electronics vol 53 no 4 pp 1055ndash1065 2006

[9] G C Hsieh H I Hsieh C Y Tsai and C H Wang ldquoPhotovol-taic power-increment-aided incremental-conductance MPPTwith two-phased trackingrdquo IEEE Transactions on Power Elec-tronics vol 28 no 6 pp 2895ndash2911 2013

[10] R L Mueller M T Wallace and P Iles ldquoScaling nominal solarcell impedances for array designrdquo in Proceedings of the IEEE 1stWorld Conference on Photovoltaic Energy Conversion vol 2 pp2034ndash2037 December 1994

[11] N Femia G Petrone G Spagnuolo andMVitelli ldquoOptimizingsampling rate of PampO MPPT techniquerdquo in Proceedings ofthe IEEE 35th Annual Power Electronics Specialists Conference(PESC rsquo04) vol 3 pp 1945ndash1949 June 2004

[12] N Femia G PetroneG Spagnuolo andMVitelli ldquoA techniquefor improving PampOMPPT performances of double-stage grid-connected photovoltaic systemsrdquo IEEE Transactions on Indus-trial Electronics vol 56 no 11 pp 4473ndash4482 2009

[13] K H Hussein I Muta T Hoshino and M Osakada ldquoMax-imum photovoltaic power tracking an algorithm for rapidlychanging atmospheric conditionsrdquo IEE ProceedingsmdashGenera-tion Transmission and Distribution vol 142 no 1 pp 59ndash641995

[14] C Hua J Lin and C Shen ldquoImplementation of a DSP-controlled photovoltaic systemwith peak power trackingrdquo IEEETransactions on Industrial Electronics vol 45 no 1 pp 99ndash1071998

[15] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996

[16] A G Barto Reinforcement Learning An Introduction MITPress 1998

[17] C JWatkins and PDayan ldquoQ-learningrdquoMachine Learning vol8 no 3-4 pp 279ndash292 1992

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Inorganic ChemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Carbohydrate Chemistry

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Physical Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2014

Bioinorganic Chemistry and ApplicationsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

SpectroscopyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Medicinal ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chromatography Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Theoretical ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Spectroscopy

Analytical ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Quantum Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Organic Chemistry International

ElectrochemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CatalystsJournal of

Page 3: Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

International Journal of Photoenergy 3

Iph

D Rsh

RsIpv

Vpv

+

minus

Ish

(a)

Voltage

Curr

ent

Pow

er

Voc

Isc

Pmax

P-V curve

I-V curve

(b)

Figure 1 (a) Equivalent circuit model of a solar cell (b) Solar cell 119868-119881 and power operation curve with the characteristic of 119881oc and 119868sc

9876543210

2702402101801501209060300

0 5 10 15 20 25 30 35 40 45Vpv (V)

I pv

(A) s

olid

line

1000Wm2

600Wm2

300Wm2

MPPdPdV = 0

Ppv

(W) d

ash

lineI-V curve

P-V curve

Figure 2 The 119868pv-119881pv and 119875pv-119881pv characteristics of different irradi-ance levels The maximum power point (MPP) occurring when thederivative dPdV is zero

peak power conditions during periods of varying insolationBasically PampOmethod is amechanism tomove the operatingpoint toward themaximumpower point (MPP) by increasingor decreasing the PV array voltage in a tracking period How-ever the PampO control always deviates from the MPP duringthe tracking whose behavior results in oscillation aroundMPP in case of constant or slowly varying atmosphericconditions Although this issue can be improved by furtherdecreasing the perturbation step the tracking response willbecome slower Under rapidly changing atmospheric condi-tions for PV array the PampOmethodmay sometimesmake thetracking point far from the MPP [13 14]

23 Estimation of MPP by Using119881oc and 119868sc The open-circuitvoltage 119881oc and short-circuit current 119868sc of the PV panel canbe measured respectively when the terminal of the PV panelis open or short In reality both 119881oc and 119868sc are seriouslydependent on the solar insolation However the maximumpower point (MPP) is always located around the roll-offportion of the 119868-119881 characteristic curve in any insolationInterestingly there at MPP appears certain relation betweentheMPP set (119868mpp 119881mpp) and the set (119868sc 119881oc) which is worthyof studying Further the mentioned relation by empirical

estimation seems always to hold and not to be subject tothe insolation variation It can be presumed from commonlyknowledge of PV array that in open-circuit mode therelation of 119881mpp and 119881oc will be

119881mpp = 1198961119881oc (5)

and in the short-circuit mode the relation of 119868mpp and 119868sc willbe

119868mpp = 1198962119868sc (6)

where 1198961and 119896

2are constant factors between 0 and 1 From

(5) and (6) we have the maximum power at MPP that is

119875mpp = 119881mpp119868mpp = 11989611198962119881oc119868sc (7)

Even the 119875mpp for learning is point estimation it should begiven by satisfying the MPP criteria in (4) For learning theempirical result shows that the initial factor 119896

1for 119881mpp is

around 08 and the 1198962for 119868mpp is around 09

3 The Proposed RLMPPT for PV Array

31 Reinforcement Learning (RL) RL [15ndash17] is a heuristiclearning method that has been widely used in many fields ofapplication In the reinforcement learning a learning agentlearns to achieve the predefined goal mainly by constantlyinteractingwith the environment and exploring the appropri-ate actions in the state the agent situatesThe generalmodel ofreinforcement learning is shown in Figure 3 which includesthe agent environment state action and reward

The reinforcement learning is modeled by the Markovdecision process (MDP) where a RL learner referred to asan agent consistently and autonomously interacts with theMDP environment by exercising its predefined behaviors AMDP environment consists of a predefined set of states aset of controllable actions and a state transition model Ingeneral the first order MDP is considered in the RL wherethe next state is only affected by the current state and actionFor the cases where all the parameters of a state transitionmodel are known the optimal decision can be obtained byusing dynamic programming However in some real world

4 International Journal of Photoenergy

Agent

Environment

States Reward Action

Figure 3 Model of the reinforcement learning

cases the model parameters are absent and unknown to theusers hence the agent of the RL explores the environmentand obtains reward from the environment by try-and-errorinteraction The agent then maintains the rewardrsquos runningaverage value for a certain state-action pair According tothe reward value the next action can be decided by someexploration-exploitation strategies such as the 120576-greedy orsoftmax [15 16]

Q-learning is a useful and compact reinforcement learn-ing method for handling andmaintaining running average ofreward [17] Assume an action 119886 is applied to the environmentby the agent and the state goes to 119904

1015840 from 119904 and receives areward 1199031015840 the Q-learning update rule is then given by

119876 (119904 119886) larr997888 119876 (119904 119886) + 120578Δ119876 (1199041015840 119886) (8)

where 120578 is the learning rate for weighting the update value toassure the convergence of the learning the119876(sdot) is the rewardfunction and the delta-term Δ119876(sdot) is represented by

Δ119876 (119904 119886) = [1199031015840+ 120574119876lowast(1199041015840)] minus 119876 (119904 119886) (9)

where 119903 is the immediate reward 120574 is the discount rate toadjust the weight of current optimal value119876lowast(sdot) whose valueis computed by

119876lowast(1199041015840) = maxforall119887isinA

119876(1199041015840 119887) (10)

In (10) A is the set of all candidate actions The learningparameters of 120578 and 120574 in (7) and (8) respectively are usuallyset with value ranges between 0 and 1 Once the RL agentsuccessfully reaches the new state 1199041015840 it will receive a reward1199031015840 and update the Q-value then the 1199041015840 is substituted by thenext state that is 119904 larr 1199041015840 and the upcoming action isthen determined according to the predefined exploration-exploitation strategy such as the 120576-greedy of this study Andthe latest Q-value of the state is applied to the environmentfrom one state to the other

32 State Action and Reward Definition of the RLMPPT Inthe RLMPPT the agent receives the observable environmen-tal signal pair of (119881pv(119894) 119875pv(119894)) whichwill be subtracted from

020406080

100120140160180

0 5 10 15 20 25 30 35 40 45

Pow

er (W

)

Voltage (V)

S3

S0 S1

S2

Figure 4 Four states of the RLMPPT

the previous signal pair in obtaining the (Δ119881pv(119894) Δ119875pv(119894))pair Δ119881pv(119894) and Δ119875pv(119894) each takes positive or negative signsto constitute a state vector S with four states The agentthen adaptively decides and executes the desired pertur-bation ΔV(119894) characterized as action to the 119881pv After theselected action is executed a reward signal 119903(119894) is calculatedand granted to the agent and accordingly the agent thenevaluates the performance of the state-action interactionBy receiving the rewards the agent is encouraged to selectthe action with the best reward This leads to a series ofactions with the best rewards being iteratively generated suchthat MPPT tracking with better performance is graduallyachieved after the learning phase The state action andreward of the RLMPPT for the PV array are sequentiallydefined in the following

(i) States In RLMPPT the state vector is denoted as

119878 = [1198780 1198781 1198782 1198783] sube S (11)

where S is the space of all possible environmental state vectorswith elements transformed from the observable environmentvariables Δ119881pv(119894) and Δ119875pv(119894) where 119878

0 1198781 1198782 and 119878

3

respectively represent at any sensing time slot the state ofgoing toward theMPP from the left the state of going towardthe MPP from the right the state of leaving from the MPP tothe left and the state of leaving from theMPP to the rightThefour states of the RLMPPT can be shown in Figure 4 where1198780indicates that Δ119881pv(119894) and Δ119875pv(119894) have all the positive

sign 1198781indicates that Δ119881pv(119894) is negative and the Δ119875pv(119894) is

positive sign 1198782indicates that Δ119881pv(119894) is positive and Δ119875pv(119894)

is negative sign and finally 1198783indicates that Δ119881pv(119894) and

Δ119875pv(119894) are all the negative sign

(ii) ActionsThe action of the RLMPPT agent is defined as thecontrollable variable of the desired perturbation ΔV(119894) to the119881pv(119894) and the state of the agentrsquos action 119860per is denoted by

119860per isin A = 1198890 1198891 119889

119873 (12)

where A is a set of all the agentrsquos controllable perturbationsΔV(119894) for adding to the119881pv(119894) in obtaining the power from thePV array

(iii) Rewards In RLMPPT rewards are incorporated toaccomplish the goals of obtaining the MPP of the PV array

International Journal of Photoenergy 5

Assess

Select

Adjust

Set

Update

End

False

True

Observe

Initialize

(1) states and actions according to the given mapping range(2) Q table as an 2-D array Q(middot) with size |S| times |A|(3) iteration count i = 0 all Q value in Q(s a) = 0discount rate 120574 = 08 learning rate 120578 = 05and maximum sensing time slot number MTS

i le MTS

Vpv(i) for i and obtain Ppv(i)

i = i + 1 for next sensing time slot

action Δ(i) according to Q table using 120576-greedy algorithm

reward r by (13)

Q value by giving s a s and r into (8)ndash(10)

the next state s

998400

998400

998400998400

Figure 5 The flowchart of the proposed RLMPPT for the PV array

Intuitively the simplest but effective derivation of rewardcould be a hit-or-miss type of function that is once theobservable signal pair (119881pv(119894) 119875pv(119894)) hits the sweetest spotthat is the true MPP of (119881mpp(119894) 119875mpp(119894)) a positive rewardis given to the RL agent otherwise zero reward is given tothe RL agent The hit-or-miss type of reward function couldintuitively be defined as

119903119894+1

= 120575 ((119881pv (119894) 119875pv (119894)) ℎ 119911 (119894)) (13)

where 120575(sdot) represents the Kronecker delta function andℎ 119911(119894) is the hitting spot defined for the 119894th sensing timeslot In defining (13) a negative reward explicitly representspunishment for the agentrsquos failure that is missing the hittingspot could achieve better learning results in comparison witha zero reward in the tracking of MPP of a PV array Inreality the possibility that the observable signal pair (119881pv(119894)119875pv(119894)) leaded by the agentrsquos action of perturbation ΔV(119894) tothe 119881pv(119894) exactly hits the sweetest spot of MPP is very lowin any sensing time slot 119894 Besides it is also very difficult for

the environmentrsquos judge to define a hitting spot for everysensing time slot Hence the hitting spot ℎ 119911(119894) in (13) canbe relaxed where a required hitting zone is defined on theprevious environmental knowledge on a diurnal basis andwhere a positivenegative reward will be given if (119881pv(119894)119875pv(119894)) falls intooutside the predefined hitting zone in anytime slot Hence the reward function can be formulated as

119903119894+1

=

1198881 (119881pv (119894) 119875pv (119894)) isin Hitting Zone

minus1198882 otherwise

(14)

where 1198881and 119888

2are positive values denoting weighting

factors in maximizing the difference between reward andpunishment for better learning effect In this study the 120576-greedy algorithm is used in selecting the RLMPPT agentrsquosactions to avoid repeatedly selection of the same action Theflowchart of the proposed RLMPPT of this study is shown inFigure 5

6 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36

Pmax

(W)

Vmax

(V)

(b)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(c)

Figure 6 MPP distribution under simulated environment (a) Gaussian distribution function generated environment data (b) the set in (a)added with sun occultation by clouds and (c) MPP distribution obtained using real weather data

4 Configurations of Experiments

Experiments of MPPT of PV array utilizing the RLMPPTare conducted by simulation on computer and the results arecompared with existing MPPT methods for PV array

41 Environment Simulation and Configuration for PV ArrayIn this study the PV array used for simulation is SY-175Mmanufactured by Tangshan Shaiyang Solar TechnologyCo Ltd China The power rating of the simulated PVarray was 175W with open-circuit voltage 44V and short-circuit current 52 A Validation of RLMPPTrsquos effectivenessin MPPT was conducted via three experiments using twosimulated and one real weather data sets A basic experimentwas first performed to determine whether the RLMPPTcould achieve the task of MPPT by using a set of Gaussiandistribution function generated temperature and irradianceAnd the effect of sun occultation by clouds is added on theGaussian distribution function generated set of temperatureand irradiance as the second experiment Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA obtained fromNational RenewalEnergy Laboratory (NREL) database provided an empiricaldata set to test the RLMPPT under real weather conditionConfigurations of the experiments are described in thefollowing

Assume the PV array is located at the Subtropics area(Chiayi City in the south of Taiwan) during 1000 and1400 in summertime where the temperature and irradiance

respectively are simulated using Gaussian distribution func-tion with mean value of 30∘C and 800Wm2 and standarddeviation of 4∘C and 50Wm2 According to (1) the simu-lated set of temperature and irradiance produces the MPPvoltage (119881mpp) and the calculated MPP (119875mpp) as shown inFigure 6(a) where the 119881mpp and the 119875mpp respectively lay at119909-axis and 119910-axis Figures 6(b) and 6(c) respectively showthe (119881mpp 119875mpp) plots of the Gaussian generated temperatureand irradiance with sun occultation effect and the realweather data recorded on April 1 2014 at LoyolaMarymountUniversity

42 Reward Function and State Action Arrangement Inapplying the RLMPPT to solve the problem of the MPPTfor PV array the reward function plays an important rolebecause not only a good reward function definition couldachieve the right feedback on every execution of learningand tracking but also it could enhance the efficiency ofthe learning algorithm In this study a hitting zone withelliptical shape for (119881mpp 119875mpp) is defined such that a positivereward value is given to the agent whenever it obtained(119881mpp 119875mpp) falls into the hitting zone in the sensing timeslot Figure 7 shows the different size of elliptical hittingzone superimposed on the plot of simulated (119881mpp 119875mpp)of Figure 6(a) The red elliptical circle in Figures 7(a) 7(b)and 7(c) respectively represents that the hitting zone covers372 685 and 855 of the total (119881mpp 119875mpp) pointsobtained from the simulated data of the previous day

In realizing the RLMPPT the state vector is defined as119878 = [119878

0 1198781 1198782 1198783] sube S while the meaning of each state is

International Journal of Photoenergy 7

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(b)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(c)

Figure 7 Different size of elliptical hitting zone superimposed on the plot of simulated (119881mpp 119875mpp) for the first set of simulated environmentdata (a) 372 of the MPPs involved (b) 685 of the MPPs involved and (c) 855 of the MPPs involved

explained in the previous section and shown in Figure 5 Sixperturbations of ΔV(119894) to the 119881pv(119894) are defined as the set ofaction as follows

A = 119886119894| 1198860= minus5V 119886

1= minus2V 119886

2= minus05V

1198863= +05V 119886

4= +2V 119886

5= +5V

(15)

The 120576-greedy is used in choosing agentrsquos actions in theRLMPPT such that agent repeatedly selects the same actionis preventedThe rewarding policy of the hitting zone rewardfunction is to give a reward value of 10 and 0 respectivelyto the agent whenever it obtained (119881mpp 119875mpp) in any sensingtime slot falls-in and falls-out the hitting zone

5 Experimental Results

51 Results of the Gaussian Distribution Function GeneratedEnvironment Data In this study experiments of RLMPPTin testing the Gaussian generated and real environment dataare conducted and the results are compared with those of thePampO method and the open-circuit voltage method

For the Gaussian distribution function generated envi-ronment simulation the percentages of each RL agent choos-ing actions in early phase (0sim25 minutes) middle phase(100sim125 minutes) and final phase (215sim240 minutes) of theRLMPPT are shown in the second third and fourth rowrespectively of Table 1 within two-hour period It can be seenthat in the early phase of the simulation the RL agent isin fast learning stage such that the action chosen by agentis concentrated on the plusmn5V and plusmn2V actions However inthe middle and final phase of the simulation the learning

Table 1 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 16 20 8 12 24 20Middle 100sim125 4 28 24 16 20 8Final 215sim240 4 8 36 40 8 4

is completed and the agent exploited what it has learnedhence the percentage of choosing fine tuning actions that isplusmn05 V is increased from 20 of the early phase to 40 and76 respectively for the middle and final phase It can beconcluded that the learning agent fast learned the strategy inselecting the appropriate action toward reaching the MPPTand hence the goal of tracking the MPP is achieved by the RLagent

Experimental results of the offsets between the calculatedand the tracking MPP by the RLMPPT and comparingmethods at the sensing time are shown in Figure 8 Figures8(a) 8(b) and 8(c) respectively show the offsets between thecalculated MPP and the tracking MPP by the PampO methodthe open-circuit voltagemethod and the RLMPPTmethod atthe sensing time One can see that among the experimentalresults of the three comparingmethods the open-circuit volt-age method obtained the largest offset which is concentratedaround 15W Even though the offsets obtained by the PampOmethod fall largely below 10W however large portion ofoffset obtained by the PampO method also randomly scatteredbetween 1 and 10W On the other hand the proposedRLMPPT method achieves the least and condenses offsets

8 International Journal of Photoenergy

020406080

100120140

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(b)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(c)

Figure 8 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

Figure 9Thedefinition of hitting zone for the 2nd set of experimentdata

below 5W and only small portions fall outside of 5W evenin the early learning phase

52 Results of the Gaussian Distribution Function GeneratedEnvironment Data with Sun Occultation by Clouds In thisexperiment the simulated data are obtained by adding a 30chance of sun occultation by clouds to the test data in thefirst experiment such that the temperature will fall down0 to 3∘C and the irradiance will decrease 0 to 300Wm2The experiment is conducted to illustrate the capability ofRLMPPT in tracking the MPP of PV array under variedweather condition Figure 9 shows the hitting zone definition

Table 2 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 28 8 12 8 12 32Middle 100sim125 8 16 24 20 20 12Final 215sim240 4 12 40 32 8 4

for the simulated data with added sun occultation by cloudsto the Gaussian distribution function generated environmentdata The red elliptical shape in Figure 9 covers the 902 ofthe total (119881mpp 119875mpp) points obtained from simulated data ofthe previous day Table 2 shows the percentage of selectingaction that is the perturbation ΔV(119894) to the 119881pv(119894) by theRLMPPT The percentages of each RL agent choosing actionin early phase middle phase and final phase of the RLMPPTmethod are shown in the second third and fourth rowrespectively of Table 2

It can be seen from Table 2 that the percentages ofselecting fine tuning actions that is the perturbation ΔV(119894)is plusmn05 V increase from 20 to 44 and finally to 72respectively for the early phase middle phase and finalphase of MPP tracking via the RL agent This table againillustrated the fact that the learning agent fast learned thestrategy in selecting the appropriate action toward reachingthe MPPT and hence the goal of tracking the MPP of thePV array is achieved by the RL agent However due tothe varied weather condition on sun occultation by clouds

International Journal of Photoenergy 9

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0

5

10

15

20

25

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 10 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

the percentage of selecting fine-tuning actions is somewhatvaried a little bit in comparison with the results obtainedin Table 1 whose simulated data are generated by Gaussiandistribution function without sun occultation effect

Experimental results of the offsets between the calculatedand the tracking MPP by the comparing methods of PampOthe open-circuit voltage and the RLMPPT at the samesensing time are shown in Figures 10(a) 10(b) and 10(c)respectively Experiment data from Figure 11 again exhibitedthat among the three comparing methods the open-circuitvoltage method obtained the largest and sparsely distributedoffsets data which are concentrated around 15W Eventhough the offsets obtained by the PampO method fall largelybetween 0 and 5W however large portion of offset obtainedby the PampOmethod scattered around 1 to 40W before the 50minutes of the experiment On the other hand the proposedRLMPPT method achieves the least and condensed offsetsbelow 5W andmostly close to 3W in the final tracking phaseafter 200 minutes

53 Results of the Real Environment Data Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA is obtained online fromNationalRenewal Energy Laboratory (NREL) database for testing theRLMPPTmethod under real environment dataThe databaseis selected because the geographical location of the sensing

Table 3The percentage of choosing action in different phase for theexperiment with real weather data

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 24 16 8 12 12 28Middle 100sim125 16 16 12 24 24 8Final 215sim240 4 16 32 32 12 4

station is also located in the subtropical area A period ofrecorded data from 1000 to 1400 for 5 consecutive days isshown in Figure 6(c) and the hitting zone reward functionfor the data from earlier days is shown in Figure 11(a) Thered elliptical shape in Figure 11(a) covers the 934of the total(119881mpp 119875mpp) points obtained from the previous 5 consecutivedays of the NREL real weather data for testing Figures11(b) and 11(c) respectively show the real temperature andirradiance data recorded at 04012014 for generating the testdata

Table 3 shows the percentage of selecting action bythe RLMPPT The percentages of each RL agent choosingaction in early phase (0sim25 minutes) middle phase (100sim125minutes) and final phase (215sim240 minutes) of the RLMPPTmethod are shown in the second third and fourth rowsrespectively of Table 3

10 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

15

155

16

165

17

175

0 50 100 150 200 250Time (min)

Tem

pera

ture

(∘C)

(b)

900

910

920

930

940

950

960

970

0 50 100 150 200 250Time (min)

Irra

dian

ce (W

m2)

(c)

Figure 11 (a) The definition of hitting zone for the experiment using real weather data (b) the real temperature data recorded in 04012014and used for generating the test data and (c) the real irradiance data recorded at 04012014 and used for generating the test data

In Table 3 one can see that the percentage of selectingfine tuning actions that is the perturbation ΔV(119894) is plusmn05 Vincrease from 20 to 36 and finally to 64 respectivelyfor the early phase middle phase and final phase of MPPtracking via the RL agent Even though the percentage ofselecting fine tuning actions in this real data experiment hasthe least value among the three experiments it exhibits thatthe RLMPPT learns to exercise the appropriate action of theperturbationΔV(119894) in tracking theMPP of the PV array underreal weather data This table again illustrated the fact thatthe learning agent fast learned the strategy in selecting theappropriate action toward reaching the MPP and hence thegoal of tracking the MPP of the PV array is achieved by theRL agent

Experimental results of the offsets between the MPPTand the predicted MPPT by the comparing methods at thesensing time for the simulation data generated by the realweather data are shown in Figure 12 Figures 12(a) 12(b) and12(c) respectively show the offsets between the calculatedMPPT and the tracking MPPT by the PampO method theopen-circuit voltage method and the RLMPPT methodExperiment data from Figure 12 again shows that among thethree comparing methods the open-circuit voltage methodobtained the largest and sparsely distributed offsets datawhose distributions are concentrated within a band cover bytwoGaussian distribution functionswith themaximumoffset

value of 197W The offsets obtained by the PampO methodfall largely around 5 and 25W however large portion ofoffset obtained by the PampO method sharply decreased from1 to 40W in the early 70 minutes of the experiment By theobservation of Figure 12(c) the proposed RLMPPT methodachieves the least and condenses offsets near 1 or 2W andnone of the offsets is higher than 5W after 5 simulationminutes from the beginning

54 Performance Comparison with Efficiency Factor In orderto validate whether the RLMPPTmethod is effective or not intracking MPP of a PV array the efficiency factor 120578 is used tocompare the performance of other existing methods for thethree experiments The 120578 is defined as follows

120578mppt =int119905

0119875actual (119905) 119889119905

int119905

0119875max (119905) 119889119905

(16)

where 119875actual(119905) and 119875max(119905) respectively represent the track-ing MPP of the MPPT method and the calculated MPPTable 4 shows that for the three test data sets the open-circuit voltage method has the least efficiency factor amongthe three comparing methods and the RLMPPT method hasthe best efficiency factor which is slightly better than that ofPampO method The advantage of the RLMPPT method overthe PampOmethod is shown in Figures 10 and 12where not only

International Journal of Photoenergy 11

020406080

100120140160

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

186187188189

19191192193194195196

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0

50

100

150

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 12 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

Table 4 Comparison of efficiency factor for the three comparingmethods

Learning phase (min) MethodsOpen-circuit voltage PampO RLMPPT

Early 1000sim1025 869 835 892Middle 1140sim1205 875 978 994Final 1335sim1400 878 977 993Overall 1000sim1400 876 954 985

the RLMPPT method significantly improves the efficiencyfactor in comparing that of the PampO but also the learningagent of the RLMPPT fast learned the strategy in selecting theappropriate action toward reaching theMPPTwhich is muchfaster than the PampO method in exhibiting a slow adaptivephase Hence the RLMPPT not only improves the efficiencyfactor in tracking the MPP of the PV array but also has thefast learning capability in achieving the task of MPPT of thePV array

6 Conclusions

In this study a reinforcement learning-based maximumpower point tracking (RLMPPT) method is proposed forPV arrayThe RLMPPTmethod monitors the environmental

state of the PV array and adjusts the perturbation to theoperating voltage of the PV array in achieving the best MPPSimulations of the proposed RLMPPT for a PV array areconducted on three kinds of data set which are simulatedGaussian weather data simulated Gaussian weather dataadded with sun occultation effect and real weather data fromNREL database Experimental results demonstrate that incomparison to the existing PampO method the RLMPPT notonly achieves better efficiency factor for both simulated andreal weather data sets but also adapts to the environmentmuch fast with very short learning time Further the rein-forcement learning-basedMPPTmethodwould be employedin the real PV array to validate the effectiveness of theproposed novel MPPT method

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] T Esram andP L Chapman ldquoComparison of photovoltaic arraymaximum power point tracking techniquesrdquo IEEE Transactionson Energy Conversion vol 22 no 2 pp 439ndash449 2007

12 International Journal of Photoenergy

[2] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithmsrdquo Progress in PhotovoltaicsResearch and Applications vol 11 no 1 pp 47ndash62 2003

[3] N Femia G Petrone G Spagnuolo and M Vitelli ldquoOptimiza-tion of perturb and observe maximum power point trackingmethodrdquo IEEE Transactions on Power Electronics vol 20 no4 pp 963ndash973 2005

[4] J S Kumari D C S Babu andA K Babu ldquoDesign and analysisof PO and IPO MPPT technique for photovoltaic systemrdquoInternational Journal ofModern Engineering Research vol 2 no4 pp 2174ndash2180 2012

[5] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithms using an experimental pro-grammable maximum power point tracking test bedrdquo in Pro-ceedings of the Conference Record of the IEEE 28th PhotovoltaicSpecialists Conference pp 1699ndash1702 2000

[6] L-R Chen C-H Tsai Y-L Lin and Y-S Lai ldquoA biologicalswarm chasing algorithm for tracking the PVmaximum powerpointrdquo IEEE Transactions on Energy Conversion vol 25 no 2pp 484ndash493 2010

[7] T L Kottas Y S Boutalis and A D Karlis ldquoNew maximumpower point tracker for PV arrays using fuzzy controller in closecooperation with fuzzy cognitive networksrdquo IEEE Transactionson Energy Conversion vol 21 no 3 pp 793ndash803 2006

[8] N Mutoh M Ohno and T Inoue ldquoA method for MPPT con-trol while searching for parameters corresponding to weatherconditions for PV generation systemsrdquo IEEE Transactions onIndustrial Electronics vol 53 no 4 pp 1055ndash1065 2006

[9] G C Hsieh H I Hsieh C Y Tsai and C H Wang ldquoPhotovol-taic power-increment-aided incremental-conductance MPPTwith two-phased trackingrdquo IEEE Transactions on Power Elec-tronics vol 28 no 6 pp 2895ndash2911 2013

[10] R L Mueller M T Wallace and P Iles ldquoScaling nominal solarcell impedances for array designrdquo in Proceedings of the IEEE 1stWorld Conference on Photovoltaic Energy Conversion vol 2 pp2034ndash2037 December 1994

[11] N Femia G Petrone G Spagnuolo andMVitelli ldquoOptimizingsampling rate of PampO MPPT techniquerdquo in Proceedings ofthe IEEE 35th Annual Power Electronics Specialists Conference(PESC rsquo04) vol 3 pp 1945ndash1949 June 2004

[12] N Femia G PetroneG Spagnuolo andMVitelli ldquoA techniquefor improving PampOMPPT performances of double-stage grid-connected photovoltaic systemsrdquo IEEE Transactions on Indus-trial Electronics vol 56 no 11 pp 4473ndash4482 2009

[13] K H Hussein I Muta T Hoshino and M Osakada ldquoMax-imum photovoltaic power tracking an algorithm for rapidlychanging atmospheric conditionsrdquo IEE ProceedingsmdashGenera-tion Transmission and Distribution vol 142 no 1 pp 59ndash641995

[14] C Hua J Lin and C Shen ldquoImplementation of a DSP-controlled photovoltaic systemwith peak power trackingrdquo IEEETransactions on Industrial Electronics vol 45 no 1 pp 99ndash1071998

[15] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996

[16] A G Barto Reinforcement Learning An Introduction MITPress 1998

[17] C JWatkins and PDayan ldquoQ-learningrdquoMachine Learning vol8 no 3-4 pp 279ndash292 1992

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Inorganic ChemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Carbohydrate Chemistry

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Physical Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2014

Bioinorganic Chemistry and ApplicationsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

SpectroscopyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Medicinal ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chromatography Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Theoretical ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Spectroscopy

Analytical ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Quantum Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Organic Chemistry International

ElectrochemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CatalystsJournal of

Page 4: Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

4 International Journal of Photoenergy

Agent

Environment

States Reward Action

Figure 3 Model of the reinforcement learning

cases the model parameters are absent and unknown to theusers hence the agent of the RL explores the environmentand obtains reward from the environment by try-and-errorinteraction The agent then maintains the rewardrsquos runningaverage value for a certain state-action pair According tothe reward value the next action can be decided by someexploration-exploitation strategies such as the 120576-greedy orsoftmax [15 16]

Q-learning is a useful and compact reinforcement learn-ing method for handling andmaintaining running average ofreward [17] Assume an action 119886 is applied to the environmentby the agent and the state goes to 119904

1015840 from 119904 and receives areward 1199031015840 the Q-learning update rule is then given by

119876 (119904 119886) larr997888 119876 (119904 119886) + 120578Δ119876 (1199041015840 119886) (8)

where 120578 is the learning rate for weighting the update value toassure the convergence of the learning the119876(sdot) is the rewardfunction and the delta-term Δ119876(sdot) is represented by

Δ119876 (119904 119886) = [1199031015840+ 120574119876lowast(1199041015840)] minus 119876 (119904 119886) (9)

where 119903 is the immediate reward 120574 is the discount rate toadjust the weight of current optimal value119876lowast(sdot) whose valueis computed by

119876lowast(1199041015840) = maxforall119887isinA

119876(1199041015840 119887) (10)

In (10) A is the set of all candidate actions The learningparameters of 120578 and 120574 in (7) and (8) respectively are usuallyset with value ranges between 0 and 1 Once the RL agentsuccessfully reaches the new state 1199041015840 it will receive a reward1199031015840 and update the Q-value then the 1199041015840 is substituted by thenext state that is 119904 larr 1199041015840 and the upcoming action isthen determined according to the predefined exploration-exploitation strategy such as the 120576-greedy of this study Andthe latest Q-value of the state is applied to the environmentfrom one state to the other

32 State Action and Reward Definition of the RLMPPT Inthe RLMPPT the agent receives the observable environmen-tal signal pair of (119881pv(119894) 119875pv(119894)) whichwill be subtracted from

020406080

100120140160180

0 5 10 15 20 25 30 35 40 45

Pow

er (W

)

Voltage (V)

S3

S0 S1

S2

Figure 4 Four states of the RLMPPT

the previous signal pair in obtaining the (Δ119881pv(119894) Δ119875pv(119894))pair Δ119881pv(119894) and Δ119875pv(119894) each takes positive or negative signsto constitute a state vector S with four states The agentthen adaptively decides and executes the desired pertur-bation ΔV(119894) characterized as action to the 119881pv After theselected action is executed a reward signal 119903(119894) is calculatedand granted to the agent and accordingly the agent thenevaluates the performance of the state-action interactionBy receiving the rewards the agent is encouraged to selectthe action with the best reward This leads to a series ofactions with the best rewards being iteratively generated suchthat MPPT tracking with better performance is graduallyachieved after the learning phase The state action andreward of the RLMPPT for the PV array are sequentiallydefined in the following

(i) States In RLMPPT the state vector is denoted as

119878 = [1198780 1198781 1198782 1198783] sube S (11)

where S is the space of all possible environmental state vectorswith elements transformed from the observable environmentvariables Δ119881pv(119894) and Δ119875pv(119894) where 119878

0 1198781 1198782 and 119878

3

respectively represent at any sensing time slot the state ofgoing toward theMPP from the left the state of going towardthe MPP from the right the state of leaving from the MPP tothe left and the state of leaving from theMPP to the rightThefour states of the RLMPPT can be shown in Figure 4 where1198780indicates that Δ119881pv(119894) and Δ119875pv(119894) have all the positive

sign 1198781indicates that Δ119881pv(119894) is negative and the Δ119875pv(119894) is

positive sign 1198782indicates that Δ119881pv(119894) is positive and Δ119875pv(119894)

is negative sign and finally 1198783indicates that Δ119881pv(119894) and

Δ119875pv(119894) are all the negative sign

(ii) ActionsThe action of the RLMPPT agent is defined as thecontrollable variable of the desired perturbation ΔV(119894) to the119881pv(119894) and the state of the agentrsquos action 119860per is denoted by

119860per isin A = 1198890 1198891 119889

119873 (12)

where A is a set of all the agentrsquos controllable perturbationsΔV(119894) for adding to the119881pv(119894) in obtaining the power from thePV array

(iii) Rewards In RLMPPT rewards are incorporated toaccomplish the goals of obtaining the MPP of the PV array

International Journal of Photoenergy 5

Assess

Select

Adjust

Set

Update

End

False

True

Observe

Initialize

(1) states and actions according to the given mapping range(2) Q table as an 2-D array Q(middot) with size |S| times |A|(3) iteration count i = 0 all Q value in Q(s a) = 0discount rate 120574 = 08 learning rate 120578 = 05and maximum sensing time slot number MTS

i le MTS

Vpv(i) for i and obtain Ppv(i)

i = i + 1 for next sensing time slot

action Δ(i) according to Q table using 120576-greedy algorithm

reward r by (13)

Q value by giving s a s and r into (8)ndash(10)

the next state s

998400

998400

998400998400

Figure 5 The flowchart of the proposed RLMPPT for the PV array

Intuitively the simplest but effective derivation of rewardcould be a hit-or-miss type of function that is once theobservable signal pair (119881pv(119894) 119875pv(119894)) hits the sweetest spotthat is the true MPP of (119881mpp(119894) 119875mpp(119894)) a positive rewardis given to the RL agent otherwise zero reward is given tothe RL agent The hit-or-miss type of reward function couldintuitively be defined as

119903119894+1

= 120575 ((119881pv (119894) 119875pv (119894)) ℎ 119911 (119894)) (13)

where 120575(sdot) represents the Kronecker delta function andℎ 119911(119894) is the hitting spot defined for the 119894th sensing timeslot In defining (13) a negative reward explicitly representspunishment for the agentrsquos failure that is missing the hittingspot could achieve better learning results in comparison witha zero reward in the tracking of MPP of a PV array Inreality the possibility that the observable signal pair (119881pv(119894)119875pv(119894)) leaded by the agentrsquos action of perturbation ΔV(119894) tothe 119881pv(119894) exactly hits the sweetest spot of MPP is very lowin any sensing time slot 119894 Besides it is also very difficult for

the environmentrsquos judge to define a hitting spot for everysensing time slot Hence the hitting spot ℎ 119911(119894) in (13) canbe relaxed where a required hitting zone is defined on theprevious environmental knowledge on a diurnal basis andwhere a positivenegative reward will be given if (119881pv(119894)119875pv(119894)) falls intooutside the predefined hitting zone in anytime slot Hence the reward function can be formulated as

119903119894+1

=

1198881 (119881pv (119894) 119875pv (119894)) isin Hitting Zone

minus1198882 otherwise

(14)

where 1198881and 119888

2are positive values denoting weighting

factors in maximizing the difference between reward andpunishment for better learning effect In this study the 120576-greedy algorithm is used in selecting the RLMPPT agentrsquosactions to avoid repeatedly selection of the same action Theflowchart of the proposed RLMPPT of this study is shown inFigure 5

6 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36

Pmax

(W)

Vmax

(V)

(b)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(c)

Figure 6 MPP distribution under simulated environment (a) Gaussian distribution function generated environment data (b) the set in (a)added with sun occultation by clouds and (c) MPP distribution obtained using real weather data

4 Configurations of Experiments

Experiments of MPPT of PV array utilizing the RLMPPTare conducted by simulation on computer and the results arecompared with existing MPPT methods for PV array

41 Environment Simulation and Configuration for PV ArrayIn this study the PV array used for simulation is SY-175Mmanufactured by Tangshan Shaiyang Solar TechnologyCo Ltd China The power rating of the simulated PVarray was 175W with open-circuit voltage 44V and short-circuit current 52 A Validation of RLMPPTrsquos effectivenessin MPPT was conducted via three experiments using twosimulated and one real weather data sets A basic experimentwas first performed to determine whether the RLMPPTcould achieve the task of MPPT by using a set of Gaussiandistribution function generated temperature and irradianceAnd the effect of sun occultation by clouds is added on theGaussian distribution function generated set of temperatureand irradiance as the second experiment Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA obtained fromNational RenewalEnergy Laboratory (NREL) database provided an empiricaldata set to test the RLMPPT under real weather conditionConfigurations of the experiments are described in thefollowing

Assume the PV array is located at the Subtropics area(Chiayi City in the south of Taiwan) during 1000 and1400 in summertime where the temperature and irradiance

respectively are simulated using Gaussian distribution func-tion with mean value of 30∘C and 800Wm2 and standarddeviation of 4∘C and 50Wm2 According to (1) the simu-lated set of temperature and irradiance produces the MPPvoltage (119881mpp) and the calculated MPP (119875mpp) as shown inFigure 6(a) where the 119881mpp and the 119875mpp respectively lay at119909-axis and 119910-axis Figures 6(b) and 6(c) respectively showthe (119881mpp 119875mpp) plots of the Gaussian generated temperatureand irradiance with sun occultation effect and the realweather data recorded on April 1 2014 at LoyolaMarymountUniversity

42 Reward Function and State Action Arrangement Inapplying the RLMPPT to solve the problem of the MPPTfor PV array the reward function plays an important rolebecause not only a good reward function definition couldachieve the right feedback on every execution of learningand tracking but also it could enhance the efficiency ofthe learning algorithm In this study a hitting zone withelliptical shape for (119881mpp 119875mpp) is defined such that a positivereward value is given to the agent whenever it obtained(119881mpp 119875mpp) falls into the hitting zone in the sensing timeslot Figure 7 shows the different size of elliptical hittingzone superimposed on the plot of simulated (119881mpp 119875mpp)of Figure 6(a) The red elliptical circle in Figures 7(a) 7(b)and 7(c) respectively represents that the hitting zone covers372 685 and 855 of the total (119881mpp 119875mpp) pointsobtained from the simulated data of the previous day

In realizing the RLMPPT the state vector is defined as119878 = [119878

0 1198781 1198782 1198783] sube S while the meaning of each state is

International Journal of Photoenergy 7

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(b)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(c)

Figure 7 Different size of elliptical hitting zone superimposed on the plot of simulated (119881mpp 119875mpp) for the first set of simulated environmentdata (a) 372 of the MPPs involved (b) 685 of the MPPs involved and (c) 855 of the MPPs involved

explained in the previous section and shown in Figure 5 Sixperturbations of ΔV(119894) to the 119881pv(119894) are defined as the set ofaction as follows

A = 119886119894| 1198860= minus5V 119886

1= minus2V 119886

2= minus05V

1198863= +05V 119886

4= +2V 119886

5= +5V

(15)

The 120576-greedy is used in choosing agentrsquos actions in theRLMPPT such that agent repeatedly selects the same actionis preventedThe rewarding policy of the hitting zone rewardfunction is to give a reward value of 10 and 0 respectivelyto the agent whenever it obtained (119881mpp 119875mpp) in any sensingtime slot falls-in and falls-out the hitting zone

5 Experimental Results

51 Results of the Gaussian Distribution Function GeneratedEnvironment Data In this study experiments of RLMPPTin testing the Gaussian generated and real environment dataare conducted and the results are compared with those of thePampO method and the open-circuit voltage method

For the Gaussian distribution function generated envi-ronment simulation the percentages of each RL agent choos-ing actions in early phase (0sim25 minutes) middle phase(100sim125 minutes) and final phase (215sim240 minutes) of theRLMPPT are shown in the second third and fourth rowrespectively of Table 1 within two-hour period It can be seenthat in the early phase of the simulation the RL agent isin fast learning stage such that the action chosen by agentis concentrated on the plusmn5V and plusmn2V actions However inthe middle and final phase of the simulation the learning

Table 1 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 16 20 8 12 24 20Middle 100sim125 4 28 24 16 20 8Final 215sim240 4 8 36 40 8 4

is completed and the agent exploited what it has learnedhence the percentage of choosing fine tuning actions that isplusmn05 V is increased from 20 of the early phase to 40 and76 respectively for the middle and final phase It can beconcluded that the learning agent fast learned the strategy inselecting the appropriate action toward reaching the MPPTand hence the goal of tracking the MPP is achieved by the RLagent

Experimental results of the offsets between the calculatedand the tracking MPP by the RLMPPT and comparingmethods at the sensing time are shown in Figure 8 Figures8(a) 8(b) and 8(c) respectively show the offsets between thecalculated MPP and the tracking MPP by the PampO methodthe open-circuit voltagemethod and the RLMPPTmethod atthe sensing time One can see that among the experimentalresults of the three comparingmethods the open-circuit volt-age method obtained the largest offset which is concentratedaround 15W Even though the offsets obtained by the PampOmethod fall largely below 10W however large portion ofoffset obtained by the PampO method also randomly scatteredbetween 1 and 10W On the other hand the proposedRLMPPT method achieves the least and condenses offsets

8 International Journal of Photoenergy

020406080

100120140

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(b)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(c)

Figure 8 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

Figure 9Thedefinition of hitting zone for the 2nd set of experimentdata

below 5W and only small portions fall outside of 5W evenin the early learning phase

52 Results of the Gaussian Distribution Function GeneratedEnvironment Data with Sun Occultation by Clouds In thisexperiment the simulated data are obtained by adding a 30chance of sun occultation by clouds to the test data in thefirst experiment such that the temperature will fall down0 to 3∘C and the irradiance will decrease 0 to 300Wm2The experiment is conducted to illustrate the capability ofRLMPPT in tracking the MPP of PV array under variedweather condition Figure 9 shows the hitting zone definition

Table 2 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 28 8 12 8 12 32Middle 100sim125 8 16 24 20 20 12Final 215sim240 4 12 40 32 8 4

for the simulated data with added sun occultation by cloudsto the Gaussian distribution function generated environmentdata The red elliptical shape in Figure 9 covers the 902 ofthe total (119881mpp 119875mpp) points obtained from simulated data ofthe previous day Table 2 shows the percentage of selectingaction that is the perturbation ΔV(119894) to the 119881pv(119894) by theRLMPPT The percentages of each RL agent choosing actionin early phase middle phase and final phase of the RLMPPTmethod are shown in the second third and fourth rowrespectively of Table 2

It can be seen from Table 2 that the percentages ofselecting fine tuning actions that is the perturbation ΔV(119894)is plusmn05 V increase from 20 to 44 and finally to 72respectively for the early phase middle phase and finalphase of MPP tracking via the RL agent This table againillustrated the fact that the learning agent fast learned thestrategy in selecting the appropriate action toward reachingthe MPPT and hence the goal of tracking the MPP of thePV array is achieved by the RL agent However due tothe varied weather condition on sun occultation by clouds

International Journal of Photoenergy 9

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0

5

10

15

20

25

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 10 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

the percentage of selecting fine-tuning actions is somewhatvaried a little bit in comparison with the results obtainedin Table 1 whose simulated data are generated by Gaussiandistribution function without sun occultation effect

Experimental results of the offsets between the calculatedand the tracking MPP by the comparing methods of PampOthe open-circuit voltage and the RLMPPT at the samesensing time are shown in Figures 10(a) 10(b) and 10(c)respectively Experiment data from Figure 11 again exhibitedthat among the three comparing methods the open-circuitvoltage method obtained the largest and sparsely distributedoffsets data which are concentrated around 15W Eventhough the offsets obtained by the PampO method fall largelybetween 0 and 5W however large portion of offset obtainedby the PampOmethod scattered around 1 to 40W before the 50minutes of the experiment On the other hand the proposedRLMPPT method achieves the least and condensed offsetsbelow 5W andmostly close to 3W in the final tracking phaseafter 200 minutes

53 Results of the Real Environment Data Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA is obtained online fromNationalRenewal Energy Laboratory (NREL) database for testing theRLMPPTmethod under real environment dataThe databaseis selected because the geographical location of the sensing

Table 3The percentage of choosing action in different phase for theexperiment with real weather data

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 24 16 8 12 12 28Middle 100sim125 16 16 12 24 24 8Final 215sim240 4 16 32 32 12 4

station is also located in the subtropical area A period ofrecorded data from 1000 to 1400 for 5 consecutive days isshown in Figure 6(c) and the hitting zone reward functionfor the data from earlier days is shown in Figure 11(a) Thered elliptical shape in Figure 11(a) covers the 934of the total(119881mpp 119875mpp) points obtained from the previous 5 consecutivedays of the NREL real weather data for testing Figures11(b) and 11(c) respectively show the real temperature andirradiance data recorded at 04012014 for generating the testdata

Table 3 shows the percentage of selecting action bythe RLMPPT The percentages of each RL agent choosingaction in early phase (0sim25 minutes) middle phase (100sim125minutes) and final phase (215sim240 minutes) of the RLMPPTmethod are shown in the second third and fourth rowsrespectively of Table 3

10 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

15

155

16

165

17

175

0 50 100 150 200 250Time (min)

Tem

pera

ture

(∘C)

(b)

900

910

920

930

940

950

960

970

0 50 100 150 200 250Time (min)

Irra

dian

ce (W

m2)

(c)

Figure 11 (a) The definition of hitting zone for the experiment using real weather data (b) the real temperature data recorded in 04012014and used for generating the test data and (c) the real irradiance data recorded at 04012014 and used for generating the test data

In Table 3 one can see that the percentage of selectingfine tuning actions that is the perturbation ΔV(119894) is plusmn05 Vincrease from 20 to 36 and finally to 64 respectivelyfor the early phase middle phase and final phase of MPPtracking via the RL agent Even though the percentage ofselecting fine tuning actions in this real data experiment hasthe least value among the three experiments it exhibits thatthe RLMPPT learns to exercise the appropriate action of theperturbationΔV(119894) in tracking theMPP of the PV array underreal weather data This table again illustrated the fact thatthe learning agent fast learned the strategy in selecting theappropriate action toward reaching the MPP and hence thegoal of tracking the MPP of the PV array is achieved by theRL agent

Experimental results of the offsets between the MPPTand the predicted MPPT by the comparing methods at thesensing time for the simulation data generated by the realweather data are shown in Figure 12 Figures 12(a) 12(b) and12(c) respectively show the offsets between the calculatedMPPT and the tracking MPPT by the PampO method theopen-circuit voltage method and the RLMPPT methodExperiment data from Figure 12 again shows that among thethree comparing methods the open-circuit voltage methodobtained the largest and sparsely distributed offsets datawhose distributions are concentrated within a band cover bytwoGaussian distribution functionswith themaximumoffset

value of 197W The offsets obtained by the PampO methodfall largely around 5 and 25W however large portion ofoffset obtained by the PampO method sharply decreased from1 to 40W in the early 70 minutes of the experiment By theobservation of Figure 12(c) the proposed RLMPPT methodachieves the least and condenses offsets near 1 or 2W andnone of the offsets is higher than 5W after 5 simulationminutes from the beginning

54 Performance Comparison with Efficiency Factor In orderto validate whether the RLMPPTmethod is effective or not intracking MPP of a PV array the efficiency factor 120578 is used tocompare the performance of other existing methods for thethree experiments The 120578 is defined as follows

120578mppt =int119905

0119875actual (119905) 119889119905

int119905

0119875max (119905) 119889119905

(16)

where 119875actual(119905) and 119875max(119905) respectively represent the track-ing MPP of the MPPT method and the calculated MPPTable 4 shows that for the three test data sets the open-circuit voltage method has the least efficiency factor amongthe three comparing methods and the RLMPPT method hasthe best efficiency factor which is slightly better than that ofPampO method The advantage of the RLMPPT method overthe PampOmethod is shown in Figures 10 and 12where not only

International Journal of Photoenergy 11

020406080

100120140160

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

186187188189

19191192193194195196

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0

50

100

150

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 12 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

Table 4 Comparison of efficiency factor for the three comparingmethods

Learning phase (min) MethodsOpen-circuit voltage PampO RLMPPT

Early 1000sim1025 869 835 892Middle 1140sim1205 875 978 994Final 1335sim1400 878 977 993Overall 1000sim1400 876 954 985

the RLMPPT method significantly improves the efficiencyfactor in comparing that of the PampO but also the learningagent of the RLMPPT fast learned the strategy in selecting theappropriate action toward reaching theMPPTwhich is muchfaster than the PampO method in exhibiting a slow adaptivephase Hence the RLMPPT not only improves the efficiencyfactor in tracking the MPP of the PV array but also has thefast learning capability in achieving the task of MPPT of thePV array

6 Conclusions

In this study a reinforcement learning-based maximumpower point tracking (RLMPPT) method is proposed forPV arrayThe RLMPPTmethod monitors the environmental

state of the PV array and adjusts the perturbation to theoperating voltage of the PV array in achieving the best MPPSimulations of the proposed RLMPPT for a PV array areconducted on three kinds of data set which are simulatedGaussian weather data simulated Gaussian weather dataadded with sun occultation effect and real weather data fromNREL database Experimental results demonstrate that incomparison to the existing PampO method the RLMPPT notonly achieves better efficiency factor for both simulated andreal weather data sets but also adapts to the environmentmuch fast with very short learning time Further the rein-forcement learning-basedMPPTmethodwould be employedin the real PV array to validate the effectiveness of theproposed novel MPPT method

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] T Esram andP L Chapman ldquoComparison of photovoltaic arraymaximum power point tracking techniquesrdquo IEEE Transactionson Energy Conversion vol 22 no 2 pp 439ndash449 2007

12 International Journal of Photoenergy

[2] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithmsrdquo Progress in PhotovoltaicsResearch and Applications vol 11 no 1 pp 47ndash62 2003

[3] N Femia G Petrone G Spagnuolo and M Vitelli ldquoOptimiza-tion of perturb and observe maximum power point trackingmethodrdquo IEEE Transactions on Power Electronics vol 20 no4 pp 963ndash973 2005

[4] J S Kumari D C S Babu andA K Babu ldquoDesign and analysisof PO and IPO MPPT technique for photovoltaic systemrdquoInternational Journal ofModern Engineering Research vol 2 no4 pp 2174ndash2180 2012

[5] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithms using an experimental pro-grammable maximum power point tracking test bedrdquo in Pro-ceedings of the Conference Record of the IEEE 28th PhotovoltaicSpecialists Conference pp 1699ndash1702 2000

[6] L-R Chen C-H Tsai Y-L Lin and Y-S Lai ldquoA biologicalswarm chasing algorithm for tracking the PVmaximum powerpointrdquo IEEE Transactions on Energy Conversion vol 25 no 2pp 484ndash493 2010

[7] T L Kottas Y S Boutalis and A D Karlis ldquoNew maximumpower point tracker for PV arrays using fuzzy controller in closecooperation with fuzzy cognitive networksrdquo IEEE Transactionson Energy Conversion vol 21 no 3 pp 793ndash803 2006

[8] N Mutoh M Ohno and T Inoue ldquoA method for MPPT con-trol while searching for parameters corresponding to weatherconditions for PV generation systemsrdquo IEEE Transactions onIndustrial Electronics vol 53 no 4 pp 1055ndash1065 2006

[9] G C Hsieh H I Hsieh C Y Tsai and C H Wang ldquoPhotovol-taic power-increment-aided incremental-conductance MPPTwith two-phased trackingrdquo IEEE Transactions on Power Elec-tronics vol 28 no 6 pp 2895ndash2911 2013

[10] R L Mueller M T Wallace and P Iles ldquoScaling nominal solarcell impedances for array designrdquo in Proceedings of the IEEE 1stWorld Conference on Photovoltaic Energy Conversion vol 2 pp2034ndash2037 December 1994

[11] N Femia G Petrone G Spagnuolo andMVitelli ldquoOptimizingsampling rate of PampO MPPT techniquerdquo in Proceedings ofthe IEEE 35th Annual Power Electronics Specialists Conference(PESC rsquo04) vol 3 pp 1945ndash1949 June 2004

[12] N Femia G PetroneG Spagnuolo andMVitelli ldquoA techniquefor improving PampOMPPT performances of double-stage grid-connected photovoltaic systemsrdquo IEEE Transactions on Indus-trial Electronics vol 56 no 11 pp 4473ndash4482 2009

[13] K H Hussein I Muta T Hoshino and M Osakada ldquoMax-imum photovoltaic power tracking an algorithm for rapidlychanging atmospheric conditionsrdquo IEE ProceedingsmdashGenera-tion Transmission and Distribution vol 142 no 1 pp 59ndash641995

[14] C Hua J Lin and C Shen ldquoImplementation of a DSP-controlled photovoltaic systemwith peak power trackingrdquo IEEETransactions on Industrial Electronics vol 45 no 1 pp 99ndash1071998

[15] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996

[16] A G Barto Reinforcement Learning An Introduction MITPress 1998

[17] C JWatkins and PDayan ldquoQ-learningrdquoMachine Learning vol8 no 3-4 pp 279ndash292 1992

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Inorganic ChemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Carbohydrate Chemistry

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Physical Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2014

Bioinorganic Chemistry and ApplicationsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

SpectroscopyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Medicinal ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chromatography Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Theoretical ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Spectroscopy

Analytical ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Quantum Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Organic Chemistry International

ElectrochemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CatalystsJournal of

Page 5: Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

International Journal of Photoenergy 5

Assess

Select

Adjust

Set

Update

End

False

True

Observe

Initialize

(1) states and actions according to the given mapping range(2) Q table as an 2-D array Q(middot) with size |S| times |A|(3) iteration count i = 0 all Q value in Q(s a) = 0discount rate 120574 = 08 learning rate 120578 = 05and maximum sensing time slot number MTS

i le MTS

Vpv(i) for i and obtain Ppv(i)

i = i + 1 for next sensing time slot

action Δ(i) according to Q table using 120576-greedy algorithm

reward r by (13)

Q value by giving s a s and r into (8)ndash(10)

the next state s

998400

998400

998400998400

Figure 5 The flowchart of the proposed RLMPPT for the PV array

Intuitively the simplest but effective derivation of rewardcould be a hit-or-miss type of function that is once theobservable signal pair (119881pv(119894) 119875pv(119894)) hits the sweetest spotthat is the true MPP of (119881mpp(119894) 119875mpp(119894)) a positive rewardis given to the RL agent otherwise zero reward is given tothe RL agent The hit-or-miss type of reward function couldintuitively be defined as

119903119894+1

= 120575 ((119881pv (119894) 119875pv (119894)) ℎ 119911 (119894)) (13)

where 120575(sdot) represents the Kronecker delta function andℎ 119911(119894) is the hitting spot defined for the 119894th sensing timeslot In defining (13) a negative reward explicitly representspunishment for the agentrsquos failure that is missing the hittingspot could achieve better learning results in comparison witha zero reward in the tracking of MPP of a PV array Inreality the possibility that the observable signal pair (119881pv(119894)119875pv(119894)) leaded by the agentrsquos action of perturbation ΔV(119894) tothe 119881pv(119894) exactly hits the sweetest spot of MPP is very lowin any sensing time slot 119894 Besides it is also very difficult for

the environmentrsquos judge to define a hitting spot for everysensing time slot Hence the hitting spot ℎ 119911(119894) in (13) canbe relaxed where a required hitting zone is defined on theprevious environmental knowledge on a diurnal basis andwhere a positivenegative reward will be given if (119881pv(119894)119875pv(119894)) falls intooutside the predefined hitting zone in anytime slot Hence the reward function can be formulated as

119903119894+1

=

1198881 (119881pv (119894) 119875pv (119894)) isin Hitting Zone

minus1198882 otherwise

(14)

where 1198881and 119888

2are positive values denoting weighting

factors in maximizing the difference between reward andpunishment for better learning effect In this study the 120576-greedy algorithm is used in selecting the RLMPPT agentrsquosactions to avoid repeatedly selection of the same action Theflowchart of the proposed RLMPPT of this study is shown inFigure 5

6 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36

Pmax

(W)

Vmax

(V)

(b)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(c)

Figure 6 MPP distribution under simulated environment (a) Gaussian distribution function generated environment data (b) the set in (a)added with sun occultation by clouds and (c) MPP distribution obtained using real weather data

4 Configurations of Experiments

Experiments of MPPT of PV array utilizing the RLMPPTare conducted by simulation on computer and the results arecompared with existing MPPT methods for PV array

41 Environment Simulation and Configuration for PV ArrayIn this study the PV array used for simulation is SY-175Mmanufactured by Tangshan Shaiyang Solar TechnologyCo Ltd China The power rating of the simulated PVarray was 175W with open-circuit voltage 44V and short-circuit current 52 A Validation of RLMPPTrsquos effectivenessin MPPT was conducted via three experiments using twosimulated and one real weather data sets A basic experimentwas first performed to determine whether the RLMPPTcould achieve the task of MPPT by using a set of Gaussiandistribution function generated temperature and irradianceAnd the effect of sun occultation by clouds is added on theGaussian distribution function generated set of temperatureand irradiance as the second experiment Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA obtained fromNational RenewalEnergy Laboratory (NREL) database provided an empiricaldata set to test the RLMPPT under real weather conditionConfigurations of the experiments are described in thefollowing

Assume the PV array is located at the Subtropics area(Chiayi City in the south of Taiwan) during 1000 and1400 in summertime where the temperature and irradiance

respectively are simulated using Gaussian distribution func-tion with mean value of 30∘C and 800Wm2 and standarddeviation of 4∘C and 50Wm2 According to (1) the simu-lated set of temperature and irradiance produces the MPPvoltage (119881mpp) and the calculated MPP (119875mpp) as shown inFigure 6(a) where the 119881mpp and the 119875mpp respectively lay at119909-axis and 119910-axis Figures 6(b) and 6(c) respectively showthe (119881mpp 119875mpp) plots of the Gaussian generated temperatureand irradiance with sun occultation effect and the realweather data recorded on April 1 2014 at LoyolaMarymountUniversity

42 Reward Function and State Action Arrangement Inapplying the RLMPPT to solve the problem of the MPPTfor PV array the reward function plays an important rolebecause not only a good reward function definition couldachieve the right feedback on every execution of learningand tracking but also it could enhance the efficiency ofthe learning algorithm In this study a hitting zone withelliptical shape for (119881mpp 119875mpp) is defined such that a positivereward value is given to the agent whenever it obtained(119881mpp 119875mpp) falls into the hitting zone in the sensing timeslot Figure 7 shows the different size of elliptical hittingzone superimposed on the plot of simulated (119881mpp 119875mpp)of Figure 6(a) The red elliptical circle in Figures 7(a) 7(b)and 7(c) respectively represents that the hitting zone covers372 685 and 855 of the total (119881mpp 119875mpp) pointsobtained from the simulated data of the previous day

In realizing the RLMPPT the state vector is defined as119878 = [119878

0 1198781 1198782 1198783] sube S while the meaning of each state is

International Journal of Photoenergy 7

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(b)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(c)

Figure 7 Different size of elliptical hitting zone superimposed on the plot of simulated (119881mpp 119875mpp) for the first set of simulated environmentdata (a) 372 of the MPPs involved (b) 685 of the MPPs involved and (c) 855 of the MPPs involved

explained in the previous section and shown in Figure 5 Sixperturbations of ΔV(119894) to the 119881pv(119894) are defined as the set ofaction as follows

A = 119886119894| 1198860= minus5V 119886

1= minus2V 119886

2= minus05V

1198863= +05V 119886

4= +2V 119886

5= +5V

(15)

The 120576-greedy is used in choosing agentrsquos actions in theRLMPPT such that agent repeatedly selects the same actionis preventedThe rewarding policy of the hitting zone rewardfunction is to give a reward value of 10 and 0 respectivelyto the agent whenever it obtained (119881mpp 119875mpp) in any sensingtime slot falls-in and falls-out the hitting zone

5 Experimental Results

51 Results of the Gaussian Distribution Function GeneratedEnvironment Data In this study experiments of RLMPPTin testing the Gaussian generated and real environment dataare conducted and the results are compared with those of thePampO method and the open-circuit voltage method

For the Gaussian distribution function generated envi-ronment simulation the percentages of each RL agent choos-ing actions in early phase (0sim25 minutes) middle phase(100sim125 minutes) and final phase (215sim240 minutes) of theRLMPPT are shown in the second third and fourth rowrespectively of Table 1 within two-hour period It can be seenthat in the early phase of the simulation the RL agent isin fast learning stage such that the action chosen by agentis concentrated on the plusmn5V and plusmn2V actions However inthe middle and final phase of the simulation the learning

Table 1 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 16 20 8 12 24 20Middle 100sim125 4 28 24 16 20 8Final 215sim240 4 8 36 40 8 4

is completed and the agent exploited what it has learnedhence the percentage of choosing fine tuning actions that isplusmn05 V is increased from 20 of the early phase to 40 and76 respectively for the middle and final phase It can beconcluded that the learning agent fast learned the strategy inselecting the appropriate action toward reaching the MPPTand hence the goal of tracking the MPP is achieved by the RLagent

Experimental results of the offsets between the calculatedand the tracking MPP by the RLMPPT and comparingmethods at the sensing time are shown in Figure 8 Figures8(a) 8(b) and 8(c) respectively show the offsets between thecalculated MPP and the tracking MPP by the PampO methodthe open-circuit voltagemethod and the RLMPPTmethod atthe sensing time One can see that among the experimentalresults of the three comparingmethods the open-circuit volt-age method obtained the largest offset which is concentratedaround 15W Even though the offsets obtained by the PampOmethod fall largely below 10W however large portion ofoffset obtained by the PampO method also randomly scatteredbetween 1 and 10W On the other hand the proposedRLMPPT method achieves the least and condenses offsets

8 International Journal of Photoenergy

020406080

100120140

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(b)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(c)

Figure 8 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

Figure 9Thedefinition of hitting zone for the 2nd set of experimentdata

below 5W and only small portions fall outside of 5W evenin the early learning phase

52 Results of the Gaussian Distribution Function GeneratedEnvironment Data with Sun Occultation by Clouds In thisexperiment the simulated data are obtained by adding a 30chance of sun occultation by clouds to the test data in thefirst experiment such that the temperature will fall down0 to 3∘C and the irradiance will decrease 0 to 300Wm2The experiment is conducted to illustrate the capability ofRLMPPT in tracking the MPP of PV array under variedweather condition Figure 9 shows the hitting zone definition

Table 2 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 28 8 12 8 12 32Middle 100sim125 8 16 24 20 20 12Final 215sim240 4 12 40 32 8 4

for the simulated data with added sun occultation by cloudsto the Gaussian distribution function generated environmentdata The red elliptical shape in Figure 9 covers the 902 ofthe total (119881mpp 119875mpp) points obtained from simulated data ofthe previous day Table 2 shows the percentage of selectingaction that is the perturbation ΔV(119894) to the 119881pv(119894) by theRLMPPT The percentages of each RL agent choosing actionin early phase middle phase and final phase of the RLMPPTmethod are shown in the second third and fourth rowrespectively of Table 2

It can be seen from Table 2 that the percentages ofselecting fine tuning actions that is the perturbation ΔV(119894)is plusmn05 V increase from 20 to 44 and finally to 72respectively for the early phase middle phase and finalphase of MPP tracking via the RL agent This table againillustrated the fact that the learning agent fast learned thestrategy in selecting the appropriate action toward reachingthe MPPT and hence the goal of tracking the MPP of thePV array is achieved by the RL agent However due tothe varied weather condition on sun occultation by clouds

International Journal of Photoenergy 9

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0

5

10

15

20

25

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 10 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

the percentage of selecting fine-tuning actions is somewhatvaried a little bit in comparison with the results obtainedin Table 1 whose simulated data are generated by Gaussiandistribution function without sun occultation effect

Experimental results of the offsets between the calculatedand the tracking MPP by the comparing methods of PampOthe open-circuit voltage and the RLMPPT at the samesensing time are shown in Figures 10(a) 10(b) and 10(c)respectively Experiment data from Figure 11 again exhibitedthat among the three comparing methods the open-circuitvoltage method obtained the largest and sparsely distributedoffsets data which are concentrated around 15W Eventhough the offsets obtained by the PampO method fall largelybetween 0 and 5W however large portion of offset obtainedby the PampOmethod scattered around 1 to 40W before the 50minutes of the experiment On the other hand the proposedRLMPPT method achieves the least and condensed offsetsbelow 5W andmostly close to 3W in the final tracking phaseafter 200 minutes

53 Results of the Real Environment Data Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA is obtained online fromNationalRenewal Energy Laboratory (NREL) database for testing theRLMPPTmethod under real environment dataThe databaseis selected because the geographical location of the sensing

Table 3The percentage of choosing action in different phase for theexperiment with real weather data

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 24 16 8 12 12 28Middle 100sim125 16 16 12 24 24 8Final 215sim240 4 16 32 32 12 4

station is also located in the subtropical area A period ofrecorded data from 1000 to 1400 for 5 consecutive days isshown in Figure 6(c) and the hitting zone reward functionfor the data from earlier days is shown in Figure 11(a) Thered elliptical shape in Figure 11(a) covers the 934of the total(119881mpp 119875mpp) points obtained from the previous 5 consecutivedays of the NREL real weather data for testing Figures11(b) and 11(c) respectively show the real temperature andirradiance data recorded at 04012014 for generating the testdata

Table 3 shows the percentage of selecting action bythe RLMPPT The percentages of each RL agent choosingaction in early phase (0sim25 minutes) middle phase (100sim125minutes) and final phase (215sim240 minutes) of the RLMPPTmethod are shown in the second third and fourth rowsrespectively of Table 3

10 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

15

155

16

165

17

175

0 50 100 150 200 250Time (min)

Tem

pera

ture

(∘C)

(b)

900

910

920

930

940

950

960

970

0 50 100 150 200 250Time (min)

Irra

dian

ce (W

m2)

(c)

Figure 11 (a) The definition of hitting zone for the experiment using real weather data (b) the real temperature data recorded in 04012014and used for generating the test data and (c) the real irradiance data recorded at 04012014 and used for generating the test data

In Table 3 one can see that the percentage of selectingfine tuning actions that is the perturbation ΔV(119894) is plusmn05 Vincrease from 20 to 36 and finally to 64 respectivelyfor the early phase middle phase and final phase of MPPtracking via the RL agent Even though the percentage ofselecting fine tuning actions in this real data experiment hasthe least value among the three experiments it exhibits thatthe RLMPPT learns to exercise the appropriate action of theperturbationΔV(119894) in tracking theMPP of the PV array underreal weather data This table again illustrated the fact thatthe learning agent fast learned the strategy in selecting theappropriate action toward reaching the MPP and hence thegoal of tracking the MPP of the PV array is achieved by theRL agent

Experimental results of the offsets between the MPPTand the predicted MPPT by the comparing methods at thesensing time for the simulation data generated by the realweather data are shown in Figure 12 Figures 12(a) 12(b) and12(c) respectively show the offsets between the calculatedMPPT and the tracking MPPT by the PampO method theopen-circuit voltage method and the RLMPPT methodExperiment data from Figure 12 again shows that among thethree comparing methods the open-circuit voltage methodobtained the largest and sparsely distributed offsets datawhose distributions are concentrated within a band cover bytwoGaussian distribution functionswith themaximumoffset

value of 197W The offsets obtained by the PampO methodfall largely around 5 and 25W however large portion ofoffset obtained by the PampO method sharply decreased from1 to 40W in the early 70 minutes of the experiment By theobservation of Figure 12(c) the proposed RLMPPT methodachieves the least and condenses offsets near 1 or 2W andnone of the offsets is higher than 5W after 5 simulationminutes from the beginning

54 Performance Comparison with Efficiency Factor In orderto validate whether the RLMPPTmethod is effective or not intracking MPP of a PV array the efficiency factor 120578 is used tocompare the performance of other existing methods for thethree experiments The 120578 is defined as follows

120578mppt =int119905

0119875actual (119905) 119889119905

int119905

0119875max (119905) 119889119905

(16)

where 119875actual(119905) and 119875max(119905) respectively represent the track-ing MPP of the MPPT method and the calculated MPPTable 4 shows that for the three test data sets the open-circuit voltage method has the least efficiency factor amongthe three comparing methods and the RLMPPT method hasthe best efficiency factor which is slightly better than that ofPampO method The advantage of the RLMPPT method overthe PampOmethod is shown in Figures 10 and 12where not only

International Journal of Photoenergy 11

020406080

100120140160

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

186187188189

19191192193194195196

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0

50

100

150

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 12 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

Table 4 Comparison of efficiency factor for the three comparingmethods

Learning phase (min) MethodsOpen-circuit voltage PampO RLMPPT

Early 1000sim1025 869 835 892Middle 1140sim1205 875 978 994Final 1335sim1400 878 977 993Overall 1000sim1400 876 954 985

the RLMPPT method significantly improves the efficiencyfactor in comparing that of the PampO but also the learningagent of the RLMPPT fast learned the strategy in selecting theappropriate action toward reaching theMPPTwhich is muchfaster than the PampO method in exhibiting a slow adaptivephase Hence the RLMPPT not only improves the efficiencyfactor in tracking the MPP of the PV array but also has thefast learning capability in achieving the task of MPPT of thePV array

6 Conclusions

In this study a reinforcement learning-based maximumpower point tracking (RLMPPT) method is proposed forPV arrayThe RLMPPTmethod monitors the environmental

state of the PV array and adjusts the perturbation to theoperating voltage of the PV array in achieving the best MPPSimulations of the proposed RLMPPT for a PV array areconducted on three kinds of data set which are simulatedGaussian weather data simulated Gaussian weather dataadded with sun occultation effect and real weather data fromNREL database Experimental results demonstrate that incomparison to the existing PampO method the RLMPPT notonly achieves better efficiency factor for both simulated andreal weather data sets but also adapts to the environmentmuch fast with very short learning time Further the rein-forcement learning-basedMPPTmethodwould be employedin the real PV array to validate the effectiveness of theproposed novel MPPT method

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] T Esram andP L Chapman ldquoComparison of photovoltaic arraymaximum power point tracking techniquesrdquo IEEE Transactionson Energy Conversion vol 22 no 2 pp 439ndash449 2007

12 International Journal of Photoenergy

[2] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithmsrdquo Progress in PhotovoltaicsResearch and Applications vol 11 no 1 pp 47ndash62 2003

[3] N Femia G Petrone G Spagnuolo and M Vitelli ldquoOptimiza-tion of perturb and observe maximum power point trackingmethodrdquo IEEE Transactions on Power Electronics vol 20 no4 pp 963ndash973 2005

[4] J S Kumari D C S Babu andA K Babu ldquoDesign and analysisof PO and IPO MPPT technique for photovoltaic systemrdquoInternational Journal ofModern Engineering Research vol 2 no4 pp 2174ndash2180 2012

[5] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithms using an experimental pro-grammable maximum power point tracking test bedrdquo in Pro-ceedings of the Conference Record of the IEEE 28th PhotovoltaicSpecialists Conference pp 1699ndash1702 2000

[6] L-R Chen C-H Tsai Y-L Lin and Y-S Lai ldquoA biologicalswarm chasing algorithm for tracking the PVmaximum powerpointrdquo IEEE Transactions on Energy Conversion vol 25 no 2pp 484ndash493 2010

[7] T L Kottas Y S Boutalis and A D Karlis ldquoNew maximumpower point tracker for PV arrays using fuzzy controller in closecooperation with fuzzy cognitive networksrdquo IEEE Transactionson Energy Conversion vol 21 no 3 pp 793ndash803 2006

[8] N Mutoh M Ohno and T Inoue ldquoA method for MPPT con-trol while searching for parameters corresponding to weatherconditions for PV generation systemsrdquo IEEE Transactions onIndustrial Electronics vol 53 no 4 pp 1055ndash1065 2006

[9] G C Hsieh H I Hsieh C Y Tsai and C H Wang ldquoPhotovol-taic power-increment-aided incremental-conductance MPPTwith two-phased trackingrdquo IEEE Transactions on Power Elec-tronics vol 28 no 6 pp 2895ndash2911 2013

[10] R L Mueller M T Wallace and P Iles ldquoScaling nominal solarcell impedances for array designrdquo in Proceedings of the IEEE 1stWorld Conference on Photovoltaic Energy Conversion vol 2 pp2034ndash2037 December 1994

[11] N Femia G Petrone G Spagnuolo andMVitelli ldquoOptimizingsampling rate of PampO MPPT techniquerdquo in Proceedings ofthe IEEE 35th Annual Power Electronics Specialists Conference(PESC rsquo04) vol 3 pp 1945ndash1949 June 2004

[12] N Femia G PetroneG Spagnuolo andMVitelli ldquoA techniquefor improving PampOMPPT performances of double-stage grid-connected photovoltaic systemsrdquo IEEE Transactions on Indus-trial Electronics vol 56 no 11 pp 4473ndash4482 2009

[13] K H Hussein I Muta T Hoshino and M Osakada ldquoMax-imum photovoltaic power tracking an algorithm for rapidlychanging atmospheric conditionsrdquo IEE ProceedingsmdashGenera-tion Transmission and Distribution vol 142 no 1 pp 59ndash641995

[14] C Hua J Lin and C Shen ldquoImplementation of a DSP-controlled photovoltaic systemwith peak power trackingrdquo IEEETransactions on Industrial Electronics vol 45 no 1 pp 99ndash1071998

[15] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996

[16] A G Barto Reinforcement Learning An Introduction MITPress 1998

[17] C JWatkins and PDayan ldquoQ-learningrdquoMachine Learning vol8 no 3-4 pp 279ndash292 1992

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Inorganic ChemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Carbohydrate Chemistry

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Physical Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2014

Bioinorganic Chemistry and ApplicationsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

SpectroscopyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Medicinal ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chromatography Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Theoretical ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Spectroscopy

Analytical ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Quantum Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Organic Chemistry International

ElectrochemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CatalystsJournal of

Page 6: Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

6 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36

Pmax

(W)

Vmax

(V)

(b)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(c)

Figure 6 MPP distribution under simulated environment (a) Gaussian distribution function generated environment data (b) the set in (a)added with sun occultation by clouds and (c) MPP distribution obtained using real weather data

4 Configurations of Experiments

Experiments of MPPT of PV array utilizing the RLMPPTare conducted by simulation on computer and the results arecompared with existing MPPT methods for PV array

41 Environment Simulation and Configuration for PV ArrayIn this study the PV array used for simulation is SY-175Mmanufactured by Tangshan Shaiyang Solar TechnologyCo Ltd China The power rating of the simulated PVarray was 175W with open-circuit voltage 44V and short-circuit current 52 A Validation of RLMPPTrsquos effectivenessin MPPT was conducted via three experiments using twosimulated and one real weather data sets A basic experimentwas first performed to determine whether the RLMPPTcould achieve the task of MPPT by using a set of Gaussiandistribution function generated temperature and irradianceAnd the effect of sun occultation by clouds is added on theGaussian distribution function generated set of temperatureand irradiance as the second experiment Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA obtained fromNational RenewalEnergy Laboratory (NREL) database provided an empiricaldata set to test the RLMPPT under real weather conditionConfigurations of the experiments are described in thefollowing

Assume the PV array is located at the Subtropics area(Chiayi City in the south of Taiwan) during 1000 and1400 in summertime where the temperature and irradiance

respectively are simulated using Gaussian distribution func-tion with mean value of 30∘C and 800Wm2 and standarddeviation of 4∘C and 50Wm2 According to (1) the simu-lated set of temperature and irradiance produces the MPPvoltage (119881mpp) and the calculated MPP (119875mpp) as shown inFigure 6(a) where the 119881mpp and the 119875mpp respectively lay at119909-axis and 119910-axis Figures 6(b) and 6(c) respectively showthe (119881mpp 119875mpp) plots of the Gaussian generated temperatureand irradiance with sun occultation effect and the realweather data recorded on April 1 2014 at LoyolaMarymountUniversity

42 Reward Function and State Action Arrangement Inapplying the RLMPPT to solve the problem of the MPPTfor PV array the reward function plays an important rolebecause not only a good reward function definition couldachieve the right feedback on every execution of learningand tracking but also it could enhance the efficiency ofthe learning algorithm In this study a hitting zone withelliptical shape for (119881mpp 119875mpp) is defined such that a positivereward value is given to the agent whenever it obtained(119881mpp 119875mpp) falls into the hitting zone in the sensing timeslot Figure 7 shows the different size of elliptical hittingzone superimposed on the plot of simulated (119881mpp 119875mpp)of Figure 6(a) The red elliptical circle in Figures 7(a) 7(b)and 7(c) respectively represents that the hitting zone covers372 685 and 855 of the total (119881mpp 119875mpp) pointsobtained from the simulated data of the previous day

In realizing the RLMPPT the state vector is defined as119878 = [119878

0 1198781 1198782 1198783] sube S while the meaning of each state is

International Journal of Photoenergy 7

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(b)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(c)

Figure 7 Different size of elliptical hitting zone superimposed on the plot of simulated (119881mpp 119875mpp) for the first set of simulated environmentdata (a) 372 of the MPPs involved (b) 685 of the MPPs involved and (c) 855 of the MPPs involved

explained in the previous section and shown in Figure 5 Sixperturbations of ΔV(119894) to the 119881pv(119894) are defined as the set ofaction as follows

A = 119886119894| 1198860= minus5V 119886

1= minus2V 119886

2= minus05V

1198863= +05V 119886

4= +2V 119886

5= +5V

(15)

The 120576-greedy is used in choosing agentrsquos actions in theRLMPPT such that agent repeatedly selects the same actionis preventedThe rewarding policy of the hitting zone rewardfunction is to give a reward value of 10 and 0 respectivelyto the agent whenever it obtained (119881mpp 119875mpp) in any sensingtime slot falls-in and falls-out the hitting zone

5 Experimental Results

51 Results of the Gaussian Distribution Function GeneratedEnvironment Data In this study experiments of RLMPPTin testing the Gaussian generated and real environment dataare conducted and the results are compared with those of thePampO method and the open-circuit voltage method

For the Gaussian distribution function generated envi-ronment simulation the percentages of each RL agent choos-ing actions in early phase (0sim25 minutes) middle phase(100sim125 minutes) and final phase (215sim240 minutes) of theRLMPPT are shown in the second third and fourth rowrespectively of Table 1 within two-hour period It can be seenthat in the early phase of the simulation the RL agent isin fast learning stage such that the action chosen by agentis concentrated on the plusmn5V and plusmn2V actions However inthe middle and final phase of the simulation the learning

Table 1 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 16 20 8 12 24 20Middle 100sim125 4 28 24 16 20 8Final 215sim240 4 8 36 40 8 4

is completed and the agent exploited what it has learnedhence the percentage of choosing fine tuning actions that isplusmn05 V is increased from 20 of the early phase to 40 and76 respectively for the middle and final phase It can beconcluded that the learning agent fast learned the strategy inselecting the appropriate action toward reaching the MPPTand hence the goal of tracking the MPP is achieved by the RLagent

Experimental results of the offsets between the calculatedand the tracking MPP by the RLMPPT and comparingmethods at the sensing time are shown in Figure 8 Figures8(a) 8(b) and 8(c) respectively show the offsets between thecalculated MPP and the tracking MPP by the PampO methodthe open-circuit voltagemethod and the RLMPPTmethod atthe sensing time One can see that among the experimentalresults of the three comparingmethods the open-circuit volt-age method obtained the largest offset which is concentratedaround 15W Even though the offsets obtained by the PampOmethod fall largely below 10W however large portion ofoffset obtained by the PampO method also randomly scatteredbetween 1 and 10W On the other hand the proposedRLMPPT method achieves the least and condenses offsets

8 International Journal of Photoenergy

020406080

100120140

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(b)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(c)

Figure 8 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

Figure 9Thedefinition of hitting zone for the 2nd set of experimentdata

below 5W and only small portions fall outside of 5W evenin the early learning phase

52 Results of the Gaussian Distribution Function GeneratedEnvironment Data with Sun Occultation by Clouds In thisexperiment the simulated data are obtained by adding a 30chance of sun occultation by clouds to the test data in thefirst experiment such that the temperature will fall down0 to 3∘C and the irradiance will decrease 0 to 300Wm2The experiment is conducted to illustrate the capability ofRLMPPT in tracking the MPP of PV array under variedweather condition Figure 9 shows the hitting zone definition

Table 2 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 28 8 12 8 12 32Middle 100sim125 8 16 24 20 20 12Final 215sim240 4 12 40 32 8 4

for the simulated data with added sun occultation by cloudsto the Gaussian distribution function generated environmentdata The red elliptical shape in Figure 9 covers the 902 ofthe total (119881mpp 119875mpp) points obtained from simulated data ofthe previous day Table 2 shows the percentage of selectingaction that is the perturbation ΔV(119894) to the 119881pv(119894) by theRLMPPT The percentages of each RL agent choosing actionin early phase middle phase and final phase of the RLMPPTmethod are shown in the second third and fourth rowrespectively of Table 2

It can be seen from Table 2 that the percentages ofselecting fine tuning actions that is the perturbation ΔV(119894)is plusmn05 V increase from 20 to 44 and finally to 72respectively for the early phase middle phase and finalphase of MPP tracking via the RL agent This table againillustrated the fact that the learning agent fast learned thestrategy in selecting the appropriate action toward reachingthe MPPT and hence the goal of tracking the MPP of thePV array is achieved by the RL agent However due tothe varied weather condition on sun occultation by clouds

International Journal of Photoenergy 9

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0

5

10

15

20

25

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 10 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

the percentage of selecting fine-tuning actions is somewhatvaried a little bit in comparison with the results obtainedin Table 1 whose simulated data are generated by Gaussiandistribution function without sun occultation effect

Experimental results of the offsets between the calculatedand the tracking MPP by the comparing methods of PampOthe open-circuit voltage and the RLMPPT at the samesensing time are shown in Figures 10(a) 10(b) and 10(c)respectively Experiment data from Figure 11 again exhibitedthat among the three comparing methods the open-circuitvoltage method obtained the largest and sparsely distributedoffsets data which are concentrated around 15W Eventhough the offsets obtained by the PampO method fall largelybetween 0 and 5W however large portion of offset obtainedby the PampOmethod scattered around 1 to 40W before the 50minutes of the experiment On the other hand the proposedRLMPPT method achieves the least and condensed offsetsbelow 5W andmostly close to 3W in the final tracking phaseafter 200 minutes

53 Results of the Real Environment Data Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA is obtained online fromNationalRenewal Energy Laboratory (NREL) database for testing theRLMPPTmethod under real environment dataThe databaseis selected because the geographical location of the sensing

Table 3The percentage of choosing action in different phase for theexperiment with real weather data

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 24 16 8 12 12 28Middle 100sim125 16 16 12 24 24 8Final 215sim240 4 16 32 32 12 4

station is also located in the subtropical area A period ofrecorded data from 1000 to 1400 for 5 consecutive days isshown in Figure 6(c) and the hitting zone reward functionfor the data from earlier days is shown in Figure 11(a) Thered elliptical shape in Figure 11(a) covers the 934of the total(119881mpp 119875mpp) points obtained from the previous 5 consecutivedays of the NREL real weather data for testing Figures11(b) and 11(c) respectively show the real temperature andirradiance data recorded at 04012014 for generating the testdata

Table 3 shows the percentage of selecting action bythe RLMPPT The percentages of each RL agent choosingaction in early phase (0sim25 minutes) middle phase (100sim125minutes) and final phase (215sim240 minutes) of the RLMPPTmethod are shown in the second third and fourth rowsrespectively of Table 3

10 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

15

155

16

165

17

175

0 50 100 150 200 250Time (min)

Tem

pera

ture

(∘C)

(b)

900

910

920

930

940

950

960

970

0 50 100 150 200 250Time (min)

Irra

dian

ce (W

m2)

(c)

Figure 11 (a) The definition of hitting zone for the experiment using real weather data (b) the real temperature data recorded in 04012014and used for generating the test data and (c) the real irradiance data recorded at 04012014 and used for generating the test data

In Table 3 one can see that the percentage of selectingfine tuning actions that is the perturbation ΔV(119894) is plusmn05 Vincrease from 20 to 36 and finally to 64 respectivelyfor the early phase middle phase and final phase of MPPtracking via the RL agent Even though the percentage ofselecting fine tuning actions in this real data experiment hasthe least value among the three experiments it exhibits thatthe RLMPPT learns to exercise the appropriate action of theperturbationΔV(119894) in tracking theMPP of the PV array underreal weather data This table again illustrated the fact thatthe learning agent fast learned the strategy in selecting theappropriate action toward reaching the MPP and hence thegoal of tracking the MPP of the PV array is achieved by theRL agent

Experimental results of the offsets between the MPPTand the predicted MPPT by the comparing methods at thesensing time for the simulation data generated by the realweather data are shown in Figure 12 Figures 12(a) 12(b) and12(c) respectively show the offsets between the calculatedMPPT and the tracking MPPT by the PampO method theopen-circuit voltage method and the RLMPPT methodExperiment data from Figure 12 again shows that among thethree comparing methods the open-circuit voltage methodobtained the largest and sparsely distributed offsets datawhose distributions are concentrated within a band cover bytwoGaussian distribution functionswith themaximumoffset

value of 197W The offsets obtained by the PampO methodfall largely around 5 and 25W however large portion ofoffset obtained by the PampO method sharply decreased from1 to 40W in the early 70 minutes of the experiment By theobservation of Figure 12(c) the proposed RLMPPT methodachieves the least and condenses offsets near 1 or 2W andnone of the offsets is higher than 5W after 5 simulationminutes from the beginning

54 Performance Comparison with Efficiency Factor In orderto validate whether the RLMPPTmethod is effective or not intracking MPP of a PV array the efficiency factor 120578 is used tocompare the performance of other existing methods for thethree experiments The 120578 is defined as follows

120578mppt =int119905

0119875actual (119905) 119889119905

int119905

0119875max (119905) 119889119905

(16)

where 119875actual(119905) and 119875max(119905) respectively represent the track-ing MPP of the MPPT method and the calculated MPPTable 4 shows that for the three test data sets the open-circuit voltage method has the least efficiency factor amongthe three comparing methods and the RLMPPT method hasthe best efficiency factor which is slightly better than that ofPampO method The advantage of the RLMPPT method overthe PampOmethod is shown in Figures 10 and 12where not only

International Journal of Photoenergy 11

020406080

100120140160

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

186187188189

19191192193194195196

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0

50

100

150

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 12 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

Table 4 Comparison of efficiency factor for the three comparingmethods

Learning phase (min) MethodsOpen-circuit voltage PampO RLMPPT

Early 1000sim1025 869 835 892Middle 1140sim1205 875 978 994Final 1335sim1400 878 977 993Overall 1000sim1400 876 954 985

the RLMPPT method significantly improves the efficiencyfactor in comparing that of the PampO but also the learningagent of the RLMPPT fast learned the strategy in selecting theappropriate action toward reaching theMPPTwhich is muchfaster than the PampO method in exhibiting a slow adaptivephase Hence the RLMPPT not only improves the efficiencyfactor in tracking the MPP of the PV array but also has thefast learning capability in achieving the task of MPPT of thePV array

6 Conclusions

In this study a reinforcement learning-based maximumpower point tracking (RLMPPT) method is proposed forPV arrayThe RLMPPTmethod monitors the environmental

state of the PV array and adjusts the perturbation to theoperating voltage of the PV array in achieving the best MPPSimulations of the proposed RLMPPT for a PV array areconducted on three kinds of data set which are simulatedGaussian weather data simulated Gaussian weather dataadded with sun occultation effect and real weather data fromNREL database Experimental results demonstrate that incomparison to the existing PampO method the RLMPPT notonly achieves better efficiency factor for both simulated andreal weather data sets but also adapts to the environmentmuch fast with very short learning time Further the rein-forcement learning-basedMPPTmethodwould be employedin the real PV array to validate the effectiveness of theproposed novel MPPT method

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] T Esram andP L Chapman ldquoComparison of photovoltaic arraymaximum power point tracking techniquesrdquo IEEE Transactionson Energy Conversion vol 22 no 2 pp 439ndash449 2007

12 International Journal of Photoenergy

[2] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithmsrdquo Progress in PhotovoltaicsResearch and Applications vol 11 no 1 pp 47ndash62 2003

[3] N Femia G Petrone G Spagnuolo and M Vitelli ldquoOptimiza-tion of perturb and observe maximum power point trackingmethodrdquo IEEE Transactions on Power Electronics vol 20 no4 pp 963ndash973 2005

[4] J S Kumari D C S Babu andA K Babu ldquoDesign and analysisof PO and IPO MPPT technique for photovoltaic systemrdquoInternational Journal ofModern Engineering Research vol 2 no4 pp 2174ndash2180 2012

[5] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithms using an experimental pro-grammable maximum power point tracking test bedrdquo in Pro-ceedings of the Conference Record of the IEEE 28th PhotovoltaicSpecialists Conference pp 1699ndash1702 2000

[6] L-R Chen C-H Tsai Y-L Lin and Y-S Lai ldquoA biologicalswarm chasing algorithm for tracking the PVmaximum powerpointrdquo IEEE Transactions on Energy Conversion vol 25 no 2pp 484ndash493 2010

[7] T L Kottas Y S Boutalis and A D Karlis ldquoNew maximumpower point tracker for PV arrays using fuzzy controller in closecooperation with fuzzy cognitive networksrdquo IEEE Transactionson Energy Conversion vol 21 no 3 pp 793ndash803 2006

[8] N Mutoh M Ohno and T Inoue ldquoA method for MPPT con-trol while searching for parameters corresponding to weatherconditions for PV generation systemsrdquo IEEE Transactions onIndustrial Electronics vol 53 no 4 pp 1055ndash1065 2006

[9] G C Hsieh H I Hsieh C Y Tsai and C H Wang ldquoPhotovol-taic power-increment-aided incremental-conductance MPPTwith two-phased trackingrdquo IEEE Transactions on Power Elec-tronics vol 28 no 6 pp 2895ndash2911 2013

[10] R L Mueller M T Wallace and P Iles ldquoScaling nominal solarcell impedances for array designrdquo in Proceedings of the IEEE 1stWorld Conference on Photovoltaic Energy Conversion vol 2 pp2034ndash2037 December 1994

[11] N Femia G Petrone G Spagnuolo andMVitelli ldquoOptimizingsampling rate of PampO MPPT techniquerdquo in Proceedings ofthe IEEE 35th Annual Power Electronics Specialists Conference(PESC rsquo04) vol 3 pp 1945ndash1949 June 2004

[12] N Femia G PetroneG Spagnuolo andMVitelli ldquoA techniquefor improving PampOMPPT performances of double-stage grid-connected photovoltaic systemsrdquo IEEE Transactions on Indus-trial Electronics vol 56 no 11 pp 4473ndash4482 2009

[13] K H Hussein I Muta T Hoshino and M Osakada ldquoMax-imum photovoltaic power tracking an algorithm for rapidlychanging atmospheric conditionsrdquo IEE ProceedingsmdashGenera-tion Transmission and Distribution vol 142 no 1 pp 59ndash641995

[14] C Hua J Lin and C Shen ldquoImplementation of a DSP-controlled photovoltaic systemwith peak power trackingrdquo IEEETransactions on Industrial Electronics vol 45 no 1 pp 99ndash1071998

[15] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996

[16] A G Barto Reinforcement Learning An Introduction MITPress 1998

[17] C JWatkins and PDayan ldquoQ-learningrdquoMachine Learning vol8 no 3-4 pp 279ndash292 1992

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Inorganic ChemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Carbohydrate Chemistry

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Physical Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2014

Bioinorganic Chemistry and ApplicationsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

SpectroscopyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Medicinal ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chromatography Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Theoretical ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Spectroscopy

Analytical ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Quantum Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Organic Chemistry International

ElectrochemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CatalystsJournal of

Page 7: Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

International Journal of Photoenergy 7

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(b)

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(c)

Figure 7 Different size of elliptical hitting zone superimposed on the plot of simulated (119881mpp 119875mpp) for the first set of simulated environmentdata (a) 372 of the MPPs involved (b) 685 of the MPPs involved and (c) 855 of the MPPs involved

explained in the previous section and shown in Figure 5 Sixperturbations of ΔV(119894) to the 119881pv(119894) are defined as the set ofaction as follows

A = 119886119894| 1198860= minus5V 119886

1= minus2V 119886

2= minus05V

1198863= +05V 119886

4= +2V 119886

5= +5V

(15)

The 120576-greedy is used in choosing agentrsquos actions in theRLMPPT such that agent repeatedly selects the same actionis preventedThe rewarding policy of the hitting zone rewardfunction is to give a reward value of 10 and 0 respectivelyto the agent whenever it obtained (119881mpp 119875mpp) in any sensingtime slot falls-in and falls-out the hitting zone

5 Experimental Results

51 Results of the Gaussian Distribution Function GeneratedEnvironment Data In this study experiments of RLMPPTin testing the Gaussian generated and real environment dataare conducted and the results are compared with those of thePampO method and the open-circuit voltage method

For the Gaussian distribution function generated envi-ronment simulation the percentages of each RL agent choos-ing actions in early phase (0sim25 minutes) middle phase(100sim125 minutes) and final phase (215sim240 minutes) of theRLMPPT are shown in the second third and fourth rowrespectively of Table 1 within two-hour period It can be seenthat in the early phase of the simulation the RL agent isin fast learning stage such that the action chosen by agentis concentrated on the plusmn5V and plusmn2V actions However inthe middle and final phase of the simulation the learning

Table 1 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 16 20 8 12 24 20Middle 100sim125 4 28 24 16 20 8Final 215sim240 4 8 36 40 8 4

is completed and the agent exploited what it has learnedhence the percentage of choosing fine tuning actions that isplusmn05 V is increased from 20 of the early phase to 40 and76 respectively for the middle and final phase It can beconcluded that the learning agent fast learned the strategy inselecting the appropriate action toward reaching the MPPTand hence the goal of tracking the MPP is achieved by the RLagent

Experimental results of the offsets between the calculatedand the tracking MPP by the RLMPPT and comparingmethods at the sensing time are shown in Figure 8 Figures8(a) 8(b) and 8(c) respectively show the offsets between thecalculated MPP and the tracking MPP by the PampO methodthe open-circuit voltagemethod and the RLMPPTmethod atthe sensing time One can see that among the experimentalresults of the three comparingmethods the open-circuit volt-age method obtained the largest offset which is concentratedaround 15W Even though the offsets obtained by the PampOmethod fall largely below 10W however large portion ofoffset obtained by the PampO method also randomly scatteredbetween 1 and 10W On the other hand the proposedRLMPPT method achieves the least and condenses offsets

8 International Journal of Photoenergy

020406080

100120140

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(b)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(c)

Figure 8 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

Figure 9Thedefinition of hitting zone for the 2nd set of experimentdata

below 5W and only small portions fall outside of 5W evenin the early learning phase

52 Results of the Gaussian Distribution Function GeneratedEnvironment Data with Sun Occultation by Clouds In thisexperiment the simulated data are obtained by adding a 30chance of sun occultation by clouds to the test data in thefirst experiment such that the temperature will fall down0 to 3∘C and the irradiance will decrease 0 to 300Wm2The experiment is conducted to illustrate the capability ofRLMPPT in tracking the MPP of PV array under variedweather condition Figure 9 shows the hitting zone definition

Table 2 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 28 8 12 8 12 32Middle 100sim125 8 16 24 20 20 12Final 215sim240 4 12 40 32 8 4

for the simulated data with added sun occultation by cloudsto the Gaussian distribution function generated environmentdata The red elliptical shape in Figure 9 covers the 902 ofthe total (119881mpp 119875mpp) points obtained from simulated data ofthe previous day Table 2 shows the percentage of selectingaction that is the perturbation ΔV(119894) to the 119881pv(119894) by theRLMPPT The percentages of each RL agent choosing actionin early phase middle phase and final phase of the RLMPPTmethod are shown in the second third and fourth rowrespectively of Table 2

It can be seen from Table 2 that the percentages ofselecting fine tuning actions that is the perturbation ΔV(119894)is plusmn05 V increase from 20 to 44 and finally to 72respectively for the early phase middle phase and finalphase of MPP tracking via the RL agent This table againillustrated the fact that the learning agent fast learned thestrategy in selecting the appropriate action toward reachingthe MPPT and hence the goal of tracking the MPP of thePV array is achieved by the RL agent However due tothe varied weather condition on sun occultation by clouds

International Journal of Photoenergy 9

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0

5

10

15

20

25

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 10 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

the percentage of selecting fine-tuning actions is somewhatvaried a little bit in comparison with the results obtainedin Table 1 whose simulated data are generated by Gaussiandistribution function without sun occultation effect

Experimental results of the offsets between the calculatedand the tracking MPP by the comparing methods of PampOthe open-circuit voltage and the RLMPPT at the samesensing time are shown in Figures 10(a) 10(b) and 10(c)respectively Experiment data from Figure 11 again exhibitedthat among the three comparing methods the open-circuitvoltage method obtained the largest and sparsely distributedoffsets data which are concentrated around 15W Eventhough the offsets obtained by the PampO method fall largelybetween 0 and 5W however large portion of offset obtainedby the PampOmethod scattered around 1 to 40W before the 50minutes of the experiment On the other hand the proposedRLMPPT method achieves the least and condensed offsetsbelow 5W andmostly close to 3W in the final tracking phaseafter 200 minutes

53 Results of the Real Environment Data Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA is obtained online fromNationalRenewal Energy Laboratory (NREL) database for testing theRLMPPTmethod under real environment dataThe databaseis selected because the geographical location of the sensing

Table 3The percentage of choosing action in different phase for theexperiment with real weather data

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 24 16 8 12 12 28Middle 100sim125 16 16 12 24 24 8Final 215sim240 4 16 32 32 12 4

station is also located in the subtropical area A period ofrecorded data from 1000 to 1400 for 5 consecutive days isshown in Figure 6(c) and the hitting zone reward functionfor the data from earlier days is shown in Figure 11(a) Thered elliptical shape in Figure 11(a) covers the 934of the total(119881mpp 119875mpp) points obtained from the previous 5 consecutivedays of the NREL real weather data for testing Figures11(b) and 11(c) respectively show the real temperature andirradiance data recorded at 04012014 for generating the testdata

Table 3 shows the percentage of selecting action bythe RLMPPT The percentages of each RL agent choosingaction in early phase (0sim25 minutes) middle phase (100sim125minutes) and final phase (215sim240 minutes) of the RLMPPTmethod are shown in the second third and fourth rowsrespectively of Table 3

10 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

15

155

16

165

17

175

0 50 100 150 200 250Time (min)

Tem

pera

ture

(∘C)

(b)

900

910

920

930

940

950

960

970

0 50 100 150 200 250Time (min)

Irra

dian

ce (W

m2)

(c)

Figure 11 (a) The definition of hitting zone for the experiment using real weather data (b) the real temperature data recorded in 04012014and used for generating the test data and (c) the real irradiance data recorded at 04012014 and used for generating the test data

In Table 3 one can see that the percentage of selectingfine tuning actions that is the perturbation ΔV(119894) is plusmn05 Vincrease from 20 to 36 and finally to 64 respectivelyfor the early phase middle phase and final phase of MPPtracking via the RL agent Even though the percentage ofselecting fine tuning actions in this real data experiment hasthe least value among the three experiments it exhibits thatthe RLMPPT learns to exercise the appropriate action of theperturbationΔV(119894) in tracking theMPP of the PV array underreal weather data This table again illustrated the fact thatthe learning agent fast learned the strategy in selecting theappropriate action toward reaching the MPP and hence thegoal of tracking the MPP of the PV array is achieved by theRL agent

Experimental results of the offsets between the MPPTand the predicted MPPT by the comparing methods at thesensing time for the simulation data generated by the realweather data are shown in Figure 12 Figures 12(a) 12(b) and12(c) respectively show the offsets between the calculatedMPPT and the tracking MPPT by the PampO method theopen-circuit voltage method and the RLMPPT methodExperiment data from Figure 12 again shows that among thethree comparing methods the open-circuit voltage methodobtained the largest and sparsely distributed offsets datawhose distributions are concentrated within a band cover bytwoGaussian distribution functionswith themaximumoffset

value of 197W The offsets obtained by the PampO methodfall largely around 5 and 25W however large portion ofoffset obtained by the PampO method sharply decreased from1 to 40W in the early 70 minutes of the experiment By theobservation of Figure 12(c) the proposed RLMPPT methodachieves the least and condenses offsets near 1 or 2W andnone of the offsets is higher than 5W after 5 simulationminutes from the beginning

54 Performance Comparison with Efficiency Factor In orderto validate whether the RLMPPTmethod is effective or not intracking MPP of a PV array the efficiency factor 120578 is used tocompare the performance of other existing methods for thethree experiments The 120578 is defined as follows

120578mppt =int119905

0119875actual (119905) 119889119905

int119905

0119875max (119905) 119889119905

(16)

where 119875actual(119905) and 119875max(119905) respectively represent the track-ing MPP of the MPPT method and the calculated MPPTable 4 shows that for the three test data sets the open-circuit voltage method has the least efficiency factor amongthe three comparing methods and the RLMPPT method hasthe best efficiency factor which is slightly better than that ofPampO method The advantage of the RLMPPT method overthe PampOmethod is shown in Figures 10 and 12where not only

International Journal of Photoenergy 11

020406080

100120140160

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

186187188189

19191192193194195196

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0

50

100

150

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 12 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

Table 4 Comparison of efficiency factor for the three comparingmethods

Learning phase (min) MethodsOpen-circuit voltage PampO RLMPPT

Early 1000sim1025 869 835 892Middle 1140sim1205 875 978 994Final 1335sim1400 878 977 993Overall 1000sim1400 876 954 985

the RLMPPT method significantly improves the efficiencyfactor in comparing that of the PampO but also the learningagent of the RLMPPT fast learned the strategy in selecting theappropriate action toward reaching theMPPTwhich is muchfaster than the PampO method in exhibiting a slow adaptivephase Hence the RLMPPT not only improves the efficiencyfactor in tracking the MPP of the PV array but also has thefast learning capability in achieving the task of MPPT of thePV array

6 Conclusions

In this study a reinforcement learning-based maximumpower point tracking (RLMPPT) method is proposed forPV arrayThe RLMPPTmethod monitors the environmental

state of the PV array and adjusts the perturbation to theoperating voltage of the PV array in achieving the best MPPSimulations of the proposed RLMPPT for a PV array areconducted on three kinds of data set which are simulatedGaussian weather data simulated Gaussian weather dataadded with sun occultation effect and real weather data fromNREL database Experimental results demonstrate that incomparison to the existing PampO method the RLMPPT notonly achieves better efficiency factor for both simulated andreal weather data sets but also adapts to the environmentmuch fast with very short learning time Further the rein-forcement learning-basedMPPTmethodwould be employedin the real PV array to validate the effectiveness of theproposed novel MPPT method

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] T Esram andP L Chapman ldquoComparison of photovoltaic arraymaximum power point tracking techniquesrdquo IEEE Transactionson Energy Conversion vol 22 no 2 pp 439ndash449 2007

12 International Journal of Photoenergy

[2] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithmsrdquo Progress in PhotovoltaicsResearch and Applications vol 11 no 1 pp 47ndash62 2003

[3] N Femia G Petrone G Spagnuolo and M Vitelli ldquoOptimiza-tion of perturb and observe maximum power point trackingmethodrdquo IEEE Transactions on Power Electronics vol 20 no4 pp 963ndash973 2005

[4] J S Kumari D C S Babu andA K Babu ldquoDesign and analysisof PO and IPO MPPT technique for photovoltaic systemrdquoInternational Journal ofModern Engineering Research vol 2 no4 pp 2174ndash2180 2012

[5] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithms using an experimental pro-grammable maximum power point tracking test bedrdquo in Pro-ceedings of the Conference Record of the IEEE 28th PhotovoltaicSpecialists Conference pp 1699ndash1702 2000

[6] L-R Chen C-H Tsai Y-L Lin and Y-S Lai ldquoA biologicalswarm chasing algorithm for tracking the PVmaximum powerpointrdquo IEEE Transactions on Energy Conversion vol 25 no 2pp 484ndash493 2010

[7] T L Kottas Y S Boutalis and A D Karlis ldquoNew maximumpower point tracker for PV arrays using fuzzy controller in closecooperation with fuzzy cognitive networksrdquo IEEE Transactionson Energy Conversion vol 21 no 3 pp 793ndash803 2006

[8] N Mutoh M Ohno and T Inoue ldquoA method for MPPT con-trol while searching for parameters corresponding to weatherconditions for PV generation systemsrdquo IEEE Transactions onIndustrial Electronics vol 53 no 4 pp 1055ndash1065 2006

[9] G C Hsieh H I Hsieh C Y Tsai and C H Wang ldquoPhotovol-taic power-increment-aided incremental-conductance MPPTwith two-phased trackingrdquo IEEE Transactions on Power Elec-tronics vol 28 no 6 pp 2895ndash2911 2013

[10] R L Mueller M T Wallace and P Iles ldquoScaling nominal solarcell impedances for array designrdquo in Proceedings of the IEEE 1stWorld Conference on Photovoltaic Energy Conversion vol 2 pp2034ndash2037 December 1994

[11] N Femia G Petrone G Spagnuolo andMVitelli ldquoOptimizingsampling rate of PampO MPPT techniquerdquo in Proceedings ofthe IEEE 35th Annual Power Electronics Specialists Conference(PESC rsquo04) vol 3 pp 1945ndash1949 June 2004

[12] N Femia G PetroneG Spagnuolo andMVitelli ldquoA techniquefor improving PampOMPPT performances of double-stage grid-connected photovoltaic systemsrdquo IEEE Transactions on Indus-trial Electronics vol 56 no 11 pp 4473ndash4482 2009

[13] K H Hussein I Muta T Hoshino and M Osakada ldquoMax-imum photovoltaic power tracking an algorithm for rapidlychanging atmospheric conditionsrdquo IEE ProceedingsmdashGenera-tion Transmission and Distribution vol 142 no 1 pp 59ndash641995

[14] C Hua J Lin and C Shen ldquoImplementation of a DSP-controlled photovoltaic systemwith peak power trackingrdquo IEEETransactions on Industrial Electronics vol 45 no 1 pp 99ndash1071998

[15] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996

[16] A G Barto Reinforcement Learning An Introduction MITPress 1998

[17] C JWatkins and PDayan ldquoQ-learningrdquoMachine Learning vol8 no 3-4 pp 279ndash292 1992

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Inorganic ChemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Carbohydrate Chemistry

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Physical Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2014

Bioinorganic Chemistry and ApplicationsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

SpectroscopyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Medicinal ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chromatography Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Theoretical ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Spectroscopy

Analytical ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Quantum Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Organic Chemistry International

ElectrochemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CatalystsJournal of

Page 8: Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

8 International Journal of Photoenergy

020406080

100120140

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(b)

0 50 100 150 200 250Time (min)

0

5

10

15

20

25

PmaxminusP

actu

al(W

)

(c)

Figure 8 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

Figure 9Thedefinition of hitting zone for the 2nd set of experimentdata

below 5W and only small portions fall outside of 5W evenin the early learning phase

52 Results of the Gaussian Distribution Function GeneratedEnvironment Data with Sun Occultation by Clouds In thisexperiment the simulated data are obtained by adding a 30chance of sun occultation by clouds to the test data in thefirst experiment such that the temperature will fall down0 to 3∘C and the irradiance will decrease 0 to 300Wm2The experiment is conducted to illustrate the capability ofRLMPPT in tracking the MPP of PV array under variedweather condition Figure 9 shows the hitting zone definition

Table 2 The percentage of choosing action in different phase

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 28 8 12 8 12 32Middle 100sim125 8 16 24 20 20 12Final 215sim240 4 12 40 32 8 4

for the simulated data with added sun occultation by cloudsto the Gaussian distribution function generated environmentdata The red elliptical shape in Figure 9 covers the 902 ofthe total (119881mpp 119875mpp) points obtained from simulated data ofthe previous day Table 2 shows the percentage of selectingaction that is the perturbation ΔV(119894) to the 119881pv(119894) by theRLMPPT The percentages of each RL agent choosing actionin early phase middle phase and final phase of the RLMPPTmethod are shown in the second third and fourth rowrespectively of Table 2

It can be seen from Table 2 that the percentages ofselecting fine tuning actions that is the perturbation ΔV(119894)is plusmn05 V increase from 20 to 44 and finally to 72respectively for the early phase middle phase and finalphase of MPP tracking via the RL agent This table againillustrated the fact that the learning agent fast learned thestrategy in selecting the appropriate action toward reachingthe MPPT and hence the goal of tracking the MPP of thePV array is achieved by the RL agent However due tothe varied weather condition on sun occultation by clouds

International Journal of Photoenergy 9

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0

5

10

15

20

25

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 10 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

the percentage of selecting fine-tuning actions is somewhatvaried a little bit in comparison with the results obtainedin Table 1 whose simulated data are generated by Gaussiandistribution function without sun occultation effect

Experimental results of the offsets between the calculatedand the tracking MPP by the comparing methods of PampOthe open-circuit voltage and the RLMPPT at the samesensing time are shown in Figures 10(a) 10(b) and 10(c)respectively Experiment data from Figure 11 again exhibitedthat among the three comparing methods the open-circuitvoltage method obtained the largest and sparsely distributedoffsets data which are concentrated around 15W Eventhough the offsets obtained by the PampO method fall largelybetween 0 and 5W however large portion of offset obtainedby the PampOmethod scattered around 1 to 40W before the 50minutes of the experiment On the other hand the proposedRLMPPT method achieves the least and condensed offsetsbelow 5W andmostly close to 3W in the final tracking phaseafter 200 minutes

53 Results of the Real Environment Data Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA is obtained online fromNationalRenewal Energy Laboratory (NREL) database for testing theRLMPPTmethod under real environment dataThe databaseis selected because the geographical location of the sensing

Table 3The percentage of choosing action in different phase for theexperiment with real weather data

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 24 16 8 12 12 28Middle 100sim125 16 16 12 24 24 8Final 215sim240 4 16 32 32 12 4

station is also located in the subtropical area A period ofrecorded data from 1000 to 1400 for 5 consecutive days isshown in Figure 6(c) and the hitting zone reward functionfor the data from earlier days is shown in Figure 11(a) Thered elliptical shape in Figure 11(a) covers the 934of the total(119881mpp 119875mpp) points obtained from the previous 5 consecutivedays of the NREL real weather data for testing Figures11(b) and 11(c) respectively show the real temperature andirradiance data recorded at 04012014 for generating the testdata

Table 3 shows the percentage of selecting action bythe RLMPPT The percentages of each RL agent choosingaction in early phase (0sim25 minutes) middle phase (100sim125minutes) and final phase (215sim240 minutes) of the RLMPPTmethod are shown in the second third and fourth rowsrespectively of Table 3

10 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

15

155

16

165

17

175

0 50 100 150 200 250Time (min)

Tem

pera

ture

(∘C)

(b)

900

910

920

930

940

950

960

970

0 50 100 150 200 250Time (min)

Irra

dian

ce (W

m2)

(c)

Figure 11 (a) The definition of hitting zone for the experiment using real weather data (b) the real temperature data recorded in 04012014and used for generating the test data and (c) the real irradiance data recorded at 04012014 and used for generating the test data

In Table 3 one can see that the percentage of selectingfine tuning actions that is the perturbation ΔV(119894) is plusmn05 Vincrease from 20 to 36 and finally to 64 respectivelyfor the early phase middle phase and final phase of MPPtracking via the RL agent Even though the percentage ofselecting fine tuning actions in this real data experiment hasthe least value among the three experiments it exhibits thatthe RLMPPT learns to exercise the appropriate action of theperturbationΔV(119894) in tracking theMPP of the PV array underreal weather data This table again illustrated the fact thatthe learning agent fast learned the strategy in selecting theappropriate action toward reaching the MPP and hence thegoal of tracking the MPP of the PV array is achieved by theRL agent

Experimental results of the offsets between the MPPTand the predicted MPPT by the comparing methods at thesensing time for the simulation data generated by the realweather data are shown in Figure 12 Figures 12(a) 12(b) and12(c) respectively show the offsets between the calculatedMPPT and the tracking MPPT by the PampO method theopen-circuit voltage method and the RLMPPT methodExperiment data from Figure 12 again shows that among thethree comparing methods the open-circuit voltage methodobtained the largest and sparsely distributed offsets datawhose distributions are concentrated within a band cover bytwoGaussian distribution functionswith themaximumoffset

value of 197W The offsets obtained by the PampO methodfall largely around 5 and 25W however large portion ofoffset obtained by the PampO method sharply decreased from1 to 40W in the early 70 minutes of the experiment By theobservation of Figure 12(c) the proposed RLMPPT methodachieves the least and condenses offsets near 1 or 2W andnone of the offsets is higher than 5W after 5 simulationminutes from the beginning

54 Performance Comparison with Efficiency Factor In orderto validate whether the RLMPPTmethod is effective or not intracking MPP of a PV array the efficiency factor 120578 is used tocompare the performance of other existing methods for thethree experiments The 120578 is defined as follows

120578mppt =int119905

0119875actual (119905) 119889119905

int119905

0119875max (119905) 119889119905

(16)

where 119875actual(119905) and 119875max(119905) respectively represent the track-ing MPP of the MPPT method and the calculated MPPTable 4 shows that for the three test data sets the open-circuit voltage method has the least efficiency factor amongthe three comparing methods and the RLMPPT method hasthe best efficiency factor which is slightly better than that ofPampO method The advantage of the RLMPPT method overthe PampOmethod is shown in Figures 10 and 12where not only

International Journal of Photoenergy 11

020406080

100120140160

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

186187188189

19191192193194195196

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0

50

100

150

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 12 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

Table 4 Comparison of efficiency factor for the three comparingmethods

Learning phase (min) MethodsOpen-circuit voltage PampO RLMPPT

Early 1000sim1025 869 835 892Middle 1140sim1205 875 978 994Final 1335sim1400 878 977 993Overall 1000sim1400 876 954 985

the RLMPPT method significantly improves the efficiencyfactor in comparing that of the PampO but also the learningagent of the RLMPPT fast learned the strategy in selecting theappropriate action toward reaching theMPPTwhich is muchfaster than the PampO method in exhibiting a slow adaptivephase Hence the RLMPPT not only improves the efficiencyfactor in tracking the MPP of the PV array but also has thefast learning capability in achieving the task of MPPT of thePV array

6 Conclusions

In this study a reinforcement learning-based maximumpower point tracking (RLMPPT) method is proposed forPV arrayThe RLMPPTmethod monitors the environmental

state of the PV array and adjusts the perturbation to theoperating voltage of the PV array in achieving the best MPPSimulations of the proposed RLMPPT for a PV array areconducted on three kinds of data set which are simulatedGaussian weather data simulated Gaussian weather dataadded with sun occultation effect and real weather data fromNREL database Experimental results demonstrate that incomparison to the existing PampO method the RLMPPT notonly achieves better efficiency factor for both simulated andreal weather data sets but also adapts to the environmentmuch fast with very short learning time Further the rein-forcement learning-basedMPPTmethodwould be employedin the real PV array to validate the effectiveness of theproposed novel MPPT method

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] T Esram andP L Chapman ldquoComparison of photovoltaic arraymaximum power point tracking techniquesrdquo IEEE Transactionson Energy Conversion vol 22 no 2 pp 439ndash449 2007

12 International Journal of Photoenergy

[2] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithmsrdquo Progress in PhotovoltaicsResearch and Applications vol 11 no 1 pp 47ndash62 2003

[3] N Femia G Petrone G Spagnuolo and M Vitelli ldquoOptimiza-tion of perturb and observe maximum power point trackingmethodrdquo IEEE Transactions on Power Electronics vol 20 no4 pp 963ndash973 2005

[4] J S Kumari D C S Babu andA K Babu ldquoDesign and analysisof PO and IPO MPPT technique for photovoltaic systemrdquoInternational Journal ofModern Engineering Research vol 2 no4 pp 2174ndash2180 2012

[5] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithms using an experimental pro-grammable maximum power point tracking test bedrdquo in Pro-ceedings of the Conference Record of the IEEE 28th PhotovoltaicSpecialists Conference pp 1699ndash1702 2000

[6] L-R Chen C-H Tsai Y-L Lin and Y-S Lai ldquoA biologicalswarm chasing algorithm for tracking the PVmaximum powerpointrdquo IEEE Transactions on Energy Conversion vol 25 no 2pp 484ndash493 2010

[7] T L Kottas Y S Boutalis and A D Karlis ldquoNew maximumpower point tracker for PV arrays using fuzzy controller in closecooperation with fuzzy cognitive networksrdquo IEEE Transactionson Energy Conversion vol 21 no 3 pp 793ndash803 2006

[8] N Mutoh M Ohno and T Inoue ldquoA method for MPPT con-trol while searching for parameters corresponding to weatherconditions for PV generation systemsrdquo IEEE Transactions onIndustrial Electronics vol 53 no 4 pp 1055ndash1065 2006

[9] G C Hsieh H I Hsieh C Y Tsai and C H Wang ldquoPhotovol-taic power-increment-aided incremental-conductance MPPTwith two-phased trackingrdquo IEEE Transactions on Power Elec-tronics vol 28 no 6 pp 2895ndash2911 2013

[10] R L Mueller M T Wallace and P Iles ldquoScaling nominal solarcell impedances for array designrdquo in Proceedings of the IEEE 1stWorld Conference on Photovoltaic Energy Conversion vol 2 pp2034ndash2037 December 1994

[11] N Femia G Petrone G Spagnuolo andMVitelli ldquoOptimizingsampling rate of PampO MPPT techniquerdquo in Proceedings ofthe IEEE 35th Annual Power Electronics Specialists Conference(PESC rsquo04) vol 3 pp 1945ndash1949 June 2004

[12] N Femia G PetroneG Spagnuolo andMVitelli ldquoA techniquefor improving PampOMPPT performances of double-stage grid-connected photovoltaic systemsrdquo IEEE Transactions on Indus-trial Electronics vol 56 no 11 pp 4473ndash4482 2009

[13] K H Hussein I Muta T Hoshino and M Osakada ldquoMax-imum photovoltaic power tracking an algorithm for rapidlychanging atmospheric conditionsrdquo IEE ProceedingsmdashGenera-tion Transmission and Distribution vol 142 no 1 pp 59ndash641995

[14] C Hua J Lin and C Shen ldquoImplementation of a DSP-controlled photovoltaic systemwith peak power trackingrdquo IEEETransactions on Industrial Electronics vol 45 no 1 pp 99ndash1071998

[15] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996

[16] A G Barto Reinforcement Learning An Introduction MITPress 1998

[17] C JWatkins and PDayan ldquoQ-learningrdquoMachine Learning vol8 no 3-4 pp 279ndash292 1992

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Inorganic ChemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Carbohydrate Chemistry

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Physical Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2014

Bioinorganic Chemistry and ApplicationsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

SpectroscopyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Medicinal ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chromatography Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Theoretical ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Spectroscopy

Analytical ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Quantum Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Organic Chemistry International

ElectrochemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CatalystsJournal of

Page 9: Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

International Journal of Photoenergy 9

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

0

5

10

15

20

25

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0102030405060708090

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 10 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

the percentage of selecting fine-tuning actions is somewhatvaried a little bit in comparison with the results obtainedin Table 1 whose simulated data are generated by Gaussiandistribution function without sun occultation effect

Experimental results of the offsets between the calculatedand the tracking MPP by the comparing methods of PampOthe open-circuit voltage and the RLMPPT at the samesensing time are shown in Figures 10(a) 10(b) and 10(c)respectively Experiment data from Figure 11 again exhibitedthat among the three comparing methods the open-circuitvoltage method obtained the largest and sparsely distributedoffsets data which are concentrated around 15W Eventhough the offsets obtained by the PampO method fall largelybetween 0 and 5W however large portion of offset obtainedby the PampOmethod scattered around 1 to 40W before the 50minutes of the experiment On the other hand the proposedRLMPPT method achieves the least and condensed offsetsbelow 5W andmostly close to 3W in the final tracking phaseafter 200 minutes

53 Results of the Real Environment Data Real weather datafor PV array recorded in April 2014 at Loyola MarymountUniversity California USA is obtained online fromNationalRenewal Energy Laboratory (NREL) database for testing theRLMPPTmethod under real environment dataThe databaseis selected because the geographical location of the sensing

Table 3The percentage of choosing action in different phase for theexperiment with real weather data

Interval (minutes) Action (V)+5 +2 +05 minus05 minus2 minus5

Early 0sim25 24 16 8 12 12 28Middle 100sim125 16 16 12 24 24 8Final 215sim240 4 16 32 32 12 4

station is also located in the subtropical area A period ofrecorded data from 1000 to 1400 for 5 consecutive days isshown in Figure 6(c) and the hitting zone reward functionfor the data from earlier days is shown in Figure 11(a) Thered elliptical shape in Figure 11(a) covers the 934of the total(119881mpp 119875mpp) points obtained from the previous 5 consecutivedays of the NREL real weather data for testing Figures11(b) and 11(c) respectively show the real temperature andirradiance data recorded at 04012014 for generating the testdata

Table 3 shows the percentage of selecting action bythe RLMPPT The percentages of each RL agent choosingaction in early phase (0sim25 minutes) middle phase (100sim125minutes) and final phase (215sim240 minutes) of the RLMPPTmethod are shown in the second third and fourth rowsrespectively of Table 3

10 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

15

155

16

165

17

175

0 50 100 150 200 250Time (min)

Tem

pera

ture

(∘C)

(b)

900

910

920

930

940

950

960

970

0 50 100 150 200 250Time (min)

Irra

dian

ce (W

m2)

(c)

Figure 11 (a) The definition of hitting zone for the experiment using real weather data (b) the real temperature data recorded in 04012014and used for generating the test data and (c) the real irradiance data recorded at 04012014 and used for generating the test data

In Table 3 one can see that the percentage of selectingfine tuning actions that is the perturbation ΔV(119894) is plusmn05 Vincrease from 20 to 36 and finally to 64 respectivelyfor the early phase middle phase and final phase of MPPtracking via the RL agent Even though the percentage ofselecting fine tuning actions in this real data experiment hasthe least value among the three experiments it exhibits thatthe RLMPPT learns to exercise the appropriate action of theperturbationΔV(119894) in tracking theMPP of the PV array underreal weather data This table again illustrated the fact thatthe learning agent fast learned the strategy in selecting theappropriate action toward reaching the MPP and hence thegoal of tracking the MPP of the PV array is achieved by theRL agent

Experimental results of the offsets between the MPPTand the predicted MPPT by the comparing methods at thesensing time for the simulation data generated by the realweather data are shown in Figure 12 Figures 12(a) 12(b) and12(c) respectively show the offsets between the calculatedMPPT and the tracking MPPT by the PampO method theopen-circuit voltage method and the RLMPPT methodExperiment data from Figure 12 again shows that among thethree comparing methods the open-circuit voltage methodobtained the largest and sparsely distributed offsets datawhose distributions are concentrated within a band cover bytwoGaussian distribution functionswith themaximumoffset

value of 197W The offsets obtained by the PampO methodfall largely around 5 and 25W however large portion ofoffset obtained by the PampO method sharply decreased from1 to 40W in the early 70 minutes of the experiment By theobservation of Figure 12(c) the proposed RLMPPT methodachieves the least and condenses offsets near 1 or 2W andnone of the offsets is higher than 5W after 5 simulationminutes from the beginning

54 Performance Comparison with Efficiency Factor In orderto validate whether the RLMPPTmethod is effective or not intracking MPP of a PV array the efficiency factor 120578 is used tocompare the performance of other existing methods for thethree experiments The 120578 is defined as follows

120578mppt =int119905

0119875actual (119905) 119889119905

int119905

0119875max (119905) 119889119905

(16)

where 119875actual(119905) and 119875max(119905) respectively represent the track-ing MPP of the MPPT method and the calculated MPPTable 4 shows that for the three test data sets the open-circuit voltage method has the least efficiency factor amongthe three comparing methods and the RLMPPT method hasthe best efficiency factor which is slightly better than that ofPampO method The advantage of the RLMPPT method overthe PampOmethod is shown in Figures 10 and 12where not only

International Journal of Photoenergy 11

020406080

100120140160

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

186187188189

19191192193194195196

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0

50

100

150

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 12 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

Table 4 Comparison of efficiency factor for the three comparingmethods

Learning phase (min) MethodsOpen-circuit voltage PampO RLMPPT

Early 1000sim1025 869 835 892Middle 1140sim1205 875 978 994Final 1335sim1400 878 977 993Overall 1000sim1400 876 954 985

the RLMPPT method significantly improves the efficiencyfactor in comparing that of the PampO but also the learningagent of the RLMPPT fast learned the strategy in selecting theappropriate action toward reaching theMPPTwhich is muchfaster than the PampO method in exhibiting a slow adaptivephase Hence the RLMPPT not only improves the efficiencyfactor in tracking the MPP of the PV array but also has thefast learning capability in achieving the task of MPPT of thePV array

6 Conclusions

In this study a reinforcement learning-based maximumpower point tracking (RLMPPT) method is proposed forPV arrayThe RLMPPTmethod monitors the environmental

state of the PV array and adjusts the perturbation to theoperating voltage of the PV array in achieving the best MPPSimulations of the proposed RLMPPT for a PV array areconducted on three kinds of data set which are simulatedGaussian weather data simulated Gaussian weather dataadded with sun occultation effect and real weather data fromNREL database Experimental results demonstrate that incomparison to the existing PampO method the RLMPPT notonly achieves better efficiency factor for both simulated andreal weather data sets but also adapts to the environmentmuch fast with very short learning time Further the rein-forcement learning-basedMPPTmethodwould be employedin the real PV array to validate the effectiveness of theproposed novel MPPT method

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] T Esram andP L Chapman ldquoComparison of photovoltaic arraymaximum power point tracking techniquesrdquo IEEE Transactionson Energy Conversion vol 22 no 2 pp 439ndash449 2007

12 International Journal of Photoenergy

[2] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithmsrdquo Progress in PhotovoltaicsResearch and Applications vol 11 no 1 pp 47ndash62 2003

[3] N Femia G Petrone G Spagnuolo and M Vitelli ldquoOptimiza-tion of perturb and observe maximum power point trackingmethodrdquo IEEE Transactions on Power Electronics vol 20 no4 pp 963ndash973 2005

[4] J S Kumari D C S Babu andA K Babu ldquoDesign and analysisof PO and IPO MPPT technique for photovoltaic systemrdquoInternational Journal ofModern Engineering Research vol 2 no4 pp 2174ndash2180 2012

[5] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithms using an experimental pro-grammable maximum power point tracking test bedrdquo in Pro-ceedings of the Conference Record of the IEEE 28th PhotovoltaicSpecialists Conference pp 1699ndash1702 2000

[6] L-R Chen C-H Tsai Y-L Lin and Y-S Lai ldquoA biologicalswarm chasing algorithm for tracking the PVmaximum powerpointrdquo IEEE Transactions on Energy Conversion vol 25 no 2pp 484ndash493 2010

[7] T L Kottas Y S Boutalis and A D Karlis ldquoNew maximumpower point tracker for PV arrays using fuzzy controller in closecooperation with fuzzy cognitive networksrdquo IEEE Transactionson Energy Conversion vol 21 no 3 pp 793ndash803 2006

[8] N Mutoh M Ohno and T Inoue ldquoA method for MPPT con-trol while searching for parameters corresponding to weatherconditions for PV generation systemsrdquo IEEE Transactions onIndustrial Electronics vol 53 no 4 pp 1055ndash1065 2006

[9] G C Hsieh H I Hsieh C Y Tsai and C H Wang ldquoPhotovol-taic power-increment-aided incremental-conductance MPPTwith two-phased trackingrdquo IEEE Transactions on Power Elec-tronics vol 28 no 6 pp 2895ndash2911 2013

[10] R L Mueller M T Wallace and P Iles ldquoScaling nominal solarcell impedances for array designrdquo in Proceedings of the IEEE 1stWorld Conference on Photovoltaic Energy Conversion vol 2 pp2034ndash2037 December 1994

[11] N Femia G Petrone G Spagnuolo andMVitelli ldquoOptimizingsampling rate of PampO MPPT techniquerdquo in Proceedings ofthe IEEE 35th Annual Power Electronics Specialists Conference(PESC rsquo04) vol 3 pp 1945ndash1949 June 2004

[12] N Femia G PetroneG Spagnuolo andMVitelli ldquoA techniquefor improving PampOMPPT performances of double-stage grid-connected photovoltaic systemsrdquo IEEE Transactions on Indus-trial Electronics vol 56 no 11 pp 4473ndash4482 2009

[13] K H Hussein I Muta T Hoshino and M Osakada ldquoMax-imum photovoltaic power tracking an algorithm for rapidlychanging atmospheric conditionsrdquo IEE ProceedingsmdashGenera-tion Transmission and Distribution vol 142 no 1 pp 59ndash641995

[14] C Hua J Lin and C Shen ldquoImplementation of a DSP-controlled photovoltaic systemwith peak power trackingrdquo IEEETransactions on Industrial Electronics vol 45 no 1 pp 99ndash1071998

[15] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996

[16] A G Barto Reinforcement Learning An Introduction MITPress 1998

[17] C JWatkins and PDayan ldquoQ-learningrdquoMachine Learning vol8 no 3-4 pp 279ndash292 1992

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Inorganic ChemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Carbohydrate Chemistry

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Physical Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2014

Bioinorganic Chemistry and ApplicationsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

SpectroscopyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Medicinal ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chromatography Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Theoretical ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Spectroscopy

Analytical ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Quantum Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Organic Chemistry International

ElectrochemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CatalystsJournal of

Page 10: Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

10 International Journal of Photoenergy

0

50

100

150

200

250

28 29 30 31 32 33 34 35 36 37

Pmax

(W)

Vmax

(V)

(a)

15

155

16

165

17

175

0 50 100 150 200 250Time (min)

Tem

pera

ture

(∘C)

(b)

900

910

920

930

940

950

960

970

0 50 100 150 200 250Time (min)

Irra

dian

ce (W

m2)

(c)

Figure 11 (a) The definition of hitting zone for the experiment using real weather data (b) the real temperature data recorded in 04012014and used for generating the test data and (c) the real irradiance data recorded at 04012014 and used for generating the test data

In Table 3 one can see that the percentage of selectingfine tuning actions that is the perturbation ΔV(119894) is plusmn05 Vincrease from 20 to 36 and finally to 64 respectivelyfor the early phase middle phase and final phase of MPPtracking via the RL agent Even though the percentage ofselecting fine tuning actions in this real data experiment hasthe least value among the three experiments it exhibits thatthe RLMPPT learns to exercise the appropriate action of theperturbationΔV(119894) in tracking theMPP of the PV array underreal weather data This table again illustrated the fact thatthe learning agent fast learned the strategy in selecting theappropriate action toward reaching the MPP and hence thegoal of tracking the MPP of the PV array is achieved by theRL agent

Experimental results of the offsets between the MPPTand the predicted MPPT by the comparing methods at thesensing time for the simulation data generated by the realweather data are shown in Figure 12 Figures 12(a) 12(b) and12(c) respectively show the offsets between the calculatedMPPT and the tracking MPPT by the PampO method theopen-circuit voltage method and the RLMPPT methodExperiment data from Figure 12 again shows that among thethree comparing methods the open-circuit voltage methodobtained the largest and sparsely distributed offsets datawhose distributions are concentrated within a band cover bytwoGaussian distribution functionswith themaximumoffset

value of 197W The offsets obtained by the PampO methodfall largely around 5 and 25W however large portion ofoffset obtained by the PampO method sharply decreased from1 to 40W in the early 70 minutes of the experiment By theobservation of Figure 12(c) the proposed RLMPPT methodachieves the least and condenses offsets near 1 or 2W andnone of the offsets is higher than 5W after 5 simulationminutes from the beginning

54 Performance Comparison with Efficiency Factor In orderto validate whether the RLMPPTmethod is effective or not intracking MPP of a PV array the efficiency factor 120578 is used tocompare the performance of other existing methods for thethree experiments The 120578 is defined as follows

120578mppt =int119905

0119875actual (119905) 119889119905

int119905

0119875max (119905) 119889119905

(16)

where 119875actual(119905) and 119875max(119905) respectively represent the track-ing MPP of the MPPT method and the calculated MPPTable 4 shows that for the three test data sets the open-circuit voltage method has the least efficiency factor amongthe three comparing methods and the RLMPPT method hasthe best efficiency factor which is slightly better than that ofPampO method The advantage of the RLMPPT method overthe PampOmethod is shown in Figures 10 and 12where not only

International Journal of Photoenergy 11

020406080

100120140160

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

186187188189

19191192193194195196

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0

50

100

150

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 12 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

Table 4 Comparison of efficiency factor for the three comparingmethods

Learning phase (min) MethodsOpen-circuit voltage PampO RLMPPT

Early 1000sim1025 869 835 892Middle 1140sim1205 875 978 994Final 1335sim1400 878 977 993Overall 1000sim1400 876 954 985

the RLMPPT method significantly improves the efficiencyfactor in comparing that of the PampO but also the learningagent of the RLMPPT fast learned the strategy in selecting theappropriate action toward reaching theMPPTwhich is muchfaster than the PampO method in exhibiting a slow adaptivephase Hence the RLMPPT not only improves the efficiencyfactor in tracking the MPP of the PV array but also has thefast learning capability in achieving the task of MPPT of thePV array

6 Conclusions

In this study a reinforcement learning-based maximumpower point tracking (RLMPPT) method is proposed forPV arrayThe RLMPPTmethod monitors the environmental

state of the PV array and adjusts the perturbation to theoperating voltage of the PV array in achieving the best MPPSimulations of the proposed RLMPPT for a PV array areconducted on three kinds of data set which are simulatedGaussian weather data simulated Gaussian weather dataadded with sun occultation effect and real weather data fromNREL database Experimental results demonstrate that incomparison to the existing PampO method the RLMPPT notonly achieves better efficiency factor for both simulated andreal weather data sets but also adapts to the environmentmuch fast with very short learning time Further the rein-forcement learning-basedMPPTmethodwould be employedin the real PV array to validate the effectiveness of theproposed novel MPPT method

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] T Esram andP L Chapman ldquoComparison of photovoltaic arraymaximum power point tracking techniquesrdquo IEEE Transactionson Energy Conversion vol 22 no 2 pp 439ndash449 2007

12 International Journal of Photoenergy

[2] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithmsrdquo Progress in PhotovoltaicsResearch and Applications vol 11 no 1 pp 47ndash62 2003

[3] N Femia G Petrone G Spagnuolo and M Vitelli ldquoOptimiza-tion of perturb and observe maximum power point trackingmethodrdquo IEEE Transactions on Power Electronics vol 20 no4 pp 963ndash973 2005

[4] J S Kumari D C S Babu andA K Babu ldquoDesign and analysisof PO and IPO MPPT technique for photovoltaic systemrdquoInternational Journal ofModern Engineering Research vol 2 no4 pp 2174ndash2180 2012

[5] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithms using an experimental pro-grammable maximum power point tracking test bedrdquo in Pro-ceedings of the Conference Record of the IEEE 28th PhotovoltaicSpecialists Conference pp 1699ndash1702 2000

[6] L-R Chen C-H Tsai Y-L Lin and Y-S Lai ldquoA biologicalswarm chasing algorithm for tracking the PVmaximum powerpointrdquo IEEE Transactions on Energy Conversion vol 25 no 2pp 484ndash493 2010

[7] T L Kottas Y S Boutalis and A D Karlis ldquoNew maximumpower point tracker for PV arrays using fuzzy controller in closecooperation with fuzzy cognitive networksrdquo IEEE Transactionson Energy Conversion vol 21 no 3 pp 793ndash803 2006

[8] N Mutoh M Ohno and T Inoue ldquoA method for MPPT con-trol while searching for parameters corresponding to weatherconditions for PV generation systemsrdquo IEEE Transactions onIndustrial Electronics vol 53 no 4 pp 1055ndash1065 2006

[9] G C Hsieh H I Hsieh C Y Tsai and C H Wang ldquoPhotovol-taic power-increment-aided incremental-conductance MPPTwith two-phased trackingrdquo IEEE Transactions on Power Elec-tronics vol 28 no 6 pp 2895ndash2911 2013

[10] R L Mueller M T Wallace and P Iles ldquoScaling nominal solarcell impedances for array designrdquo in Proceedings of the IEEE 1stWorld Conference on Photovoltaic Energy Conversion vol 2 pp2034ndash2037 December 1994

[11] N Femia G Petrone G Spagnuolo andMVitelli ldquoOptimizingsampling rate of PampO MPPT techniquerdquo in Proceedings ofthe IEEE 35th Annual Power Electronics Specialists Conference(PESC rsquo04) vol 3 pp 1945ndash1949 June 2004

[12] N Femia G PetroneG Spagnuolo andMVitelli ldquoA techniquefor improving PampOMPPT performances of double-stage grid-connected photovoltaic systemsrdquo IEEE Transactions on Indus-trial Electronics vol 56 no 11 pp 4473ndash4482 2009

[13] K H Hussein I Muta T Hoshino and M Osakada ldquoMax-imum photovoltaic power tracking an algorithm for rapidlychanging atmospheric conditionsrdquo IEE ProceedingsmdashGenera-tion Transmission and Distribution vol 142 no 1 pp 59ndash641995

[14] C Hua J Lin and C Shen ldquoImplementation of a DSP-controlled photovoltaic systemwith peak power trackingrdquo IEEETransactions on Industrial Electronics vol 45 no 1 pp 99ndash1071998

[15] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996

[16] A G Barto Reinforcement Learning An Introduction MITPress 1998

[17] C JWatkins and PDayan ldquoQ-learningrdquoMachine Learning vol8 no 3-4 pp 279ndash292 1992

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Inorganic ChemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Carbohydrate Chemistry

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Physical Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2014

Bioinorganic Chemistry and ApplicationsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

SpectroscopyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Medicinal ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chromatography Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Theoretical ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Spectroscopy

Analytical ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Quantum Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Organic Chemistry International

ElectrochemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CatalystsJournal of

Page 11: Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

International Journal of Photoenergy 11

020406080

100120140160

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(a)

186187188189

19191192193194195196

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(b)

0

50

100

150

0 50 100 150 200 250Time (min)

PmaxminusP

actu

al(W

)

(c)

Figure 12 The maximum power point minus the predicted current power point corresponds to the same sensing time (a) PampO method(b) open-circuit voltage method and (c) RLMPPT method

Table 4 Comparison of efficiency factor for the three comparingmethods

Learning phase (min) MethodsOpen-circuit voltage PampO RLMPPT

Early 1000sim1025 869 835 892Middle 1140sim1205 875 978 994Final 1335sim1400 878 977 993Overall 1000sim1400 876 954 985

the RLMPPT method significantly improves the efficiencyfactor in comparing that of the PampO but also the learningagent of the RLMPPT fast learned the strategy in selecting theappropriate action toward reaching theMPPTwhich is muchfaster than the PampO method in exhibiting a slow adaptivephase Hence the RLMPPT not only improves the efficiencyfactor in tracking the MPP of the PV array but also has thefast learning capability in achieving the task of MPPT of thePV array

6 Conclusions

In this study a reinforcement learning-based maximumpower point tracking (RLMPPT) method is proposed forPV arrayThe RLMPPTmethod monitors the environmental

state of the PV array and adjusts the perturbation to theoperating voltage of the PV array in achieving the best MPPSimulations of the proposed RLMPPT for a PV array areconducted on three kinds of data set which are simulatedGaussian weather data simulated Gaussian weather dataadded with sun occultation effect and real weather data fromNREL database Experimental results demonstrate that incomparison to the existing PampO method the RLMPPT notonly achieves better efficiency factor for both simulated andreal weather data sets but also adapts to the environmentmuch fast with very short learning time Further the rein-forcement learning-basedMPPTmethodwould be employedin the real PV array to validate the effectiveness of theproposed novel MPPT method

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] T Esram andP L Chapman ldquoComparison of photovoltaic arraymaximum power point tracking techniquesrdquo IEEE Transactionson Energy Conversion vol 22 no 2 pp 439ndash449 2007

12 International Journal of Photoenergy

[2] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithmsrdquo Progress in PhotovoltaicsResearch and Applications vol 11 no 1 pp 47ndash62 2003

[3] N Femia G Petrone G Spagnuolo and M Vitelli ldquoOptimiza-tion of perturb and observe maximum power point trackingmethodrdquo IEEE Transactions on Power Electronics vol 20 no4 pp 963ndash973 2005

[4] J S Kumari D C S Babu andA K Babu ldquoDesign and analysisof PO and IPO MPPT technique for photovoltaic systemrdquoInternational Journal ofModern Engineering Research vol 2 no4 pp 2174ndash2180 2012

[5] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithms using an experimental pro-grammable maximum power point tracking test bedrdquo in Pro-ceedings of the Conference Record of the IEEE 28th PhotovoltaicSpecialists Conference pp 1699ndash1702 2000

[6] L-R Chen C-H Tsai Y-L Lin and Y-S Lai ldquoA biologicalswarm chasing algorithm for tracking the PVmaximum powerpointrdquo IEEE Transactions on Energy Conversion vol 25 no 2pp 484ndash493 2010

[7] T L Kottas Y S Boutalis and A D Karlis ldquoNew maximumpower point tracker for PV arrays using fuzzy controller in closecooperation with fuzzy cognitive networksrdquo IEEE Transactionson Energy Conversion vol 21 no 3 pp 793ndash803 2006

[8] N Mutoh M Ohno and T Inoue ldquoA method for MPPT con-trol while searching for parameters corresponding to weatherconditions for PV generation systemsrdquo IEEE Transactions onIndustrial Electronics vol 53 no 4 pp 1055ndash1065 2006

[9] G C Hsieh H I Hsieh C Y Tsai and C H Wang ldquoPhotovol-taic power-increment-aided incremental-conductance MPPTwith two-phased trackingrdquo IEEE Transactions on Power Elec-tronics vol 28 no 6 pp 2895ndash2911 2013

[10] R L Mueller M T Wallace and P Iles ldquoScaling nominal solarcell impedances for array designrdquo in Proceedings of the IEEE 1stWorld Conference on Photovoltaic Energy Conversion vol 2 pp2034ndash2037 December 1994

[11] N Femia G Petrone G Spagnuolo andMVitelli ldquoOptimizingsampling rate of PampO MPPT techniquerdquo in Proceedings ofthe IEEE 35th Annual Power Electronics Specialists Conference(PESC rsquo04) vol 3 pp 1945ndash1949 June 2004

[12] N Femia G PetroneG Spagnuolo andMVitelli ldquoA techniquefor improving PampOMPPT performances of double-stage grid-connected photovoltaic systemsrdquo IEEE Transactions on Indus-trial Electronics vol 56 no 11 pp 4473ndash4482 2009

[13] K H Hussein I Muta T Hoshino and M Osakada ldquoMax-imum photovoltaic power tracking an algorithm for rapidlychanging atmospheric conditionsrdquo IEE ProceedingsmdashGenera-tion Transmission and Distribution vol 142 no 1 pp 59ndash641995

[14] C Hua J Lin and C Shen ldquoImplementation of a DSP-controlled photovoltaic systemwith peak power trackingrdquo IEEETransactions on Industrial Electronics vol 45 no 1 pp 99ndash1071998

[15] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996

[16] A G Barto Reinforcement Learning An Introduction MITPress 1998

[17] C JWatkins and PDayan ldquoQ-learningrdquoMachine Learning vol8 no 3-4 pp 279ndash292 1992

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Inorganic ChemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Carbohydrate Chemistry

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Physical Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2014

Bioinorganic Chemistry and ApplicationsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

SpectroscopyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Medicinal ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chromatography Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Theoretical ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Spectroscopy

Analytical ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Quantum Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Organic Chemistry International

ElectrochemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CatalystsJournal of

Page 12: Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

12 International Journal of Photoenergy

[2] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithmsrdquo Progress in PhotovoltaicsResearch and Applications vol 11 no 1 pp 47ndash62 2003

[3] N Femia G Petrone G Spagnuolo and M Vitelli ldquoOptimiza-tion of perturb and observe maximum power point trackingmethodrdquo IEEE Transactions on Power Electronics vol 20 no4 pp 963ndash973 2005

[4] J S Kumari D C S Babu andA K Babu ldquoDesign and analysisof PO and IPO MPPT technique for photovoltaic systemrdquoInternational Journal ofModern Engineering Research vol 2 no4 pp 2174ndash2180 2012

[5] D P Hohm and M E Ropp ldquoComparative study of maximumpower point tracking algorithms using an experimental pro-grammable maximum power point tracking test bedrdquo in Pro-ceedings of the Conference Record of the IEEE 28th PhotovoltaicSpecialists Conference pp 1699ndash1702 2000

[6] L-R Chen C-H Tsai Y-L Lin and Y-S Lai ldquoA biologicalswarm chasing algorithm for tracking the PVmaximum powerpointrdquo IEEE Transactions on Energy Conversion vol 25 no 2pp 484ndash493 2010

[7] T L Kottas Y S Boutalis and A D Karlis ldquoNew maximumpower point tracker for PV arrays using fuzzy controller in closecooperation with fuzzy cognitive networksrdquo IEEE Transactionson Energy Conversion vol 21 no 3 pp 793ndash803 2006

[8] N Mutoh M Ohno and T Inoue ldquoA method for MPPT con-trol while searching for parameters corresponding to weatherconditions for PV generation systemsrdquo IEEE Transactions onIndustrial Electronics vol 53 no 4 pp 1055ndash1065 2006

[9] G C Hsieh H I Hsieh C Y Tsai and C H Wang ldquoPhotovol-taic power-increment-aided incremental-conductance MPPTwith two-phased trackingrdquo IEEE Transactions on Power Elec-tronics vol 28 no 6 pp 2895ndash2911 2013

[10] R L Mueller M T Wallace and P Iles ldquoScaling nominal solarcell impedances for array designrdquo in Proceedings of the IEEE 1stWorld Conference on Photovoltaic Energy Conversion vol 2 pp2034ndash2037 December 1994

[11] N Femia G Petrone G Spagnuolo andMVitelli ldquoOptimizingsampling rate of PampO MPPT techniquerdquo in Proceedings ofthe IEEE 35th Annual Power Electronics Specialists Conference(PESC rsquo04) vol 3 pp 1945ndash1949 June 2004

[12] N Femia G PetroneG Spagnuolo andMVitelli ldquoA techniquefor improving PampOMPPT performances of double-stage grid-connected photovoltaic systemsrdquo IEEE Transactions on Indus-trial Electronics vol 56 no 11 pp 4473ndash4482 2009

[13] K H Hussein I Muta T Hoshino and M Osakada ldquoMax-imum photovoltaic power tracking an algorithm for rapidlychanging atmospheric conditionsrdquo IEE ProceedingsmdashGenera-tion Transmission and Distribution vol 142 no 1 pp 59ndash641995

[14] C Hua J Lin and C Shen ldquoImplementation of a DSP-controlled photovoltaic systemwith peak power trackingrdquo IEEETransactions on Industrial Electronics vol 45 no 1 pp 99ndash1071998

[15] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996

[16] A G Barto Reinforcement Learning An Introduction MITPress 1998

[17] C JWatkins and PDayan ldquoQ-learningrdquoMachine Learning vol8 no 3-4 pp 279ndash292 1992

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Inorganic ChemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Carbohydrate Chemistry

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Physical Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2014

Bioinorganic Chemistry and ApplicationsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

SpectroscopyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Medicinal ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chromatography Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Theoretical ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Spectroscopy

Analytical ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Quantum Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Organic Chemistry International

ElectrochemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CatalystsJournal of

Page 13: Research Article A Reinforcement Learning-Based Maximum ...downloads.hindawi.com/journals/ijp/2015/496401.pdf · Research Article A Reinforcement Learning-Based Maximum Power Point

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Inorganic ChemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

International Journal ofPhotoenergy

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Carbohydrate Chemistry

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Physical Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2014

Bioinorganic Chemistry and ApplicationsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

SpectroscopyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Medicinal ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chromatography Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Theoretical ChemistryJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Spectroscopy

Analytical ChemistryInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Quantum Chemistry

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Organic Chemistry International

ElectrochemistryInternational Journal of

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CatalystsJournal of