Autoregressive Linear Thermal Model of a Residential Forced-Air Heating System with Backpropagation Parameter Estimation Algorithm

Eric Burger, Scott Moura, David E. Culler

Electrical Engineering and Computer Sciences, University of California at Berkeley

Technical Report No. UCB/EECS-2017-28
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-28.html

May 5, 2017



Copyright © 2017, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.


Autoregressive Linear Thermal Model of a Residential Forced-Air Heating System with Backpropagation Parameter Estimation Algorithm

Eric M. Burger, Scott J. Moura, David E. Culler

Abstract

Model predictive control (MPC) strategies show great potential for improving the performance and energy efficiency of building heating, ventilation, and air-conditioning (HVAC) systems. A challenge in the deployment of such predictive thermostatic control systems is the need to learn accurate models for the thermal characteristics of individual buildings. This necessitates the development of online and data-driven methods for system identification. In this paper, we propose an autoregressive with exogenous terms (ARX) model of a building. To learn the model, we present a backpropagation approach for recursively estimating the parameters. Finally, we fit the linear model to data collected from a residential building with a forced-air heating and ventilation system and validate the accuracy of the trained model.

Keywords: Thermal modeling, Control-oriented model, Autoregressive with exogenous terms (ARX) model, Backpropagation, Demand response, Heating, ventilation, and air-conditioning (HVAC)

1. Introduction

Heating, ventilation, and air-conditioning (HVAC) account for 43% of commercial and 54% of residential energy consumption [1]. Space heating alone accounts for 45% of all residential energy use. HVAC systems are an integral part of buildings, responsible for regulating temperature, humidity, carbon dioxide, and airflow, conditions which directly impact occupant health and comfort. Estimates suggest that component upgrades and advanced HVAC control systems could reduce building energy usage by up to 30% [2]. Such intelligent systems can improve the efficiency of building operations, better regulate indoor conditions to improve air quality and occupant comfort, and enable buildings to participate in demand response services to improve power grid stability and reduce energy related carbon emissions [3, 4, 5, 6, 7, 8].

To effectively control the operation of an HVAC system, it is essential that a model predictive controller incorporate an accurate mathematical representation of a building's thermal dynamics. The processes that determine the evolution of temperatures within a building are complex and uncertain. A reliable model improves the ability of a controller to forecast conditions and meet cost, efficiency, and/or comfort objectives [9, 10]. Simulation software, such as EnergyPlus and TRNSYS, is capable of high fidelity modeling of building HVAC systems. These mathematical models play a crucial role in the architectural and mechanical design of new buildings; however, due to their high dimensionality and computational complexity, they are not suitable for incorporation into HVAC control systems [9, 11].

The American Society of Heating, Refrigeration, and Air-Conditioning Engineers (ASHRAE) handbook [12] describes how to determine the thermal resistance values of a building surface given its materials and construction type. However, for existing buildings, details about the materials in and construction of walls and windows may be difficult to obtain or non-existent [13]. Additionally, modifications to the building or changes brought about by time and use (e.g. cracks in windows or walls) further diminish the potential for characterizing a building based on design or construction information.

Therefore, an ideal control-oriented model would capture the predominant dynamics and disturbance patterns within a building, enable accurate forecasting, adapt to future changes in building use, provide a model structure suitable for optimization, and be amenable to real-time data-driven model identification methods. For these reasons, low order linear models are widely employed for control-oriented thermal building models [13, 14, 15]. Such models trade complexity and accuracy for simplicity and efficiency.

In this paper, we present an autoregressive with exogenous terms (ARX) model for the thermostatic control of buildings and a recursive backpropagation method for parameter estimation. The structure of the linear model enables the approximate identification of unmodeled dynamics, in particular higher-order dynamics and time delays related to changes in the mechanical state of the system. By employing a recursive parameter estimation technique, we are able to perform online data-driven learning of the model.

We do not model heating from solar gain, building occupants, or equipment. This does not restrict the applicability of this work because the model structure can be extended for such cases. By estimating these effects with a single time-varying gain, we produce a simpler model better suited for predictive control.

This paper is organized as follows. Section 2 presents our autoregressive exogenous thermal model and Section 3 overviews the parameter estimation problem. Section 4 formulates our recursive parameter estimation approach employing backpropagation and stochastic gradient descent. Section 5 provides numerical examples of our proposed model and algorithm for the parameter estimation of an apartment with a forced-air heating and ventilation system. Finally, Section 6 summarizes key results.

2. Building Thermal Model

2.1. Linear Thermal Model

In this paper, we focus on the modeling of an apartment with a forced-air heating system. To begin, we consider a simple linear discrete time model [16, 17, 5, 4]

T^{k+1} = θ_a T^k + θ_b T_∞^k + θ_c m^k + θ_d    (1)

where T^k ∈ ℝ, T_∞^k ∈ ℝ, and m^k ∈ {0, 1} are the indoor air temperature (state, °C), outdoor air temperature (disturbance input, °C), and heater state (control input, On/Off), respectively, at time step k.

The parameters θ_a and θ_b correspond to the thermal characteristics of the conditioned space, as defined by θ_a = exp(−Δt/RC) and θ_b = 1 − exp(−Δt/RC); θ_c corresponds to the energy transfer due to the system's mechanical state, as defined by θ_c = (1 − exp(−Δt/RC))RP; and θ_d corresponds to an additive process accounting for energy gain or loss not directly modeled.

The linear discrete time model (1) is a discretization of an RC-equivalent continuous time model and is thus derived from (very basic) concepts of heat transfer. As noted in [17, 5], the discrete time model implicitly assumes that all changes in mechanical state occur on the time steps of the simulation. In this paper, we assume that this behavior reflects the programming of the systems being modeled. In other words, we assume that the thermostat has a sampling frequency of 1/(3600Δt) Hz, or once per minute.
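As a concrete illustration, the discretized model (1) can be simulated directly. The R, C, and P values below are hypothetical placeholders chosen for illustration, not estimates from this paper:

```python
import math

# Hypothetical thermal parameters for illustration only (not from this paper).
dt = 1.0 / 60.0          # time step: one minute, expressed in hours
R = 2.0                  # thermal resistance (assumed, degC/kW)
C = 1.5                  # thermal capacitance (assumed, kWh/degC)
P = 8.0                  # heater power (assumed, kW)

theta_a = math.exp(-dt / (R * C))
theta_b = 1.0 - math.exp(-dt / (R * C))
theta_c = (1.0 - math.exp(-dt / (R * C))) * R * P
theta_d = 0.0            # additive process (unmodeled gains/losses)

def step(T, T_inf, m):
    """One step of model (1): T' = theta_a*T + theta_b*T_inf + theta_c*m + theta_d."""
    return theta_a * T + theta_b * T_inf + theta_c * m + theta_d

# With the heater held on, T relaxes toward the fixed point T_inf + R*P (26 degC here).
T = 18.0
for _ in range(60):      # one simulated hour
    T = step(T, T_inf=10.0, m=1)
```

Note that θ_a + θ_b = 1 by construction, so with the heater off the indoor temperature decays exponentially toward the outdoor temperature.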

2.2. Autoregressive Exogenous Thermal Model

The linear discrete time model (1) is capable of representing the predominant thermal dynamics within a conditioned space. Unfortunately, because it does not capture any higher-order dynamics or time delays related to changes in the mechanical state of the system, the model is fairly inaccurate in practice. Research into higher-order RC models, in particular multi-zone network models and the modeling of walls as 2R-1C or 3R-2C elements, has shown potential for producing higher fidelity building models [13, 14, 15]. However, this comes at the cost of increased model complexity and the need for additional temperature sensing (in particular, within interior and exterior walls).

In this paper, we present an autoregressive exogenous (ARX) model capable of approximating dynamics related to trends in the ambient temperature and to changes in the mechanical state of the system. We note that the linear discrete time model (1) is, by definition, a first-order ARX model. The distinguishing characteristic of the ARX model presented below is that the model is higher-order with respect to the exogenous input terms. By increasing the number of exogenous input terms, we can better approximate observed dynamics in the systems. However, we will not pursue a physics-based justification for the number of exogenous terms, and thus the ARX model represents a slight departure from the practice of increasing the model order through RC-equivalent circuit modeling.

Our autoregressive exogenous (ARX) thermal model is given by

T^{k+1} = θ_a T^k + ∑_{i=0}^{s−1} (θ_{b,i} T_∞^{k−i} + θ_{c,i} m^{k−i}) + θ_d    (2)

where T^k ∈ ℝ, T_∞^k ∈ ℝ, and m^k ∈ {0, 1} are the indoor air temperature (state, °C), outdoor air temperature (disturbance input, °C), and heater state (control input, On/Off), respectively, at time step k. The order of the exogenous terms (and thus the number of θ_b and θ_c parameters) is given by s.

The ARX model can be expressed more compactly as

T^{k+1} = θ_a T^k + θ_b^T T_∞^k + θ_c^T m^k + θ_d    (3)

where T^k ∈ ℝ, T_∞^k ∈ ℝ^s, and m^k ∈ {0, 1}^s are the indoor air temperature (state, °C), the previous outdoor air temperatures (disturbance input, °C), and the previous heater states (control input, On/Off), respectively, at time step k. Lastly, θ_b ∈ ℝ^s and θ_c ∈ ℝ^s are the parameters of the exogenous terms.
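To make the indexing concrete, a minimal sketch of simulating the ARX model (3) is shown below, with illustrative (not learned) parameter values and history vectors ordered newest first:

```python
import numpy as np

s = 3                                  # order of the exogenous terms (small for illustration)
theta_a = 0.99                         # illustrative parameters, not learned values
theta_b = np.full(s, 0.01 / s)         # weights on the s most recent outdoor temperatures
theta_c = np.full(s, 0.15 / s)         # weights on the s most recent heater states
theta_d = 0.0

def arx_step(T, T_inf_hist, m_hist):
    """One step of (3); the history vectors hold values at steps k, k-1, ..., k-s+1."""
    return theta_a * T + theta_b @ T_inf_hist + theta_c @ m_hist + theta_d

T = 20.0
T_inf_hist = np.full(s, 10.0)          # constant 10 degC outdoor history
m_hist = np.ones(s)                    # heater on for the last s steps
for _ in range(100):
    T = arx_step(T, T_inf_hist, m_hist)
```

With constant inputs, the model settles at (θ_b^T T_∞ + θ_c^T m + θ_d)/(1 − θ_a), which is 25 °C with these numbers.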

3. Parameter Estimation Background

A fundamental machine learning problem involves the identification of a linear mapping

y^k = θ^T x^k    (4)

where x^k ∈ ℝ^X is the input, y^k ∈ ℝ^Y is the output, and the linear map is parameterized by θ ∈ ℝ^{X×Y}. Additionally, X and Y are the numbers of inputs and outputs, respectively.

3.1. Batch Parameter Estimation

Learning can be performed in a batch manner by producing θ̂, an estimate of the model parameters, given a training set of observed inputs and desired outputs, {x, y}. The goal of a parameter estimation algorithm is to minimize some function of the error between the desired and estimated outputs, as given by e^k = y^k − θ̂^T x^k.

The least squares problem is given by

minimize_{θ̂}  (1/2) ∑_{i=1}^{N} (θ̂^T x_i − y_i)²    (5)

with variables x_i ∈ ℝ^n, the model input for the i-th data point, y_i ∈ ℝ, the i-th observed response, and θ̂ ∈ ℝ^n, the weighting coefficients, for i = 1, …, N, where N is the number of data samples and n is the number of features in x_i.
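A minimal numerical sketch of the batch problem (5), using synthetic data and NumPy's least squares solver (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic linear system y = theta^T x + noise (all values illustrative).
theta_true = np.array([0.8, 0.1, 0.05])
X = rng.normal(size=(200, 3))                  # N = 200 samples, n = 3 features
y = X @ theta_true + 0.01 * rng.normal(size=200)

# Batch solution of (5): minimize (1/2) * sum_i (theta^T x_i - y_i)^2
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```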

3.2. Recursive Parameter Estimation

The least squares problem can be solved recursively with stochastic gradient descent as given by

θ̂ := θ̂ − η (θ̂^T x^k − y^k) x^k    (6)

with variables x^k ∈ ℝ^n, the model input at time step k, y^k ∈ ℝ, the observed response at time step k, θ̂ ∈ ℝ^n, the weighting coefficients, and η, the learning rate.
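The recursive update (6) can be sketched as follows; on noiseless synthetic data (values illustrative) the estimate converges to the true parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

theta_true = np.array([0.5, -0.2, 1.0])   # illustrative parameters to recover
theta_hat = np.zeros(3)
eta = 0.01                                # learning rate

# One update (6) per incoming sample: theta -= eta * (theta^T x - y) * x
for _ in range(5000):
    xk = rng.normal(size=3)
    yk = theta_true @ xk                  # noiseless observation
    theta_hat -= eta * (theta_hat @ xk - yk) * xk
```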

4. Backward Propagation of Errors

A fundamental limitation of least squares regression when applied to autoregressive models of dynamical systems is that the optimization only minimizes the error of the output at one time step into the future. Thus, the model may produce a small error when employed to predict the state in the next time step but perform poorly when used to recursively produce a multiple time step forecast. To address this issue, we can represent the system as a multilayer neural network where each layer shares the same set of weights. By training the neural network with backpropagation and stochastic gradient descent, we can produce an estimate of the system's parameters that minimizes the output error multiple time steps into the future.

Backward propagation of errors, or backpropagation, is a technique commonly used for training multilayer artificial neural networks. The method consists of propagating an input forward through the layers of the neural network until the output layer is reached. The estimated output is then compared to the desired output to calculate an error value according to a loss function. Next, the error value is propagated backwards in order to calculate the relative contribution of each neuron in each layer to the network's estimated output. These relative contributions are used to calculate the gradient of the loss function with respect to the weights in the network. Finally, the weights of the network are updated according to a gradient-based optimization method, such as stochastic gradient descent, so as to minimize the loss function.

In this paper, we employ backpropagation to train the ARX thermal model (3) according to the optimization problems presented below. In each case, we represent the system as a multilayer neural network where each layer shares the same set of weights. Thus, for a network with ℓ layers,

T̂^{k+1} = θ_a T^k + θ_b^T T_∞^k + θ_c^T m^k + θ_d,      e_1^k = T^{k+1} − T̂^{k+1}
T̂^{k+2} = θ_a T̂^{k+1} + θ_b^T T_∞^{k+1} + θ_c^T m^{k+1} + θ_d,      e_2^k = T^{k+2} − T̂^{k+2}
T̂^{k+3} = θ_a T̂^{k+2} + θ_b^T T_∞^{k+2} + θ_c^T m^{k+2} + θ_d,      e_3^k = T^{k+3} − T̂^{k+3}
⋮
T̂^{k+ℓ} = θ_a T̂^{k+ℓ−1} + θ_b^T T_∞^{k+ℓ−1} + θ_c^T m^{k+ℓ−1} + θ_d,      e_ℓ^k = T^{k+ℓ} − T̂^{k+ℓ}    (7)

where T̂^{k+i} is the output of layer i (i.e. the estimated temperature i time steps from k) and e_i^k is the error of the layer i output (i.e. the error of the estimated temperature i time steps from k). Note that the output of the first layer, T̂^{k+1}, is a function of the measured temperature, T^k, whereas the output of each subsequent layer, T̂^{k+i+1}, takes the output of the previous layer, T̂^{k+i}, as an input.

Unlike a typical neural network, the activation function of each layer in our model is linear, and we can express the output, T̂^{k+ℓ}, in terms of the measured temperature, T^k, as

T̂^{k+ℓ} = (θ_a)^ℓ T^k + ∑_{i=1}^{ℓ} (θ_a)^{ℓ−i} (θ_b^T T_∞^{k+i−1} + θ_c^T m^{k+i−1}) + θ_d ∑_{i=1}^{ℓ} (θ_a)^{ℓ−i},      e_ℓ^k = T^{k+ℓ} − T̂^{k+ℓ}    (8)

Therefore, the neural network model is linear with respect to the inputs but nonlinear with respect to the parameters. This nonlinearity, as well as the forward propagation of noise, is a central challenge with respect to training the network. Furthermore, each of the training approaches presented in this paper is essentially a nonlinear least squares problem. However, for convenience, we employ the terminology used for multilayer neural networks to describe each parameter estimation approach.
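The equivalence between the layer-by-layer recursion (7) and the closed-form expression (8) can be checked numerically with arbitrary illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(2)

s, ell = 2, 5                                   # model order and network depth
theta_a, theta_d = 0.95, 0.02                   # illustrative parameters
theta_b = np.array([0.03, 0.02])
theta_c = np.array([0.10, 0.05])

T0 = 20.0                                       # measured temperature T^k
T_inf = rng.uniform(5.0, 15.0, size=(ell, s))   # exogenous histories at k, ..., k+ell-1
m = rng.integers(0, 2, size=(ell, s)).astype(float)

# Layer-by-layer recursion (7)
T_hat = T0
for i in range(ell):
    T_hat = theta_a * T_hat + theta_b @ T_inf[i] + theta_c @ m[i] + theta_d

# Closed form (8)
T_closed = (theta_a ** ell * T0
            + sum(theta_a ** (ell - i) * (theta_b @ T_inf[i - 1] + theta_c @ m[i - 1])
                  for i in range(1, ell + 1))
            + theta_d * sum(theta_a ** (ell - i) for i in range(1, ell + 1)))
```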

Next, we present 3 approaches for training the multilayer neural network so as to produce estimates of the ARX model (3) that perform well when used to produce multiple time step forecasts.

4.1. Final Error Backpropagation

In our first training approach, we define our objective function so as to minimize the error of the final output layer of the neural network as given by

minimize_{θ̂}  (1/2) ∑_{k=1}^{N} (e_ℓ^k)²    (9)

with variables e_ℓ^k ∈ ℝ, the output error of the final output layer (as defined in (7)) given the input and output data samples at time step k, and θ̂ ∈ ℝ^{2s+2}, the model parameter estimates (i.e. θ̂ = [θ̂_a, θ̂_b^T, θ̂_c^T, θ̂_d]^T), for k = 1, …, N, where N is the number of data samples.

We solve the optimization program (9) recursively using backpropagation and stochastic gradient descent. Therefore, at each time step k, the stochastic gradient descent update equation is

θ̂ := θ̂ − η δ(e_ℓ^k)²/δθ̂    (10)

and the gradient of the loss function with respect to the parameters is

δ(e_ℓ^k)²/δθ̂ = ∑_{i=1}^{ℓ} [δ(e_ℓ^k)²/δT̂^{k+ℓ}] [δT̂^{k+ℓ}/δT̂^{k+i}] [δT̂^{k+i}/δθ̂]
            = ∑_{i=1}^{ℓ} (e_ℓ^k) (θ̂_a)^{ℓ−i} δT̂^{k+i}/δθ̂
            = ∑_{i=1}^{ℓ} e_ℓ^k (θ̂_a)^{ℓ−i} ∘ x_{i−1}^k    (11)

where

x_0^k = [T^k, (T_∞^k)^T, (m^k)^T, 1]^T
x_{i−1}^k = [T̂^{k+i−1}, (T_∞^{k+i−1})^T, (m^{k+i−1})^T, 1]^T,  ∀ i = 2, …, ℓ    (12)

Note that with this training approach, we only backpropagate the error of the final output layer. The assumption is that by minimizing the final output error, we will minimize the error of every layer in the network. In the following training approaches, we incorporate the output errors of multiple layers into the loss function in an effort to improve the robustness of the model training.
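A sketch of the Final Error Backpropagation gradient (11)-(12): roll the model forward ℓ layers, then accumulate e_ℓ^k (θ̂_a)^{ℓ−i} x_{i−1}^k. The helper names are ours, and the error's sign convention is folded into e = T̂ − T:

```python
import numpy as np

def forward(theta, T0, T_inf_seq, m_seq, ell):
    """Roll the ARX model forward ell layers from the measured temperature T0.
    theta = [theta_a, theta_b (s values), theta_c (s values), theta_d].
    Returns the final output and the layer inputs x_0, ..., x_{ell-1} from (12)."""
    T_hat, xs = T0, []
    for i in range(ell):
        x = np.concatenate(([T_hat], T_inf_seq[i], m_seq[i], [1.0]))
        xs.append(x)
        T_hat = theta @ x
    return T_hat, xs

def final_error_grad(theta, T0, T_inf_seq, m_seq, T_target, ell):
    """Gradient of (1/2)(e_ell)^2 with respect to theta, following (11)."""
    T_hat, xs = forward(theta, T0, T_inf_seq, m_seq, ell)
    e = T_hat - T_target                      # sign convention folded into e
    a = theta[0]
    return sum(e * a ** (ell - i) * xs[i - 1] for i in range(1, ell + 1))
```

An SGD step (10) is then `theta -= eta * final_error_grad(...)`. The chain through θ̂_a is exact because each layer is linear in its input.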


4.2. All Error Backpropagation

In our second training approach, we define our objective function so as to minimize the error of each layer in the neural network as given by

minimize_{θ̂}  (1/2) ∑_{k=1}^{N} ∑_{i=1}^{ℓ} (e_i^k)²    (13)

with variables e_i^k ∈ ℝ, the output error of each layer i (as defined in (7)) given the input and output data samples at k, and θ̂ ∈ ℝ^{2s+2}, the model parameter estimates (i.e. θ̂ = [θ̂_a, θ̂_b^T, θ̂_c^T, θ̂_d]^T), for k = 1, …, N, where N is the number of data samples.

We solve the optimization program (13) recursively using backpropagation and stochastic gradient descent. Therefore, at each time step k, the stochastic gradient descent update equation is

θ̂ := θ̂ − η δ(∑_{i=1}^{ℓ} (e_i^k)²)/δθ̂    (14)

and the gradient of the loss function with respect to the parameters is

δ(∑_{i=1}^{ℓ} (e_i^k)²)/δθ̂ = ∑_{i=1}^{ℓ} ∑_{j=1}^{i} e_i^k (θ̂_a)^{i−j} ∘ x_{j−1}^k    (15)

where x_{j−1}^k is defined in (12).

4.3. Partial Error Backpropagation

An issue with the Final Error Backpropagation and All Error Backpropagation methods presented above is that, for large values of ℓ, we are propagating the errors backwards over many time steps. However, given that we are using the neural network model to represent a dynamical system, there may be very little signal between the input at time step k and the output at time step k+ℓ. This potential lack of signal between the input and output is a well known issue with training deep artificial neural networks using backpropagation and gradient-based optimization methods and can result in what is often described as the vanishing (or exploding) gradient problem.

In our case, the issue stems from the exponential terms in (11) and (15). Specifically, small values of θ̂_a may cause the gradient to "vanish" while large values may cause the gradient to "explode". To address this, our third training approach will backpropagate the errors of each layer a maximum of β time steps. As with the All Error Backpropagation method, the objective function is defined so as to minimize the error of each layer in the neural network as given by (13), and the stochastic gradient descent update equation is (14).

However, for the Partial Error Backpropagation approach, we approximate the gradient of the loss function with respect to the parameters as

δ(∑_{i=1}^{ℓ} (e_i^k)²)/δθ̂ ≈ ∑_{i=1}^{ℓ} ∑_{j=f(i,β)}^{i} e_i^k (θ̂_a)^{i−j} ∘ x_{j−1}^k    (16)

where x_{j−1}^k is defined in (12) and f(i, β) is given by

f(i, β) = max(1, i − β + 1)    (17)

Note that with the Partial Error Backpropagation method, the output error e_i^k of each layer i is backpropagated a maximum of β layers (i.e. backwards β time steps).
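The truncated gradient (16)-(17) can be sketched as follows; with β ≥ ℓ it reduces to the full All Error gradient (15). The function and variable names are ours: `errors[i-1]` holds e_i^k and `xs[j-1]` holds x_{j−1}^k:

```python
import numpy as np

def partial_error_grad(theta_a, errors, xs, beta):
    """Approximate gradient (16): the error of each layer i is propagated back
    at most beta layers, i.e. over j = f(i, beta), ..., i with f from (17)."""
    ell = len(errors)
    g = np.zeros_like(xs[0])
    for i in range(1, ell + 1):
        for j in range(max(1, i - beta + 1), i + 1):   # f(i, beta) = max(1, i - beta + 1)
            g += errors[i - 1] * theta_a ** (i - j) * xs[j - 1]
    return g
```

Setting beta equal to ell recovers the full double sum of (15), while beta = 1 propagates each layer's error only to its own inputs.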

4.4. Growing the Neural Network

When training the neural network using the 3 methods described above, poor initial estimates of the parameter values will cause the algorithm to diverge. Therefore, it is necessary to start with a shallow network and gradually increase the depth as the parameter estimates improve. In other words, when training the model, we start with a small value of ℓ. Once the algorithm has converged, we increase the value of ℓ and continue to recursively update the parameters. We repeat this procedure until the neural network has reached the desired depth (i.e. the desired value of ℓ).
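The growing schedule can be sketched on toy synthetic first-order data (all values are illustrative, not from the paper): the depth ℓ = 1 model is fit by least squares, and each deeper network is warm-started from the previous depth and refined with final-error stochastic gradient descent:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data from a first-order system T^{k+1} = a*T^k + c*m^k + d (values illustrative).
a, c, d = 0.9, 2.0, 0.5
m = rng.integers(0, 2, size=500).astype(float)
T = np.empty(501)
T[0] = 5.0
for k in range(500):
    T[k + 1] = a * T[k] + c * m[k] + d

# Depth ell = 1: plain least squares, as described above.
X = np.column_stack([T[:-1], m, np.ones(500)])
theta, *_ = np.linalg.lstsq(X, T[1:], rcond=None)

eta = 1e-6                               # small learning rate to keep deep updates stable
for ell in range(2, 11):                 # grow the network one layer at a time
    for _ in range(1000):                # warm-started SGD at the new depth
        k = int(rng.integers(0, 500 - ell))
        T_hat, xs = T[k], []
        for i in range(ell):             # forward pass through ell shared-weight layers
            xs.append(np.array([T_hat, m[k + i], 1.0]))
            T_hat = theta @ xs[-1]
        e = T_hat - T[k + ell]           # final-layer error
        theta -= eta * sum(e * theta[0] ** (ell - i) * xs[i - 1]
                           for i in range(1, ell + 1))
```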

5. Residential Heating System Parameter Estimation Experiments

In this section, we present parameter estimation results for an 850 sq ft apartment with a forced-air heating and ventilation system. The apartment is located in Berkeley, California and is equipped with a custom thermostat designed and built for this research. Therefore, we are able to control the operation of the heating system and to measure the indoor air temperature. Local weather data, specifically ambient air temperature, is retrieved from the Internet service, Weather Underground [18].


Figure 1: Examples of 2 hour temperature forecasts over 8 hours of test data generated by ARX models with varying numbers of exogenous input terms

Data was collected at a time-scale of one minute for 6 weeks during December and January of 2015-2016. With this data, we are able to perform recursive parameter estimation of the ARX thermal model (2). The results presented in this section focus on quantifying and qualifying the advantages of the ARX model and the backpropagation parameter estimation methods presented above.

5.1. Increasing Model Order

With the ARX model, we are able to adjust the number of exogenous terms, s, based on the dynamics of a particular conditioned space. Increasing the number of exogenous terms increases the computational cost of training and employing the ARX model. Therefore, we want to find the minimum value of s such that the model performs well for a specific system.

To evaluate the sensitivity of the ARX model to the number of exogenous terms, we have trained the model using different values of s. In each case, the model is trained using batch least squares on 80% of the sensor data (i.e. training data) and the model performance is evaluated by producing multi-hour forecasts with the remaining 20% of the data (i.e. test data).

Figures 1 and 2 present examples of 2 hour temperature forecasts produced by ARX models with varying numbers of exogenous input terms. The top subplots show forecasts from an ARX model with s = 1, which is equivalent to the linear thermal model in (1). As shown, the model is simply incapable of representing the evolution of the indoor air temperature. Most notably, the forecasts poorly account for the thermal dynamics immediately after the heating system turns off. These dynamics are related to the interaction between the air and the other thermal masses (walls, furniture, etc.) within the conditioned space.

By increasing s to 10, the ARX model is able to better represent the dynamics immediately after the heating system turns off. However, we observe an elbow in the temperature forecasts at 10 time steps after the heating system turns off, as shown in the second subplot. This suggests that the conditioned space is still responding to the change in state of the heating system, but that the ARX model no longer has any knowledge of the state change and thus cannot estimate its impact on the indoor air temperature.

Figure 2: Examples of 2 hour temperature forecasts over 24 hours of test data generated by ARX models with varying numbers of exogenous input terms

By increasing s to 30, the model is able to better represent the dynamics of the conditioned space from the time the heating system turns off until it turns on again. This is an intuitive result and indicates that s must be sufficiently large so as to capture a full cycle of the heating system. Increasing s to 60 and 100, as shown in the bottom 2 subplots, does not significantly improve the accuracy of the 2 hour forecasts. In other words, each additional exogenous input increases the complexity of the ARX model but provides less information than the previous input.

Figure 3 shows the performance of ARX models with varying numbers of exogenous terms, s, when used to generate forecasts of 1, 5, 10, 30, 60, 120, 240, and 480 time steps. Each ARX model was trained using batch least squares on 80% of the sensor data (i.e. training data) and the model performance was evaluated by producing forecasts with the remaining 20% of the data (i.e. test data). The performance of each model is measured as the root mean squared error (RMSE) of all multiple time step forecasts of a certain length. In other words, the RMSE_60 is the RMSE of all 60 time step forecasts over a given data set. For comparison, Figure 3 includes the performances of each ARX model when used to produce forecasts on both the training data and test data.

As shown in Figure 3, the RMSEs of the ARX models over horizons of 1, 5, 10, and 30 time steps decrease as s increases from 1 to 30 and level off at around 40. The RMSEs of the 240 and 480 time step forecasts also decrease at first, but begin to increase as s increases from 40 to 80, particularly for the test data. A simple (though imprecise) explanation of this behavior is that we are underfitting the model when s is less than 30 and overfitting when s is greater than 40. The lowest RMSE_1 (i.e. the RMSE of all 1-minute forecasts) on the training data is 0.0343 °C when s = 120 and on the test data is 0.0384 °C when s = 48. Since the least squares optimization problem minimizes the 1 time step ahead error, it is no surprise that each additional exogenous term reduces the RMSE_1 of the training data. By contrast, the lowest RMSE_480 (i.e. the RMSE of all 8-hour forecasts) on the training data is 0.431 °C when s = 40 and on the test data is 0.523 °C when s = 32. With the longer forecast horizon, we see more agreement between the training and test performances with respect to the optimal number of exogenous terms.

Figure 3: Performance (RMSE) of ARX models with varying numbers of exogenous input terms on training and test data when used to generate forecasts of 1, 5, 10, 30, 60, 120, 240, and 480 time steps; (a) RMSE of 1, 5, 10, and 30 time step forecasts, (b) RMSE of 60, 120, 240, and 480 time step forecasts

5.2. Backpropagation Methods and Increasing Neural Network Depth

In this section, we present results from training the ARX model using the 3 backpropagation methods: Final Error Backpropagation, All Error Backpropagation, and Partial Error Backpropagation. Once again, we use 80% of the sensor data collected from the apartment as training data and the remaining 20% of the data as test data. For each backpropagation method, we train ARX models with 30, 60, and 100 exogenous terms. Additionally, each model is trained with different numbers of neural network layers, ℓ. Increasing the number of layers increases the computational cost of training the model and, therefore, we want to find the minimum value of ℓ such that the model performs well for a specific system.

As previously noted, poor initial parameter estimates will cause the training algorithm to diverge. Therefore, when training a network with depth ℓ, we initialize the parameters with estimates from a network of depth ℓ − 1. For a network of depth ℓ = 1, we train the model using least squares rather than backpropagation and stochastic gradient descent. Lastly, to reduce the likelihood that the stochastic gradient descent algorithm diverges for large values of ℓ, we set a small learning rate, η, of 3 × 10^{-9} and limit the number of iterations (i.e. the number of stochastic gradient descent updates) to 200,000.

Results from training the ARX models using the Final Error Backpropagation method are presented in Figures 4, 5, and 6. For the ARX model with s = 30 exogenous terms, we observe little to no improvement in the forecast error as a result of the backpropagation training method. In fact, the lowest RMSE_480 on the training data is 0.468 °C when ℓ = 3 and on the test data is 0.525 °C when ℓ = 7. As the depth of the neural network increases, the accuracy of the forecasts remains relatively stable until ℓ reaches about 40. With an ℓ of 70, we start to experience exploding gradients, resulting in poor parameter estimates and a sharp increase in the RMSEs of the forecasts.

Figures 5 and 6 present results from ARX mod-els with s = 60 and s = 100 exogenous terms. Asdiscussed in the previous section, training modelswith such large numbers of exogenous terms usingleast squares caused overfitting and an increase inthe RMSE240 and RMSE480. Using the Final Er-ror Backpropagation method, we are able to im-prove the performance of both models on the train-ing and test data. In fact, we are able to produce8-hour forecasts that are, on average, more accu-

8

Page 12: Autoregressive Linear Thermal Model of a Residential ... Estimation Algorithm Eric Burger Scott Moura ... Heating, ventilation, and air-conditioning (HVAC) 1. Introduction Heating,

(a) RMSE of 1, 5, 10, and 30 time step forecasts (b) RMSE of 60, 120, 240, and 480 time step forecasts

Figure 4: Performance (RMSE) of ARX model with s = 30 exogenous input terms when trained using Final Error Backprop-agation with varying neural network depth, `

(a) RMSE of 1, 5, 10, and 30 time step forecasts (b) RMSE of 60, 120, 240, and 480 time step forecasts

Figure 5: Performance (RMSE) of ARX model with s = 60 exogenous input terms when trained using Final Error Backprop-agation with varying neural network depth, `

(a) RMSE of 1, 5, 10, and 30 time step forecasts (b) RMSE of 60, 120, 240, and 480 time step forecasts

Figure 6: Performance (RMSE) of ARX model with s = 100 exogenous input terms when trained using Final Error Backprop-agation with varying neural network depth, `

9

Page 13: Autoregressive Linear Thermal Model of a Residential ... Estimation Algorithm Eric Burger Scott Moura ... Heating, ventilation, and air-conditioning (HVAC) 1. Introduction Heating,

(a) RMSE of 1, 5, 10, and 30 time step forecasts (b) RMSE of 60, 120, 240, and 480 time step forecasts

Figure 7: Performance (RMSE) of ARX model with s = 30 exogenous input terms when trained using All Error Backpropagationwith varying neural network depth, `

(a) RMSE of 1, 5, 10, and 30 time step forecasts (b) RMSE of 60, 120, 240, and 480 time step forecasts

Figure 8: Performance (RMSE) of ARX model with s = 60 exogenous input terms when trained using All Error Backpropagationwith varying neural network depth, `

(a) RMSE of 1, 5, 10, and 30 time step forecasts (b) RMSE of 60, 120, 240, and 480 time step forecasts

Figure 9: Performance (RMSE) of ARX model with s = 100 exogenous input terms when trained using All Error Backpropagation with varying neural network depth, ℓ



(a) RMSE of 1, 5, 10, and 30 time step forecasts (b) RMSE of 60, 120, 240, and 480 time step forecasts

Figure 10: Performance (RMSE) of ARX model with s = 30 exogenous input terms when trained using Partial Error Backpropagation with a backpropagation limit of β = 5 and varying neural network depth, ℓ

(a) RMSE of 1, 5, 10, and 30 time step forecasts (b) RMSE of 60, 120, 240, and 480 time step forecasts

Figure 11: Performance (RMSE) of ARX model with s = 60 exogenous input terms when trained using Partial Error Backpropagation with a backpropagation limit of β = 5 and varying neural network depth, ℓ

(a) RMSE of 1, 5, 10, and 30 time step forecasts (b) RMSE of 60, 120, 240, and 480 time step forecasts

Figure 12: Performance (RMSE) of ARX model with s = 100 exogenous input terms when trained using Partial Error Backpropagation with a backpropagation limit of β = 5 and varying neural network depth, ℓ



(a) RMSE of 60, 120, 240, and 480 time step forecasts with a backpropagation limit of β = 10

(b) RMSE of 60, 120, 240, and 480 time step forecasts with a backpropagation limit of β = 20

Figure 13: Performance (RMSE) of ARX model with s = 100 exogenous input terms when trained using Partial Error Backpropagation with backpropagation limits of β = 10 and β = 20 and varying neural network depth, ℓ

rate than with the s = 30 model. For the s = 60 ARX model, the lowest RMSE480 on the training data is 0.413°C when ℓ = 30 and on the test data is 0.475°C when ℓ = 60. For the s = 100 ARX model, the lowest RMSE480 on the training data is 0.394°C when ℓ = 53 and on the test data is 0.463°C when ℓ = 65. Once again, with an ℓ of 70, we start to experience exploding gradients and a sharp increase in forecast error.

Results from training the ARX models using the All Error Backpropagation method are presented in Figures 7, 8, and 9. For the ARX model with s = 30 exogenous terms, we observe an overall increase in forecast error as a result of the backpropagation training method. The lowest RMSE480 on the training data is 0.468°C when ℓ = 3 and on the test data is 0.527°C when ℓ = 12. By contrast, for the s = 60 and s = 100 ARX models, we again see an improvement in the model performance as a result of the backpropagation training method. For the s = 60 ARX model, the lowest RMSE480 on the training data is 0.419°C when ℓ = 25 and on the test data is 0.485°C when ℓ = 55. For the s = 100 ARX model, the lowest RMSE480 on the training data is 0.398°C when ℓ = 53 and on the test data is 0.461°C when ℓ = 47. The performances of the ARX models exhibit greater variability when trained with the All Error Backpropagation method than with the Final Error Backpropagation approach, and we observe a sharp increase in forecast error at an ℓ of about 60 due to exploding gradients.

Results from training the ARX models using the Partial Error Backpropagation method are presented in Figures 10, 11, 12, and 13. Figures 10, 11, and 12 present results from training ARX models using a maximum of β = 5 layers for backpropagation, and Figure 13 presents results using β = 10 and β = 20. Unlike with Final Error Backpropagation and All Error Backpropagation, we only observe divergence in the gradient descent algorithm for the β = 20 case when using Partial Error Backpropagation. For the other cases, the algorithm remains stable (or as stable as can be expected of stochastic gradient descent) even at large values of ℓ. This suggests that by limiting the number of neural network layers through which the errors are backpropagated, we can approximate the gradient of the objective function and reduce the risk of exploding gradients.
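The gradient truncation at the heart of Partial Error Backpropagation can be sketched in a few lines. The code below is not the paper's implementation: it is a minimal numpy version under simplifying assumptions (a scalar state and a single exogenous lag, i.e. s = 1) that unrolls the ARX recursion ℓ steps and backpropagates the final-step error through at most β of those steps:

```python
import numpy as np

def unrolled_forecast(theta, T0, Tinf, m, ell):
    """Roll the scalar ARX recursion T[k+1] = a*T[k] + b*Tinf[k] + c*m[k] + d
    forward ell steps, feeding each prediction back in (the "network depth")."""
    a, b, c, d = theta
    T, traj = T0, []
    for k in range(ell):
        T = a * T + b * Tinf[k] + c * m[k] + d
        traj.append(T)
    return traj

def partial_error_gradient(theta, T0, Tinf, m, target, ell, beta):
    """Gradient of the final-step squared error, backpropagated through at
    most beta of the ell unrolled layers (beta = ell gives the full gradient)."""
    a, b, c, d = theta
    traj = unrolled_forecast(theta, T0, Tinf, m, ell)
    err = traj[-1] - target
    ga = gb = gc = gd = 0.0
    chain = 1.0  # d T[ell] / d T[k+1], grows by a factor of `a` per layer
    for k in reversed(range(max(0, ell - beta), ell)):
        T_prev = traj[k - 1] if k > 0 else T0
        ga += chain * T_prev
        gb += chain * Tinf[k]
        gc += chain * m[k]
        gd += chain
        chain *= a  # truncating this product is what bounds the gradient
    return 2.0 * err * np.array([ga, gb, gc, gd])
```

With β = ℓ this coincides with Final Error Backpropagation; with β < ℓ the factors of θ_a beyond β layers back are dropped, which is consistent with the stability observed at large ℓ.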

For the ARX model with s = 30 exogenous terms, the lowest RMSE480 on the training data is 0.468°C when ℓ = 2 and on the test data is 0.526°C when ℓ = 12. These results are very close to those obtained with Final Error Backpropagation and All Error Backpropagation. For the s = 60 ARX model, the lowest RMSE480 on the training data is 0.417°C when ℓ = 22 and on the test data is 0.501°C when ℓ = 93. For the s = 100 ARX model, the lowest RMSE480 on the training data is 0.398°C when ℓ = 26 and on the test data is 0.470°C when ℓ = 100. Note that with Partial Error Backpropagation for the s = 60 and s = 100 cases, the test error is minimized with an ℓ greater than 90. With the previous training approaches, the gradient descent algorithm began to diverge



with an ℓ of around 60. If we increase β to 10, the lowest RMSE480 of the s = 100 ARX model on the training data is 0.401°C when ℓ = 22 and on the test data is 0.465°C when ℓ = 96. By increasing β again to 20, the lowest RMSE480 on the training data becomes 0.400°C when ℓ = 26 and on the test data becomes 0.470°C when ℓ = 38. As previously noted, with s = 100 and β = 20, the algorithm diverges at around ℓ = 60.

Using the Final Error Backpropagation, All Error Backpropagation, and Partial Error Backpropagation approaches, the lowest RMSE480 values on the test data were 0.463°C, 0.461°C, and 0.465°C, respectively. Each of these was achieved by an ARX model with s = 100 exogenous terms. Given the clear potential for instability in the Final Error Backpropagation and All Error Backpropagation methods, these parameter estimation methods are poorly suited for control applications. However, given its greater stability and comparable model performance (as measured by the RMSE480 values), the Partial Error Backpropagation method presented in this paper has the greatest potential for improving the accuracy of the ARX model by minimizing the output error over multiple time steps rather than one time step into the future.

5.3. Control Simulations

In the previous sections, the performance of each ARX model is quantified using the root mean squared error (RMSE) of all multiple time step forecasts of a certain length for a given data set. These RMSE values are useful for understanding the accuracy of the model (in a statistical sense) and measuring the capability of the model to estimate the air temperature of the conditioned space given the mechanical state (On/Off) of the system. For applications like temperature estimation (e.g. Kalman filtering of temperature measurements) and fault detection (e.g. detecting whether the system has failed to deliver heat to the conditioned space), we would like a model with a low RMSE value. However, the RMSE of the temperature forecasts is not sufficient for quantifying the fidelity of the ARX model or its suitability for model predictive control applications.
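For concreteness, the horizon-h forecast RMSE (e.g. RMSE480) can be sketched as follows. This is our reading of the metric, not the paper's code: each forecast is initialized from a measurement, the model's own predictions are fed back for h steps, and the error is scored at the h-th step. `model_step` is a hypothetical stand-in for the one-step ARX update:

```python
import numpy as np

def multi_step_rmse(model_step, T, Tinf, m, h):
    """RMSE over every closed-loop h-step forecast in a data set.
    model_step(T_hat, tinf_k, m_k) -> one-step prediction, fed back on itself."""
    errors = []
    for start in range(len(T) - h):
        T_hat = T[start]  # initialize the forecast from a measurement
        for k in range(start, start + h):
            T_hat = model_step(T_hat, Tinf[k], m[k])  # use prediction, not data
        errors.append(T_hat - T[start + h])  # score only the h-th step
    return float(np.sqrt(np.mean(np.square(errors))))
```

If the paper instead averages the error over every step of each trajectory, the inner loop would accumulate one error term per step; the closed-loop feedback of predictions is the essential feature either way.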

In this section, we present results from control simulations in which various ARX models are used to estimate both the indoor air temperature, T^k, and mechanical state, m^k, given the outdoor air temperature, T^k_∞, and upper and lower temperature bounds at each time step k. Each ARX model has 30, 60, or 100 exogenous input terms (s = 30, 60, or

100) and is fit to the training data using batch least squares (ℓ = 1) or Partial Error Backpropagation with β = 5 and a network depth of 20, 30, 40, or 50 layers (ℓ = 20, 30, 40, or 50). We simulate the control of the system using the outdoor air temperature, T^k_∞, and temperature setpoints, T^k_set, of the test data set. The indoor temperature and mechanical state estimates are initialized with measured data (i.e. T̂^0 = T^0 and m̂^0 = m^0) and the evolution of the states is given by the update equations

T^{k+1} = θ_a T^k + θ_b^T T^k_∞ + θ_c^T m^k + θ_d

m^{k+1} = 1      if T^{k+1} < T^k_set − δ/2
m^{k+1} = 0      if T^{k+1} > T^k_set + δ/2
m^{k+1} = m^k    otherwise
    (18)

where T^k ∈ R and m^k ∈ {0, 1} are the indoor air temperature (°C) and heater state (On/Off), respectively, at time step k. As in (3), T^k_∞ ∈ R^s denotes the vector of previous outdoor air temperatures (°C); m^k ∈ {0, 1}^s, the previous heater states (m^k = [m^k, m^{k−1}, . . . , m^{k−s+1}]^T); and θ_b ∈ R^s and θ_c ∈ R^s, the parameters of the exogenous terms. Lastly, T^k_set ∈ R denotes the temperature setpoint (°C) at time step k and δ ∈ R, the temperature deadband width (°C). Therefore, at each time step k, the upper temperature bound is T^k_set + δ/2 and the lower temperature bound is T^k_set − δ/2.
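The closed-loop simulation of (18) can be sketched as follows. This is an illustrative numpy implementation, not the authors' code; in particular, padding the input windows with the initial values at startup is an assumption (the paper initializes from measured history):

```python
import numpy as np

def simulate_thermostat(theta, T0, m0, Tinf, Tset, delta, s):
    """Closed-loop simulation of update equations (18): the ARX model predicts
    the next indoor temperature, then deadband logic picks the heater state."""
    a, b, c, d = theta            # b and c are the length-s exogenous parameters
    Tinf_win = [Tinf[0]] * s      # windows of past inputs, padded at startup
    m_win = [float(m0)] * s
    T, m = T0, m0
    T_hist, m_hist = [T0], [m0]
    for k in range(len(Tset) - 1):
        T = a * T + np.dot(b, Tinf_win) + np.dot(c, m_win) + d
        if T < Tset[k] - delta / 2:
            m = 1                 # too cold: heater turns on
        elif T > Tset[k] + delta / 2:
            m = 0                 # too warm: heater turns off
        # otherwise the heater keeps its previous state (hysteresis)
        Tinf_win = [Tinf[k + 1]] + Tinf_win[:-1]
        m_win = [float(m)] + m_win[:-1]
        T_hist.append(T)
        m_hist.append(m)
    return T_hist, m_hist
```

Note the hysteresis: inside the deadband the heater keeps its previous state, so some overshoot past each bound is required to toggle it.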

Examples of temperature estimates, T̂^k, and mechanical state estimates, m̂^k, produced by ARX models trained with batch least squares and simulated with (18) are presented in Figure 14. The top subplot shows the measured temperature, T^k, and mechanical state, m^k, and the remaining four subplots show estimates from ARX models with s = 10, 30, 60, and 100 exogenous terms.

Figure 15 summarizes results from the control simulation tests. As shown in the subfigures, we employ four metrics to quantify the fidelity of the ARX models with respect to observations of the forced-air heating system. In Figure 15(a), we compare the number of time steps that the temperature estimates are within the upper and lower bounds and the number of time steps that the mechanical system is on according to the control simulation of each ARX model. The results are presented as errors relative to the observed number of time steps within the deadband and time steps that the system is on, respectively. In Figure 15(b), we show the root mean squared error



Figure 14: Examples of temperature and mechanical state estimates over test data generated by ARX models with varying numbers of exogenous input terms and trained with batch least squares

(RMSE) of temperature deviations above the upper bound (T^{k+1} > T^k_set + δ/2) and below the lower bound (T^{k+1} < T^k_set − δ/2). The errors are calculated relative to the upper or lower bound, and temperatures within the bounds have an error of 0.
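The deviation metric of Figure 15(b) can be written compactly. The sketch below assumes deviations are evaluated at every time step and that in-band samples contribute zero error, as described above:

```python
import numpy as np

def deviation_rmse(T, Tset, delta):
    """RMSE of excursions above the upper bound and below the lower bound;
    samples inside the deadband contribute zero error to both terms."""
    T = np.asarray(T)
    Tset = np.asarray(Tset)
    upper = Tset + delta / 2
    lower = Tset - delta / 2
    over = np.maximum(T - upper, 0.0)    # positive only above the upper bound
    under = np.maximum(lower - T, 0.0)   # positive only below the lower bound
    return (float(np.sqrt(np.mean(over ** 2))),
            float(np.sqrt(np.mean(under ** 2))))
```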

As shown in Figure 15(a), and to a lesser degree Figure 14, the ARX models trained with batch least squares (ℓ = 1) tend to overestimate the number of time steps that the indoor temperature is within the upper and lower bounds. Similarly, these models underestimate the number of time steps that the system is on and therefore underestimate the energy required to maintain the temperature within the conditioned space. Using Partial Error Backpropagation with ℓ = 20, the ARX models underestimate the time steps within the deadband and overestimate the energy demand. When we increase ℓ to 30 and 40, the relative errors of the ARX models move closer to 0, suggesting that the temperature and mechanical state estimates better reflect the observed dynamics of the system. However, increasing ℓ to 50 causes the ARX models to once again

overestimate the time steps within the deadband and underestimate the number of time steps that the system is on.

Figure 15(b) shows the RMSE values of temperature deviations above the upper bound and below the lower bound for the temperature measurements, T^k, and temperature estimates, T̂^k, produced by the control simulations. Given the formulation of the update equations in (18), some deviation outside the bounds is necessary to change the mechanical state. Accordingly, for the various models, we would like to observe RMSE values that are close to those of the temperature measurements, indicating that the ARX model accurately represents the dynamics of the system just after it turns on or off.

As shown in the subfigure, the models trained with ℓ = 1 underestimate deviations below the lower bound but do a relatively good job of estimating deviations above the upper bound. For ℓ = 20, the models overestimate the deviations above the upper bound. In other words, these models overshoot the upper bound, which may help to explain



(a) Percent error of estimated number of time steps that the temperature is within the upper and lower bounds (y-axis) and the heating system is on (x-axis)

(b) RMSE of temperature deviations above the upper bound (y-axis) and below the lower bound (x-axis)

Figure 15: Fidelity of ARX model with s = 100 exogenous input terms when trained using Partial Error Backpropagation with a backpropagation limit of β = 5 and varying neural network depth, ℓ

why these models also overestimate the number of time steps that the system is on. While the models trained with an ℓ of 30 and 40 produced good estimates of the number of time steps within the deadband and the number of time steps that the system is on, they overestimate the deviations above and below the temperature bounds. Lastly, with ℓ = 50, we achieve RMSE values that are relatively close to those of the measured data.

Based on the results presented in Figure 15, we conclude that the Partial Error Backpropagation training method does have the potential to improve the fidelity of the ARX models relative to batch least squares. This improvement in model fidelity is particularly important for model predictive control applications which estimate and optimize the energy demand of residential heating systems. However, further research is necessary to optimize for model fidelity during the training of the ARX models. Potential research directions include stopping criteria which incorporate a control simulation, the training of an ensemble of ARX models with model selection based on a control simulation, and the development of a non-linear training technique (e.g. genetic algorithm, particle swarm optimization) which optimizes both the accuracy (RMSE of multi time step forecasts) and fidelity (error of control simulations) of an ARX model.

6. Conclusions

This paper addresses the need for control-oriented thermal models of buildings. We present an autoregressive with exogenous terms (ARX) model of a building that is suitable for model predictive control applications. To estimate the model parameters, we present three backpropagation and stochastic gradient descent methods for recursive parameter estimation: Final Error Backpropagation, All Error Backpropagation, and Partial Error Backpropagation. Finally, we present experimental results using real temperature data collected from an apartment with a forced-air heating and ventilation system. These results demonstrate the potential of the ARX model and the Partial Error Backpropagation parameter estimation method to produce accurate forecasts of the air temperature within the apartment.
