
Research Article
Electricity Theft Detection in Power Grids with Deep Learning and Random Forests

Shuan Li,1 Yinghua Han,1 Xu Yao,1 Song Yingchen,2 Jinkuan Wang,3 and Qiang Zhao2

1 School of Computer and Communication Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066004, China
2 School of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066004, China
3 College of Information Science and Engineering, Northeastern University, Shenyang 110819, China

Correspondence should be addressed to Yinghua Han; [email protected]

Received 19 March 2019; Revised 30 July 2019; Accepted 14 August 2019; Published 3 October 2019

Academic Editor: Ping-Feng Pai

Copyright © 2019 Shuan Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

As one of the major factors of the nontechnical losses (NTLs) in distribution networks, electricity theft causes significant harm to power grids, as it degrades power supply quality and reduces operating profits. In order to help utility companies solve the problems of inefficient electricity inspection and irregular power consumption, a novel hybrid convolutional neural network-random forest (CNN-RF) model for automatic electricity theft detection is presented in this paper. In this model, a convolutional neural network (CNN) is first designed to learn the features between different hours of the day and different days from massive and varying smart meter data by the operations of convolution and downsampling. In addition, a dropout layer is added to reduce the risk of overfitting, and the backpropagation algorithm is applied to update network parameters in the training phase. Then, the random forest (RF) is trained based on the obtained features to detect whether a consumer steals electricity. To build the RF in the hybrid model, the grid search algorithm is adopted to determine the optimal parameters. Finally, experiments are conducted based on real energy consumption data, and the results show that the proposed detection model outperforms other methods in terms of accuracy and efficiency.

1. Introduction

The loss of energy in electricity transmission and distribution is an important problem faced by power companies all over the world. The energy losses are usually classified into technical losses (TLs) and nontechnical losses (NTLs) [1]. The TL is inherent to the transportation of electricity and is caused by internal actions in the power system components such as the transmission lines and transformers [2]; the NTL is defined as the difference between total losses and TLs and is primarily caused by electricity theft. In practice, electricity theft occurs mostly through physical attacks like line tapping, meter breaking, or meter reading tampering [3]. These electricity fraud behaviours may bring about revenue losses for power companies. As an example, the losses caused by electricity theft are estimated at about $4.5 billion every year in the United States (US) [4], and it is estimated that utility companies worldwide lose more than 20 billion every year in the form of electricity theft [5]. In addition, electricity theft behaviours can also affect power system safety. For instance, the heavy load of electrical systems caused by electricity theft may lead to fires, which threaten public safety. Therefore, accurate electricity theft detection is crucial for power grid safety and stability.

With the implementation of the advanced metering infrastructure (AMI) in smart grids, power utilities obtain massive amounts of electricity consumption data at a high frequency from smart meters, which is helpful for detecting electricity theft [6, 7]. However, every coin has two sides: the AMI network opens the door to some new electricity theft attacks. These attacks in the AMI can be launched by various means such as digital tools and cyber attacks. The primary means of electricity theft detection include manually examining unauthorized line diversions, comparing malicious meter records with benign ones, and checking problematic equipment or hardware. However, these methods are extremely time-consuming and costly when fully verifying all meters in a system.


Besides, these manual approaches cannot prevent cyber attacks. In order to solve the problems mentioned above, many approaches have been put forward in the past years. These methods are mainly categorized into state-based, game-theory-based, and artificial-intelligence-based models [8].

State-based detection [9–11] relies on special devices such as wireless sensors and distribution transformers [12]. These methods can detect electricity theft but depend on the real-time acquisition of system topology and additional physical measurements, which is sometimes unattainable. Game-based detection schemes formulate a game between the electricity utility and the thief, and then different distributions of normal and abnormal behaviours can be derived from the game equilibrium. As detailed in [13], they can achieve a low cost and a reasonable result for reducing energy theft. Yet formulating the utility function of all players (e.g., distributors, regulators, and thieves) is still a challenge. Artificial-intelligence-based methods include machine learning and deep learning methods. Existing machine learning solutions can be further categorized into classification and clustering models, as presented in [14–17]. Although the aforementioned machine learning detection methods are innovative and remarkable, their performance is still not satisfactory enough for practice. For example, most of these approaches require manual feature extraction, which partly results from their limited ability to handle high-dimensional data. Indeed, traditional hand-designed features include the mean, standard deviation, maximum, and minimum of consumption data. The process of manual feature extraction is a tedious and time-consuming task and cannot capture the 2D features in smart meter data.

Deep learning techniques for electricity theft detection are studied in [18], where the authors present a comparison between different deep learning architectures such as convolutional neural networks (CNNs), long short-term memory (LSTM) recurrent neural networks (RNNs), and stacked autoencoders. However, the performance of the detectors is investigated using synthetic data, which does not allow a reliable assessment of the detectors' performance compared with shallow architectures. Moreover, the authors in [19] proposed a deep neural network- (DNN-) based customer-specific detector that can efficiently thwart such cyber attacks. In recent years, the CNN has been applied to generate useful and discriminative features from raw data and has wide applications in different areas [20–22]. These applications motivate applying the CNN for feature extraction from high-resolution smart meter data in electricity theft detection. In [23], a wide and deep convolutional neural network (CNN) model was developed and applied to analyse electricity theft in smart grids.

In a plain CNN, the softmax classifier layer is the same as a general single hidden layer feedforward neural network (SLFN) and is trained through the backpropagation algorithm [24]. On the one hand, the SLFN is likely to be overtrained when it performs the backpropagation algorithm, leading to degradation of its generalization performance. On the other hand, the backpropagation algorithm is based on empirical risk minimization, which is sensitive to local minima of the training errors. As mentioned above, because of the shortcomings of the softmax classifier, the CNN is not always optimal for classification, although it has shown great advantages in the feature extraction process. Therefore, it is urgent to find a better classifier that not only has an ability similar to the softmax classifier but also can make full use of the obtained features. Among classifiers, the random forest (RF) takes advantage of two powerful machine learning techniques, bagging and random feature selection, which can overcome the limitation of the softmax classifier. Inspired by these works, a novel convolutional neural network-random forest (CNN-RF) model is adopted for electricity theft detection. The CNN is proposed to automatically capture various features of customers' consumption behaviours from smart meter data, which is one of the key factors in the success of the electricity theft detection model. To improve detection performance, the RF is used to replace the softmax classifier, detecting the patterns of consumers based on the extracted features. This model has been trained and tested with real data from customers of electricity utilities in Ireland and London.

2. Overview Flow

The main aim of the methodology described in this paper is to provide the utilities with a ranked list of their customers according to their probability of having an anomaly in their electricity meter.

As shown in Figure 1, the electricity theft detection system is divided into three main stages as follows:

(i) Data analysis and preprocessing: to explain the reason for applying a CNN for feature extraction, we first analyse the factors that affect the behaviours of electricity consumers. For the data preprocessing, we consider several tasks such as data cleaning (resolving outliers), missing value imputation, and data transformation (normalization).

(ii) Generation of train and test datasets: to evaluate the performance of the methodology described in this paper, the preprocessed dataset is split into a train dataset and a test dataset by the cross-validation algorithm. The train dataset is used to train the parameters of our model, whilst the test dataset is used to assess how well the model generalizes to new, unseen customer samples. Given that nonfraudulent consumers remarkably outnumber electricity theft ones, the imbalanced nature of the dataset can have a major negative impact on the performance of supervised machine learning methods. To reduce this bias, the synthetic minority oversampling technique (SMOTE) algorithm is used to make the numbers of electricity theft and nonfraudulent consumers equal in the train dataset.

(iii) Classification using the CNN-RF model: in the proposed CNN-RF model, the CNN is first designed to learn the features between different hours of the day and different days from massive and varying smart meter data by the operations of convolution and downsampling. Then, the RF classifier is trained based on the obtained features to detect whether the consumer steals electricity. Finally, the confusion matrix and receiver operating characteristic (ROC) curves are used to evaluate the accuracy of the CNN-RF model on the test dataset.

3. Data Analysis and Preprocess

The dataset was collected by Electric Ireland and the Sustainable Energy Authority of Ireland (SEAI) in January 2012 [25] and consists of electricity usage data from over 5000 residential households and businesses. The smart meters recorded electricity consumption at a resolution of 30 min during 2009 and 2010. Customers who participated in the trial had a smart meter installed in their homes and agreed to take part in the research. Therefore, it is a reasonable assumption that all samples belong to honest users. However, malicious samples cannot be obtained, since energy theft might never or rarely happen for a given customer. To address this issue, we generated 1200 malicious customers as described in [26]. The large number and variety of customers with a long period of measurements make this dataset an excellent source for research in the area of smart meter data analysis. For each customer i, there is a half-hourly meter reading over a 525-day period. We reduced the sampling rate to one reading per hour for each customer. This section first analyses smart meter data to describe the rationale of applying a CNN for feature extraction. It then describes the three main procedures utilized for preprocessing the original energy consumption data: data cleaning, missing value imputation, and data normalization.

3.1. Data Analysis. To introduce the rationale for applying a CNN for feature extraction rather than other machine learning techniques, this section analyses smart meter data. Figure 2 shows the daily load curves of four different users (residential and nonresidential); it can be seen that the daily electricity consumption of nonresidential users is significantly higher than that of residential users.

In addition, Figures 3 and 4 show the daily load profiles of a consumer during working days and holidays. The load profiles are highly similar but slightly shifted. In a CNN, the filter weights are uniform for different regions; thus, the features calculated in the convolutional layer are invariant to small shifts, which means that relatively stable features can be obtained from varying load profiles. As described in [27], a deep CNN first automatically extracts features from massive load profiles, and a support vector machine (SVM) then identifies the characteristics of the consumers. To further investigate the load series, we plot the four seasons' load profiles for a customer. It can be seen from Figure 5 that the electricity consumption of the consumer changes with the seasons.

In summary, the load consumption differs in both magnitude and time of use and is dependent on lifestyle, seasons, weather, and many other uncontrollable factors. Therefore, consumer load profiles are affected not only by weather conditions but also by the types of consumers and other factors.

Based on the above analysis, extracting features from smart meter data based on experience is difficult. However, feature extraction is a key factor in the success of the detection system. Conventionally, manual feature extraction requires elaborately designed features for a specific problem, which makes it hard to adapt to other domains. In the CNN, the successive alternating convolutional and pooling layers are designed to learn progressively higher-level features (e.g., trend indicators, sequence standard deviation, and linear slope) from the 2D historical electricity consumption data. In addition, highly nonlinear correlations exist between electricity consumption and these influencing factors. Since activation functions are applied on the convolutional and fully connected layers, the CNN is able to model highly nonlinear correlations. In this paper, the activation function named "rectified linear unit" (ReLU) is used in the proposed CNN-RF model because of its sparsity and its mitigation of the vanishing gradient problem.

3.2. Data Preprocess. $x_i^t$ and $x_i^{\prime t}$ are defined as the energy consumption of an honest or a malicious customer $i$ at time interval $t$, respectively. In this section, the main procedures utilized for preprocessing the raw electricity data are described as follows.

[Figure 1: Flow of electricity theft detection. The original smart meter dataset passes through data analysis (nonresidential and residential users, working days and holidays, the change of seasons) and data preprocessing (data cleaning, missing value imputation, data normalization); the cross-validation algorithm generates train and test datasets, and the SMOTE algorithm balances the unbalanced train dataset; the CNN-RF model (CNN feature extractor and RF classifier) is trained; model evaluation uses the confusion matrix and ROC curves.]


(1) Data cleaning: there are erroneous values (e.g., outliers) in the raw data, which correspond to peak electricity consumption activities during holidays or special occasions such as birthdays and celebrations. In this paper, the "three-sigma rule of thumb" [28] is used to restore the outliers according to the following formula:

[Figure 2: Load profiles for nonresidential and residential users. Four panels (Residential user 1, Residential user 2, Nonresidential user 1, Nonresidential user 2) plot usage (kW) against time of day (h).]

[Figure 3: Daily load profiles for a customer on working days (Monday to Friday); usage (kW·h) against time of day (h).]

[Figure 4: Daily load profiles for a customer during holidays (Saturday, Sunday, New Year, Halloween, Christmas); usage (kW·h) against time of day (h).]


$$
F\left(x_i^t\right)=
\begin{cases}
\operatorname{avg}\left(x_i^t\right)+2\sigma\left(x_i^t\right), & x_i^t > x_i^{t\ast},\\
x_i^t, & \text{else},
\end{cases}
\tag{1}
$$
where $x_i^{t\ast}$ is computed from the mean $\operatorname{avg}(\cdot)$ and standard deviation $\sigma(\cdot)$ for each time interval comprising the weekday-time pair for each month.

(2) Missing value imputation: because of various reasons such as storage issues and failures of smart meters, there are missing values in the electricity consumption data. Through the analysis of the original data, it is found that there are two kinds of missing data: one is the continuous absence of multiple data points, and the solution is to delete the users whose missing values exceed 10%; the other is a single missing data point, which is processed by formula (2). Thus, missing values can be recovered as follows:

$$
F\left(x_i^t\right)=
\begin{cases}
\dfrac{x_i^{t-1}+x_i^{t+1}}{2}, & x_i^t \in \mathrm{NaN},\\
x_i^t, & \text{else},
\end{cases}
\tag{2}
$$
where $x_i^t$ stands for the electricity usage of consumer $i$ over a period (e.g., an hour); if $x_i^t$ is null, we represent it as NaN.

(3) Data normalization: finally, the data need to be normalized because the neural network is sensitive to diverse data scales. One of the common methods for this purpose is min-max normalization, which is computed as

$$
F\left(x_i^t\right)=\frac{x_i^t-\min\left(x_i^T\right)}{\max\left(x_i^T\right)-\min\left(x_i^T\right)},
\tag{3}
$$
where $\min(\cdot)$ and $\max(\cdot)$ represent the minimum and maximum values over a day, respectively.
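As a minimal illustration of these three preprocessing steps (not the authors' exact implementation), the following Python/NumPy sketch assumes one customer's hourly readings are available as arrays; the helper names and the simplified per-interval statistics passed to the cleaning step are assumptions:

import numpy as np
import pandas as pd

def restore_outliers(x, avg, std):
    # Equation (1): readings above the threshold are pulled back to avg + 2*sigma.
    # In the paper, avg and std are computed per weekday/time-interval pair for each month.
    threshold = avg + 2 * std
    return np.where(x > threshold, threshold, x)

def impute_single_missing(x):
    # Equation (2): an isolated NaN is replaced by the mean of its two neighbours.
    s = pd.Series(x, dtype=float)
    neighbour_mean = (s.shift(1) + s.shift(-1)) / 2
    return s.where(s.notna(), neighbour_mean).to_numpy()

def normalize_per_day(day):
    # Equation (3): min-max normalization over the readings of one day.
    lo, hi = day.min(), day.max()
    return (day - lo) / (hi - lo) if hi > lo else np.zeros_like(day)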

4. Generation of Train and Test Datasets

After data processing, the SEAI dataset contains the electricity consumption data of 4737 normal electricity customers and 1200 malicious electricity customers over 525 days (with 24 hourly values per day).

Our preliminary analytical results (refer to Section 3.1) reveal that consumers' electricity usage profiles are affected not only by weather conditions but also by the types of consumers and other factors. However, it is difficult to extract features based on experience from the 1D electricity consumption data, since the electricity consumption of each day fluctuates in a relatively independent way. Motivated by the previous work in [29], we design a deep CNN to process the electricity consumption data in a 2D manner. In particular, we transform the 1D electricity consumption data into 2D data according to days. We define the 2D data as a matrix of actual energy consumption values for a specific customer, where the rows represent days D (D = 1, ..., 525) and the columns represent time periods T (T = 1, ..., 24). It is worth mentioning that the CNN learns the features between different hours of the day and different days from the 2D data.
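For instance, assuming one customer's hourly readings are already stored as a flat array of length 525 × 24 (the file name below is hypothetical), the 2D representation is a simple reshape:

import numpy as np

hourly = np.loadtxt("customer_0001.csv", delimiter=",")  # hypothetical file with 525 * 24 hourly readings
consumption_2d = hourly.reshape(525, 24)  # rows: days (D = 1, ..., 525); columns: hours (T = 1, ..., 24)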

Then, we divide the dataset into train and test sets using the cross-validation method, in which 80% form the train set and 20% form the test set. Given that nonfraudulent consumers remarkably outnumber electricity theft ones, the imbalanced nature of the dataset can have a major negative impact on the performance of supervised machine learning methods. To reduce this bias, the SMOTE algorithm is used to make the numbers of normal and abnormal samples equal in the train set. Finally, the train dataset contains 7484 customers, in which the numbers of normal and abnormal samples are equal, and the test dataset consists of 1669 customers.
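This split-and-balance step can be sketched as follows, assuming the 2D consumption matrices and labels are already loaded as X and y and that the imbalanced-learn implementation of SMOTE is used; the stratified split and the random seed are assumptions:

from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# X: array of shape (n_customers, 525, 24); y: 1 = normal, 0 = electricity theft (cf. Table 3)
X_flat = X.reshape(len(X), -1)
X_train, X_test, y_train, y_test = train_test_split(
    X_flat, y, test_size=0.2, stratify=y, random_state=42)

# Oversample the minority (theft) class on the training set only
X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)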

5. The Novel CNN-RF Algorithm

In this section, the design of the CNN-RF structure is presented in detail, and some techniques are proposed to reduce overfitting. Moreover, the relevant parameters, including the number of filters and the size of each feature map and kernel, are given. Finally, the training process of the proposed CNN-RF algorithm is introduced.

5.1. CNN-RF Architecture. The proposed CNN-RF model is designed by integrating CNN and RF classifiers. As shown in Figure 6, the architecture of the CNN-RF mainly includes an automatic feature extractor and a trainable RF classifier. The feature extractor (CNN) consists of convolutional layers, downsampling layers, and a fully connected layer. The CNN is a deep supervised learning architecture, which usually includes multiple layers and can be trained with the backpropagation algorithm.

[Figure 5: Four seasons' load profiles for a customer (spring, summer, autumn, winter); daily usage (kW·day) over 100 days.]


It can also explore the intricate distribution in smart meter data by performing the stochastic gradient method. The RF classifier consists of a combination of tree classifiers, in which each tree contributes a single vote to the assignment of the most frequent class for the input data [30]. In the following, the design of each part of the CNN-RF structure is presented in detail.

5.1.1. Convolutional Layer. The main purpose of convolutional layers is to learn feature representations of the input data and reduce the influence of noise. The convolutional layer is composed of several feature filters, which are used to compute different feature maps. Specially, to reduce the number of network parameters and lower the complexity of the relevant layers during the generation of feature maps, the weights of each kernel in each feature map are shared. Moreover, each neuron in the convolutional layer connects to a local region of the preceding layer, and then a ReLU activation function is applied to the convolved results. Mathematically, the outputs of the convolutional layers can be expressed as

$$
y_{\mathrm{conv}}\left(X_l^{f_l}\right)=\delta\left(\sum_{f_l=1}^{F_l} W_l^{f_l}\ast X_l^{f_l}+b_l^{f_l}\right),
\tag{4}
$$
where $\delta$ is the activation function, $\ast$ is the convolution operation, and both $W_l^{f_l}$ and $b_l^{f_l}$ are the learnable parameters of the $f$-th feature filter.

5.1.2. Downsampling Layer. Downsampling is an important concept of the CNN, usually placed between two convolutional layers. It can reduce the number of parameters and achieve dimensionality reduction. In particular, each feature map of the pooling layer is connected to its corresponding feature map of the previous convolutional layer, and the size of the output feature maps is reduced without reducing their number in the downsampling layer. In the literature, downsampling operations mainly include max-pooling and average-pooling. The max-pooling operation transforms small windows into single values by taking the maximum, as expressed in formula (5), while average-pooling returns the average of the activations in a small window. In [31], experiments show that the max-pooling operation performs better than average-pooling. Therefore, the max-pooling operation is usually implemented in the downsampling layer:

$$
y_{\mathrm{pool}}\left(X_l^{f_l}\right)=\max_{m\in M}\left\{X_l^{m}\right\},
\tag{5}
$$
where $M$ is the set of activations in the pooling window and $m$ is the index of an activation in the pooling window.

5.1.3. Fully Connected Layer. With the features extracted by sequential convolution and pooling, a fully connected layer is applied to flatten the feature maps into one vector as follows:

$$
y_{\mathrm{fl}}\left(X_l\right)=\delta\left(W_l\cdot X_l+b_l\right),
\tag{6}
$$
where $W_l$ is the weight of the $l$-th layer and $b_l$ is the bias of the $l$-th layer.

5.1.4. RF Classifier Layer. Traditionally, the softmax classifier is used as the last output layer in the CNN, but the proposed model uses the RF classifier to predict the class based on the obtained features. Therefore, the RF classifier layer can be defined as

$$
y_{\mathrm{out}}\left(X_L\right)=\operatorname{sigm}\left(W_{\mathrm{rf}}\cdot X_l+b_l\right),
\tag{7}
$$
where $\operatorname{sigm}(\cdot)$ is the sigmoid function, which maps abnormal values to 0 and normal values to 1. The parameter set $W_{\mathrm{rf}}$ in the RF layer includes the number of decision trees and the maximum depth of each tree, which are obtained by the grid search algorithm.

5.2. Techniques for Selecting Parameters and Avoiding Overfitting. The detailed parameters of the proposed CNN-RF structure are summarized in Table 1, including the number of filters in each layer, the filter size, and the stride. Two factors are considered in determining the CNN structure. The first factor is the characteristics of consumers' electricity consumption behaviours: since the load profiles are highly variable, two convolutional layers with a filter size of 3 × 3 (stride 1) are applied to capture hidden features for detecting electricity theft. Moreover, because the dimension of the input data is 525 × 24, which is similar to what is used in image recognition problems, two max-pooling layers with a filter size of 2 × 2 (stride 2) are used. The second factor is the number of training samples: since the number of samples is limited, to reduce the risk of overfitting, we design a fully connected layer with a dropout rate of 0.4.

[Figure 6: Architecture of the hybrid CNN-RF model: 525 × 24 input data, two convolutional layers each followed by a downsampling layer, a fully connected layer, and a random forest layer.]


The main idea of dropout is that randomly dropping units from the neural network during training can prevent units from coadapting too much [32] and makes a neuron not rely on the presence of other specific neurons. Applying an appropriate training method is also useful for reducing overfitting; it is essentially a regularization that adds a penalty to the weight update at each iteration. We also use binary cross entropy as the loss function. Finally, the grid search algorithm is used to optimize RF classifier parameters such as the maximum number of decision trees and features.
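A sketch of the feature-extractor CNN with these settings, written with the Keras API of TensorFlow, is given below; the padding mode, the width of the fully connected layer (40 units, matching the feature vector later handed to the RF in Section 6.2), and the sigmoid head used only while pre-training the CNN are assumptions rather than details stated by the authors:

import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(525, 24, 1), n_features=40):
    # Two 3 x 3 convolutions (32 and 64 filters, stride 1, ReLU), each followed by
    # 2 x 2 max-pooling (stride 2), then a fully connected layer with dropout 0.4 (Table 1).
    return models.Sequential([
        layers.Conv2D(32, (3, 3), strides=1, padding="same", activation="relu",
                      input_shape=input_shape),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Conv2D(64, (3, 3), strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Flatten(),
        layers.Dense(n_features, activation="relu"),
        layers.Dropout(0.4),
        layers.Dense(1, activation="sigmoid"),  # head used only to pre-train the CNN
    ])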

5.3. Training Process of CNN-RF. The CNN-RF structure first needs to train the CNN before the RF classifier is invoked. After that, the RF classifier is trained based on the features learned by the CNN.

5.3.1. Training Process of CNN. With a batch size (the number of training samples in an iteration) of 100, the CNN is trained using the forward propagation and backpropagation algorithms. As shown in Figure 6, the data in the input layer are first transferred through the convolutional layers, the pooling layers, and a fully connected layer, and the predicted value is obtained. As shown in Figure 7, the backpropagation phase then updates the parameters if the difference between the output value and the target value is too large and exceeds a certain threshold. The difference between the actual output of the neural network and the expected output determines the adjustment direction and step size of the weights among layers in the network.

Given a sample set $D = \{(x_1, y_1), \ldots, (x_m, y_m)\}$ with $m$ samples in total, $\tilde{y}$ is first obtained by the feedforward process. Over all samples, the mean difference between the actual output $y_i$ and the expected output $\tilde{y}_{w,b}(x_i)$ can be expressed by

$$
J(W,b)=\frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\|\tilde{y}_{w,b}\left(x_i\right)-y_i\right\|^{2},
\tag{8}
$$
where $w$ represents the connection weights between layers of the network and $b$ is the corresponding bias.

Then, each parameter is initialized with a random value generated from a normal distribution with mean 0 and variance $\vartheta$ and is updated with the gradient descent method. The corresponding expressions are as follows:

$$
W_{ij}^{(l)}=W_{ij}^{(l)}-\alpha\frac{\partial}{\partial W_{ij}^{(l)}}J(W,b),
\tag{9}
$$
$$
b_{i}^{(l)}=b_{i}^{(l)}-\alpha\frac{\partial}{\partial b_{i}^{(l)}}J(W,b),
\tag{10}
$$

where $\alpha$ represents the learning rate, $W_{ij}^{(l)}$ represents the connection weight between the $i$-th neuron in the $l$-th layer and the $j$-th neuron in the $(l+1)$-th layer, and $b_i^{(l)}$ represents the bias of the $i$-th neuron in the $l$-th layer.

Finally, all parameters $W$ and $b$ in the network are updated according to formulas (9) and (10), and all these steps are repeated to reduce the objective function $J(W,b)$.
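The pre-training loop of the CNN can then be sketched as follows; the batch size of 100, the binary cross-entropy loss, and the 300 epochs are taken from Sections 5.2, 5.3.1, and 6.2, whereas the optimizer choice and learning rate are assumptions:

model = build_cnn()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # plain gradient descent; rate assumed
              loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X_train_bal.reshape(-1, 525, 24, 1), y_train_bal,
                    batch_size=100, epochs=300,
                    validation_data=(X_test.reshape(-1, 525, 24, 1), y_test))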

5.3.2. Training Process of RF Classifier. The RF classifier takes advantage of two powerful machine learning techniques: bagging and random feature selection. Bagging is a method to randomly generate a particular bootstrap sample from the learnt features for growing each tree. Instead of using all features, the RF randomly selects a subset of features to determine the best splitting points, which ensures complete splitting to one classification at the leaf nodes of the decision trees. Specifically, the whole process of RF classification can be written as Algorithm 1.

6. Experiments and Result Analysis

All the experiments are implemented using Python 3.6 on a standard PC with an Intel Core i5-7500MQ CPU running at 3.40 GHz and with 8.0 GB of RAM. The CNN architecture is constructed based on TensorFlow [33], and the interface between the CNN and the RF is programmed using scikit-learn [34].

6.1. Performance Metrics. In this paper, the problem of electricity theft detection is considered a discrete two-class classification task, which assigns each consumer to one of the predefined classes (abnormal or normal).

In particular, the result of classifier validation is often presented as a confusion matrix. Table 2 shows a confusion matrix applied in electricity theft detection, where TP, FN, FP, and TN, respectively, represent the numbers of consumers that are classified correctly as normal, classified falsely as abnormal, classified falsely as normal, and classified correctly as abnormal. Although these four indices show all of the information about the performance of the classifier, more meaningful performance measures can be extracted from them to illustrate certain performance criteria, such as TPR = TP/(TP + FN), FPR = FP/(FP + TN), precision = TP/(TP + FP), recall = TP/(TP + FN), and F1 score = 2(precision × recall)/(precision + recall), where the true positive rate (TPR) is the ability of the classifier to correctly identify a consumer as normal, also referred to as sensitivity, and the false positive rate (FPR) is the risk of positively identifying an abnormal consumer. Precision is the ratio of consumers detected correctly to the total number of detected normal consumers. Recall is the ratio of the number of consumers correctly detected to the actual number of normal consumers.

In addition, the ROC curve is introduced by plotting all TPR values on the y-axis against their equivalent FPR values for all available thresholds on the x-axis. A ROC curve closer to the upper left corner indicates better detection performance, where lower FPR values are obtained at the same TPR. The AUC indicates the quality of the classifier, and a larger AUC represents better performance.
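These measures can be computed directly from the test-set predictions, for example with scikit-learn; y_true and y_score below are assumed to hold the ground-truth labels and the predicted probabilities of the normal class:

import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, roc_curve, auc

y_pred = (np.asarray(y_score) >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()  # labels: 0 = abnormal, 1 = normal
tpr = tp / (tp + fn)   # true positive rate (sensitivity)
fpr = fp / (fp + tn)   # false positive rate
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
fpr_curve, tpr_curve, _ = roc_curve(y_true, y_score)
print("AUC =", auc(fpr_curve, tpr_curve))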

Table 1: Parameters of the proposed CNN-RF model.

Layer          Size of the filter    Number of filters    Stride
Convolution1   3 × 3                 32                   1
MaxPooling1    2 × 2                 —                    2
Convolution2   3 × 3                 64                   1
MaxPooling2    2 × 2                 —                    2

6.2. Experiment Based on CNN-RF Model. The experiment is conducted based on the normalized consumption data. With a batch size of 100, the training procedure is stopped after 300 epochs. As shown in Figure 8, the training and testing losses settle down smoothly. This means that the proposed model learned the train dataset and had a small bias on the test dataset as well. Then, the RF classifier is used as the last layer of the CNN to predict the labels of input consumers. Forty values from the fully connected layer of the trained CNN are used as a new feature vector to detect abnormal users and are fed to the RF for learning and testing. To build the RF in the hybrid model, the optimal parameters are determined by applying the grid search algorithm on the training dataset. The grid searching range of each parameter is given as follows: numTrees ∈ [50, 60, ..., 100] and maxDepth ∈ [10, 20, ..., 40], so 6 × 4 = 24 combinations are tried.

Algorithm 1: Random forest algorithm.

Require:
(1) Original training dataset S = {(x_i, y_i), i = 1, 2, 3, ..., m} = (X, Y) ∈ R^e × R
(2) Test samples x_j ∈ R^m
for d = 1 to Ntree do
    (1) Draw a bootstrap sample S_d from the original training data
    (2) Grow an unpruned tree h_d using data S_d:
        (a) Randomly select a new feature set M_try from the original feature set e
        (b) Select the best feature from M_try based on the Gini indicator at each node
        (c) Split until each tree grows to its maximum
end for
Ensure:
(1) The collection of trees {h_d, d = 1, 2, ..., Ntree}
(2) For x_j, the output of a decision tree is h_d(x_j)
f(x_j) = majority vote {h_d(x_j)}_{d=1}^{Ntree}
return f(x_j)
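In practice, Algorithm 1 amounts to fitting a standard random forest on the CNN features; a sketch using scikit-learn is given below, where the index of the fully connected layer in the trained Keras model is an assumption and the tree count and depth are the grid-search optima reported in Section 6.2:

import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

# Truncate the trained CNN at the fully connected layer to obtain the feature extractor
feature_extractor = tf.keras.Model(inputs=model.input, outputs=model.layers[-3].output)
train_features = feature_extractor.predict(X_train_bal.reshape(-1, 525, 24, 1))

rf = RandomForestClassifier(n_estimators=100, max_depth=30, criterion="gini",
                            bootstrap=True, random_state=42)
rf.fit(train_features, y_train_bal)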

[Figure 7: Process of forward propagation (a) and backpropagation (b).]

[Figure 8: Loss curves of training and testing over 300 epochs.]

Table 2: Confusion matrix applied in electricity theft detection.

Actual \ Detected    Normal                 Abnormal
Normal               TP (true positive)     FN (false negative)
Abnormal             FP (false positive)    TN (true negative)


The best result is achieved when numTrees = 100 and maxDepth = 30. These parameters are then used to train the hybrid model.
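A sketch of this tuning step with scikit-learn's GridSearchCV is shown below; the intermediate grid values, the 5-fold cross-validation, and the AUC scoring are assumptions consistent with the stated 6 × 4 = 24 combinations:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

param_grid = {
    "n_estimators": [50, 60, 70, 80, 90, 100],  # numTrees
    "max_depth": [10, 20, 30, 40],              # maxDepth
}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid,
                      cv=5, scoring="roc_auc")
search.fit(train_features, y_train_bal)
print(search.best_params_)  # the paper reports numTrees = 100 and maxDepth = 30 as best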

The classification results of the detection model proposed in this paper are as follows: the numbers of TPs, FNs, FPs, and TNs are 1041, 28, 23, and 577, respectively. Therefore, the precision, recall, and F1 score can be calculated as shown in Table 3, where Class 0 is the abnormal pattern and Class 1 is the normal pattern. Also, the AUC value of the CNN-RF algorithm is calculated, which is 0.98 and far better than the value of the baseline model (AUC = 0.5). This means the proposed algorithm is able to classify both classes accurately, as shown in Figure 9.
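As a quick worked check on these counts (an added example, not from the original), treating the normal class as positive: precision = TP/(TP + FP) = 1041/(1041 + 23) ≈ 0.98 and recall = TP/(TP + FN) = 1041/(1041 + 28) ≈ 0.97, close to the Class 1 values reported in Table 3.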

6.3. Comparative Experiments and the Analysis of Results. To evaluate the accuracy of the CNN-RF model, comparative experiments are conducted based on handcrafted features with non-deep-learning methods, including SVM, RF, gradient boosting decision tree (GBDT), and logistic regression (LR). Moreover, we compared the classification results obtained by various supervised classifiers, namely, CNN features with the SVM classifier (CNN-SVM) and CNN features with the GBDT classifier (CNN-GBDT), to those obtained by the previous classification work. In the following, the five methods are introduced, and then the results are analysed.

(1) Logistic regression: it is a basic model for binary classification that is equivalent to a one-layer neural network with a sigmoid activation function. A sigmoid function obtains a value ranging from 0 to 1 based on linear regression. Then, any value larger than 0.5 is classified as a normal pattern, and any value less than 0.5 is classified as an abnormal pattern.

(2) Support vector machine: this classifier with kernel functions can transform a nonlinearly separable problem into a linearly separable problem by projecting the data into the feature space and then finding the optimal separating hyperplane. It is applied to predict the patterns of consumers based on the aforementioned handcrafted features.

(3) Random forest: it is essentially an integration of multiple decision trees that can achieve better performance while maintaining effective control of overfitting in comparison with a single decision tree. The RF classifier can also handle high-dimensional data while maintaining high computational efficiency.

(4) Gradient boosting decision tree: it is an iterative decision tree algorithm, which consists of multiple decision trees. For the final output, all results or weights are accumulated in the GBDT, while random forests use majority voting.

(5) Deep learning methods: the plain CNN with a softmax output layer, as well as CNN-GBDT and CNN-SVM, are used for comparison with the proposed method.

Table 4 summarizes the parameters of the comparative methods. After all experiments were completed, Figure 10 shows the ROC results of the different methods and the proposed hybrid CNN-RF method: the AUC values of CNN-RF, CNN-GBDT, CNN-SVM, CNN, SVM, RF, LR, and GBDT are 0.99, 0.97, 0.98, 0.93, 0.77, 0.91, 0.63, and 0.77, respectively. Figure 11 presents the results of all comparative experiments in terms of precision, recall, and F1 score. Among the eight different electricity theft detection algorithms, the deep learning methods (CNN, CNN-RF, CNN-SVM, and CNN-GBDT) show better performance than the machine learning methods (e.g., LR, SVM, GBDT, and RF). The reason is that the CNN can learn features from a large amount of electricity consumption data; thus, the accuracy of electricity theft detection is greatly improved. In addition, the proposed CNN-RF model shows the best performance compared with the other methods.

Table 3: Classification score summary for CNN-RF.

                 Precision    Recall    F1 score
Class 0          0.97         0.96      0.96
Class 1          0.98         0.98      0.98
Average/total    0.97         0.97      0.97

[Figure 9: ROC curve for CNN-RF; AUC (CNN-RF) = 0.99.]

Table 4: Experimental parameters for comparative methods.

Methods    Input data       Parameters
LR         Features (1D)    Penalty: L2; inverse of regularization strength: 1.0
SVM        Features (1D)    Penalty parameter of the error term: 10^5; parameter of the kernel function (RBF): 0.0005
GBDT       Features (1D)    Number of estimators: 200
RF         Features (1D)    Number of trees in the forest: 100; max depth of each tree: 30
CNN        Raw data (2D)    The same as the CNN component of the proposed approach


As detecting electricity theft consumers is achieved by discovering the anomalous consumption behaviours of suspicious customers, we can essentially also regard it as an anomaly detection problem. Since CNN-RF has shown superior performance on the SEAI dataset, it would be interesting to test its generalization ability on other datasets. In particular, we have further tested the performance of CNN-RF, together with the above-mentioned models, on another dataset, the Low Carbon London (LCL) dataset [35]. The LCL dataset is composed of over 5500 honest electricity consumers in total, each of which consists of 525 days of readings (the sampling rate is one reading per hour). Among the 5500 customers, we randomly chose 1200 as malicious customers and modified their intraday load profiles according to fraudulent sample generation methods similar to those used for the SEAI dataset.

After the preprocessing of the LCL dataset (similar to that of the SEAI dataset), the train dataset contains 7584 customers, in which the numbers of normal and abnormal samples are equal, and the test dataset consists of 1700 customers together with corresponding labels indicating anomaly or not. Then, the training of the proposed CNN-RF model and the comparative models is all done following the aforementioned procedure based on the train dataset.

[Figure 11: Comparison with other algorithms based on the SEAI dataset: precision, recall, and F1 score for CNN, CNN-RF, CNN-GBDT, CNN-SVM, GBDT, LR, RF, and SVM.]

[Figure 10: ROC curves for eight different algorithms based on the SEAI dataset: AUC (CNN) = 0.93, AUC (CNN-RF) = 0.99, AUC (CNN-GBDT) = 0.97, AUC (CNN-SVM) = 0.98, AUC (GBDT) = 0.77, AUC (LR) = 0.63, AUC (RF) = 0.91, AUC (SVM) = 0.77.]

[Figure 12: ROC curves for eight different algorithms based on the LCL dataset: AUC (CNN) = 0.95, AUC (CNN-RF) = 0.97, AUC (CNN-GBDT) = 0.94, AUC (CNN-SVM) = 0.96, AUC (GBDT) = 0.74, AUC (LR) = 0.68, AUC (RF) = 0.77, AUC (SVM) = 0.71.]

[Figure 13: Comparison with other algorithms based on the LCL dataset: precision, recall, and F1 score for CNN, CNN-RF, CNN-GBDT, CNN-SVM, GBDT, SVM, LR, and RF.]


The performances of all the models on the test dataset are shown in Figures 12 and 13.

Figures 12 and 13 show that the performances of CNN-SVM, CNN-GBDT, and CNN-RF are better than those of the other models on the whole. Since they are deep learning models, distinguishable features can be learned directly from raw smart meter data by the CNN, which equips them with better generalization ability. In addition, the CNN-RF model achieves the best performance, since the RF can handle high-dimensional data while maintaining high computational efficiency.

7. Conclusions

In this paper, a novel CNN-RF model is presented to detect electricity theft. In this model, the CNN acts as an automatic feature extractor investigating smart meter data, and the RF is the output classifier. Because a large number of parameters must be optimized, which increases the risk of overfitting, a fully connected layer with a dropout rate of 0.4 is designed during the training phase. In addition, the SMOTE algorithm is adopted to overcome the problem of data imbalance. Some machine learning and deep learning methods such as SVM, RF, GBDT, and LR are applied to the same problem as benchmarks, and all these methods have been evaluated on the SEAI and LCL datasets. The results indicate that the proposed CNN-RF model is quite a promising classification method in the electricity theft detection field because of two properties. The first is that features can be automatically extracted by the hybrid model, while the success of most other traditional classifiers relies largely on the retrieval of good hand-designed features, which is a laborious and time-consuming task. The second is that the hybrid model combines the advantages of the RF and the CNN, both of which are among the most popular and successful classifiers in the electricity theft detection field.

Since the detection of electricity theft affects the privacy of consumers, future work will focus on investigating how the granularity and duration of smart meter data might affect this privacy. Extending the proposed hybrid CNN-RF model to other applications (e.g., load forecasting) is also a task worth investigating.

Data Availability

Previously reported [SEAI] and [LCL] datasets were used to support this study and are available at [http://www.ucd.ie/issda/data] and [https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=7857]. These prior studies (and datasets) are cited at relevant places within the text as references [19, 26].

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (2016YFB0901900), the Natural Science Foundation of Hebei Province of China (F2017501107), the Open Research Fund from the State Key Laboratory of Rolling and Automation, Northeastern University (2017RALKFKT003), the Foundation of Northeastern University at Qinhuangdao (XNB201803), and the Fundamental Research Funds for the Central Universities (N182303037).

References

[1] S. S. S. R. Depuru, L. Wang, and V. Devabhaktuni, "Electricity theft: overview, issues, prevention and a smart meter based approach to control theft," Energy Policy, vol. 39, no. 2, pp. 1007–1015, 2011.

[2] J. P. Navani, N. K. Sharma, and S. Sapra, "Technical and non-technical losses in power system and its economic consequence in Indian economy," International Journal of Electronics and Computer Science Engineering, vol. 1, no. 2, pp. 757–761, 2012.

[3] S. McLaughlin, B. Holbert, A. Fawaz, R. Berthier, and S. Zonouz, "A multi-sensor energy theft detection framework for advanced metering infrastructures," IEEE Journal on Selected Areas in Communications, vol. 31, no. 7, pp. 1319–1330, 2013.

[4] P. McDaniel and S. McLaughlin, "Security and privacy challenges in the smart grid," IEEE Security & Privacy Magazine, vol. 7, no. 3, pp. 75–77, 2009.

[5] T. B. Smith, "Electricity theft: a comparative analysis," Energy Policy, vol. 32, no. 1, pp. 2067–2076, 2004.

[6] J. I. Guerrero, C. Leon, I. Monedero, F. Biscarri, and J. Biscarri, "Improving knowledge-based systems with statistical techniques, text mining, and neural networks for non-technical loss detection," Knowledge-Based Systems, vol. 71, no. 4, pp. 376–388, 2014.

[7] C. C. O. Ramos, A. N. Souza, G. Chiachia, A. X. Falcão, and J. P. Papa, "A novel algorithm for feature selection using harmony search and its application for non-technical losses detection," Computers & Electrical Engineering, vol. 37, no. 6, pp. 886–894, 2011.

[8] P. Glauner, J. A. Meira, P. Valtchev, R. State, and F. Bettinger, "The challenge of non-technical loss detection using artificial intelligence: a survey," International Journal of Computational Intelligence Systems, vol. 10, no. 1, pp. 760–775, 2017.

[9] S.-C. Huang, Y.-L. Lo, and C.-N. Lu, "Non-technical loss detection using state estimation and analysis of variance," IEEE Transactions on Power Systems, vol. 28, no. 3, pp. 2959–2966, 2013.

[10] O. Rahmati, H. R. Pourghasemi, and A. M. Melesse, "Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: a case study at Mehran region, Iran," CATENA, vol. 137, pp. 360–372, 2016.

[11] N. Edison, A. C. Aranha, and J. Coelho, "Probabilistic methodology for technical and non-technical losses estimation in distribution system," Electric Power Systems Research, vol. 97, no. 11, pp. 93–99, 2013.

[12] J. B. Leite and J. R. S. Mantovani, "Detecting and locating non-technical losses in modern distribution networks," IEEE Transactions on Smart Grid, vol. 9, no. 2, pp. 1023–1032, 2018.

[13] S. Amin, G. A. Schwartz, A. A. Cardenas, and S. S. Sastry, "Game-theoretic models of electricity theft detection in smart utility networks: providing new capabilities with advanced metering infrastructure," IEEE Control Systems Magazine, vol. 35, no. 1, pp. 66–81, 2015.

[14] A. H. Nizar, Z. Y. Dong, Y. Wang, and A. N. Souza, "Power utility nontechnical loss analysis with extreme learning machine method," IEEE Transactions on Power Systems, vol. 23, no. 3, pp. 946–955, 2008.

[15] J. Nagi, K. S. Yap, S. K. Tiong, S. K. Ahmed, and M. Mohamad, "Nontechnical loss detection for metered customers in power utility using support vector machines," IEEE Transactions on Power Delivery, vol. 25, no. 2, pp. 1162–1171, 2010.

[16] E. W. S. Angelos, O. R. Saavedra, O. A. C. Cortes, and A. N. de Souza, "Detection and identification of abnormalities in customer consumptions in power distribution systems," IEEE Transactions on Power Delivery, vol. 26, no. 4, pp. 2436–2442, 2011.

[17] K. A. P. Costa, L. A. M. Pereira, R. Y. M. Nakamura, C. R. Pereira, J. P. Papa, and A. Xavier Falcão, "A nature-inspired approach to speed up optimum-path forest clustering and its application to intrusion detection in computer networks," Information Sciences, vol. 294, pp. 95–108, 2015.

[18] R. R. Bhat, R. D. Trevizan, X. Li, and A. Bretas, "Identifying nontechnical power loss via spatial and temporal deep learning," in Proceedings of the IEEE International Conference on Machine Learning and Applications, Anaheim, CA, USA, December 2016.

[19] M. Ismail, M. Shahin, M. F. Shaaban, E. Serpedin, and K. Qaraqe, "Efficient detection of electricity theft cyber attacks in AMI networks," in Proceedings of the IEEE Wireless Communications and Networking Conference, Barcelona, Spain, April 2018.

[20] R. Mehrizi, X. Peng, X. Xu, S. Zhang, D. Metaxas, and K. Li, "A computer vision based method for 3D posture estimation of symmetrical lifting," Journal of Biomechanics, vol. 69, no. 1, pp. 40–46, 2018.

[21] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE Conference on Computer for Sustainable Global Development, pp. 1026–1034, New Delhi, India, March 2015.

[22] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: an astounding baseline for recognition," Astrophysical Journal Supplement Series, vol. 221, no. 1, pp. 806–813, 2014.

[23] Z. Zheng, Y. Yang, X. Niu, H.-N. Dai, and Y. Zhou, "Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1606–1615, 2018.

[24] D. Svozil, V. Kvasnicka, and J. Pospichal, "Introduction to multi-layer feed-forward neural networks," Chemometrics and Intelligent Laboratory Systems, vol. 39, no. 1, pp. 43–62, 1997.

[25] Irish Social Science Data Archive, http://www.ucd.ie/issda/data/commissionforenergyregulationcer/.

[26] P. Jokar, N. Arianpoo, and V. C. M. Leung, "Electricity theft detection in AMI using customers' consumption patterns," IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 216–226, 2016.

[27] Y. Wang, Q. Chen, D. Gan, J. Yang, D. S. Kirschen, and C. Kang, "Deep learning-based socio-demographic information identification from smart meter data," IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 2593–2602, 2019.

[28] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, 2009.

[29] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems—DLRS 2016, pp. 7–10, Boston, MA, USA, September 2016.

[30] E. Feczko, N. M. Balba, O. Miranda-Dominguez et al., "Subtyping cognitive profiles in autism spectrum disorder using a functional random forest algorithm," NeuroImage, vol. 172, pp. 674–688, 2018.

[31] Y.-L. Boureau, J. Ponce, and Y. LeCun, "A theoretical analysis of feature pooling in visual recognition," in Proceedings of the 27th International Conference on Machine Learning, pp. 111–118, Haifa, Israel, June 2010.

[32] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[33] M. Abadi, A. Agarwal, P. Barham et al., "TensorFlow: large-scale machine learning on heterogeneous distributed systems," 2016, https://arxiv.org/abs/1603.04467.

[34] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, no. 4, pp. 2825–2830, 2011.

[35] J. R. Schofield, "Low Carbon London project: data from the dynamic time-of-use electricity pricing trial," 2017, https://discover.ukdataservice.ac.uk/catalogue/?sn=7857.



Page 2: ElectricityTheftDetectioninPowerGridswithDeep ...downloads.hindawi.com/journals/jece/2019/4136874.pdf(1)Datacleaning:thereareerroneousvalues(e.g.,outliers) in raw data, which correspond

Besides these manual approaches cannot avoid cyber at-tacks In order to solve the problemsmentioned above manyapproaches have been put forward in the past years -esemethods are mainly categorized into state-based game-theory-based and artificial-intelligence-based models [8]

The key idea of state-based detection [9–11] is to rely on special devices such as wireless sensors and distribution transformers [12]. These methods can detect electricity theft but rely on the real-time acquisition of the system topology and additional physical measurements, which is sometimes unattainable. Game-based detection schemes formulate a game between the electricity utility and the thief, and then different distributions of normal and abnormal behaviours can be derived from the game equilibrium. As detailed in [13], they can achieve a low cost and a reasonable result for reducing energy theft; yet, formulating the utility function of all players (e.g., distributors, regulators, and thieves) is still a challenge. Artificial-intelligence-based methods include machine learning and deep learning methods. Existing machine learning solutions can be further categorized into classification and clustering models, as presented in [14–17]. Although the aforementioned machine learning detection methods are innovative and remarkable, their performances are still not satisfactory enough for practice. For example, most of these approaches require manual feature extraction, which partly results from their limited ability to handle high-dimensional data. Indeed, traditional hand-designed features include the mean, standard deviation, maximum, and minimum of consumption data. The process of manual feature extraction is a tedious and time-consuming task and cannot capture the 2D features of smart meter data.

Deep learning techniques for electricity theft detection are studied in [18], where the authors present a comparison between different deep learning architectures, such as convolutional neural networks (CNNs), long short-term memory (LSTM) recurrent neural networks (RNNs), and stacked autoencoders. However, the performance of the detectors is investigated using synthetic data, which does not allow a reliable assessment of the detectors' performance compared with shallow architectures. Moreover, the authors in [19] proposed a deep neural network- (DNN-) based customer-specific detector that can efficiently thwart such cyber attacks. In recent years, the CNN has been applied to generate useful and discriminative features from raw data and has wide applications in different areas [20–22]. These applications motivate the use of the CNN for feature extraction from high-resolution smart meter data in electricity theft detection. In [23], a wide and deep convolutional neural network (CNN) model was developed and applied to analyse electricity theft in smart grids.

In a plain CNN, the softmax classifier layer is the same as a general single hidden layer feedforward neural network (SLFN) and is trained through the backpropagation algorithm [24]. On the one hand, the SLFN is likely to be overtrained when it performs the backpropagation algorithm, leading to degradation of its generalization performance. On the other hand, the backpropagation algorithm is based on empirical risk minimization, which is sensitive to local minima of the training errors. As mentioned above, because of the shortcomings of the softmax classifier, the CNN is not always optimal for classification, although it has shown great advantages in the feature extraction process. Therefore, it is urgent to find a better classifier, one which not only owns an ability similar to the softmax classifier but also can make full use of the obtained features. Among most classifiers, the random forest (RF) classifier takes advantage of two powerful machine learning techniques, bagging and random feature selection, which can overcome the limitation of the softmax classifier. Inspired by these works, a novel convolutional neural network-random forest (CNN-RF) model is adopted for electricity theft detection. The CNN is proposed to automatically capture various features of customers' consumption behaviours from smart meter data, which is one of the key factors in the success of the electricity theft detection model. To improve detection performance, the RF is used to replace the softmax classifier, detecting the patterns of consumers based on the extracted features. This model has been trained and tested with real data from customers of electricity utilities in Ireland and London.

2. Overview Flow

The main aim of the methodology described in this paper is to provide utilities with a ranked list of their customers according to their probability of having an anomaly in their electricity meter.

As shown in Figure 1, the electricity theft detection system is divided into three main stages, as follows:

(i) Data analysis and preprocess: to explain the reason for applying a CNN for feature extraction, we first analyse the factors that affect the behaviours of electricity consumers. For the data preprocess, we consider several tasks such as data cleaning (resolving outliers), missing value imputation, and data transformation (normalization).

(ii) Generation of train and test datasets: to evaluate the performance of the methodology described in this paper, the preprocessed dataset is split into a train dataset and a test dataset by the cross-validation algorithm. The train dataset is used to train the parameters of our model, whilst the test dataset is used to assess how well the model generalizes to new, unseen customer samples. Given that nonfraudulent consumers remarkably outnumber electricity theft consumers, the imbalanced nature of the dataset can have a major negative impact on the performance of supervised machine learning methods. To reduce this bias, the synthetic minority oversampling technique (SMOTE) algorithm is used to make the numbers of electricity theft and nonfraudulent consumers equal in the train dataset.

(iii) Classification using the CNN-RF model: in the proposed CNN-RF model, the CNN is first designed to learn the features between different hours of the day and different days from massive and varying smart meter data by the operations of convolution and downsampling, and then the RF classifier is trained based on the obtained features to detect whether a consumer steals electricity. Finally, the confusion matrix and receiver operating characteristic (ROC) curves are used to evaluate the accuracy of the CNN-RF model on the test dataset.

3. Data Analysis and Preprocess

The dataset was collected by Electric Ireland and the Sustainable Energy Authority of Ireland (SEAI) in January 2012 [25] and consists of electricity usage data from over 5000 residential households and businesses. The smart meters recorded electricity consumption at a resolution of 30 min during 2009 and 2010. Customers who participated in the trial had a smart meter installed in their homes and agreed to take part in the research; therefore, it is a reasonable assumption that all samples belong to honest users. However, malicious samples cannot be obtained, since energy theft might never or rarely happen for a given customer. To address this issue, we generated 1200 malicious customers as described in [26]. The large number and variety of customers with a long period of measurements make this dataset an excellent source for research in the area of smart meter data analysis. For each customer i, there is a half-hourly meter reading over a 525-day period; we reduced the sampling rate to one reading per hour for each customer. This section first analyses the smart meter data to describe the rationale of applying a CNN for feature extraction. Then, it describes the three main procedures utilized for preprocessing the original energy consumption data: data cleaning, missing value imputation, and data normalization.

3.1. Data Analysis. To introduce the rationale for applying a CNN for feature extraction rather than other machine learning techniques, this section analyses the smart meter data. Figure 2 shows the daily load curves of four different users (residential and nonresidential); it can be seen that the daily electricity consumption of nonresidential users is significantly higher than that of residential users.

In addition, Figures 3 and 4 show the daily load profiles of a consumer during working days and holidays. The load profiles are highly similar but slightly shifted. In a CNN, the filter weights are uniform for different regions; thus, the features calculated in the convolutional layer are invariant to small shifts, which means that relatively stable features can be obtained from varying load profiles. As described in [27], a deep CNN first automatically extracts features from massive load profiles, and a support vector machine (SVM) then identifies the characteristics of the consumers. To further investigate the load series, we plot the four seasons' load profiles for a customer. It can be seen from Figure 5 that the electricity consumption of the consumer changes with the seasons.

In summary, the load consumption differs in both magnitude and time of use and is dependent on lifestyle, seasons, weather, and many other uncontrollable factors. Therefore, consumer load profiles are affected not only by weather conditions but also by the types of consumers and other factors.

Based on the above analysis, extracting features from smart meter data based on experience is difficult. However, feature extraction is a key factor in the success of the detection system. Conventionally, manual feature extraction requires elaborately designed features for a specific problem, which makes it hard to adapt to other domains. In the CNN, the successive alternating convolutional and pooling layers are designed to learn progressively higher-level features (e.g., trend indicators, sequence standard deviation, and linear slope) from 2D historical electricity consumption data. In addition, highly nonlinear correlations exist between electricity consumption and these influencing factors. Since activation functions are applied on the convolutional and fully connected layers, the CNN is able to model such highly nonlinear correlations. In this paper, the activation function named "rectified linear unit" (ReLU) is used in the proposed CNN-RF model because of its sparsity and its ability to mitigate the vanishing gradient problem.

3.2. Data Preprocess. $x_{it}$ and $x'_{it}$ are defined as the energy consumption of an honest and a malicious customer i at time interval t, respectively. In this section, the main procedures utilized for preprocessing the raw electricity data are described as follows.

Figure 1: Flow of electricity theft detection (data analysis and preprocess, generation of train and test datasets with cross-validation and SMOTE balancing, and classification with the CNN feature extractor and RF classifier, evaluated by the confusion matrix and ROC curves).

Figure 2: Load profiles for nonresidential and residential users (usage in kW over the hours of a day for two residential and two nonresidential users).

Figure 3: Daily load profiles for a customer on working days (Monday to Friday).

Figure 4: Daily load profiles for a customer during holidays (Saturday, Sunday, New Year, Halloween, and Christmas).

(1) Data cleaning: there are erroneous values (e.g., outliers) in the raw data, which correspond to peak electricity consumption activities during holidays or special occasions such as birthdays and celebrations. In this paper, the "three-sigma rule of thumb" [28] is used to restore the outliers according to the following formula:

$$
F\left(x_{it}\right) =
\begin{cases}
\operatorname{avg}\left(x_{it}\right) + 2\sigma\left(x_{it}\right), & x_{it} > x_{it}^{*}, \\
x_{it}, & \text{else},
\end{cases}
\tag{1}
$$

where $x_{it}^{*}$ is computed from the mean $\operatorname{avg}(\cdot)$ and standard deviation $\sigma(\cdot)$ for each time interval comprising the weekday/time pair for each month.

(2) Missing value imputation: because of various reasons, such as storage issues and failures of smart meters, there are missing values in the electricity consumption data. Through the analysis of the original data, two kinds of missing data are found: one is a continuous run of multiple missing values, and the solution is to delete a user when the number of missing values exceeds 10; the other is a single missing value, which is processed by formula (2). Thus, missing values can be recovered as follows:

$$
F\left(x_{it}\right) =
\begin{cases}
\dfrac{x_{i,t-1} + x_{i,t+1}}{2}, & x_{it} \in \text{NaN}, \\
x_{it}, & \text{else},
\end{cases}
\tag{2}
$$

where $x_{it}$ stands for the electricity usage of consumer i over a period (e.g., an hour); if $x_{it}$ is null, we represent it as NaN.

(3) Data normalization: finally, the data need to be normalized because the neural network is sensitive to diverse data. One of the common methods for this purpose is min-max normalization, which is computed as

$$
F\left(x_{it}\right) = \frac{x_{it} - \min\left(x_{iT}\right)}{\max\left(x_{iT}\right) - \min\left(x_{iT}\right)},
\tag{3}
$$

where $\min(\cdot)$ and $\max(\cdot)$ represent the minimum and maximum values over a day, respectively.
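As a minimal illustration of the three preprocessing steps above, the sketch below applies them to a single customer's day-by-hour load matrix with NumPy. The function name, the array layout, and the simplified per-hour grouping used for the three-sigma capping are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def preprocess_daily_loads(load):
    """Clean, impute, and normalize one customer's load matrix.

    load: 2D array of shape (days, hours); NaN marks a missing reading.
    Customers with many missing values are assumed to have been dropped already.
    Returns an array of the same shape, scaled to [0, 1] per day.
    """
    x = load.astype(float).copy()

    # (2) Single missing readings: replace by the mean of the two neighbours
    #     in the flattened hourly series, as in equation (2).
    flat = x.ravel()
    for i in np.flatnonzero(np.isnan(flat)):
        if 0 < i < flat.size - 1 and not np.isnan(flat[i - 1]) and not np.isnan(flat[i + 1]):
            flat[i] = 0.5 * (flat[i - 1] + flat[i + 1])
    x = flat.reshape(x.shape)

    # (1) Data cleaning: cap readings above mean + 2*sigma of their hour-of-day
    #     column, a simplified stand-in for the weekday/time/month grouping of (1).
    mean = np.nanmean(x, axis=0)
    std = np.nanstd(x, axis=0)
    x = np.minimum(x, mean + 2.0 * std)

    # (3) Min-max normalization per day, as in equation (3).
    day_min = x.min(axis=1, keepdims=True)
    day_rng = x.max(axis=1, keepdims=True) - day_min
    day_rng[day_rng == 0] = 1.0  # avoid division by zero on flat days
    return (x - day_min) / day_rng
```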

4. Generation of Train and Test Datasets

After data preprocessing, the SEAI dataset contains the electricity consumption data of 4737 normal electricity customers and 1200 malicious electricity customers over 525 days (with 24 hourly readings per day).

Our preliminary analytical results (refer to Section 3.1) reveal that consumer load profiles are affected not only by weather conditions but also by the types of consumers and other factors. However, it is difficult to extract features based on experience from the 1D electricity consumption data, since the electricity consumption of each day fluctuates in a relatively independent way. Motivated by previous work in [29], we design a deep CNN to process the electricity consumption data in a 2D manner. In particular, we transform the 1D electricity consumption data into 2D data according to days. We define the 2D data as a matrix of actual energy consumption values for a specific customer, where the rows represent days D (D = 1, ..., 525) and the columns represent time periods T (T = 1, ..., 24). It is worth mentioning that the CNN learns the features between different hours of the day and different days from the 2D data.

Then, we divide the dataset into train and test sets using the cross-validation method, in which 80% form the train set and 20% form the test set. Given that nonfraudulent consumers remarkably outnumber electricity theft consumers, the imbalanced nature of the dataset can have a major negative impact on the performance of supervised machine learning methods. To reduce this bias, the SMOTE algorithm is used to make the numbers of normal and abnormal samples equal in the train set. Finally, the train dataset contains 7484 customers, in which the numbers of normal and abnormal samples are equal, and the test dataset consists of 1669 customers.
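The following sketch outlines how the 1D-to-2D reshaping, the 80/20 split, and the SMOTE balancing of the train set could be wired together, assuming scikit-learn and the imbalanced-learn package; the function and variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE  # imbalanced-learn package

DAYS, HOURS = 525, 24

def build_datasets(hourly_series, labels, seed=0):
    """hourly_series: array of shape (n_customers, DAYS * HOURS);
    labels: 0 = abnormal (theft), 1 = normal."""
    # Reshape each customer's 1D series into a days-by-hours matrix for the CNN.
    X = hourly_series.reshape(-1, DAYS, HOURS, 1)

    # 80/20 split, stratified so both classes appear in the train and test sets.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.2, stratify=labels, random_state=seed)

    # SMOTE operates on flat feature vectors, so flatten, oversample the minority
    # class in the training set only, and reshape back to 2D matrices.
    n_tr = X_tr.shape[0]
    X_flat, y_bal = SMOTE(random_state=seed).fit_resample(
        X_tr.reshape(n_tr, -1), y_tr)
    X_bal = X_flat.reshape(-1, DAYS, HOURS, 1)
    return X_bal, y_bal, X_te, y_te
```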

5. The Novel CNN-RF Algorithm

In this section, the design of the CNN-RF structure is presented in detail, and some techniques are proposed to reduce overfitting. Moreover, correlative parameters, including the number of filters and the size of each feature map and kernel, are given. Finally, the training process of the proposed CNN-RF algorithm is introduced.

5.1. CNN-RF Architecture. The proposed CNN-RF model is designed by integrating the CNN and RF classifiers. As shown in Figure 6, the architecture of the CNN-RF mainly includes an automatic feature extractor and a trainable RF classifier. The feature extractor (the CNN) consists of convolutional layers, downsampling layers, and a fully connected layer. The CNN is a deep supervised learning architecture, which usually includes multiple layers and can be trained with the backpropagation algorithm. It can also explore the intricate distribution of smart meter data by performing the stochastic gradient method. The RF classifier consists of a combination of tree classifiers, in which each tree contributes a single vote to the assignment of the most frequent class in the input data [30]. In the following, the design of each part of the CNN-RF structure is presented in detail.

Figure 5: Four seasons' load profiles for a customer (daily usage over 100 days for spring, summer, autumn, and winter).

5.1.1. Convolutional Layer. The main purpose of convolutional layers is to learn feature representations of the input data and reduce the influence of noise. The convolutional layer is composed of several feature filters, which are used to compute different feature maps. Specially, to reduce the number of network parameters and lower the complexity of the relevant layers during the process of generating feature maps, the weights of each kernel in each feature map are shared. Moreover, each neuron in the convolutional layer connects to a local region of the previous layer, and then a ReLU activation function is applied to the convolved results. Mathematically, the output of a convolutional layer can be expressed as

$$
y_{\text{conv}}\left(X_{l}^{f_{l}}\right) = \delta\left(\sum_{f_{l}=1}^{F_{l}} W_{l}^{f_{l}} \ast X_{l}^{f_{l}} + b_{l}^{f_{l}}\right),
\tag{4}
$$

where $\delta$ is the activation function, $\ast$ is the convolution operation, and both $W_{l}^{f_{l}}$ and $b_{l}^{f_{l}}$ are the learnable parameters of the $f$-th feature filter.

5.1.2. Downsampling Layer. Downsampling is an important concept of the CNN, and the downsampling layer is usually placed between two convolutional layers. It can reduce the number of parameters and achieve dimensionality reduction. In particular, each feature map of the pooling layer is connected to its corresponding feature map of the previous convolutional layer, and the size of the output feature maps is reduced without reducing their number. In the literature, downsampling operations mainly include max-pooling and average-pooling. The max-pooling operation transforms small windows into single values by taking the maximum, as expressed in formula (5), whereas average-pooling returns the average value of the activations in a small window. In [31], experiments show that max-pooling performs better than average-pooling; therefore, the max-pooling operation is usually implemented in the downsampling layer:

$$
y_{\text{pool}}\left(X_{l}^{f_{l}}\right) = \max_{m \in M}\left\{X_{l}^{m}\right\},
\tag{5}
$$

where $M$ is the set of activations in the pooling window and $m$ is the index of an activation in the pooling window.

5.1.3. Fully Connected Layer. With the features extracted by sequential convolution and pooling, a fully connected layer is applied to flatten the feature maps into one vector as follows:

$$
y_{\text{fl}}\left(X_{l}\right) = \delta\left(W_{l} \cdot X_{l} + b_{l}\right),
\tag{6}
$$

where $W_{l}$ is the weight of the $l$-th layer and $b_{l}$ is the bias of the $l$-th layer.

5.1.4. RF Classifier Layer. Traditionally, the softmax classifier is used as the last output layer of the CNN, but the proposed model uses the RF classifier to predict the class based on the obtained features. Therefore, the RF classifier layer can be defined as

$$
y_{\text{out}}\left(X_{L}\right) = \operatorname{sigm}\left(W_{\text{rf}} \cdot X_{l} + b_{l}\right),
\tag{7}
$$

where $\operatorname{sigm}(\cdot)$ is the sigmoid function, which maps abnormal values to 0 and normal values to 1. The parameter set $W_{\text{rf}}$ of the RF layer includes the number of decision trees and the maximum depth of each tree, which are obtained by the grid search algorithm.

5.2. Techniques for Selecting Parameters and Avoiding Overfitting. The detailed parameters of the proposed CNN-RF structure are summarized in Table 1, which include the number of filters in each layer, the filter size, and the stride. Two factors are considered in determining the CNN structure. The first factor is the characteristics of consumers' electricity consumption behaviours. Since the load profiles are highly variable, two convolutional layers with a filter size of 3 × 3 (stride 1) are applied to capture hidden features for detecting electricity theft. Moreover, because the input data size is 525 × 24, which is similar to what is used in image recognition problems, two max-pooling layers with a filter size of 2 × 2 (stride 2) are used. The second factor is the number of training samples: since the number of samples is limited, to reduce the risk of overfitting, we design a fully connected layer with a dropout rate of 0.4, whose main idea is that dropping units randomly from the neural network during training can prevent units from coadapting too much [32] and make a neuron not rely on the presence of other specific neurons. Applying an appropriate training method is also useful for reducing overfitting; it is essentially a regularization that adds a penalty to the weight update at each iteration. We also use binary cross entropy as the loss function. Finally, the grid search algorithm is used to optimize the RF classifier parameters, such as the maximum number of decision trees and features.

Figure 6: Architecture of the hybrid CNN-RF model: the input data pass through two convolutional layers, each followed by a downsampling layer, then a fully connected layer whose output feeds the random forest layer.
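A minimal Keras sketch of the CNN part described in Table 1 is given below, assuming TensorFlow 2.x. The 40-unit fully connected layer (whose activations are later fed to the RF, see Section 6.2), the layer name "fc_features", and the temporary sigmoid output head used only to train the CNN with binary cross entropy are illustrative choices, not the authors' exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_feature_extractor(input_shape=(525, 24, 1), fc_units=40):
    """CNN matching Table 1: two 3x3 conv layers (32 and 64 filters, stride 1),
    two 2x2 max-pooling layers (stride 2), a 0.4 dropout, and a small fully
    connected layer whose activations later serve as features for the RF."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Conv2D(64, (3, 3), strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Flatten(),
        layers.Dense(fc_units, activation="relu", name="fc_features"),
        layers.Dropout(0.4),
        # Temporary classification head, used only to train the CNN end to end.
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```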

5.3. Training Process of CNN-RF. There is no doubt that the CNN-RF structure needs to train the CNN first, before the RF classifier is invoked. After that, the RF classifier is trained based on the features learned by the CNN.

5.3.1. Training Process of CNN. With a batch size (the number of training samples in an iteration) of 100, the CNN is trained using the forward propagation and backpropagation algorithms. As shown in Figure 6, the data in the input layer are first transferred through the convolutional layers, the pooling layers, and a fully connected layer, and the predicted value is obtained. As shown in Figure 7, the backpropagation phase begins to update the parameters if the difference between the output value and the target value is too large and exceeds a certain threshold. The difference between the actual output of the neural network and the expected output determines the adjustment direction and step size of the weights among the layers of the network.

Given a sample set $D = \{(x_1, y_1), \ldots, (x_m, y_m)\}$ with $m$ samples in total, $\tilde{y}$ is obtained at first by using the feedforward process. Over all samples, the mean of the difference between the actual output $y_i$ and the expected output $\tilde{y}_{w,b}(x_i)$ can be expressed by

$$
J(W, b) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left(\tilde{y}_{w,b}\left(x_i\right) - y_i\right)^{2},
\tag{8}
$$

where $w$ represents the connection weights between the layers of the network and $b$ is the corresponding bias.

Then, each parameter is initialized with a random value generated by a normal distribution with mean 0 and variance $\vartheta$, and the parameters are updated with the gradient descent method. The corresponding expressions are as follows:

$$
W_{ij}^{(l)} = W_{ij}^{(l)} - \alpha \frac{\partial}{\partial W_{ij}^{(l)}} J(W, b),
\tag{9}
$$

$$
b_{i}^{(l)} = b_{i}^{(l)} - \alpha \frac{\partial}{\partial b_{i}^{(l)}} J(W, b),
\tag{10}
$$

where $\alpha$ represents the learning rate, $W_{ij}^{(l)}$ represents the connection weight between the $i$-th neuron in the $l$-th layer and the $j$-th neuron in the $(l+1)$-th layer, and $b_{i}^{(l)}$ represents the bias of the $i$-th neuron in the $l$-th layer.

Finally, all parameters $W$ and $b$ in the network are updated according to formulas (9) and (10), and all these steps are repeated to reduce the objective function $J(W, b)$.
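For illustration only, the following NumPy sketch performs one update of equations (8)–(10) for a single sigmoid layer; it is a toy stand-in for the full backpropagation through the CNN, and all names are hypothetical.

```python
import numpy as np

def gradient_step(W, b, X, y, alpha=0.01):
    """One gradient descent update for a single sigmoid unit.
    Shapes: W (d,), b scalar, X (m, d), y (m,).  Loss: J of equation (8)."""
    z = X @ W + b
    y_hat = 1.0 / (1.0 + np.exp(-z))             # feedforward prediction
    grad_z = (y_hat - y) * y_hat * (1.0 - y_hat) # chain rule through the sigmoid
    grad_W = (X * grad_z[:, None]).mean(axis=0)  # dJ/dW, averaged over m samples
    grad_b = grad_z.mean()                       # dJ/db
    return W - alpha * grad_W, b - alpha * grad_b  # updates (9) and (10)
```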

5.3.2. Training Process of RF Classifier. The RF classifier takes advantage of two powerful machine learning techniques: bagging and random feature selection. Bagging is a method of randomly generating a bootstrap sample from the learnt features for growing each tree. Instead of using all features, the RF randomly selects a subset of features to determine the best splitting points, which ensures complete splitting to one class at the leaf nodes of the decision trees. Specifically, the whole process of RF classification is written as Algorithm 1.

6. Experiments and Result Analysis

All the experiments are implemented using Python 3.6 on a standard PC with an Intel Core i5-7500MQ CPU running at 3.40 GHz and with 8.0 GB of RAM. The CNN architecture is constructed based on TensorFlow [33], and the interface between the CNN and the RF is programmed using scikit-learn [34].

6.1. Performance Metrics. In this paper, the problem of electricity theft detection is considered a discrete two-class classification task, which should assign each consumer to one of the predefined classes (abnormal or normal).

In particular, the result of classifier validation is often presented as a confusion matrix. Table 2 shows the confusion matrix applied in electricity theft detection, where TP, FN, FP, and TN, respectively, represent the numbers of consumers that are classified correctly as normal, classified falsely as abnormal, classified falsely as normal, and classified correctly as abnormal. Although these four indices show all the information about the performance of the classifier, more meaningful performance measures can be extracted from them to illustrate certain performance criteria, such as TPR = TP/(TP + FN), FPR = FP/(FP + TN), precision = TP/(TP + FP), recall = TP/(TP + FN), and F1 score = 2(precision × recall)/(precision + recall), where the true positive rate (TPR) is the ability of the classifier to correctly identify a consumer as normal, also referred to as sensitivity, and the false positive rate (FPR) is the risk of positively identifying an abnormal consumer. Precision is the ratio of consumers detected correctly to the total number of detected normal consumers. Recall is the ratio of the number of consumers correctly detected to the actual number of such consumers.

In addition, the ROC curve is introduced by plotting all TPR values on the y-axis against their equivalent FPR values for all available thresholds on the x-axis. A ROC curve closer to the upper left corner indicates a better detection effect, since a lower FPR is achieved at the same TPR. The AUC indicates the quality of the classifier, and a larger AUC represents better performance.

Table 1: Parameters of the proposed CNN-RF model.

Layer          Size of the filter    Number of filters    Stride
Convolution1   3 x 3                 32                   1
MaxPooling1    2 x 2                 —                    2
Convolution2   3 x 3                 64                   1
MaxPooling2    2 x 2                 —                    2
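A short scikit-learn sketch of how these metrics could be computed from a classifier's predictions is shown below; the label convention (0 = abnormal, 1 = normal, as used later in Table 3) and the function name are illustrative.

```python
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    """y_true/y_pred: 0 = abnormal, 1 = normal; y_score: probability of the normal class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "TPR": tp / (tp + fn),          # sensitivity
        "FPR": fp / (fp + tn),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "F1": f1_score(y_true, y_pred),
        "AUC": roc_auc_score(y_true, y_score),
    }
```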

6.2. Experiment Based on CNN-RF Model. The experiment is conducted based on the normalized consumption data. With a batch size of 100, the training procedure is stopped after 300 epochs. As shown in Figure 8, the training and testing losses settle down smoothly, which means that the proposed model learned the train dataset and there was also only a small bias on the test dataset. Then, the RF classifier is used as the last layer of the CNN to predict the labels of input consumers. Forty values from the fully connected layer of the trained CNN are used as a new feature vector to detect abnormal users and are fed to the RF for learning and testing. To build the RF in the hybrid model, the optimal parameters are determined by applying the grid search algorithm based on the training dataset. The grid searching range of each parameter is given as follows: numTrees ∈ [50, 60, ..., 100] and max_depth ∈ [10, 20, ..., 40], so 6 × 4 = 24 combinations are tried. The best result is achieved when numTrees = 100 and maxDepth = 30; these parameters are then used to train the hybrid model.

Require:
  (1) Original training dataset S = {(x_i, y_i), i = 1, 2, 3, ..., m}, (X, Y) ∈ R^e × R
  (2) Test samples x_j ∈ R^m
for d = 1 to Ntree do
  (1) Draw a bootstrap sample S_d from the original training data.
  (2) Grow an unpruned tree h_d using the data S_d:
      (a) Randomly select a new feature set M_try from the original feature set e.
      (b) Select the best feature from M_try based on the Gini indicator at each node.
      (c) Split until each tree grows to its maximum.
end for
Ensure:
  (1) The collection of trees {h_d, d = 1, 2, ..., Ntree}.
  (2) For x_j, the output of each decision tree is h_d(x_j), and the final prediction is the majority vote f(x_j) = majority vote {h_d(x_j)}, d = 1, ..., Ntree.
return f(x_j)

Algorithm 1: Random forest algorithm.
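To make the grid search over the RF concrete, the sketch below extracts the fully connected layer activations from the trained CNN and tunes the number of trees and the maximum depth with scikit-learn's GridSearchCV; the layer name "fc_features" refers to the earlier illustrative CNN sketch, and the cross-validation settings are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Model

def fit_rf_on_cnn_features(cnn, X_train, y_train):
    """Fit the RF on activations of the CNN's fully connected layer, tuning
    numTrees and max_depth over the ranges quoted in the paper."""
    feature_extractor = Model(cnn.input, cnn.get_layer("fc_features").output)
    feats = feature_extractor.predict(X_train, verbose=0)

    param_grid = {
        "n_estimators": list(range(50, 101, 10)),  # numTrees: 50, 60, ..., 100
        "max_depth": [10, 20, 30, 40],
    }
    search = GridSearchCV(RandomForestClassifier(random_state=0),
                          param_grid, cv=5, scoring="f1")
    search.fit(feats, y_train)
    return feature_extractor, search.best_estimator_
```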

Figure 7: Process of forward propagation (a) and backpropagation (b).

Figure 8: Loss curves of training and testing (loss versus epoch over 300 epochs).

Table 2: Confusion matrix applied in electricity theft detection.

Actual \ Detected    Normal                 Abnormal
Normal               TP (true positive)     FN (false negative)
Abnormal             FP (false positive)    TN (true negative)


The classification results of the detection model proposed in this paper are as follows: the numbers of TPs, FNs, FPs, and TNs are 1041, 28, 23, and 577, respectively. Therefore, the precision, recall, and F1 score can be calculated as shown in Table 3, where Class 0 is the abnormal pattern and Class 1 is the normal pattern. Also, the AUC value of the CNN-RF algorithm is calculated, which is 0.98 and far better than the value of the baseline model (AUC = 0.5). This means the proposed algorithm is able to classify both classes accurately, as shown in Figure 9.

6.3. Comparative Experiments and the Analysis of Results. To evaluate the accuracy of the CNN-RF model, comparative experiments are conducted based on handcrafted features with non-deep-learning methods, including SVM, RF, gradient boosting decision tree (GBDT), and logistic regression (LR). Moreover, we compare the classification results obtained by other supervised classifiers that combine CNN features with the SVM classifier (CNN-SVM) and CNN features with the GBDT classifier (CNN-GBDT) to those obtained by the previous classification work. In the following, the five kinds of methods are introduced, and then the results are analysed.

(1) Logistic regression: it is a basic model in binary classification that is equivalent to one layer of a neural network with a sigmoid activation function. A sigmoid function obtains a value ranging from 0 to 1 based on linear regression. Then, any value larger than 0.5 is classified as a normal pattern, and any value less than 0.5 is classified as an abnormal pattern.

(2) Support vector machine: this classifier with kernel functions can transform a nonlinearly separable problem into a linearly separable problem by projecting the data into a feature space and then finding the optimal separating hyperplane. It is applied to predict the patterns of consumers based on the aforementioned handcrafted features.

(3) Random forest: it is essentially an ensemble of multiple decision trees that can achieve better performance than a single decision tree while maintaining effective control of overfitting. The RF classifier can also handle high-dimensional data while maintaining high computational efficiency.

(4) Gradient boosting decision tree: it is an iterative decision tree algorithm which consists of multiple decision trees. For the final output, all results or weights are accumulated in the GBDT, while random forests use majority voting.

(5) Deep learning methods: softmax is used in the last layer of the plain CNN; the CNN, CNN-GBDT, and CNN-SVM are used in comparison with the proposed method.

Table 4 summarizes the parameters of the comparative methods. After all experiments were completed, Figure 10 shows the results of the different methods and the proposed hybrid CNN-RF method, and it can be seen that the AUC values of CNN-RF, CNN-GBDT, CNN-SVM, CNN, SVM, RF, LR, and GBDT are 0.99, 0.97, 0.98, 0.93, 0.77, 0.91, 0.63, and 0.77, respectively. Figure 11 presents the results of all comparative experiments in terms of precision, recall, and F1 score. Among the performances of the eight different electricity theft detection algorithms, the deep learning methods (such as CNN, CNN-RF, CNN-SVM, and CNN-GBDT) show better performances than the machine learning methods (e.g., LR, SVM, GBDT, and RF). The reason for this result is that the CNN can learn features from a large amount of electricity consumption data; thus, the accuracy of electricity theft detection is greatly improved. In addition, the proposed CNN-RF model shows the best performance compared with the other methods.

Table 3: Classification score summary for CNN-RF.

                 Precision    Recall    F1 score
Class 0          0.97         0.96      0.96
Class 1          0.98         0.98      0.98
Average/total    0.97         0.97      0.97

Figure 9: ROC curve for CNN-RF (TPR versus FPR); AUC (CNN-RF) = 0.99.

Table 4: Experimental parameters for comparative methods.

Methods    Input data       Parameters
LR         Features (1D)    Penalty: L2; inverse of regularization strength: 1.0
SVM        Features (1D)    Penalty parameter of the error term: 105; parameter of the kernel function (RBF): 0.0005
GBDT       Features (1D)    The number of estimators: 200
RF         Features (1D)    The number of trees in the forest: 100; the max depth of each tree: 30
CNN        Raw data (2D)    The same as the CNN component of the proposed approach
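The non-deep-learning baselines of Table 4 could be instantiated in scikit-learn roughly as follows; the SVM penalty and the LR regularization strength are taken as printed in Table 4, and the commented usage lines assume hypothetical handcrafted feature arrays.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Baselines trained on handcrafted 1D feature vectors (mean, std, max, min, ...),
# with the hyperparameters listed in Table 4.
baselines = {
    "LR": LogisticRegression(penalty="l2", C=1.0),
    "SVM": SVC(C=105, kernel="rbf", gamma=0.0005, probability=True),
    "GBDT": GradientBoostingClassifier(n_estimators=200),
    "RF": RandomForestClassifier(n_estimators=100, max_depth=30),
}

# Example usage (train_features, train_labels, test_features are hypothetical):
# for name, clf in baselines.items():
#     clf.fit(train_features, train_labels)
#     scores = clf.predict_proba(test_features)[:, 1]  # inputs to the ROC/AUC
```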


As detecting electricity theft consumers is achieved by discovering the anomalous consumption behaviours of suspicious customers, we can also regard it essentially as an anomaly detection problem. Since the CNN-RF has shown superior performance on the SEAI dataset, it is interesting to test its generalization ability on other datasets. In particular, we have further tested the performance of the CNN-RF, together with the above-mentioned models, on another dataset, the Low Carbon London (LCL) dataset [35]. The LCL dataset is composed of over 5500 honest electricity consumers in total, each of which has 525 days of readings (the sampling rate is one per hour). Among the 5500 customers, we randomly chose 1200 as malicious customers and modified their intraday load profiles according to fraudulent sample generation methods similar to those used for the SEAI dataset.

After preprocessing the LCL dataset (in the same way as the SEAI dataset), the train dataset contains 7584 customers, in which the numbers of normal and abnormal samples are equal, and the test dataset consists of 1700 customers together with corresponding labels indicating anomaly or not. Then, the training of the proposed CNN-RF model and the comparative models is all done following the aforementioned procedure based on the train dataset.

Figure 11: Comparison with other algorithms based on the SEAI dataset (precision, recall, and F1 score for CNN, CNN-RF, CNN-GBDT, CNN-SVM, GBDT, LR, RF, and SVM).

Figure 10: ROC curves for eight different algorithms based on the SEAI dataset; AUC (CNN) = 0.93, AUC (CNN-RF) = 0.99, AUC (CNN-GBDT) = 0.97, AUC (CNN-SVM) = 0.98, AUC (GBDT) = 0.77, AUC (LR) = 0.63, AUC (RF) = 0.91, AUC (SVM) = 0.77.

Figure 12: ROC curves for eight different algorithms based on the LCL dataset; AUC (CNN) = 0.95, AUC (CNN-RF) = 0.97, AUC (CNN-GBDT) = 0.94, AUC (CNN-SVM) = 0.96, AUC (GBDT) = 0.74, AUC (LR) = 0.68, AUC (RF) = 0.77, AUC (SVM) = 0.71.

Figure 13: Comparison with other algorithms based on the LCL dataset (precision, recall, and F1 score for CNN, CNN-RF, CNN-GBDT, CNN-SVM, GBDT, SVM, LR, and RF).


The performances of all the models on the test dataset are shown in Figures 12 and 13.

Figures 12 and 13 show that the performances of CNN-SVM, CNN-GBDT, and CNN-RF are better than those of the other models on the whole. Since they are deep learning models, distinguishable features can be learned directly from the raw smart meter data by the CNN, which equips them with better generalization ability. In addition, the CNN-RF model achieves the best performance, since the RF can handle high-dimensional data while maintaining high computational efficiency.

7. Conclusions

In this paper, a novel CNN-RF model is presented to detect electricity theft. In this model, the CNN acts as an automatic feature extractor for investigating smart meter data, and the RF is the output classifier. Because a large number of parameters must be optimized, which increases the risk of overfitting, a fully connected layer with a dropout rate of 0.4 is designed for the training phase. In addition, the SMOTE algorithm is adopted to overcome the problem of data imbalance. Some machine learning and deep learning methods, such as SVM, RF, GBDT, and LR, are applied to the same problem as benchmarks, and all those methods have been evaluated on the SEAI and LCL datasets. The results indicate that the proposed CNN-RF model is quite a promising classification method in the electricity theft detection field because of two properties. The first is that features can be automatically extracted by the hybrid model, while the success of most other traditional classifiers relies largely on the retrieval of good hand-designed features, which is a laborious and time-consuming task. The second lies in that the hybrid model combines the advantages of the RF and the CNN, as both are among the most popular and successful classifiers in the electricity theft detection field.

Since the detection of electricity theft affects the privacy of consumers, future work will focus on investigating how the granularity and duration of smart meter data might affect this privacy. Extending the proposed hybrid CNN-RF model to other applications (e.g., load forecasting) is also a task worth investigating.

Data Availability

Previously reported SEAI and LCL datasets were used to support this study and are available at http://www.ucd.ie/issda/data and https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=7857. These prior studies (and datasets) are cited at relevant places within the text as references [19, 26].

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (2016YFB0901900), the Natural Science Foundation of Hebei Province of China (F2017501107), the Open Research Fund from the State Key Laboratory of Rolling and Automation, Northeastern University (2017RALKFKT003), the Foundation of Northeastern University at Qinhuangdao (XNB201803), and the Fundamental Research Funds for the Central Universities (N182303037).

References

[1] S. S. S. R. Depuru, L. Wang, and V. Devabhaktuni, "Electricity theft: overview, issues, prevention and a smart meter based approach to control theft," Energy Policy, vol. 39, no. 2, pp. 1007–1015, 2011.

[2] J. P. Navani, N. K. Sharma, and S. Sapra, "Technical and non-technical losses in power system and its economic consequence in Indian economy," International Journal of Electronics and Computer Science Engineering, vol. 1, no. 2, pp. 757–761, 2012.

[3] S. McLaughlin, B. Holbert, A. Fawaz, R. Berthier, and S. Zonouz, "A multi-sensor energy theft detection framework for advanced metering infrastructures," IEEE Journal on Selected Areas in Communications, vol. 31, no. 7, pp. 1319–1330, 2013.

[4] P. McDaniel and S. McLaughlin, "Security and privacy challenges in the smart grid," IEEE Security & Privacy Magazine, vol. 7, no. 3, pp. 75–77, 2009.

[5] T. B. Smith, "Electricity theft: a comparative analysis," Energy Policy, vol. 32, no. 1, pp. 2067–2076, 2004.

[6] J. I. Guerrero, C. Leon, I. Monedero, F. Biscarri, and J. Biscarri, "Improving knowledge-based systems with statistical techniques, text mining, and neural networks for non-technical loss detection," Knowledge-Based Systems, vol. 71, no. 4, pp. 376–388, 2014.

[7] C. C. O. Ramos, A. N. Souza, G. Chiachia, A. X. Falcão, and J. P. Papa, "A novel algorithm for feature selection using harmony search and its application for non-technical losses detection," Computers & Electrical Engineering, vol. 37, no. 6, pp. 886–894, 2011.

[8] P. Glauner, J. A. Meira, P. Valtchev, R. State, and F. Bettinger, "The challenge of non-technical loss detection using artificial intelligence: a survey," International Journal of Computational Intelligence Systems, vol. 10, no. 1, pp. 760–775, 2017.

[9] S.-C. Huang, Y.-L. Lo, and C.-N. Lu, "Non-technical loss detection using state estimation and analysis of variance," IEEE Transactions on Power Systems, vol. 28, no. 3, pp. 2959–2966, 2013.

[10] O. Rahmati, H. R. Pourghasemi, and A. M. Melesse, "Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: a case study at Mehran region, Iran," CATENA, vol. 137, pp. 360–372, 2016.

[11] N. Edison, A. C. Aranha, and J. Coelho, "Probabilistic methodology for technical and non-technical losses estimation in distribution system," Electric Power Systems Research, vol. 97, no. 11, pp. 93–99, 2013.

[12] J. B. Leite and J. R. S. Mantovani, "Detecting and locating non-technical losses in modern distribution networks," IEEE Transactions on Smart Grid, vol. 9, no. 2, pp. 1023–1032, 2018.

[13] S. Amin, G. A. Schwartz, A. A. Cardenas, and S. S. Sastry, "Game-theoretic models of electricity theft detection in smart utility networks: providing new capabilities with advanced metering infrastructure," IEEE Control Systems Magazine, vol. 35, no. 1, pp. 66–81, 2015.

[14] A. H. Nizar, Z. Y. Dong, Y. Wang, and A. N. Souza, "Power utility nontechnical loss analysis with extreme learning machine method," IEEE Transactions on Power Systems, vol. 23, no. 3, pp. 946–955, 2008.

[15] J. Nagi, K. S. Yap, S. K. Tiong, S. K. Ahmed, and M. Mohamad, "Nontechnical loss detection for metered customers in power utility using support vector machines," IEEE Transactions on Power Delivery, vol. 25, no. 2, pp. 1162–1171, 2010.

[16] E. W. S. Angelos, O. R. Saavedra, O. A. C. Cortes, and A. N. de Souza, "Detection and identification of abnormalities in customer consumptions in power distribution systems," IEEE Transactions on Power Delivery, vol. 26, no. 4, pp. 2436–2442, 2011.

[17] K. A. P. Costa, L. A. M. Pereira, R. Y. M. Nakamura, C. R. Pereira, J. P. Papa, and A. Xavier Falcão, "A nature-inspired approach to speed up optimum-path forest clustering and its application to intrusion detection in computer networks," Information Sciences, vol. 294, pp. 95–108, 2015.

[18] R. R. Bhat, R. D. Trevizan, X. Li, and A. Bretas, "Identifying nontechnical power loss via spatial and temporal deep learning," in Proceedings of the IEEE International Conference on Machine Learning and Applications, Anaheim, CA, USA, December 2016.

[19] M. Ismail, M. Shahin, M. F. Shaaban, E. Serpedin, and K. Qaraqe, "Efficient detection of electricity theft cyber attacks in AMI networks," in Proceedings of the IEEE Wireless Communications and Networking Conference, Barcelona, Spain, April 2018.

[20] R. Mehrizi, X. Peng, X. Xu, S. Zhang, D. Metaxas, and K. Li, "A computer vision based method for 3D posture estimation of symmetrical lifting," Journal of Biomechanics, vol. 69, no. 1, pp. 40–46, 2018.

[21] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE Conference on Computer for Sustainable Global Development, pp. 1026–1034, New Delhi, India, March 2015.

[22] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: an astounding baseline for recognition," Astrophysical Journal Supplement Series, vol. 221, no. 1, pp. 806–813, 2014.

[23] Z. Zheng, Y. Yang, X. Niu, H.-N. Dai, and Y. Zhou, "Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1606–1615, 2018.

[24] D. Svozil, V. Kvasnicka, and J. Pospichal, "Introduction to multi-layer feed-forward neural networks," Chemometrics and Intelligent Laboratory Systems, vol. 39, no. 1, pp. 43–62, 1997.

[25] Irish Social Science Data Archive, http://www.ucd.ie/issda/data/commissionforenergyregulationcer/.

[26] P. Jokar, N. Arianpoo, and V. C. M. Leung, "Electricity theft detection in AMI using customers' consumption patterns," IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 216–226, 2016.

[27] Y. Wang, Q. Chen, D. Gan, J. Yang, D. S. Kirschen, and C. Kang, "Deep learning-based socio-demographic information identification from smart meter data," IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 2593–2602, 2019.

[28] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, 2009.

[29] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016), pp. 7–10, Boston, MA, USA, September 2016.

[30] E. Feczko, N. M. Balba, O. Miranda-Dominguez et al., "Subtyping cognitive profiles in autism spectrum disorder using a functional random forest algorithm," NeuroImage, vol. 172, pp. 674–688, 2018.

[31] Y.-L. Boureau, J. Ponce, and Y. LeCun, "A theoretical analysis of feature pooling in visual recognition," in Proceedings of the 27th International Conference on Machine Learning, pp. 111–118, Haifa, Israel, June 2010.

[32] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[33] M. Abadi, A. Agarwal, P. Barham et al., "TensorFlow: large-scale machine learning on heterogeneous distributed systems," 2016, https://arxiv.org/abs/1603.04467.

[34] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, no. 4, pp. 2825–2830, 2011.

[35] J. R. Schofield, "Low Carbon London project: data from the dynamic time-of-use electricity pricing trial," 2017, https://discover.ukdataservice.ac.uk/catalogue/?sn=7857.

12 Journal of Electrical and Computer Engineering

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 3: ElectricityTheftDetectioninPowerGridswithDeep ...downloads.hindawi.com/journals/jece/2019/4136874.pdf(1)Datacleaning:thereareerroneousvalues(e.g.,outliers) in raw data, which correspond

convolution and downsampling And then RFclassification is trained based on the obtained fea-tures to detect whether the consumer steals elec-tricity Finally the confusion matrix and receiver-operating characteristic (ROC) curves are used toevaluate the accuracy of the CNN-RF model on thetest dataset

3 Data Analysis and Preprocess

-e dataset was collected by Electric Ireland and SustainableEnergy Authority of Ireland (SEAI) in January 2012 [25]which consists of electricity usage data from over 5000residential households and businesses -e smart metersrecorded electricity consumption at a resolution of 30minduring 2009 and 2010 Customers who participated in thetrial had a smart meter installed in their homes and agreed totake part in the research -erefore it is a reasonable as-sumption that all samples belong to honest users Howevermalicious samples cannot be obtained since energy theftmight never or rarely happen for a given customer Toaddress this issue we generated 1200 malicious customers asdescribed in [26] -e large number and a variety of cus-tomers with a long period of measurements make thisdataset an excellent source for research in the area of smartmeter data analysis For each customer i there is a half-hourly meter reporting for a 525-day periodWe reduced thesampling rate to one per hour for each customer -issection firstly analyses smart meter data to describe therationale of applying a CNN for feature extraction -en itdescribes three main procedures utilized for preprocessingthe original energy consumption data data cleaning missingvalue imputation and data normalization

31 Data Analysis To introduce the rationale for applying aCNN for feature extraction rather than other machinelearning techniques this section analyses smart meter data-ere are the daily load curves of four different users(residential and nonresidential) as shown in Figure 2 and itcan be seen that the daily electricity consumption of non-residential users is significantly higher than that of resi-dential users

In addition Figures 3 and 4 show the daily load profiles ofa consumer during working days and holidays -e load filesare highly similar but slightly shifted In a CNN the filterweights are uniform for different regions -us the featurescalculated in the convolutional layer are invariant to smallshifts which means that relatively stable features can be ob-tained from varying load profiles As described in [27] a deepCNN first automatically extracts features from massive loadprofiles and support vector machine (SVM) identifies thecharacteristics of the consumers To further investigate the loadseries we plot the four seasonsrsquo load profiles for a customer Itcan be seen from Figure 5 that the electricity consumption ofthe consumer changes with the change of seasons

In summary the load consumption differs in bothmagnitude and time of use and is dependent on lifestyleseasons weather and many other uncontrollable factors-erefore consumer load profiles are affected not only byweather and conditions but also by the types of consumersand other factors

Based on the above analysis extracting features fromsmart meter data based on experience is difficult Howeverfeature extraction is a key factor in the success of the de-tection system Conventionally manual feature extractionrequires elaborately designed features for a specific problemthat make it uneasy to adapt to other domains In the CNNthe successive alternating convolutional and pooling layersare designed to learn progressively higher-level features (egtrend indicators sequence standard deviation and linearslope) with 2D historical electricity consumption data Inaddition highly nonlinear correlations exist between elec-tricity consumption and these influencing factors Sinceactivation function has been designed on convolutional andfully connected layers the CNN is able to model highlynonlinear correlations In this paper activation functionnamed ldquorectified linear unitrdquo (ReLU) is used because of itssparsity and minimizing gradient vanishing problem in theproposed CNN-RF model

32 Data Preprocess xit and xprimeit are defined as the energyconsumption for an honest or malicious customer i at timeinterval t In this section the main procedures utilized forpreprocessing raw electricity data are described as follows

Smart meterdata

Data analysis and preprocess CNN-RF model

Model evaluation

Train dataset

Test dataset

Generation of train and test datasets

CNN feature extractor

RF classifier

Training

Balanced dataset

Unbalanced dataset

SMOTalgorithm

Cross-validationalgorithm

Confusion matrix

ROC curves

Original dataset

Data preprocess

Data normalization

Data analysis

Nonresidential and residential users

Working days and holidays

The change of seasons

Data cleaning

Missing valueimputation

Figure 1 Flow of electricity theft detection

Journal of Electrical and Computer Engineering 3

(1) Data cleaning there are erroneous values (eg outliers)in raw data which correspond to peak electricityconsumption activities during holidays or special

occasions such as birthdays and celebrations In thispaper the ldquothree-sigma rule of thumbrdquo [28] is used torestore the outliers according to the following formula

0 5 10 15 20Time (h)

0

1

2

3

4

Usa

ge (k

W)

Residential user 1

0 5 10 15 20Time (h)

0

1

2

3

4

Usa

ge (k

W)

Residential user 2

0 5 10 15 20Time (h)

0

05

1

15

Usa

ge (k

W)

Nonresidential user 1

0 5 10 15 20Time (h)

5

10

15

20

Usa

ge (k

W)

Nonresidential user 2

Figure 2 Load profiles for nonresidential and residential users

0 5 10 15 20 25Time (h)

0

05

1

15

2

25

Usa

ge (k

Wmiddoth

)

MondayTuesdayWednesday

ThursdayFriday

Figure 3 Daily load profiles for a customer in working days

0 5 10 15 20 25Time (h)

0

02

04

06

08

1

12

14

Usa

ge (k

Wmiddoth

)

SaturdaySundayNew year

HalloweenChristmas

Figure 4 Daily load profiles for a customer during holidays

4 Journal of Electrical and Computer Engineering

F xit1113872 1113873 avg xit1113872 1113873 + 2σ xit1113872 1113873 xit gt xlowastit

xit else

⎧⎨

⎩ (1)

where xlowastit is computed by the mean avg(middot) and standarddeviation σ(middot) for each time interval comprising theweekdaytime pair for each month

(2) Missing value imputation because of various reasonssuch as storage issues and failure of smart meters thereare missing values in electricity consumption data-rough the analysis of original data it is found thatthere are two kinds of data missing one is the con-tinuous missing of multiple data and the solution is todelete the users when the number of missing valuesexceeds 10 the other is missing single data which isprocessed by the formula (2) -us missing values canbe recovered as follows

F xit1113872 1113873

xitminus 1 + xit+1

2 xit isin NaN

xit else

⎧⎪⎪⎪⎨

⎪⎪⎪⎩

(2)

where xit stands for the electricity usage of the consumer i

over a period (eg an hour) if xit is null we represent it asNaN

(3) Data normalization finally data need to be nor-malized because the neural network is sensitive to thediverse data One of the common methods for thisscope is min-max normalization which is computedas

F xit1113872 1113873 xit minus min xiT1113872 1113873

max xiT1113872 1113873 minus min xiT1113872 1113873 (3)

where min(middot) and max(middot) represent the min and max valuesover a day respectively

4 Generation of Train and Test Datasets

After data processing the SEAI dataset contains the elec-tricity consumption data of 4737 normal electricity cus-tomers and 1200 malicious electricity customers within525 days (with 24 hours)

Our preliminary analytical results (refer to Section 31)reveal the electricity usage consumer load profiles are af-fected not only by weather and conditions but also by thetypes of consumers and other factors However it is difficultto extracting features based on experience from the 1Delectricity consumption data since the electricity con-sumption every day fluctuates in a relatively independentway Motivated by the previous work in [29] we design adeep CNN to process the electricity consumption data in a2D manner In particular we transform the 1D electricityconsumption data into 2D data according to days We define2D data as a matrix of actual energy consumption values fora specific customer where the rows of 2D data representdays as D (D 1 525) and columns represent the timeperiods as T (T 1 24) It is worthmentioning that theCNN learns the features between different hours of the dayand different days from 2D data

-en we divide the dataset into train and test sets usingthe cross-validation method in which 80 are the train setand 20 are the test set Given that electricity theftconsumers remarkably outnumber nonfraudulent onesthe imbalanced nature of the dataset can have a majornegative impact on the performance of supervised ma-chine learning methods To reduce this bias the SMOTalgorithm is used to make the number of normal andabnormal samples equal in the train set Finally the traindataset contains 7484 customers in which the number ofnormal and abnormal samples is equal And the testdataset consists of 1669 customers

5 The Novel CNN-RF Algorithm

In this section the design of the CNN-RF structure ispresented in detail and some techniques are proposed toreduce overfitting Moreover correlative parameters in-cluding the number of filters and the size of each feature mapand kernel are given Finally the training process of theproposed CNN-RF algorithm is introduced

51 CNN-RF Architecture -e proposed CNN-RF model isdesigned by integrating CNN and RF classifiers As shown inFigure 6 the architecture of the CNN-RF mainly includes anautomatic feature extractor and a trainable RF classifier -efeature extractor CNN consists of convolutional layersdownsampling layers and a fully connection layer-e CNNis a deep supervised learning architecture which usuallyincludes multiple layers and can be trained with the back-propagation algorithm It also can explore the intricatedistribution in smart meter data by performing the

[Figure 5: Four seasons' load profiles for a customer — usage (kW·day) versus time (days, 0–100), with curves for spring, summer, autumn, and winter.]

The RF classifier consists of a combination of tree classifiers, in which each tree contributes a single vote to the assignment of the most frequent class to the input data [30]. In the following, the design of each part of the CNN-RF structure is presented in detail.

5.1.1. Convolutional Layer. The main purpose of convolutional layers is to learn feature representations of the input data and reduce the influence of noise. The convolutional layer is composed of several feature filters, which are used to compute different feature maps. Specially, to reduce the number of network parameters and lower the complexity of the relevant layers during the process of generating feature maps, the weights of each kernel in each feature map are shared. Moreover, each neuron in the convolutional layer connects to a local region of the previous layer, and then a ReLU activation function is applied on the convolved results. Mathematically, the outputs of the convolutional layers can be expressed as

y_{\text{conv}}\left(X_l^{f_l}\right) = \delta\left(\sum_{f_l = 1}^{F_l} W_l^{f_l} \ast X_l^{f_l} + b_l^{f_l}\right) \tag{4}

where δ is the activation function, ∗ is the convolution operation, and both W_l^{f_l} and b_l^{f_l} are the learnable parameters in the f-th feature filter.

5.1.2. Downsampling Layer. Downsampling is an important concept of the CNN, which is usually placed between two convolutional layers. It can reduce the number of parameters and achieve dimensionality reduction. In particular, each feature map of the pooling layer is connected to its corresponding feature map of the previous convolutional layer, and the size of the output feature maps is reduced without reducing their number in the downsampling layer. In the literature, downsampling operations mainly include max-pooling and average-pooling. The max-pooling operation transforms small windows into single values by taking the maximum, which is expressed as formula (5), whereas average-pooling returns the average value of the activations in a small window. In [31], experiments show that the max-pooling operation performs better than average-pooling. Therefore, the max-pooling operation is usually implemented in the downsampling layer:

y_{\text{pool}}\left(X_l^{f_l}\right) = \max_{m \in M}\left\{X_{l,m}\right\} \tag{5}

where M is the set of activations in the pooling window and m is the index of an activation in the pooling window.
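As a small worked example of formula (5) (not from the paper; a 2 × 2 window with stride 2 is assumed, matching Table 1), max-pooling simply keeps the largest activation in each window:

import numpy as np

def max_pool_2x2(fmap):
    # Max-pooling with a 2x2 window and stride 2 over a 2D feature map.
    h, w = fmap.shape
    h, w = h - h % 2, w - w % 2          # drop an odd trailing row/column
    blocks = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 1.],
                 [0., 1., 5., 2.],
                 [2., 2., 0., 3.]])
print(max_pool_2x2(fmap))
# [[4. 2.]
#  [2. 5.]]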

5.1.3. Fully Connected Layer. With the features extracted by sequential convolution and pooling, a fully connected layer is applied to flatten the feature maps into one vector as follows:

y_{\text{fc}}\left(X_l\right) = \delta\left(W_l \cdot X_l + b_l\right) \tag{6}

where W_l is the weight of the l-th layer and b_l is the bias of the l-th layer.

5.1.4. RF Classifier Layer. Traditionally, the softmax classifier is used for the last output layer in the CNN, but the proposed model uses the RF classifier to predict the class based on the obtained features. Therefore, the RF classifier layer can be defined as

y_{\text{out}}\left(X_L\right) = \text{sigm}\left(W_{\text{rf}} \cdot X_l + b_l\right) \tag{7}

where sigm(·) is the sigmoid function, which maps abnormal values to 0 and normal values to 1. The parameter set W_{rf} in the RF layer includes the number of decision trees and the maximum depth of the trees, which are obtained by the grid search algorithm.

5.2. Techniques for Selecting Parameters and Avoiding Overfitting. The detailed parameters of the proposed CNN-RF structure are summarized in Table 1, which include the number of filters in each layer, the filter size, and the stride. Two factors are considered in determining the CNN structure. The first factor is the characteristics of consumers' electricity consumption behaviours: since the load profiles are highly variable, two convolutional layers with a filter size of 3 × 3 (stride 1) are applied to capture hidden features for detecting electricity theft. Moreover, because the input data size is 525 × 24, which is similar to what is used in image recognition problems, two max-pooling layers with a filter size of 2 × 2 (stride 2) are used. The second factor is the number of training samples: since the number of samples is limited, to reduce the risk of overfitting we design a fully connected layer with a dropout rate of 0.4.

[Figure 6: Architecture of the hybrid CNN-RF model — input data (525 × 24) → convolutional layer → downsampling layer → convolutional layer → downsampling layer → fully connected layer → random forest layer.]

The main idea of dropout is that dropping units randomly from the neural network during training can prevent units from coadapting too much [32] and make a neuron not rely on the presence of other specific neurons. Applying an appropriate training method is also useful for reducing overfitting; it is essentially a regularization that adds a penalty to the weight update at each iteration. We also use binary cross entropy as the loss function. Finally, the grid search algorithm is used to optimize RF classifier parameters such as the maximum number of decision trees and features.
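The paper does not publish its TensorFlow code; the following Keras sketch only mirrors the layer settings stated in Table 1 and this subsection (3 × 3 convolutions with 32 and 64 filters, 2 × 2 max-pooling, a fully connected layer with dropout 0.4). The width of the fully connected layer is an assumption based on the 40-value feature vector mentioned in Section 6.2.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_feature_extractor(days=525, hours=24, fc_units=40, dropout_rate=0.4):
    # CNN feature extractor sketched from Table 1; fc_units=40 is assumed from Section 6.2.
    return models.Sequential([
        layers.Input(shape=(days, hours, 1)),
        layers.Conv2D(32, (3, 3), strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Conv2D(64, (3, 3), strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Flatten(),
        layers.Dense(fc_units, activation="relu"),
        layers.Dropout(dropout_rate),
    ])

# a sigmoid output head is attached only for training the CNN end to end
cnn = build_feature_extractor()
classifier = models.Sequential([cnn, layers.Dense(1, activation="sigmoid")])
classifier.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])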

5.3. Training Process of CNN-RF. There is no doubt that the CNN-RF structure needs to train the CNN first, before the RF classifier is invoked. After that, the RF classifier is trained based on the features learned by the CNN.

5.3.1. Training Process of CNN. With a batch size (the number of training samples in an iteration) of 100, the CNN is trained using the forward propagation algorithm and the backpropagation algorithm. As shown in Figure 6, the data in the input layer are first transferred through the convolutional layers, pooling layers, and a fully connected layer, and the predicted value is obtained. As shown in Figure 7, the backpropagation phase then begins to update parameters if the difference between the output value and the target value is too large and exceeds a certain threshold. The difference between the actual output of the neural network and the expected output determines the adjustment direction and step size of the weights among the layers of the network.

Given a sample set D = {(x_1, y_1), ..., (x_m, y_m)} with m samples in total, ỹ is first obtained by using the feed-forward process. For all samples, the mean of the difference between the actual output y_i and the expected output ỹ_{w,b}(x_i) can be expressed by

J(W, b) = \frac{1}{m}\sum_{i=1}^{m}\left(\frac{1}{2}\left\|\tilde{y}_{w,b}\left(x_i\right) - y_i\right\|^2\right) \tag{8}

where W represents the connection weights between layers of the network and b is the corresponding bias.

Then, each parameter is initialized with a random value generated from a normal distribution with mean 0 and variance ϑ, and the parameters are updated with the gradient descent method. The corresponding expressions are as follows:

W_{ij}^{(l)} = W_{ij}^{(l)} - \alpha\,\frac{\partial}{\partial W_{ij}^{(l)}} J(W, b) \tag{9}

b_{i}^{(l)} = b_{i}^{(l)} - \alpha\,\frac{\partial}{\partial b_{i}^{(l)}} J(W, b) \tag{10}

where α represents the learning rate, W_{ij}^{(l)} represents the connection weight between the i-th neuron in the l-th layer and the j-th neuron in the (l + 1)-th layer, and b_i^{(l)} represents the bias of the i-th neuron in the l-th layer. Finally, all parameters W and b in the network are updated according to formulas (9) and (10), and all these steps are repeated to reduce the objective function J(W, b).
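As a compact illustration of the update rules (9) and (10) (a generic sketch, not the authors' code; a single linear layer with the squared-error objective of formula (8) is assumed for brevity):

import numpy as np

def gradient_step(W, b, X, y, alpha=0.01):
    # One gradient-descent update of W and b for J(W, b) = (1/m) * sum(0.5 * (X @ W + b - y)^2).
    m = X.shape[0]
    y_hat = X @ W + b                 # feed-forward pass for a single linear layer
    err = y_hat - y
    grad_W = X.T @ err / m            # ∂J/∂W, as in formula (9)
    grad_b = err.mean()               # ∂J/∂b, as in formula (10)
    return W - alpha * grad_W, b - alpha * grad_b

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
W, b = rng.normal(size=5) * 0.01, 0.0   # initialization from a normal distribution
for _ in range(300):
    W, b = gradient_step(W, b, X, y)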

5.3.2. Training Process of RF Classifier. The RF classifier takes advantage of two powerful machine learning techniques: bagging and random feature selection. Bagging is a method to randomly generate a particular bootstrap sample from the learnt features for growing each tree. Instead of using all features, the RF randomly selects a subset of features to determine the best splitting points, which ensures complete splitting to one class at the leaf nodes of the decision trees. Specifically, the whole process of RF classification is written as Algorithm 1.
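A minimal sketch of this stage, assuming scikit-learn's RandomForestClassifier (bootstrap sampling and per-split random feature selection are built in); the cnn_features_* arrays stand in for the 40-dimensional vectors taken from the trained CNN's fully connected layer, as described in Section 6.2:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# placeholder CNN features: 40 values per customer from the fully connected layer
cnn_features_train = np.random.rand(7484, 40)
y_train = np.random.randint(0, 2, size=7484)
cnn_features_test = np.random.rand(1669, 40)

rf = RandomForestClassifier(
    n_estimators=100,      # number of trees (best value reported in Section 6.2)
    max_depth=30,          # maximum tree depth (best value reported in Section 6.2)
    bootstrap=True,        # bagging: each tree sees a bootstrap sample
    max_features="sqrt",   # random feature subset considered at each split
    random_state=42)
rf.fit(cnn_features_train, y_train)
y_pred = rf.predict(cnn_features_test)   # majority vote over the trees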

6. Experiments and Result Analysis

All the experiments are implemented using Python 3.6 on a standard PC with an Intel Core i5-7500MQ CPU running at 3.40 GHz and with 8.0 GB of RAM. The CNN architecture is constructed based on TensorFlow [33], and the interface between the CNN and the RF is programmed using scikit-learn [34].

6.1. Performance Metrics. In this paper, the problem of electricity theft detection is considered a discrete two-class classification task, which should assign each consumer to one of the predefined classes (abnormal or normal).

In particular, the result of classifier validation is often presented as a confusion matrix. Table 2 shows the confusion matrix applied in electricity theft detection, where TP, FN, FP, and TN, respectively, represent the numbers of consumers that are classified correctly as normal, classified falsely as abnormal, classified falsely as normal, and classified correctly as abnormal. Although these four indices show all of the information about the performance of the classifier, more meaningful performance measures can be extracted from them to illustrate certain performance criteria, such as TPR = TP/(TP + FN), FPR = FP/(FP + TN), precision = TP/(TP + FP), recall = TP/(TP + FN), and F1 score = 2(precision × recall)/(precision + recall), where the true positive rate (TPR), also referred to as sensitivity, is the ability of the classifier to correctly identify a consumer as normal, and the false positive rate (FPR) is the risk of falsely identifying an abnormal consumer as normal. Precision is the ratio of consumers detected correctly to the total number of detected normal consumers. Recall is the ratio of the number of consumers correctly detected to the actual number of normal consumers.

In addition, the ROC curve is introduced by plotting all TPR values on the y-axis against their equivalent FPR values for all available thresholds on the x-axis. A ROC curve closer to the upper left indicates better detection performance, where lower FPR values are obtained for the same TPR.

Table 1: Parameters of the proposed CNN-RF model.

Layer          Size of the filter   Number of filters   Stride
Convolution1   3 × 3                32                  1
MaxPooling1    2 × 2                —                   2
Convolution2   3 × 3                64                  1
MaxPooling2    2 × 2                —                   2

The AUC indicates the quality of the classifier, and a larger AUC represents better performance.
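These metrics can be computed directly from predictions, for example with scikit-learn (a generic sketch; the paper does not state how its scores were computed, and the label convention of 1 = normal, 0 = abnormal follows Table 3):

from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = [1, 1, 0, 1, 0, 0, 1, 0]                     # 1 = normal, 0 = abnormal (assumed)
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
y_score = [0.9, 0.8, 0.2, 0.4, 0.1, 0.6, 0.7, 0.3]    # probability of the normal class

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)   # true positive rate (sensitivity)
fpr = fp / (fp + tn)   # false positive rate
print(tpr, fpr)
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))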

6.2. Experiment Based on the CNN-RF Model. The experiment is conducted based on the normalized consumption data. With a batch size of 100, the training procedure is stopped after 300 epochs. As shown in Figure 8, the training and testing losses settle down smoothly, which means that the proposed model learned the train dataset and there was small bias on the test dataset as well. Then, the RF classifier is used as the last layer of the CNN to predict the labels of input consumers: forty values from the fully connected layer of the trained CNN are used as a new feature vector to detect abnormal users and are fed to the RF for learning and testing. To build the RF in the hybrid model, the optimal parameters are determined by applying the grid search algorithm based on the training dataset.

The grid searching range of each parameter is given as follows: numTrees ∈ [50, 60, ..., 100] and maxDepth ∈ [10, 20, ..., 40], so 6 × 4 = 24 combinations are tried.

Require:
  (1) Original training dataset S = {(x_i, y_i), i = 1, 2, 3, ..., m}, (X, Y) ∈ R^e × R
  (2) Test dataset x_j ∈ R^m
for d = 1 to N_tree do
  (1) Draw a bootstrap sample S_d from the original training data
  (2) Grow an unpruned tree h_d using data S_d:
      (a) Randomly select a new feature set M_try from the original feature set e
      (b) Select the best feature from the set M_try based on the Gini indicator at each node
      (c) Split until each tree grows to its maximum
end for
Ensure:
  (1) The collection of trees {h_d, d = 1, 2, ..., N_tree}
  (2) For x_j, the output of a decision tree is h_d(x_j)
f(x_j) = majority vote {h_d(x_j)}_{d=1}^{N_tree}
return f(x_j)

ALGORITHM 1: Random forest algorithm.

[Figure 7: Process of forward propagation (a) and backpropagation (b), showing the flow between the input x, the convolution and pooling parameters (W_conv, b_conv, W_pool, b_pool, b_x), and the objective J(W, b).]

[Figure 8: Loss curves of training and testing over 300 epochs.]

Table 2: Confusion matrix applied in electricity theft detection.

Actual\Detected   Normal                Abnormal
Normal            TP (true positive)    FN (false negative)
Abnormal          FP (false positive)   TN (true negative)

The best result is achieved when numTrees = 100 and maxDepth = 30. These parameters are then used to train the hybrid model.
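A sketch of this parameter search with scikit-learn's GridSearchCV (the grid below matches the ranges stated above; the scoring metric, fold count, and placeholder data are assumptions, as the paper does not state them):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# placeholder CNN features and labels (stand-ins for the real extracted features)
cnn_features_train = np.random.rand(1000, 40)
y_train = np.random.randint(0, 2, size=1000)

param_grid = {
    "n_estimators": [50, 60, 70, 80, 90, 100],   # numTrees range from the text
    "max_depth": [10, 20, 30, 40],               # maxDepth range from the text
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, scoring="roc_auc", cv=5)
search.fit(cnn_features_train, y_train)
print(search.best_params_)   # the paper reports numTrees = 100, maxDepth = 30 as best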

The classification results of the detection model proposed in this paper are as follows: the numbers of TPs, FNs, FPs, and TNs are 1041, 28, 23, and 577, respectively. Therefore, the precision, recall, and F1 score can be calculated as shown in Table 3, where Class 0 is the abnormal pattern and Class 1 is the normal pattern. Also, the AUC value of the CNN-RF algorithm is calculated, which is 0.98 and far better than the value of the baseline model (AUC = 0.5). This means the proposed algorithm is able to classify both classes accurately, as shown in Figure 9.

6.3. Comparative Experiments and the Analysis of Results. To evaluate the accuracy of the CNN-RF model, comparative experiments are conducted based on handcrafted features with nondeep learning methods, including SVM, RF, gradient boosting decision tree (GBDT), and logistic regression (LR). Moreover, we compared the classification results obtained by other supervised classifiers, namely CNN features with the SVM classifier (CNN-SVM) and CNN features with the GBDT classifier (CNN-GBDT), with those obtained by the previous classification work. In the following, the five methods are introduced, respectively, and then the results are analysed.

(1) Logistic regression: it is a basic model for binary classification that is equivalent to one layer of a neural network with a sigmoid activation function. The sigmoid function obtains a value ranging from 0 to 1 based on linear regression; any value larger than 0.5 is classified as a normal pattern, and any value less than 0.5 is classified as an abnormal pattern.

(2) Support vector machine: this classifier with kernel functions can transform a nonlinearly separable problem into a linearly separable one by projecting the data into the feature space and then finding the optimal separating hyperplane. It is applied to predict the patterns of consumers based on the aforementioned handcrafted features.

(3) Random forest: it is essentially an ensemble of multiple decision trees that can achieve better performance while maintaining effective control of overfitting in comparison with a single decision tree. The RF classifier can also handle high-dimensional data while maintaining high computational efficiency.

(4) Gradient boosting decision tree: it is an iterative decision tree algorithm which consists of multiple decision trees. For the final output, all results or weights are accumulated in the GBDT, while random forests use majority voting.

(5) Deep learning methods: softmax is used in the last layer of the CNN, and the CNN-GBDT and CNN-SVM models are evaluated in comparison with the proposed method. A configuration sketch of the nondeep baselines follows this list.
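The following scikit-learn sketch shows how the nondeep baselines could be configured with the parameters listed in Table 4 (a sketch only; the handcrafted feature extraction step and any unlisted settings are not specified in the paper, and the exact magnitudes of the printed values are treated as assumptions since decimal points may have been lost in typesetting):

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# values follow Table 4 as printed; treat the exact magnitudes as assumptions
baselines = {
    "LR": LogisticRegression(penalty="l2", C=10),                      # inverse of regularization strength
    "SVM": SVC(kernel="rbf", C=105, gamma=0.0005, probability=True),   # penalty term and RBF parameter
    "GBDT": GradientBoostingClassifier(n_estimators=200),              # number of estimators
    "RF": RandomForestClassifier(n_estimators=100, max_depth=30),      # trees in the forest / max depth
}

# usage sketch (handcrafted 1D features assumed):
# for name, clf in baselines.items():
#     clf.fit(X_train_features, y_train)
#     print(name, clf.score(X_test_features, y_test))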

Table 4 summarizes the parameters of the comparative methods. Figure 10 shows the ROC curves of the different methods and the proposed hybrid CNN-RF method; the AUC values of CNN-RF, CNN-GBDT, CNN-SVM, CNN, SVM, RF, LR, and GBDT are 0.99, 0.97, 0.98, 0.93, 0.77, 0.91, 0.63, and 0.77, respectively. Figure 11 presents the results of all comparative experiments in terms of precision, recall, and F1 score. Among the performances of the eight electricity theft detection algorithms, the deep learning methods (CNN, CNN-RF, CNN-SVM, and CNN-GBDT) show better performance than the machine learning methods (LR, SVM, GBDT, and RF). The reason for this result is that the CNN can learn features from a large amount of electricity consumption data; thus, the accuracy of electricity theft detection is greatly improved. In addition, the proposed CNN-RF model shows the best performance compared with the other methods.

Table 3: Classification score summary for CNN-RF.

                Precision   Recall   F1 score
Class 0         0.97        0.96     0.96
Class 1         0.98        0.98     0.98
Average/total   0.97        0.97     0.97

[Figure 9: ROC curve (TPR versus FPR) for CNN-RF; AUC (CNN-RF) = 0.99.]

Table 4: Experimental parameters for comparative methods.

Method   Input data      Parameters
LR       Features (1D)   Penalty: L2; inverse of regularization strength: 10
SVM      Features (1D)   Penalty parameter of the error term: 105; parameter of the kernel function (RBF): 0.0005
GBDT     Features (1D)   Number of estimators: 200
RF       Features (1D)   Number of trees in the forest: 100; maximum depth of each tree: 30
CNN      Raw data (2D)   The same as the CNN component of the proposed approach

As detecting electricity theft is achieved by discovering the anomalous consumption behaviours of suspicious customers, we can also regard it essentially as an anomaly detection problem. Since CNN-RF has shown superior performance on the SEAI dataset, it is interesting to test its generalization ability on other datasets. In particular, we have further tested the performance of CNN-RF, together with the above-mentioned models, on another dataset, the Low Carbon London (LCL) dataset [35]. The LCL dataset is composed of over 5500 honest electricity consumers in total, each of which covers 525 days at an hourly sampling rate. Among the 5500 customers, we randomly chose 1200 as malicious customers and modified their intraday load profiles according to the fraudulent sample generation methods, similar to the SEAI dataset.

After the preprocessing of the LCL dataset (similar to that of the SEAI dataset), the train dataset contains 7584 customers, in which the numbers of normal and abnormal samples are equal, and the test dataset consists of 1700 customers together with corresponding labels indicating anomaly or not. Then, the training of the proposed CNN-RF model and the comparative models is all done following the aforementioned procedure based on the train dataset.

[Figure 10: ROC curves for eight different algorithms based on the SEAI dataset; AUC values: CNN-RF = 0.99, CNN-SVM = 0.98, CNN-GBDT = 0.97, CNN = 0.93, RF = 0.91, GBDT = 0.77, SVM = 0.77, LR = 0.63.]

[Figure 11: Comparison of precision, recall, and F1 score with other algorithms based on the SEAI dataset.]

[Figure 12: ROC curves for eight different algorithms based on the LCL dataset; AUC values: CNN-RF = 0.97, CNN-SVM = 0.96, CNN = 0.95, CNN-GBDT = 0.94, RF = 0.77, GBDT = 0.74, SVM = 0.71, LR = 0.68.]

[Figure 13: Comparison of precision, recall, and F1 score with other algorithms based on the LCL dataset.]

The performances of all the models on the test dataset are shown in Figures 12 and 13.

Figures 12 and 13 show that the performances of CNN-SVM, CNN-GBDT, and CNN-RF are better than those of the other models on the whole. Since they are deep learning models, distinguishable features can be learned directly from raw smart meter data by the CNN, which has equipped them with better generalization ability. In addition, the CNN-RF model achieves the best performance, since the RF can handle high-dimensional data while maintaining high computational efficiency.

7. Conclusions

In this paper, a novel CNN-RF model is presented to detect electricity theft. In this model, the CNN acts as an automatic feature extractor that investigates smart meter data, and the RF is the output classifier. Because a large number of parameters must be optimized, which increases the risk of overfitting, a fully connected layer with a dropout rate of 0.4 is designed for the training phase. In addition, the SMOTE algorithm is adopted to overcome the problem of data imbalance. Several machine learning and deep learning methods, such as SVM, RF, GBDT, and LR, are applied to the same problem as benchmarks, and all those methods have been evaluated on the SEAI and LCL datasets. The results indicate that the proposed CNN-RF model is quite a promising classification method in the electricity theft detection field because of two properties. The first is that features can be automatically extracted by the hybrid model, while the success of most other traditional classifiers relies largely on the retrieval of good hand-designed features, which is a laborious and time-consuming task. The second lies in that the hybrid model combines the advantages of the RF and the CNN, as both are among the most popular and successful classifiers in the electricity theft detection field.

Since the detection of electricity theft affects the privacy of consumers, future work will focus on investigating how the granularity and duration of smart meter data might affect this privacy. Extending the proposed hybrid CNN-RF model to other applications (e.g., load forecasting) is also a task worth investigating.

Data Availability

Previously reported SEAI and LCL datasets were used to support this study and are available at http://www.ucd.ie/issda/data and https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=7857, respectively. These prior studies (and datasets) are cited at relevant places within the text as references [19, 26].

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (2016YFB0901900), the Natural Science Foundation of Hebei Province of China (F2017501107), the Open Research Fund from the State Key Laboratory of Rolling and Automation, Northeastern University (2017RALKFKT003), the Foundation of Northeastern University at Qinhuangdao (XNB201803), and the Fundamental Research Funds for the Central Universities (N182303037).

References

[1] S. S. S. R. Depuru, L. Wang, and V. Devabhaktuni, "Electricity theft: overview, issues, prevention and a smart meter based approach to control theft," Energy Policy, vol. 39, no. 2, pp. 1007–1015, 2011.

[2] J. P. Navani, N. K. Sharma, and S. Sapra, "Technical and non-technical losses in power system and its economic consequence in Indian economy," International Journal of Electronics and Computer Science Engineering, vol. 1, no. 2, pp. 757–761, 2012.

[3] S. McLaughlin, B. Holbert, A. Fawaz, R. Berthier, and S. Zonouz, "A multi-sensor energy theft detection framework for advanced metering infrastructures," IEEE Journal on Selected Areas in Communications, vol. 31, no. 7, pp. 1319–1330, 2013.

[4] P. McDaniel and S. McLaughlin, "Security and privacy challenges in the smart grid," IEEE Security & Privacy Magazine, vol. 7, no. 3, pp. 75–77, 2009.

[5] T. B. Smith, "Electricity theft: a comparative analysis," Energy Policy, vol. 32, no. 1, pp. 2067–2076, 2004.

[6] J. I. Guerrero, C. Leon, I. Monedero, F. Biscarri, and J. Biscarri, "Improving knowledge-based systems with statistical techniques, text mining, and neural networks for non-technical loss detection," Knowledge-Based Systems, vol. 71, no. 4, pp. 376–388, 2014.

[7] C. C. O. Ramos, A. N. Souza, G. Chiachia, A. X. Falcão, and J. P. Papa, "A novel algorithm for feature selection using harmony search and its application for non-technical losses detection," Computers & Electrical Engineering, vol. 37, no. 6, pp. 886–894, 2011.

[8] P. Glauner, J. A. Meira, P. Valtchev, R. State, and F. Bettinger, "The challenge of non-technical loss detection using artificial intelligence: a survey," International Journal of Computational Intelligence Systems, vol. 10, no. 1, pp. 760–775, 2017.

[9] S.-C. Huang, Y.-L. Lo, and C.-N. Lu, "Non-technical loss detection using state estimation and analysis of variance," IEEE Transactions on Power Systems, vol. 28, no. 3, pp. 2959–2966, 2013.

[10] O. Rahmati, H. R. Pourghasemi, and A. M. Melesse, "Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: a case study at Mehran region, Iran," CATENA, vol. 137, pp. 360–372, 2016.

[11] N. Edison, A. C. Aranha, and J. Coelho, "Probabilistic methodology for technical and non-technical losses estimation in distribution system," Electric Power Systems Research, vol. 97, no. 11, pp. 93–99, 2013.

[12] J. B. Leite and J. R. S. Mantovani, "Detecting and locating non-technical losses in modern distribution networks," IEEE Transactions on Smart Grid, vol. 9, no. 2, pp. 1023–1032, 2018.

[13] S. Amin, G. A. Schwartz, A. A. Cardenas, and S. S. Sastry, "Game-theoretic models of electricity theft detection in smart utility networks: providing new capabilities with advanced metering infrastructure," IEEE Control Systems Magazine, vol. 35, no. 1, pp. 66–81, 2015.

[14] A. H. Nizar, Z. Y. Dong, Y. Wang, and A. N. Souza, "Power utility nontechnical loss analysis with extreme learning machine method," IEEE Transactions on Power Systems, vol. 23, no. 3, pp. 946–955, 2008.

[15] J. Nagi, K. S. Yap, S. K. Tiong, S. K. Ahmed, and M. Mohamad, "Nontechnical loss detection for metered customers in power utility using support vector machines," IEEE Transactions on Power Delivery, vol. 25, no. 2, pp. 1162–1171, 2010.

[16] E. W. S. Angelos, O. R. Saavedra, O. A. C. Cortes, and A. N. de Souza, "Detection and identification of abnormalities in customer consumptions in power distribution systems," IEEE Transactions on Power Delivery, vol. 26, no. 4, pp. 2436–2442, 2011.

[17] K. A. P. Costa, L. A. M. Pereira, R. Y. M. Nakamura, C. R. Pereira, J. P. Papa, and A. Xavier Falcão, "A nature-inspired approach to speed up optimum-path forest clustering and its application to intrusion detection in computer networks," Information Sciences, vol. 294, pp. 95–108, 2015.

[18] R. R. Bhat, R. D. Trevizan, X. Li, and A. Bretas, "Identifying nontechnical power loss via spatial and temporal deep learning," in Proceedings of the IEEE International Conference on Machine Learning and Applications, Anaheim, CA, USA, December 2016.

[19] M. Ismail, M. Shahin, M. F. Shaaban, E. Serpedin, and K. Qaraqe, "Efficient detection of electricity theft cyber attacks in AMI networks," in Proceedings of the IEEE Wireless Communications and Networking Conference, Barcelona, Spain, April 2018.

[20] R. Mehrizi, X. Peng, X. Xu, S. Zhang, D. Metaxas, and K. Li, "A computer vision based method for 3D posture estimation of symmetrical lifting," Journal of Biomechanics, vol. 69, no. 1, pp. 40–46, 2018.

[21] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE Conference on Computer for Sustainable Global Development, pp. 1026–1034, New Delhi, India, March 2015.

[22] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: an astounding baseline for recognition," Astrophysical Journal Supplement Series, vol. 221, no. 1, pp. 806–813, 2014.

[23] Z. Zheng, Y. Yang, X. Niu, H.-N. Dai, and Y. Zhou, "Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1606–1615, 2018.

[24] D. Svozil, V. Kvasnicka, and J. Pospichal, "Introduction to multi-layer feed-forward neural networks," Chemometrics and Intelligent Laboratory Systems, vol. 39, no. 1, pp. 43–62, 1997.

[25] Irish Social Science Data Archive, http://www.ucd.ie/issda/data/commissionforenergyregulationcer.

[26] P. Jokar, N. Arianpoo, and V. C. M. Leung, "Electricity theft detection in AMI using customers' consumption patterns," IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 216–226, 2016.

[27] Y. Wang, Q. Chen, D. Gan, J. Yang, D. S. Kirschen, and C. Kang, "Deep learning-based socio-demographic information identification from smart meter data," IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 2593–2602, 2019.

[28] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, 2009.

[29] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016), pp. 7–10, Boston, MA, USA, September 2016.

[30] E. Feczko, N. M. Balba, O. Miranda-Dominguez et al., "Subtyping cognitive profiles in autism spectrum disorder using a functional random forest algorithm," NeuroImage, vol. 172, pp. 674–688, 2018.

[31] Y.-L. Boureau, J. Ponce, and Y. LeCun, "A theoretical analysis of feature pooling in visual recognition," in Proceedings of the 27th International Conference on Machine Learning, pp. 111–118, Haifa, Israel, June 2010.

[32] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[33] M. Abadi, A. Agarwal, P. Barham et al., "TensorFlow: large-scale machine learning on heterogeneous distributed systems," 2016, https://arxiv.org/abs/1603.04467.

[34] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, no. 4, pp. 2825–2830, 2011.

[35] J. R. Schofield, "Low Carbon London project: data from the dynamic time-of-use electricity pricing trial," 2017, https://discover.ukdataservice.ac.uk/catalogue/?sn=7857.



[13] S Amin G A Schwartz A A Cardenas and S S SastryldquoGame-theoretic models of electricity theft detection in smartutility networks providing new capabilities with advanced

Journal of Electrical and Computer Engineering 11

metering infrastructurerdquo IEEE Control Systems Magazinevol 35 no 1 pp 66ndash81 2015

[14] A H Nizar Z Y Dong Y Wang and A N Souza ldquoPowerutility nontechnical loss analysis with extreme learning ma-chine methodrdquo IEEE Transactions on Power Systems vol 23no 3 pp 946ndash955 2008

[15] J Nagi K S Yap S K Tiong S K Ahmed andMMohamadldquoNontechnical loss detection for metered customers in powerutility using support vector machinesrdquo IEEE Transactions onPower Delivery vol 25 no 2 pp 1162ndash1171 2010

[16] E W S Angelos O R Saavedra O A C Cortes andA N de Souza ldquoDetection and identification of abnormalitiesin customer consumptions in power distribution systemsrdquoIEEE Transactions on Power Delivery vol 26 no 4pp 2436ndash2442 2011

[17] K A P Costa L A M Pereira R Y M NakamuraC R Pereira J P Papa and A Xavier Falcatildeo ldquoA nature-inspired approach to speed up optimum-path forest clusteringand its application to intrusion detection in computer net-worksrdquo Information Sciences vol 294 pp 95ndash108 2015

[18] R R Bhat R D Trevizan X Li and A Bretas ldquoIdentifyingnontechnical power loss via spatial and temporal deeplearningrdquo in Proceedings of the IEEE International Conferenceon Machine Learning and Applications Anaheim CA USADecember 2016

[19] M Ismail M Shahin M F Shaaban E Serpedin andK Qaraqe ldquoEfficient detection of electricity theft cyber attacksin ami networksrdquo in Proceedings of the IEEE Wireless Com-munications and Networking Conference Barcelona SpainApril 2018

[20] RMehrizi X Peng X Xu S Zhang DMetaxas and K Li ldquoAcomputer vision based method for 3D posture estimation ofsymmetrical liftingrdquo Journal of Biomechanics vol 69 no 1pp 40ndash46 2018

[21] K He X Zhang S Ren and J Sun ldquoDelving deep intorectifiers surpassing human-level performance on imagenetclassificationrdquo in Proceedings of the IEEE Conference onComputer for Sustainable Global Development pp 1026ndash1034New Delhi India March 2015

[22] A Sharif Razavian H Azizpour J Sullivan and S CarlssonldquoCNN features off-the-shelf an astounding baseline for rec-ognitionrdquo Astrophysical Journal Supplement Series vol 221no 1 pp 806ndash813 2014

[23] Z Zheng Y Yang X Niu H-N Dai and Y Zhou ldquoWide anddeep convolutional neural networks for electricity-theft de-tection to secure smart gridsrdquo IEEE Transactions on IndustrialInformatics vol 14 no 4 pp 1606ndash1615 2018

[24] D Svozil V Kvasnicka and J Pospichal ldquoIntroduction tomulti-layer feed-forward neural networksrdquo Chemometricsand Intelligent Laboratory Systems vol 39 no 1 pp 43ndash621997

[25] Irish social science data archive httpwwwucdieissdadatacommissionforenegryregulationcer

[26] P Jokar N Arianpoo and V C M Leung ldquoElectricity theftdetection in AMI using customersrsquo consumption patternsrdquoIEEE Transactions on Smart Grid vol 7 no 1 pp 216ndash2262016

[27] Y Wang Q Chen D Gan J Yang D S Kirschen andC Kang ldquoDeep learning-based socio-demographic in-formation identification from smart meter datafication fromsmart meter datardquo IEEE Transactions on Smart Grid vol 10no 3 pp 2593ndash2602 2019

[28] V Chandola A Banerjee and V Kumar ldquoAnomaly de-tection a surveyrdquo ACM Computing Surveys vol 41 no 32009

[29] H-T Cheng L Koc J Harmsen et al ldquoWide amp deep learningfor recommender systemsrdquo in Proceedings of the 1stWorkshopon Deep Learning for Recommender SystemsmdashDLRS 2016pp 7ndash10 Boston MA USA September 2016

[30] E Feczko N M Balba O Miranda-Dominguez et alldquoSubtyping cognitive profiles in autism spectrum disorderusing a functional random forest algorithmfiles in autismspectrum disorder using a functional random forest algo-rithmrdquo NeuroImage vol 172 pp 674ndash688 2018

[31] Y-L Boureau J Ponce and Y LeCun ldquoA theoretical analysisof feature pooling in visual recognitionrdquo in Proceedings of the27th International Conference on Machine Learningpp 111ndash118 Haifa Israel June 2010

[32] N Srivastava G Hinton A Krizhevsky I Sutskever andR Salakhutdinov ldquoDropout a simply way to prevent neuralnetworks from overfittingrdquo Ce Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

[33] M Abadi A Agarwal P Barham et al ldquoTensorflow Large-scale machine learning on heterogeneous distributed sys-temsrdquo 2016 httpsarxivorgabs16030446

[34] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 no 4 pp 2825ndash2830 2011

[35] J R Schofield ldquoLow carbon london project data from thedynamic time-of-use electricity pricing trialrdquo 2017 httpsdiscoverukdataserviceacukcataloguesn7857

12 Journal of Electrical and Computer Engineering

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

(1) Data cleaning: there are erroneous values (e.g., outliers) in the raw data, which are restored as follows:

\[
F\left(x_{i}^{t}\right)=
\begin{cases}
\operatorname{avg}\left(x_{i}^{t}\right)+2\sigma\left(x_{i}^{t}\right), & x_{i}^{t}>x_{i}^{t\ast},\\
x_{i}^{t}, & \text{otherwise},
\end{cases}
\tag{1}
\]

where the threshold $x_{i}^{t\ast}$ is computed from the mean $\operatorname{avg}(\cdot)$ and standard deviation $\sigma(\cdot)$ of each time interval comprising the weekday/time pair for each month.

(2) Missing value imputation: because of various reasons, such as storage issues and failures of smart meters, there are missing values in the electricity consumption data. Through the analysis of the original data, two kinds of missing data are found: one is a continuous run of multiple missing values, and the solution is to delete a user when the number of missing values exceeds 10; the other is a single missing value, which is processed by formula (2). Thus, missing values can be recovered as follows:

\[
F\left(x_{i}^{t}\right)=
\begin{cases}
\dfrac{x_{i}^{t-1}+x_{i}^{t+1}}{2}, & x_{i}^{t}\in \mathrm{NaN},\\
x_{i}^{t}, & \text{otherwise},
\end{cases}
\tag{2}
\]

where $x_{i}^{t}$ stands for the electricity usage of consumer $i$ over a period (e.g., an hour); if $x_{i}^{t}$ is null, it is represented as NaN.

(3) Data normalization: finally, the data need to be normalized because the neural network is sensitive to diverse data scales. One of the common methods for this purpose is min-max normalization, which is computed as

\[
F\left(x_{i}^{t}\right)=\frac{x_{i}^{t}-\min\left(x_{i}^{T}\right)}{\max\left(x_{i}^{T}\right)-\min\left(x_{i}^{T}\right)},
\tag{3}
\]

where $\min(\cdot)$ and $\max(\cdot)$ represent the minimum and maximum values over a day, respectively.

4. Generation of Train and Test Datasets

After data preprocessing, the SEAI dataset contains the electricity consumption data of 4737 normal electricity customers and 1200 malicious electricity customers over 525 days (with 24 hourly readings per day).

Our preliminary analytical results (refer to Section 3.1) reveal that consumer load profiles are affected not only by weather conditions but also by the types of consumers and other factors. However, it is difficult to extract features based on experience from the 1D electricity consumption data, since the daily electricity consumption fluctuates in a relatively independent way. Motivated by the previous work in [29], we design a deep CNN to process the electricity consumption data in a 2D manner. In particular, we transform the 1D electricity consumption data into 2D data according to days. We define the 2D data as a matrix of actual energy consumption values for a specific customer, where the rows represent the days D (D = 1, ..., 525) and the columns represent the time periods T (T = 1, ..., 24). It is worth mentioning that the CNN learns the features between different hours of the day and between different days from the 2D data.
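As a minimal illustration of this reshaping, a single customer's hourly series can be turned into the 525 × 24 matrix as follows (synthetic values; the day-major ordering of the series is an assumption):

```python
import numpy as np

days, hours = 525, 24
series = np.arange(days * hours, dtype=float)   # stand-in for one customer's hourly readings
load_matrix = series.reshape(days, hours)       # rows = days (D = 1..525), columns = hours (T = 1..24)
print(load_matrix.shape)                        # (525, 24)
```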

Then we divide the dataset into train and test sets using the cross-validation method, in which 80% of the data form the train set and 20% form the test set. Given that nonfraudulent consumers remarkably outnumber electricity theft consumers, the imbalanced nature of the dataset can have a major negative impact on the performance of supervised machine learning methods. To reduce this bias, the SMOTE algorithm is used to make the numbers of normal and abnormal samples in the train set equal. Finally, the train dataset contains 7484 customers, in which the numbers of normal and abnormal samples are equal, and the test dataset consists of 1669 customers.
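A hedged sketch of this split-then-oversample step is shown below using scikit-learn and the imbalanced-learn SMOTE implementation; the tiny synthetic arrays merely stand in for the real flattened 525 × 24 load matrices of the 5937 customers.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Small synthetic stand-in (1 = normal, 0 = theft, theft in the minority).
rng = np.random.default_rng(0)
X = rng.random((600, 48))
y = np.array([1] * 480 + [0] * 120)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Oversample the minority (theft) class on the training split only, as described above.
X_train_bal, y_train_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)
print(np.bincount(y_train), "->", np.bincount(y_train_bal))
```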

5. The Novel CNN-RF Algorithm

In this section, the design of the CNN-RF structure is presented in detail, and some techniques are proposed to reduce overfitting. Moreover, the correlative parameters, including the number of filters and the size of each feature map and kernel, are given. Finally, the training process of the proposed CNN-RF algorithm is introduced.

5.1. CNN-RF Architecture. The proposed CNN-RF model is designed by integrating CNN and RF classifiers. As shown in Figure 6, the architecture of the CNN-RF mainly includes an automatic feature extractor and a trainable RF classifier. The feature extractor (a CNN) consists of convolutional layers, downsampling layers, and a fully connected layer. The CNN is a deep supervised learning architecture that usually includes multiple layers and can be trained with the backpropagation algorithm; it can also explore the intricate distribution of smart meter data by performing the stochastic gradient method. The RF classifier consists of a combination of tree classifiers, in which each tree contributes a single vote to the assignment of the most frequent class for the input data [30]. In the following, the design of each part of the CNN-RF structure is presented in detail.

Figure 5: Four seasons' load profiles for a customer (usage in kW·day over 100 days for spring, summer, autumn, and winter).

5.1.1. Convolutional Layer. The main purpose of the convolutional layers is to learn feature representations of the input data and reduce the influence of noise. The convolutional layer is composed of several feature filters, which are used to compute different feature maps. Specifically, to reduce the number of network parameters and lower the complexity of the relevant layers during the process of generating feature maps, the weights of each kernel in each feature map are shared. Moreover, each neuron in the convolutional layer connects to a local region of the previous layer, and then a ReLU activation function is applied to the convolved results. Mathematically, the outputs of the convolutional layers can be expressed as

\[
y_{\mathrm{conv}}\left(X_{l}^{f_{l}}\right)=\delta\left(\sum_{f_{l}=1}^{F_{l}} W_{l}^{f_{l}}\ast X_{l}^{f_{l}}+b_{l}^{f_{l}}\right),
\tag{4}
\]

where $\delta$ is the activation function, $\ast$ is the convolution operation, and both $W_{l}^{f_{l}}$ and $b_{l}^{f_{l}}$ are the learnable parameters of the $f$-th feature filter.

5.1.2. Downsampling Layer. Downsampling is an important concept of the CNN, and the downsampling layer is usually placed between two convolutional layers. It can reduce the number of parameters and achieve dimensionality reduction. In particular, each feature map of the pooling layer is connected to its corresponding feature map of the previous convolutional layer, and the size of the output feature maps is reduced without reducing their number. In the literature, downsampling operations mainly include max-pooling and average-pooling. The max-pooling operation transforms small windows into single values by taking the maximum, as expressed in formula (5), whereas average-pooling returns the average value of the activations in a small window. In [31], experiments show that max-pooling performs better than average-pooling; therefore, the max-pooling operation is usually implemented in the downsampling layer:

\[
y_{\mathrm{pool}}\left(X_{l}^{f_{l}}\right)=\max_{m\in M}\left\{X_{l}^{m}\right\},
\tag{5}
\]

where $M$ is the set of activations in the pooling window and $m$ is the index of an activation in the pooling window.

5.1.3. Fully Connected Layer. With the features extracted by sequential convolution and pooling, a fully connected layer is applied to flatten the feature maps into one vector as follows:

\[
y_{\mathrm{fl}}\left(X_{l}\right)=\delta\left(W_{l}\cdot X_{l}+b_{l}\right),
\tag{6}
\]

where $W_{l}$ is the weight and $b_{l}$ is the bias of the $l$-th layer.

5.1.4. RF Classifier Layer. Traditionally, a softmax classifier is used for the last output layer of a CNN, but the proposed model uses the RF classifier to predict the class based on the obtained features. Therefore, the RF classifier layer can be defined as

\[
y_{\mathrm{out}}\left(X_{L}\right)=\operatorname{sigm}\left(W_{\mathrm{rf}}\cdot X_{l}+b_{l}\right),
\tag{7}
\]

where $\operatorname{sigm}(\cdot)$ is the sigmoid function, which maps abnormal values to 0 and normal values to 1. The parameter set $W_{\mathrm{rf}}$ in the RF layer includes the number of decision trees and the maximum depth of each tree, which are obtained by the grid search algorithm.

5.2. Techniques for Selecting Parameters and Avoiding Overfitting. The detailed parameters of the proposed CNN-RF structure are summarized in Table 1, which include the number of filters in each layer, the filter size, and the stride. Two factors are considered in determining the CNN structure. The first factor is the characteristics of consumers' electricity consumption behaviours: since the load profiles are highly variable, two convolutional layers with a filter size of 3 × 3 (stride 1) are applied to capture hidden features for detecting electricity theft. Moreover, because the input data size is 525 × 24, which is similar to what is used in image recognition problems, two max-pooling layers with a filter size of 2 × 2 (stride 2) are used. The second factor is the number of training samples: since the number of samples is limited, to reduce the risk of overfitting we design a fully connected layer with a dropout rate of 0.4, whose main idea is that dropping units randomly from the neural network during training can prevent units from coadapting too much [32] and makes a neuron not rely on the presence of other specific neurons. Applying an appropriate training method is also useful for reducing overfitting: it is essentially a regularization that adds a penalty to the weight update at each iteration. We also use binary cross entropy as the loss function. Finally, the grid search algorithm is used to optimize the RF classifier parameters, such as the maximum number of decision trees and features.

Figure 6: Architecture of the hybrid CNN-RF model (525 × 24 input, two convolutional layers, two downsampling layers, a fully connected layer, and a random forest layer).
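A minimal Keras sketch of this CNN part is given below. It follows Table 1 (two 3 × 3 convolutions with 32 and 64 filters, two 2 × 2 max-poolings, and a fully connected layer with a dropout rate of 0.4); details the paper does not state explicitly, such as the 132-unit dense layer read off Figure 6 and the ReLU/padding choices, are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(525, 24, 1), dropout_rate=0.4):
    return models.Sequential([
        layers.Conv2D(32, (3, 3), strides=1, padding="same", activation="relu",
                      input_shape=input_shape),                # Convolution1: 3x3, 32 filters, stride 1
        layers.MaxPooling2D((2, 2), strides=2),                 # MaxPooling1: 2x2, stride 2
        layers.Conv2D(64, (3, 3), strides=1, padding="same",
                      activation="relu"),                       # Convolution2: 3x3, 64 filters, stride 1
        layers.MaxPooling2D((2, 2), strides=2),                 # MaxPooling2: 2x2, stride 2
        layers.Flatten(),
        layers.Dense(132, activation="relu"),                   # fully connected feature layer (size assumed)
        layers.Dropout(dropout_rate),                           # dropout rate of 0.4 against overfitting
        layers.Dense(1, activation="sigmoid"),                  # output used only while pre-training the CNN
    ])

model = build_cnn()
model.summary()
```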

5.3. Training Process of CNN-RF. The CNN-RF structure needs to train the CNN first, before the RF classifier is invoked. After that, the RF classifier is trained based on the features learned by the CNN.

5.3.1. Training Process of the CNN. With a batch size (the number of training samples in an iteration) of 100, the CNN is trained using the forward propagation and backpropagation algorithms. As shown in Figure 6, the data in the input layer are first transferred through the convolutional layers, the pooling layers, and a fully connected layer, and the predicted value is obtained. As shown in Figure 7, the backpropagation phase then updates the parameters if the difference between the output value and the target value is too large, i.e., exceeds a certain threshold. The difference between the actual output of the neural network and the expected output determines the adjustment direction and step size of the weights between layers in the network.

Given a sample set D = {(x_1, y_1), ..., (x_m, y_m)} with m samples in total, the prediction ŷ is obtained at first by using the feed-forward process. For all samples, the mean of the difference between the actual output y_i and the expected output ŷ_{w,b}(x_i) can be expressed by

\[
J(W,b)=\frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left(\hat{y}_{w,b}\left(x_{i}\right)-y_{i}\right)^{2},
\tag{8}
\]

where $w$ represents the connection weights between the layers of the network and $b$ is the corresponding bias.

Then each parameter is initialized with a random value generated by a normal distribution with mean 0 and variance $\vartheta$, and the parameters are updated with the gradient descent method. The corresponding expressions are as follows:

\[
W_{ij}^{(l)}=W_{ij}^{(l)}-\alpha\frac{\partial}{\partial W_{ij}^{(l)}}J(W,b),
\tag{9}
\]

\[
b_{i}^{(l)}=b_{i}^{(l)}-\alpha\frac{\partial}{\partial b_{i}^{(l)}}J(W,b),
\tag{10}
\]

where $\alpha$ represents the learning rate, $W_{ij}^{(l)}$ represents the connection weight between the $i$-th neuron in the $l$-th layer and the $j$-th neuron in the $(l+1)$-th layer, and $b_{i}^{(l)}$ represents the bias of the $i$-th neuron in the $l$-th layer.

Finally, all parameters W and b in the network are updated according to formulas (9) and (10), and all these steps are repeated to reduce the objective function J(W, b).
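Under the same assumptions as before, the pre-training step described by formulas (8)–(10) can be sketched with Keras' built-in gradient descent loop (batch size 100, 300 epochs, binary cross entropy). The optimizer, learning rate, and the placeholder names X_train, y_train, X_test, y_test are assumptions, and build_cnn() is the sketch from Section 5.2.

```python
import tensorflow as tf

model = build_cnn()
# Gradient descent on the binary cross entropy loss (the paper's loss function);
# the SGD optimiser and learning rate are assumptions.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="binary_crossentropy",
              metrics=["accuracy"])

# X_train / y_train: 525 x 24 x 1 load matrices and labels prepared in Section 4 (assumed names).
history = model.fit(X_train, y_train,
                    batch_size=100, epochs=300,
                    validation_data=(X_test, y_test))
```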

5.3.2. Training Process of the RF Classifier. The RF classifier takes advantage of two powerful machine learning techniques: bagging and random feature selection. Bagging is a method to randomly generate a particular bootstrap sample from the learnt features for growing each tree. Instead of using all features, the RF randomly selects a subset of features to determine the best splitting points, which ensures complete splitting to one class at the leaf nodes of the decision trees. Specifically, the whole process of RF classification is written as Algorithm 1.
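Continuing the sketches above, the hybrid step can be illustrated by reusing the trained CNN as a feature extractor and fitting a scikit-learn random forest on the fully connected layer's activations. The layer index, the variable names, and the RF settings (n_estimators = 100, max_depth = 30, as reported in Section 6.2) are assumptions rather than the authors' code.

```python
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

# "model", "X_train", "X_test", "y_train" continue the training sketch above (assumed names).
feature_extractor = tf.keras.Model(inputs=model.input,
                                   outputs=model.layers[-3].output)  # dense layer before dropout

train_features = feature_extractor.predict(X_train)   # the CNN acts as the automatic feature extractor
test_features = feature_extractor.predict(X_test)

rf = RandomForestClassifier(n_estimators=100, max_depth=30, random_state=0)
rf.fit(train_features, y_train)        # bagging and random feature selection happen inside the forest
y_pred = rf.predict(test_features)     # final normal/theft prediction of the hybrid model
```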

6. Experiments and Result Analysis

All the experiments are implemented using Python 3.6 on a standard PC with an Intel Core i5-7500MQ CPU running at 3.40 GHz and with 8.0 GB of RAM. The CNN architecture is constructed based on TensorFlow [33], and the interface between the CNN and the RF is programmed using scikit-learn [34].

6.1. Performance Metrics. In this paper, the problem of electricity theft detection is considered a discrete two-class classification task, which assigns each consumer to one of the predefined classes (abnormal or normal).

In particular, the result of classifier validation is often presented as a confusion matrix. Table 2 shows the confusion matrix applied in electricity theft detection, where TP, FN, FP, and TN, respectively, represent the number of consumers classified correctly as normal, falsely as abnormal, falsely as normal, and correctly as abnormal. Although these four indices contain all of the information about the performance of the classifier, more meaningful performance measures can be extracted from them to illustrate certain performance criteria:

\[
\mathrm{TPR}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}},\quad
\mathrm{FPR}=\frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}},\quad
\mathrm{precision}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}},\quad
\mathrm{recall}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}},\quad
\mathrm{F1}=\frac{2\times \mathrm{precision}\times \mathrm{recall}}{\mathrm{precision}+\mathrm{recall}},
\]

where the true positive rate (TPR), also referred to as sensitivity, is the ability of the classifier to correctly identify a consumer as normal, and the false positive rate (FPR) is the risk of positively identifying an abnormal consumer. Precision is the ratio of consumers detected correctly to the total number of detected normal consumers. Recall is the ratio of the number of consumers correctly detected to the actual number of normal consumers.
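For concreteness, the metrics above can be computed with scikit-learn as in the toy example below (labels follow Tables 2 and 3: 1 = normal, 0 = abnormal); the numbers are purely illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, roc_auc_score

y_true  = np.array([1, 1, 1, 0, 0, 1, 0, 1, 0, 1])                     # ground truth
y_pred  = np.array([1, 1, 0, 0, 0, 1, 1, 1, 0, 1])                     # hard predictions
y_score = np.array([0.9, 0.8, 0.4, 0.2, 0.1, 0.7, 0.6, 0.95, 0.3, 0.85])  # predicted probability of class 1

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print("TPR", tp / (tp + fn), "FPR", fp / (fp + tn))
print("precision", precision_score(y_true, y_pred))
print("recall", recall_score(y_true, y_pred))
print("F1", f1_score(y_true, y_pred))
print("AUC", roc_auc_score(y_true, y_score))
```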

In addition, the ROC curve is obtained by plotting the TPR on the y-axis against the corresponding FPR on the x-axis for all available thresholds. A ROC curve towards the upper left indicates better detection, since lower FPR values are achieved at the same TPR. The AUC indicates the quality of the classifier, and a larger AUC represents better performance.

Table 1: Parameters of the proposed CNN-RF model.
Convolution1: filter size 3 × 3, 32 filters, stride 1.
MaxPooling1: filter size 2 × 2, stride 2.
Convolution2: filter size 3 × 3, 64 filters, stride 1.
MaxPooling2: filter size 2 × 2, stride 2.

6.2. Experiment Based on the CNN-RF Model. The experiment is conducted based on the normalized consumption data. With a batch size of 100, the training procedure is stopped after 300 epochs. As shown in Figure 8, the training and testing losses settle down smoothly, which means that the proposed model learned the train dataset and showed only a small bias on the test dataset as well. Then the RF classifier, as the last layer of the CNN, predicts the labels of the input consumers. Forty values from the fully connected layer of the trained CNN are used as a new feature vector to detect abnormal users and are fed to the RF for learning and testing. To build the RF in the hybrid model, the optimal parameters are determined by applying the grid search algorithm based on the training dataset. The grid searching range of each parameter is given as follows: numTrees ∈ [50, 60, ..., 100] and maxDepth ∈ [10, 20, ..., 40], so 6 × 4 = 24 combinations are tried.

Algorithm 1: Random forest algorithm.
Require:
(1) Original training dataset S = {(x_i, y_i), i = 1, 2, 3, ..., m}, (X, Y) ∈ R^e × R.
(2) Test samples x_j ∈ R^m.
for d = 1 to Ntree do
  (1) Draw a bootstrap sample S_d from the original training data.
  (2) Grow an unpruned tree h_d using data S_d:
    (a) Randomly select a new feature set M_try from the original feature set e.
    (b) Select the best feature from M_try based on the Gini indicator at each node.
    (c) Split until each tree grows to its maximum.
end for
Ensure:
(1) The collection of trees {h_d, d = 1, 2, ..., Ntree}.
(2) For x_j, the output of a single decision tree is h_d(x_j), and the forest output is
f(x_j) = majority vote {h_d(x_j)}, d = 1, ..., Ntree.
return f(x_j).

Figure 7: Process of forward propagation (a) and backpropagation (b).

Figure 8: Loss curves of training and testing over 300 epochs.

Table 2: Confusion matrix applied in electricity theft detection.
Actual normal: TP (true positive) if detected as normal, FN (false negative) if detected as abnormal.
Actual abnormal: FP (false positive) if detected as normal, TN (true negative) if detected as abnormal.


The best result is achieved when numTrees = 100 and maxDepth = 30; these parameters are then used to train the hybrid model.
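A hedged sketch of this grid search with scikit-learn's GridSearchCV is shown below; the scoring metric, the number of cross-validation folds, and the synthetic stand-in data are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# train_features / y_train stand for the 40-dimensional CNN feature vectors and labels (assumed names);
# random data is used here only so the snippet runs on its own.
train_features = np.random.rand(400, 40)
y_train = np.random.randint(0, 2, 400)

param_grid = {"n_estimators": [50, 60, 70, 80, 90, 100],   # numTrees in [50, 60, ..., 100]
              "max_depth": [10, 20, 30, 40]}               # maxDepth in [10, 20, ..., 40]
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, scoring="f1", cv=5)
search.fit(train_features, y_train)
print(search.best_params_)   # the paper reports numTrees = 100, maxDepth = 30 as the best combination
```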

The classification results of the proposed detection model are as follows: the numbers of TPs, FNs, FPs, and TNs are 1041, 28, 23, and 577, respectively. Therefore, the precision, recall, and F1 score can be calculated as shown in Table 3, where Class 0 is the abnormal pattern and Class 1 is the normal pattern. Also, the AUC value of the CNN-RF algorithm is calculated as 0.98, far better than that of a random baseline (AUC = 0.5). This means the proposed algorithm is able to classify both classes accurately, as shown in Figure 9.

6.3. Comparative Experiments and Analysis of Results. To evaluate the accuracy of the CNN-RF model, comparative experiments are conducted based on handcrafted features with nondeep learning methods, including SVM, RF, gradient boosting decision tree (GBDT), and logistic regression (LR). Moreover, we compare the classification results obtained by various supervised classifiers, namely CNN features with the SVM classifier (CNN-SVM) and CNN features with the GBDT classifier (CNN-GBDT), to those obtained by the previous classification work. In the following, the five methods are introduced, and then the results are analysed.

(1) Logistic regression: it is a basic model for binary classification, equivalent to a one-layer neural network with a sigmoid activation function. The sigmoid function yields a value between 0 and 1 based on linear regression; any value larger than 0.5 is classified as a normal pattern, and any value smaller than 0.5 is classified as an abnormal pattern.

(2) Support vector machine: this classifier with kernel functions can transform a nonlinearly separable problem into a linearly separable one by projecting the data into a feature space and then finding the optimal separating hyperplane. It is applied to predict the patterns of consumers based on the aforementioned handcrafted features.

(3) Random forest: it is essentially an ensemble of multiple decision trees that can achieve better performance than a single decision tree while maintaining effective control of overfitting. The RF classifier can also handle high-dimensional data while maintaining high computational efficiency.

(4) Gradient boosting decision tree: it is an iterative decision tree algorithm that consists of multiple decision trees. For the final output, all results or weights are accumulated in the GBDT, whereas random forests use majority voting.

(5) Deep learning methods: the CNN (with softmax in its last layer), CNN-GBDT, and CNN-SVM are used in comparison with the proposed method.

Table 4 summarizes the parameters of the comparative methods. After all experiments have been completed, Figure 10 shows the results of the different methods and the proposed hybrid CNN-RF method: the AUC values of CNN-RF, CNN-GBDT, CNN-SVM, CNN, SVM, RF, LR, and GBDT are 0.99, 0.97, 0.98, 0.93, 0.77, 0.91, 0.63, and 0.77, respectively. Figure 11 presents the results of all comparative experiments in terms of precision, recall, and F1 score. Among the performances of the eight different electricity theft detection algorithms, the deep learning methods (CNN, CNN-RF, CNN-SVM, and CNN-GBDT) show better performance than the machine learning methods (LR, SVM, GBDT, and RF). The reason is that the CNN can learn features from a large amount of electricity consumption data; thus, the accuracy of electricity theft detection is greatly improved. In addition, the proposed CNN-RF model shows the best performance compared with the other methods.

Table 3: Classification score summary for CNN-RF.
Class 0 (abnormal): precision 0.97, recall 0.96, F1 score 0.96.
Class 1 (normal): precision 0.98, recall 0.98, F1 score 0.98.
Average/total: precision 0.97, recall 0.97, F1 score 0.97.

Figure 9: ROC curve for CNN-RF (AUC = 0.99).

Table 4: Experimental parameters for the comparative methods.
LR (features, 1D): penalty L2; inverse of regularization strength 1.0.
SVM (features, 1D): penalty parameter of the error term 105; parameter of the kernel function (RBF) 0.0005.
GBDT (features, 1D): number of estimators 200.
RF (features, 1D): number of trees in the forest 100; maximum depth of each tree 30.
CNN (raw data, 2D): the same as the CNN component of the proposed approach.


Since detecting electricity theft is achieved by discovering the anomalous consumption behaviours of suspicious customers, it can essentially be regarded as an anomaly detection problem. Since CNN-RF has shown superior performance on the SEAI dataset, it is interesting to test its generalization ability on other datasets. In particular, we further tested the performance of CNN-RF, together with the above-mentioned models, on another dataset, the Low Carbon London (LCL) dataset [35]. The LCL dataset is composed of over 5500 honest electricity consumers in total, each of which has 525 days of data (the sampling rate is one hour). Among the 5500 customers, we randomly chose 1200 as malicious customers and modified their intraday load profiles according to fraudulent sample generation methods similar to those used for the SEAI dataset.

After preprocessing the LCL dataset (in the same way as the SEAI dataset), the train dataset contains 7584 customers, in which the numbers of normal and abnormal samples are equal, and the test dataset consists of 1700 customers together with corresponding labels indicating anomaly or not. Then the training of the proposed CNN-RF model and of the comparative models is all done following the aforementioned procedure based on the train dataset.

Figure 11: Comparison of precision, recall, and F1 score with other algorithms based on the SEAI dataset.

Figure 10: ROC curves for eight different algorithms based on the SEAI dataset (AUC: CNN = 0.93, CNN-RF = 0.99, CNN-GBDT = 0.97, CNN-SVM = 0.98, GBDT = 0.77, LR = 0.63, RF = 0.91, SVM = 0.77).

Figure 12: ROC curves for eight different algorithms based on the LCL dataset (AUC: CNN = 0.95, CNN-RF = 0.97, CNN-GBDT = 0.94, CNN-SVM = 0.96, GBDT = 0.74, LR = 0.68, RF = 0.77, SVM = 0.71).

Figure 13: Comparison of precision, recall, and F1 score with other algorithms based on the LCL dataset.


The performances of all the models on the test dataset are shown in Figures 12 and 13.

Figures 12 and 13 show that the performances of CNN-SVM, CNN-GBDT, and CNN-RF are better than those of the other models on the whole. Since they are deep learning models, distinguishable features can be learned directly from raw smart meter data by the CNN, which equips them with better generalization ability. In addition, the CNN-RF model achieves the best performance, since the RF can handle high-dimensional data while maintaining high computational efficiency.

7. Conclusions

In this paper, a novel CNN-RF model is presented to detect electricity theft. In this model, the CNN acts as an automatic feature extractor for investigating smart meter data, and the RF is the output classifier. Because a large number of parameters must be optimized, which increases the risk of overfitting, a fully connected layer with a dropout rate of 0.4 is designed for the training phase. In addition, the SMOTE algorithm is adopted to overcome the problem of data imbalance. Some machine learning and deep learning methods, such as SVM, RF, GBDT, and LR, are applied to the same problem as benchmarks, and all those methods have been evaluated on the SEAI and LCL datasets. The results indicate that the proposed CNN-RF model is quite a promising classification method in the electricity theft detection field because of two properties. The first is that features can be automatically extracted by the hybrid model, while the success of most other traditional classifiers relies largely on the retrieval of good hand-designed features, which is a laborious and time-consuming task. The second lies in that the hybrid model combines the advantages of the RF and the CNN, as both are among the most popular and successful classifiers in the electricity theft detection field.

Since the detection of electricity theft affects the privacy of consumers, future work will focus on investigating how the granularity and duration of smart meter data might affect this privacy. Extending the proposed hybrid CNN-RF model to other applications (e.g., load forecasting) is also a task worth investigating.

Data Availability

Previously reported SEAI and LCL datasets were used to support this study and are available at http://www.ucd.ie/issda/data and https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=7857. These prior studies (and datasets) are cited at relevant places within the text as references [19, 26].

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (2016YFB0901900), the Natural Science Foundation of Hebei Province of China (F2017501107), the Open Research Fund from the State Key Laboratory of Rolling and Automation, Northeastern University (2017RALKFKT003), the Foundation of Northeastern University at Qinhuangdao (XNB201803), and the Fundamental Research Funds for the Central Universities (N182303037).

References

[1] S. S. S. R. Depuru, L. Wang, and V. Devabhaktuni, "Electricity theft: overview, issues, prevention and a smart meter based approach to control theft," Energy Policy, vol. 39, no. 2, pp. 1007–1015, 2011.

[2] J. P. Navani, N. K. Sharma, and S. Sapra, "Technical and non-technical losses in power system and its economic consequence in Indian economy," International Journal of Electronics and Computer Science Engineering, vol. 1, no. 2, pp. 757–761, 2012.

[3] S. McLaughlin, B. Holbert, A. Fawaz, R. Berthier, and S. Zonouz, "A multi-sensor energy theft detection framework for advanced metering infrastructures," IEEE Journal on Selected Areas in Communications, vol. 31, no. 7, pp. 1319–1330, 2013.

[4] P. McDaniel and S. McLaughlin, "Security and privacy challenges in the smart grid," IEEE Security & Privacy Magazine, vol. 7, no. 3, pp. 75–77, 2009.

[5] T. B. Smith, "Electricity theft: a comparative analysis," Energy Policy, vol. 32, no. 1, pp. 2067–2076, 2004.

[6] J. I. Guerrero, C. Leon, I. Monedero, F. Biscarri, and J. Biscarri, "Improving knowledge-based systems with statistical techniques, text mining, and neural networks for non-technical loss detection," Knowledge-Based Systems, vol. 71, no. 4, pp. 376–388, 2014.

[7] C. C. O. Ramos, A. N. Souza, G. Chiachia, A. X. Falcão, and J. P. Papa, "A novel algorithm for feature selection using harmony search and its application for non-technical losses detection," Computers & Electrical Engineering, vol. 37, no. 6, pp. 886–894, 2011.

[8] P. Glauner, J. A. Meira, P. Valtchev, R. State, and F. Bettinger, "The challenge of non-technical loss detection using artificial intelligence: a survey," International Journal of Computational Intelligence Systems, vol. 10, no. 1, pp. 760–775, 2017.

[9] S.-C. Huang, Y.-L. Lo, and C.-N. Lu, "Non-technical loss detection using state estimation and analysis of variance," IEEE Transactions on Power Systems, vol. 28, no. 3, pp. 2959–2966, 2013.

[10] O. Rahmati, H. R. Pourghasemi, and A. M. Melesse, "Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: a case study at Mehran region, Iran," CATENA, vol. 137, pp. 360–372, 2016.

[11] N. Edison, A. C. Aranha, and J. Coelho, "Probabilistic methodology for technical and non-technical losses estimation in distribution system," Electric Power Systems Research, vol. 97, no. 11, pp. 93–99, 2013.

[12] J. B. Leite and J. R. S. Mantovani, "Detecting and locating non-technical losses in modern distribution networks," IEEE Transactions on Smart Grid, vol. 9, no. 2, pp. 1023–1032, 2018.

[13] S. Amin, G. A. Schwartz, A. A. Cardenas, and S. S. Sastry, "Game-theoretic models of electricity theft detection in smart utility networks: providing new capabilities with advanced metering infrastructure," IEEE Control Systems Magazine, vol. 35, no. 1, pp. 66–81, 2015.

[14] A. H. Nizar, Z. Y. Dong, Y. Wang, and A. N. Souza, "Power utility nontechnical loss analysis with extreme learning machine method," IEEE Transactions on Power Systems, vol. 23, no. 3, pp. 946–955, 2008.

[15] J. Nagi, K. S. Yap, S. K. Tiong, S. K. Ahmed, and M. Mohamad, "Nontechnical loss detection for metered customers in power utility using support vector machines," IEEE Transactions on Power Delivery, vol. 25, no. 2, pp. 1162–1171, 2010.

[16] E. W. S. Angelos, O. R. Saavedra, O. A. C. Cortes, and A. N. de Souza, "Detection and identification of abnormalities in customer consumptions in power distribution systems," IEEE Transactions on Power Delivery, vol. 26, no. 4, pp. 2436–2442, 2011.

[17] K. A. P. Costa, L. A. M. Pereira, R. Y. M. Nakamura, C. R. Pereira, J. P. Papa, and A. Xavier Falcão, "A nature-inspired approach to speed up optimum-path forest clustering and its application to intrusion detection in computer networks," Information Sciences, vol. 294, pp. 95–108, 2015.

[18] R. R. Bhat, R. D. Trevizan, X. Li, and A. Bretas, "Identifying nontechnical power loss via spatial and temporal deep learning," in Proceedings of the IEEE International Conference on Machine Learning and Applications, Anaheim, CA, USA, December 2016.

[19] M. Ismail, M. Shahin, M. F. Shaaban, E. Serpedin, and K. Qaraqe, "Efficient detection of electricity theft cyber attacks in AMI networks," in Proceedings of the IEEE Wireless Communications and Networking Conference, Barcelona, Spain, April 2018.

[20] R. Mehrizi, X. Peng, X. Xu, S. Zhang, D. Metaxas, and K. Li, "A computer vision based method for 3D posture estimation of symmetrical lifting," Journal of Biomechanics, vol. 69, no. 1, pp. 40–46, 2018.

[21] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE Conference on Computer for Sustainable Global Development, pp. 1026–1034, New Delhi, India, March 2015.

[22] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: an astounding baseline for recognition," Astrophysical Journal Supplement Series, vol. 221, no. 1, pp. 806–813, 2014.

[23] Z. Zheng, Y. Yang, X. Niu, H.-N. Dai, and Y. Zhou, "Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1606–1615, 2018.

[24] D. Svozil, V. Kvasnicka, and J. Pospichal, "Introduction to multi-layer feed-forward neural networks," Chemometrics and Intelligent Laboratory Systems, vol. 39, no. 1, pp. 43–62, 1997.

[25] Irish Social Science Data Archive, http://www.ucd.ie/issda/data/commissionforenergyregulationcer.

[26] P. Jokar, N. Arianpoo, and V. C. M. Leung, "Electricity theft detection in AMI using customers' consumption patterns," IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 216–226, 2016.

[27] Y. Wang, Q. Chen, D. Gan, J. Yang, D. S. Kirschen, and C. Kang, "Deep learning-based socio-demographic information identification from smart meter data," IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 2593–2602, 2019.

[28] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, 2009.

[29] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016), pp. 7–10, Boston, MA, USA, September 2016.

[30] E. Feczko, N. M. Balba, O. Miranda-Dominguez et al., "Subtyping cognitive profiles in autism spectrum disorder using a functional random forest algorithm," NeuroImage, vol. 172, pp. 674–688, 2018.

[31] Y.-L. Boureau, J. Ponce, and Y. LeCun, "A theoretical analysis of feature pooling in visual recognition," in Proceedings of the 27th International Conference on Machine Learning, pp. 111–118, Haifa, Israel, June 2010.

[32] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[33] M. Abadi, A. Agarwal, P. Barham et al., "TensorFlow: large-scale machine learning on heterogeneous distributed systems," 2016, https://arxiv.org/abs/1603.04467.

[34] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, no. 4, pp. 2825–2830, 2011.

[35] J. R. Schofield, "Low Carbon London project: data from the dynamic time-of-use electricity pricing trial," 2017, https://discover.ukdataservice.ac.uk/catalogue/?sn=7857.

12 Journal of Electrical and Computer Engineering

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 6: ElectricityTheftDetectioninPowerGridswithDeep ...downloads.hindawi.com/journals/jece/2019/4136874.pdf(1)Datacleaning:thereareerroneousvalues(e.g.,outliers) in raw data, which correspond

stochastic gradient method -e RF classifier consists of acombination of tree classifiers in which each tree contrib-utes with a single vote to the assignment of themost frequentclass in input data [30] In the following the design of eachpart of the CNN-RF structure is presented in detail

511 Convolutional Layer -e main purpose of convolu-tional layers is to learn feature representation of input dataand reduce the influence of noise -e convolutional layer iscomposed of several feature filters which are used tocompute different feature maps Specially to reduce theparameters of networks and lower the complication ofrelevant layers during the process of generating featuremaps the weights of each kernel in each feature map areshared Moreover each neuron in the convolutional layerconnects the local region of front layers and then a ReLUactivation function is applied on the convolved resultsMathematically the outputs of the convolutional layers canbe expressed as

yconv Xfl

l1113872 1113873 δ 1113944

Fl

fl1W

fl

l lowastXfl

l + bfl

l⎛⎝ ⎞⎠ (4)

where δ is the activation function lowast is the convolutionoperation and both W

fl

l and bfl

l are the learnable param-eters in the f-th feature filter

512 Downsampling Layer Downsampling is an importantconcept of the CNN which is usually placed between twoconvolutional layers It can reduce the number of parametersand achieve dimensionality reduction In particular eachfeature map of the pooling layer is connected to its corre-sponding feature map of the previous convolutional layerAnd the size of output feature maps is reduced withoutreducing the number of output feature maps in thedownsampling layer In the literature downsampling op-erations mainly include max-pooling and average-pooling-e max-pooling operation transforms small windows intosingle values by maximum which is expressed as formula 5but average-pooling returns average values of activations in asmall window In [31] experiments show that the max-pooling operation shows better performance than average-pooling -erefore the max-pooling operation is usuallyimplemented in the downsampling layer

ypool Xfl

l1113872 1113873 maxmisinM

Xlm1113966 1113967 (5)

where M is a set of activations in the pooling window andmis the index of activations in the pooling window

513 Fully Connected Layer With the features extracted bysequential convolution and pooling a fully connected layeris applied for flattening feature maps into one vector asfollows

yflXl( 1113857 δ Wl middot Xl + bl( 1113857 (6)

where Wl is the weight of the l-th layer and bl is the bias ofthe l-th layer

514 RF Classifier Layer Traditionally the softmax clas-sifier is used for the last output layer in the CNN but theproposed model uses the RF classifier to predict the classbased on the obtained features -erefore the RF classifierlayer can be defined as

yout XL( 1113857 sigm Wrf middot Xl + bl( 1113857 (7)

where sigm(middot) is the sigmoid function which maps ab-normal values to 0 and normal values to 1-e parameter setWrf in the RF layer includes the number of decision treesand the maximum depth of the tree which are obtained bythe grid search algorithm

52 Techniques for Selecting Parameters and AvoidingOverfitting -e detailed parameters of the proposed CNN-RF structure are summarized in Table 1 which include thenumbers of filters in each layer filter size and stride Twofactors are considered in determining the CNN structure-e first factor is the characteristics of consumersrsquo electricityconsumption behaviours Since the load files are such avariable two convolutional layers with a filter size of 3lowast 3(stride 1) are applied to capture hidden features for detectingelectricity theft Moreover because the dimension of inputdata size is 525 times 24 which is similar to what is used in imagerecognition problems two max-pooling layers with a filtersize of 2lowast 2 (stride 2) are used -e second factor is thenumber of training samples since the number of samples islimited to reduce the risk of overfitting we design a fullyconnected layer with a dropout rate of 04 whose main idea

d d

d d d d

n a n a n a n a

Input data Convolutional layer Convolutional layer

Downsamplinglayer

Downsamplinglayer

L232525 lowast 24 L332268 lowast 12 L464263 lowast 12 L564132 lowast 6L1525 lowast 24Fully connected layer

Random forest layer

L6132 lowast 1

Figure 6 Architecture of the hybrid CNN-RF model

6 Journal of Electrical and Computer Engineering

is that dropping units randomly from the neural networkduring training can prevent units from coadapting too much[32] and make a neuron not rely on the presence of otherspecific neurons Applying an appropriate training methodis also useful for reducing overfitting It is essentially aregularization that adds a penalty for weight update at eachiteration And we also use a binary cross entropy as the lossfunction Finally the grid search algorithm is used to op-timize RF classifier parameters such as the maximumnumber of decision trees and features

53 Training Process of CNN-RF -ere is no doubt that theCNN-RF structure needs to train the CNN at first before theRF classifier is invoked After that the RF classifier is trainedbased on features learned by the CNN

531 Training Process of CNN With a batch size (thenumber of training samples in an iteration) of 100 CNNs aretrained using the forward propagation algorithm andbackpropagation algorithm As shown in Figure 6 firstly thedata in the input layer are transferred to convolutionallayers pooling layers and a fully connected layer and thepredicted value is obtained As shown in Figure 7 thebackpropagation phase begins to update parameters if thedifference of the output value and target value is too largeand exceeds a certain threshold -e difference between theactual output of the neural network and the expected outputdetermines the adjustment direction and strip size of weightsamong layers in the network

Given a sample set D (x1 y1) (xm ym)1113864 1113865 with msamples in total 1113957y is obtained at first by using the feed-forward process As for all samples the mean of differencebetween the actual output yi and the expected output1113957ywb(xi) can be expressed by

J(W b) 1m

1113944

m

i1

12

1113957ywb xi( 1113857 minus yi

2

1113874 1113875 (8)

where w represents the connection weight between layers ofthe network and b is the corresponding bias

-en each parameter is initialized with a random valuegenerated by a normal distribution when the mean is 0 andvariance is ϑ which are updated with the gradient descentmethod -e corresponding expressions are as follows

W(l)ij W

(l)ij minus α

z

zW(l)ij

J(W b) (9)

b(l)i b

(l)i minus α

z

zb(l)i

J(W b) (10)

where α represents a learning rate W(l)ij represents the

connection weight between the i-th neuron in the l-th layerand the j-th neuron in the (l + 1)-th layer and b

(l)i represents

the bias of the i-th neuron in the l-th layerFinally all parameters W and b in the network are

updated according to formulas 9 and 10 And all these stepsare repeated to reduce the objective function J(W b)

532 Training Process of RF Classifier -e process of the RFclassifier takes advantage of two powerful machine learningtechniques bagging and random feature selection Baggingis a method to generate a particular bootstrap samplerandomly from the learnt features for growing each treeInstead of using all features the RF randomly selects a subsetof features to determine the best splitting points whichensures complete splitting to one classification of leaf nodesof decision trees Specifically the whole process of RFclassification can be written as Algorithm 1

6 Experiments and Result Analysis

All the experiments are implemented using Python 36 on astandard PC with an Intel Core i5-7500MQ CPU running at340GHz and with 80 GB of RAM-e CNN architecture isconstructed based on TensorFlow [33] and the interfacebetween the CNN and the RF is programmed using scikit-learn [34]

61 Performance Metrics In this paper the problem ofelectricity theft is considered a discrete two-class classifi-cation task which should assign each consumer into one ofthe predefined classes (abnormal or normal)

In particular the result of classifier validation is oftenpresented as confusion matrices Table 2 shows a confusionmatrix applied in electricity theft detection where TP FNFP and TN respectively represent the number of con-sumers that are classified correctly as normal classifiedfalsely as abnormal classified falsely as normal andclassified correctly as abnormal Although these four indicesshow all of the information about the performance of theclassifier more meaningful performance measurescan be extracted from them to illustrate certain performancecriteria such as TPR TP(TP + FN) FPR FP(FP +TN) precisionTP(TP+ FP) recallTP(TP+ FN)and F1 score 2(precision times recall)(precision + recall)where the true positive rate (TPR) is the ability of theclassifier to correctly identify a consumer as normal alsoreferred to as sensitivity and the false positive rate (FPR) isthe risk of positively identifying an abnormal consumerPrecision is the ratio of consumers detected correctly to thetotal number of detected normal consumers Recall is theratio of the number of consumers correctly detected to theactual number of consumers (Figure 7)

In addition the ROC curve is introduced by plotting allTPR values on the y-axis against their equivalent FPR valuesfor all available thresholds on the x-axis -e ROC curve atthe upper left has better detection effects where lower FPRvalues are caused by the same TPR -e AUC indicates the

Table 1 Parameters of the proposed CNN-RF model

Layer Size of the filter Number of filters StrideConvolution1 3lowast 3 32 1MaxPooling1 2lowast 2 mdash 2Convolution2 3lowast 3 64 1MaxPooling2 2lowast 2 mdash 2

Journal of Electrical and Computer Engineering 7

quality of the classifier and the larger AUC represents betterperformance

62 Experiment Based onCNN-RFModel -e experimentis conducted based on normalized consumption dataWith the batch size of 100 the training procedure isstopped after 300 epochs As shown in Figure 8 it could beseen that training and testing losses settled downsmoothly -is means that the proposed model learnedtrain dataset and there was small bias in the test datadataset as well -en the RF classifier is the last layer of theCNN to predict labels of input consumers Forty valuesfrom the fully connected layer of the trained CNN are usedas a new feature vector to detect abnormal users and arefed to the RF for learning and testing To build the RF inthe hybrid model the optimal parameters are determinedby applying the grid search algorithm based on the

training dataset -e grid searching range of each pa-rameter is given as follows numTrees [50 60 100]

and max depth [10 20 40] 6 times 4 24 are tried with

Require:
(1) Original training dataset S = {(x_i, y_i), i = 1, 2, 3, ..., m}, (X, Y) ∈ R^e × R
(2) Test dataset x_j ∈ R^m
for d = 1 to Ntree do
  (1) Draw a bootstrap sample S_d from the original training data
  (2) Grow an unpruned tree h_d using data S_d:
    (a) Randomly select a new feature subset Mtry from the original feature set e
    (b) Select the best feature from Mtry based on the Gini indicator at each node
    (c) Split until each tree grows to its maximum
end for
Ensure:
(1) The collection of trees {h_d, d = 1, 2, ..., Ntree}
(2) For x_j, the output of each decision tree is h_d(x_j)
return f(x_j) = majority vote {h_d(x_j)}, d = 1, ..., Ntree

ALGORITHM 1: Random forest algorithm.
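A compact illustration of Algorithm 1 (bagging plus per-node random feature selection with Gini splitting) is sketched below using scikit-learn decision trees; it is a didactic reimplementation, not the library RF used in the experiments.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

class SimpleRandomForest:
    """Didactic random forest following Algorithm 1."""
    def __init__(self, n_trees=100, max_depth=None, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.rng = np.random.RandomState(seed)
        self.trees = []

    def fit(self, X, y):
        n = X.shape[0]
        for _ in range(self.n_trees):
            idx = self.rng.randint(0, n, n)  # bootstrap sample S_d drawn with replacement
            # max_features="sqrt" selects the random candidate subset Mtry at each node;
            # criterion="gini" chooses the best split by the Gini indicator
            tree = DecisionTreeClassifier(criterion="gini", max_features="sqrt",
                                          max_depth=self.max_depth,
                                          random_state=self.rng.randint(1 << 30))
            tree.fit(X[idx], y[idx])         # grow the unpruned tree h_d
            self.trees.append(tree)
        return self

    def predict(self, X):
        votes = np.stack([t.predict(X) for t in self.trees]).astype(int)
        # majority vote over the Ntree individual predictions h_d(x_j)
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)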

Figure 7: Process of forward propagation (a) and backpropagation (b).

Figure 8: Loss curves of training and testing over 300 epochs.

Table 2: Confusion matrix applied in electricity theft detection.

Actual \ Detected   Normal                Abnormal
Normal              TP (true positive)    FN (false negative)
Abnormal            FP (false positive)   TN (true negative)


The best result is achieved when numTrees = 100 and maxDepth = 30; these parameters are then used to train the hybrid model.
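A sketch of how such a grid search can be carried out with scikit-learn's GridSearchCV over the stated ranges is given below; train_feats and y_train are the hypothetical CNN feature matrix and labels from the earlier sketch.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 60, 70, 80, 90, 100],  # numTrees range (6 values)
    "max_depth": [10, 20, 30, 40],              # maxDepth range (4 values)
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(train_feats, y_train)
print(search.best_params_)  # the paper reports numTrees = 100 and maxDepth = 30 as best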

The classification results of the detection model proposed in this paper are as follows: the numbers of TPs, FNs, FPs, and TNs are 1041, 28, 23, and 577, respectively. Therefore, the precision, recall, and F1 score can be calculated as shown in Table 3, where Class 0 is the abnormal pattern and Class 1 is the normal pattern. The AUC value of the CNN-RF algorithm is also calculated; it is 0.98, far better than the value of the baseline model (AUC = 0.5), which means the proposed algorithm is able to classify both classes accurately, as shown in Figure 9.

6.3. Comparative Experiments and Analysis of Results. To evaluate the accuracy of the CNN-RF model, comparative experiments are conducted based on handcrafted features with non-deep-learning methods, including SVM, RF, gradient boosting decision tree (GBDT), and logistic regression (LR). Moreover, we compare the classification results obtained by other supervised classifiers built on CNN features, namely CNN features with the SVM classifier (CNN-SVM) and CNN features with the GBDT classifier (CNN-GBDT), with those obtained in the previous classification experiment. In the following, the five kinds of methods are introduced, and then the results are analysed.

(1) Logistic regression: a basic model for binary classification that is equivalent to a one-layer neural network with a sigmoid activation function. The sigmoid function maps the output of a linear regression to a value between 0 and 1; any value larger than 0.5 is then classified as a normal pattern and any value less than 0.5 as an abnormal pattern.

(2) Support vector machine: this classifier with kernel functions can transform a nonlinearly separable problem into a linearly separable one by projecting the data into a feature space and then finding the optimal separating hyperplane. It is applied to predict the patterns of consumers based on the aforementioned handcrafted features.

(3) Random forest: essentially an ensemble of multiple decision trees that achieves better performance than a single decision tree while effectively controlling overfitting. The RF classifier can also handle high-dimensional data while maintaining high computational efficiency.

(4) Gradient boosting decision tree: an iterative decision tree algorithm consisting of multiple decision trees. For the final output, all results or weights are accumulated in the GBDT, whereas random forests use majority voting.

(5) Deep learning methods: the plain CNN with a softmax output layer, as well as CNN-GBDT and CNN-SVM, are used in comparison with the proposed method.

Table 4 summarizes the parameters of the comparative methods. After all experiments are completed, Figure 10 shows the ROC curves of the different methods and the proposed hybrid CNN-RF method; the AUC values of CNN-RF, CNN-GBDT, CNN-SVM, CNN, SVM, RF, LR, and GBDT are 0.99, 0.97, 0.98, 0.93, 0.77, 0.91, 0.63, and 0.77, respectively. Figure 11 presents the results of all comparative experiments in terms of precision, recall, and F1 score. Among the eight electricity theft detection algorithms, the deep learning methods (CNN, CNN-RF, CNN-SVM, and CNN-GBDT) show better performance than the machine learning methods (e.g., LR, SVM, GBDT, and RF). The reason is that the CNN can learn features from a large amount of electricity consumption data, so the accuracy of electricity theft detection is greatly improved. In addition, the proposed CNN-RF model shows the best performance among all the compared methods.
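The ROC curves and AUC values reported in Figures 9-12 can be reproduced along the following lines, assuming each fitted classifier exposes class probabilities; y_test, test_feats, and rf are hypothetical names carried over from the earlier sketches.

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

scores = rf.predict_proba(test_feats)[:, 1]       # probability of the positive (normal) class
fpr, tpr, _ = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))

plt.plot(fpr, tpr)
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.show()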

Table 3: Classification score summary for CNN-RF.

               Precision   Recall   F1 score
Class 0        0.97        0.96     0.96
Class 1        0.98        0.98     0.98
Average/total  0.97        0.97     0.97

Figure 9: ROC curve for CNN-RF (AUC = 0.99).

Table 4: Experimental parameters for comparative methods.

Methods   Input data       Parameters
LR        Features (1D)    Penalty: L2; inverse of regularization strength: 10
SVM       Features (1D)    Penalty parameter of the error term: 1.05; RBF kernel parameter: 0.0005
GBDT      Features (1D)    Number of estimators: 200
RF        Features (1D)    Number of trees in the forest: 100; max depth of each tree: 30
CNN       Raw data (2D)    The same as the CNN component of the proposed approach
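For reference, the comparative baselines of Table 4 can be instantiated in scikit-learn roughly as follows; the exact values of the LR regularization strength and the SVM penalty and kernel parameters are uncertain in the extracted text, so the numbers below are assumptions.

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

baselines = {
    "LR":   LogisticRegression(penalty="l2", C=10),                     # inverse regularization strength
    "SVM":  SVC(C=1.05, kernel="rbf", gamma=0.0005, probability=True),  # assumed C and gamma
    "GBDT": GradientBoostingClassifier(n_estimators=200),
    "RF":   RandomForestClassifier(n_estimators=100, max_depth=30),
}
# Each baseline is trained on the 1D handcrafted features rather than the raw 2D data.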


As detecting electricity theft is achieved by discovering the anomalous consumption behaviours of suspicious customers, it can essentially be regarded as an anomaly detection problem. Since CNN-RF has shown superior performance on the SEAI dataset, it is interesting to test its generalization ability on other datasets. In particular, we further tested the performance of CNN-RF, together with the above-mentioned models, on another dataset, the Low Carbon London (LCL) dataset [35]. The LCL dataset is composed of over 5500 honest electricity consumers in total, each of which covers 525 days at an hourly sampling rate. Among the 5500 customers, we randomly chose 1200 as malicious customers and modified their intraday load profiles according to fraudulent sample generation methods similar to those used for the SEAI dataset.

After preprocessing the LCL dataset (similarly to the SEAI dataset), the training dataset contains 7584 customers, in which the numbers of normal and abnormal samples are equal, and the test dataset consists of 1700 customers together with corresponding labels indicating whether each one is anomalous. Then, the training of the proposed CNN-RF model and the comparative models is carried out following the aforementioned procedure based on the training dataset.

Figure 10: ROC curves for eight different algorithms based on the SEAI dataset. AUC values: CNN = 0.93, CNN-RF = 0.99, CNN-GBDT = 0.97, CNN-SVM = 0.98, GBDT = 0.77, LR = 0.63, RF = 0.91, SVM = 0.77.

Figure 11: Comparison with other algorithms based on the SEAI dataset (precision, recall, and F1 score for each algorithm).

Figure 12: ROC curves for eight different algorithms based on the LCL dataset. AUC values: CNN = 0.95, CNN-RF = 0.97, CNN-GBDT = 0.94, CNN-SVM = 0.96, GBDT = 0.74, LR = 0.68, RF = 0.77, SVM = 0.71.

Figure 13: Comparison with other algorithms based on the LCL dataset (precision, recall, and F1 score for each algorithm).


The performances of all the models on the test dataset are shown in Figures 12 and 13.

Figures 12 and 13 show that the performances of CNN-SVM, CNN-GBDT, and CNN-RF are better than those of the other models on the whole. Since these are deep learning models, distinguishable features can be learned directly from raw smart meter data by the CNN, which equips them with better generalization ability. In addition, the CNN-RF model achieves the best performance, since the RF can handle high-dimensional data while maintaining high computational efficiency.

7. Conclusions

In this paper, a novel CNN-RF model is presented to detect electricity theft. In this model, the CNN acts as an automatic feature extractor for investigating smart meter data, and the RF is the output classifier. Because a large number of parameters must be optimized, which increases the risk of overfitting, a fully connected layer with a dropout rate of 0.4 is designed for the training phase. In addition, the SMOTE algorithm is adopted to overcome the problem of data imbalance. Several machine learning and deep learning methods, such as SVM, RF, GBDT, and LR, are applied to the same problem as benchmarks, and all methods are evaluated on the SEAI and LCL datasets. The results indicate that the proposed CNN-RF model is a promising classification method in the electricity theft detection field because of two properties. The first is that features can be automatically extracted by the hybrid model, whereas the success of most other traditional classifiers relies largely on good hand-designed features, which are laborious and time-consuming to obtain. The second is that the hybrid model combines the advantages of the RF and the CNN, which are among the most popular and successful classifiers in the electricity theft detection field.

Since the detection of electricity theft affects the privacy of consumers, future work will focus on investigating how the granularity and duration of smart meter data might affect this privacy. Extending the proposed hybrid CNN-RF model to other applications (e.g., load forecasting) is also a task worth investigating.

Data Availability

Previously reported SEAI and LCL datasets were used to support this study and are available at http://www.ucd.ie/issda/data and https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=7857. These prior studies (and datasets) are cited at relevant places within the text as references [19, 26].

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (2016YFB0901900), the Natural Science Foundation of Hebei Province of China (F2017501107), the Open Research Fund from the State Key Laboratory of Rolling and Automation, Northeastern University (2017RALKFKT003), the Foundation of Northeastern University at Qinhuangdao (XNB201803), and the Fundamental Research Funds for the Central Universities (N182303037).

References

[1] S. S. S. R. Depuru, L. Wang, and V. Devabhaktuni, "Electricity theft: overview, issues, prevention and a smart meter based approach to control theft," Energy Policy, vol. 39, no. 2, pp. 1007–1015, 2011.

[2] J. P. Navani, N. K. Sharma, and S. Sapra, "Technical and non-technical losses in power system and its economic consequence in Indian economy," International Journal of Electronics and Computer Science Engineering, vol. 1, no. 2, pp. 757–761, 2012.

[3] S. McLaughlin, B. Holbert, A. Fawaz, R. Berthier, and S. Zonouz, "A multi-sensor energy theft detection framework for advanced metering infrastructures," IEEE Journal on Selected Areas in Communications, vol. 31, no. 7, pp. 1319–1330, 2013.

[4] P. McDaniel and S. McLaughlin, "Security and privacy challenges in the smart grid," IEEE Security & Privacy Magazine, vol. 7, no. 3, pp. 75–77, 2009.

[5] T. B. Smith, "Electricity theft: a comparative analysis," Energy Policy, vol. 32, no. 1, pp. 2067–2076, 2004.

[6] J. I. Guerrero, C. Leon, I. Monedero, F. Biscarri, and J. Biscarri, "Improving knowledge-based systems with statistical techniques, text mining, and neural networks for non-technical loss detection," Knowledge-Based Systems, vol. 71, no. 4, pp. 376–388, 2014.

[7] C. C. O. Ramos, A. N. Souza, G. Chiachia, A. X. Falcão, and J. P. Papa, "A novel algorithm for feature selection using harmony search and its application for non-technical losses detection," Computers & Electrical Engineering, vol. 37, no. 6, pp. 886–894, 2011.

[8] P. Glauner, J. A. Meira, P. Valtchev, R. State, and F. Bettinger, "The challenge of non-technical loss detection using artificial intelligence: a survey," International Journal of Computational Intelligence Systems, vol. 10, no. 1, pp. 760–775, 2017.

[9] S.-C. Huang, Y.-L. Lo, and C.-N. Lu, "Non-technical loss detection using state estimation and analysis of variance," IEEE Transactions on Power Systems, vol. 28, no. 3, pp. 2959–2966, 2013.

[10] O. Rahmati, H. R. Pourghasemi, and A. M. Melesse, "Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: a case study at Mehran region, Iran," CATENA, vol. 137, pp. 360–372, 2016.

[11] N. Edison, A. C. Aranha, and J. Coelho, "Probabilistic methodology for technical and non-technical losses estimation in distribution system," Electric Power Systems Research, vol. 97, no. 11, pp. 93–99, 2013.

[12] J. B. Leite and J. R. S. Mantovani, "Detecting and locating non-technical losses in modern distribution networks," IEEE Transactions on Smart Grid, vol. 9, no. 2, pp. 1023–1032, 2018.

[13] S. Amin, G. A. Schwartz, A. A. Cardenas, and S. S. Sastry, "Game-theoretic models of electricity theft detection in smart utility networks: providing new capabilities with advanced metering infrastructure," IEEE Control Systems Magazine, vol. 35, no. 1, pp. 66–81, 2015.

[14] A. H. Nizar, Z. Y. Dong, Y. Wang, and A. N. Souza, "Power utility nontechnical loss analysis with extreme learning machine method," IEEE Transactions on Power Systems, vol. 23, no. 3, pp. 946–955, 2008.

[15] J. Nagi, K. S. Yap, S. K. Tiong, S. K. Ahmed, and M. Mohamad, "Nontechnical loss detection for metered customers in power utility using support vector machines," IEEE Transactions on Power Delivery, vol. 25, no. 2, pp. 1162–1171, 2010.

[16] E. W. S. Angelos, O. R. Saavedra, O. A. C. Cortés, and A. N. de Souza, "Detection and identification of abnormalities in customer consumptions in power distribution systems," IEEE Transactions on Power Delivery, vol. 26, no. 4, pp. 2436–2442, 2011.

[17] K. A. P. Costa, L. A. M. Pereira, R. Y. M. Nakamura, C. R. Pereira, J. P. Papa, and A. Xavier Falcão, "A nature-inspired approach to speed up optimum-path forest clustering and its application to intrusion detection in computer networks," Information Sciences, vol. 294, pp. 95–108, 2015.

[18] R. R. Bhat, R. D. Trevizan, X. Li, and A. Bretas, "Identifying nontechnical power loss via spatial and temporal deep learning," in Proceedings of the IEEE International Conference on Machine Learning and Applications, Anaheim, CA, USA, December 2016.

[19] M. Ismail, M. Shahin, M. F. Shaaban, E. Serpedin, and K. Qaraqe, "Efficient detection of electricity theft cyber attacks in AMI networks," in Proceedings of the IEEE Wireless Communications and Networking Conference, Barcelona, Spain, April 2018.

[20] R. Mehrizi, X. Peng, X. Xu, S. Zhang, D. Metaxas, and K. Li, "A computer vision based method for 3D posture estimation of symmetrical lifting," Journal of Biomechanics, vol. 69, no. 1, pp. 40–46, 2018.

[21] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE Conference on Computer for Sustainable Global Development, pp. 1026–1034, New Delhi, India, March 2015.

[22] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: an astounding baseline for recognition," Astrophysical Journal Supplement Series, vol. 221, no. 1, pp. 806–813, 2014.

[23] Z. Zheng, Y. Yang, X. Niu, H.-N. Dai, and Y. Zhou, "Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1606–1615, 2018.

[24] D. Svozil, V. Kvasnicka, and J. Pospichal, "Introduction to multi-layer feed-forward neural networks," Chemometrics and Intelligent Laboratory Systems, vol. 39, no. 1, pp. 43–62, 1997.

[25] Irish Social Science Data Archive, http://www.ucd.ie/issda/data/commissionforenergyregulation/cer.

[26] P. Jokar, N. Arianpoo, and V. C. M. Leung, "Electricity theft detection in AMI using customers' consumption patterns," IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 216–226, 2016.

[27] Y. Wang, Q. Chen, D. Gan, J. Yang, D. S. Kirschen, and C. Kang, "Deep learning-based socio-demographic information identification from smart meter data," IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 2593–2602, 2019.

[28] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, 2009.

[29] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016), pp. 7–10, Boston, MA, USA, September 2016.

[30] E. Feczko, N. M. Balba, O. Miranda-Dominguez et al., "Subtyping cognitive profiles in autism spectrum disorder using a functional random forest algorithm," NeuroImage, vol. 172, pp. 674–688, 2018.

[31] Y.-L. Boureau, J. Ponce, and Y. LeCun, "A theoretical analysis of feature pooling in visual recognition," in Proceedings of the 27th International Conference on Machine Learning, pp. 111–118, Haifa, Israel, June 2010.

[32] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[33] M. Abadi, A. Agarwal, P. Barham et al., "TensorFlow: large-scale machine learning on heterogeneous distributed systems," 2016, https://arxiv.org/abs/1603.04467.

[34] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, no. 4, pp. 2825–2830, 2011.

[35] J. R. Schofield, "Low Carbon London project: data from the dynamic time-of-use electricity pricing trial," 2017, https://discover.ukdataservice.ac.uk/catalogue/?sn=7857.


[15] J Nagi K S Yap S K Tiong S K Ahmed andMMohamadldquoNontechnical loss detection for metered customers in powerutility using support vector machinesrdquo IEEE Transactions onPower Delivery vol 25 no 2 pp 1162ndash1171 2010

[16] E W S Angelos O R Saavedra O A C Cortes andA N de Souza ldquoDetection and identification of abnormalitiesin customer consumptions in power distribution systemsrdquoIEEE Transactions on Power Delivery vol 26 no 4pp 2436ndash2442 2011

[17] K A P Costa L A M Pereira R Y M NakamuraC R Pereira J P Papa and A Xavier Falcatildeo ldquoA nature-inspired approach to speed up optimum-path forest clusteringand its application to intrusion detection in computer net-worksrdquo Information Sciences vol 294 pp 95ndash108 2015

[18] R R Bhat R D Trevizan X Li and A Bretas ldquoIdentifyingnontechnical power loss via spatial and temporal deeplearningrdquo in Proceedings of the IEEE International Conferenceon Machine Learning and Applications Anaheim CA USADecember 2016

[19] M Ismail M Shahin M F Shaaban E Serpedin andK Qaraqe ldquoEfficient detection of electricity theft cyber attacksin ami networksrdquo in Proceedings of the IEEE Wireless Com-munications and Networking Conference Barcelona SpainApril 2018

[20] RMehrizi X Peng X Xu S Zhang DMetaxas and K Li ldquoAcomputer vision based method for 3D posture estimation ofsymmetrical liftingrdquo Journal of Biomechanics vol 69 no 1pp 40ndash46 2018

[21] K He X Zhang S Ren and J Sun ldquoDelving deep intorectifiers surpassing human-level performance on imagenetclassificationrdquo in Proceedings of the IEEE Conference onComputer for Sustainable Global Development pp 1026ndash1034New Delhi India March 2015

[22] A Sharif Razavian H Azizpour J Sullivan and S CarlssonldquoCNN features off-the-shelf an astounding baseline for rec-ognitionrdquo Astrophysical Journal Supplement Series vol 221no 1 pp 806ndash813 2014

[23] Z Zheng Y Yang X Niu H-N Dai and Y Zhou ldquoWide anddeep convolutional neural networks for electricity-theft de-tection to secure smart gridsrdquo IEEE Transactions on IndustrialInformatics vol 14 no 4 pp 1606ndash1615 2018

[24] D Svozil V Kvasnicka and J Pospichal ldquoIntroduction tomulti-layer feed-forward neural networksrdquo Chemometricsand Intelligent Laboratory Systems vol 39 no 1 pp 43ndash621997

[25] Irish social science data archive httpwwwucdieissdadatacommissionforenegryregulationcer

[26] P Jokar N Arianpoo and V C M Leung ldquoElectricity theftdetection in AMI using customersrsquo consumption patternsrdquoIEEE Transactions on Smart Grid vol 7 no 1 pp 216ndash2262016

[27] Y Wang Q Chen D Gan J Yang D S Kirschen andC Kang ldquoDeep learning-based socio-demographic in-formation identification from smart meter datafication fromsmart meter datardquo IEEE Transactions on Smart Grid vol 10no 3 pp 2593ndash2602 2019

[28] V Chandola A Banerjee and V Kumar ldquoAnomaly de-tection a surveyrdquo ACM Computing Surveys vol 41 no 32009

[29] H-T Cheng L Koc J Harmsen et al ldquoWide amp deep learningfor recommender systemsrdquo in Proceedings of the 1stWorkshopon Deep Learning for Recommender SystemsmdashDLRS 2016pp 7ndash10 Boston MA USA September 2016

[30] E Feczko N M Balba O Miranda-Dominguez et alldquoSubtyping cognitive profiles in autism spectrum disorderusing a functional random forest algorithmfiles in autismspectrum disorder using a functional random forest algo-rithmrdquo NeuroImage vol 172 pp 674ndash688 2018

[31] Y-L Boureau J Ponce and Y LeCun ldquoA theoretical analysisof feature pooling in visual recognitionrdquo in Proceedings of the27th International Conference on Machine Learningpp 111ndash118 Haifa Israel June 2010

[32] N Srivastava G Hinton A Krizhevsky I Sutskever andR Salakhutdinov ldquoDropout a simply way to prevent neuralnetworks from overfittingrdquo Ce Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

[33] M Abadi A Agarwal P Barham et al ldquoTensorflow Large-scale machine learning on heterogeneous distributed sys-temsrdquo 2016 httpsarxivorgabs16030446

[34] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 no 4 pp 2825ndash2830 2011

[35] J R Schofield ldquoLow carbon london project data from thedynamic time-of-use electricity pricing trialrdquo 2017 httpsdiscoverukdataserviceacukcataloguesn7857

12 Journal of Electrical and Computer Engineering

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 9: ElectricityTheftDetectioninPowerGridswithDeep ...downloads.hindawi.com/journals/jece/2019/4136874.pdf(1)Datacleaning:thereareerroneousvalues(e.g.,outliers) in raw data, which correspond

different combinations. The best result is achieved when numTrees = 100 and maxDepth = 30. These parameters are then used to train the hybrid model.
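For reproducibility, the following minimal sketch shows how such a grid search over the RF hyperparameters could be set up with scikit-learn [34]; the candidate grids, the synthetic stand-in data, and the variable names are illustrative assumptions rather than the authors' actual configuration.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    # Stand-in data: in the paper these would be the CNN-extracted feature
    # vectors and the normal/theft labels of the training set (names assumed).
    rng = np.random.default_rng(0)
    X_features = rng.normal(size=(400, 64))
    y = rng.integers(0, 2, size=400)

    # Candidate grids are illustrative; only the selected values
    # (100 trees, maximum depth 30) are reported in the paper.
    param_grid = {"n_estimators": [50, 100, 200], "max_depth": [10, 20, 30, None]}
    search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          scoring="f1", cv=5)
    search.fit(X_features, y)
    print(search.best_params_)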

The classification results of the detection model proposed in this paper are as follows: the numbers of TPs, FNs, FPs, and TNs are 1041, 28, 23, and 577, respectively. Therefore, the precision, recall, and F1 score can be calculated as shown in Table 3, where Class 0 is an abnormal pattern and Class 1 is a normal pattern. Also, the AUC value of the CNN-RF algorithm is calculated, which is 0.98 and far better than that of the baseline model (AUC = 0.5). This means the proposed algorithm is able to classify both classes accurately, as shown in Figure 9.
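Plugging the reported counts into the standard definitions (with the detected class treated as positive) gives, for example,

\[
\text{Precision} = \frac{TP}{TP + FP} = \frac{1041}{1041 + 23} \approx 0.978, \qquad
\text{Recall} = \frac{TP}{TP + FN} = \frac{1041}{1041 + 28} \approx 0.974,
\]
\[
F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \approx 0.976,
\]

in line with the values reported in Table 3.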

6.3. Comparative Experiments and the Analysis of Results. To evaluate the accuracy of the CNN-RF model, comparative experiments are conducted based on handcrafted features with nondeep learning methods, including SVM, RF, gradient boosting decision tree (GBDT), and logistic regression (LR). Moreover, we compared the classification results obtained by various supervised classifiers, namely, CNN features with the SVM classifier (CNN-SVM) and CNN features with the GBDT classifier (CNN-GBDT), to those obtained by the previous classification work. In the following, the five methods are introduced, and then the results are analysed; a brief instantiation sketch of the nondeep baselines is given after the list.

(1) Logistic regression: it is a basic model for binary classification that is equivalent to a one-layer neural network with a sigmoid activation function. The sigmoid function maps the output of a linear regression to a value between 0 and 1; any value larger than 0.5 is classified as a normal pattern, and any value less than 0.5 is classified as an abnormal pattern.

(2) Support vector machine: this classifier with kernel functions can transform a nonlinearly separable problem into a linearly separable one by projecting the data into a feature space and then finding the optimal separating hyperplane. It is applied to predict the patterns of consumers based on the aforementioned handcrafted features.

(3) Random forest: it is essentially an ensemble of multiple decision trees that achieves better performance than a single decision tree while keeping overfitting under effective control. The RF classifier can also handle high-dimensional data while maintaining high computational efficiency.

(4) Gradient boosting decision tree: it is an iterative decision tree algorithm which consists of multiple decision trees. For the final output, all results or weights are accumulated in the GBDT, whereas random forests use majority voting.

(5) Deep learning methods: softmax is used in the last layer of the plain CNN, while CNN-GBDT and CNN-SVM are evaluated in comparison with the proposed method.
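As a rough guide, the four nondeep baselines can be instantiated in scikit-learn [34] roughly as follows; the sketch uses the parameter values listed in Table 4 where they are unambiguous, and everything else (including the stand-in data) is an assumption rather than the authors' exact setup.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    baselines = {
        "LR": LogisticRegression(penalty="l2"),                       # L2 penalty per Table 4
        "SVM": SVC(kernel="rbf", gamma=0.0005, probability=True),     # RBF kernel per Table 4
        "GBDT": GradientBoostingClassifier(n_estimators=200),         # 200 estimators per Table 4
        "RF": RandomForestClassifier(n_estimators=100, max_depth=30), # 100 trees, depth 30
    }

    # Stand-in handcrafted 1D features and labels (names and shapes assumed).
    rng = np.random.default_rng(1)
    X_hand = rng.normal(size=(400, 20))
    y = rng.integers(0, 2, size=400)
    for name, clf in baselines.items():
        clf.fit(X_hand, y)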

Table 4 summarizes the parameters of the comparative methods. After all experiments have been completed, Figure 10 shows the ROC curves of the different methods and the proposed hybrid CNN-RF method; it can be seen that the AUC values of CNN-RF, CNN-GBDT, CNN-SVM, CNN, SVM, RF, LR, and GBDT are 0.99, 0.97, 0.98, 0.93, 0.77, 0.91, 0.63, and 0.77, respectively. Figure 11 presents the results of all comparative experiments in terms of precision, recall, and F1 score. Among the eight electricity theft detection algorithms, the deep learning methods (CNN, CNN-RF, CNN-SVM, and CNN-GBDT) show better performance than the conventional machine learning methods (LR, SVM, GBDT, and RF). The reason is that the CNN can learn features from a large amount of electricity consumption data, so the accuracy of electricity theft detection is greatly improved. In addition, the proposed CNN-RF model shows the best performance of all the compared methods.
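The ROC curves and AUC values reported in Figures 10 and 12 can be obtained from the scores of any fitted classifier in the usual way; a minimal sketch with scikit-learn [34] follows, where the variable names and stand-in scores are assumptions.

    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    # Stand-in labels and positive-class scores; in practice the scores would come
    # from, e.g., rf.predict_proba(X_test)[:, 1] for the RF stage of the hybrid model.
    rng = np.random.default_rng(2)
    y_test = rng.integers(0, 2, size=500)
    scores = np.clip(0.6 * y_test + rng.uniform(0.0, 0.6, size=500), 0.0, 1.0)

    fpr, tpr, thresholds = roc_curve(y_test, scores)   # points of the ROC curve
    print("AUC =", roc_auc_score(y_test, scores))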

Table 3: Classification score summary for CNN-RF.

                  Precision    Recall    F1 score
    Class 0          0.97       0.96       0.96
    Class 1          0.98       0.98       0.98
    Average/total    0.97       0.97       0.97

Figure 9: ROC curve for CNN-RF (TPR versus FPR); AUC (CNN-RF) = 0.99.

Table 4: Experimental parameters for comparative methods.

    Method    Input data       Parameters
    LR        Features (1D)    Penalty: L2; inverse of regularization strength: 10
    SVM       Features (1D)    Penalty parameter of the error term: 105; parameter of the kernel function (RBF): 0.0005
    GBDT      Features (1D)    Number of estimators: 200
    RF        Features (1D)    Number of trees in the forest: 100; max depth of each tree: 30
    CNN       Raw data (2D)    The same as the CNN component of the proposed approach


Since detecting electricity theft amounts to discovering the anomalous consumption behaviours of suspicious customers, it can essentially be regarded as an anomaly detection problem. As CNN-RF has shown superior performance on the SEAI dataset, it is worth testing its generalization ability on other datasets. In particular, we have further tested the performance of CNN-RF, together with the above-mentioned models, on another dataset, the Low Carbon London (LCL) dataset [35]. The LCL dataset is composed of over 5500 honest electricity consumers in total, each with 525 days of readings (sampled hourly). Among the 5500 customers, we randomly chose 1200 as malicious customers and modified their intraday load profiles according to fraudulent sample generation methods similar to those used for the SEAI dataset.
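The exact fraudulent-sample generation functions follow the procedure described earlier for the SEAI data; purely as an illustration of the idea, a scaling-type attack that under-reports an honest daily profile could be simulated as below. The scaling range and profile shape are assumptions, not the paper's exact functions.

    import numpy as np

    def make_malicious(profile, rng):
        # Multiply each reading by a random factor below 1 to mimic under-reported consumption.
        factors = rng.uniform(0.1, 0.8, size=profile.shape)
        return profile * factors

    rng = np.random.default_rng(42)
    honest_day = rng.uniform(0.2, 2.0, size=24)   # stand-in hourly readings for one day
    theft_day = make_malicious(honest_day, rng)   # modified intraday load profile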

After preprocessing the LCL dataset (in the same way as the SEAI dataset), the training dataset contains 7584 customers, in which the numbers of normal and abnormal samples are equal, and the test dataset consists of 1700 customers together with corresponding labels indicating whether each is anomalous. The training of the proposed CNN-RF model and the comparative models is then carried out following the aforementioned procedure, based on the training dataset.

Figure 10: ROC curves for eight different algorithms based on the SEAI dataset (axes: FPR versus TPR); AUC: CNN = 0.93, CNN-RF = 0.99, CNN-GBDT = 0.97, CNN-SVM = 0.98, GBDT = 0.77, LR = 0.63, RF = 0.91, SVM = 0.77.

Figure 11: Comparison with other algorithms based on the SEAI dataset (precision, recall, and F1 score for CNN, CNN-RF, CNN-GBDT, CNN-SVM, GBDT, LR, RF, and SVM).

Figure 12: ROC curves for eight different algorithms based on the LCL dataset (axes: FPR versus TPR); AUC: CNN = 0.95, CNN-RF = 0.97, CNN-GBDT = 0.94, CNN-SVM = 0.96, GBDT = 0.74, LR = 0.68, RF = 0.77, SVM = 0.71.

Figure 13: Comparison with other algorithms based on the LCL dataset (precision, recall, and F1 score).


The performances of all the models on the test dataset are shown in Figures 12 and 13. These figures show that the performances of CNN-SVM, CNN-GBDT, and CNN-RF are better than those of the other models on the whole. Since they are deep learning models, distinguishable features can be learned directly from the raw smart meter data by the CNN, which equips them with better generalization ability. In addition, the CNN-RF model achieves the best performance, since the RF can handle high-dimensional data while maintaining high computational efficiency.
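To make this division of labour concrete, the sketch below shows one way the pipeline could be wired up with TensorFlow/Keras [33] and scikit-learn [34]: the CNN is trained end to end, its penultimate fully connected layer is reused as a feature extractor, and the RF is then fitted on those features. Layer sizes, the (days x hours) input shape, and the stand-in data are placeholders rather than the paper's exact architecture.

    import numpy as np
    import tensorflow as tf
    from sklearn.ensemble import RandomForestClassifier

    # Stand-in data: one week of hourly readings per customer, shaped (customers, days, hours, 1).
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 3.0, size=(256, 7, 24, 1)).astype("float32")
    y = rng.integers(0, 2, size=256)

    cnn = tf.keras.Sequential([
        tf.keras.Input(shape=(7, 24, 1)),
        tf.keras.layers.Conv2D(16, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu", name="features"),
        tf.keras.layers.Dropout(0.4),   # dropout rate used in the paper
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    cnn.compile(optimizer="adam", loss="binary_crossentropy")
    cnn.fit(X, y, epochs=2, verbose=0)          # end-to-end training of the CNN

    # Reuse the penultimate layer as a feature extractor and fit the RF on its output.
    extractor = tf.keras.Model(cnn.input, cnn.get_layer("features").output)
    features = extractor.predict(X, verbose=0)
    rf = RandomForestClassifier(n_estimators=100, max_depth=30).fit(features, y)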

7. Conclusions

In this paper, a novel CNN-RF model is presented to detect electricity theft. In this model, the CNN acts as an automatic feature extractor for the smart meter data, and the RF is the output classifier. Because a large number of parameters must be optimized, which increases the risk of overfitting, a fully connected layer with a dropout rate of 0.4 is designed for the training phase. In addition, the SMOTE algorithm is adopted to overcome the problem of data imbalance. Several machine learning and deep learning methods, such as SVM, RF, GBDT, and LR, are applied to the same problem as benchmarks, and all of these methods have been evaluated on the SEAI and LCL datasets. The results indicate that the proposed CNN-RF model is a promising classification method in the electricity theft detection field because of two properties. The first is that features can be automatically extracted by the hybrid model, whereas the success of most other traditional classifiers relies largely on the retrieval of good hand-designed features, which is a laborious and time-consuming task. The second is that the hybrid model combines the advantages of the RF and CNN, as both are among the most popular and successful classifiers in the electricity theft detection field.

Since the detection of electricity theft affects the privacy of consumers, future work will focus on investigating how the granularity and duration of smart meter data might affect this privacy. Extending the proposed hybrid CNN-RF model to other applications (e.g., load forecasting) is also a task worth investigating.

Data Availability

Previously reported SEAI and LCL datasets were used to support this study and are available at http://www.ucd.ie/issda/data and https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=7857. These prior studies (and datasets) are cited at relevant places within the text as references [19, 26].

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (2016YFB0901900), the Natural Science Foundation of Hebei Province of China (F2017501107), the Open Research Fund from the State Key Laboratory of Rolling and Automation, Northeastern University (2017RALKFKT003), the Foundation of Northeastern University at Qinhuangdao (XNB201803), and the Fundamental Research Funds for the Central Universities (N182303037).

References

[1] S. S. S. R. Depuru, L. Wang, and V. Devabhaktuni, "Electricity theft: overview, issues, prevention and a smart meter based approach to control theft," Energy Policy, vol. 39, no. 2, pp. 1007–1015, 2011.

[2] J. P. Navani, N. K. Sharma, and S. Sapra, "Technical and non-technical losses in power system and its economic consequence in Indian economy," International Journal of Electronics and Computer Science Engineering, vol. 1, no. 2, pp. 757–761, 2012.

[3] S. McLaughlin, B. Holbert, A. Fawaz, R. Berthier, and S. Zonouz, "A multi-sensor energy theft detection framework for advanced metering infrastructures," IEEE Journal on Selected Areas in Communications, vol. 31, no. 7, pp. 1319–1330, 2013.

[4] P. McDaniel and S. McLaughlin, "Security and privacy challenges in the smart grid," IEEE Security & Privacy Magazine, vol. 7, no. 3, pp. 75–77, 2009.

[5] T. B. Smith, "Electricity theft: a comparative analysis," Energy Policy, vol. 32, no. 1, pp. 2067–2076, 2004.

[6] J. I. Guerrero, C. Leon, I. Monedero, F. Biscarri, and J. Biscarri, "Improving knowledge-based systems with statistical techniques, text mining, and neural networks for non-technical loss detection," Knowledge-Based Systems, vol. 71, no. 4, pp. 376–388, 2014.

[7] C. C. O. Ramos, A. N. Souza, G. Chiachia, A. X. Falcão, and J. P. Papa, "A novel algorithm for feature selection using harmony search and its application for non-technical losses detection," Computers & Electrical Engineering, vol. 37, no. 6, pp. 886–894, 2011.

[8] P. Glauner, J. A. Meira, P. Valtchev, R. State, and F. Bettinger, "The challenge of non-technical loss detection using artificial intelligence: a survey," International Journal of Computational Intelligence Systems, vol. 10, no. 1, pp. 760–775, 2017.

[9] S.-C. Huang, Y.-L. Lo, and C.-N. Lu, "Non-technical loss detection using state estimation and analysis of variance," IEEE Transactions on Power Systems, vol. 28, no. 3, pp. 2959–2966, 2013.

[10] O. Rahmati, H. R. Pourghasemi, and A. M. Melesse, "Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: a case study at Mehran region, Iran," CATENA, vol. 137, pp. 360–372, 2016.

[11] N. Edison, A. C. Aranha, and J. Coelho, "Probabilistic methodology for technical and non-technical losses estimation in distribution system," Electric Power Systems Research, vol. 97, no. 11, pp. 93–99, 2013.

[12] J. B. Leite and J. R. S. Mantovani, "Detecting and locating non-technical losses in modern distribution networks," IEEE Transactions on Smart Grid, vol. 9, no. 2, pp. 1023–1032, 2018.

[13] S. Amin, G. A. Schwartz, A. A. Cardenas, and S. S. Sastry, "Game-theoretic models of electricity theft detection in smart utility networks: providing new capabilities with advanced metering infrastructure," IEEE Control Systems Magazine, vol. 35, no. 1, pp. 66–81, 2015.

[14] A. H. Nizar, Z. Y. Dong, Y. Wang, and A. N. Souza, "Power utility nontechnical loss analysis with extreme learning machine method," IEEE Transactions on Power Systems, vol. 23, no. 3, pp. 946–955, 2008.

[15] J. Nagi, K. S. Yap, S. K. Tiong, S. K. Ahmed, and M. Mohamad, "Nontechnical loss detection for metered customers in power utility using support vector machines," IEEE Transactions on Power Delivery, vol. 25, no. 2, pp. 1162–1171, 2010.

[16] E. W. S. Angelos, O. R. Saavedra, O. A. C. Cortes, and A. N. de Souza, "Detection and identification of abnormalities in customer consumptions in power distribution systems," IEEE Transactions on Power Delivery, vol. 26, no. 4, pp. 2436–2442, 2011.

[17] K. A. P. Costa, L. A. M. Pereira, R. Y. M. Nakamura, C. R. Pereira, J. P. Papa, and A. Xavier Falcão, "A nature-inspired approach to speed up optimum-path forest clustering and its application to intrusion detection in computer networks," Information Sciences, vol. 294, pp. 95–108, 2015.

[18] R. R. Bhat, R. D. Trevizan, X. Li, and A. Bretas, "Identifying nontechnical power loss via spatial and temporal deep learning," in Proceedings of the IEEE International Conference on Machine Learning and Applications, Anaheim, CA, USA, December 2016.

[19] M. Ismail, M. Shahin, M. F. Shaaban, E. Serpedin, and K. Qaraqe, "Efficient detection of electricity theft cyber attacks in AMI networks," in Proceedings of the IEEE Wireless Communications and Networking Conference, Barcelona, Spain, April 2018.

[20] R. Mehrizi, X. Peng, X. Xu, S. Zhang, D. Metaxas, and K. Li, "A computer vision based method for 3D posture estimation of symmetrical lifting," Journal of Biomechanics, vol. 69, no. 1, pp. 40–46, 2018.

[21] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE Conference on Computer for Sustainable Global Development, pp. 1026–1034, New Delhi, India, March 2015.

[22] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: an astounding baseline for recognition," Astrophysical Journal Supplement Series, vol. 221, no. 1, pp. 806–813, 2014.

[23] Z. Zheng, Y. Yang, X. Niu, H.-N. Dai, and Y. Zhou, "Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1606–1615, 2018.

[24] D. Svozil, V. Kvasnicka, and J. Pospichal, "Introduction to multi-layer feed-forward neural networks," Chemometrics and Intelligent Laboratory Systems, vol. 39, no. 1, pp. 43–62, 1997.

[25] Irish Social Science Data Archive, http://www.ucd.ie/issda/data/commissionforenergyregulationcer/.

[26] P. Jokar, N. Arianpoo, and V. C. M. Leung, "Electricity theft detection in AMI using customers' consumption patterns," IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 216–226, 2016.

[27] Y. Wang, Q. Chen, D. Gan, J. Yang, D. S. Kirschen, and C. Kang, "Deep learning-based socio-demographic information identification from smart meter data," IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 2593–2602, 2019.

[28] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, 2009.

[29] H.-T. Cheng, L. Koc, J. Harmsen et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS 2016), pp. 7–10, Boston, MA, USA, September 2016.

[30] E. Feczko, N. M. Balba, O. Miranda-Dominguez et al., "Subtyping cognitive profiles in autism spectrum disorder using a functional random forest algorithm," NeuroImage, vol. 172, pp. 674–688, 2018.

[31] Y.-L. Boureau, J. Ponce, and Y. LeCun, "A theoretical analysis of feature pooling in visual recognition," in Proceedings of the 27th International Conference on Machine Learning, pp. 111–118, Haifa, Israel, June 2010.

[32] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[33] M. Abadi, A. Agarwal, P. Barham et al., "TensorFlow: large-scale machine learning on heterogeneous distributed systems," 2016, https://arxiv.org/abs/1603.04467.

[34] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, no. 4, pp. 2825–2830, 2011.

[35] J. R. Schofield, "Low Carbon London project: data from the dynamic time-of-use electricity pricing trial," 2017, https://discover.ukdataservice.ac.uk/catalogue/?sn=7857.

12 Journal of Electrical and Computer Engineering

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 10: ElectricityTheftDetectioninPowerGridswithDeep ...downloads.hindawi.com/journals/jece/2019/4136874.pdf(1)Datacleaning:thereareerroneousvalues(e.g.,outliers) in raw data, which correspond

As detecting electricity theft consumers is achieved bydiscovering the anomalous consumption behaviours ofsuspicious customers we can also regard it as an anomalydetection problem essentially Since CNN-RF has shownsuperior performance on the SEAI dataset it would beinteresting to test its generalization ability on other datasetsIn particular we have further tested the performance ofCNN-RF together with the above-mentioned models onanother dataset Low Carbon London (LCL) dataset [35]-e LCL dataset is composed of over 5500 honest electricityconsumers in total each of which consists of 525 days (the

sampling rate is hour) Among the 5500 customers werandomly chose 1200 as malicious customers and modifiedtheir intraday load profiles according to the fraudulentsample generation methods similar to the SEAI dataset

After the preprocess of the LCL dataset (similar to theSEAI dataset) the train dataset contains 7584 customers inwhich the number of normal and abnormal samples is equalAnd the test dataset consists of 1700 customers together withcorresponding labels indicating anomaly or not -en thetraining of the proposed CNN-RF model and comparativemodels is all done following the aforementioned procedure

CNN

CNN

-RF

CNN

-GBD

T

CNN

-SV

M

GBD

T LR RF

SVM

Different algorithms

0

01

02

03

04

05

06

07

08

09

1

Perfo

rman

ce m

etric

PrecisionRecallF1 score

Figure 11 Comparison with other algorithms based on the SEAIdataset

0 01 02 03 04 05 06 07 08 09 1FPR

0

01

02

03

04

05

06

07

08

09

1

TPR

AUC (CNN) = 093AUC (CNN-RF) = 099AUC (CNN-GBDT) = 097AUC (CNN-SVM) = 098

AUC (GBDT) = 077AUC (LR) = 063AUC (RF) = 091AUC (SVM) = 077

Figure 10 ROC curves for eight different algorithms based on theSEAI dataset

TPR

FPR

AUC (CNN) = 095AUC (CNN-RF) = 097AUC (CNN-GBDT) = 094AUC (CNN-SVM) = 096

AUC (GBDT) = 074AUC (LR) = 068AUC (RF) = 077AUC (SVM) = 071

0

01

02

03

04

05

06

07

08

09

1

0 01 02 03 04 05 06 07 08 09 1

Figure 12 ROC curves for eight different algorithms based on theLCL dataset

0

01

02

03

04

05

06

07

08

09

1

Perfo

rman

ce m

etric

CNN

CNN

-RF

CNN

-GBD

T

CNN

-SV

M

GBD

T

SVM

Different algorithms

LR RF

PrecisionRecallF1 score

Figure 13 Comparison with other algorithms based on the LCLdataset

10 Journal of Electrical and Computer Engineering

based on the train dataset -e performances of all themodels on the test dataset are shown in Figures 12 and 13

Figures 12 and 13 show that the performances of CNN-SVM CNN-GBDT and CNN-RF are better than those ofother models on the whole Since they are deep learningmodels the distinguishable features can be learned directlyfrom raw smart meter data by the CNN which has equippedthem with better generalization ability In addition theCNN-RF model has achieved the best performance since theRF can handle high-dimensional data while maintaininghigh computational efficiency

7 Conclusions

In this paper a novel CNN-RF model is presented to detectelectricity theft In this model the CNN is similar to anautomatic feature extractor in investigating smart meter dataand the RF is the output classifier Because a large number ofparameters must be optimized that increase the risk ofoverfitting a fully connected layer with a dropout rate of 04is designed during the training phase In addition the SMOTalgorithm is adopted to overcome the problem of dataimbalance Some machine learning and deep learningmethods such as SVM RF GBDT and LR are applied to thesame problem as a benchmark and all those methods havebeen conducted on SEAI and LCL datasets -e resultsindicate that the proposed CNN-RF model is quite apromising classification method in the electricity theft de-tection field because of two properties -e first is thatfeatures can be automatically extracted by the hybrid modelwhile the success of most other traditional classifiers relieslargely on the retrieval of good hand-designed featureswhich is a laborious and time-consuming task -e secondlies in that the hybrid model combines the advantages of theRF and CNN as both are the most popular and successfulclassifiers in the electricity theft detection field

Since the detection of electricity theft affects the privacyof consumers the future work will focus on investigatinghow the granularity and duration of smart meter data mightaffect this privacy Extending the proposed hybrid CNN-RFmodel to other applications (eg load forecasting) is a taskworth investigating

Data Availability

Previously reported [SEAI][LCL] datasets were used tosupport this study and are available at [httpwwwucdieissdadata][httpsbetaukdataserviceacukdatacataloguestudiesstudyid=7857] -ese prior studies (and datasets)are cited at relevant places within the text as references[19 26]

Conflicts of Interest

-e authors declare that there are no conflicts of interest

Acknowledgments

-is work was supported by the National Key Research andDevelopment Program of China (2016YFB0901900) the

Natural Science Foundation of Hebei Province of China(F2017501107) Open Research Fund from the State KeyLaboratory of Rolling and Automation Northeastern Uni-versity (2017RALKFKT003) the Foundation of Northeast-ern University at Qinhuangdao (XNB201803) and theFundamental Research Funds for the Central Universities(N182303037)

References

[1] S S S R Depuru L Wang and V Devabhaktuni ldquoElectricitytheft overview issues prevention and a smart meter basedapproach to control theftrdquo Energy Policy vol 39 no 2pp 1007ndash1015 2011

[2] J P Navani N K Sharma and S Sapra ldquoTechnical and non-technical losses in power system and its economic conse-quence in Indian economyrdquo International Journal of Elec-tronics and Computer Science Engineering vol 1 no 2pp 757ndash761 2012

[3] S McLaughlin B Holbert A Fawaz R Berthier andS Zonouz ldquoA multi-sensor energy theft detection frameworkfor advanced metering infrastructuresrdquo IEEE Journal on Se-lected Areas in Communications vol 31 no 7 pp 1319ndash13302013

[4] P McDaniel and S McLaughlin ldquoSecurity and privacychallenges in the smart gridrdquo IEEE Security amp PrivacyMagazine vol 7 no 3 pp 75ndash77 2009

[5] T B Smith ldquoElectricity theft a comparative analysisrdquo EnergyPolicy vol 32 no 1 pp 2067ndash2076 2004

[6] J I Guerrero C Leon I Monedero F Biscarri andJ Biscarri ldquoImproving knowledge-based systems with sta-tistical techniques text mining and neural networks for non-technical loss detectionrdquo Knowledge-Based Systems vol 71no 4 pp 376ndash388 2014

[7] C C O Ramos A N Souza G Chiachia A X Falcatildeo andJ P Papa ldquoA novel algorithm for feature selection usingharmony search and its application for non-technical lossesdetectionrdquo Computers amp Electrical Engineering vol 37 no 6pp 886ndash894 2011

[8] P Glauner J A Meira P Valtchev R State and F Bettingerldquo-e challenge of non-technical loss detection using artificialintelligence a surveyficial intelligence a surveyrdquo In-ternational Journal of Computational Intelligence Systemsvol 10 no 1 pp 760ndash775 2017

[9] S-C Huang Y-L Lo and C-N Lu ldquoNon-technical lossdetection using state estimation and analysis of variancerdquoIEEE Transactions on Power Systems vol 28 no 3pp 2959ndash2966 2013

[10] O Rahmati H R Pourghasemi and A M Melesse ldquoAp-plication of GIS-based data driven random forest and max-imum entropy models for groundwater potential mapping acase study at Mehran region Iranrdquo CATENA vol 137pp 360ndash372 2016

[11] N Edison A C Aranha and J Coelho ldquoProbabilisticmethodology for technical and non-technical losses estima-tion in distribution systemrdquo Electric Power Systems Researchvol 97 no 11 pp 93ndash99 2013

[12] J B Leite and J R S Mantovani ldquoDetecting and locating non-technical losses in modern distribution networksrdquo IEEETransactions on Smart Grid vol 9 no 2 pp 1023ndash1032 2018

[13] S Amin G A Schwartz A A Cardenas and S S SastryldquoGame-theoretic models of electricity theft detection in smartutility networks providing new capabilities with advanced

Journal of Electrical and Computer Engineering 11

metering infrastructurerdquo IEEE Control Systems Magazinevol 35 no 1 pp 66ndash81 2015

[14] A H Nizar Z Y Dong Y Wang and A N Souza ldquoPowerutility nontechnical loss analysis with extreme learning ma-chine methodrdquo IEEE Transactions on Power Systems vol 23no 3 pp 946ndash955 2008

[15] J Nagi K S Yap S K Tiong S K Ahmed andMMohamadldquoNontechnical loss detection for metered customers in powerutility using support vector machinesrdquo IEEE Transactions onPower Delivery vol 25 no 2 pp 1162ndash1171 2010

[16] E W S Angelos O R Saavedra O A C Cortes andA N de Souza ldquoDetection and identification of abnormalitiesin customer consumptions in power distribution systemsrdquoIEEE Transactions on Power Delivery vol 26 no 4pp 2436ndash2442 2011

[17] K A P Costa L A M Pereira R Y M NakamuraC R Pereira J P Papa and A Xavier Falcatildeo ldquoA nature-inspired approach to speed up optimum-path forest clusteringand its application to intrusion detection in computer net-worksrdquo Information Sciences vol 294 pp 95ndash108 2015

[18] R R Bhat R D Trevizan X Li and A Bretas ldquoIdentifyingnontechnical power loss via spatial and temporal deeplearningrdquo in Proceedings of the IEEE International Conferenceon Machine Learning and Applications Anaheim CA USADecember 2016

[19] M Ismail M Shahin M F Shaaban E Serpedin andK Qaraqe ldquoEfficient detection of electricity theft cyber attacksin ami networksrdquo in Proceedings of the IEEE Wireless Com-munications and Networking Conference Barcelona SpainApril 2018

[20] RMehrizi X Peng X Xu S Zhang DMetaxas and K Li ldquoAcomputer vision based method for 3D posture estimation ofsymmetrical liftingrdquo Journal of Biomechanics vol 69 no 1pp 40ndash46 2018

[21] K He X Zhang S Ren and J Sun ldquoDelving deep intorectifiers surpassing human-level performance on imagenetclassificationrdquo in Proceedings of the IEEE Conference onComputer for Sustainable Global Development pp 1026ndash1034New Delhi India March 2015

[22] A Sharif Razavian H Azizpour J Sullivan and S CarlssonldquoCNN features off-the-shelf an astounding baseline for rec-ognitionrdquo Astrophysical Journal Supplement Series vol 221no 1 pp 806ndash813 2014

[23] Z Zheng Y Yang X Niu H-N Dai and Y Zhou ldquoWide anddeep convolutional neural networks for electricity-theft de-tection to secure smart gridsrdquo IEEE Transactions on IndustrialInformatics vol 14 no 4 pp 1606ndash1615 2018

[24] D Svozil V Kvasnicka and J Pospichal ldquoIntroduction tomulti-layer feed-forward neural networksrdquo Chemometricsand Intelligent Laboratory Systems vol 39 no 1 pp 43ndash621997

[25] Irish social science data archive httpwwwucdieissdadatacommissionforenegryregulationcer

[26] P Jokar N Arianpoo and V C M Leung ldquoElectricity theftdetection in AMI using customersrsquo consumption patternsrdquoIEEE Transactions on Smart Grid vol 7 no 1 pp 216ndash2262016

[27] Y Wang Q Chen D Gan J Yang D S Kirschen andC Kang ldquoDeep learning-based socio-demographic in-formation identification from smart meter datafication fromsmart meter datardquo IEEE Transactions on Smart Grid vol 10no 3 pp 2593ndash2602 2019

[28] V Chandola A Banerjee and V Kumar ldquoAnomaly de-tection a surveyrdquo ACM Computing Surveys vol 41 no 32009

[29] H-T Cheng L Koc J Harmsen et al ldquoWide amp deep learningfor recommender systemsrdquo in Proceedings of the 1stWorkshopon Deep Learning for Recommender SystemsmdashDLRS 2016pp 7ndash10 Boston MA USA September 2016

[30] E Feczko N M Balba O Miranda-Dominguez et alldquoSubtyping cognitive profiles in autism spectrum disorderusing a functional random forest algorithmfiles in autismspectrum disorder using a functional random forest algo-rithmrdquo NeuroImage vol 172 pp 674ndash688 2018

[31] Y-L Boureau J Ponce and Y LeCun ldquoA theoretical analysisof feature pooling in visual recognitionrdquo in Proceedings of the27th International Conference on Machine Learningpp 111ndash118 Haifa Israel June 2010

[32] N Srivastava G Hinton A Krizhevsky I Sutskever andR Salakhutdinov ldquoDropout a simply way to prevent neuralnetworks from overfittingrdquo Ce Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

[33] M Abadi A Agarwal P Barham et al ldquoTensorflow Large-scale machine learning on heterogeneous distributed sys-temsrdquo 2016 httpsarxivorgabs16030446

[34] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 no 4 pp 2825ndash2830 2011

[35] J R Schofield ldquoLow carbon london project data from thedynamic time-of-use electricity pricing trialrdquo 2017 httpsdiscoverukdataserviceacukcataloguesn7857

12 Journal of Electrical and Computer Engineering

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 11: ElectricityTheftDetectioninPowerGridswithDeep ...downloads.hindawi.com/journals/jece/2019/4136874.pdf(1)Datacleaning:thereareerroneousvalues(e.g.,outliers) in raw data, which correspond

based on the train dataset -e performances of all themodels on the test dataset are shown in Figures 12 and 13

Figures 12 and 13 show that the performances of CNN-SVM CNN-GBDT and CNN-RF are better than those ofother models on the whole Since they are deep learningmodels the distinguishable features can be learned directlyfrom raw smart meter data by the CNN which has equippedthem with better generalization ability In addition theCNN-RF model has achieved the best performance since theRF can handle high-dimensional data while maintaininghigh computational efficiency

7 Conclusions

In this paper a novel CNN-RF model is presented to detectelectricity theft In this model the CNN is similar to anautomatic feature extractor in investigating smart meter dataand the RF is the output classifier Because a large number ofparameters must be optimized that increase the risk ofoverfitting a fully connected layer with a dropout rate of 04is designed during the training phase In addition the SMOTalgorithm is adopted to overcome the problem of dataimbalance Some machine learning and deep learningmethods such as SVM RF GBDT and LR are applied to thesame problem as a benchmark and all those methods havebeen conducted on SEAI and LCL datasets -e resultsindicate that the proposed CNN-RF model is quite apromising classification method in the electricity theft de-tection field because of two properties -e first is thatfeatures can be automatically extracted by the hybrid modelwhile the success of most other traditional classifiers relieslargely on the retrieval of good hand-designed featureswhich is a laborious and time-consuming task -e secondlies in that the hybrid model combines the advantages of theRF and CNN as both are the most popular and successfulclassifiers in the electricity theft detection field

Since the detection of electricity theft affects the privacyof consumers the future work will focus on investigatinghow the granularity and duration of smart meter data mightaffect this privacy Extending the proposed hybrid CNN-RFmodel to other applications (eg load forecasting) is a taskworth investigating

Data Availability

Previously reported [SEAI][LCL] datasets were used tosupport this study and are available at [httpwwwucdieissdadata][httpsbetaukdataserviceacukdatacataloguestudiesstudyid=7857] -ese prior studies (and datasets)are cited at relevant places within the text as references[19 26]

Conflicts of Interest

-e authors declare that there are no conflicts of interest

Acknowledgments

-is work was supported by the National Key Research andDevelopment Program of China (2016YFB0901900) the

Natural Science Foundation of Hebei Province of China(F2017501107) Open Research Fund from the State KeyLaboratory of Rolling and Automation Northeastern Uni-versity (2017RALKFKT003) the Foundation of Northeast-ern University at Qinhuangdao (XNB201803) and theFundamental Research Funds for the Central Universities(N182303037)

References

[1] S S S R Depuru L Wang and V Devabhaktuni ldquoElectricitytheft overview issues prevention and a smart meter basedapproach to control theftrdquo Energy Policy vol 39 no 2pp 1007ndash1015 2011

[2] J P Navani N K Sharma and S Sapra ldquoTechnical and non-technical losses in power system and its economic conse-quence in Indian economyrdquo International Journal of Elec-tronics and Computer Science Engineering vol 1 no 2pp 757ndash761 2012

[3] S McLaughlin B Holbert A Fawaz R Berthier andS Zonouz ldquoA multi-sensor energy theft detection frameworkfor advanced metering infrastructuresrdquo IEEE Journal on Se-lected Areas in Communications vol 31 no 7 pp 1319ndash13302013

[4] P McDaniel and S McLaughlin ldquoSecurity and privacychallenges in the smart gridrdquo IEEE Security amp PrivacyMagazine vol 7 no 3 pp 75ndash77 2009

[5] T B Smith ldquoElectricity theft a comparative analysisrdquo EnergyPolicy vol 32 no 1 pp 2067ndash2076 2004

[6] J I Guerrero C Leon I Monedero F Biscarri andJ Biscarri ldquoImproving knowledge-based systems with sta-tistical techniques text mining and neural networks for non-technical loss detectionrdquo Knowledge-Based Systems vol 71no 4 pp 376ndash388 2014

[7] C C O Ramos A N Souza G Chiachia A X Falcatildeo andJ P Papa ldquoA novel algorithm for feature selection usingharmony search and its application for non-technical lossesdetectionrdquo Computers amp Electrical Engineering vol 37 no 6pp 886ndash894 2011

[8] P Glauner J A Meira P Valtchev R State and F Bettingerldquo-e challenge of non-technical loss detection using artificialintelligence a surveyficial intelligence a surveyrdquo In-ternational Journal of Computational Intelligence Systemsvol 10 no 1 pp 760ndash775 2017

[9] S-C Huang Y-L Lo and C-N Lu ldquoNon-technical lossdetection using state estimation and analysis of variancerdquoIEEE Transactions on Power Systems vol 28 no 3pp 2959ndash2966 2013

[10] O Rahmati H R Pourghasemi and A M Melesse ldquoAp-plication of GIS-based data driven random forest and max-imum entropy models for groundwater potential mapping acase study at Mehran region Iranrdquo CATENA vol 137pp 360ndash372 2016

[11] N Edison A C Aranha and J Coelho ldquoProbabilisticmethodology for technical and non-technical losses estima-tion in distribution systemrdquo Electric Power Systems Researchvol 97 no 11 pp 93ndash99 2013

[12] J B Leite and J R S Mantovani ldquoDetecting and locating non-technical losses in modern distribution networksrdquo IEEETransactions on Smart Grid vol 9 no 2 pp 1023ndash1032 2018

[13] S Amin G A Schwartz A A Cardenas and S S SastryldquoGame-theoretic models of electricity theft detection in smartutility networks providing new capabilities with advanced

Journal of Electrical and Computer Engineering 11

metering infrastructurerdquo IEEE Control Systems Magazinevol 35 no 1 pp 66ndash81 2015

[14] A H Nizar Z Y Dong Y Wang and A N Souza ldquoPowerutility nontechnical loss analysis with extreme learning ma-chine methodrdquo IEEE Transactions on Power Systems vol 23no 3 pp 946ndash955 2008

[15] J Nagi K S Yap S K Tiong S K Ahmed andMMohamadldquoNontechnical loss detection for metered customers in powerutility using support vector machinesrdquo IEEE Transactions onPower Delivery vol 25 no 2 pp 1162ndash1171 2010

[16] E W S Angelos O R Saavedra O A C Cortes andA N de Souza ldquoDetection and identification of abnormalitiesin customer consumptions in power distribution systemsrdquoIEEE Transactions on Power Delivery vol 26 no 4pp 2436ndash2442 2011

[17] K A P Costa L A M Pereira R Y M NakamuraC R Pereira J P Papa and A Xavier Falcatildeo ldquoA nature-inspired approach to speed up optimum-path forest clusteringand its application to intrusion detection in computer net-worksrdquo Information Sciences vol 294 pp 95ndash108 2015

[18] R R Bhat R D Trevizan X Li and A Bretas ldquoIdentifyingnontechnical power loss via spatial and temporal deeplearningrdquo in Proceedings of the IEEE International Conferenceon Machine Learning and Applications Anaheim CA USADecember 2016

[19] M Ismail M Shahin M F Shaaban E Serpedin andK Qaraqe ldquoEfficient detection of electricity theft cyber attacksin ami networksrdquo in Proceedings of the IEEE Wireless Com-munications and Networking Conference Barcelona SpainApril 2018

[20] RMehrizi X Peng X Xu S Zhang DMetaxas and K Li ldquoAcomputer vision based method for 3D posture estimation ofsymmetrical liftingrdquo Journal of Biomechanics vol 69 no 1pp 40ndash46 2018

[21] K He X Zhang S Ren and J Sun ldquoDelving deep intorectifiers surpassing human-level performance on imagenetclassificationrdquo in Proceedings of the IEEE Conference onComputer for Sustainable Global Development pp 1026ndash1034New Delhi India March 2015

[22] A Sharif Razavian H Azizpour J Sullivan and S CarlssonldquoCNN features off-the-shelf an astounding baseline for rec-ognitionrdquo Astrophysical Journal Supplement Series vol 221no 1 pp 806ndash813 2014

[23] Z Zheng Y Yang X Niu H-N Dai and Y Zhou ldquoWide anddeep convolutional neural networks for electricity-theft de-tection to secure smart gridsrdquo IEEE Transactions on IndustrialInformatics vol 14 no 4 pp 1606ndash1615 2018

[24] D Svozil V Kvasnicka and J Pospichal ldquoIntroduction tomulti-layer feed-forward neural networksrdquo Chemometricsand Intelligent Laboratory Systems vol 39 no 1 pp 43ndash621997

[25] Irish social science data archive httpwwwucdieissdadatacommissionforenegryregulationcer

[26] P Jokar N Arianpoo and V C M Leung ldquoElectricity theftdetection in AMI using customersrsquo consumption patternsrdquoIEEE Transactions on Smart Grid vol 7 no 1 pp 216ndash2262016

[27] Y Wang Q Chen D Gan J Yang D S Kirschen andC Kang ldquoDeep learning-based socio-demographic in-formation identification from smart meter datafication fromsmart meter datardquo IEEE Transactions on Smart Grid vol 10no 3 pp 2593ndash2602 2019

[28] V Chandola A Banerjee and V Kumar ldquoAnomaly de-tection a surveyrdquo ACM Computing Surveys vol 41 no 32009

[29] H-T Cheng L Koc J Harmsen et al ldquoWide amp deep learningfor recommender systemsrdquo in Proceedings of the 1stWorkshopon Deep Learning for Recommender SystemsmdashDLRS 2016pp 7ndash10 Boston MA USA September 2016

[30] E Feczko N M Balba O Miranda-Dominguez et alldquoSubtyping cognitive profiles in autism spectrum disorderusing a functional random forest algorithmfiles in autismspectrum disorder using a functional random forest algo-rithmrdquo NeuroImage vol 172 pp 674ndash688 2018

[31] Y-L Boureau J Ponce and Y LeCun ldquoA theoretical analysisof feature pooling in visual recognitionrdquo in Proceedings of the27th International Conference on Machine Learningpp 111ndash118 Haifa Israel June 2010

[32] N Srivastava G Hinton A Krizhevsky I Sutskever andR Salakhutdinov ldquoDropout a simply way to prevent neuralnetworks from overfittingrdquo Ce Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014

[33] M Abadi A Agarwal P Barham et al ldquoTensorflow Large-scale machine learning on heterogeneous distributed sys-temsrdquo 2016 httpsarxivorgabs16030446

[34] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 no 4 pp 2825ndash2830 2011

[35] J R Schofield ldquoLow carbon london project data from thedynamic time-of-use electricity pricing trialrdquo 2017 httpsdiscoverukdataserviceacukcataloguesn7857

12 Journal of Electrical and Computer Engineering

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 12: ElectricityTheftDetectioninPowerGridswithDeep ...downloads.hindawi.com/journals/jece/2019/4136874.pdf(1)Datacleaning:thereareerroneousvalues(e.g.,outliers) in raw data, which correspond
