AUTOMATIC CLASSIFICATION OF DEFECTIVE PHOTOVOLTAIC MODULE CELLS …noiselab.ucsd.edu/ECE228/projects/Report/41Report.pdf · 2020-06-14 · Photovoltaic (PV) power is generated when

AUTOMATIC CLASSIFICATION OF DEFECTIVE PHOTOVOLTAIC MODULE CELLS IN ELECTROLUMINESCENCE IMAGES

Haenara Shin, Jaeyoung Kang, Onat Gungor

University of California San Diego, La Jolla, CA 92093-0238

Index Terms: convolutional neural network, multi-class classification, transfer learning, photovoltaics, electroluminescence imaging

1. INTRODUCTION

Photovoltaic (PV) power is generated when a PV cell (i.e., a solar cell) converts sunlight into electricity. At the industrial level, mono- and multi-crystalline silicon solar cells hold the highest market share (over 97%) [1]. During solar cell production, invisible microcracks or defects in the Si wafer commonly arise in the process steps. Since PV modules are made by connecting PV cells in series, defects in cells are regarded as a major cause of module output degradation. The main problem in PV module inspection is that it is a challenging task that requires trained experts [2]. Even experts may fail to detect defects, since some of them are not visible. These defects diminish the efficiency of a PV module, which raises production costs for PV companies. Although a variety of imaging methods have been developed for the inspection of PV modules (e.g., infrared), electroluminescence (EL) imaging seems the most reasonable option due to its high resolution compared to other imaging methods. Nevertheless, analyzing these images visually is burdensome, and an automated classification method is necessary. In this project, we propose an automated classification strategy using mainstream multi-class classification methods (e.g., Support Vector Machines (SVM) and Random Forest (RF)) and convolutional neural network (CNN) based methods. Given EL images, our model predicts the defect probability class, which can take 4 different values: {0, 0.33, 0.67, 1}.

2. RELATED WORK

Deitsch et al. [2] address this problem by proposing an automated method for classifying defects in EL images. Specifically, they use SVMs and a CNN to perform the classification. For the SVM, they obtained the best results using KAZE/VGG features; for the CNN, they obtained the best results using transfer learning. They achieved average accuracies of 82.44% and 88.42% for the SVM and the CNN, respectively. Building on this work, Akram et al. [3] raised the accuracy to 93.02% with less computational power and time by using a Light CNN. In addition, Tang et al. [4] developed and trained their own CNN model and compared it with VGG16, a deep learning model commonly applied to image classification. The accuracy of their proposed CNN on each defect type is slightly higher than that of VGG16, and the mean processing time is much lower. Chen et al. [5] propose a visual defect detection method based on a multi-spectral CNN. They adjust the depth and width of the selected initial CNN model to obtain the optimal CNN, and reach 94.30% accuracy for defect detection in solar cells. Kurukuru et al. [6] work on fault detection in PV modules. They propose thermography and machine learning-based PV module fault classification; in particular, they implement an artificial neural network (NN) classifier that reaches a test accuracy of 91.7%.

Fig. 1: Example EL Image

3. DATASET AND FEATURES

We use the public dataset¹ of solar cells extracted from high-resolution EL images of mono-crystalline and multi-crystalline PV modules [7, 8]. The dataset consists of 2,624 solar cell images at a resolution of 300×300 pixels, originally extracted from 44 different PV modules, of which 18 are of the mono-crystalline type and 26 are of the multi-crystalline type. An example EL image is shown in Figure 1. The dataset is split into a training set (75%) and a test set (25%).
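The 75%/25% split can be sketched in plain Python as follows. This is only an illustration: the text does not say whether the split was stratified by class, so the per-class splitting, the `stratified_split` helper, the seed and the toy labels are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Return (train_indices, test_indices), splitting within each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy labels only; the real labels come from the dataset's annotation file.
toy_labels = [0.0] * 8 + [0.33] * 4 + [0.67] * 4 + [1.0] * 4
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))
```

Splitting within each class keeps the rare classes represented in both subsets, which matters for a dataset whose classes differ strongly in size.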

For feature extraction, we simplify the process described in [2]. Our adapted classification framework is shown in Figure 2. In this framework, the feature extraction phase consists of two consecutive steps, namely key-point detection and feature description. Some of the methods proposed for these operations are summarized in Table 1.

¹ https://github.com/zae-bayern/elpv-dataset

Fig. 2: Classification Framework

Table 1: Methods Used as Keypoint Detectors and Feature Descriptors

Method       Keypoint Detector  Feature Descriptor
KAZE [9]     +                  +
AGAST [12]   +                  -
HOG [13]     -                  +
PHOW [14]    -                  +

We select KAZE [9] since this method is the most accurate of the 4 methods and allows for both key-point detection and feature description at the same time². Its key-point detection algorithm is very similar to SIFT [10], and as feature descriptor it uses SURF [11]. We run KAZE on all EL images and write the extracted features to a CSV file. An example of KAZE output is shown in Figure 3. After obtaining the features from KAZE, we combine them with the PV cell type (mono or multi). As our label, we select the defect probability class. Table 2 shows the features and labels for our classification algorithms. We obtain the first four columns (Time Scale Space, Time Detector, Time Descriptor, Number of Keypoints) using KAZE.
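As a minimal sketch of how such a CSV file could be turned into feature vectors and class labels, consider the snippet below. The column names, the numeric encoding of the cell type and the `encode_row` helper are hypothetical; the four sample rows are the ones listed in Table 2.

```python
import csv
import io

CLASSES = [0.0, 0.33, 0.67, 1.0]          # defect probability classes
CELL_TYPES = {"mono": 0.0, "poly": 1.0}   # assumed numeric encoding


def encode_row(row):
    """Turn one CSV row (dict of strings) into (feature_vector, class_index)."""
    features = [
        float(row["time_scale_space"]),
        float(row["time_detector"]),
        float(row["time_descriptor"]),
        float(row["num_keypoints"]),
        CELL_TYPES[row["type"]],
    ]
    label = CLASSES.index(float(row["defect_probability"]))
    return features, label


# Column names are hypothetical; values are the sample rows of Table 2.
SAMPLE_CSV = """time_scale_space,time_detector,time_descriptor,num_keypoints,type,defect_probability
32.58,25.74,0.71,16,mono,1
44.47,31.79,0.36,7,mono,0
43.44,29.28,1.21,27,mono,0.33
36.35,31.77,10.95,250,poly,0.67
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
X, y = zip(*(encode_row(r) for r in rows))
print(y)
```

Encoding the label as a class index rather than a float avoids treating the four probability values as a continuous target in the multi-class classifiers.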

Fig. 3: KAZE Output Example

4. METHODS

4.1. Multi-class Classification

We use multi-class classification algorithms to determine the defect probability class of a solar cell using Python's scikit-learn library [15]. For all classifiers, the labels are the defect probability classes of the solar cells (0, 0.33, 0.67, 1) and the features are obtained from KAZE (see Table 2). We run a grid search for all algorithms to find their optimal hyperparameters. Accordingly, we implement the following classifiers:

Support Vector Machine (SVM) [16]: SVM aims to find the decision boundary that separates different classes. While finding this decision boundary (i.e., hyperplane), it maximizes the margin, which is defined as the perpendicular distance between the hyperplane and the closest elements of the classes.

Random Forest (RF) [17]: This model generates multiple decision trees from different parts of the training dataset and aggregates their predictions in order to obtain a more accurate and stable prediction.

Logistic Regression (LR) [18]: This discriminative model directly calculates the posterior probability. The classifier's formulation is quite similar to linear regression. Using the sigmoid (logistic) function, it converts the output of the model's linear equation into a class variable.

Stochastic Gradient Descent (SGD) [19]: SGD is a discriminative model that optimizes a differentiable objective (loss) function. As opposed to gradient descent, it considers only one random point when updating the weights instead of the whole training data.

K-Nearest Neighbors (kNN) [20]: This classifier considers the k training instances most similar to the given instance and determines the class of that instance by majority voting over the classes of its k nearest neighbors.

² https://github.com/pablofdezalc/kaze
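To make the majority-voting rule of kNN concrete, here is a small self-contained sketch. It is not the scikit-learn implementation used in the project; the `knn_predict` helper, the Euclidean distance and the toy points are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy two-dimensional points forming two well-separated groups.
train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, [0.3, 0.3], k=3))
print(knn_predict(train_X, train_y, [4.8, 5.0], k=3))
```

Because the vote depends on raw distances, features on very different scales (for example timing values versus keypoint counts) would normally be standardized first.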

4.2. CNN-based methods

Vanilla CNN: A CNN is a powerful method for automatic feature extraction and multi-class classification. Before moving to state-of-the-art models, we evaluate the feasibility of a Vanilla CNN model and the quality of the dataset, to set a starting point and reveal latent issues.

A simple Vanilla CNN model and Light CNN [3] are tested with data augmentation provided by ImageDataGenerator from the TensorFlow 2 framework [21], using data generated by repeatedly looping over the augmentation. The Vanilla CNN model has 2 convolutional layers with ReLU activation, dropout, and max-pooling layers. Two dense layers with ReLU activation and dropout are used as the fully-connected layers. The Light CNN architecture has more filter sizes and 2 more convolutional layers, and each convolutional layer has ReLU activation, an L2 regularizer, and a batch normalization layer to alleviate overfitting.

Transfer Learning: With the aforementioned CNN-based architectures, a large number of epochs is required to reach reasonable accuracy (see Section 5.2). Also, data augmentation might not be sufficient to overcome the limited size of the dataset. Therefore, to accelerate the training process, we apply transfer learning from models pre-trained on ImageNet [22]. Note that we use the same data augmentation method as for the Vanilla CNN model.

Following the architecture of [2], we design the classifier as a single Global Average Pooling (GAP) layer, two Dense layers (4096 and 2048 neurons, respectively) and one output Dense layer, as shown in Figure 4. The GAP layer is applied to make networks that use weights pre-trained on ImageNet (224×224) compatible with our dataset's image size (300×300). Using this classifier, we replace the bottleneck layer of the NN models. Also, since ImageNet consists of RGB images and most existing NN models are designed for RGB image input, we replicate our grayscale image on each RGB channel.

Table 2: Features Used for Classification

Time Scale Space  Time Detector  Time Descriptor  Number of Keypoints  Type  Defect probability
32.58             25.74          0.71             16                   mono  1
44.47             31.79          0.36             7                    mono  0
43.44             29.28          1.21             27                   mono  0.33
36.35             31.77          10.95            250                  poly  0.67

Fig. 4: The Architecture of Proposed Classifier (Red: GAP layer, Blue and Green: Dense Layer). Layer outputs shown in the figure: GAP, 1x1x4096, 1x1x2048, 1x1x4.
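The two ideas in this paragraph, global average pooling and channel replication, can be illustrated with nested Python lists. This is a conceptual sketch only, not the Keras layers used in the project; the helper names and the 7×7 and 10×10 feature-map sizes are assumptions chosen to show that the pooled output length does not depend on the spatial size.

```python
def global_average_pool(feature_map):
    """Average an H x W x C nested list over H and W, returning C values."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    sums = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    return [s / (height * width) for s in sums]


def gray_to_rgb(gray_image):
    """Replicate a grayscale H x W image onto three identical channels."""
    return [[[v, v, v] for v in row] for row in gray_image]


def constant_map(height, width, channels, value=1.0):
    """Build an H x W x C feature map filled with a constant (test helper)."""
    return [[[value] * channels for _ in range(width)] for _ in range(height)]


# Different spatial sizes, same number of channels: the pooled vectors match in length.
small = global_average_pool(constant_map(7, 7, 4))
large = global_average_pool(constant_map(10, 10, 4))
print(len(small), len(large))
```

Since the dense layers only ever see one value per channel, the same classifier head can follow a backbone regardless of the input resolution.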

Once we replace the bottleneck layers of the NN models, we perform two-stage fine-tuning. In the first stage, we use the Adam optimizer with a learning rate (η) of 10^-3, and in the second stage, we use SGD with η = 5 × 10^-4. Note that we use early stopping and train with a low learning rate to prevent overfitting in the transfer learning method [23].
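Early stopping can be sketched as a simple rule on the validation loss. The criterion below (stop after `patience` epochs without improvement) is a common formulation and an assumption here; the text does not state the monitored quantity or the patience that was used.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. If that never
    happens, the index of the last epoch is returned.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical validation losses: improvement stalls after the third epoch.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]
print(early_stopping_epoch(losses, patience=3))
```

In a two-stage schedule, the same rule would simply be applied once per stage, which is one way to obtain different epoch counts for the first and second tuning.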

Depending on the last dense layer, we can get two types of results: (1) a dense layer with linear activation predicts the defect probability of the image (mean squared error is required as the loss function); (2) a dense layer with softmax activation classifies the image directly into the probability classes (categorical cross-entropy is required as the loss function). Note that with the linear activation, we obtain a raw probability value and map it to the nearest probability class.
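Both output variants can be sketched in a few lines. The helper names are hypothetical and the inputs are made-up numbers; the snippet only illustrates mapping a raw regression output to the nearest class and picking the most probable class from softmax outputs.

```python
import math

CLASSES = [0.0, 0.33, 0.67, 1.0]


def nearest_class(prediction):
    """Map a raw regression output to the closest defect probability class."""
    return min(CLASSES, key=lambda c: abs(c - prediction))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Pick the class with the highest softmax probability."""
    probs = softmax(logits)
    return CLASSES[probs.index(max(probs))]


print(nearest_class(0.4), nearest_class(0.9))
print(classify([0.1, 2.0, 0.3, -1.0]))
```

Note that the regression variant can also produce values outside [0, 1]; with the nearest-class rule these are simply assigned to class 0 or class 1.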

5. EXPERIMENT RESULTS AND DISCUSSION

5.1. Multi-class Classification

After a grid-based hyperparameter search, we select the following parameters for our classifiers:

SVM: C (regularization parameter): 10, gamma (kernel coefficient): 1, kernel: rbf
RF: max depth: 110, max features: 5, min samples leaf: 6, min samples split: 14, number of estimators: 100
LR: penalty (regularization): l2, dual: False
SGD: loss function: hinge, penalty (regularization): l2
kNN: number of neighbors: 5, leaf size: 30
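A grid search of this kind can be sketched generically as an exhaustive loop over parameter combinations. The project used scikit-learn; the `grid_search` helper below, the candidate values in `svm_grid` and the `toy_score` function are hypothetical and only stand in for a real cross-validated accuracy.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in `param_grid` and return the best one.

    `score_fn` takes a dict of hyperparameters and returns a score to
    maximize (for example, mean cross-validated accuracy).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values; not the grid actually searched in the project.
svm_grid = {"C": [0.1, 1, 10], "gamma": [0.1, 1], "kernel": ["rbf", "linear"]}


def toy_score(params):
    """Made-up score, constructed to peak at C=10, gamma=1, kernel='rbf'."""
    return -abs(params["C"] - 10) - abs(params["gamma"] - 1) + (params["kernel"] == "rbf")


best, score = grid_search(svm_grid, toy_score)
print(best)
```

The cost grows with the product of the candidate list lengths, so in practice the grid is kept small or evaluated in parallel.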

Table 3 shows the classifier accuracy for the defect probability class based on the selected hyperparameters. Although values are close to the baseline paper [2], the results show that features retrieved from KAZE are not highly discriminative. Hence, there is room for further improvement in feature extraction, which may bring higher classifier accuracy.

Table 3: Classifier Accuracy Values

Classifier  Accuracy (%)
SVM         64
RF          64
LR          63
SGD         63
kNN         62

Fig. 5: Data augmentation. Left: Original image; Right: Augmented image.

5.2. CNN-based methods

As a proof of concept, we implemented our NN models using the TensorFlow 2 framework [21].

Vanilla CNN: The Vanilla CNN reached a multi-class classification accuracy of only 73% and overfit, even though each layer has dropout. The accuracy dropped further to 60% with Light CNN, which uses an L2 regularizer and batch normalization. Therefore, we shifted our focus to the quality of the dataset and processed the training set with the data augmentation of ImageDataGenerator. In detail, we rotated the training images by 3 degrees and shifted them in width and height by 2% of the original size (Figure 5). Consistent with other research groups' reports that data augmentation gives the model more information and leads to better performance in NN models, the result improved markedly, as shown in Figure 6: even the simple Vanilla CNN model reached an accuracy of 82% in 100 epochs.
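To give a feel for the size of a 2% shift on a 300×300 image, the following sketch translates a small image given as a nested list. It is a simplified stand-in for the augmentation, not the ImageDataGenerator implementation: the helper names are hypothetical, the shift is fixed rather than drawn at random, exposed pixels are filled with a constant, and rotation is omitted.

```python
def max_shift_pixels(image_size, fraction):
    """Largest shift in pixels for a given fraction of the image size."""
    return round(image_size * fraction)


def shift_image(image, dx, dy, fill=0):
    """Translate a 2-D image by (dx, dy) pixels, padding exposed areas with `fill`.

    Positive dx moves content to the right, positive dy moves it down.
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


print(max_shift_pixels(300, 0.02))
print(shift_image([[1, 2], [3, 4]], dx=1, dy=0))
```

A shift of a few pixels leaves the defect pattern intact while changing its position, which is why such mild transformations are a common choice for inspection images.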

Preliminary test results imply that data augmentation gives the NN model much more information and yields better accuracy without increasing the total size of the dataset. Therefore, we move on to state-of-the-art models with augmented training images and the transfer learning method.

Transfer Learning: As mentioned in Section 4.2, we replaced the bottleneck layers of VGG19 [24], MobileNet V2 [25], Inception V3 [26] and ResNet50 [27] with our classifier. Note that we used exactly the same configuration (i.e., the number of feature maps and strides).

Fig. 6: Impact of Data Augmentation on Accuracy in Vanilla CNN and Light CNN models.

Fig. 7: Confusion Matrix. (a) 4-Class (b) Binary

The model that predicts the defect probability of the image (regression) showed an accuracy of 56% at best (using VGG19), which is a poor result compared to the previous Vanilla CNN model.

We also implemented the model that classifies the image directly into the four probability classes. Compared to the Vanilla CNN models, transfer learning-based methods significantly lowered the number of epochs required for the accuracy to converge, by three orders of magnitude. This implies that transfer learning can shorten the training time of defect detection and make this model feasible for a real PV cell manufacturing process.

Table 4 shows classification accuracy results for various NN models. When we simply replaced the bottleneck layers of each model, the accuracy was approximately 70% for most models. We attribute this to the characteristics of our dataset, especially the mean and the variance, being dissimilar to those of the ImageNet dataset. Therefore, we let the model adjust the mean and the variance in each layer by unfreezing the batch normalization layers of the original NN models, forcing them to learn the statistics of our dataset. Among the various models, ResNet50-based transfer learning with unfrozen batch normalization shows the best accuracy, 92%.
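The effect of mismatched normalization statistics can be illustrated with a toy calculation. The activations and the "stale" statistics below are hypothetical numbers, and the `normalize` helper omits the learned scale and offset of a real batch normalization layer.

```python
import statistics


def normalize(values, mean, variance, eps=1e-5):
    """Batch-normalization style transform without the learned scale and offset."""
    return [(v - mean) / (variance + eps) ** 0.5 for v in values]


# Hypothetical activations and statistics, chosen only for illustration.
activations = [4.0, 5.0, 6.0, 5.5, 4.5]
stale_mean, stale_var = 0.0, 1.0                 # inherited from pre-training
fresh_mean = statistics.fmean(activations)       # statistics of the new data
fresh_var = statistics.pvariance(activations)

with_stale = normalize(activations, stale_mean, stale_var)
with_fresh = normalize(activations, fresh_mean, fresh_var)
print(statistics.fmean(with_stale), statistics.fmean(with_fresh))
```

With the stale statistics the outputs stay far from zero mean, whereas statistics computed on the new data re-center them, which is the intuition behind letting the batch normalization layers adapt to the EL images.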

Table 4: CNN-based Classification Results

                              Accuracy (%)             Required Epochs
Models                        First Tune  Second Tune  First Tune  Second Tune
Vanilla CNN                   86          -            40000       -
VGG19                         85          85           18          11
MobileNet V2                  72          72           23          6
Inception V3                  73          74           22          11
ResNet50                      72          72           12          10
ResNet50 with modification³   89          92           47          13

Table 5: Number of images of each class

Class (Defect probability)  0     0.33  0.67  1
Size (Image numbers)        1508  295   106   715

Given the successful result of ResNet50 with modification, a satisfactory confusion matrix was also expected, but Figure 7(a) shows that the model performs poorly as a 4-class classifier. Here, precision, recall and F1 score were all 0.80. While reviewing the project from scratch, we found that the data is highly imbalanced, as shown in Table 5. Consequently, because the classes with defect probability 0 and 1 are much larger than the 0.33 and 0.67 classes, the model was trained to behave like a binary classifier, with the confusion matrix of Figure 7(b). Note that when we treat it as a binary classifier, precision, recall and F1 score are 0.93, 0.96, and 0.94, respectively.
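The relation between a 4-class confusion matrix and its binary view can be sketched as follows. The counts in `toy_matrix` are invented and are not the values of Figure 7, and grouping the 0.67 and 1 classes as "defective" is an assumption, since the text does not state how the binary labels were formed.

```python
def collapse_to_binary(matrix, positive_classes):
    """Collapse a K x K confusion matrix (rows: true, columns: predicted)
    into [[TN, FP], [FN, TP]] given the indices of the positive classes."""
    binary = [[0, 0], [0, 0]]
    for true_idx, row in enumerate(matrix):
        for pred_idx, count in enumerate(row):
            true_pos = int(true_idx in positive_classes)
            pred_pos = int(pred_idx in positive_classes)
            binary[true_pos][pred_pos] += count
    return binary


def precision_recall_f1(binary):
    """Compute precision, recall and F1 for the positive class."""
    (tn, fp), (fn, tp) = binary
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented counts for classes 0, 0.33, 0.67, 1 (in that order); not the paper's results.
toy_matrix = [
    [90, 5, 0, 5],
    [10, 3, 1, 6],
    [2, 1, 2, 5],
    [4, 1, 1, 44],
]
binary = collapse_to_binary(toy_matrix, positive_classes={2, 3})
print(binary, precision_recall_f1(binary))
```

Errors between classes on the same side of the split disappear in the binary view, which is one reason binary scores can look better than 4-class scores for the same predictions.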

6. CONCLUSION AND FUTURE WORK

In this work, we propose machine learning-based automated classification strategies for defect detection in PV cells using EL images. First, we use mainstream classification methods, which show up to 64% accuracy with SVM and RF. In addition, we implemented CNN-based NN models. Compared to the Vanilla CNN models, the transfer learning-based NN models greatly decreased the number of epochs required for the accuracy to converge. Our experimental results show that ResNet50 with unfrozen batch normalization performed best, with 92% accuracy. However, the confusion matrix shows that the model does not function as a 4-class classifier but works as a binary classifier. The main reason is that the dataset is highly imbalanced toward two classes, 0 and 1. Because we focused only on data augmentation, the number of images in each class was not properly considered.

As future work, we should focus on generating data for both the 0.33 and 0.67 classes, increasing them at least 7 times. Also, the bottleneck layers will be tuned with more powerful methods to prevent overfitting. Together with enriching the feature extraction process and adopting more sophisticated CNN models, this work should give better results.
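The degree of imbalance can be quantified directly from the class sizes in Table 5. The sketch below computes inverse-frequency class weights and the replication factor each class would need to reach a target size; both helpers are hypothetical, and using the size of class 1 as the target is an assumption, since the text does not say how the factor of 7 was derived.

```python
CLASS_SIZES = {0.0: 1508, 0.33: 295, 0.67: 106, 1.0: 715}  # from Table 5


def balanced_class_weights(sizes):
    """Inverse-frequency weights: total / (number_of_classes * class_size)."""
    total = sum(sizes.values())
    return {label: total / (len(sizes) * n) for label, n in sizes.items()}


def oversampling_factors(sizes, target):
    """How many times each class must be replicated to reach `target` images."""
    return {label: target / n for label, n in sizes.items()}


weights = balanced_class_weights(CLASS_SIZES)
factors = oversampling_factors(CLASS_SIZES, target=CLASS_SIZES[1.0])
for label in CLASS_SIZES:
    print(label, round(weights[label], 2), round(factors[label], 2))
```

Under this assumed target, the 0.67 class would need roughly 6.7 times its current number of images. Class weights are an alternative to generating data: they leave the dataset unchanged and instead scale each sample's contribution to the loss.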

³ Unfrozen batch normalization layer

7. REFERENCES

[1] M. Choobineh and S. Mohagheghi. A multi-objective optimization framework for energy and asset management in an industrial microgrid. Journal of Cleaner Production, 139:1326–1338, 2016.

[2] Sergiu Deitsch, Vincent Christlein, Stephan Berger, Claudia Buerhop-Lutz, Andreas Maier, Florian Gallwitz, and Christian Riess. Automatic classification of defective photovoltaic module cells in electroluminescence images. Solar Energy, 185:455–468, 2019.

[3] M. W. Akram, G. Li, Y. Jin, X. Chen, C. Zhu, X. Zhao, A. Khaliq, M. Faheem, and A. Ahmad. CNN based automatic detection of photovoltaic cell defects in electroluminescence images. Energy, 2019.

[4] W. Tang, Q. Yang, and W. Yan. Deep learning based model for defect detection of mono-crystalline-Si solar PV module cells in electroluminescence images using data augmentation. 2019 IEEE PES Asia-Pacific Power and Energy Engineering Conference, 2019.

[5] Haiyong Chen, Yue Pang, Qidi Hu, and Kun Liu. Solar cell surface defect inspection based on multispectral convolutional neural network. Journal of Intelligent Manufacturing, pages 1–16, 2018.

[6] VS Bharath Kurukuru, Ahteshamul Haque, Mohammed Ali Khan, and Arun Kumar Tripathy. Fault classification for photovoltaic modules using thermography and machine learning techniques. In 2019 International Conference on Computer and Information Sciences (ICCIS), pages 1–6. IEEE, 2019.

[7] C. Buerhop-Lutz, S. Deitsch, A. Maier, F. Gallwitz, S. Berger, B. Doll, J. Hauch, C. Camus, and C. J. Brabec. A benchmark for visual identification of defective solar cells in electroluminescence imagery. 35th European PV Solar Energy Conference and Exhibition 2018, pages 1287–1289, 2018.

[8] S. Deitsch, C. Buerhop-Lutz, A. Maier, F. Gallwitz, and C. Riess. Segmentation of photovoltaic module cells in electroluminescence images. arXiv preprint, 2018.

[9] Pablo Fernandez Alcantarilla, Adrien Bartoli, and Andrew J Davison. KAZE features. In European Conference on Computer Vision, pages 214–227. Springer, 2012.

[10] David G Lowe. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, volume 2, pages 1150–1157. IEEE, 1999.

[11] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded up robust features. In European Conference on Computer Vision, pages 404–417. Springer, 2006.

[12] Elmar Mair, Gregory D Hager, Darius Burschka, Michael Suppa, and Gerhard Hirzinger. Adaptive and generic corner detection based on the accelerated segment test. In European Conference on Computer Vision, pages 183–196. Springer, 2010.

[13] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 1, pages 886–893. IEEE, 2005.

[14] Anna Bosch, Andrew Zisserman, and Xavier Munoz. Image classification using random forests and ferns. In 2007 IEEE 11th International Conference on Computer Vision, pages 1–8. IEEE, 2007.

[15] Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[16] Johan AK Suykens and Joos Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9(3):293–300, 1999.

[17] Andy Liaw, Matthew Wiener, et al. Classification and regression by randomForest. R News, 2(3):18–22, 2002.

[18] David G Kleinbaum, K Dietz, M Gail, Mitchel Klein, and Mitchell Klein. Logistic regression. Springer, 2002.

[19] Shun-ichi Amari. Backpropagation and stochastic gradient descent method. Neurocomputing, 5(4-5):185–196, 1993.

[20] Leif E Peterson. K-nearest neighbor. Scholarpedia, 4(2):1883, 2009.

[21] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16), pages 265–283, 2016.

[22] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.

[23] Simon Kornblith, Jonathon Shlens, and Quoc V. Le. Do better ImageNet models transfer better? In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, June 2019.

[24] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition, 2014.

[25] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, June 2018.

[26] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision, 2015.

[27] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, June 2016.

[28] Nitish Shirish Keskar and Richard Socher. Improving generalization performance by switching from Adam to SGD, 2017.

8. INDIVIDUAL CONTRIBUTIONS

Each group member completed the following tasks:

1. Haenara: Implemented preliminary tests using Vanilla CNN. Analyzed results and suggested possible improvements regarding data preprocessing. Edited presentation and video files.

2. Jaeyoung: Implemented the data augmentation stage, NN models (regression model) in the baseline paper and popular CNN-based models with transfer learning.

3. Onat: Performed feature extraction and implemented the selected classification algorithms. Developed NN models and implemented the proposed data preprocessing strategies.

9. REPLY TO REVIEWS

Critical review from group 25:

Question: General questions related to specific details about classification (e.g. what are labels?)

Our response: Regarding classification, we provide a detailed explanation in the Introduction. Specifically, in our classification task, we have 4 different labels to be predicted. These labels represent the defect probability.

Question: SGD is not a classifier, do you mean another model that uses SGD to optimize?

Our response: SGD can be used as both an optimizer and a classifier. In fact, the SGD classifier is a linear classifier that implements regularized linear models with stochastic gradient descent (SGD) learning.

Question: It seems you changed both optimizer and learning rate at the same time. Did you try various learning rates with an unchanged optimizer and the other way?

Our response: Yes, but the hyperparameter and optimizer configuration stated in the report showed the best performance when we performed hyperparameter optimization. Also, according to Keskar et al. [28], switching from Adam to the SGD optimizer showed better generalization performance.

Question: In the literature review, there is a baseline work. What is the baseline performance?

Our response: The CNN classifier of our baseline work shows an average accuracy of 88.42%, and the SVM 82.44%, for 100 epochs with data augmentation and hyperparameter tuning.

Critical review from group 87:

Question: It is obvious that CNN outperforms multi-class classification. If the accuracy of multi-class classification is high, do you think this may work in some cases? If you implement another feature extraction method, will the accuracy of multi-class classification increase?

Our response: According to the selected baseline paper, the authors combine a variety of feature extraction methods. They solely utilize SVM after encoding these features into a global feature descriptor. We simplify the process, yet still obtain accuracy values close to the baseline paper.

Question: Why did you choose to replace the bottleneck layer instead of other layers in transfer learning?

Our response: Features in the bottleneck layer contain more generality compared to top layers. Since the objective is to make the model fit our dataset, we replaced the bottleneck layer. In addition to replacing the bottleneck layer, we also unfroze the batch normalization layers of the NN models.

Critical review from group 90:

Question: For the models for multi-class classification, you mentioned five different classifiers and later a table of the accuracy was presented. Apart from the feature extraction choice, what would be the possible reasons for the unsatisfying results? And since the accuracies of these models are very close, how can we tell that SVM and RF are better than the rest of the multi-class classifiers?

Our response: The classifier accuracy values are close to our baseline paper although we simplify the feature extraction process. According to the classifier accuracy values, we can conclude that SVM and RF are better than the other algorithms.

Question: It wasn't exactly clear to me what the defection probability class is and how the four classes are determined. The introduction part said '1' means defective for sure, and '0' is not. Are the other two values picked by you or just a convention thing? More information about the defection probability class would be appreciated.

Our response: Regarding classification, we provide a detailed explanation in the Introduction. Specifically, in our classification task, we have 4 different labels to be predicted. These labels represent the defect probability.
