
  

Brain Informatics Using Deep Learning   

  

Final Research Report   

Cognitive Science and Deep Learning Research Intern 

 Institute of Nuclear Medicine and Allied Sciences (INMAS)  Defence Research and Development Organisation (DRDO) 

Ministry of Defence, Govt. of India    

    

Student: Vikramank Singh
Computer Engineering, VES Institute of Technology, University of Mumbai

Guide: Sushil Chandra, Scientist 'F'
Head, Bio-Medical Engineering Department, INMAS, DRDO

    

   

   


CERTIFICATE   

This is to certify that the project entitled 'Brain Informatics Using Deep Learning' is the bonafide work of Vikramank Singh, conducted in the Biomedical Engineering Department of the Institute of Nuclear Medicine and Allied Sciences, DRDO, Delhi under the supervision and guidance of Mr. Sushil Chandra, Scientist 'F'.

                   

Sh. Sushil Chandra
Scientist 'F'
Biomedical Engg. Department
INMAS (DRDO)

     

  


   


ACKNOWLEDGEMENT

I hereby take this opportunity to express my sincere gratitude to all the people who have contributed with their knowledge and experience in aiding me with my project. Without them it would have been quite a difficult task for me to complete this work.

I am thankful to Mr. Sushil Chandra, Scientist 'F' and Head of the B.M.E. Department, INMAS (DRDO), for coordinating this training, giving me an invaluable opportunity to work in a competitive yet amicable atmosphere, and providing me with all the facilities and paraphernalia required to carry out this project. His profound knowledge and understanding gave me an entirely new perspective on my project. It was always a new and unique experience working with him.

I would like to express my gratitude towards Mrs. Greeshma Sharma, my project monitor, for her worthwhile suggestions and fruitful help, and for all the knowledge she imparted to me during the course of the project.

Finally, I would like to express my deep appreciation to my family and friends, who have been a constant source of inspiration. I am eternally grateful to them for always encouraging me and being with me wherever and whenever I needed them.


 

   


  About the Organization 

Defence Research and Development Organisation (DRDO)

DRDO was formed in 1958 from the amalgamation of the then already functioning Technical Development Establishments (TDEs) of the Indian Army and the Directorate of Technical Development & Production (DTDP) with the Defence Science Organisation (DSO). DRDO was then a small organisation with 10 establishments or laboratories. Over the years, it has grown multi-directionally in terms of the variety of subject disciplines, number of laboratories and achievements. Today, DRDO is a network of more than 50 laboratories which are deeply engaged in developing defence technologies covering various disciplines such as aeronautics, armaments, electronics, combat vehicles, engineering systems, instrumentation, missiles, advanced computing and simulation, special materials, naval systems, life sciences, training, information systems and agriculture. Presently, the Organisation is backed by over 5000 scientists and about 25,000 other scientific, technical and supporting personnel. Several major projects for the development of missiles, armaments, light combat aircraft, radars and electronic warfare systems are in hand, and significant achievements have already been made in several such technologies.

Institute of Nuclear Medicine and Allied Sciences (INMAS)

At the instance of Pandit Jawaharlal Nehru, the first Prime Minister of India, a Radiation Cell was established in 1956 at the Defence Science Laboratory, Delhi. The initial assignment was to undertake a study of the consequences of the use of nuclear and other weapons of mass destruction. But it was soon realized that nuclear energy could also be harnessed for the good of mankind: radioisotopes could find peaceful medical applications. The scope of work was therefore enlarged and the cell upgraded to the Radiation Medicine Division in 1959. As awareness increased, so did the work, and a full-fledged establishment was created in 1961 and named the Institute of Nuclear Medicine and Allied Sciences. Since then it has come a long way, carrying out R&D and providing service as a model of excellence in various aspects of nuclear medicine and allied sciences. The activities of the Institute have proliferated enormously over the years, and its areas of activity have diversified to cover many fields of radiation and bio-medical sciences.


   


Vision

The vision of INMAS is to be a centre of excellence in biomedical and clinical research with special reference to ionizing radiation.

Mission

The mission of INMAS is clinical research in nuclear medicine and non-invasive imaging methods with a focus on biological radio-protectors and thyroid disorders.

Basic Background and Theory

Project Background

The Institute of Nuclear Medicine and Allied Sciences (INMAS), a wing of the Defence Research and Development Organisation (DRDO), is currently in the third year of its four-year project "Cognition Enhancement using Non-Invasive Interventions". This project will not only benefit the training regimen of defence personnel by enhancing their reasoning, attention, planning, decision making, memory and sensory input processing abilities, but will also contribute to the treatment of cognitive disorders such as ADD and ADHD, executive dysfunction in stroke patients, autism, and cognitive skill degradation due to natural ageing.

     Fig 1: Research at BME, INMAS 

   


 

Brain Informatics Using Deep Learning  

Final Research Report 4th February 2016 

  

 

1. Abstract

Electroencephalography (EEG) technology has gained growing popularity in various applications. In this report we propose a deep learning based automated system which can classify workload into 3 categories (Base Line, Low and High) using electroencephalographic (EEG) signals acquired by an inexpensive EEG device (Emotiv EEG). Workload is a critical factor influencing the performance of an individual in any field, from research and corporate jobs to Army personnel. In this study, a 14-channel EEG was used to acquire brain signals while the subjects performed tasks graded by the workload they impose on an individual. The acquired signals were then passed to various deep learning algorithms as training sets. The trained deep learning models were then used to classify the workload of an individual simply by acquiring that individual's EEG signals and passing them through the models.

Keywords: Deep Learning, Artificial Neural Networks, Radial Basis Function, Support Vector Machines (SVM), Stacked Autoencoders, Linear Discriminant Analysis (LDA), EEG, EEG Feature Extraction

2. Introduction

In this research work we used five deep learning algorithms to train models and then compared the results of each algorithm to find which one best suited our data. The Emotiv EEG machine was used to gather the 14-channel data. Electroencephalographic data contains a great deal of noise and other disturbances which, if fed directly into the algorithms as training data, can produce aberrant results. The acquired EEG data was therefore treated with various digital signal processing techniques to filter out the noise and make the signal as clean as possible.

Various noise reduction filters were applied to eliminate as much noise from the data as possible. The filtered data was then passed through a Butterworth band-pass filter to perform feature extraction on the EEG signals: the alpha, beta, gamma, delta and theta features were extracted from the EEG signals based on their frequency bands.
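A minimal sketch of this band-power extraction step, assuming the 'signal' package, a 128 Hz Emotiv sampling rate, and illustrative band edges (none of these names appear in the original code):

# Band-power feature extraction with a Butterworth band-pass filter.
library(signal)

fs <- 128  # assumed Emotiv sampling rate
bands <- list(delta = c(0.5, 4), theta = c(4, 8), alpha = c(8, 13),
              beta  = c(13, 30), gamma = c(30, 45))  # edges are illustrative

extract_band_features <- function(channel, fs, bands) {
  sapply(bands, function(b) {
    bf <- butter(4, W = b / (fs / 2), type = "pass")  # 4th-order band-pass
    filtered <- filtfilt(bf, channel)                 # zero-phase filtering
    mean(filtered^2)                                  # mean band power
  })
}

# 'eeg' is assumed to be a matrix with one column per channel (14 columns):
# features <- apply(eeg, 2, extract_band_features, fs = fs, bands = bands)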

   


These band features were then used as the input training sets for the various deep learning algorithms. The five algorithms used were Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Radial Basis Function (RBF) networks, Linear Discriminant Analysis (LDA) and Stacked Autoencoders. Each algorithm is described in detail below.

Once the models were trained and the classification performed, the next step in the study was to discern any correlations between the various features of the EEG signals across the three load cases. We also calculated the significant differences between the features under each workload condition using various statistical methods.

2.1 Artificial Neural Networks

The first deep learning model we used was the Artificial Neural Network. We developed a deep neural network consisting of 1 hidden layer with 8 hidden neurons. The inputs to the network were the 14-channel EEG signals, so the input layer consisted of 14 neurons. The desired output was a classifier that could, on the basis of the EEG signals, assign the workload to one of 3 categories, so the output layer consisted of 3 neurons.

The figure below shows the network architecture.

   

Fig 2 - Deep Neural Network (14, 8, 3)

   


The 14 input neurons represent the 14 EEG channels: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8 and AF4. The 3 output neurons represent BL (Base Line, i.e. no workload), LWL (Low Workload) and HWL (High Workload). The entire research work was carried out in R, and the neural network shown above was also coded in R. We used the resilient backpropagation technique (rprop+) to train the deep neural net.

In order to train the deep neural network, we first needed to normalize the entire input data set. We used the normalization function available in the RSNNS package on the CRAN server for R. The testing data was also normalized before being fed into the network for testing. The classification output thus obtained was in normalized form, and we denormalized it using the denormalization function available in the same package. The denormalized values were the actual values indicating whether the workload was Base, Low or High.

The data we had came from 10 students, which we split in an 8:2 ratio for training and testing: the network was trained on the EEG data of 8 students and tested on the data of the remaining 2.

[Figure: sample of the normalized 14-channel training data fed into the network]
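A minimal sketch of this normalize / train / predict pipeline, assuming RSNNS and an illustrative data frame train_df with 14 feature columns and a 'label' factor (BL / LWL / HWL); here the binary class matrix is produced with decodeClassLabels and the predicted class read off with max.col, a close stand-in for the normalize/denormalize steps described above:

# 14-8-3 network trained with resilient backpropagation (RSNNS).
library(RSNNS)

x <- as.matrix(train_df[, 1:14])
y <- decodeClassLabels(train_df$label)   # factor -> binary (1,0,0) matrix

x_norm <- normalizeData(x, type = "0_1")

model <- mlp(x_norm, y, size = 8, learnFunc = "Rprop", maxit = 200)

# Test data must be normalized with the training-set parameters.
x_test <- normalizeData(as.matrix(test_df[, 1:14]),
                        type = getNormParameters(x_norm))
pred <- predict(model, x_test)
predicted_class <- colnames(y)[max.col(pred)]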

  

   


The output set for the 14-channel EEG signals was transformed into a binary matrix format in which the 3 columns follow the pattern (1,0,0), signifying that each sample can belong to only one of the 3 classes. Hence, when the output of the neural net was denormalized using the denormalization function, the 3 output neurons were in the same (0,1,0) format as the training targets.

2.2 Support Vector Machines

Support Vector Machines are based on the concept of decision planes that define decision boundaries. A decision plane is one that separates a set of objects having different class memberships. Classifiers that draw separating hyperplanes to distinguish between objects of different class memberships are known as hyperplane classifiers, and Support Vector Machines are particularly suited to such tasks.

The illustration below shows the basic idea behind Support Vector Machines. Here we see the original objects (left side of the schematic) mapped, i.e. rearranged, using a set of mathematical functions known as kernels. The process of rearranging the objects is known as mapping (transformation). Note that in this new setting the mapped objects (right side of the schematic) are linearly separable; thus, instead of constructing the complex curve (left schematic), all we have to do is find an optimal line that can separate the GREEN and the RED objects.

[Illustration: kernel mapping from the input space (left) to a linearly separable feature space (right)]

 

In our case too, we used an SVM as one of the classification models for the workloads, relying on its kernel function for the classification. In R, the SVM was used with kernel type "radial". The output of the SVM was about as accurate as that of the ANN.
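A sketch of this classifier, assuming the e1071 package and the same illustrative train_df / test_df frames used for the neural network:

# SVM with a radial (RBF) kernel for 3-class workload classification.
library(e1071)

svm_model <- svm(label ~ ., data = train_df,
                 kernel = "radial", type = "C-classification")

svm_pred <- predict(svm_model, newdata = test_df)
table(Predicted = svm_pred, Actual = test_df$label)  # confusion matrix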

   


The same dataset that was used to train the Artificial Neural Network was used to train the SVM.

2.3 Stacked Autoencoders (SAEs)

A stacked autoencoder is a neural network consisting of multiple layers of sparse autoencoders in which the outputs of each layer are wired to the inputs of the successive layer. Formally, consider a stacked autoencoder with n layers. Using notation from the autoencoder literature, let $W^{(k,1)}, W^{(k,2)}, b^{(k,1)}, b^{(k,2)}$ denote the parameters $W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)}$ of the k-th autoencoder. The encoding step for the stacked autoencoder is then given by running the encoding step of each layer in forward order:

$$a^{(l)} = f(z^{(l)}), \qquad z^{(l+1)} = W^{(l,1)} a^{(l)} + b^{(l,1)}$$

The decoding step is given by running the decoding stack of each autoencoder in reverse order:

$$a^{(n+l)} = f(z^{(n+l)}), \qquad z^{(n+l+1)} = W^{(n-l,2)} a^{(n+l)} + b^{(n-l,2)}$$

The information of interest is contained within $a^{(n)}$, which is the activation of the deepest layer of hidden units. This vector gives us a representation of the input in terms of higher-order features.

A good way to obtain good parameters for a stacked autoencoder is to use greedy layer-wise training. First, train the first layer on the raw input to obtain parameters $W^{(1,1)}, W^{(1,2)}, b^{(1,1)}, b^{(1,2)}$. Use the first layer to transform the raw input into a vector of hidden-unit activations, A. Train the second layer on this vector to obtain parameters $W^{(2,1)}, W^{(2,2)}, b^{(2,1)}, b^{(2,2)}$. Repeat for subsequent layers, using the output of each layer as the input to the next.
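A minimal sketch of this greedy layer-wise scheme in base R, using a plain one-hidden-layer autoencoder trained by batch gradient descent (all names and hyper-parameters are illustrative, not the original code):

sigmoid <- function(z) 1 / (1 + exp(-z))

# Train one autoencoder on X and return its encoder parameters.
train_autoencoder <- function(X, n_hidden, lr = 0.1, epochs = 100) {
  n_in <- ncol(X)
  W1 <- matrix(rnorm(n_in * n_hidden, sd = 0.1), n_in, n_hidden)
  b1 <- rep(0, n_hidden)
  W2 <- matrix(rnorm(n_hidden * n_in, sd = 0.1), n_hidden, n_in)
  b2 <- rep(0, n_in)
  for (e in seq_len(epochs)) {
    H     <- sigmoid(sweep(X %*% W1, 2, b1, `+`))      # encode
    X_hat <- sigmoid(sweep(H %*% W2, 2, b2, `+`))      # decode
    d_out <- (X_hat - X) * X_hat * (1 - X_hat)         # output-layer delta
    d_hid <- (d_out %*% t(W2)) * H * (1 - H)           # hidden-layer delta
    W2 <- W2 - lr * t(H) %*% d_out / nrow(X)
    b2 <- b2 - lr * colMeans(d_out)
    W1 <- W1 - lr * t(X) %*% d_hid / nrow(X)
    b1 <- b1 - lr * colMeans(d_hid)
  }
  list(W = W1, b = b1)
}

# Greedy stacking: each layer is trained on the previous layer's
# hidden activations while earlier layers stay frozen.
stack_autoencoders <- function(X, layer_sizes) {
  layers <- list(); A <- X
  for (h in layer_sizes) {
    layers[[length(layers) + 1]] <- train_autoencoder(A, h)
    p <- layers[[length(layers)]]
    A <- sigmoid(sweep(A %*% p$W, 2, p$b, `+`))        # feed-forward
  }
  layers
}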

This method trains the parameters of each layer individually while freezing the parameters of the remainder of the model. To produce better results, after this phase of training is complete, fine-tuning via backpropagation can be used to improve the results by adjusting the parameters of all layers at the same time.

A stacked autoencoder enjoys all the benefits of any deep network of greater                         expressive power. 

Further, it often captures a useful "hierarchical grouping" or "part-whole decomposition" of the input. To see this, recall that an autoencoder tends to learn features that form a good representation of its input. The first layer of a stacked autoencoder tends to learn first-order features in the raw input (such as edges in an image). The second layer tends to learn second-order features corresponding to patterns in the appearance of first-order features (e.g., which edges tend to occur together, forming contour or corner detectors). Higher layers of the stacked autoencoder tend to learn even higher-order features.

The training and testing process for a stacked autoencoder was much the same as for the ANN. The training dataset was normalized and then fed into the network; the output obtained was in normalized form and had to be denormalized to bring it into a usable form. The output of the SAE, however, was not as accurate as that of the ANN and the SVM.

2.4 Radial Basis Function (RBF)

In the field of mathematical modeling, a radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters.

Radial basis function (RBF) networks typically have three layers: an input layer, a                         hidden layer with a non-linear RBF activation function and a linear output layer.                         

The input can be modeled as a vector of real numbers $\mathbf{x} \in \mathbb{R}^n$. The output of the network is then a scalar function of the input vector, $\varphi : \mathbb{R}^n \to \mathbb{R}$, and is given by

$$\varphi(\mathbf{x}) = \sum_{i=1}^{N} a_i \, \rho\!\left(\lVert \mathbf{x} - \mathbf{c}_i \rVert\right)$$

where $N$ is the number of neurons in the hidden layer, $\mathbf{c}_i$ is the center vector for neuron $i$, and $a_i$ is the weight of neuron $i$ in the linear output neuron.

RBF networks are typically trained by a two-step algorithm. In the first step, the center vectors $\mathbf{c}_i$ of the RBF functions in the hidden layer are chosen. This step can be performed in several ways; centers can be randomly sampled from some set of examples, or they can be determined using k-means clustering. Note that this step is unsupervised. A third backpropagation step can be performed to fine-tune all of the RBF network's parameters.[3]

The second step simply fits a linear model with coefficients $w_i$ to the hidden layer's outputs with respect to some objective function. A common objective function, at least for regression/function estimation, is the least squares function:

$$K(\mathbf{w}) = \sum_{t} K_t(\mathbf{w})$$

where

$$K_t(\mathbf{w}) = \big[\, y(t) - \varphi(\mathbf{x}(t), \mathbf{w}) \,\big]^2$$

We have explicitly included the dependence on the weights. Minimization of the least squares objective function by optimal choice of weights optimizes accuracy of fit.

There are occasions in which multiple objectives, such as smoothness as well as accuracy, must be optimized. In that case it is useful to optimize a regularized objective function such as

$$H(\mathbf{w}) = K(\mathbf{w}) + \lambda S(\mathbf{w})$$

where

$$S(\mathbf{w}) = \sum_{t} S_t(\mathbf{w})$$

and

$$H_t(\mathbf{w}) = K_t(\mathbf{w}) + \lambda S_t(\mathbf{w})$$

where optimization of S maximizes smoothness and $\lambda$ is known as a regularization parameter.
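A sketch of training an RBF network on the same features, assuming RSNNS::rbf and the x_norm / y / x_test objects from the ANN sketch above (the layer size and iteration count are illustrative):

# RBF network on the normalized band-power features (RSNNS).
library(RSNNS)

rbf_model <- rbf(x_norm, y, size = 20, maxit = 500)  # 20 RBF hidden units

rbf_pred <- predict(rbf_model, x_test)
plotIterativeError(rbf_model)  # weighted SSE vs. iteration plot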

In our case, the weighted SSE vs. iterations plot shows a gradual reduction, which is a positive sign; however, some disturbances in between show that the model is still not an ideal one.

 

   


[Fig: weighted SSE vs. iteration plot, with the fitted model's output shown at the top]

2.5 Linear Discriminant Analysis (LDA)

Linear discriminant analysis (LDA) is a generalization of Fisher's linear                   discriminant, a method used in statistics, pattern recognition and machine                   learning to find a linear combination of features that characterizes or separates                       two or more classes of objects or events. The resulting combination may be used                           as a linear classifier, or, more commonly, for dimensionality reduction before                     later classification. 

LDA is closely related to analysis of variance (ANOVA) and regression analysis,                       which also attempt to express one dependent variable as a linear combination of                         other features or measurements. However, ANOVA uses categorical independent                 variables and a continuous dependent variable, whereas discriminant analysis                 has continuous independent variables and a categorical dependent variable (i.e.                   the class label).[3] Logistic regression and probit regression are more similar to                       LDA than ANOVA is, as they also explain a categorical variable by the values of                             continuous independent variables. These other methods are preferable in                 applications where it is not reasonable to assume that the independent variables                       are normally distributed, which is a fundamental assumption of the LDA method. 

LDA is also closely related to principal component analysis (PCA) and factor                       analysis in that they both look for linear combinations of variables which best                         explain the data. LDA explicitly attempts to model the difference between the                       classes of data. PCA on the other hand does not take into account any difference in                               class, and factor analysis builds the feature combinations based on differences                     rather than similarities. Discriminant analysis is also different from factor                   analysis in that it is not an interdependence technique: a distinction between                       independent variables and dependent variables (also called criterion variables)                 must be made. 

 

In the case where there are more than two classes, the analysis used in the derivation of the Fisher discriminant can be extended to find a subspace which appears to contain all of the class variability. This generalization is due to C. R. Rao. Suppose that each of C classes has a mean $\mu_i$ and the same covariance $\Sigma$. Then the scatter between class variability may be defined by the sample covariance of the class means,

$$\Sigma_b = \frac{1}{C} \sum_{i=1}^{C} (\mu_i - \mu)(\mu_i - \mu)^{\mathsf{T}}$$

where $\mu$ is the mean of the class means. The class separation in a direction $\vec{w}$ in this case will be given by

$$S = \frac{\vec{w}^{\,\mathsf{T}} \Sigma_b \, \vec{w}}{\vec{w}^{\,\mathsf{T}} \Sigma \, \vec{w}}$$

This means that when $\vec{w}$ is an eigenvector of $\Sigma^{-1} \Sigma_b$, the separation will be equal to the corresponding eigenvalue.

If $\Sigma^{-1} \Sigma_b$ is diagonalizable, the variability between features will be contained in the subspace spanned by the eigenvectors corresponding to the C − 1 largest eigenvalues (since $\Sigma_b$ is of rank C − 1 at most). These eigenvectors are primarily used in feature reduction, as in PCA. The eigenvectors corresponding to the smaller eigenvalues will tend to be very sensitive to the exact choice of training data, and it is often necessary to use regularisation.

If classification is required, instead of dimension reduction, there are a number of alternative techniques available. For instance, the classes may be partitioned, and a standard Fisher discriminant or LDA used to classify each partition. A common example of this is "one against the rest", where the points from one class are put in one group and everything else in the other; LDA is then applied, resulting in C classifiers, whose results are combined. Another common method is pairwise classification, where a new classifier is created for each pair of classes (giving C(C − 1)/2 classifiers in total), with the individual classifiers combined to produce a final classification.
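A sketch of fitting the three-class LDA classifier on the same features in R, assuming MASS::lda and the illustrative train_df / test_df frames used earlier:

# LDA on the band-power features (MASS).
library(MASS)

lda_model <- lda(label ~ ., data = train_df)

lda_pred <- predict(lda_model, newdata = test_df)
head(lda_pred$class)   # predicted workload classes
plot(lda_model)        # projection onto the discriminant axes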

 

The LDA plot for the given training dataset is shown below.

[Fig: LDA plot for the training dataset]

   


  

3. Correlation and Significance Analysis

In this section we perform a statistical analysis over the features of the EEG signals to check whether there exists any significant relationship or correlation between the alpha, beta, gamma and delta components of the EEG signals. We performed this analysis for each workload condition (Base Line, Low and High) and for every possible pairwise combination of components.

First, we performed the one-way ANOVA (Analysis of Variance) test to calculate any significant difference between the values for each class.

The first table shows the test of significant difference between the alpha values and the beta values for the Base Line class. The p-value for this comparison falls below the 0.05 threshold, and hence we can say that there is a significant difference between the alpha and beta values for the Base Line. The same test can be applied to every other pair and class in the same way.
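A sketch of this one-way ANOVA in base R, assuming an illustrative long-format data frame bl with columns 'power' and 'band' holding the Base Line band values:

# One-way ANOVA between two bands for the Base Line class.
bl_ab <- subset(bl, band %in% c("alpha", "beta"))
fit <- aov(power ~ band, data = bl_ab)
summary(fit)  # the Pr(>F) column is the p-value reported in the table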

   


The next step was the correlation analysis. We used Pearson's correlation technique and compared the Pearson coefficients to check for positive or negative correlation between these components for all 3 classes.
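A sketch of the Pearson analysis in base R, assuming an illustrative data frame bl_bands with one column per band for the Base Line class:

# Pairwise Pearson correlations between band features.
cor(bl_bands, method = "pearson")

# Significance of a single pair, e.g. alpha vs. beta:
cor.test(bl_bands$alpha, bl_bands$beta, method = "pearson")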

The first table shows the correlations between all possible pairs of EEG components for the Base Line class. From these tables we can draw meaningful conclusions.

4. Conclusion

Thus, we used 14 channels of EEG to estimate the workload on an individual using 5 deep learning techniques, and finally used various statistical methods to draw inferences from the obtained results. Below is a visualization of the channel locations on the head surface, where the 14 channels used to train our models can be located.

The models we used showed some variance in their results, so not all of them can be termed the best models for workload classification. The Artificial Neural Network and the Support Vector Machine were among the best-performing algorithms for this classification and can be trusted more than the others.

   


  

Fig: EEG channel visualization   

In our case we used the 14-channel EEG device Emotiv. The system developed, however, can be used with any of the channel configurations: 14, 128 or 256. The UI developed in R Shiny is designed so that a drop-down menu can be used to select which kind of data the user is training the machine with.

[Screenshot: the complete application developed in R Shiny]
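A minimal sketch of such a Shiny interface with the channel-count drop-down described above; every widget name here is illustrative, and classify_workload is a hypothetical helper standing in for the trained models:

library(shiny)

ui <- fluidPage(
  titlePanel("EEG Workload Classifier"),
  sidebarLayout(
    sidebarPanel(
      selectInput("channels", "EEG channel count:", choices = c(14, 128, 256)),
      fileInput("eeg_file", "Upload EEG recording (CSV)")
    ),
    mainPanel(tableOutput("prediction"))
  )
)

server <- function(input, output) {
  output$prediction <- renderTable({
    req(input$eeg_file)
    # Hypothetical helper: would run the trained models on the upload, e.g.
    # classify_workload(read.csv(input$eeg_file$datapath), input$channels)
    data.frame(Note = "model output would appear here")
  })
}

shinyApp(ui, server)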

   

   


5. References  

1. A. Subasi, M. K. Kiymik, A. Alkan and E. Koklukaya, "Neural Network Classification of EEG Signals by Using AR with MLE Preprocessing for Epileptic Seizure Detection", Department of Electrical and Electronics Engineering, Kahramanmaraş Sütçü İmam University, 46100 Kahramanmaraş, Turkey; Department of Electrical and Electronics Engineering, Sakarya University, 54187 Sakarya, Turkey.

2. A. Turnip and K.-S. Hong, "Classifying Mental Activities from EEG-P300 Signals Using Adaptive Neural Networks".

3. L. M. Patnaik and O. K. Manyam, "Epileptic EEG Detection Using Neural Networks and Post-Classification".

4. A. S. Muthanantha Murugavel and S. Ramakrishnan, "Multi-class SVM for EEG Signal Classification Using Wavelet Based Approximate Entropy".

5. P. Bhuvaneswari and J. Satheesh Kumar, "Support Vector Machine Technique for EEG Signals", Bharathiar University, Coimbatore.