Brain Informatics Using Deep Learning
Final Research Report
Cognitive Science and Deep Learning Research Intern
Institute of Nuclear Medicine and Allied Sciences (INMAS), Defence Research and Development Organisation (DRDO)
Ministry of Defence, Govt. of India
Student: Vikramank Singh, Computer Engineering, VES Institute of Technology, University of Mumbai
Guide: Sushil Chandra, Scientist ‘F’, Head, Bio-Medical Engineering Department, INMAS, DRDO
CERTIFICATE
This is to certify that the project entitled ‘Brain Informatics Using Deep Learning’ is the bona fide work of Vikramank Singh, conducted in the Biomedical Engineering Department of the Institute of Nuclear Medicine and Allied Sciences, DRDO, Delhi, under the supervision and guidance of Mr. Sushil Chandra, Scientist ‘F’.
Sh. Sushil Chandra Scientist ‘F’ Biomedical Engg. Department INMAS (DRDO)
ACKNOWLEDGEMENT
I hereby take this opportunity to express my sincere gratitude to all the people who have contributed their knowledge and experience in aiding me with my project. Without them, it would have been quite a difficult task for me to complete this work. I am thankful to Mr. Sushil Chandra, Scientist ‘F’ & Head, B.M.E. Deptt., INMAS (DRDO), for coordinating this training, for giving me an invaluable opportunity to work in a competitive yet amicable atmosphere, and for providing me with all the facilities and paraphernalia required to carry out this project. His profound knowledge and understanding gave me an entirely new perspective on my project. It was always a new and unique experience working with him. I would like to express my gratitude towards Mrs. Greeshma Sharma, my project monitor, for her worthwhile suggestions and fruitful help, and for all the knowledge she imparted to me over the course of the project. Finally, I would like to express my deep appreciation to my family and friends, who have been a constant source of inspiration. I am eternally grateful to them for always encouraging me and being with me wherever and whenever I needed them.
About the Organization
Defence Research and Development Organisation (DRDO)
DRDO was formed in 1958 from the amalgamation of the then already functioning Technical Development Establishments (TDEs) of the Indian Army and the Directorate of Technical Development & Production (DTDP) with the Defence Science Organisation (DSO). DRDO was then a small organization with 10 establishments or laboratories. Over the years, it has grown multi-directionally in terms of the variety of subject disciplines, the number of laboratories and its achievements. Today, DRDO is a network of more than 50 laboratories which are deeply engaged in developing defence technologies covering various disciplines, like aeronautics, armaments, electronics, combat vehicles, engineering systems, instrumentation, missiles, advanced computing and simulation, special materials, naval systems, life sciences, training, information systems and agriculture. Presently, the Organization is backed by over 5000 scientists and about 25,000 other scientific, technical and supporting personnel. Several major projects for the development of missiles, armaments, light combat aircraft, radars, electronic warfare systems etc. are on hand, and significant achievements have already been made in several such technologies.
Institute of Nuclear Medicine and Allied Sciences (INMAS)
At the instance of Pandit Jawaharlal Nehru, the first Prime Minister of India, a Radiation Cell was established in 1956 at the Defence Science Laboratory, Delhi. The initial assignment was to undertake a study on the consequences of the use of nuclear and other weapons of mass destruction. But it was soon realized that nuclear energy can also be harnessed for the good of mankind. Radioisotopes could find peaceful medical applications. The scope of work was, therefore, enlarged and the cell upgraded to the Radiation Medicine Division in 1959.
As awareness increased, so did the work and a full-fledged establishment was created in 1961 and named Institute of Nuclear Medicine and Allied Sciences. Since then it has traversed a long way, carrying out R&D and providing service as a model of excellence in various aspects of Nuclear Medicine and Allied Sciences. The activities of the Institute have proliferated enormously over the years. Its areas of activity have been diversified to cover many fields of radiation and bio-medical sciences.
Vision
The Vision of INMAS has been identified as to be a centre of excellence in biomedical and clinical research with special reference to ionizing radiation.
Mission
The Mission of INMAS is clinical research in nuclear medicine and non-invasive imaging methods with a focus on biological radio-protectors and thyroid disorders.
Basic Background and Theory
Project Background
The Institute of Nuclear Medicine and Allied Sciences (INMAS), a wing of the Defence Research and Development Organisation (DRDO), is currently in the third year of its four-year project “Cognition Enhancement using Non-Invasive Interventions”. This project would not only benefit the training regimen for defence personnel, by enhancing their reasoning, attention, planning, decision making, memory and sensory input processing abilities, but would also contribute to the treatment of cognitive disorders like ADD and ADHD, executive dysfunction in stroke patients, autism and cognitive skill degradation due to natural ageing.
Fig 1: Research at BME, INMAS
Brain Informatics Using Deep Learning
Final Research Report 4th February 2016
1. Abstract
Electroencephalography (EEG) technology has gained growing popularity in various applications. In this report we propose a deep learning based automated system that classifies workload into 3 categories (High, Medium and Low) using EEG signals acquired by an inexpensive EEG device (Emotiv EEG). Workload is a critical factor influencing the performance of an individual in any field, from research and corporate jobs to army personnel. In this study, a 14-channel EEG was used to acquire brain signals while the subjects performed tasks that were grouped by the workload they impose on an individual. The acquired signals were then passed to various deep learning algorithms as training sets. The trained models were then used to classify the workload on an individual simply by acquiring that individual's EEG signals and passing them through the models.
Keywords: Deep Learning, Artificial Neural Networks, Radial Basis Function, Support Vector Machines (SVM), Stacked Autoencoders, Linear Discriminant Analysis (LDA), EEG, EEG Feature Extraction
2. Introduction
In this research work we made use of five deep learning algorithms, trained each of them, and compared their results to figure out which algorithm best suited our data. The Emotiv EEG machine was used to gather the 14-channel data. EEG data is found to contain a lot of noise and other disturbing elements which, if fed directly into the algorithms as training data, can produce aberrant results. Hence, the acquired EEG data was treated with various digital signal processing techniques to make the signal as clean as possible. Various noise reduction filters were applied to eliminate the noise from the data as far as possible. The filtered data was then passed through a Butterworth filter in order to perform feature extraction on the EEG signals. The Alpha, Beta, Gamma, Delta and Theta features were extracted from the EEG signals based on their frequencies. These features were then used as the input training sets for the various deep learning algorithms.
The five algorithms used were Artificial Neural Networks (ANNs), Support Vector Machines (SVM), Radial Basis Function (RBF) networks, Linear Discriminant Analysis (LDA) and Stacked Autoencoders. We go through each algorithm in detail below. Once the models were trained and the classification was performed, the next step in the study was to discern any correlation between the various features of the EEG signals for all three load cases. We also tested for significant differences between the features under each workload condition using statistical methods.
2.1 Artificial Neural Networks
The first deep learning model that we made use of was the Artificial Neural Network. We developed a deep neural network consisting of 1 hidden layer with 8 hidden neurons. The input to the network was the 14-channel EEG signal, and thus the input layer consisted of 14 neurons. The output that we wanted was a classifier which could classify, on the basis of EEG signals, the workload into 3 categories, and hence the output layer consisted of 3 neurons. The figure below shows how the artificial neural network appeared visually.
Fig 1 - Deep Neural Network (14, 8, 3)
The 14 input neurons represent the 14 EEG channels: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8 and AF4. The 3 output neurons represent BL (Base Line, i.e., no workload), LWL (Low Workload) and HWL (High Workload). We used R programming for the entire research work, and the neural network shown above was also coded in R. We used the Resilient Backpropagation technique (Rprop+) to train the deep neural net. In order to train the network, we first needed to normalize the entire input data set. We used the normalize function available in the RSNNS package on the CRAN server for R programming. The testing data was also normalized before being fed into the network for testing. The classification output was thus obtained in a normalized form, and we had to denormalize it using the denormalization function available in the same package. The denormalized values were the actual values representing whether the workload is Base, Low or High. The data that we had was from 10 students, which we divided in a ratio of 8:2 for training and testing: we trained the neural net with the EEG data of 8 students and then tested it with the data of the remaining 2 students. The input / training data which we fed into the neural net was as shown below.
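As an illustration of the preprocessing just described, the following is a minimal Python sketch, not the R code used in the study. It assumes min-max scaling to [0, 1] (the report does not state which scaling was applied) and a random subject-wise 8:2 split; the function names and the toy two-channel values are hypothetical.

```python
import random


def fit_min_max(rows):
    """Learn per-column (min, max) pairs from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def normalize(rows, params):
    """Scale each column to [0, 1] with the stored training min/max."""
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, params)
        ])
    return scaled


def denormalize(rows, params):
    """Invert normalize() to recover values in the original units."""
    return [[v * (hi - lo) + lo for v, (lo, hi) in zip(row, params)]
            for row in rows]


def split_subjects(subject_ids, n_train=8, seed=0):
    """Split whole subjects (not single samples) into train and test sets."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]


# Toy data: 3 samples from 2 channels (made-up values, not study data).
train = [[4100.0, 4200.0], [4300.0, 4000.0], [4200.0, 4100.0]]
params = fit_min_max(train)
scaled = normalize(train, params)
restored = denormalize(scaled, params)
train_ids, test_ids = split_subjects(range(1, 11))
```

Reusing the training min/max when scaling the test data keeps both sets on the same scale, which is one common way to make the denormalized outputs comparable.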
The target output for each 14-channel EEG sample was encoded as a row of a binary matrix with 3 columns, for example (1,0,0), signifying that each sample belongs to exactly one of the 3 classes. Hence, when the output of the neural net was denormalized using the denormalization function, the outputs of the 3 neurons were in the same format, for example (0,1,0), as the targets in the training data set.
2.2 Support Vector Machines
Support Vector Machines are based on the concept of decision planes that define decision boundaries. A decision plane is one that separates sets of objects having different class memberships. Classifiers that work by drawing separating lines to distinguish between objects of different class memberships are known as hyperplane classifiers, and Support Vector Machines are particularly suited to such tasks. The illustration below shows the basic idea behind Support Vector Machines. Here we see the original objects (left side of the schematic) mapped, i.e., rearranged, using a set of mathematical functions known as kernels. The process of rearranging the objects is known as mapping (transformation). Note that in this new setting, the mapped objects (right side of the schematic) are linearly separable and, thus, instead of constructing the complex curve (left schematic), all we have to do is find an optimal line that separates the GREEN and the RED objects.
In our case, we used an SVM as one of the classification models to classify the workloads. The SVM was trained in R with a kernel function, with the kernel type set to “Radial”. The accuracy of the SVM was comparable to that of the ANN.
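To make the “Radial” kernel concrete, here is a small Python sketch of the radial basis function kernel, K(x, y) = exp(-gamma * ||x - y||^2). It is an illustration only, not the R code used in the study; the `gamma` value and the toy points are assumptions.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Radial kernel: similarity decays with squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, gamma=0.5):
    """Pairwise kernel values; an SVM works with these instead of raw inputs."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


# Two nearby points and one distant point (made-up values).
points = [[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]]
K = gram_matrix(points)
```

Nearby points get a kernel value close to 1 and distant points a value close to 0, which is what lets a linear separator in the mapped space correspond to a curved boundary in the original space.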
The same dataset that was used to train the Artificial Neural Network was used to train the SVM.
2.3 Stacked Autoencoders (SDAs)
A stacked autoencoder is a neural network consisting of multiple layers of sparse autoencoders in which the outputs of each layer are wired to the inputs of the successive layer. Formally, consider a stacked autoencoder with n layers. Using notation from the autoencoder section, let W(k,1), W(k,2), b(k,1), b(k,2) denote the parameters W(1), W(2), b(1), b(2) for the kth autoencoder. Then the encoding step for the stacked autoencoder is given by running the encoding step of each layer in forward order:

$$a^{(l)} = f(z^{(l)})$$
$$z^{(l+1)} = W^{(l,1)} a^{(l)} + b^{(l,1)}$$

The decoding step is given by running the decoding stack of each autoencoder in reverse order:

$$a^{(n+l)} = f(z^{(n+l)})$$
$$z^{(n+l+1)} = W^{(n-l,2)} a^{(n+l)} + b^{(n-l,2)}$$
The information of interest is contained within a(n), which is the activation of the deepest layer of hidden units. This vector gives us a representation of the input in terms of higher-order features.
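The forward-order encoding and reverse-order decoding can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a sigmoid activation f, a toy 4-3-2 stack, and constant placeholder weights (the keys `W1`, `b1`, `W2`, `b2` stand for W(k,1), b(k,1), W(k,2), b(k,2)); none of these values come from the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(W, b, a):
    """One step a -> f(W a + b) with a sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, a)) + bias)
            for row, bias in zip(W, b)]


def encode(x, autoencoders):
    """Run the encoder of each autoencoder in forward order (k = 1..n)."""
    a = x
    for ae in autoencoders:
        a = layer(ae["W1"], ae["b1"], a)
    return a


def decode(code, autoencoders):
    """Run the decoder of each autoencoder in reverse order (k = n..1)."""
    a = code
    for ae in reversed(autoencoders):
        a = layer(ae["W2"], ae["b2"], a)
    return a


def const(rows, cols, value):
    """Placeholder weight matrix filled with one value (hypothetical)."""
    return [[value] * cols for _ in range(rows)]


# Toy 4 -> 3 -> 2 stack with untrained placeholder parameters.
stack = [
    {"W1": const(3, 4, 0.1), "b1": [0.0] * 3,
     "W2": const(4, 3, 0.1), "b2": [0.0] * 4},
    {"W1": const(2, 3, 0.1), "b1": [0.0] * 2,
     "W2": const(3, 2, 0.1), "b2": [0.0] * 3},
]
x = [0.2, 0.4, 0.6, 0.8]
code = encode(x, stack)          # deepest hidden activation, a(n)
reconstruction = decode(code, stack)
```

Here `code` plays the role of a(n): it has the size of the deepest hidden layer, while `reconstruction` returns to the size of the input.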
A good way to obtain good parameters for a stacked autoencoder is to use greedy layer-wise training. To do this, first train the first layer on raw input to obtain parameters W(1,1), W(1,2), b(1,1), b(1,2). Use the first layer to transform the raw input into a vector consisting of the activations of the hidden units, A. Train the second layer on this vector to obtain parameters W(2,1), W(2,2), b(2,1), b(2,2). Repeat for subsequent layers, using the output of each layer as input for the subsequent layer.
This method trains the parameters of each layer individually while freezing the parameters of the remainder of the model. To produce better results, after this phase of training is complete, fine-tuning using backpropagation can be applied, tuning the parameters of all layers at the same time.
A stacked autoencoder enjoys all the benefits of any deep network of greater expressive power.
Further, it often captures a useful "hierarchical grouping" or "part-whole decomposition" of the input. To see this, recall that an autoencoder tends to learn features that form a good representation of its input. The first layer of a stacked autoencoder tends to learn first-order features in the raw input (such as edges in
an image). The second layer of a stacked autoencoder tends to learn second-order features corresponding to patterns in the appearance of first-order features (e.g., in terms of what edges tend to occur together--for example, to form contour or corner detectors). Higher layers of the stacked autoencoder tend to learn even higher-order features.
The training and testing process for the Stacked Autoencoder was much the same as that for the ANN. Initially, the training dataset was normalized and then fed into the neural net. The output was thus obtained in a normalized form and had to be denormalized to bring it into a usable form. The output of the SDA, however, was less accurate than that of the ANN and the SVM.
2.4 Radial Basis Function (RBF)
In the field of mathematical modeling, a radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters.
Radial basis function (RBF) networks typically have three layers: an input layer, a hidden layer with a non-linear RBF activation function and a linear output layer.
The input can be modeled as a vector of real numbers $x \in \mathbb{R}^n$. The output of the network is then a scalar function of the input vector, $\varphi : \mathbb{R}^n \to \mathbb{R}$, and is given by

$$\varphi(x) = \sum_{i=1}^{N} a_i \, \rho(\lVert x - c_i \rVert)$$

where $N$ is the number of neurons in the hidden layer, $c_i$ is the center vector for neuron $i$, and $a_i$ is the weight of neuron $i$ in the linear output neuron.
RBF networks are typically trained by a two-step algorithm. In the first step, the center vectors of the RBF functions in the hidden layer are chosen. This step can be performed in several ways; centers can be randomly sampled from some set of examples, or they can be determined using k-means clustering. Note that this step is unsupervised.
The second step simply fits a linear model with coefficients $w_i$ (the output weights, written $a_i$ above) to the hidden layer's outputs with respect to some objective function. A common objective function, at least for regression/function estimation, is the least squares function:

$$K(w) = \sum_{t} K_t(w)$$

where

$$K_t(w) = \left[ y(t) - \varphi(x(t), w) \right]^2 .$$

We have explicitly included the dependence on the weights. Minimization of the least squares objective function by optimal choice of weights optimizes accuracy of fit. An optional third backpropagation step can be performed to fine-tune all of the RBF net's parameters.[3]
There are occasions in which multiple objectives, such as smoothness as well as accuracy, must be optimized. In that case it is useful to optimize a regularized objective function such as

$$H(w) = K(w) + \lambda S(w) = \sum_{t} H_t(w)$$

where

$$S(w) = \sum_{t} S_t(w)$$

and

$$H_t(w) = K_t(w) + \lambda S_t(w)$$

where optimization of $S$ maximizes smoothness and $\lambda$ is known as a regularization parameter.
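A short Python sketch may help tie the pieces of this section together: the network output as a weighted sum of radial basis functions, the least squares objective, and a regularized variant. It is illustrative only. The Gaussian choice for the basis function, the `beta` width, and the squared-weight penalty standing in for the smoothness term S are all assumptions, and the toy numbers are made up.

```python
import math


def gaussian(r, beta=1.0):
    """Assumed radial basis function: Gaussian of the distance r."""
    return math.exp(-beta * r * r)


def distance(x, c):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))


def rbf_output(x, centers, weights, beta=1.0):
    """Network output: linear combination of basis functions at the centers."""
    return sum(w * gaussian(distance(x, c), beta)
               for w, c in zip(weights, centers))


def least_squares(samples, centers, weights, beta=1.0):
    """Sum of squared errors over (input, target) pairs."""
    return sum((y - rbf_output(x, centers, weights, beta)) ** 2
               for x, y in samples)


def regularized(samples, centers, weights, lam, beta=1.0):
    """Least squares plus lam times a penalty (assumed: sum of w squared)."""
    penalty = sum(w * w for w in weights)
    return least_squares(samples, centers, weights, beta) + lam * penalty


# One center at the origin with weight 2; one sample that it fits exactly.
centers = [[0.0, 0.0]]
weights = [2.0]
samples = [([0.0, 0.0], 2.0)]
out = rbf_output([0.0, 0.0], centers, weights)
sse = least_squares(samples, centers, weights)
reg = regularized(samples, centers, weights, lam=0.1)
```

With the fit exact, the least squares term is zero and the regularized objective reduces to the penalty alone, which shows how the regularization parameter trades accuracy of fit against the second objective.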
In our case, the plot of weighted SSE versus iterations shows a gradual reduction, which is a positive sign; however, some disturbances along the way show that the model is still not an ideal one.
The above diagram shows the SSE versus iteration plot, with the result shown at the top.
2.5 Linear Discriminant Analysis (LDA)
Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.
LDA is closely related to analysis of variance (ANOVA) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements. However, ANOVA uses categorical independent variables and a continuous dependent variable, whereas discriminant analysis has continuous independent variables and a categorical dependent variable (i.e. the class label).[3] Logistic regression and probit regression are more similar to LDA than ANOVA is, as they also explain a categorical variable by the values of continuous independent variables. These other methods are preferable in applications where it is not reasonable to assume that the independent variables are normally distributed, which is a fundamental assumption of the LDA method.
LDA is also closely related to principal component analysis (PCA) and factor analysis in that they both look for linear combinations of variables which best explain the data. LDA explicitly attempts to model the difference between the classes of data. PCA on the other hand does not take into account any difference in class, and factor analysis builds the feature combinations based on differences rather than similarities. Discriminant analysis is also different from factor analysis in that it is not an interdependence technique: a distinction between independent variables and dependent variables (also called criterion variables) must be made.
In the case where there are more than two classes, the analysis used in the derivation of the Fisher discriminant can be extended to find a subspace which appears to contain all of the class variability. This generalization is due to C. R. Rao. Suppose that each of $C$ classes has a mean $\mu_i$ and the same covariance $\Sigma$. Then the scatter between class variability may be defined by the sample covariance of the class means

$$\Sigma_b = \frac{1}{C} \sum_{i=1}^{C} (\mu_i - \mu)(\mu_i - \mu)^T$$

where $\mu$ is the mean of the class means. The class separation in a direction $w$ in this case will be given by

$$S = \frac{w^T \Sigma_b \, w}{w^T \Sigma \, w}$$

This means that when $w$ is an eigenvector of $\Sigma^{-1} \Sigma_b$ the separation will be equal to the corresponding eigenvalue.
If $\Sigma^{-1} \Sigma_b$ is diagonalizable, the variability between features will be contained in the subspace spanned by the eigenvectors corresponding to the $C - 1$ largest eigenvalues (since $\Sigma_b$ is of rank $C - 1$ at most). These eigenvectors are primarily used in feature reduction, as in PCA. The eigenvectors corresponding to the smaller eigenvalues will tend to be very sensitive to the exact choice of training data, and it is often necessary to use regularisation.
If classification is required, instead of dimension reduction, there are a number of alternative techniques available. For instance, the classes may be partitioned, and a standard Fisher discriminant or LDA used to classify each partition. A common example of this is "one against the rest", where the points from one class are put in one group, and everything else in the other, and then LDA is applied. This will result in C classifiers, whose results are combined. Another common method is pairwise classification, where a new classifier is created for each pair of classes (giving C(C − 1)/2 classifiers in total), with the individual classifiers combined to produce a final classification.
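The two multi-class schemes above can be enumerated with a short Python sketch. It only lists the binary sub-problems and shows one possible way (majority voting, an assumption) to combine pairwise results; it does not train any LDA model. The class labels are the three workload classes used in this report.

```python
from collections import Counter
from itertools import combinations


def one_vs_rest_tasks(classes):
    """One binary task per class: that class against all the others."""
    return [(c, [o for o in classes if o != c]) for c in classes]


def pairwise_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))


def majority_vote(winners):
    """Combine pairwise decisions by picking the most frequent winner."""
    return Counter(winners).most_common(1)[0][0]


classes = ["BL", "LWL", "HWL"]
ovr = one_vs_rest_tasks(classes)    # C tasks
pairs = pairwise_tasks(classes)     # C(C - 1)/2 tasks
# Hypothetical winners of the three pairwise classifiers for one sample.
label = majority_vote(["BL", "HWL", "HWL"])
```

For C = 3 both schemes happen to need three binary classifiers; the counts diverge as C grows, since the pairwise count grows quadratically.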
The LDA plot for the given training dataset came out to be as below.
3. Correlation and Significance Analysis
In this section we perform a statistical analysis of the features of the EEG signals to check whether there exists any significant relationship or correlation between the alpha, beta, gamma and delta components of the EEG signals. We performed this analysis for each workload class (Base Line, Low and High) and for every possible pair of components. Firstly, we performed the one-way ANOVA (Analysis of Variance) test to check for any significant difference between the values for each class.
The first table shows the test of significant difference between the alpha values and the beta values for the Base Line class. The p-value for this comparison is greater than 0.05; at the conventional 0.05 threshold, this indicates that the difference between the values of alpha and beta for the Base Line is not statistically significant. The same test can be carried out for every other class in the same way. The next step is the correlation analysis. We used Pearson's correlation technique and compared the Pearson coefficients to check for positive or negative correlation between these components for all the 3 classes.
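For readers who want to see what the two statistics compute, the following Python sketch implements the Pearson coefficient and the one-way ANOVA F statistic from their textbook definitions. It is not the R code used in the study, the numbers are made-up toy values rather than study data, and it stops at the F statistic (turning F into a p-value needs the F distribution, which is left out here).

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy band-power values (hypothetical).
alpha = [1.0, 2.0, 3.0, 4.0]
beta = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(alpha, beta)                        # perfectly linear, so r is 1
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```

A coefficient near +1 or -1 indicates strong positive or negative correlation, and a larger F means the group means differ more relative to the spread within each group.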
The first table shows the correlation between all the possible combinations of EEG components for the Base Line class. The correlation tables above thus indicate which pairs of components are positively or negatively correlated in each class.
4. Conclusion
We used 14 channels of EEG to estimate the workload on an individual using 5 deep learning techniques and, at the end, used various statistical methods to draw inferences from the obtained results. Below is a visualization of the channel locations on the head surface, showing the 14 channels that we used to train our models. The models showed some variance in their results, and thus not all of them can be regarded as well suited to workload classification. The Artificial Neural Network and the Support Vector Machine were among the best performing algorithms for the classification and can be trusted more than the others.
Fig: EEG channel visualization
In our case we used a 14-channel EEG device, the Emotiv. The system we developed, however, can also be used with other channel counts: 14, 128 or 256 channels. The UI, developed in R Shiny, is designed so that a drop-down menu can be used to select which kind of data the user is training the machine with. Following is a screenshot of the complete application developed in R Shiny:
5. References
1. Abdulhamit Subasi, M. Kemal Kiymik, Ahmet Alkan, Etem Koklukaya. Neural Network Classification of EEG Signals by Using AR with MLE Preprocessing for Epileptic Seizure Detection. Department of Electrical and Electronics Engineering, Kahramanmaraş Sütçü İmam University, 46100 Kahramanmaraş, Turkey; Department of Electrical and Electronics Engineering, Sakarya University, 54187 Sakarya, Turkey.
2. Arjon Turnip, Keum-Shik Hong. Classifying Mental Activities from EEG-P300 Signals Using Adaptive Neural Networks.
3. L. M. Patnaik, Ohil K. Manyam. Epileptic EEG Detection Using Neural Networks and Post-Classification.
4. A. S. Muthanantha Murugavel, S. Ramakrishnan. Multi-class SVM for EEG Signal Classification Using Wavelet Based Approximate Entropy.
5. P. Bhuvaneswari, J. Satheesh Kumar. Support Vector Machine Technique for EEG Signals. Bharathiar University, Coimbatore.