
STACKED SPARSE AUTOENCODER (SSAE) BASED FRAMEWORK FOR NUCLEI PATCH CLASSIFICATION ON BREAST CANCER HISTOPATHOLOGY

Jun Xu1, Lei Xiang1, Renlong Hang1, Jianzhong Wu2

1 Nanjing University of Information Science and Technology, Nanjing 210044, China. 2 Jiangsu Cancer Hospital, Nanjing 210000, China.

ABSTRACT

In this paper, a Stacked Sparse Autoencoder (SSAE) based framework is presented for nuclei classification on breast cancer histopathology. The SSAE works very well in learning useful high-level features for a better representation of the raw input data. To show the effectiveness of the proposed framework, SSAE+Softmax is compared with a conventional Softmax classifier, PCA+Softmax, and a single-layer Sparse Autoencoder (SAE)+Softmax in classifying nuclei and non-nuclei patches extracted from breast cancer histopathology. SSAE+Softmax for nuclei patch classification yields an accuracy of 83.7%, an F1 score of 82%, and an AUC of 0.8992, outperforming the Softmax classifier, PCA+Softmax, and SAE+Softmax.

Index Terms— Deep learning, Sparse Autoencoder, Breast Cancer Histopathology

1. INTRODUCTION

With the recent advent and cost-effectiveness of whole-slide digital scanners, tissue histopathology slides can now be digitized and stored in digital image form. Digital pathology makes computerized quantitative analysis of histopathology imagery possible. In the context of breast cancer (BC), the size, arrangement, and morphology of nuclei in breast histopathology are important biomarkers for predicting patient outcome [1]. However, the manual detection of BC nuclei in histopathology is a tedious and time-consuming process that is unfeasible in the clinical setting. Therefore, it is important to develop efficient methods for automatically detecting BC nuclei. Previous approaches to nuclei or cell segmentation, including region growing, thresholding, clustering, level sets [2], supervised color-texture based methods, and watershed based methods, are not very robust to the highly variable shapes and sizes of BC nuclei, or to artifacts introduced by the histological fixing, staining, and digitization processes. In [1], we presented a semi-automated nuclear detection scheme based on the Expectation Maximization (EM) algorithm.

This work is supported by the National Science Foundation of China (No. 61273259) and the Six Major Talents Summit of Jiangsu Province (No. 2013-XXRJ-019). Email: [email protected].

These previous works on segmentation or classification of nuclei are mostly based on supervised learning. For histopathological images, it is usually expensive to obtain enough labeled data for training. On the other hand, with the rapid development of digital pathology technology, it is easy to obtain a large amount of unlabeled data. Moreover, histopathological images are generally of high resolution. The performance of current supervised discriminative models could be greatly improved if we could develop an efficient way to make use of such large amounts of unlabeled, highly structured data. One solution to this problem is to learn a good feature representation that captures much of the structure in the unlabeled input data; the discriminative model then works in this new feature space for subsequent classification of the desired objects.

Recently, significant progress has been made on learning representations of images from pixel-level (low-level) features in order to identify high-level features in images. These high-level features are often learned as hierarchical representations using large amounts of unlabeled data. Deep learning is such a hierarchical learning approach: it learns high-level features from raw pixel intensities that are sufficiently useful for a classifier to differentiate between objects. Deep learning has achieved great accomplishments in vision and learning since the first deep autoencoder network was proposed by Hinton et al. [3], and it has attracted great attention from researchers in both industry and academia. Recently, a deep max-pooling convolutional neural network was presented for detecting mitosis in breast histological images [4]; similar work from the same team won the ICPR 2012 mitosis detection competition. Inspired by these works, in this paper we present a Stacked Sparse Autoencoder (SSAE) framework for nuclei classification on breast histopathology.

2. METHOD

An autoencoder is an unsupervised feature learning algorithm that aims to develop a better feature representation of high-dimensional input data by finding the correlations among the data. Basically, an autoencoder is simply a multi-layer feed-forward neural network trained with back-propagation to represent its input. By applying back-propagation, the autoencoder tries to minimize, as much as possible, the discrepancy


Fig. 1: The architecture of the basic Sparse Autoencoder (SAE) for nuclei classification.

between the input and its reconstruction by learning an encoder and a decoder (see Figure 1), which yields a set of weights W and biases b.

2.1. The Basic Sparse Autoencoder [5]

Let $X = (x(1), x(2), \ldots, x(N))^T$ be the entire set of training (unlabeled) patches, where $x(k) \in \mathbb{R}^{d_x}$, and $N$ and $d_x$ are the number of training patches and the number of pixels in each patch, respectively. $h^{(l)}(k) = (h_1^{(l)}(k), h_2^{(l)}(k), \ldots, h_{d_h}^{(l)}(k))^T$ denotes the learned high-level feature at layer $l$ for the $k$-th patch, where $d_h$ is the number of hidden units in the current layer $l$. Throughout this paper, we use a superscript and a subscript on a notation to denote the hidden layer and the unit in that layer, respectively. For instance, in Figure 1, $h_i^{(1)}$ represents the $i$-th unit in the 1st hidden layer. For simplicity, we denote $x$ and $h^{(l)}$ as an input patch and its representation at hidden layer $l$, respectively.

The architecture of the basic Sparse Autoencoder (SAE) is shown in Figure 1. In general, the input layer of the autoencoder feeds an encoder which transforms the input $x$ into the corresponding representation $h$, and the hidden layer $h$ can be seen as a new feature representation of the input data. The output layer is effectively a decoder which is trained to reconstruct an approximation $\hat{x}$ of the input from the hidden representation $h$. Basically, training an autoencoder means finding optimal parameters by minimizing the discrepancy between the input $x$ and its reconstruction $\hat{x}$.

This discrepancy is described with a cost function. The cost function of a Sparse Autoencoder (SAE) comprises three terms, as follows [6]:

$$L_{SAE}(\theta) = \frac{1}{N}\sum_{k=1}^{N} L\left(x(k),\, d_{\hat{\theta}}\left(e_{\check{\theta}}(x(k))\right)\right) + \alpha\sum_{j=1}^{n} KL(\rho\,\|\,\hat{\rho}_j) + \beta\,\|W\|_2^2 \qquad (1)$$

The first term is an average sum-of-squares error term which describes the discrepancy between the input $x(k)$ and the reconstruction $\hat{x}(k)$ over the entire data set. The encoder $e_{\check{\theta}}(\cdot)$ maps an input $x \in \mathbb{R}^{d_x}$ to the hidden representation $h \in \mathbb{R}^{d_h}$ and is defined by $h = e_{\check{\theta}}(x) = s(Wx + b_h)$, where $W$ is a $d_h \times d_x$ weight matrix and $b_h \in \mathbb{R}^{d_h}$ is a bias vector. The encoder is parameterized by $\check{\theta} = (W, b_h)$. The decoder $d_{\hat{\theta}}(\cdot)$ maps the resulting hidden representation $h$ back into the input space: $\hat{x} = d_{\hat{\theta}}(h) = s(W^T h + b_x)$, where $W^T$ is a $d_x \times d_h$ weight matrix and $b_x \in \mathbb{R}^{d_x}$ is a bias vector. Here $s(\cdot)$ is the activation function, defined as the sigmoid logistic function $s(z) = \frac{1}{1+\exp(-z)}$, where $z$ is the pre-activation of a neuron. Hence, the decoder is parameterized by $\hat{\theta} = (W^T, b_x)$. The weight matrix $W^T$ of the reverse mapping is the transpose of the weight matrix $W$; the autoencoder is therefore said to have tied weights, which effectively halves the number of weight parameters. The pre-activation of the output layer of the autoencoder can thus be written in terms of the three parameters $\theta = (W, b_h, b_x)$ as $y = W^T s(Wx + b_h) + b_x$, so the reconstruction produced by the decoder is $\hat{x} = s(y)$. Basically, training an autoencoder amounts to finding the optimal parameters $\theta = (W, b_h, b_x)$ simultaneously by minimizing the reconstruction error described in the first term. The cost function $L(\cdot,\cdot)$ measures the discrepancy between the input $x$ and its reconstruction $\hat{x}$ by the decoder $d_{\hat{\theta}}(\cdot)$.

In the second term, $n$ is the number of units in the hidden layer, and the index $j$ sums over the hidden units in the network. $KL(\rho\|\hat{\rho}_j)$ is the Kullback-Leibler (KL) divergence between $\hat{\rho}_j$, the average activation (averaged over the training set) of hidden unit $j$, and the desired activation $\rho$; it is defined as $KL(\rho\|\hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}$.

The third term is a weight decay term which tends to decrease the magnitude of the weights and helps prevent overfitting. Here

$$\|W\|_2^2 = \mathrm{tr}(W^T W) = \sum_{l=1}^{n_l}\sum_{i=1}^{s_{l-1}}\sum_{j=1}^{s_l}\left(w_{i,j}^{(l)}\right)^2 \qquad (2)$$

where $n_l$ is the number of layers and $s_l$ is the number of neurons in layer $l$. $w_{i,j}^{(l)}$ represents the connection between the $i$-th neuron in layer $l-1$ and the $j$-th neuron in layer $l$. For the SAE studied in this paper, as Figure 1 shows, the parameters are $n_l = 2$, $s_1 = 1156$, and $s_2 = 500$.
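For concreteness, the following is a minimal NumPy sketch of how the cost in Eqs. (1)-(2) can be evaluated; the paper's own implementation was in MATLAB, and the function name and the example values of $\rho$, $\alpha$, and $\beta$ are placeholders of ours rather than the paper's settings.

```python
import numpy as np

def sae_cost(X, W, b_h, b_x, rho=0.05, alpha=3.0, beta=3e-3):
    """Sparse autoencoder cost of Eq. (1): average reconstruction error
    plus KL sparsity penalty plus weight decay (Eq. (2)).

    X : (N, d_x) matrix of training patches, one flattened patch per row.
    W : (d_h, d_x) tied weight matrix; b_h, b_x are the encoder/decoder biases.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    H = sigmoid(X @ W.T + b_h)       # encoder: h = s(Wx + b_h), shape (N, d_h)
    X_hat = sigmoid(H @ W + b_x)     # decoder: x_hat = s(W^T h + b_x), shape (N, d_x)

    # First term: average sum-of-squares reconstruction error.
    recon = np.mean(np.sum((X - X_hat) ** 2, axis=1))

    # Second term: KL divergence between the target sparsity rho and the
    # average activation rho_hat_j of each hidden unit.
    rho_hat = np.mean(H, axis=0)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

    # Third term: weight decay ||W||_2^2.
    decay = np.sum(W ** 2)

    return recon + alpha * kl + beta * decay
```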

2.2. Stacked Sparse Autoencoder (SSAE) [5]

The stacked autoencoder is a neural network consisting of multiple layers of basic SAEs in which the outputs of each layer are wired to the inputs of the successive layer. In this paper, we construct a two-layer SSAE which consists of two basic SAEs. The architecture of the SSAE is shown in Figure 2. For simplicity, we do not show the decoder part of each basic SAE in the figure.


Fig. 2: The architecture of the Stacked Sparse Autoencoder (SSAE) and Softmax classifier for nuclei classification.

The SSAE yields a function $f: \mathbb{R}^{d_x} \rightarrow \mathbb{R}^{d_{h^{(2)}}}$ that transforms the raw pixels of an input patch into a new feature representation $h^{(2)} = f(x) \in \mathbb{R}^{d_{h^{(2)}}}$. In the first (input) layer, the input is the raw pixel intensity of a square patch, represented as a column vector of pixel intensities of size $34^2 \times 1$. There are $d_x = 34 \times 34 = 1156$ input units in the input layer. The first and second hidden layers have $d_{h^{(1)}} = 500$ and $d_{h^{(2)}} = 100$ hidden units, respectively.

3. DATA SET AND EXPERIMENTAL DESIGN

A total of 37 H&E stained breast histopathology images were collected from a cohort of 17 patients and scanned into a computer using a high-resolution whole-slide scanner at 20x optical magnification.

3.1. The Generation of Training and Testing Sets

We extract two classes of square patches from the histopathology images: nuclei and non-nuclei patches. The size of each patch is 34 × 34. Each nuclei patch contains one nucleus at its centroid. The non-nuclei patches either contain no nuclei or contain only parts of nuclei. For the training set, 14421 nuclei and 28032 non-nuclei patches were extracted from the 37 H&E stained breast histopathology images. To show the effectiveness of our method, we chose challenging patches for testing. There are 2000 nuclei and 2000 non-nuclei patches in the testing set.
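The paper does not give its patch-extraction code; as a rough sketch under our own assumptions (a single-channel image array and manually annotated centroid coordinates), extracting one 34 × 34 patch and flattening it to a 1156-dimensional vector might look as follows.

```python
import numpy as np

def extract_patch(image, center_row, center_col, size=34):
    """Crop a size x size patch centered at an annotated nucleus centroid and
    flatten it to a vector (34 * 34 = 1156 values). Patches that would cross
    the image border are rejected; for RGB H&E images one would either crop
    each channel or convert to a single channel first."""
    half = size // 2
    r0, c0 = center_row - half, center_col - half
    if r0 < 0 or c0 < 0 or r0 + size > image.shape[0] or c0 + size > image.shape[1]:
        return None
    return image[r0:r0 + size, c0:c0 + size].reshape(-1)

# Example on a toy image; real centroids would come from the manual annotations.
patch = extract_patch(np.zeros((600, 800)), center_row=120, center_col=240)
```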

3.2. Training the SSAE

The architecture of the SSAE is shown in Figure 2. We employ a greedy layerwise approach to pretrain the SSAE by training each layer in turn. After pretraining, the trained SSAE is employed to classify the nuclei and non-nuclei patches in the testing set.

First, an SAE is trained on the raw input $x$ to learn primary features $h^{(1)}(x)$ by adjusting the weights $W^{(1)}$.

Next, the raw input is fed into this trained sparse autoencoder to obtain the primary feature activations $h^{(1)}(x)$ for each input patch $x$. These primary features are used as the "raw input" to another sparse autoencoder, which learns secondary features $h^{(2)}(x)$ on top of them.

Following this, the primary features are fed into the second SAE to obtain the secondary feature activations $h^{(2)}(x)$ for each input patch $x$. These secondary features are treated as the "raw input" to a softmax or SVM classifier, which is trained to map the secondary features to the patch labels (nuclei vs. non-nuclei).

Finally, all three layers are combined to form an SSAE with two hidden layers and a final softmax or SVM classifier layer capable of classifying the nuclei and non-nuclei patches as desired. A simplified sketch of this greedy layerwise procedure is given below.
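The following runnable NumPy sketch illustrates the greedy layerwise flow. For brevity it drops the sparsity and weight-decay terms of Eq. (1) and uses plain full-batch gradient descent on the reconstruction error, so it shows the training flow rather than the paper's optimizer or hyperparameters; the toy data, function names, and settings are our own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, d_h, lr=0.1, epochs=50, seed=0):
    """Train one tied-weight autoencoder layer by full-batch gradient descent
    on the squared reconstruction error and return the encoder (W, b_h)."""
    rng = np.random.default_rng(seed)
    N, d_x = X.shape
    W = 0.01 * rng.standard_normal((d_h, d_x))
    b_h, b_x = np.zeros(d_h), np.zeros(d_x)
    for _ in range(epochs):
        H = sigmoid(X @ W.T + b_h)                       # encode
        X_hat = sigmoid(H @ W + b_x)                     # decode (tied weights)
        d_out = 2 * (X_hat - X) * X_hat * (1 - X_hat)    # gradient at the output pre-activation
        d_hid = (d_out @ W.T) * H * (1 - H)              # backpropagated to the hidden layer
        dW = (H.T @ d_out + d_hid.T @ X) / N             # W appears in both encoder and decoder
        W -= lr * dW
        b_x -= lr * d_out.mean(axis=0)
        b_h -= lr * d_hid.mean(axis=0)
    return W, b_h

# Greedy layerwise pretraining with the paper's layer sizes, on toy data.
rng = np.random.default_rng(1)
X_raw = rng.random((256, 1156))            # stand-in for flattened 34x34 training patches
W1, b1 = pretrain_layer(X_raw, d_h=500)    # step 1: learn primary features
H1 = sigmoid(X_raw @ W1.T + b1)            # step 2: primary activations h(1)
W2, b2 = pretrain_layer(H1, d_h=100)       # step 3: learn secondary features
H2 = sigmoid(H1 @ W2.T + b2)               # step 4: input to the softmax (or SVM) classifier
```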

3.3. Experiments and Performance Evaluation

There are two aims in designing the experiments. The first is to show that a high-level feature representation learned with the SSAE is much more useful to a classifier than raw pixel intensities. We therefore compare the SSAE with raw pixel-based classification, in which a softmax classifier is learned from the training set and applied directly to the raw pixel intensities to classify candidate patches into nuclei and non-nuclei patches. In the SSAE based framework, the features learned by the SSAE are treated as the "raw input" to a Softmax classifier for classification; we use SSAE+Softmax to denote this framework.

The other aim is to show the efficiency of the SSAE in learning high-level features from raw pixels. Principal Component Analysis (PCA) is one of the most commonly used unsupervised learning algorithms for identifying a low-dimensional subspace of maximal variation within unlabeled data, so we also compare the SSAE with PCA in this role. To attain these aims, SSAE+Softmax is compared against three methods for classifying nuclei versus non-nuclei patches: (1) Softmax (SM): a Softmax classifier applied directly to the raw pixels; (2) PCA+Softmax: features learned by PCA from the raw data are treated as the "raw input" to a Softmax classifier; (3) SAE+Softmax: high-level features learned by a single SAE from the raw data are treated as the "raw input" to a Softmax classifier.
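As an illustration of baselines (1) and (2), a minimal scikit-learn sketch is shown below, assuming the patches are flattened into 1156-dimensional vectors with binary labels. The toy arrays, the choice of 100 principal components, and the use of logistic regression as the softmax classifier are our assumptions, not details reported in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for the real data: rows are flattened 34x34 patches,
# labels are 1 for nuclei patches and 0 for non-nuclei patches.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((500, 1156)), rng.integers(0, 2, 500)
X_test, y_test = rng.random((200, 1156)), rng.integers(0, 2, 200)

# Baseline (1): softmax (logistic regression) directly on raw pixel intensities.
sm = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("SM accuracy:", sm.score(X_test, y_test))

# Baseline (2): PCA features fed to the same softmax classifier; 100 components
# is an assumption chosen to mirror the SSAE's second hidden layer size.
pca = PCA(n_components=100).fit(X_train)
pca_sm = LogisticRegression(max_iter=1000).fit(pca.transform(X_train), y_train)
print("PCA+SM accuracy:", pca_sm.score(pca.transform(X_test), y_test))
```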

The classification results of the different methods on the classification of nuclei and non-nuclei patches were evaluated in terms of Precision, Recall, F1 Score, and Accuracy, defined as $\mathrm{Precision} = \frac{TP}{TP+FP}$, $\mathrm{Recall} = \frac{TP}{TP+FN}$, $\mathrm{F1\ Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$, and $\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$, respectively. Here TP is the number of nuclei patches that were correctly classified according to the ground truth; FP, TN, and FN are defined in similar ways.
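These definitions translate directly into a small scoring helper; the example counts below are illustrative only, not the paper's confusion-matrix values.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, Recall, F1 Score, and Accuracy from confusion-matrix counts,
    following the definitions in Section 3.3."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# Example with made-up counts for a 4000-patch test set.
print(classification_metrics(tp=1780, fp=560, tn=1440, fn=220))
```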

4. RESULTS AND DISCUSSION

Qualitative results of the high-level features learned from training patches with the SSAE are shown in Figures 3(a) and 3(b). The SSAE model discovers very interesting structure in the data. The quantitative evaluation of the SSAE and the compared methods is shown in Table 1. The Precision-Recall curves of the SSAE and the other methods are shown in Figure 4(a). The Receiver Operating Characteristic (ROC) curves of the SSAE and the other methods are shown in Figure 4(b). As expected, the SSAE yields the highest F1 score of 82.0%, AUC of 0.8992, and accuracy of 83.7%. These results support our assumption that the SSAE works well in learning useful high-level features that better represent the data. All the experiments were carried out on a PC (Intel Core(TM) 3.4 GHz processor with 16 GB of RAM). The software implementation was done in MATLAB.

Table 1: Precision (Prec), Recall (RC), F1 Score (F1), Area Under the Curve (AUC), Accuracy (Acc), and Execution Time (ET) (in minutes (m) or hours (h)) for the four different methods.

Method      Prec (%)  RC (%)  F1 (%)  AUC     Acc (%)  ET
SM          71        79      74.5    0.8501  75.3     1 m
PCA+SM      89        72      79.6    0.8616  77.5     0.5 m
SAE+SM      74        84      78.7    0.8942  80.1     1.1 h
SSAE+SM     76        89      82      0.8992  83.7     2.8 h


Fig. 3: The learned feature representations of raw pixels with the SSAE. Fig. 3(a) shows the learned feature representation in the first hidden layer (with 500 units). The learned high-level feature representation in the second hidden layer (with 100 units) is shown in Fig. 3(b). As expected, Fig. 3(a) shows detailed boundary features of nuclei and other tissue, while Fig. 3(b) shows high-level features of nuclei.

5. CONCLUDING REMARKS

In this paper, a Stacked Sparse Autoencoder (SSAE) framework is proposed for nuclei patch classification on breast


Fig. 4: The performance of our approach compared to the others in the Precision-Recall plane is shown in Fig. 4(a). The ROC curves of our approach and the others are shown in Fig. 4(b).

cancer histopathology. Most current classification methods for this problem are based on supervised learning and pixel-wise classification. The SSAE framework learns a high-level feature representation of the raw data in an unsupervised way. These high-level features enable the classifier to work very efficiently in classifying nuclei from non-nuclei patches. The evaluation results show that SSAE+Softmax outperforms the conventional Softmax classifier, PCA+Softmax, and SAE+Softmax.

6. REFERENCES

[1] Ajay Basavanhally, Jun Xu, et al., "Computer-aided prognosis of ER+ breast cancer histopathology and correlating survival outcome with Oncotype DX assay," in ISBI'09. IEEE, 2009, pp. 851–854.

[2] Hussain Fatakdawala, Jun Xu, et al., "Expectation-maximization-driven geodesic active contour with overlap resolution (EMaGACOR): Application to lymphocyte segmentation on breast cancer histopathology," TBME, vol. 57, no. 7, pp. 1676–1689, 2010.

[3] Geoffrey E. Hinton and Ruslan R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.

[4] Dan C. Ciresan et al., "Mitosis detection in breast cancer histology images with deep neural networks," in MICCAI 2013, vol. 8150 of LNCS, pp. 411–418. Springer, 2013.

[5] Andrew Ng, "Sparse autoencoder," CS294A Lecture Notes, p. 72, 2011.

[6] Y. Bengio et al., "Representation learning: A review and new perspectives," TPAMI, vol. 35, no. 8, pp. 1798–1828, 2013.
