[IEEE 2014 IEEE Region 10 Symposium - Kuala Lumpur, Malaysia (2014.4.14-2014.4.16)] 2014 IEEE REGION 10 SYMPOSIUM - A method to reduce curvelet coefficients for mammogram classification

A method to reduce curvelet coefficients for mammogram classification

Mohamed Meselhy Eltoukhy1,3, Syed Jamal Safdar Gardezi1,2, Ibrahima Faye1,2

1Centre for Intelligent Signals and Imaging Research, 2Department of Fundamental and Applied Sciences, Universiti Teknologi PETRONAS, Malaysia.

3Faculty of Computers and Informatics, Computer Science Department, Suez Canal University, Ismailia 41522, Egypt.

Abstract— This paper presents a method for classifying normal and abnormal tissues in mammograms using the curvelet transform. The curvelet coefficients are divided into groups of a fixed number of coefficients, and each group is treated independently. Statistical features are calculated for each group of coefficients and combined with features extracted from the mammogram image itself. To improve the classification rate, a feature ranking method is applied to select the most significant features. Classification results of a support vector machine (SVM) using 10-fold cross validation are presented. The results show that the ranked features improve the classification rate to as much as 85.48%, obtained with groups of 200 coefficients.

Keywords— Curvelet transform, Feature Selection, Mammogram Classification, Statistical Features.

I. INTRODUCTION

Breast cancer is one of the most common types of cancer diagnosed in women around the world. More than 11% of women are diagnosed with breast cancer annually. Malaysian estimates report that one out of nineteen women suffers from breast cancer during her lifetime, and 64% of the cancer cases were reported in the 40-60 age group. Unfortunately, nearly 40% of the breast cancer cases identified among these women were already at a very advanced stage of the disease [1]. Scientific studies show that early detection of cancer can reduce mortality rates considerably. Cancer begins in the cells, the building blocks of the tissues that form the breast and other parts of the body. Normal cells grow and divide to form new cells as the body requires them, and when cells grow old they are replaced by new ones. Sometimes this process goes wrong: new cells form when the body does not need them, or old cells do not die when they should, and as a result a tumor forms in the breast [2].

Digital mammography is used to visualize the internal structure of the breast and is one of the most effective tools for early detection of breast cancer. Breast abnormalities are characterized as benign or malignant on the basis of their developmental stage. In general, they are divided by geometric shape into microcalcifications, masses and architectural distortions. In the majority of cases the abnormality is either a microcalcification or a mass [3]. Early detection and treatment can reduce the breast cancer mortality rate quite considerably. Unfortunately, the signs of breast cancer are very subtle and vary in appearance at early stages [3]. It is difficult for radiologists to give an accurate and uniform evaluation of the large number of mammograms generated in widespread screening, so a misdiagnosis can occur. This might be due to human error, such as subjective or varying decision criteria that are not conclusive enough to verify the presence of cancer cells. It may also be due to distraction by other image features, or simple oversight [4]. Such errors lead to false negative detections, because of which 10-30% of actual tumors remain undetected [5]. On the other hand, statistics show that only 20-30% of all breast biopsy cases prove to be cancerous [4, 6].

In the literature, multiresolution techniques are used to detect cancer regions. Multiresolution representations, i.e. wavelets and curvelets, provide a very sparse and efficient representation of images [7, 8]. The curvelet transform has the advantage of representing edge discontinuities and curves in images and objects very efficiently. Eltoukhy et al. [5] achieved improved classification accuracy using curvelet coefficients to classify normal and abnormal tissues and to determine whether a tumor is benign or malignant. Moayedi et al. [3] used the coefficients obtained from a four-level contourlet decomposition of mammographic images to classify normal and abnormal tissues; their method also improved the classification rates. Ferreira and Borges [8] derived feature vectors from the set of coefficients obtained from a level-one wavelet decomposition to classify normal tissues and masses, and achieved good classification results. Eltoukhy et al. [9] presented a classification method that selects a percentage of the biggest coefficients at each decomposition level of the curvelet sub-bands.

Sayed et al. [10] used the Fast Discrete Curvelet Transform (FDCT) via wrapping for object extraction from grey-level images. Their experimental results indicated that curvelets are superior to wavelets in the pixel domain. Faye et al. [11] developed a feature extraction method using a multiresolution representation. They used the standard deviation of means to maximize the differences between classes and achieved good classification results. Eltoukhy et al. [12] presented an optimized feature selection method for cancer detection using the wavelet and curvelet transforms. The method uses the differences of threshold values between classes to classify normal and abnormal tissues. Their classification results show that the curvelet transform has an advantage over wavelets. This study aims to find a method that reduces the number of coefficients while retaining the capability to differentiate between two classes of tissue.

978-1-4799-2027-3/14/$31.00 ©2014 IEEE 663

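II. CURVELET TRANSFORM

Candes and Donoho [13] proposed the discrete curvelet transform. Curvelets are superior to wavelets in that they provide an optimally sparse representation of edges in objects and perform well in image reconstruction [14]. The curvelet transform in two dimensions has a spatial variable $x$ and a frequency variable $\omega$, with $r$ and $\theta$ the polar coordinates in the frequency domain. We define a pair of windows $W(r)$ and $V(t)$, the radial window and the angular window respectively, which obey the admissibility conditions:

$$\sum_{j=-\infty}^{\infty} W^2(2^j r) = 1, \quad r \in (3/4,\, 3/2) \qquad (1)$$

$$\sum_{l=-\infty}^{\infty} V^2(t - l) = 1, \quad t \in (-1/2,\, 1/2) \qquad (2)$$

For each $j \ge j_0$, a frequency window $U_j$ is defined in the Fourier domain by

$$U_j(r, \theta) = 2^{-3j/4}\, W(2^{-j} r)\, V\!\left(\frac{2^{\lfloor j/2 \rfloor}\,\theta}{2\pi}\right) \qquad (3)$$

where $\lfloor j/2 \rfloor$ is the integer part of $j/2$. Thus the support of $U_j$ is a polar wedge defined by the supports of $W$ and $V$, applied with scale-dependent window widths in the radial and angular directions.

The curvelets are defined as functions of $x = (x_1, x_2)$ at scale $2^{-j}$, orientation angle $\theta_l$ and position $x_k^{(j,l)} = R_{\theta_l}^{-1}(k_1 2^{-j},\, k_2 2^{-j/2})$ by

$$\varphi_{j,l,k}(x) = \varphi_j\big(R_{\theta_l}(x - x_k^{(j,l)})\big) \qquad (4)$$

where $\theta_l = 2\pi \cdot 2^{-\lfloor j/2 \rfloor} \cdot l$, $l = 0, 1, \ldots$ with $0 \le \theta_l < 2\pi$. Here $R_\theta$ denotes rotation by $\theta$ radians and $\varphi_j$ is the waveform whose Fourier transform is $U_j$. A curvelet coefficient is the inner product between an element $f \in L^2(\mathbb{R}^2)$ and a curvelet $\varphi_{j,l,k}$,

$$c(j,l,k) = \langle f, \varphi_{j,l,k} \rangle = \int_{\mathbb{R}^2} f(x)\, \overline{\varphi_{j,l,k}(x)}\, dx$$

where $\mathbb{R}$ denotes the real line. There are two implementations of the curvelet transform, namely the unequispaced Fast Fourier Transform (FFT) and the wrapping technique [14].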
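As a small illustration of how orientation depends on scale, the sketch below lists the equispaced rotation angles used in the continuous definition of [14], namely theta_l = 2*pi * 2^(-floor(j/2)) * l with 0 <= theta_l < 2*pi. The function name `orientation_angles` is hypothetical, and practical discrete implementations may choose a different number of angles per scale.

```python
import math


def orientation_angles(j):
    """Return the angles theta_l = 2*pi * 2**(-floor(j/2)) * l in [0, 2*pi)."""
    count = 2 ** (j // 2)            # number of distinct orientations at scale j
    step = 2.0 * math.pi / count     # angular spacing between neighbours
    return [step * l for l in range(count)]


# The number of orientations doubles every second scale.
for scale in range(2, 8):
    print(scale, len(orientation_angles(scale)))
```

The doubling of orientations every other scale reflects the scale-dependent angular window width mentioned above.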

III. METHODOLOGY

In this study, curvelet transform coefficients are used to classify mammogram images as normal or abnormal. This section describes the region of interest (ROI) cropping, the feature extraction and the classification. Figure 1 illustrates the steps of the work.

A. Region of interest (ROI) cropping

The study is performed on the MIAS dataset, which contains 322 images. In this study, 309 images are selected and divided into two subcategories: 102 abnormal images and 207 normal images. The original images in the dataset are 1024x1024 pixels. The image labels, pectoral muscles and background are all considered unwanted regions, and were therefore removed by cropping each image to obtain the region of interest. Cropping was performed manually, based on the information the dataset provides about the region of interest and the center of each abnormality. This information was provided by expert radiologists. After manual cropping, we obtained images of 128x128 pixels.
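The cropping in the paper was done manually. Purely as an illustration, a fixed-size window around a given center could be extracted as in the sketch below. The function `crop_roi` and its parameters are hypothetical, the image is modelled as a list of pixel rows, and the center is assumed to be given as (row, column) from the top-left corner, which may differ from the dataset's own annotation convention.

```python
def crop_roi(image, center_row, center_col, size=128):
    """Return a size x size window of `image` centred near the given point.

    `image` is a list of equal-length rows of pixel values. When the centre
    lies too close to a border the window is shifted inward, so the result
    always has exactly `size` rows and `size` columns.
    """
    n_rows, n_cols = len(image), len(image[0])
    if size > n_rows or size > n_cols:
        raise ValueError("ROI size exceeds image dimensions")
    half = size // 2
    # Clamp the top-left corner so the window stays inside the image.
    top = min(max(center_row - half, 0), n_rows - size)
    left = min(max(center_col - half, 0), n_cols - size)
    return [row[left:left + size] for row in image[top:top + size]]


# Synthetic 1024 x 1024 image whose pixel value encodes its position.
image = [[r * 1024 + c for c in range(1024)] for r in range(1024)]
roi = crop_roi(image, center_row=500, center_col=30)
print(len(roi), len(roi[0]))
```

Shifting the window inward, rather than padding, keeps every ROI the same size without inventing pixel values.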

Fig.1 The proposed method for mammogram classification



B. Feature Extraction

The curvelet transform is applied to the ROIs using four decomposition levels. The obtained coefficients are arranged to produce a high-dimensional feature vector. To reduce the dimensionality of this vector, a small set of features can be used to represent a fixed number of coefficients. Therefore, the curvelet coefficients of each ROI are arranged into groups of 100, 200, 300 and 400 coefficients. We extracted six statistical features (including the mean, energy, standard deviation, entropy and maximum value in the coefficient matrix) for each group of coefficients. These features are also calculated directly from each ROI, and the two sets are combined to produce the feature vector of the ROI. To reduce the number of features further, a feature ranking method is applied to select the most significant features, i.e. those able to discriminate between the classes. The selected features are presented to the SVM classifier.
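The grouping step can be sketched as follows, assuming the coefficients are available as a flat list. Only the five statistics named in the text are computed. The definitions of energy (sum of squares) and entropy (Shannon entropy of a 16-bin histogram) are assumptions, as are the function names and the choice to keep a shorter final group.

```python
import math


def group_statistics(values, bins=16):
    """Return (mean, energy, std, entropy, maximum) for one group of values."""
    n = len(values)
    mean = sum(values) / n
    energy = sum(v * v for v in values)                        # assumed: sum of squares
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population std
    lo, hi = min(values), max(values)
    # Assumed entropy: Shannon entropy (bits) of a `bins`-bin histogram.
    counts = [0] * bins
    for v in values:
        idx = 0 if hi == lo else min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)
    return (mean, energy, std, entropy, hi)


def grouped_features(coefficients, group_size):
    """Split a flat coefficient list into consecutive groups and describe each."""
    features = []
    for start in range(0, len(coefficients), group_size):
        features.extend(group_statistics(coefficients[start:start + group_size]))
    return features


# Stand-in data; real input would be the curvelet coefficients of one ROI.
coeffs = [math.sin(0.1 * i) for i in range(1000)]
vector = grouped_features(coeffs, group_size=200)
print(len(vector))  # 5 groups x 5 statistics
```

Larger groups give shorter feature vectors, at the cost of summarizing more coefficients with the same few statistics.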

C. Classification

The SVM classifier is used for the classification task, with 10-fold cross validation applied to the dataset. Classification is performed first on the full group feature vector, and then on the reduced feature matrix obtained from the feature ranking process for each group. The classification results for each case are recorded for comparison.
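The evaluation protocol can be illustrated with a minimal k-fold loop. This is a sketch under stated assumptions, not the authors' implementation. The callback `fit_predict` is a hypothetical placeholder for any classifier (an SVM in the paper), and the demonstration uses a trivial majority-class predictor instead of an SVM so that the block needs only the standard library.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit_predict, k=10, seed=0):
    """Return the mean accuracy over k folds.

    `fit_predict(train_x, train_y, test_x)` trains a classifier on the
    training part and returns predicted labels for `test_x`.
    """
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for held_out in folds:
        test_set = set(held_out)
        train = [i for i in range(len(labels)) if i not in test_set]
        predictions = fit_predict([features[i] for i in train],
                                  [labels[i] for i in train],
                                  [features[i] for i in held_out])
        correct = sum(p == labels[i] for p, i in zip(predictions, held_out))
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(folds)


def majority_class(train_x, train_y, test_x):
    """Stand-in classifier: always predict the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_x)


# Class sizes follow the dataset description: 207 normal (0), 102 abnormal (1).
labels = [0] * 207 + [1] * 102
features = [[float(i)] for i in range(len(labels))]
baseline = cross_validate(features, labels, majority_class)
print(round(baseline, 3))
```

With these class sizes, a predictor that always answers "normal" already scores about two thirds, which is a useful reference point when reading the accuracies reported below.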

IV. RESULTS AND DISCUSSION

Image features were extracted using the curvelet transform. Initially, 46080 coefficients were obtained from four curvelet decomposition levels. Each group of 100, 200, 300 or 400 coefficients is represented by six statistical features. Thus the size of the feature vector is reduced to 2766, 1380, 924 and 466 for groups of 100, 200, 300 and 400 coefficients respectively. For the feature vector of each group size, classification results were obtained using the SVM classifier in two steps: first on the whole feature vector, and second on the ranked feature vector. The obtained results are illustrated in Figure 2. The ranked features are obtained by arranging the feature vector in descending order of each feature's capability to differentiate between the classes. The capability scores of the features were calculated using the mean and standard deviation. A threshold is applied to the scores so that only the most significant features are kept [5]. These ranked features were then passed to the SVM and the results were recorded.
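The ranking step might be sketched as follows. The text states only that the scores use the mean and standard deviation and refers to [5] for details, so the score used here, |mean_0 - mean_1| / (std_0 + std_1), is an assumed Fisher-like stand-in. The names `rank_features` and `mean_std` and the threshold value are hypothetical.

```python
import math


def mean_std(values):
    """Return the mean and population standard deviation of a list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def rank_features(samples, labels, threshold):
    """Return indices of features scoring >= threshold, best first.

    Assumed score per feature: |mean_0 - mean_1| / (std_0 + std_1),
    computed from the samples of the two classes labelled 0 and 1.
    """
    scored = []
    for j in range(len(samples[0])):
        class0 = [s[j] for s, y in zip(samples, labels) if y == 0]
        class1 = [s[j] for s, y in zip(samples, labels) if y == 1]
        (m0, s0), (m1, s1) = mean_std(class0), mean_std(class1)
        spread = s0 + s1
        if spread > 0:
            score = abs(m0 - m1) / spread
        else:
            # Constant within both classes: perfectly separating or useless.
            score = math.inf if m0 != m1 else 0.0
        scored.append((score, j))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [j for score, j in scored if score >= threshold]


# Toy data: features 0 and 2 separate the classes, feature 1 does not.
samples = [[1.0, 5.0, 0.2], [1.2, 4.0, 0.1], [3.0, 5.5, 0.6], [3.2, 4.5, 0.5]]
labels = [0, 0, 1, 1]
selected = rank_features(samples, labels, threshold=1.0)
print(selected)
```

The threshold controls how many features survive, which is how a feature vector of a few thousand entries can shrink to a few dozen.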

The overall feature vector for groups of 100 coefficients had 2766 elements. Classification with the support vector machine under 10-fold cross validation achieved an overall classification rate of 72.4%. After feature ranking, the feature vector was reduced to the top 57 features and the classification rate improved to 79.19%. The same experiment was repeated for groups of 200, 300 and 400 coefficients, with all six statistical features calculated for each group.

For groups of 200 coefficients, with a feature vector of size 1380, 2x5-fold cross validation using the SVM yielded a classification accuracy of 72.5%. Feature ranking retained 107 features, and the classification rate improved by about 13 percentage points to an overall accuracy of 85.48%. For groups of 300 coefficients, a feature vector of 924 elements was passed to the SVM and produced a classification accuracy of 73% under 10-fold cross validation. The feature ranking procedure retained 89 features, which yielded a classification accuracy of 85.08%, an improvement of about 12 percentage points over the full feature vector. The same experiment was performed with groups of 400 coefficients for each ROI, each group again described by the six statistical features mentioned above. On a feature vector of 466 features, the SVM classifier yielded an accuracy of 63.36% under 10-fold cross validation. After feature ranking the feature vector was reduced to 235 selected features and the classification rate improved to 68.3%. However, this classification accuracy was the lowest among the groups of coefficients.
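The improvements quoted above are differences in percentage points. The short sketch below recomputes them from the figures as stated in the text, together with the share of features kept after ranking. The tuple layout and the helper `summarize` are illustrative only.

```python
# (group size, features before, accuracy before %, features after, accuracy after %)
reported = [
    (100, 2766, 72.4, 57, 79.19),
    (200, 1380, 72.5, 107, 85.48),
    (300, 924, 73.0, 89, 85.08),
    (400, 466, 63.36, 235, 68.3),
]


def summarize(rows):
    """Map group size -> (gain in percentage points, % of features kept)."""
    summary = {}
    for group, n_before, acc_before, n_after, acc_after in rows:
        gain = round(acc_after - acc_before, 2)
        kept = round(100.0 * n_after / n_before, 1)
        summary[group] = (gain, kept)
    return summary


summary = summarize(reported)
for group, (gain, kept) in summary.items():
    print(group, gain, kept)
```

Read this way, the largest gains come with groups of 200 and 300 coefficients, where fewer than a tenth of the features are retained.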

Fig. 2 Classification results obtained using SVM. Bars show the classification accuracy rate (%) for the full statistical feature vector and for the ranked features of each group size: 100 coefficients, 72.41 and 79.19; 200 coefficients, 72.51 and 85.48; 300 coefficients, 72.95 and 85.09; 400 coefficients, 63.45 and 68.36.



V. CONCLUSION

In this paper, features are extracted from the curvelet coefficients by dividing the coefficients into groups of fixed size and describing each group by six statistical features. Classification is performed for each group size using the SVM classifier. A feature ranking method is then applied in each case and the classification process is repeated. The results show that the classification rate improved after feature ranking in every case, by roughly 5 to 13 percentage points. It was also noted that groups of 200 and 300 coefficients gave approximately the same classification performance, with only a slight decrease for 300. The maximum accuracy for normal versus abnormal classification is 85.48%, obtained from 107 ranked features with groups of 200 coefficients. In future work, more statistical features of the curvelet coefficients could be added to improve the classification results. Different feature ranking methods for selecting the top-ranked features can also be explored, and introducing a different classifier might help to improve the classification rates.

VI. ACKNOWLEDGMENT

This research is supported by the Ministry of Science, Technology and Innovation (MOSTI), Malaysia, under Science Fund grant No. 06-02-02-SF0142.

REFERENCES

[1] Foundation, P., Breast Cancer. 2012; Available from: http://pride.org.my/breast-cancer/.

[2] D.N. Ponraj, M.E. Jenifer, P. Poongodi, and J.S. Manoharan, “A survey on the preprocessing techniques of mammogram for the detection of breast cancer.” Journal of Emerging Trends in Computing and Information Sciences, 2011. 2(12): pp. 656-664.

[3] F. Moayedi, Z. Azimifar, R. Boostani, and S. Katebi, “Contourlet-based mammography mass classification using the SVM family.” Computers in Biology and Medicine, 2010. 40(4): pp. 373-383.

[4] I. Christoyianni, A. Koutras, E. Dermatas, and G. Kokkinakis, “Computer aided diagnosis of breast cancer in digitized mammograms.” Computerized Medical Imaging and Graphics, 2002. 26(5): pp. 309-319.

[5] M.M. Eltoukhy, I. Faye, and B.B. Samir, “A statistical based feature extraction method for breast cancer diagnosis in digital mammogram using multiresolution representation.” Computers in Biology and Medicine, 2012. 42(1): pp. 123-128.

[6] R. Mousa, Q. Munib, and A. Moussa, “Breast cancer diagnosis system based on wavelet analysis and fuzzy-neural.” Expert Systems with Applications, 2005. 28(4): pp. 713-723.

[7] S.G. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 1989. 11(7): pp. 674-693.

[8] C.B.R. Ferreira and D.B.L. Borges, “Analysis of mammogram classification using a wavelet transform decomposition.” Pattern Recognition Letters, 2003. 24(7): pp. 973-982.

[9] M.M. Eltoukhy, I. Faye, and B.B. Samir, “Breast cancer diagnosis in digital mammogram using the multiscale curvelet transform.” Computerized Medical Imaging and Graphics, 2010. 34(4): pp. 269-276.

[10] U. Sayed, M.A. Mofaddel, W.M. Abd Elhafiez, and M.M. Abdel-Gawad, “Image object extraction based on curvelet transform.” Applied Mathematics & Information Sciences, 2013. 7(1): pp. 133-138.

[11] I. Faye, B.B. Samir, and M.M. Eltoukhy, “Digital mammograms classification using a wavelet based feature extraction method.” Proceedings of the 2nd International Conference on Computer and Electrical Engineering (ICCEE), 2009.

[12] M.M. Eltoukhy and I. Faye, “An optimized feature selection method for breast cancer diagnosis in digital mammogram using multiresolution representation.” Applied Mathematics & Information Sciences, 2014.

[13] E.J. Candes and D.L. Donoho, “Curvelets, multiresolution representation, and scaling laws.” 2000.

[14] E.J. Candes, L. Demanet, and D.L. Donoho, “Fast discrete curvelet transforms.” Multiscale Modeling and Simulation, 2006. 5(3): pp. 861-899.

