[IEEE 2011 IEEE International Conference on Image Information Processing (ICIIP) - Shimla, Himachal Pradesh, India (2011.11.3-2011.11.5)] 2011 International Conference on Image Information


2011 International Conference on Image Information Processing (ICIIP 2011)

Proceedings of the 2011 International Conference on Image Information Processing (ICIIP 2011) 978-1-61284-861-7/11/$26.00 ©2011 IEEE

Probabilistic Prediction of Malaria using Morphological and Textural Information

Devkumar Das*, Madhumala Ghosh, Chandan Chakraborty

School of Medical Science and Technology Indian Institute of Technology Kharagpur

Kharagpur, West Bengal, India *[email protected]

Ashok K Maiti#, Mallika Pal Department of Pathology

Midnapur Medical College and Hospital Midnapur, West Bengal, India

#[email protected]

Abstract- The objective of this paper is to introduce computer-assisted prediction of malaria infection, particularly Plasmodium vivax, based on morphological and textural information. Erythrocytes are segmented from pre-processed light microscopic images of peripheral blood smears using the marker-controlled watershed. The texture and morphology of the erythrocytes are then quantified using Haralick texture measures and geometrical features. Finally, the statistically significant features are fitted with a multivariate logistic regression model. The predictive model fits well (-2 Log Likelihood = 24.636 and Chi-Square = 820.949) and gives 88.77% prediction accuracy.

Keywords- malaria, P. vivax, watershed, texture, multivariate logistic regression.

I. INTRODUCTION

In the last few decades it has been observed that malaria very often attacks the populations of sub-Saharan African and Asian countries [1], and the disease has become a leading cause of death worldwide. It causes 1.5-2.7 million deaths per year [2]. In India, the malaria incidence rate is also high. In contrast to other malarial infections, the reported rate of Plasmodium vivax (P. vivax) is higher in the Indian population: approximately 50-60% of affected people are infected by P. vivax and 40-50% by Plasmodium falciparum (P. falciparum), as reported in the National Vector Borne Disease Control Program (NVBDCP) data for the year 2010. In current diagnostic practice, microscopic imaging contributes greatly to diagnosing malarial infection from blood smear images. Experts detect plasmodium infection under the microscope based on their clinico-pathological knowledge. This subjective evaluation procedure is time consuming and tedious. To reduce the subjective evaluation error and the time required, there is a need to develop an automatic image analysis approach that can detect plasmodium infection from peripheral blood smear images.

Some work in this direction has been reported in the literature. Colour-histogram-based malaria parasite detection has been carried out [3]. Diaz et al. presented quantification and classification of P. falciparum infected erythrocytes [4]. Ross et al. used morphological operations and a novel threshold selection technique to identify erythrocytes [5]. Makkapati et al. segmented malaria parasites in the HSV (hue, saturation, value) colour space [6]. Erythrocytes infected by malaria parasites have been detected using a statistical approach [2]. Dempster et al. applied mathematical morphology and granulometry to estimate parasitemia [7]. The malaria parasite has also been investigated using gray-level thresholding [8].

Most of the literature addresses P. falciparum detection. Although P. falciparum malaria is much more severe than P. vivax malaria, in the Indian population the incidence rate of P. vivax malaria is higher than that of P. falciparum malaria.

To identify P. vivax infected erythrocytes, we introduce shape features (area, perimeter, compactness, etc.) and GLCM-based textural features. Prior to feature extraction, all infected and non-infected erythrocytes are segmented using gray-level thresholding followed by morphological operators, and are partitioned using the marker-controlled watershed algorithm. The extracted features are statistically analysed using box plots, kernel density plots and the t-test to find the significant features that discriminate between non-infected and P. vivax infected erythrocytes. To obtain a probabilistic prediction of the disease, a multivariate logistic regression model is fitted to these features. The predictive model fits well (-2 Log Likelihood = 24.636 and Chi-Square = 820.949) and gives 88.77% prediction accuracy. Fig. 1 shows the flow diagram of our work.

II. MATERIALS AND METHODOLOGY

A. Image Acquisition

We collected thin blood smears from 250 Plasmodium vivax infected patients at Midnapur Medical College & Hospital and Medipath Laboratory, Midnapur, West Bengal, each labeled (diagnosed) as Plasmodium vivax by an expert. Of these 250, we considered the peripheral blood smear slides of 50 patients, selected on the doctors' suggestion, because most of the slides were not well prepared and were noisy.

Fig. 1 Flow diagram of the work

All slides are stained using Leishman stain. Blood smear images are optically grabbed using a Leica Observer (Leica DM750, Leica Microsystems, Switzerland) under a 100X oil objective (NA 1.5150). The effective magnification is 1000X and the pixel size is 0.064 µm. All images are grabbed in JPEG format (Fig. 2) and digitized at 2048 × 1536 pixels.

Fig. 2 Light microscopic images of blood smear: (a) normal, (b) malaria

B. Pre-processing

Pre-processing is a basic requirement of every image analysis scheme. In microscopic image analysis, pre-processing is required to eliminate artefacts introduced during slide preparation and imaging. Here we consider the G channel of the test images because it gives better information than the R and B channels. A median filter is then applied to reduce impulse noise (Fig. 3); the median filter is a non-linear filter well suited to removing impulse noise [9].

Fig. 3 Median filtering: (a) G component of the original image, (b) filtered image
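The pre-processing step can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the 3 × 3 window size, the border handling and the function names `green_channel` and `median_filter` are assumptions made for illustration, and the image is a toy example with a single impulse pixel.

```python
from statistics import median

def green_channel(rgb_image):
    """Return the G component of an image stored as rows of (R, G, B) tuples."""
    return [[pixel[1] for pixel in row] for row in rgb_image]

def median_filter(image, size=3):
    """Apply a size x size median filter.

    Border pixels use only the part of the window that lies inside the image
    (an assumed border rule; the paper does not specify one).
    """
    half = size // 2
    height, width = len(image), len(image[0])
    filtered = []
    for r in range(height):
        out_row = []
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(height, r + half + 1))
                for cc in range(max(0, c - half), min(width, c + half + 1))
            ]
            out_row.append(median(window))
        filtered.append(out_row)
    return filtered

# Toy 5x5 image: uniform background with one impulse ("salt") pixel in the centre.
rgb = [[(10, 100, 10) for _ in range(5)] for _ in range(5)]
rgb[2][2] = (10, 255, 10)

g = green_channel(rgb)
smoothed = median_filter(g)
```

In this toy case the isolated bright pixel is replaced by the median of its neighbourhood, which is the behaviour that makes the median filter suitable for impulse noise.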

C. Erythrocyte Segmentation

A blood smear image consists of foreground and background regions. Erythrocytes, leukocytes and platelets are the three blood components present in the foreground, and the background contains plasma. In this study, the erythrocyte is the main region of interest. Erythrocytes are segmented using gray-level thresholding followed by morphological operators and the marker-controlled watershed algorithm. For gray-level thresholding we use Otsu's method [10], a popular technique that selects the normalized gray value at which the intra-class variance is minimal. Based on this threshold we extract the foreground region of the test images. Since the erythrocyte is our main region of interest, we use morphological operators to eliminate unwanted cells such as leukocytes and platelets, whose areas differ from that of erythrocytes (Fig. 4). The marker-controlled watershed algorithm is a powerful segmentation tool that improves the segmentation of objects with touching contours [9]; we use it to separate overlapping erythrocytes.

Fig. 4 Segmentation of a P. vivax infected blood smear image: (a) input image, (b) segmented output
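The thresholding step can be sketched as follows. This is an illustrative pure-Python version of Otsu's method on a flat list of 8-bit gray values, not the authors' code; the function name `otsu_threshold`, the toy pixel values and the assumption that cells are darker than the background in the G channel are all ours. Maximising the between-class variance, as done here, is equivalent to minimising the intra-class variance.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximises the between-class variance.

    Pixels with value <= t form one class, the remaining pixels the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the class <= t
    sum0 = 0  # sum of gray values in the class <= t
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Toy bimodal data: 100 dark "cell" pixels and 100 bright "background" pixels.
pixels = [20] * 50 + [30] * 50 + [200] * 40 + [210] * 60
t = otsu_threshold(pixels)

# Assumption: cells are darker than the plasma background in the G channel.
mask = [1 if p <= t else 0 for p in pixels]
```

Any threshold between the two modes separates the toy data equally well; the loop keeps the first such value it encounters.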

Algorithm-I: Erythrocyte segmentation

1. Read an input image
2. Pre-processing:
   a) Extract the R, G, B components from the input image
   b) Use a median filter on the G channel image to remove noise
3. Compute the threshold value based on the Otsu method
4. Binarize the image
5. Remove unwanted cells from the binary image using morphological operators
6. Perform the marker-controlled watershed transform to separate erythrocytes from the background
7. Reconstruct the RGB image

[Fig. 1 flow diagram blocks, in order: Image Acquisition; Pre-processing; Erythrocytes Segmentation (gray level thresholding, morphology and marker controlled watershed); Textural (Haralick textural features) and Morphological feature extraction; P. vivax infection prediction (multivariate logistic regression model)]

D. Feature Extraction

To distinguish infected from non-infected erythrocytes, we need to extract features that discriminate well between the two types. In malarial infection, textural as well as shape features are important for distinguishing which Plasmodium species are present in a blood smear. We compute Haralick [11] textural features (energy, entropy, correlation, etc.) and geometrical features (area, perimeter, compactness, etc.).

1). Haralick Texture Features

Let $f(x, y)$ be the segmented erythrocyte image (infected or non-infected) with $N$ distinct gray-level intensities $(0, 1, 2, \ldots, N-1)$. We first calculate the gray level co-occurrence matrix (GLCM), a square $N \times N$ matrix, where $N$ is the number of gray levels. Based on the GLCM, Haralick described statistical features that characterize the texture of the image. Some of them are calculated as follows, where $P(i, j)$ is the normalized dependence matrix and $N$ is the number of gray levels present in the erythrocyte.

(a) Entropy:

$$f_1 = -\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} P(i, j) \log P(i, j) \tag{1}$$

(b) Energy:

$$f_2 = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} P(i, j)^2 \tag{2}$$

(c) Correlation:

$$f_3 = \frac{\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} i \, j \, P(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y} \tag{3}$$

where $\mu_x, \mu_y$ and $\sigma_x, \sigma_y$ are the means and standard deviations of $P_x$ and $P_y$, the partial (marginal) probability density functions; $P_x(i)$ is the $i$th entry in the marginal-probability matrix obtained by summing the rows of $P(i, j)$.

(d) Variance:

$$f_4 = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} (i - \mu)^2 P(i, j) \tag{4}$$

where $\mu$ is the mean of $P(i, j)$.

(e) Information measure of correlation 1:

$$f_5 = \frac{f_1 - HXY1}{\max(HX, HY)} \tag{5}$$

(f) Information measure of correlation 2:

$$f_6 = \left(1 - \exp[-2(HXY2 - f_1)]\right)^{1/2} \tag{6}$$

where $HX$ and $HY$ are the entropies of $P_x$ and $P_y$, and

$$HXY = -\sum_{i,j=0}^{N-1} P(i, j) \log P(i, j)$$

$$HXY1 = -\sum_{i,j=0}^{N-1} P(i, j) \log\left(P_x(i) P_y(j)\right)$$

$$HXY2 = -\sum_{i,j=0}^{N-1} P_x(i) P_y(j) \log\left(P_x(i) P_y(j)\right)$$

(g) Sum entropy:

$$f_7 = -\sum_{k=0}^{2(N-1)} P_{x+y}(k) \log P_{x+y}(k) \tag{7}$$

where $P_{x+y}(k) = \sum_{i+j=k} P(i, j)$.
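As an illustration of how such texture measures are obtained from a co-occurrence matrix, the sketch below builds a normalized GLCM and computes entropy, energy and correlation. The pixel offset (one pixel to the right), the symmetric counting, the natural logarithm and the function names `glcm` and `haralick_features` are assumptions for illustration; the paper does not state which offset or logarithm base was used.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for r in range(height):
        for c in range(width):
            rr, cc = r + dy, c + dx
            if 0 <= rr < height and 0 <= cc < width:
                i, j = image[r][c], image[rr][cc]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]

def haralick_features(P):
    """Entropy, energy and correlation of a normalized co-occurrence matrix P."""
    n = len(P)
    px = [sum(P[i][j] for j in range(n)) for i in range(n)]  # row marginal
    py = [sum(P[i][j] for i in range(n)) for j in range(n)]  # column marginal
    mu_x = sum(i * px[i] for i in range(n))
    mu_y = sum(j * py[j] for j in range(n))
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * px[i] for i in range(n)))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * py[j] for j in range(n)))

    cells = [(i, j, P[i][j]) for i in range(n) for j in range(n)]
    entropy = -sum(p * math.log(p) for _, _, p in cells if p > 0)
    energy = sum(p ** 2 for _, _, p in cells)
    correlation = (sum(i * j * p for i, j, p in cells) - mu_x * mu_y) / (sd_x * sd_y)
    return {"entropy": entropy, "energy": energy, "correlation": correlation}

# Toy 4x4 image with two gray levels: left half 0, right half 1.
image = [[0, 0, 1, 1] for _ in range(4)]
P = glcm(image, levels=2)
features = haralick_features(P)
```

For this toy image the matrix is P = [[1/3, 1/6], [1/6, 1/3]], so the energy is 5/18 and the correlation is 1/3, which can be checked by hand against equations (2) and (3).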

2). Geometric Features

In this study we calculate five geometrical features (area, perimeter, circularity, compactness and orientation). They are defined mathematically as follows [9].

Area is the total number of pixels present in a particular erythrocyte:

$$\text{Area} = \sum_{x} \sum_{y} f(x, y) \tag{8}$$

where $f(x, y)$ is the binary image of the erythrocyte, with $f(x, y) = 1$ if $(x, y)$ belongs to the object and $f(x, y) = 0$ otherwise.

Perimeter is defined as the number of boundary pixels present in a particular erythrocyte:

$$\text{Perimeter} = \sum_{x} \sum_{y} f(x, y), \quad (x, y) \in \text{boundary region} \tag{9}$$

Compactness is the ratio between the area and the square of the perimeter of each erythrocyte, scaled by $4\pi$:

$$\text{Compactness} = \frac{4\pi \cdot \text{Area}}{\text{Perimeter}^2} \tag{10}$$

Circularity is the ratio between the square of the perimeter and the area of the erythrocyte:

$$\text{Circularity} = \frac{\text{Perimeter}^2}{\text{Area}} \tag{11}$$
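A small sketch of equations (8) to (11) on a binary mask follows. The boundary rule used here (an object pixel with at least one 4-connected neighbour outside the object) is an assumption; the paper does not specify how boundary pixels were identified, and other rules give different perimeter values. Function names are illustrative.

```python
import math

def area(mask):
    """Equation (8): number of object pixels in a binary mask."""
    return sum(sum(row) for row in mask)

def perimeter(mask):
    """Equation (9): number of object pixels on the boundary.

    Assumed rule: a pixel is on the boundary if any 4-connected neighbour
    is background or lies outside the image.
    """
    height, width = len(mask), len(mask[0])
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < height and 0 <= cc < width)
                if outside or mask[rr][cc] == 0:
                    count += 1
                    break
    return count

def compactness(a, p):
    """Equation (10)."""
    return 4 * math.pi * a / p ** 2

def circularity(a, p):
    """Equation (11)."""
    return p ** 2 / a

# Toy object: a filled 5x5 square centred in a 7x7 mask.
mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)] for r in range(7)]
a = area(mask)
p = perimeter(mask)
comp = compactness(a, p)
circ = circularity(a, p)
```

Note that, as defined in (10) and (11), the two measures are reciprocal up to the factor 4π, so their product is always 4π.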

E. Statistical Analysis

Statistical analysis plays an important role in any image analysis algorithm. Based on statistical analysis (t-test, probability density plot and box plot), we identified the significant features, i.e. those with higher discriminating capability. We considered 146 Plasmodium vivax infected erythrocytes and 650 non-infected erythrocytes for textural and geometrical feature extraction.


1). t-test: The t-test is a hypothesis test that compares the mean of a variable across different groups [12]. Using the t-test, nineteen of the twenty-four features are statistically significant in discriminating the two classes at the 1% level of significance, as shown in TABLE1.

TABLE1
SUMMARY STATISTICS AND STATISTICAL SIGNIFICANCE TEST OF ERYTHROCYTE FEATURES FOR MALARIA

Feature                                Non-infected erythrocyte (µ±σ)   Parasite infected erythrocyte (µ±σ)   t-test (*Sig.)
Area                                   10907.26 ± 1727.29               16529.97 ± 804.37                     37.00*
Perimeter                              402.47 ± 42.34                   517.87 ± 47.28                        30.253*
Circularity                            14.95 ± 1.56                     16.35 ± 2.58                          8.961*
Eccentricity                           0.45 ± 0.16                      0.54 ± 0.16                           7.215*
Orientation                            6.37 ± 50.64                     -3.77 ± 47.36                         -2.251
Compactness                            0.98 ± 0.02                      0.97 ± 0.02                           -8.210*
Autocorrelation                        21.66 ± 2.94                     20.64 ± 2.74                          1.291
Contrast                               0.43 ± 0.06                      0.41 ± 0.04                           -3.248
Correlation                            0.9364 ± 0.01                    0.94823 ± 0.01                        8.993*
Cluster prominence                     468.80 ± 54.70                   439.00 ± 115.89                       5.443*
Cluster shade                          -46.87 ± 6.73                    -45.41 ± 17.39                        -5.333*
Dissimilarity                          0.13 ± 0.03                      0.14 ± 0.02                           7.155*
Energy                                 0.43 ± 0.10                      0.29 ± 0.06                           -17.092*
Entropy                                1.14 ± 0.23                      1.57 ± 0.18                           21.200*
Homogeneity                            0.96 ± 0.01                      0.94 ± 0.01                           -9.860*
Maximum probability                    0.59 ± 0.12                      0.41 ± 0.08                           -17.154*
Variance                               21.74 ± 2.93                     20.72 ± 2.74                          1.262
Sum average                            8.55 ± 0.60                      8.27 ± 0.62                           -1.421
Sum variance                           58.19 ± 28.34                    58.90 ± 13.87                         -5.604*
Sum entropy                            1.09 ± 0.22                      1.50 ± 0.16                           21.522*
Difference variance                    0.43 ± 0.06                      0.41 ± 0.04                           -3.428*
Difference entropy                     0.27 ± 0.08                      0.34 ± 0.05                           10.619*
Information measure of correlation1    -0.75 ± 0.06                     -0.74 ± 0.03                          4.296*
Information measures of correlation2   0.86 ± 0.03                      0.92 ± 0.02                           21.839*

* p < 0.001 indicates statistical significance
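For readers unfamiliar with the test, the following sketch computes a two-sample t statistic on toy data. It assumes Student's pooled-variance form and the sign convention "second group minus first group"; the paper does not state which variant or convention was used, so this is not claimed to reproduce the values in TABLE1. The function name `pooled_t_statistic` and the sample values are illustrative.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom); t is positive when group_b has the larger mean.
    """
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (mean(group_b) - mean(group_a)) / standard_error
    return t, na + nb - 2

# Toy feature values (hypothetical, not taken from the paper).
non_infected = [1, 2, 3, 4, 5]
infected = [3, 4, 5, 6, 7]
t, dof = pooled_t_statistic(non_infected, infected)
```

The resulting statistic would then be compared with the critical value of the t distribution for the given degrees of freedom at the chosen significance level.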

2). Box-Whisker Plot: A box-whisker plot shows the distribution of the data about the median for the non-infected and infected erythrocyte groups. Fig. 4 shows the box plots for infected and non-infected erythrocytes. The spacing between the different parts of the box shows the degree of dispersion (spread) and skewness in the data, and helps identify outliers.

Fig. 4 Box-Whisker plots for (a) Area, (b) Perimeter, (c) Entropy, (d) Energy

3). Class-conditional density plot: This plot identifies the likelihood regions of non-infected and infected erythrocytes. In the class-conditional density plot, the distributions of the shape and texture features for non-infected and P. vivax infected erythrocytes are examined to visualize whether the data of the two erythrocyte types overlap. On a continuous scale, the overlapping area of each feature for the normal and parasite-infected groups is shown in Fig. 5 using estimated density functions. From the class-conditional density plots [12] we find that the shape feature (Area) and the textural features (Energy, Entropy and Information measure of correlation2) have small overlapping regions between the two classes.

Fig. 5 Class conditional plot for (a) Area, (b) Entropy, (c) Energy, (d) Information measure of correlation2


F. Multivariate Logistic Regression

The multivariate logistic regression model is a good predictive model for predicting whether erythrocytes are Plasmodium vivax infected or not [13]. It can be defined as

$$Y = \alpha_0 + \sum_{i=1}^{n} \alpha_i X_i \tag{12}$$

where

$$Y = \begin{cases} 1 & \text{if the patient has Plasmodium vivax infection} \\ 0 & \text{if the patient is non-infected} \end{cases}$$

$\alpha_0$ is a constant and $x = (x_1, x_2, x_3, \ldots, x_n)$ denotes the textural and shape pattern of the erythrocytes. In linear regression, least-squares estimation does not provide robustness. To achieve robustness, a multivariate logistic regression is fitted to calculate the disease probability of a patient based on the textural and geometrical features of the erythrocyte. The probability of malaria (P. vivax) infection given an erythrocyte pattern is

$$P(Y = 1 \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = \pi(x), \text{ say.}$$

Hence, the odds of malaria infection are computed as

$$\frac{\pi(x)}{1 - \pi(x)} = \frac{P(Y = 1 \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)}{P(Y = 0 \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)} \tag{13}$$

Thus, the probabilistic model for predicting malaria (Plasmodium vivax) infection is

$$\pi(x) = \frac{1}{1 + e^{-(\alpha_0 + \alpha^{T} x)}} \tag{14}$$

and the likelihood is

$$L(\alpha \mid x) = \prod_{i} [\pi(x_i)]^{y_i} [1 - \pi(x_i)]^{1 - y_i}; \quad y_i = 0, 1 \tag{15}$$

The regression parameters $\alpha_1, \alpha_2, \alpha_3, \ldots$ are calculated using maximum likelihood (ML) estimation.

Hence the decision rule is defined as

If $P(y = 1 \mid x^*) \geq \delta$: malaria (P. vivax) infection present

If $P(y = 1 \mid x^*) < \delta$: normal

where $\delta$ is the threshold set by the decision maker.
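Equations (14) and (15) and the decision rule can be sketched in a few lines of Python. The fit below maximises the log of the likelihood (15) by plain gradient ascent, which is only one of several ways to obtain ML estimates and is not necessarily what the authors' statistical software does; the learning rate, step count, toy data and function names are all assumptions for illustration.

```python
import math

def predict_proba(alpha0, alpha, x):
    """Equation (14): pi(x) = 1 / (1 + exp(-(alpha0 + alpha^T x)))."""
    z = alpha0 + sum(a * xi for a, xi in zip(alpha, x))
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(alpha0, alpha, X, y):
    """Logarithm of the likelihood in equation (15)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(alpha0, alpha, xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def fit(X, y, learning_rate=0.1, steps=5000):
    """ML estimation by gradient ascent on the average log-likelihood."""
    n_features = len(X[0])
    alpha0, alpha = 0.0, [0.0] * n_features
    for _ in range(steps):
        grad0, grad = 0.0, [0.0] * n_features
        for xi, yi in zip(X, y):
            error = yi - predict_proba(alpha0, alpha, xi)
            grad0 += error
            for k in range(n_features):
                grad[k] += error * xi[k]
        alpha0 += learning_rate * grad0 / len(X)
        alpha = [a + learning_rate * g / len(X) for a, g in zip(alpha, grad)]
    return alpha0, alpha

def classify(probability, delta=0.5):
    """Decision rule: 1 (infected) if P(y = 1 | x) >= delta, else 0 (normal)."""
    return 1 if probability >= delta else 0

# Toy one-feature data set (hypothetical; classes overlap so the ML estimate is finite).
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 0, 1, 1]
alpha0, alpha = fit(X, y)
ll_start = log_likelihood(0.0, [0.0], X, y)
ll_fit = log_likelihood(alpha0, alpha, X, y)
```

After fitting, `classify(predict_proba(alpha0, alpha, x))` applies the threshold δ to a new feature vector `x`.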

III. RESULT AND DISCUSSION

A total of nineteen statistically significant shape and textural features are used to estimate the regression coefficients. TABLE II shows the regression coefficients obtained by ML estimation. The probabilistic model is defined as

$$P(y = 1 \mid x^*) = \frac{1}{1 + e^{-z}} \tag{16}$$

where

$$z = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 + \cdots + \alpha_{19} x_{19}$$

For an unlabelled erythrocyte from a patient's peripheral blood smear image characterized by the feature vector

$$x^* = (16047,\ 495.9727,\ 15.32927,\ 0.2318015,\ 0.972251,\ 0.939519,\ 467.477,\ -49.4502,\ 0.131568,\ 0.326801,\ 1.408455,\ 0.962582,\ 0.474082,\ 62.82728,\ 1.36445,\ 0.471963,\ 0.273369,\ -0.78258,\ 0.9144596)$$

the estimated P. vivax infection probability is

$$P(y = 1 \mid x^*) = \frac{1}{1 + e^{-z}} = 0.845353$$

TABLE II
ML ESTIMATE OF MULTIVARIATE LOGISTIC REGRESSION MODEL

Variable                               Coefficient (α)
Area                                   0.0040
Perimeter                              -0.0877
Circularity                            1.0161
Eccentricity                           2.6162
Compactness                            48.1113
Correlation                            180.2734
Cluster prominence                     -0.0054
Cluster shade                          -0.0469
Dissimilarity                          -7057.02
Energy                                 197.1168
Entropy                                535.7820
Homogeneity                            -12071.9
Maximum probability                    -47.2946
Sum variance                           -0.0239
Sum entropy                            -517.023
Difference variance                    1023.778
Difference entropy                     -329.970
Information measure of correlation1    398.2371
Information measures of correlation2   457.2683
Constant                               11708.33

Here 0.5 is taken as the threshold value: an erythrocyte with a predicted probability below the threshold is considered normal, and one above the threshold is considered P. vivax infected. Based on the formulated decision rule, the patient has an 84.53% risk of P. vivax infection. The predictive model fits well (-2 Log Likelihood = 24.636 and Chi-Square = 820.949) and gives 88.77% prediction accuracy (TABLE III). Fig. 6 shows the fitted multivariate logistic regression curve.
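The evaluation of equation (16) for the feature vector x* can be sketched as below, using the coefficients as printed in TABLE II. Two assumptions are made: that the components of x* follow the variable order of TABLE II (the paper does not state the ordering explicitly), and that the printed, rounded coefficients are adequate. Because the published coefficients are rounded and the large terms nearly cancel, the value obtained this way need not match the reported 0.845353 exactly.

```python
import math

# Coefficients as printed in TABLE II (rounded in the paper).
CONSTANT = 11708.33
COEFFICIENTS = [
    ("Area", 0.0040),
    ("Perimeter", -0.0877),
    ("Circularity", 1.0161),
    ("Eccentricity", 2.6162),
    ("Compactness", 48.1113),
    ("Correlation", 180.2734),
    ("Cluster prominence", -0.0054),
    ("Cluster shade", -0.0469),
    ("Dissimilarity", -7057.02),
    ("Energy", 197.1168),
    ("Entropy", 535.7820),
    ("Homogeneity", -12071.9),
    ("Maximum probability", -47.2946),
    ("Sum variance", -0.0239),
    ("Sum entropy", -517.023),
    ("Difference variance", 1023.778),
    ("Difference entropy", -329.970),
    ("Information measure of correlation1", 398.2371),
    ("Information measures of correlation2", 457.2683),
]

# Feature vector x* from the text; ordering assumed to match TABLE II.
X_STAR = [
    16047, 495.9727, 15.32927, 0.2318015, 0.972251,
    0.939519, 467.477, -49.4502, 0.131568, 0.326801,
    1.408455, 0.962582, 0.474082, 62.82728, 1.36445,
    0.471963, 0.273369, -0.78258, 0.9144596,
]

def infection_probability(x, coefficients=COEFFICIENTS, constant=CONSTANT):
    """Return (z, P(y = 1 | x)) following equation (16)."""
    z = constant + sum(alpha * xi for (_, alpha), xi in zip(coefficients, x))
    return z, 1.0 / (1.0 + math.exp(-z))

z, p = infection_probability(X_STAR)
label = "P. vivax infected" if p >= 0.5 else "normal"
```

With the 0.5 threshold described above, the example erythrocyte falls on the infected side of the decision rule.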


TABLE III
PREDICTION STATISTICS OF NORMAL AND P. vivax INFECTION

                    Observed
Predicted       Normal     P vivax     % Prediction
Normal          596        54          90.93%
P vivax         19         127         86.62%
Accuracy                               88.77%

Fig. 6 Fitted multivariate logistic regression curve for prediction of Plasmodium vivax

IV. CONCLUSION

Malaria infection is very common in our population. Given the large population, especially in India, developing computer-assisted screening has become a pressing issue, so that the disease can be addressed even at an early stage. In view of this, the present work highlights a probabilistic approach for rapidly and efficiently distinguishing P. vivax infected patients from healthy ones. Nineteen morphological and texture features are found to be statistically significant, based on 650 normal and 146 P. vivax infected erythrocytes. Although attempts have been made with standard supervised and unsupervised classifiers, here we applied multivariate logistic regression as a probabilistic model, and it showed a good fit. This methodology can therefore be recommended for classifying infected erythrocytes probabilistically.

ACKNOWLEDGEMENT

We are very thankful to the Dept. of Information Technology, Govt. of India, for financial support in carrying out this project.

REFERENCES

1. J. Frean, "Microscopic determination of malaria parasite load: role of image analysis", Microscopy: Science, Technology, Application and Education, FORMATEX 2010, vol. 3, pp. 862-866, 2010.

2. S. Raviraja, S.S. Osman, Kardman, "A novel technique for malaria diagnosis using invariant moments and by image compression", IFMBE Proceedings, vol. 21, 2008.

3. F.B. Tek, A.G. Dempster, I. Kale, "Malaria parasite detection in peripheral blood images", In: Proc. Med. Imaging Understand. Anal. Conf., Manchester, UK, 2006.

4. G. Diaz, F.A. Gonzalez, E. Romero, "A semi automatic method for quantification and classification of erythrocytes infected with malaria parasites in microscopic image", Journal of Biomedical Informatics, Elsevier, vol. 42, pp. 296-307, 2009.

5. N.E. Ross, C.J. Pritchard, D.M. Rubin, A.G. Duse, "Automatic image processing method for the diagnosis and classification of malaria on thin blood smears", Med Biol Eng Comput, vol. 44, pp. 427-436, 2006.

6. V.V. Makkapati, R.M. Rao, "Segmentation of malaria parasites in peripheral blood smear images", IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1361-1364, 2009.

7. Dempster, C.D. Ruberto, "Morphological processing of malarial slide images", Matlab DSP Conference, Nov. 16-17, Espoo, Finland, 1999.

8. S.F. Toha, U.K. Ngah, "Computer Aided Medical Diagnosis for the Identification of Malaria Parasites", IEEE - ICSCN 2007, MIT Campus, Anna University, Chennai, India, Feb. 22-24, pp. 521-522, 2007.

9. R.C. Gonzalez, R.E. Woods, Digital Image Processing, 2nd Edn., Prentice Hall, New York, 2002.

10. N. Otsu, "A threshold selection method from gray-level histograms", IEEE Trans. Sys., Man., Cyber., vol. 9, no. 1, pp. 62-66, 1979.

11. R.M. Haralick, S.R. Sternberg, "Image analysis using mathematical morphology", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, no. 4, pp. 532-550, 1987.

12. Rastogi, Fundamentals of Biostatistics, Ane Books, India, 2008.

13. M. Muthu Rama Krishnan, V. Dutta, C. Chakraborty, A.K. Ray, "Probabilistic prediction of cancer using nuclei morphometry", In: Proc. India Conference (INDICON), 2009 Annual IEEE, pp. 1-4, 2009.