Modeling of the Lung Nodules for Detection in LDCT Scans

Amal Farag, Student Member, IEEE, Shireen Elhabian, Student Member, IEEE, James Graham, Senior Member, IEEE, Aly Farag, Senior Member, IEEE, Salwa Elshazly, Robert Falk, Hani Mahdi, Member, IEEE, Hossam Abdelmunim, Member, IEEE, Sahar Al-Ghaafary

32nd Annual International Conference of the IEEE EMBS, Buenos Aires, Argentina, August 31 - September 4, 2010

Abstract: A novel approach is proposed for generating data-driven models of the lung nodules appearing in low dose CT (LDCT) scans of the human chest. Four types of common lung nodules are analyzed using Active Appearance Model methods to create descriptive lung nodule models. The proposed approach is also applicable to automatic classification of nodules into pathologies, given a descriptive database. This approach is a major step forward for early diagnosis of lung cancer. We evaluate the new nodule models on clinical datasets and show significant improvements in both sensitivity and specificity.

Keywords: Nodule modeling, Data-driven nodule models, Active

Appearance, Sensitivity and Specificity of CAD systems

I. INTRODUCTION

This paper deals with modeling the lung nodules that are visible in low dose computed tomography (LDCT) of the human chest. The new models are major enhancements to ongoing efforts for early detection and classification of nodules in lung cancer screening studies that are based on LDCT. Globally, lung cancer remains the most common malignancy, with an estimated 1.5 million newly diagnosed cases in 2007 and 1.35 million deaths occurring that same year. Of the 1.35 million deaths, 975,000 were men and 376,000 were women. The highest recorded 5-year patient survival rate, 14%, is observed in the United States, while the 5-year survival rate in Europe is 8%. Survival of lung cancer is strongly dependent on early diagnosis [1][2]. Research aimed at an optimal detection rate for early-stage lung cancer is the hope for an improved survival rate [3]-[5].

An image analysis approach for automatic detection and classification of lung nodules involves image acquisition and filtering of scan noise; segmentation of the lung tissue from the rest of the chest cavity; detection of nodules and reduction of false positives; and classification into pathologies. The literature is rich in approaches to segment the lung from the rest of the chest tissues, but the majority of the nodule modeling methods are based on parametric descriptions of the nodules (e.g., in 2D, circular or semicircular models are used, while in 3D volumes, spherical or hemispherical models are used). Nodule detection is performed using various machine learning methods which execute template matching by one approach or another (e.g., [5][7][8]). Extensive surveys on automatic lung nodule detection may be found in [9][10].

Manuscript received April 23, 2010. This work was supported in part by the National Science Foundation and the Kentucky Lung Cancer Program. Contact author: Amal A. Farag, [email protected]

Amal Farag, S. Elhabian, J. Graham, A. Farag, and S. Elshazly are with the Department of Electrical & Computer Eng., University of Louisville, USA.

R. Falk, MD, is with the Medical Imaging Division, Jewish Hospital, Louisville, KY, USA.

H. Mahdi, H. Abdelmunim, and S. Al-Gaafary, MD, are with Ain Shams University, Ain Shams, Egypt.

Yet, the sensitivity and specificity of the automatic detection methods are very hard to quantify. Part of the problem is that, unlike common problems in computer vision (e.g., face recognition or stereo-based reconstruction), no “standard” databases are available for lung nodules. The ELCAP study [11] provides nodule locations, but each nodule is identified only by a pixel inside it (i.e., the spatial support is not identified by the human experts). The National Lung Screening Trial (NLST) in the US is to report LDCT scans of over 30,000 subjects; to date these data are not available [1]. To the best of our knowledge, none of the studies worldwide has resulted in identifiable databases of nodules listing their types and pathologies. Therefore, there is a persistent need for reliable nodule models based on the actual scans. In fact, part of our ongoing research is to construct such databases, which will be made available to the research community. Our efforts seek to identify 10,000 nodules by the end of 2010.

Kostis et al. [6] provide a description of four major types of lung nodules based on identifiable features (size, shape and location in the lung tissue). These “anatomical” descriptions of nodules are used in our work. Lee et al. [5] noted that the intensity (in Hounsfield units) of a nodule decays with radial distance from the centroid; this is beneficial for estimating the texture/intensity inside a nodule shape. Farag et al. [7][8] established a parametric form for the relationship between the radial distance and the Hounsfield units in Lee’s work; this is very useful for estimating the intensity of a nodule model given an ensemble. Yet, the nodule models used by Lee and Farag, as well as in all similar works, are parametric. In this paper we use the data to determine the size, shape and intensity distribution of the nodule models (templates). Our approach is non-parametric and is defined by the actual nodules in the lung CT slices; thus we call it “data-driven”.

The contribution of the paper is confined to nodule modeling based on the known information in LDCT scans, as described in an ensemble of nodules identified by radiologists. The developed approach is systematic and applicable to any LDCT imaging protocol. Shape and texture models have shown great promise in a number of computer vision and biomedical image analysis applications (e.g., [12][13]); to the best of our knowledge, this is the first attempt to use the methodologies for shape and texture modeling in the study of lung nodules. The paper is organized as follows: Section II discusses the new nodule modeling approach; Section III evaluates the improvements in sensitivity and specificity that result from the new nodule models; and Section IV presents conclusions and planned extensions.

II. NODULE MODELING

This section will examine the process of nodule modeling

using an ensemble of nodules identified by radiologists.

A. Pulmonary Nodule definitions

In radiology, a pulmonary nodule is a mass in the lung, usually spherical in shape; however, it can be distorted by surrounding anatomical structures such as the pleural surface. Nodules may be located in any part of the lung tissue and may be camouflaged or occluded by the anatomical structure of the lung tissue. This paper uses the classification of Kostis et al. [6], which groups nodules into four categories: 1) well-circumscribed, where the nodule is located centrally in the lung without being connected to vasculature; 2) vascularized, where the nodule has significant connection(s) to the neighboring vessels while located centrally in the lung; 3) juxta-pleural, where a significant portion of the nodule is connected to the pleural surface; and 4) pleural tail, where the nodule is near the pleural surface, connected by a thin structure. In all of these types there are no limitations on size or distribution in the lung tissue. These definitions are adopted in this paper, and the image analysis methods are developed and tested based on these nodule types. A fifth “none of the above” class could be added to describe nodules of uncommon shapes or locations; we chose to limit ourselves to the four classes of Kostis. The goal of the modeling process is to generate a model or “template” for each nodule type that possesses its main features.

B. Statistical Nodule Modeling

The main reason for the limited performance of parametric nodule models is that real-world nodules do not have a uniform shape or a fixed size, and are not isotropic. The active appearance modeling (AAM) approach, which appears in the computer vision literature under names such as active appearance, active shapes and morphable models, may hold real promise in the analysis (and even synthesis) of lung nodules. The literature on AAM and its variants is rich (see [12][13] for examples). We follow the notation and development of Matthews and Baker [13]. There are two forms of AAM: the independent AAM, in which shape and appearance are modeled separately with separate parameters, and the combined AAM, in which a single set of parameters describes both. In the case of the independent AAM, given the shape parameters $p = (p_1, \dots, p_n)^T$ and the appearance parameters $\lambda = (\lambda_1, \dots, \lambda_m)^T$, we can use the following equations to generate the shape $s$, which is defined by the coordinates of the $v$ vertices of the model mesh:

$$s = (x_1, y_1, x_2, y_2, \dots, x_v, y_v)^T \tag{1}$$

Thus the shape can be expressed as:

$$s = s_0 + \sum_{i=1}^{n} p_i s_i \tag{2}$$

where $s$ is a base shape $s_0$ plus a linear combination of $n$ shape vectors $s_i$. The appearance is defined within the base shape $s_0$. Letting $s_0$ also denote the set of pixels $\mathbf{x} = (x, y)^T$ that lie inside the base shape, the appearance $A(\mathbf{x})$ is expressed as a base appearance $A_0(\mathbf{x})$ plus a linear combination of $m$ appearance images $A_i(\mathbf{x})$:

$$A(\mathbf{x}) = A_0(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i A_i(\mathbf{x}), \quad \forall \mathbf{x} \in s_0 \tag{3}$$

In the combined AAM case, a single set of parameters $c = (c_1, \dots, c_l)^T$ parameterizes both the shape and the appearance:

$$s = s_0 + \sum_{i=1}^{l} c_i s_i \tag{4}$$

$$A(\mathbf{x}) = A_0(\mathbf{x}) + \sum_{i=1}^{l} c_i A_i(\mathbf{x}) \tag{5}$$
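The independent AAM of Matthews and Baker [13] writes a shape as a base shape plus a linear combination of shape vectors, and an appearance as a base appearance plus a linear combination of appearance images. The following is a minimal NumPy sketch of that synthesis step, not the authors' implementation. The function names, array layouts and the random toy basis are assumptions made for illustration; in practice the basis would be learned from annotated nodules.

```python
import numpy as np

def synthesize_shape(s0, shape_vectors, p):
    """Return s = s0 + sum_i p[i] * shape_vectors[i].

    s0:            (2v,) base shape, flattened as (x1, y1, ..., xv, yv)
    shape_vectors: (n, 2v) shape basis vectors
    p:             (n,) shape parameters
    """
    return s0 + p @ shape_vectors

def synthesize_appearance(A0, appearance_images, lam):
    """Return A = A0 + sum_i lam[i] * appearance_images[i].

    A0:                (h, w) base appearance
    appearance_images: (m, h, w) appearance basis images
    lam:               (m,) appearance parameters
    """
    # Contract the parameter vector with the first axis of the image stack.
    return A0 + np.tensordot(lam, appearance_images, axes=1)

# Toy data (random, hypothetical): 10 landmarks, 2 shape modes,
# a 21x21 patch and 3 appearance modes.
rng = np.random.default_rng(0)
s0 = rng.normal(size=20)
S = rng.normal(size=(2, 20))
A0 = rng.normal(size=(21, 21))
A = rng.normal(size=(3, 21, 21))

shape = synthesize_shape(s0, S, np.array([0.5, -1.0]))
patch = synthesize_appearance(A0, A, np.array([1.0, 0.0, 2.0]))
```

Setting all parameters to zero returns the base shape and base appearance, which is a quick sanity check on any learned model.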

For either form of the AAM, the quantities in Eq. (3), or in Eqs. (4) and (5), need to be estimated from an ensemble of pre-labeled entities (e.g., nodules). Various methods can be used to perform this task. Commonly, manually annotated entities are processed to extract the most discriminatory features for the shape $s$ and appearance $A$. A very efficient approach to carry out this step is Principal Component Analysis (PCA). Carrying out this process results in a systematic approach for nodule synthesis; the coefficients in Eq. (3), or in Eqs. (4) and (5), form the basis of a discrimination (or recognition) step. This approach is of great benefit for nodule identification; i.e., from a collection of nodules in a database, we will be able to assign a given nodule to the group closest to its features in that database. For this to be carried out with confidence, a large database must exist that captures the statistical variations in the nodules appearing in LDCT scans. Such a database does not exist at the moment; our group intends to create a database of 10,000 nodules labeled and cross-validated by 3 radiologists by the end of 2010, which will be made public.
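As a rough illustration of how PCA can yield a mean shape and a small set of shape modes from an ensemble of annotated nodules, the sketch below applies an SVD to flattened landmark vectors. It assumes the shapes have already been aligned; the helper names (`pca_model`, `project`, `reconstruct`) and the synthetic data are hypothetical and not taken from the paper.

```python
import numpy as np

def pca_model(samples, num_modes):
    """PCA of row-vector samples (N, d).

    Returns the mean (d,), the leading modes (num_modes, d) and the
    variance explained by each returned mode (num_modes,).
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are orthonormal principal directions of the centered data.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values**2 / (samples.shape[0] - 1)
    return mean, vt[:num_modes], variances[:num_modes]

def project(sample, mean, modes):
    """Coefficients of a sample in the model (the parameters p in Eq. 2)."""
    return modes @ (sample - mean)

def reconstruct(coeffs, mean, modes):
    """Rebuild a sample as mean + sum_i coeffs[i] * modes[i]."""
    return mean + coeffs @ modes

# Synthetic ensemble: 24 shapes with 10 landmarks each (20 coordinates),
# generated from 2 hidden modes so that 2 principal modes suffice.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 20))
coeffs = rng.normal(size=(24, 2))
shapes = rng.normal(size=20) + coeffs @ basis

mean, modes, variances = pca_model(shapes, num_modes=2)
b = project(shapes[0], mean, modes)
rebuilt = reconstruct(b, mean, modes)
```

The coefficient vector `b` is the kind of compact descriptor that a later recognition step could compare across nodule types.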

The empirical evaluation in this paper is based entirely on the ELCAP public database [11], which contains 50 sets of low-dose CT lung scans taken at a single breath-hold with a slice thickness of 1.25 mm. The locations of the 397 nodules are provided by the radiologists, who mark a point inside each nodule; 39.12% are juxta-pleural nodules, 13.95% are vascularized nodules, 31.29% are well-circumscribed nodules and 15.65% are pleural-tail nodules. The official reports indicate a mean nodule diameter of 8.5 mm with a standard deviation of 3.6 mm. The in-plane resolution of the ELCAP scans is 0.5 x 0.5 mm. The methodologies developed here are applicable to any standard LDCT chest scanning protocol, and the nodule modeling can be performed in 2D or 3D.

Specifically, the developments in this paper start from nodules pre-identified by experts: each nodule is automatically cropped, centered in a bounding box of size 21x21 pixels, and then manually annotated (the box size was selected based on the radial distance distribution of the ensemble reported in [8]). The ensemble of nodules contains variations in intensity distribution, shape/structural information and directional variability, all of which the cropped regions within the bounding box preserve. For testing the algorithms, ensembles of 24 nodules per type, i.e., 96 nodules in total, were used. Manual annotation of the 96 nodules was conducted using 10 landmark points (radiologists defined the positioning and the suitable number of landmarks). The cropped nodules were annotated to highlight the basic geometric and structural features of the nodules. The nodules were co-registered with known classification from the ELCAP screening study.
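The cropping step described above can be sketched as follows: given a CT slice and the expert-marked point, extract a 21x21 patch centered on that point. The function name `crop_patch` and the edge-replication padding for nodules near the image border are assumptions for illustration; the paper does not say how border cases were handled.

```python
import numpy as np

def crop_patch(slice_2d, center_row, center_col, size=21):
    """Return a size x size patch centered on (center_row, center_col).

    The slice is padded by replicating edge pixels (an assumed choice)
    so that marks close to the border still yield a full-size patch.
    """
    half = size // 2
    padded = np.pad(slice_2d, half, mode="edge")
    # After padding, original pixel (r, c) sits at (r + half, c + half),
    # so the window starting at (center_row, center_col) is centered on it.
    return padded[center_row:center_row + size, center_col:center_col + size]

# Toy slice with distinct pixel values so that positions can be checked.
image = np.arange(100 * 100, dtype=float).reshape(100, 100)
patch = crop_patch(image, 50, 60)
corner_patch = crop_patch(image, 0, 0)
```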

Fig. 1 shows a sample of co-registered nodules. We implemented two approaches for registration: the Procrustes-based AAM method (e.g., [14]) and a variational shape registration method using vector level sets (e.g., [15]). The mean nodule, with shape and texture/appearance, is generated per nodule type from the co-registered nodules.
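To make the Procrustes-based registration more concrete, the sketch below aligns 2D landmark sets by removing translation, scale and rotation, then iterates to a mean shape. It is a generic similarity Procrustes procedure under the assumption of corresponding landmarks, not the authors' code; `align_to` and `procrustes_mean` are hypothetical names, and the test shapes are synthetic.

```python
import numpy as np

def align_to(shape, reference):
    """Similarity-align shape (k, 2) to reference (k, 2).

    Both are centered and scaled to unit Frobenius norm; the rotation
    is the orthogonal Procrustes solution obtained from an SVD.
    """
    x = shape - shape.mean(axis=0)
    y = reference - reference.mean(axis=0)
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    u, _, vt = np.linalg.svd(x.T @ y)
    r = u @ vt
    if np.linalg.det(r) < 0:  # keep a proper rotation, no reflection
        u[:, -1] *= -1
        r = u @ vt
    return x @ r

def procrustes_mean(shapes, iterations=10):
    """Iteratively align all shapes to the current mean estimate."""
    reference = shapes[0]
    for _ in range(iterations):
        aligned = np.array([align_to(s, reference) for s in shapes])
        reference = aligned.mean(axis=0)
    return reference, aligned

# Synthetic check: 5 rotated, scaled and shifted copies of one
# 10-landmark outline should collapse onto a single shape.
t = np.linspace(0.0, 2.0 * np.pi, 10, endpoint=False)
base = np.stack([2.0 * np.cos(t), np.sin(t)], axis=1)
rng = np.random.default_rng(0)
shapes = []
for _ in range(5):
    angle = rng.uniform(0.0, 2.0 * np.pi)
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle), np.cos(angle)]])
    shapes.append(rng.uniform(0.5, 2.0) * base @ rot + rng.normal(size=2))

mean_shape, aligned = procrustes_mean(shapes)
```

With real annotations the aligned shapes will not coincide exactly, and their residual spread is what a subsequent PCA step would model.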

Fig. 1. Six cropped nodules from each of the four nodule types: well-circumscribed (1st row), vascular (2nd row), juxta-pleural (3rd row) and pleural-tail (4th row).

Fig. 2 shows the resultant nodule models based on the average of shape and texture from the Procrustes approach and the average shape from the vector level sets approach. Note that these templates possess the major shape and intensity characteristics of the nodules of each category. Indeed, these characteristics are behind the significant improvement in sensitivity and specificity when these templates were used for nodule detection. The shapes (Fig. 2, lower part) may be filled using the equations in [7][8] that relate the intensity to the radial distance. In this paper, we show only results based on the combined shape and texture models.

Fig. 2. Nodule models using the mean shape and texture of co-registered nodules in an ensemble of size 24 per nodule type. The first row shows the resultant nodule models based on the average of shape and texture from the Procrustes approach, and the second row shows the average shape from the vector level sets approach.

III. NODULE DETECTION AND PERFORMANCE EVALUATION

The nodule detection step may be carried out by a variety of methods, including matched filtering, correlation filtering and template matching. The focus of this paper is not on the detection mechanism per se; the focus is on creating nodule templates that may be applicable to any detection approach. The results in this paper are for template matching using the normalized cross-correlation (NCC) as the similarity measure. Template matching is performed using the four mean shapes developed in the modeling stage as the templates for detection. The behavior of the NCC for the new templates was studied by computing the NCC over all slices in the ELCAP study with known ground truth for each nodule. The sensitivity is measured in terms of detection rate, and the specificity is measured in terms of correct classification of detected nodules. Fig. 3 shows the performance of the nodule models when applied to nodule detection using the NCC as the similarity criterion. Of particular focus in this paper is the specificity: is the detected nodule of the right nodule type? The new models in Fig. 2 outperform the parametric models in terms of sensitivity (detection rate) and, most importantly, in specificity. Fig. 4 shows the histogram of the NCC for the new models. Various methods can be used to select an optimal threshold; a threshold of 0.5 was selected here.

Table 1 shows the results with templates centered with respect to the x-axis (i.e., zero orientation). The data-driven templates perform more robustly than the parametric templates: averaged over all nodule types, sensitivity rises from 72.16% to 85.22% and specificity from 80.95% to 86.28%. Most important is the specificity, especially for the well-circumscribed nodule type, which is nearly isotropic and hence would be expected to favor the circular (spherical in 3D) parametric template in terms of detection. The improvement comes from taking into account the specifics of the data in terms of nodule shape and intensity distribution. The results are expected to be further enhanced using larger ensemble sizes. Likewise, involving several radiologists in creating the ensemble may lead to further improvements.
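Template matching with the NCC as similarity measure can be sketched as below: the template is swept across the image in raster order, the NCC is computed at each position, and positions at or above a threshold (0.5 in the paper) are kept as candidates. This is a plain, unoptimized illustration; the function names and the synthetic Gaussian-blob template are assumptions, not the paper's templates or implementation.

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation between two equally sized arrays."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p**2).sum() * (t**2).sum())
    # A constant patch or template has no defined correlation; score it 0.
    return 0.0 if denom == 0 else float((p * t).sum() / denom)

def match_template(image, template, threshold=0.5):
    """Sweep the template over the image in raster order.

    Returns the NCC score map (indexed by the top-left corner of each
    window) and the (row, col) positions with score >= threshold.
    """
    th, tw = template.shape
    scores = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for r in range(scores.shape[0]):
        for c in range(scores.shape[1]):
            scores[r, c] = ncc(image[r:r + th, c:c + tw], template)
    detections = np.argwhere(scores >= threshold)
    return scores, detections

# Toy example: a 7x7 Gaussian blob placed in an otherwise empty 40x40 image.
yy, xx = np.mgrid[-3:4, -3:4]
template = np.exp(-(xx**2 + yy**2) / 4.0)
image = np.zeros((40, 40))
image[10:17, 15:22] = template

scores, detections = match_template(image, template, threshold=0.5)
best = np.unravel_index(scores.argmax(), scores.shape)
```

Running one such sweep per nodule-type template and comparing the resulting scores is one way the detected type could be assigned, which is what the specificity figure in the paper measures.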

Fig. 3. Performance of nodule models when applied to detection using the

NCC as similarity criterion. The results reflect the specificity gains using

the new models (dot is centroid).

Fig. 4. Histogram of the NCC values as the templates are swept across the image in a raster fashion. The NCC decays much faster with the templates of the new models than with the parametric templates.

Table 1. Sensitivity and specificity of the new nodule models vs. parametric templates. Detection was performed as the average of simultaneous application of four templates (Fig. 2, upper row). The parametric templates used were circular and semi-circular in the four quadrants.

Data-driven columns: data-driven nodule templates of size 21x21 pixels with the main medial axis of the templates parallel to the x-axis (orientation = 0°).
Parametric columns: parametric templates with radius = 10 and a single orientation (0°) for semi-circular models.

                               Data-driven templates        Parametric templates
Nodule type                    Sensitivity   Specificity    Sensitivity   Specificity
Average of all nodule types    85.22%        86.28%         72.16%        80.95%
Well-circumscribed             69.66%        87.10%         49.44%        81.72%
Vascularized                   80.4%         87.0%          70.73%        84.17%
Juxta-pleural                  94.78%        86.54%         83.48%        79.59%
Pleural-tail                   95.65%        83.33%         89.13%        79.33%
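For readers who want to reproduce figures of this kind on their own data, the sketch below computes sensitivity and specificity from confusion-matrix counts. It uses the standard definitions (true-positive rate and true-negative rate) as an assumption; the paper describes sensitivity as detection rate and specificity as correct classification of detected nodules, so the exact counting behind Table 1 may differ. The counts in the example are hypothetical and not taken from the paper.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of actual positives that were detected: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Fraction of actual negatives correctly rejected: TN / (TN + FP)."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical counts, for illustration only.
tp, fn, tn, fp = 80, 20, 90, 10
print(f"sensitivity = {sensitivity(tp, fn):.2%}")
print(f"specificity = {specificity(tn, fp):.2%}")
```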

Active Shape Models (ASM) can be used for segmentation of the detected nodules; i.e., outlining the spatial support of the detected nodules. We would also like to point out that a fifth category of nodules, “none of the above”, may be added to the four studied in this paper. The database that our group is currently generating allows for additional nodule types, which will further improve the specificity of the nodule detection process. We also have both 2D data (for small nodules) and 3D data (for larger and advanced-stage tumors).

IV. CONCLUSION

In this paper, a data-driven approach was devised to model and simulate typical lung nodules. Based on extensive experimentation, we found that the new data-driven models yield overall higher sensitivity and specificity than parametric templates. The well-circumscribed nodule type had the lowest sensitivity, yet it showed the greatest improvement with the new nodule models. The pleural-tail type yielded the greatest sensitivity with both the parametric and the data-driven templates. Current efforts are directed towards constructing and testing the new data-driven modeling approach on a large clinical database and extending this work into 3D space. Extensions include using the new models for recognition with shape context matching (e.g., [16]), and studying shape and texture for simultaneous detection and segmentation of cancerous nodules.

V. REFERENCES

1. United States National Institutes of Health, www.nih.gov

2. Zhao, B., Gamsu, G., Ginsberg, M., Jiang, L., and Schwartz, L., “Automatic detection of small lung nodules on CT utilizing a local density maximum algorithm,” J. of Applied Clinical Medical Physics, Vol. 4, 2003.

3. Armato, S.G. 3rd, Giger, M.L., Moran, C.J., Blackburn, J.T., Doi, K., and MacMahon, H., “Computerized detection of pulmonary nodules on CT scans,” RadioGraphics, Vol. 19, pp. 1303-1311, 1999.

4. Hu, S., Hoffman, E.A., and Reinhardt, J.M., “Automatic lung segmentation for accurate quantitation of volumetric X-ray CT images,” IEEE Transactions on Medical Imaging, Vol. 20, pp. 490-498, 2001.

5. Lee, Y., Hara, T., Fujita, H., Itoh, S., and Ishigaki, T., “Automated Detection of Pulmonary Nodules in Helical CT Images Based on an Improved Template-Matching Technique,” IEEE Transactions on Medical Imaging, Vol. 20, 2001.

6. Kostis, W.J., Reeves, A.P., Yankelevitz, D.F., and Henschke, C.I., “Three dimensional segmentation and growth-rate estimation of small pulmonary nodules in helical CT images,” IEEE Transactions on Medical Imaging, Vol. 22, pp. 1259-1274, 2003.

7. Farag, A.A., El-Baz, A., Gimel'farb, G.L., Falk, R., Abou El-Ghar, M., Eldiasty, T., and Elshazly, S., “Appearance Models for Robust Segmentation of Pulmonary Nodules in 3D LDCT Chest Images,” Proc. of Int. Conf. on Medical Image Computing and Computer-Assisted Intervention (MICCAI'06), Copenhagen, Denmark, October 1-6, 2006, pp. 662-670.

8. Farag, A.A., Elhabian, S.Y., Elshazly, S.A., and Farag, A.A., “Quantification of Nodule Detection in Chest CT: A Clinical Investigation Based on the ELCAP Study,” Proc. of Second International Workshop on Pulmonary Image Processing in conjunction with MICCAI-09, September 2009, pp. 149-160.

9. van Ginneken, B., Romeny, B., and Viergever, M., “Computer-Aided Diagnosis in Chest Radiography: A Survey,” IEEE Transactions on Medical Imaging, Vol. 20, 2001.

10. Sluimer, I., Schilham, A., Prokop, M., and van Ginneken, B., “Computer Analysis of Computed Tomography Scans of the Lung: A Survey,” IEEE Transactions on Medical Imaging, Vol. 25, No. 4, pp. 385-405, April 2006.

11. ELCAP public lung image database.

12. Cootes, T.F. and Taylor, C.J., “Active Shape Models and the Shape Approximation Problem,” Computer Vision, 2000.

13. Matthews, I. and Baker, S., “Active Appearance Models Revisited,” International Journal of Computer Vision, pp. 135-164, 2004.

14. Stegmann, M.B. and Gomez, D.D., A Brief Introduction to Statistical Shape Analysis, Technical University of Denmark, Lyngby, 2002.

15. Huang, X., Paragios, N., and Metaxas, D.N., “Shape registration in implicit spaces using information theory and free form deformations,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(8):1303-1318, 2006.

16. Belongie, S., Malik, J., and Puzicha, J., “Shape matching and object recognition using shape contexts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4):509-522, 2002.