

Pattern Recognition 36 (2003) 1251–1254
www.elsevier.com/locate/patcog

Rapid and Brief Communication

Robust adaptive spot segmentation of DNA microarray images

Alan Wee-Chung Liew a,∗, Hong Yan a, Mengsu Yang b

a Department of Computer Engineering and Information Technology, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon Tong, Hong Kong

b Department of Biology and Chemistry, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon Tong, Hong Kong

Abstract

The rapid advancement of DNA chip (microarray) technology has revolutionized genetic research in bioscience. However, the enormous amount of data produced from a microarray image makes automatic computer analysis indispensable. An important first step in analyzing a microarray image is the accurate determination of the DNA spots in the image. We report here a novel spot segmentation method for DNA microarray images. The algorithm makes use of adaptive thresholding and statistical intensity modeling to (i) generate the grid structure automatically, where each subregion in the grid contains at most one spot, and (ii) segment the spot, if any, within each subregion. The algorithm is fully automatic and robust, and can aid in the high throughput computer analysis of microarray data.

© 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: DNA microarray; DNA chip; Adaptive segmentation

1. Introduction

Microarray technology, or DNA chip [1], which allows massively parallel, high throughput profiling of gene expression in a single hybridization experiment, has recently emerged as a powerful tool for genetic research. It allows the simultaneous study of tens of thousands of different DNA nucleotide sequences on a single microscopic glass slide. In addition to the enormous scientific potential of DNA microarrays in the study of gene expression, gene regulation and interactions, it also has very important applications in pharmaceutical and clinical research. By comparing gene expression in normal and disease cells, microarrays can be used to identify disease genes for therapeutic drugs.

Due to the huge amount of data, computer analysis of microarray images has become indispensable [2]. The ultimate goal in image analysis is to automatically quantify each spot, giving information about the relative extent of hybridization of the two cDNA samples. However, automatic analysis of microarray images has proved to be difficult due to the poor contrast and the many artifacts arising from the hybridization procedure, such as irregular spot shape and size, dust on the slide, large intensity variation within spots and background, and nonspecific hybridization. In order to extract data from microarray images, it is necessary to correctly identify and segment out each spot.

∗ Corresponding author. Tel.: +852-2788-7522; fax: +852-2788-8292.

E-mail addresses: [email protected] (A. Wee-Chung Liew), [email protected] (H. Yan), [email protected] (M. Yang).

In this brief communication, we describe a novel approach for the automatic segmentation of spots on a microarray image. The spot segmentation task is performed in two stages: (1) automatic gridding, where each subregion in the generated grid contains at most one spot, and (2) segmentation of the spot, if present, within each subregion.

2. Automatic gridding

The input microarray images consist of a pair of 16-bit images in TIFF format, Cy3 and Cy5. For image analysis purposes, a composite 8-bit gray-scale image is computed.

0031-3203/03/$30.00 © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(02)00170-X


Fig. 1. RGB color microarray image.

Fig. 2. The composite image.

The radius of a typical spot in the image (which is known a priori) is entered. A region of interest enclosing a block of spots is then selected for automatic gridding.

In order to generate the correct grid for the microarray image, the location of good-quality spots (which act as guide spots) must be known. A good-quality spot should be circular in shape, of appropriate size, and with intensity consistently higher than the background. Moreover, its position should agree with the overall spot geometry as dictated by the printing process. After the guide spots are found, the correct grid can be generated based on their geometry.

Fig. 3. Guide spots detected.

Fig. 4. Final grid generated automatically.

Many microarray image processing problems are similar to those in document image processing. To account for the variable background and spot intensity, adaptive image thresholding developed for character extraction from low-quality document images [3] and morphological processing [4] are used to detect the guide spots. Then, a verification stage is performed to remove any erroneous spots. For illustration, Fig. 1 shows an RGB microarray image, where the green component is given by Cy3, the red component is given by Cy5, and the blue component is set to zero. Fig. 2 shows the corresponding composite image. The guide spots detected are shown in Fig. 3, and Fig. 4 shows the final grid generated automatically. Note that many of the faint spots are correctly found in this example.
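The channel assignment described for Fig. 1 (red from Cy5, green from Cy3, blue set to zero) can be sketched in a few lines of Python with NumPy. The function names, the linear 16-bit to 8-bit rescaling, and the channel-mean composite are assumptions made for illustration only; the text does not state how the composite 8-bit image of Fig. 2 is actually computed.

```python
import numpy as np

def to_8bit(channel16):
    """Linearly rescale a 16-bit channel to 0..255 (assumed scaling)."""
    scaled = channel16.astype(np.float64) / 65535.0 * 255.0
    return scaled.round().astype(np.uint8)

def rgb_preview(cy3, cy5):
    """RGB image with red = Cy5, green = Cy3, blue = 0, as described for Fig. 1."""
    rgb = np.zeros(cy3.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = to_8bit(cy5)  # red component from Cy5
    rgb[..., 1] = to_8bit(cy3)  # green component from Cy3
    return rgb                  # blue component stays zero

def composite_gray(cy3, cy5):
    """Hypothetical composite: mean of the two channels, rescaled to 8 bits."""
    mean16 = (cy3.astype(np.float64) + cy5.astype(np.float64)) / 2.0
    return (mean16 / 65535.0 * 255.0).round().astype(np.uint8)
```

Any other way of combining the two channels could be substituted in `composite_gray` without affecting the RGB rendering.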

3. Spot segmentation

Spot segmentation is performed in each of the subregions defined by the grid. The segmentation involves finding a circle that separates out the spot, if any, from the background.


Fig. 5. DNA spot segmentation results. Left column: our method. Right column: GenePix, where a circle with a vertical bar indicates “spot not found”.

The spot segmentation task consists of three steps: (1) background equalization for intensity variation in the subregion, (2) statistical intensity modeling and optimum thresholding of the subregion, and finally, (3) finding the best-fit circle that segments out the spot.

The purpose of background equalization is to remove any smooth background intensity variation within the subregion. The average intensity over a small region at each of the four corners of the subregion is obtained. Then, a bilinear surface is constructed from these four averages and subtracted from the original subregion.
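A minimal sketch of this background equalization step is given below, assuming the subregion is a 2-D NumPy array. The function name and the `corner` patch size are hypothetical, and the sketch treats the four corner averages as the values of the bilinear surface at the exact corners of the subregion, which is only an approximation when the patches are larger than one pixel.

```python
import numpy as np

def equalize_background(sub, corner=3):
    """Subtract a bilinear surface interpolated from four corner averages."""
    sub = sub.astype(np.float64)
    h, w = sub.shape
    c = corner
    # Average intensity over a small c-by-c patch at each corner.
    tl = sub[:c, :c].mean()
    tr = sub[:c, -c:].mean()
    bl = sub[-c:, :c].mean()
    br = sub[-c:, -c:].mean()
    # Normalised coordinates: v runs 0..1 down the rows, u runs 0..1 across the columns.
    v = np.linspace(0.0, 1.0, h)[:, None]
    u = np.linspace(0.0, 1.0, w)[None, :]
    # Bilinear interpolation between the four corner values.
    surface = (1 - v) * ((1 - u) * tl + u * tr) + v * ((1 - u) * bl + u * br)
    return sub - surface
```

With `corner=1`, a background that is exactly planar is removed completely; larger patches trade that exactness for robustness to noise in the corner pixels.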

Given a subregion, we need to assess whether a spot is present before performing spot segmentation. If a guide spot is present, a spot is present. Otherwise, a spot is assumed present if the ratio of the median intensity between the tentative spot pixels and the background pixels is larger than a preset value. When a spot is present, the intensity distribution of the pixels within the subregion is modeled using a Gaussian-mixture model. The optimum threshold can then be computed. Once the subregion is thresholded and segmented, a best-fit circle is computed for the final spot segmentation. Although the actual spot shape usually deviates from being strictly circular, we constrained the spot shape to be circular to ensure that the spot extraction procedure is robust to poor-quality segmentation, while providing a reasonable fit to good-quality spots.
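The presence test, mixture modeling, thresholding, and circle fitting described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the text does not say how the tentative spot pixels are chosen, what the preset ratio is, how many mixture components are used, or how the best-fit circle is defined. Here the tentative mask and `min_ratio` are supplied by the caller, the mixture has two components fitted by EM, the threshold is taken where the spot component first becomes more likely than the background component, and the circle uses the centroid and equal-area radius of the thresholded pixels. All function names are hypothetical.

```python
import numpy as np

def spot_present(sub, tentative_mask, min_ratio=1.5):
    """Median-ratio presence test; the mask and min_ratio are assumed inputs."""
    spot_med = np.median(sub[tentative_mask])
    bg_med = np.median(sub[~tentative_mask])
    return bool(spot_med > min_ratio * max(bg_med, 1e-9))

def fit_two_gaussians(x, iters=100):
    """EM for a two-component 1-D Gaussian mixture (background, spot)."""
    x = np.asarray(x, dtype=np.float64).ravel()
    xs = np.sort(x)
    half = xs.size // 2
    # Initialise the two components from the darker and brighter halves.
    mu = np.array([xs[:half].mean(), xs[half:].mean()])
    sigma = np.array([xs[:half].std(), xs[half:].std()]) + 1e-3
    weight = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each pixel
        # (the shared 1/sqrt(2*pi) factor cancels and is omitted).
        z = (x[:, None] - mu) / sigma
        dens = weight * np.exp(-0.5 * z ** 2) / sigma
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means and standard deviations.
        nk = resp.sum(axis=0) + 1e-12
        weight = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-3
    order = np.argsort(mu)
    return weight[order], mu[order], sigma[order]

def gmm_threshold(sub):
    """First intensity between the two means where the spot component wins."""
    weight, mu, sigma = fit_two_gaussians(sub)
    t = np.linspace(mu[0], mu[1], 1001)

    def log_dens(k):
        return np.log(weight[k]) - np.log(sigma[k]) - 0.5 * ((t - mu[k]) / sigma[k]) ** 2

    spot_wins = log_dens(1) > log_dens(0)
    return float(t[np.argmax(spot_wins)])

def fit_circle(mask):
    """Centroid and equal-area radius of the thresholded pixels (assumed fit)."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    radius = np.sqrt(rows.size / np.pi)
    return float(rows.mean()), float(cols.mean()), float(radius)
```

A typical call sequence would be `t = gmm_threshold(sub)` followed by `fit_circle(sub > t)`, applied only to subregions for which a guide spot exists or `spot_present` returns true.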

Fig. 5 presents some spot segmentation results using our approach (left column) and the results from a well-known commercial system called GenePix [5] (right column). In GenePix, the number of spots in the row and column directions, the spacing between the spots in the row and column directions, the spot radius, and the approximate location of the spot array in the image need to be given a priori before invoking the automatic spot detection process. In contrast, our approach needs to be supplied only with the approximate radius of a typical spot, which is known a priori. Visual inspection clearly reveals that more spots are found and the spot size is more accurately determined using our approach.

4. Conclusions

A novel DNA microarray spot segmentation algorithm is outlined. The algorithm makes use of novel adaptive thresholding, morphological processing, as well as statistical intensity modeling to accurately locate and segment out the DNA spots. Experiments have indicated that it is robust and accurate for both medium and high-density DNA microarray spot segmentation and should prove to be a valuable computational tool in high throughput genomic analysis.

Acknowledgements

This work was supported by a CityU SRG Grant (7001183) and an interdisciplinary research Grant (9010003).

References

[1] S.K. Moore, Making Chips, IEEE Spectrum (2001) 54–60.

[2] Y.H. Yang, M.J. Buckley, S. Dudoit, T.P. Speed, Comparison of methods for image analysis on cDNA microarray data, Technical Report #584, Department of Statistics, University of California at Berkeley, November 2002.

[3] Y. Yang, H. Yan, An adaptive logical method for binarization of degraded document images, Pattern Recognition 33 (5) (2000) 787–807.

[4] E.R. Dougherty, An Introduction to Morphological Image Processing, SPIE, The International Society for Optical Engineering, SPIE Optical Engineering Press, Bellingham, WA, 1992.

[5] Axon Instruments Inc., GenePix Pro 3.0, Technical manual, Axon Instruments, Union City, CA, 2001.

About the Author: ALAN WEE-CHUNG LIEW received his B.E. with first class honors in Electrical and Electronic Engineering from the University of Auckland, New Zealand, in 1993 and Ph.D. in Electronic Engineering from the University of Tasmania, Australia, in 1997. He is currently a Senior Research Fellow in the Department of Computer Engineering and Information Technology, City University of Hong Kong. His current research interests include image processing, pattern recognition and bioinformatics.

About the Author: HONG YAN received a B.E. degree from Nanking Institute of Posts and Telecommunications in 1982, an M.S.E. degree from the University of Michigan in 1984, and a Ph.D. degree from Yale University in 1989, all in electrical engineering. In 1982 and 1983, he worked on signal detection and estimation as a graduate student and research assistant at Tsinghua University. From 1986 to 1989 he was a research scientist at General Network Corporation, New Haven, CT, USA, where he worked on design and optimization of computer and telecommunications networks. He joined the University of Sydney in 1989 and became Professor of Imaging Science in 1997. He is currently Professor of Computer Engineering at City University of Hong Kong. His research interests include image processing, pattern recognition and bioinformatics. He is author or co-author of one book and over 200 refereed technical papers in these areas. Professor Yan is a fellow of the International Association for Pattern Recognition (IAPR), a fellow of the Institution of Engineers, Australia (IEAust) and a senior member of the Institute of Electrical and Electronics Engineers (IEEE).

About the Author: MENGSU YANG received his Ph.D. from the University of Toronto in 1992 and his Postdoctoral training in the Scripps Research Institute. Since 1994, he has been working in the Department of Biology and Chemistry, City University of Hong Kong. His research interest is in the area of bioanalytical chemistry and biophysical chemistry.