Gene Discovery from Microarray Images
陳朝欽、高成炎、張春梵
ARCNTU, NTU-Hospital
[email protected], [email protected]
Project#: 93-EC-17-A-19-S1-0016
Motivation and Data Acquisition
• Part of our current work aims to discover a subset of genes related to specific diseases, such as hepatoma and gastric cancer, through microarray experiments. To that end, we collect spot signal intensities from cDNA microarray images produced by a sequence of biological experiments.
A Paradigm for Microarray Image Data Analysis
Outline
• Microarray Image Data Acquisition
• Gridding for Image Segmentation
• Normalization from MA-Plot
• Finding Differentially Expressed Genes
• Finding Discriminative Genes
• Performance Evaluation by Dendrogram and K-means Algorithms
A Look at a Microarray Slide
Examples of Microarray Images
Gridding for Spot Segmentation
Gridding for a Block of 30*9 Spots
Spot Feature Computation
• Cy3 (for Column 1) 639 54879 5980 1984 324 910 2153 236
• Cy5 (for Column 6) 104 52858 567 189 36 1489 5083 407
M-A plot and Piecewise Normalization
Normalized Ratio from MA-Plot
Pre-Processing / Normalization
• Because of the measurement process and other unavoidable factors, raw data collected directly from experiments may contain noise, may be on different scales, or may have missing items. A pre-processing step that filters out inappropriate data, or a normalization step, may therefore be needed.
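As a minimal sketch of such a filtering step, the Python snippet below drops spots whose Cy3 or Cy5 intensity is missing or non-positive, since the log2 transforms used for the M-A plot need positive values. The function name filter_spots and the rule itself are illustrative assumptions, not the authors' actual pre-processing.

```python
import math

def is_valid(intensity):
    """True if the intensity is present, not NaN, and strictly positive."""
    return intensity is not None and not math.isnan(intensity) and intensity > 0

def filter_spots(spots):
    """Keep only (cy3, cy5) pairs in which both intensities are usable.

    Hypothetical rule: a spot is discarded when either channel is missing
    or non-positive, because log2 is undefined for such values.
    """
    return [(cy3, cy5) for cy3, cy5 in spots if is_valid(cy3) and is_valid(cy5)]

# Toy input: the second spot has a missing Cy5 value, the third a zero Cy3 value.
raw = [(201, 67), (520, None), (0, 153), (572, 524)]
print(filter_spots(raw))
```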
Spot Features for Gene Discovery
Cy3 Cy5
201 67
520 153
28276 21747
4072 6324
14807 690
1058 1451
572 524
M = log2(Cy3) − log2(Cy5)
A = (log2(Cy3) + log2(Cy5)) / 2
The program compustt.c computes spot features, pieceline.c performs normalization, and maplot.c draws the M-A plot.
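The M and A definitions can be checked on the Cy3/Cy5 pairs listed on this slide. The Python sketch below is only an illustration of the two formulas; it is not the authors' compustt.c or maplot.c, and the function name ma_values is an assumption.

```python
import math

def ma_values(cy3, cy5):
    """Return (M, A) for one spot, following the slide's definitions."""
    m = math.log2(cy3) - math.log2(cy5)        # log ratio of the two channels
    a = (math.log2(cy3) + math.log2(cy5)) / 2  # average log intensity
    return m, a

# Cy3/Cy5 pairs taken from the table on this slide.
spots = [(201, 67), (520, 153), (28276, 21747), (4072, 6324),
         (14807, 690), (1058, 1451), (572, 524)]

for cy3, cy5 in spots:
    m, a = ma_values(cy3, cy5)
    print(f"{cy3:>6} {cy5:>6}  M={m:+.3f}  A={a:.3f}")
```

For the first pair, 201/67 is exactly 3, so M equals log2(3), roughly 1.585.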
Microarray Pattern Analysis
• Each microarray consists of 13574 effected genes out of the 18564 on a chip, with the tumor sample dyed in Cy3 and the normal sample dyed in Cy5
• Patients: 12 HCV, 27 HBV, 1 HCV+HBV, and 4 with neither HCV nor HBV
• A gene is called differentially expressed when log2(Lowess-normalized ratio of Cy3/Cy5) is greater than T (up-regulated, ↑) or less than −T (down-regulated, ↓)
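The thresholding criterion can be sketched in a few lines of Python. The slides do not state the value of T, so the value 1.0 used below (a two-fold change) is purely a placeholder, and the function name regulation is hypothetical. The input is assumed to be a ratio that has already been Lowess-normalized.

```python
import math

def regulation(normalized_ratio, threshold):
    """Classify one gene from its normalized Cy3/Cy5 ratio.

    Returns "up" if log2(ratio) > T, "down" if log2(ratio) < -T,
    and "unchanged" otherwise.
    """
    log_ratio = math.log2(normalized_ratio)
    if log_ratio > threshold:
        return "up"
    if log_ratio < -threshold:
        return "down"
    return "unchanged"

T = 1.0  # placeholder threshold; the actual T is not given in the slides
for ratio in (4.0, 0.25, 1.5):
    print(ratio, regulation(ratio, T))
```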
Feature Selection/Extraction (1)
• Given a set of N patterns from K categories (K=2 is a dichotomy problem), with Ni patterns belonging to category i, 1 ≤ i ≤ K, each pattern consists of M possibly redundant features. For example, a microarray can be represented as a pattern of 13574 features corresponding to 13574 effected genes. The goal of selection is to pick a small subset of features for “Recognition”.
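One way to make feature selection concrete for K=2 is to score each feature separately and keep the best few. The sketch below uses a common form of Fisher's ratio, (mean1 − mean2)² / (var1 + var2). Later slides mention Fisher's ratio, but the exact formula the authors used is not given, so this form, the function names, and the toy data are all assumptions.

```python
from statistics import mean, pvariance

def fisher_ratio(values_a, values_b):
    """Score one feature: squared mean difference over summed variances."""
    num = (mean(values_a) - mean(values_b)) ** 2
    den = pvariance(values_a) + pvariance(values_b)
    if den == 0:
        # Degenerate case: no within-class spread at all.
        return float("inf") if num > 0 else 0.0
    return num / den

def top_features(patterns_a, patterns_b, m):
    """Return the indices of the m features with the largest Fisher's ratio."""
    n_features = len(patterns_a[0])
    scores = []
    for j in range(n_features):
        col_a = [p[j] for p in patterns_a]
        col_b = [p[j] for p in patterns_b]
        scores.append((fisher_ratio(col_a, col_b), j))
    # Sort by descending score; break ties by feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:m]]

# Toy data: 3 patterns per category, 3 features each; only feature 0 separates them.
class_a = [[1.0, 5.0, 0.2], [1.2, 4.0, 0.1], [0.8, 6.0, 0.3]]
class_b = [[3.0, 5.5, 0.2], [3.2, 4.5, 0.3], [2.8, 5.0, 0.1]]
print(top_features(class_a, class_b, 1))
```

Because each feature is scored independently, the selected features keep their original meaning (here, a gene index), which is what distinguishes selection from extraction.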
Feature Selection/Extraction (2)
• Given a set of N patterns from K categories (K=2 is a dichotomy problem), with Ni patterns belonging to category i, 1 ≤ i ≤ K, the goal of extraction is to transform an M-dimensional pattern into an m-dimensional pattern with m<<M for classification. A selected feature keeps its original meaning, whereas an extracted feature usually does not.
16 Most Discriminative Genes to Distinguish HCV from HBV [YCT39]
Index   Accession#
13796   U35376
7197    BG259957
2918    BI520001
8495    AJ012159
11189   AB008549
11087   BC006496
9443    CAC51145
9546    X52125
16144   AK024601
16496   Y00083
17213   BC007437
14579   BC011568
587     AF386492
113     Y16961
17215   AF195766
16760   AI022747
Next 16 Most Discriminative Genes to distinguish HCV from HBV
Index   Accession#
5947    BG207354
4885    AK021818
11291   AF155110
1262    BI861005
8055    AJ224741
10965   AAF36120
4164    NM_000423
8088    BC000187
7353    AF070641
5434    AB050785
12727   AB062987
14993   AA974308
4182    AI970531
5341    X65882
10052   AB011542
8140    AK026068
32 Discriminative Genes by Fisher’s Ratios for a Dendrogram
32 Discriminative Genes by Chuang+Kao’s for a Dendrogram
Dendrogram from Chen’s 32 Most Discriminative Genes [CC39]
Dendrogram from Genasia’s 32 Most Discriminative Genes
K-means Clustering Results Using the 32 Best Discriminative Genes
• G45 from Genasia: distortion 341.26, cluster labels 1222221222 2211111111 111111111111111111
• X47 from C. Chen: distortion 302.33, cluster labels 1222221222 2211111111 112111111111111111
• Y48 by Fisher’s Ratio on YCT39: distortion 307.49, cluster labels 1222221222 2211111111 112111111111111111
• PY50 by Chuang+Kao’s on YCT39: distortion 290.06, cluster labels 2222222222 2211211111 112111111111111111
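The slides report a distortion value for each clustering but do not define it. A common definition, assumed here, is the sum of squared Euclidean distances from each pattern to the centroid of its assigned cluster. The Python sketch below computes that quantity for a given labeling; the function name and the toy data are hypothetical.

```python
def distortion(points, labels):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        for p in members:
            total += sum((p[d] - centroid[d]) ** 2 for d in range(dim))
    return total

# Toy example with two clusters in two dimensions.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (10.0, 4.0)]
labels = [1, 1, 2, 2]
print(distortion(points, labels))
```

Under this definition, a lower distortion means the patterns sit closer to their cluster centroids, which is how the four gene sets above would be compared.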
Leave-one-out errors by 1-nn : 4, 3, 2, 1 (/39)
Leave-one-out errors by Fisher : 15, 7, 8, 9 (/39)
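A leave-one-out error count for the 1-nn rule can be sketched as follows: each pattern is held out in turn, classified by the label of its nearest remaining pattern, and counted as an error if that label differs from its own. Squared Euclidean distance is an assumption here, since the slides do not state the metric, and the toy data are invented for illustration.

```python
def leave_one_out_1nn_errors(patterns, labels):
    """Count patterns misclassified by their nearest neighbor among the rest."""
    errors = 0
    for i, query in enumerate(patterns):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(patterns):
            if j == i:
                continue  # the held-out pattern may not vote for itself
            d = sum((a - b) ** 2 for a, b in zip(query, other))
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] != labels[i]:
            errors += 1
    return errors

# Toy one-dimensional data; the last pattern lies closer to the other class.
toy_patterns = [(0.0,), (0.1,), (1.0,), (1.1,), (0.25,)]
toy_labels = ["HCV", "HCV", "HBV", "HBV", "HBV"]
print(leave_one_out_1nn_errors(toy_patterns, toy_labels))
```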
Up (Down) Regulated Genes for Gastric Cancers
• 5 advanced-stage and 5 early-stage patients with gastric cancer
• We found the following genes, which completely discriminate patients clinically diagnosed as “Advanced Stage” from those diagnosed as “Early Stage”
Dendrogram for Gastric Patients
Top 16 Discriminative Genes for Advanced and Early Stages
Index   Accession#
15843   AF316855
12994   BF868865
18370   BC002996
2070    AK021788
1118    BC000249
9661    AP000350
2017    U53530
1128    AF035281
8728    AL591713
494     AB014526
10990   L77570
342     BC007848
10425   BG745129
6052    AF073362
170     AK000278
1016    BF526386
Thank You
• http://www.bioinfo.ntu.edu.tw
• http://www.cs.nthu.edu.tw/~cchen
• Tel: (02) 2312 3456 ~ 5917
• Tel: (02) 2362 5336 ~ 418
• Tel: (03) 573 1078