Gene expression correlates of clinical prostate cancer behavior Singh D, Febbo P, Ross K, Jackson D, Manola J, Ladd C, Tamayo P, Renshaw A, D’Amico A, Richie J, Lander E, Loda M, Kantoff P, Golub T, Sellers W. Cancer Cell 2002 1: 203-209. Topics in Bioinformatics Robert Kazmierski Oct 12, 2004


Gene expression correlates of clinical prostate cancer behavior

Singh D, Febbo P, Ross K, Jackson D, Manola J, Ladd C, Tamayo P, Renshaw A,

D’Amico A, Richie J, Lander E, Loda M, Kantoff P, Golub T, Sellers W.

Cancer Cell 2002 1: 203-209.

Topics in Bioinformatics

Robert Kazmierski

Oct 12, 2004


Primers

• Gleason score
• Signal to noise metric
• k-NN
• Leave one out CV (continued)
• Permutation Testing (Part 1)


Gleason Score

• 1 Simple round glands, closely packed in rounded masses with well-defined edges.
• 2 Simple round glands, loosely packed in vague, rounded masses with loosely packed edges.
• 3A Medium-sized single glands of irregular shape and irregular spacing with ill-defined infiltrating edges.
• 3B Very similar to 3A, but small to very small glands which must not form significant chains or cords.
• 3C Papillary and cribriform epithelium in smooth, rounded cylinders and masses; no necrosis.
• 4A Small, medium, or large glands fused into cords, chains or ragged, infiltrating masses.
• 4B Very similar to 4A, but with many large clear cells, sometimes resembling "hypernephroma."
• 5A No glandular differentiation, solid sheets, cords, single cells, or solid nests of tumor with central necrosis.
• 5B Anaplastic adenocarcinoma in ragged sheets.

From http://www.prostateinfo.com


Gleason Score

• Pathologist examines the two areas that make up the largest portion of the tissue
• Each area is scored, and the sum of the two scores is the Gleason score
• Answers how cancerous the prostate tissue is
  – High = very cancerous
  – Low = less cancerous


Signal to Noise Metric

• From Golub et al. 1999

$$S2N = \frac{\mu_{\mathrm{Class0}} - \mu_{\mathrm{Class1}}}{\sigma_{\mathrm{Class0}} + \sigma_{\mathrm{Class1}}}$$

  – Where µ is the mean and σ is the standard deviation
• In our case, Class 0 would be Cancer and Class 1 would be Normal
• Method for determining statistically significant differences in expression between the two classes
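The signal to noise metric on this slide can be sketched in a few lines of Python. This is only an illustration: the gene names and expression values are made up, the helper name `s2n` is hypothetical, and the sample standard deviation (`statistics.stdev`) is assumed because the slide does not say which estimator is used.

```python
from statistics import mean, stdev


def s2n(class0, class1):
    """Signal to noise score: (mean0 - mean1) / (sd0 + sd1)."""
    return (mean(class0) - mean(class1)) / (stdev(class0) + stdev(class1))


# Hypothetical expression values: gene -> (Class 0 = cancer, Class 1 = normal)
genes = {
    "geneA": ([8.0, 9.0, 10.0], [2.0, 3.0, 4.0]),
    "geneB": ([5.0, 6.0, 7.0], [5.0, 6.5, 7.5]),
    "geneC": ([1.0, 2.0, 3.0], [6.0, 8.0, 10.0]),
}

scores = {name: s2n(c0, c1) for name, (c0, c1) in genes.items()}

# Rank genes by the magnitude of the score; the sign tells which class is higher.
ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
for name in ranked:
    print(f"{name}: S2N = {scores[name]:+.3f}")
```

A positive score means higher expression in Class 0 (cancer), a negative score means higher expression in Class 1 (normal).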


Signal to Noise Metric vs. T-Test

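• S2N different from T-test

$$t = \frac{\left|\mu_{\mathrm{Class0}} - \mu_{\mathrm{Class1}}\right|}{\sqrt{\dfrac{(n_{\mathrm{Class0}} - 1)\,\sigma_{\mathrm{Class0}}^{2} + (n_{\mathrm{Class1}} - 1)\,\sigma_{\mathrm{Class1}}^{2}}{n_{\mathrm{Class0}} + n_{\mathrm{Class1}} - 2} \cdot \dfrac{n_{\mathrm{Class0}} + n_{\mathrm{Class1}}}{n_{\mathrm{Class0}}\, n_{\mathrm{Class1}}}}}$$

  – Here n is number of samples
• S2N does not take n into account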
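To make the difference concrete, the sketch below computes both statistics on two hypothetical data sets that share the same class means and sample standard deviations but differ in the number of samples. The function names and the numbers are illustrative assumptions, not values from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def s2n(class0, class1):
    """Signal to noise score, independent of the sample counts."""
    return (mean(class0) - mean(class1)) / (stdev(class0) + stdev(class1))


def pooled_t(class0, class1):
    """Two-sample t statistic with pooled variance, as in the formula above."""
    n0, n1 = len(class0), len(class1)
    pooled_var = ((n0 - 1) * stdev(class0) ** 2
                  + (n1 - 1) * stdev(class1) ** 2) / (n0 + n1 - 2)
    return abs(mean(class0) - mean(class1)) / sqrt(pooled_var * (n0 + n1) / (n0 * n1))


# Same means (9 and 3) and same sample standard deviation (1) in both data sets;
# only the number of samples per class changes (3 versus 5).
small0, small1 = [8.0, 9.0, 10.0], [2.0, 3.0, 4.0]
big0, big1 = [8.0, 8.0, 9.0, 10.0, 10.0], [2.0, 2.0, 3.0, 4.0, 4.0]

print("S2N:", s2n(small0, small1), s2n(big0, big1))
print("t:  ", pooled_t(small0, small1), pooled_t(big0, big1))
```

The S2N score is identical for both data sets, while t grows with n, which is the point of the last bullet.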


k-Nearest Neighbor

• Given:
  – d(x,x’) : metric to measure relative distances
    • Euclidean distance used in this case
  – L : training set
• fkNN(x) = majority class among the k NN’s of x in L
  – Where x is a sample to be classified
    • (cancer vs. normal)
  – K specifies how many neighbors to consider
    • Usually odd number


3-Nearest Neighbor

• The 3 nearest neighbors of the green point: 2 Blue, 1 Red
• Green grouped as Blue


5-Nearest Neighbor

• The 5 nearest neighbors of the green point: 2 Blue, 3 Red
• Green grouped as Red


k-Nearest Neighbor

• Result influenced by
  – Choice of k
  – Training set
  – Measure of distance

• Most basic classification tool
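A minimal k-NN sketch illustrates how the choice of k changes the result, as in the 3-NN and 5-NN slides. The coordinates are invented for illustration and the function name `knn_classify` is hypothetical; only the majority-vote rule and the Euclidean distance come from the slides.

```python
from collections import Counter
from math import dist


def knn_classify(x, training_set, k):
    """Return the majority label among the k training points nearest to x."""
    neighbors = sorted(training_set, key=lambda item: dist(x, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training set L: (coordinates, class label)
training = [
    ((1.0, 1.0), "blue"),
    ((1.2, 0.8), "blue"),
    ((2.0, 2.0), "red"),
    ((2.5, 2.2), "red"),
    ((2.4, 2.6), "red"),
    ((6.0, 6.0), "blue"),
]
green = (1.4, 1.4)  # the sample to be classified

print("3-NN:", knn_classify(green, training, 3))  # 2 blue, 1 red among neighbors
print("5-NN:", knn_classify(green, training, 5))  # 2 blue, 3 red among neighbors
```

With an even k the vote can tie, which is one reason an odd k is usually chosen for two classes.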


Leave one out CV (continued)

• Method for determining error in classifier
• Everything must be done inside of CV loops
  – Outside adjustments lead to incorrect error estimates (usually too low)


Leave one out CV

• 102 samples testing effectiveness of k-NN classifier (k is set to 3) (10 most significant genes used in k-NN)
  – 1 sample set aside
  – S2N of 101 samples determines 10 most significantly different genes in two classes
  – 3-NN by Euclidean distance performed on single sample
  – Sample classified as Class0 or Class1
  – Results recorded as correct classification or not
  – Process repeated 101 more times (each sample set aside once)
• Results give accuracy of using 10 most significant genes with S2N metric and 3-NN classification method
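The procedure on this slide can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the data are synthetic (20 samples and 30 genes instead of 102 samples and 12,600 genes), and all function names are hypothetical. The key point it shows is that gene selection by S2N happens inside the loop, using only the samples that were not set aside.

```python
import random
from collections import Counter
from math import dist
from statistics import mean, stdev


def s2n(class0, class1):
    """Signal to noise score for one gene."""
    return (mean(class0) - mean(class1)) / (stdev(class0) + stdev(class1))


def top_genes(samples, labels, n_genes):
    """Indices of the n_genes with the largest |S2N| between class 0 and class 1."""
    scored = []
    for g in range(len(samples[0])):
        c0 = [s[g] for s, y in zip(samples, labels) if y == 0]
        c1 = [s[g] for s, y in zip(samples, labels) if y == 1]
        scored.append((abs(s2n(c0, c1)), g))
    return [g for _, g in sorted(scored, reverse=True)[:n_genes]]


def knn(x, train_x, train_y, k):
    """Majority label among the k training samples nearest to x (Euclidean)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(x, pair[0]))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def loocv_accuracy(samples, labels, n_genes=10, k=3):
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        # Gene selection uses only the training samples, never the held-out one.
        genes = top_genes(train_x, train_y, n_genes)
        reduced_train = [[s[g] for g in genes] for s in train_x]
        held_out = [samples[i][g] for g in genes]
        correct += knn(held_out, reduced_train, train_y, k) == labels[i]
    return correct / len(samples)


# Synthetic stand-in data (an assumption, not the paper's data): only the
# first 3 genes differ between the two classes.
random.seed(0)
labels = [0] * 10 + [1] * 10
samples = [[random.gauss(5.0 * y if g < 3 else 0.0, 1.0) for g in range(30)]
           for y in labels]
accuracy = loocv_accuracy(samples, labels)
print(f"LOOCV accuracy: {accuracy:.2f}")
```

Selecting the genes once on all samples and then cross-validating only the k-NN step would let the held-out sample influence the features, which tends to make the error estimate too low.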


Leave one out CV

• 102 samples testing effectiveness of k-NN classifier, now allowing k to be 3 or 9 (10 most significant genes used in k-NN)
  – 1 sample set aside (outer loop), leaving 101 samples for the inner loops
    • 1 sample set aside, k set to 3
    • S2N of 100 samples determines 10 most significantly different genes in two classes
    • 3-NN by Euclidean distance performed on single sample
    • Sample classified as Class0 or Class1
    • Results recorded as correct classification or not
    • Process repeated 100 more times (each sample set aside once)

    • 1 sample set aside, k set to 9
    • S2N of 100 samples determines 10 most significantly different genes in two classes
    • 9-NN by Euclidean distance performed on single sample
    • Sample classified as Class0 or Class1
    • Results recorded as correct classification or not
    • Process repeated 100 more times (each sample set aside once)
  – “Best”-NN determined by least error in “inner loop” of CV
  – S2N of 101 samples determines 10 most significantly different genes in two classes
  – “Best”-NN by Euclidean distance performed on single sample
  – Sample classified as Class0 or Class1
  – Results recorded as correct classification or not
  – Process repeated 101 more times (each sample set aside once)
• Results give accuracy of using 10 most significant genes with S2N metric and 3-NN or 9-NN classification method


Leave one out CV

• 102 samples testing effectiveness of k-NN classifier, allowing k to be 3 or 9, and allowing 10, 20, or 50 most significant genes to be used in k-NN
• “Inner loop” for selection of k as seen before
• “Inner loop” for number of genes to be used in k-NN classification
• Results give accuracy of using most accurate number of genes and most accurate k in classification
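The nested scheme described above can be sketched as below. It is a sketch under stated assumptions: the data are synthetic, the function names are hypothetical, and the demo call uses smaller candidate values (k of 3 or 5; 5, 10, or 20 genes) than the slide so that it runs quickly on a 16-sample toy set. The defaults of `nested_loocv_accuracy` mirror the values on the slide.

```python
import random
from collections import Counter
from itertools import product
from math import dist, sqrt


def mean_sd(values):
    """Mean and sample standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    return m, sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def top_genes(xs, ys, n_genes):
    """Indices of the n_genes with the largest |S2N| between the two classes."""
    scored = []
    for g in range(len(xs[0])):
        m0, s0 = mean_sd([x[g] for x, y in zip(xs, ys) if y == 0])
        m1, s1 = mean_sd([x[g] for x, y in zip(xs, ys) if y == 1])
        scored.append((abs((m0 - m1) / (s0 + s1)), g))
    return [g for _, g in sorted(scored, reverse=True)[:n_genes]]


def predict(train_x, train_y, x, n_genes, k):
    """Select genes on the training set, then classify x by k-NN."""
    genes = top_genes(train_x, train_y, n_genes)
    reduced = [[s[g] for g in genes] for s in train_x]
    query = [x[g] for g in genes]
    nearest = sorted(zip(reduced, train_y), key=lambda p: dist(query, p[0]))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def loocv_errors(xs, ys, n_genes, k):
    """Inner loop: number of leave-one-out misclassifications."""
    return sum(
        predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i], n_genes, k) != ys[i]
        for i in range(len(xs))
    )


def nested_loocv_accuracy(xs, ys, ks=(3, 9), gene_counts=(10, 20, 50)):
    correct = 0
    for i in range(len(xs)):  # outer loop: one sample set aside
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        # Choose (number of genes, k) using the remaining samples only.
        best_n, best_k = min(product(gene_counts, ks),
                             key=lambda nk: loocv_errors(train_x, train_y, *nk))
        correct += predict(train_x, train_y, xs[i], best_n, best_k) == ys[i]
    return correct / len(xs)


# Synthetic stand-in data (assumption): 16 samples, 30 genes, 3 informative genes.
random.seed(1)
labels = [0] * 8 + [1] * 8
samples = [[random.gauss(5.0 * y if g < 3 else 0.0, 1.0) for g in range(30)]
           for y in labels]
accuracy = nested_loocv_accuracy(samples, labels, ks=(3, 5), gene_counts=(5, 10, 20))
print(f"Nested LOOCV accuracy: {accuracy:.2f}")
```

Because k and the number of genes are chosen without ever seeing the outer held-out sample, the outer accuracy estimates the whole procedure, including the tuning step.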


Permutation Testing (part 1)

• Method for determining significance of correlations to classes
  – i.e. how likely the gene is expressed more highly in cancer by chance


Permutation Testing

• Take data and randomly reassign class labels (cancer, normal) to gene expression levels
  – Removes correlation between gene expression levels and class labels
• Repeat correlation method (S2N metric in this case) to see if correlation still exists between gene data and class labels
• Repeat reassignment of class labels (permutation) and test again for correlation
  – If correlations as strong as the observed one persist through permutations, the observed correlation is not significant
  – If such correlations are lost in all but a small fraction of permutations, the observed correlation can be considered significant
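A permutation test for a single gene can be sketched as below. The expression values are invented, the function names are hypothetical, and the p-value is taken here as the fraction of permutations whose |S2N| is at least as large as the observed one, which is one common convention and an assumption about the exact definition used.

```python
import random
from statistics import mean, stdev


def s2n(class0, class1):
    """Signal to noise score for one gene."""
    return (mean(class0) - mean(class1)) / (stdev(class0) + stdev(class1))


def permutation_p_value(values, labels, n_perm=1000, seed=0):
    """Fraction of label permutations with |S2N| >= the observed |S2N|."""
    rng = random.Random(seed)

    def score(lbls):
        c0 = [v for v, y in zip(values, lbls) if y == 0]
        c1 = [v for v, y in zip(values, lbls) if y == 1]
        return abs(s2n(c0, c1))

    observed = score(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign class labels at random
        hits += score(shuffled) >= observed
    return hits / n_perm


labels = [0] * 6 + [1] * 6  # 0 = cancer, 1 = normal

# Hypothetical gene that is clearly higher in cancer.
diff_gene = [8.1, 9.3, 10.2, 8.8, 9.6, 9.9, 2.2, 3.1, 4.0, 2.7, 3.5, 3.9]
# Hypothetical gene with no real difference between the classes.
noise_gene = [5.0, 6.1, 7.2, 5.5, 6.6, 7.0, 5.2, 6.0, 7.1, 5.6, 6.5, 6.9]

p_diff = permutation_p_value(diff_gene, labels)
p_noise = permutation_p_value(noise_gene, labels)
print(f"differential gene p = {p_diff:.3f}, noise gene p = {p_noise:.3f}")
```

For the differential gene, very few shuffled labelings reproduce a score as strong as the observed one, so its p-value is small; for the noise gene, most shufflings do.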


Singh et al. Objectives

• Identify genes in microarray expression analysis that might anticipate clinical behavior of prostate cancer
  – Cancer vs. Normal
  – Recurrent vs. Nonrecurrent


Motivation

• Prostate cancer very common among cancers
• Early diagnosis increases chances of survival
• Clinical tests (Gleason Score, serum PSA) not completely reliable
• Progress has already been made in linking differential gene expression to cancer (p53, myc, p27, PTEN)
  – No gene found yet to have sufficient prognostic utility to warrant clinical implementation


Methods

• 12,600 genes in microarray analysis
• 52 tumor and 50 normal prostate specimens


Tumor vs normal classification

• Genes ranked according to their differential expression across the 2 classes using S2N metric
  – Statistical significance of these rankings determined using a permutation test
  – 1000 permutations of class labels performed
• 317 genes had significantly higher expression in tumor samples
• 139 genes had significantly higher expression in normal samples
• p = 0.001 in this case means that a correlation at least as strong as the observed one occurred at most once in the 1000 permutations for each of the significantly different genes


Top 50 genes with high expression in tumor along with top 50 genes with high expression in normal. (red = above mean of all samples, blue = below)


Tumor vs normal classification

• Built k-NN algorithm using significantly different genes
• Models using 4 or more genes classified samples with greater than 90% accuracy in leave one out CV (p < 0.001 as measured by permutation testing)
  – The paper does not describe how leave one out CV was carried out, nor the value of k


Tumor vs normal classification

• 4 gene and 16 gene models were tested on independent data set
  – 8 normal, 27 tumor
  – 4 gene model accuracy 77%
  – 16 gene model accuracy 86%


Prediction of pathological features

• Gene expression data within 52 tumor samples analyzed for correlations with clinical behavior
  – Determined by comparing observed correlations with those in a randomly permuted dataset
  – Correlation found only with Gleason score (GS)


Prediction of pathological features

• 15 genes had expression positively correlated with GS (Type I)

• 14 genes had expression negatively correlated with GS (Type II)

Red = above mean

Blue = below mean


Prediction of pathological features

• Same 29 genes were used to drive hierarchical clustering of independent data set
  – Type I and Type II genes remained highly cosegregated, suggesting this coexpression is reproducible


Prediction of clinical outcome

• 21 patients evaluated with respect to recurrence following surgery
  – 8 relapsed
  – 13 have remained cancer free for 4 years
• No single gene associated with recurrence
• k-NN classification with k=2 on a 5 gene model results in 90% classification accuracy during leave one out CV
  – Again no information provided for CV methods
  – No independent data available for testing


Prediction of clinical outcome

• As another test of significance, tested 1000 permutations of class labels and attempted to find multigene expression classifiers using same range of gene numbers
• 37 of 1000 permutations yielded accuracy of 90% or more (p = 37/1000 = 0.037)


Discussion – Clinical Use?

• Level of accuracy in classifying tumor vs. normal is 86-92%
  – While high, still not sufficient to replace histological examination
• No association between serum PSA and gene expression
  – Possible that more genes need to be included and/or more samples need to be evaluated


Discussion – GS correlation

• GS was associated with patient outcome
  – However, only 2 genes correlated with GS were used in outcome prediction model
  – Genes most frequently used in model were not correlated with GS
• GS-independent markers and determinants of prostate cancer exist


Discussion – recurrence predictor model

• 5 gene model correctly predicted 19 of 21 evaluable patients
  – Authors concede that model may be result of overoptimization
  – More datasets needed for model validation
• Some of the genes used commonly in model are known to have correlation with prostate cancer


Conclusion

• This is a “proof of concept” paper used to suggest further research rather than suggest changes in clinical practice
• Authors’ use of leave one out CV is not well explained and could account for the very good results from the models
  – Independent testing helped validate models