Ulrich Mansmann IBE, Medical School, University of Munich

compdiag.molgen.mpg.de/ngfn/docs/2005/mar/mansmann-grouptesting.pdf


  • Group testing 2

    Content of the lecture

    • How to assess the relevance of groups of genes:

      - outstanding gene expression in a specific group compared to other genes;

      - differential gene expression not of single genes but over a specific group of genes.

    • How to define gene groups:

      - exploratory research produces functional groups and genomic signatures: confirm the relevance of the specific group.

      - bioinformatic algorithms can be used to define pathways and functional groups.

  • Group testing 3

    Example I: Cyclin D1 Action

    • Lamb J et al. (2003) A Mechanism of Cyclin D1 Action Encoded in the Patterns of Gene Expression in Human Cancer, Cell, 114: 323-334

    • Cyclin D1 expression signature: cyclin D1 target gene set.

    • Cyclin D1 activity in human tumors: Does the cyclin D1 target gene set play a prominent role in different tumor entities, that is, are its genes among the highly expressed genes?

  • Group testing 4

    The ideas behind the analysis

    Problem:

    Two groups of genes have to be compared with respect to gene expression: Is the gene expression in gene group A different from the expression in gene group B? Important: the two groups contain different genes!

    Basic idea:

    nA genes in group A, nB genes in group B

    Order the genes with respect to the expression value. If there is a difference in expression level between both groups, the expression values will be separated: the values from group A will tend to sit at generally high or generally low positions. If there is no difference, the values of both groups will be well mixed.

    [Figure: genes of group A and group B placed along an axis showing the rank of the expression value.]

  • Group testing 5

    The ideas behind the analysis

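    Group A (nA = 10), Group B (nB = 5)

    [Figure: genes ordered by rank of expression. Each position holding a group A gene carries the value -5, each position holding a group B gene carries the value 10; the minimum of the resulting cumulative sum is marked.]

    Is the minimum extreme with respect to random group mixing?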
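The picture can be reproduced with a few lines of arithmetic. The sketch below uses the group sizes from the slide (nA = 10, nB = 5) but two made-up orderings, since the exact ordering in the figure is not recoverable; the function name `cumulative_path` is a hypothetical helper, not something from the lecture.

```python
from itertools import accumulate

def cumulative_path(labels, n_a, n_b):
    """Cumulative sum over genes ordered by expression rank.

    labels: group label ('A' or 'B') of each gene, in rank order.
    A group A gene contributes -n_b, a group B gene contributes +n_a.
    """
    steps = [-n_b if g == "A" else n_a for g in labels]
    return list(accumulate(steps))

# Hypothetical ordering 1: perfect separation, all A genes rank below all B genes.
separated = cumulative_path("A" * 10 + "B" * 5, 10, 5)

# Hypothetical ordering 2: the two groups are evenly interleaved.
mixed = cumulative_path("AABAABAABAABAAB", 10, 5)

print(min(separated), separated[-1])  # -50 0
print(min(mixed), max(mixed))         # -10 0
```

With perfect separation the path drops to -nA⋅nB = -50 before returning to 0, whereas the interleaved ordering never leaves the band between -10 and 0.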

  • Group testing 6

    The algorithm formalized

    Basic idea:

    • nA genes in group A, nB genes in group B.

    • Order the genes with respect to expression values.

    • Create a vector vv of (nA+nB) components with value -nB at each position where a value from group A is sitting and with value nA at each position where a value from group B is sitting.

    • Calculate yy = cumsum(vv).

    • Draw a line starting at (0,0) through the points (i, yy[i]). The line will end in (nA+nB, 0) because (-nB)⋅nA + nA⋅nB = 0.

    • Look at Mvv = max{|min(yy)|, max(yy)}, which will be large in case of a good separation between both groups.

    • Permute the vector vv to get vv*, calculate yy* and Mvv*. Use permutations to calculate the distribution of Mvv under the null hypothesis and determine the permutation-based p-value: pperm = #{Mvv* ≥ Mvv} / #permutations.
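The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the lecture: the function names, the default of 1000 permutations, the fixed seed and the tie handling (group A is placed first when two expression values are equal) are all choices made here.

```python
import random
from itertools import accumulate

def max_deviation(vv):
    """Mvv = max(|min(yy)|, max(yy)) with yy = cumsum(vv)."""
    yy = list(accumulate(vv))
    return max(abs(min(yy)), max(yy))

def group_rank_test(expr_a, expr_b, n_perm=1000, seed=0):
    """Permutation test comparing the expression ranks of two gene groups.

    Returns the observed Mvv and the permutation-based p-value
    #{Mvv* >= Mvv} / #permutations.
    """
    n_a, n_b = len(expr_a), len(expr_b)
    # Attach the step value to each gene: -nB for group A, +nA for group B.
    labelled = [(x, -n_b) for x in expr_a] + [(x, n_a) for x in expr_b]
    # Stable sort on the expression value only (assumption: ties keep A first).
    labelled.sort(key=lambda pair: pair[0])
    vv = [step for _, step in labelled]
    observed = max_deviation(vv)

    rng = random.Random(seed)
    permuted = vv[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)  # random mixing of the two groups
        if max_deviation(permuted) >= observed:
            hits += 1
    return observed, hits / n_perm

# Made-up expression values: group A clearly below group B.
m_obs, p_perm = group_rank_test(list(range(1, 11)), list(range(11, 16)))
print(m_obs, p_perm)
```

Because the p-value is estimated from a finite number of random permutations, its resolution is 1/n_perm; more permutations are needed to resolve very small p-values.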

  • Group testing 7

    Example II: Colon Cancer

    Study: 18 patients with UICC II colon cancer, 18 patients with UICC III colon cancer, HG-U133A, 22,283 probe sets representing ~18,000 genes. Snap-frozen material, laser microdissection.

    Question 1: Are there specific cancer related pathways with a more distinct differential gene expression between UICC II/III?

  • Group testing 8

    Gene set enrichment – Colon cancer

    1407 probe sets are studied which belong to 9 cancer-specific pathways.

    androgen_receptor_signalling   122
    apoptosis                      245
    cell_cycle_control              51
    notch_delta_signalling          50
    p53_signalling                  45
    ras_signalling                 316
    tgf_beta_signalling            100
    tight_junction_signalling      425
    wnt_signalling                 214

  • Group testing 9

    Gene set enrichment – Colon cancer

                                 group.A  group.B    Myy  p.value
    androgen_receptor_signaling      118     1289   6983   0.0568
    Apoptosis                        238     1169  17801   0.7438
    cell_cycle_control                51     1356  10413   0.3616
    notch_delta_signalling            50     1357   9010   0.6492
    p53_signalling                    45     1362  12390   0.0924
    ras_signalling                   311     1096  15486   0.6252
    tgf_beta_signaling               100     1307  22615   0.0128
    tight_junction_signaling         406     1001  15456   0.4414
    wnt_signaling                    214     1193  16318   0.8432

    Restriction of the analysis to genes in cancer specific pathways

  • Group testing 10

    Example III: Lymph node metastases

    Bertucci F et al. (2004) Gene expression profiling of colon cancer by DNA microarrays and correlation with histoclinical parameters, Oncogene 23, 1377–1391

    Bertucci et al. present a gene signature consisting of 46 genes which is claimed to be able to discriminate between LN- and LN+ colorectal cancer.

    Is it possible to show with new data that the signature has discriminative value? Can we reject the null hypothesis

    P[Y|X] = P[Y]?

    Y: LN+/LN-

    X: expression pattern of the 46 genes

  • Group testing 11

    Goeman’s Global Test

    • Test if the global expression pattern of a group of genes is significantly related to some outcome of interest (groups, continuous phenotype).

    • If this relationship exists, then knowledge of the gene expression helps to improve the prediction of the phenotype of interest. If the prediction cannot be improved by knowing the gene expression, then there will not be differential gene expression.

    • Test statistic:

      Q ~ (Y-µ)’ R (Y-µ)

        ~ Σm [Xm’(Y-µ)]²              (sum over the genes m of the pathway)

        ~ Σi Σj Rij (Yi-µ)(Yj-µ)      (sum over subjects i, j)

      µ: mean of the phenotype; Xmi: expression of gene m in subject i (Xm: expression of gene m over all subjects); R = X’X: matrix of correlations between the gene expression of subjects.

    Goeman JJ et al. (2003) A global test for groups of genes: Testing association with a clinical outcome, Bioinformatics, 20:93-99; Bioconductor package: globaltest
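The two sum forms of Q can be checked against each other on toy data. The sketch below is an assumption-laden illustration and not the implementation of the globaltest package: it computes the unscaled statistic only (the slide gives Q up to proportionality), uses made-up data, and all function names are hypothetical. The permutation p-value is obtained by shuffling the phenotype Y.

```python
import random

def q_by_genes(X, y):
    """Sum over genes m of [Xm'(Y - mu)]^2.

    X: list of genes, each a list of expression values (one per subject).
    y: phenotype value per subject.
    """
    mu = sum(y) / len(y)
    resid = [yi - mu for yi in y]
    return sum(sum(x * r for x, r in zip(gene, resid)) ** 2 for gene in X)

def q_by_subjects(X, y):
    """Double sum over subjects of Rij (Yi - mu)(Yj - mu), with R = X'X."""
    n = len(y)
    mu = sum(y) / n
    resid = [yi - mu for yi in y]
    total = 0.0
    for i in range(n):
        for j in range(n):
            r_ij = sum(gene[i] * gene[j] for gene in X)  # entry (i, j) of X'X
            total += r_ij * resid[i] * resid[j]
    return total

def permutation_p_value(X, y, n_perm=1000, seed=0):
    """Share of phenotype permutations with Q* >= observed Q."""
    rng = random.Random(seed)
    observed = q_by_genes(X, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if q_by_genes(X, y_perm) >= observed:
            hits += 1
    return observed, hits / n_perm

# Made-up data: 2 genes, 4 subjects, binary phenotype.
X_toy = [[1, 2, 3, 4], [0, 1, 0, 1]]
y_toy = [0, 0, 1, 1]
print(q_by_genes(X_toy, y_toy), q_by_subjects(X_toy, y_toy))  # 4.0 4.0
```

The gene-wise form shows how much each gene contributes to Q, while the subject-wise form shows which pairs of subjects with similar expression also have similar phenotype.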

  • Group testing 12

    Example III: Lymph node metastases

    Test is not significant (p=0.43)

    No clear answer on the predictive power of the signature.

    No evidence for a difference is not evidence for no difference!

    Question of power

    Reasons for a non-significance: bad experiment or ...?

  • Group testing 13

    Example II: Colon Cancer

    Study: 18 patients with UICC II colon cancer, 18 patients with UICC III colon cancer, HG-U133A, 22,283 probe sets representing ~18,000 genes. Snap-frozen material, laser microdissection.

    Question 2: Is there differential gene expression in the p53 signalling pathway between UICC II and UICC III colon cancer?

  • Group testing 14

    Goeman’s Global Test – Example II

    • Test for differential gene expression in the p53 signalling pathway (45 probe sets)

    • Global Test result:

      45 out of 45 genes used; 36 samples

      p value = 0.0114 based on 10000 permutations

      Test statistic Q = 11.78 with expectation EQ = 5.466 and standard deviation sdQ = 2.152 under the null hypothesis

    • Informative plots:

      Sample plot: how well a sample fits its phenotype

      Checkerboard: correlation between samples

      Gene plot: influence of single genes on the test statistic
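One way to read the reported numbers is to express Q in units of its null standard deviation. The standardization below is added here for illustration and is not part of the reported output; the reported p value itself comes from permutations, not from this figure.

```python
# Values reported by the global test for the p53 signalling pathway.
q_observed = 11.78
q_expected = 5.466   # EQ under the null hypothesis
q_sd = 2.152         # sdQ under the null hypothesis

# Distance of the observed statistic from its null expectation, in null SDs.
standardized = (q_observed - q_expected) / q_sd
print(round(standardized, 2))  # 2.93
```

The observed statistic lies roughly 2.9 null standard deviations above its expectation.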

  • Group testing 15

    Goeman’s Global Test – Example II

    [Figure: checkerboard plot; both axes: sorted sample number (1 to 36); values dichotomized around the median. Illustrates Σi Σj Rij (Yi-µ)(Yj-µ).]

    [Figure: gene plot; x-axis: gene number, y-axis: influence; legend: positively coregulated with Y, negatively coregulated with Y. Illustrates Σm [Xm’(Y-µ)]².]

  • Group testing 16

    Goeman’s Global Test – Example II

    [Figure: sample plot; x-axis: influence (axis ticks from -50 to 100); one bar for each of the 36 samples (CEL files Co1.T.IT.30.F.CEL to Co8.T.IT.549.F.CEL); legend: 1 samples, 0 samples.]

  • Group testing 17

    Two questions about groups of genes

    Question 1: Two groups of genes have to be compared with respect to gene expression: Is the gene expression in gene group A different from the expression in gene group B?

    Question 2: Is there differential gene expression between different biological entities, not in terms of single genes but with respect to a defined group of genes?

    [Figure: Question 1 compares genes of group A with genes of group B; Question 2 compares Entity I with Entity II over a well-defined group of genes.]