15
outline Pre-process of data Model Fitting for Identifying Differential Expression Network Inference Microarray data analysis using BioConductor Guiyuan Lei Centre for Integrated Systems Biology of Ageing and Nutrition (CISBAN) School of Mathematics & Statistics Newcastle University http://www.mas.ncl.ac.uk/ ngl9/ 6 June, 2008 Guiyuan Lei Microarray data analysis using BioConductor

Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

Microarray data analysis using BioConductor

Guiyuan Lei

Centre for Integrated Systems Biology of Ageing and Nutrition (CISBAN)School of Mathematics & Statistics

Newcastle Universityhttp://www.mas.ncl.ac.uk/∼ngl9/

6 June, 2008

Guiyuan Lei Microarray data analysis using BioConductor

Page 2: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

Outline

Outline

Bioconductor http://www.bioconductor.orgPre-process dataIdentify differential expressionNetwork inference

Guiyuan Lei Microarray data analysis using BioConductor

Page 3: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalisationPrincipal Component Analysis

Pre-process of data

Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalising Microarray dataProbeset level expression to gene level expressionPrincipal Component Analysis

Guiyuan Lei Microarray data analysis using BioConductor

Page 4: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalisationPrincipal Component Analysis

Entering data into Bioconductor

library(affy)fns2 = list.celfiles(path="data2", full.names=TRUE)rawdata = ReadAffy(filenames=fns2)print(rawdata)

Strain 0 hours 1 hour 2 hours 3 hours 4 hoursMutant 1 yeast01.cel yeast02.cel yeast03.cel yeast04.cel yeast05.celWild type 1 yeast06.cel yeast07.cel yeast08.cel yeast09.cel yeast10.celMutant 2 yeast11.cel yeast12.cel yeast13.cel yeast14.cel yeast15.celWild type 2 yeast16.cel yeast17.cel yeast18.cel yeast19.cel yeast20.celMutant 3 yeast21.cel yeast22.cel yeast23.cel yeast24.cel yeast25.celWild type 3 yeast26.cel yeast27.cel yeast28.cel yeast29.cel yeast30.cel

Guiyuan Lei Microarray data analysis using BioConductor

Page 5: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalisationPrincipal Component Analysis

Mask file for Cerevisiae probesets

Mask file to filter out pombe probesetshttp://www.affymetrix.com/Auth/support/downloads/mask files/s cerevisiae.zips_cerevisiae<-scan("s_cerevisiae.msk", skip=2, list("", ""))pombe_filter_out<-s_cerevisiae[[1]]

RemoveProbe [1]source("RemoveProbes.r")library(yeast2probe)cleancdf = cleancdfname("yeast2")RemoveProbes(listOutProbes=NULL, pombe_filter_out,

"yeast2cdf","yeast2probe")

Guiyuan Lei Microarray data analysis using BioConductor

Page 6: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalisationPrincipal Component Analysis

Exploratory data analysis

Examining raw imagesProbe intensitiesMA plotsRNA degradation

Guiyuan Lei Microarray data analysis using BioConductor

Page 7: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalisationPrincipal Component Analysis

Pre-process of data: Normalisation

yeast01.cel yeast08.cel yeast15.cel yeast22.cel yeast29.cel

68

1012

1416

Cerevisiae Probe intensities

yeast01.cel yeast08.cel yeast15.cel yeast22.cel yeast29.cel

46

810

1214

Cerevisiae RMA expression values

Guiyuan Lei Microarray data analysis using BioConductor

Page 8: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

Entering data into BioconductorExtraction of Cerevisiae probesetsExploratory data analysisNormalisationPrincipal Component Analysis

Principal Component Analysis [5]

−20 −10 0 10 20 30 40

−10

−5

05

10

1st PC

2nd

PC

Clustering Samples

Principal Component Projection

yeast01.cel

yeast02.cel

yeast03.cel

yeast04.cel

yeast05.cel

yeast06.cel

yeast07.cel

yeast08.cel

yeast09.celyeast10.cel

yeast11.cel

yeast12.celyeast13.cel

yeast14.cel

yeast15.cel

yeast16.cel

yeast17.celyeast18.cel

yeast19.celyeast20.cel

yeast21.cel

yeast22.cel

yeast23.cel

yeast24.cel

yeast25.cel

yeast26.celyeast27.cel

yeast28.cel

yeast29.cel

yeast30.cel

Guiyuan Lei Microarray data analysis using BioConductor

Page 9: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

LimmaPlot time course for top differential expressionHeatmap

Model Fitting for Identifying Differential Expression

Limma model [4]Construct design matrixConstruct constrasts

Up-regulated and down-regulated listPlot time course for top differential expressionVolcano Plot for viewing p-value and fold-changeHeatmap

Guiyuan Lei Microarray data analysis using BioConductor

Page 10: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

LimmaPlot time course for top differential expressionHeatmap

Up and down regulated listFor identifying differential expression, combine the contrasts bycomparing mutant type and wild type at time point 1,2,3 and 4.

#Model Fittingfit<-lmFit(CerevisiaeProbeData, design)mc<-makeContrasts(’m1-w1’,’m2-w2’,’m3-w3’,’m4-w4’,levels=design)fit2<-contrasts.fit(fit, mc)eb<-eBayes(fit2)

Probeset ID Gene Symbol T1 T2 T3 T4ProbesetID 1 Gene 1 -1 -1 -1 -1ProbesetID 2 Gene 2 1 1 1 1ProbesetID 3 Gene 3 -1 -1 -1 -1

Table: Up and down regulated list1

1Not use real gene names hereGuiyuan Lei Microarray data analysis using BioConductor

Page 11: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

LimmaPlot time course for top differential expressionHeatmap

Plot time course for top differential expression

M

M

M

M

M

0 1 2 3 4

89

1011

1213

ProbesetID 1 Gene 1 Rank= 1

time

Exp

ress

ion

M

M

MM

M

M

M

M

M

M

W W W WW

W W W W WW

W WW

W

M

M

M

M M

0 1 2 3 4

89

1011

1213

14

ProbesetID 2 Gene 2 Rank= 2

time

Exp

ress

ion

M

M

M

M M

M

M

M

MM

W W

W W W

W

W

W W W

W W

W W W

M

M

M

M

M

0 1 2 3 4

78

910

11

ProbesetID 3 Gene 3 Rank= 3

time

Exp

ress

ion

M

M

M

M

M

M

M

M

M

M

W

WW W W

W

W W W

W

W W

W W

W

Figure: Time course expression for top 3 differentially expressedYeast genes2

2Not use real gene names hereGuiyuan Lei Microarray data analysis using BioConductor

Page 12: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

LimmaPlot time course for top differential expressionHeatmap

Volcano Plot for viewing p-value and fold-change

Figure: Bonferroni adjusted p-value less than 0.01 and fold-changelarger than 3.5

Guiyuan Lei Microarray data analysis using BioConductor

Page 13: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

LimmaPlot time course for top differential expressionHeatmap

Heatmap

r1t0

r2t0

r3t0

r1t1

r2t1

r3t1

r1t2

r2t2

r3t2

r1t3

r2t3

r3t3

r1t4

r2t4

r3t4

Gene 2Gene 6Gene 38Gene 93Gene 80Gene 94Gene 88Gene 63Gene 85Gene 12Gene 91Gene 60Gene 46Gene 50Gene 19Gene 74Gene 92Gene 73Gene 100Gene 71Gene 69Gene 9Gene 43Gene 72Gene 32Gene 55Gene 64Gene 61Gene 75Gene 59Gene 86Gene 89Gene 45Gene 44Gene 83Gene 67Gene 68Gene 24Gene 53Gene 84Gene 90Gene 37Gene 87Gene 96Gene 15Gene 13Gene 21Gene 8Gene 39Gene 7Gene 3Gene 4Gene 5Gene 22Gene 42Gene 16Gene 95Gene 48Gene 77Gene 65Gene 76Gene 36Gene 20Gene 29Gene 33Gene 54Gene 52Gene 79Gene 51Gene 70Gene 97Gene 58Gene 62Gene 66Gene 14Gene 47Gene 23Gene 41Gene 1Gene 35Gene 57Gene 10Gene 99Gene 81Gene 98Gene 27Gene 40Gene 34Gene 31Gene 56Gene 11Gene 49Gene 82Gene 78Gene 18Gene 26Gene 25Gene 17Gene 30Gene 28

Guiyuan Lei Microarray data analysis using BioConductor

Page 14: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

Network Inference

GeneNet [2]: simple, quick, but not robustMetropolis Hastings for decomposable graphs (MH-d) [3].Computationaly intensive but more stableReversiable Jump MCMC for time course data

Yeast Network

Gene 41

Gene 87

Gene 64

Gene 99Gene 57

Gene 81

Gene 94 Gene 91

Gene 98

Gene 96

Gene 58

Gene 68

Gene 88

Gene 92

Gene 80

Gene 95

Gene 69

Gene 97

Gene 65

Gene 63

Gene 66Gene 74

Gene 49

Gene 67 Gene 89

Gene 70

Gene 82Gene 75

Gene 100

Gene 90

Gene 22

Gene 53

Gene 44

Gene 93

Gene 60

Gene 56

Gene 59Gene 55Gene 76

Gene 54

Gene 34

Gene 10

Gene 35

Gene 71

Gene 28

Gene 50

Gene 84

Gene 79

Gene 73

Gene 16

Gene 86

Gene 42

Gene 20

Gene 31

Gene 27

Gene 40Gene 39

Gene 77

Gene 18

Gene 38

Gene 24

Gene 48

Gene 33

Gene 15

Gene 32

Gene 14

Gene 83

Gene 11

Gene 23

Gene 25

Gene 36 Gene 4

Gene 52

Figure: Inferred network by GeneNet package for top 100differentially expressed Yeast genes

Guiyuan Lei Microarray data analysis using BioConductor

Page 15: Microarray data analysis using BioConductorngl9/topics/inotes/MicroarrayAnalysis4CRI.pdf · A microarray analysis for differential gene expression in the soybean genome using Bioconductor

outlinePre-process of data

Model Fitting for Identifying Differential ExpressionNetwork Inference

References

W.G. Alvord, J.A. Roayaei, O.A. Quinones, and K.T. Schneider.A microarray analysis for differential gene expression in the soybean genomeusing Bioconductor and R.Briefings in Bioinformatics, September 29, 2007.

Schafer J. and K. Strimmer.An empirical bayes approach to inferring large-scale gene association networks.Bioinformatics, 21:754–764, 2005.

Beatrix Jones, Carlos Carvalho, Adrian Dobra, Chris Hans, Chris Carter, andMike West.Experiments in stochastic computation for high-dimensional graphical models.Statistical Science, 20(4):388–400, 2005.

G.K. Smyth.Linear models and empirical Bayes methods for assessing differential expressionin microarray experiments.Statistical Applications in Genetics and Molecular Biology, 3, 2004.

E. Wit and J. Mcclure.Statistics for Microarrays: Design, Analysis and Inference.Wiley, 2004.

Guiyuan Lei Microarray data analysis using BioConductor