8/3/2019 Biostatistics Role in Microarray Analysis
Define the basic concept of microarrays.
Describe the utility of microarray analysis.
Describe the different sources of variability in microarray analysis.
Describe the linear technique used to normalize microarray data.
Describe the role of statistics in the normalization techniques described today.
Describe the different transformation techniques mentioned today.
Distinguish between the different transformation techniques described today.
Describe the different pairwise comparison techniques used to test for independence among genes.
Layman's terms:
A DNA microarray (also commonly known as a gene chip, DNA chip, or biochip) is a collection of microscopic DNA spots attached to a solid surface.
Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome.
http://www.sciencedaily.com/articles/d/dna_microarray.htm
Which genes are related?
Which genes cause a certain disease?
What subcategories of disease X are there?
How certain can we be about this?
Don't expect it to fix bad data!
Microarray data are inherently highly variable. You are measuring mRNA levels.
Some of this variability is relevant since it corresponds to the differential expression of genes.
Unfortunately, a large portion of undesirable biases are introduced during the many technical steps of the experimental procedure.
Biological variability
RNA extraction
Probe labeling. Ex: dye differences
Printing. Ex: print-order, plate-order, clone variation
Hybridization. Ex: temperature, time, mixing technique
Human. Ex: variation between lab researchers
Scanning. Ex: laser & detector, chemistry of the fluorescent label
Image analysis. Ex: identification, quantification, background methods
Raw Exploration
Normalization
Logarithmic Transformation (adjustment of variances)
M vs. A plot (rotation of logarithmic transformation)
This method adjusts the median of differences to 0.
Background Transformation (RMA background approach used for linear scenarios) (to minimize the noise in the observed plot)
Averaging normalization techniques
After normalization of all of the spots in the microarray chip, we average them to obtain a more stable master slide.
Establish the cutting points
Naive approach (Establish cutoff points by log ratios)
Justifiable approach (Establish cutoff points by T-statistic)
Statistical analysis
For each gene i we have the hypothesis test:
Null (neutral) hypothesis H0,i: Mi = 0
Alternative hypothesis H1,i: Mi ≠ 0
Post-hoc pairwise comparisons
Minimize false positives
At first, your data would probably be like this:
Large numbers are unwieldy to work with, so we need a more suitable way to handle them.
Observed data (R, G):
R = signal in red channel
G = signal in green channel
Not to be confused with normalization in statistical procedures, where the purpose is to transform the data toward a normal (Gaussian) distribution.
Normalization of microarray data aims to correct for systematic measurement errors and bias in the observed data.
The process of normalization can be classified into linear and non-linear normalization.
Linear = applied to selected genes or globally to all genes. It is quite suitable for consistent data.
Non-linear = highly precise for data at extreme values, but requires a reference gene set.
The purpose of both methods is to bring each image in the microarray data to the same average brightness using statistical modeling.
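The idea of bringing every image to the same average brightness can be sketched as a global linear rescaling. This is only an illustration under stated assumptions: the function name `scale_to_common_mean`, the choice of the mean of the per-array means as the common target, and the intensity values are all hypothetical and not taken from the slides.

```python
from statistics import mean


def scale_to_common_mean(arrays):
    """Rescale each array so that all arrays share the same mean intensity."""
    array_means = [mean(values) for values in arrays]
    # Assumption: the common brightness target is the mean of the array means.
    target = mean(array_means)
    return [
        [value * target / array_mean for value in values]
        for values, array_mean in zip(arrays, array_means)
    ]


# Two hypothetical arrays; the second was scanned twice as bright as the first.
arrays = [[100.0, 200.0, 300.0], [200.0, 400.0, 600.0]]
scaled = scale_to_common_mean(arrays)
print([mean(values) for values in scaled])
```

A single multiplicative factor per array is the simplest linear correction; it leaves the relative ordering of spots within an array unchanged.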
Expectation: Most genes are non-differentially expressed, i.e. most of the data points should be around M = 0.
Idea: Do various exploratory plots to see if this assumption is met. For example, M vs. A plots, spatial plots, density plots & boxplots, print-order plots, etc.
Result: We commonly observe something like this: Measured value = real value + systematic errors + noise
Correction: If so, normalize the data to reduce the errors & noise: Corrected value ≈ real value, with the systematic errors and noise reduced as far as possible
Logarithmic Transformation
Why log2?
log2(R) = log2(G)
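One common answer to the "why log2" question is that fold changes become symmetric around zero on the log2 scale. The short sketch below illustrates this with made-up R/G ratios; the variable names are hypothetical.

```python
import math

# Hypothetical R/G ratios: 4-fold up, 2-fold up, unchanged, 2-fold down, 4-fold down.
ratios = [4.0, 2.0, 1.0, 0.5, 0.25]

# On the raw scale, up-regulation spans (1, infinity) while down-regulation
# is squeezed into (0, 1). On the log2 scale the two directions mirror each other.
log_ratios = [math.log2(ratio) for ratio in ratios]
print(log_ratios)
```

Each unit on the log2 scale corresponds to one doubling or halving, which makes the values easy to read as fold changes.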
M vs. A is basically a rotation of the log2(R) vs. log2(G) scatter plot.
Now the quantity of interest, i.e. the fold change, is contained in one variable, namely M!
Transformed data (M, A):
M = log2(R) - log2(G) (log ratio)
A = ½[log2(R) + log2(G)] (log intensity)
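The (R, G) to (M, A) transformation can be sketched as follows, taking M as the difference and A as the average of the two log2 signals. The function name `ma_transform` and the spot intensities are hypothetical.

```python
import math


def ma_transform(red, green):
    """Return (M, A) for one spot from its red and green channel signals."""
    log_red = math.log2(red)
    log_green = math.log2(green)
    m = log_red - log_green          # log ratio (fold change on the log2 scale)
    a = 0.5 * (log_red + log_green)  # average log intensity
    return m, a


# Hypothetical (R, G) pairs for three spots.
spots = [(1024.0, 256.0), (512.0, 512.0), (64.0, 256.0)]
ma = [ma_transform(red, green) for red, green in spots]
print(ma)
```

The first spot is 4-fold higher in red (M = 2), the second is unchanged (M = 0), and the third is 4-fold higher in green (M = -2).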
[Figure: three scatter plots: R vs. G, log(R) vs. log(G), and M vs. A]
R = red channel signal
G = green channel signal
M = log2(R/G), aka log-ratio
A = ½·log2(R·G), aka log-intensity
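Once the log-ratios M are available, the simplest global correction shifts them so that their median is 0, in line with the expectation that most genes are not differentially expressed. The sketch below assumes a single shift for the whole array; the function name `center_log_ratios` and the M values are hypothetical.

```python
from statistics import median


def center_log_ratios(m_values):
    """Shift all log-ratios by a constant so that their median becomes 0."""
    shift = median(m_values)
    return [m - shift for m in m_values]


# Hypothetical log-ratios with a dye bias of roughly +0.6 and one real outlier.
m_values = [0.5, 0.7, 0.4, 2.5, 0.6]
centered = center_log_ratios(m_values)
print(centered)
```

The median is used rather than the mean so that a few strongly differentially expressed genes (such as the 2.5 above) do not drag the correction.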
RMA stands for Robust Multichip Average (Irizarry, 2003).
More robust than the Lowess (aka Loess) technique.
Mostly used with Affymetrix microarray data.
It is biologically sound to assume that fluorescence intensities from a microarray experiment are composed of both signal and noise, and that the noise is present throughout the entire signal distribution.
A convolution model of a signal distribution and a noise distribution is a good choice in such a situation.
A convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions.
[Figure labels: fluorescent signal, observed data, background noise]
The RMA expression E(Si|Xi=xi) is used as the background-corrected intensity for gene i, where Xi is the observed intensity and Si the underlying signal. It is applied to all genes in the microarray in order to minimize the noise in the observed signal.
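A minimal sketch of how E(S|X=x) can be evaluated is given below. The slides do not show the formula, so the closed form used here is an assumption: it is the expression commonly quoted for the RMA convolution model with an exponential signal (rate `alpha`) and normally distributed noise (mean `mu`, standard deviation `sigma`). The parameter values are made up, and estimating them from data is not shown.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rma_background_correct(x, mu, sigma, alpha):
    """Evaluate E(S | X = x) under the assumed exponential-plus-normal model."""
    a = x - mu - sigma * sigma * alpha
    b = sigma
    upper = (x - a) / b
    numerator = normal_pdf(a / b) - normal_pdf(upper)
    denominator = normal_cdf(a / b) + normal_cdf(upper) - 1.0
    return a + b * numerator / denominator


# Hypothetical parameters: noise mean 100, noise SD 20, signal rate 0.01.
mu, sigma, alpha = 100.0, 20.0, 0.01
corrected_low = rma_background_correct(50.0, mu, sigma, alpha)
corrected_high = rma_background_correct(1000.0, mu, sigma, alpha)
print(corrected_low, corrected_high)
```

Under these assumptions the corrected value stays positive even when the observed intensity is below the background mean, which is one reason this approach is preferred over plain background subtraction.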
Useful when there are different segmentations of the same gene.
Combines all segmentations of the same gene into a single averaged, transformed unit.
A t-test can be applied to determine whether the mean is the same or different between two conditions.
ANOVA can be applied to determine whether the mean is the same or different across two or more conditions.
[Figure: normalization, then average slide]
Naive approach
Establish cutoff points by log ratios.
This has to be done after the M vs. A transformation & background correction.
The top and bottom 0.5 of the absolute M values have to be shaved off.
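A cutoff on the log ratio can be sketched as below. Note the assumption: this sketch uses a fixed threshold of |M| >= 1 (a 2-fold change) rather than the 0.5 figure on the slide, whose exact meaning is not recoverable here. The function name `naive_calls`, the gene names, and the M values are hypothetical.

```python
def naive_calls(m_by_gene, cutoff=1.0):
    """Return the genes whose absolute log-ratio reaches the cutoff."""
    return sorted(gene for gene, m in m_by_gene.items() if abs(m) >= cutoff)


# Hypothetical normalized log-ratios per gene.
m_by_gene = {"geneA": 2.1, "geneB": -0.3, "geneC": -1.4, "geneD": 0.9}
selected = naive_calls(m_by_gene)
print(selected)
```

The weakness of this approach is visible in the code: it looks only at the size of M and ignores how variable the replicate measurements are.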
Justifiable approach
Establish cutoff points using the T-statistic via Significance Analysis of Microarrays*
For replicated data, i.e. multiple measurements of the same thing, we trust this approach more if the deviation (std. dev.) is small.
If the deviation is large, we do not trust it that much (stick with the naive approach).
T = mean(x) / SE(x), where SE(x) is the standard error of the mean of x.
The M axis is the only one to be transformed by T.
*R package / Excel Add-In
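The statistic T = mean(x) / SE(x) can be computed per gene from its replicated log-ratios. The sketch below uses the plain one-sample t-statistic with SE = standard deviation / sqrt(n); it is not the SAM package itself, and the function name and replicate values are hypothetical.

```python
import math
from statistics import mean, stdev


def t_statistic(m_replicates):
    """One-sample t-statistic of replicated log-ratios against 0."""
    n = len(m_replicates)
    standard_error = stdev(m_replicates) / math.sqrt(n)
    return mean(m_replicates) / standard_error


# Two hypothetical genes with the same mean log-ratio (1.0) but different spread.
consistent = [1.0, 1.1, 0.9, 1.0]
noisy = [3.0, -1.0, 2.5, -0.5]
t_consistent = t_statistic(consistent)
t_noisy = t_statistic(noisy)
print(t_consistent, t_noisy)
```

Both genes would be treated identically by a log-ratio cutoff, but the t-statistic ranks the consistent gene far above the noisy one.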
Which genes or groups are (most) differentially expressed?
For each gene i we have the hypothesis test:
H0,i: Mi = 0
H1,i: Mi ≠ 0
α = 5%
CI = 95%
Thousands of tests, i.e. each gene is tested against H0: T = 0.
False positives are a serious threat.
We need to adjust the p-values.
Different adjustment procedures
Pairwise comparisons post-hoc tests:
Bonferroni (best in linear situations)
Tukey
Sidak
Duncan
Holm
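As an illustration of one of the listed procedures, the Holm step-down adjustment can be sketched in a few lines. The function name `holm_adjust` and the p-values are hypothetical; the other procedures in the list are not shown.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = holm_adjust(p_values)
print(adjusted)
```

A gene is then called significant when its adjusted p-value is at or below the chosen family-wise level, for example 0.05.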
Multiple tests: a family of tests
They compared a list of significant genes
Then family-wise error (FWE) = 0.05
Bonferroni correction: set k = p/m
Where: k = new p-value threshold; p = original α; m = number of post hoc tests performed.
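The Bonferroni rule k = p/m amounts to a one-line calculation. The sketch below applies it with a family-wise level of 0.05; the function names and p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold k = alpha / m."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test whose p-value is at or below the Bonferroni threshold."""
    k = bonferroni_threshold(alpha, len(p_values))
    return [p <= k for p in p_values]


# Five hypothetical tests: the per-test threshold becomes 0.05 / 5 = 0.01.
p_values = [0.0001, 0.004, 0.03, 0.2, 0.012]
flags = bonferroni_significant(p_values)
print(flags)
```

With thousands of genes the threshold becomes very small, which controls false positives at the cost of missing weaker true effects.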
To sort and rank data.
To reduce a data set of 1000s of genes to 10s or 100s (via Averaging Normalization Techniques).
As a guide in selecting which genes to validate more precisely and which not to.
Filter out bad spots.
Adjust low intensities.
Normalize background noise and raw data.
Calculate average ratios and statistical significance values per gene.
Perform pairwise post hoc comparisons to minimize false positives.
There are many different statistical significance metrics.
T-test (p-values), SAM (T values), Wilcoxon RST, ANOVA (F-statistics), and many more.
Just many variations on a theme!
Choose one (or more!) wisely.
BUT: don't let it make decisions for you!
There will always be false positives (there's no post hoc test that can eliminate them all!).
The most accurate tool in validating the results is the researcher's judgment, with the help of the keen point of view of a biostatistician, of course!
You need replication and statistics to find real differences between genes.
In most cases the naive approach (cutoff points by log ratios) is not enough.
Cutoff points by t-statistics are a much wiser decision.
Look out for false positives.
Multiple testing = must adjust the p-values.
Dchip
Affymetrix
R
Bioconductor
BRB-ArrayTools (NCI Biometric Research Branch)
Matlab Bioinformatics Toolbox
GeneSpring
Partek
For further reading regarding the non-linear normalization of
microarrays please visit:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC126873/pdf/gb-2002-3-9-research0048.pdf
1. Good image analysis is essential. Some software is obsolete and not that good.
2. Normalization is needed. We understand more now than a few years ago.
3. Use at least the t-statistic to identify differentially expressed genes. Do not rely exclusively on log-ratios.
4. Multiple testing must be considered for false positives; adjust your p-values.
5. Talk to a biostatistician before doing the experiments! They too have a family to feed thanks to your work!
Bengtsson, H. Analysis of Microarray Data.
Brown, S. (2009). Microarray Data Analysis. Retrieved September 8, 2011, from http://www.docstoc.com/docs/5822653/Microarray-Data-Analysis
Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31:e15.
Wit, E. The Use of Statistics in Microarray Studies. http://www.stats.gla.ac.uk/~microarray
Wikipedia. MA plots. Retrieved September 8, 2011, from http://en.wikipedia.org/wiki/MA_plot