Upload
jennessa-cherry
View
55
Download
3
Tags:
Embed Size (px)
DESCRIPTION
Gene Expression Arrays. EPP 245 Statistical Analysis of Laboratory Data. Basic Design of Expression Arrays. For each gene that is a target for the array, we have a known DNA sequence. - PowerPoint PPT Presentation
Citation preview
1
Gene Expression Arrays
EPP 245
Statistical Analysis of
Laboratory Data
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
2
Basic Design of Expression Arrays
• For each gene that is a target for the array, we have a known DNA sequence.
• mRNA is reverse transcribed to DNA, and if a complementary sequence is on the on a chip, the DNA will be more likely to stick
• The DNA is labeled with a dye that will fluoresce and generate a signal that is monotonic in the amount in the sample
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
3
TAAATCGATACGCATTAGTTCGACCTATCGAAGACCCAACACGGATTCGATACGTTAATATGACTACCTGCGCAACCCTAACGTCCATGTATCTAATACGATTTAGCTATGCGTAATCAAGCTGGATAGCTTCTGGGTTGTGCCTAAGCTATGCAATTATACTGATGGACGCGTTGGGATTGCAGGTACATAGATTATGC
Exon Intron
Probe Sequence
• cDNA arrays use variable length probes derived from expressed sequence tags– Spotted and almost always used with two color methods– Can be used in species with an unsequenced genome
• Long oligoarrays use 60-70mers– Agilent two-color arrays– Spotted arrays from UC Davis or elsewhere– Usually use computationally derived probes but can use probes
from sequenced EST’s
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
4
• Affymetrix GeneChips use multiple 25-mers– For each gene, one or more sets of 8-20 distinct
probes – May overlap – May cover more than one exon
• Affymetrix chips also use mismatch (MM) probes that have the same sequence as perfect match probes except for the middle base which is changed to inhibitbinding.
• This is supposed to act as a control, but often instead binds to another mRNA species, so many analysts do not use them
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
5
Probe Design
• A good probe sequence should match the chosen gene or exon from a gene and should not match any other gene in the genome.
• Melting temperature depends on the GC content and should be similar on all probes on an array since the hybridization must be conducted at a single temperature.
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
6
• The affinity of a given piece of DNA for the probe sequence can depend on many things, including secondary and tertiary structure as well as GC content.
• This means that the relationship between the concentration of the RNA species in the original sample and the brightness of the spot on the array can be very different for different probes for the same gene.
• Thus only comparisons of intensity within the same probe across arrays makes sense.
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
7
Affymetrix GeneChips
• For each probe set, there are 8-20 perfect match (PM) probes which may overlap or not and which target the same gene
• There are also mismatch (MM) probes which are supposed to serve as a control, but do so rather badly
• Most of us ignore the MM probes
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
8
Expression Indices
• A key issue with Affymetrix chips is how to summarize the multiple data values on a chip for each probe set (aka gene).
• There have been a large number of suggested methods.
• Generally, the worst ones are those from Affy, by a long way; worse means less able to detect real differences
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
9
Usable Methods
• Li and Wong’s dCHIP and follow on work is demonstrably better than MAS 4.0 and MAS 5.0, but not as good as RMA and GLA
• ArrayAssist can use dCHIP, RMA, gcRMA, and others.
• The GLA method (Durbin, Rocke, Zhou) can be imported into ArrayAssist.
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
10
Steps in Expression Index Construction
• Background correction is the process of adjusting the signals so that the zero point is similar on all parts of all arrays.
• We like to manage this so that zero signal after background correction corresponds approximately to zero amount of the mRNA species that is the target of the probe set.
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
11
• Data transformation is the process of changing the scale of the data so that it is more comparable from high to low.
• Common transformations are the logarithm and generalized logarithm
• Normalization is the process of adjusting for systematic differences from one array to another.
• Normalization may be done before or after transformation, and before or after probe set summarization.
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
12
• One may use only the perfect match (PM) probes, or may subtract or otherwise use the mismatch (MM) probes
• There are many ways to summarize 20 PM probes and 20 MM probes on 10 arrays (total of 200 numbers) into 10 expression index numbers
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
13
The RMA Method
• Background correction that does not make 0 signal correspond to 0 amount
• Quantile normalization makes the overall distribution of intensity values across probes the same on each array
• Log2 transform
• Median polish summary of PM probes
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
14
4.00 6.00 5.00 5.00
8.00 9.00 7.00 8.00
12.00 24.00 12.00 16.00
8.00 13.00 8.00
-1.00 1.00 0.00 0.00
0.00 1.00 -1.00 0.00
-4.00 8.00 -4.00 0.00
-1.67 3.33 -1.67
0.67 -2.33 1.67 0.00
1.67 -2.33 0.67 0.00
-2.33 4.67 -2.33 0.00
0.00 0.00 0.00
Analysis by means
•Remove Row Means•Remove Column Means•Rows and Columns have
mean 0•Influence of an outlier spreads
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
15
4.00 6.00 5.00 5.00
8.00 9.00 7.00 8.00
12.00 24.00 12.00 12.00
8.00 9.00 7.00
-1.00 1.00 0.00 0.00
0.00 1.00 -1.00 0.00
0.00 12.00 0.00 0.00
0.00 1.00 0.00
-1.00 0.00 0.00 0.00
0.00 0.00 -1.00 0.00
0.00 11.00 0.00 0.00
0.00 0.00 0.00
Median Polish
•Remove Row Medians•Remove Column Medians•Rows and Columns may not
have median 0•Outliers contained•May have to be iterated
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
16
Example Probe Set• Using the Affy HG U133 Plus 2.0
GeneChip with 54675 probe sets, from 604258 PM probes.
• Four chips derived from human IR exposed skin at 0, 1, 10, and 100 cGy
• Probe set number 10067/54675 has Affy ID 200618_at
• Gene is LASP1, LIM and SH3 protein 1, LIM protein subfamily, Src homology, actin binding.
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
17
0 1 10 100
200618_at1 360 216 158 198 233.0
200618_at2 313 402 106 103 231.0
200618_at3 130 182 79 91 120.5
200618_at4 351 370 195 136 263.0
200618_at5 164 130 98 107 124.8
200618_at6 223 219 164 196 200.5
200618_at7 437 529 195 158 329.8
200618_at8 509 554 274 128 366.3
200618_at9 522 720 285 198 431.3
200618_at10 668 715 247 260 472.5
200618_at11 306 286 144 159 223.8
362.1 393.0 176.8 157.6
Mean Summarization
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
18
0 1 10 100
200618_at1 2.56 2.33 2.20 2.30 2.35
200618_at2 2.50 2.60 2.03 2.01 2.28
200618_at3 2.11 2.26 1.90 1.96 2.06
200618_at4 2.55 2.57 2.29 2.13 2.38
200618_at5 2.21 2.11 1.99 2.03 2.09
200618_at6 2.35 2.34 2.21 2.29 2.30
200618_at7 2.64 2.72 2.29 2.20 2.46
200618_at8 2.71 2.74 2.44 2.11 2.50
200618_at9 2.72 2.86 2.45 2.30 2.58
200618_at10 2.82 2.85 2.39 2.41 2.62
200618_at11 2.49 2.46 2.16 2.20 2.33
2.51 2.53 2.21 2.18
Mean Summarizationof the Logs
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
19
The GLA Method
• The Glog Average (GLA) method is simpler than the RMA method, though it can require estimation of a parameter
• Background correction is intended to make a measured value of zero correspond to a zero quantity in the sample
• Transformation uses the glog ~ ln for large values
• Normalization via lowess• Summary is a simple average of PM probes
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
20
Probe Sets not Genes
• It is unavoidable to refer to a probe set as measuring a “gene”, but nevertheless it can be deceptive
• The annotation of a probe set may be based on homology with a gene of possibly known function in a different organism
• Only a relatively few probe sets correspond to genes with known function and known structure in the organism being studied
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
21
Two-Color Arrays
• Two-color arrays are designed to account for variability in slides and spots by using two samples on each slide, each labeled with a different dye.
• If a spot is too large, for example, both signals will be too big, and the difference or ratio will eliminate that source of variability
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
22
Dyes
• The most common dye sets are Cy3 (green) and Cy5 (red), which fluoresce at approximately 550 nm and 649 nm respectively (red light ~ 700 nm, green light ~ 550 nm)
• The dyes are excited with lasers at 532 nm (Cy3 green) and 635 nm (Cy5 red)
• The emissions are read via filters using a CCD device
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
23
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
24
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
25
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
26
File Format
• A slide scanned with Axon GenePix produces a file with extension .gpr that contains the results:http://www.axon.com/gn_GenePix_File_Formats.html
• This contains 29 rows of headers followed by 43 columns of data (in our example files)
• For full analysis one may also need a .gal file that describes the layout of the arrays
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
27
"Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia." "F635 Median" "F635 Mean" "F635 SD" "B635 Median" "B635 Mean" "B635 SD" "% > B635+1SD" "% > B635+2SD" "F635 % Sat."
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
28
"F532 Median" "F532 Mean" "F532 SD" "B532 Median" "B532 Mean" "B532 SD" "% > B532+1SD" "% > B532+2SD" "F532 % Sat."
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
29
"Ratio of Medians (635/532)" "Ratio of Means (635/532)" "Median of Ratios (635/532)" "Mean of Ratios (635/532)" "Ratios SD (635/532)""Rgn Ratio (635/532)" "Rgn R² (635/532)" "F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log Ratio (635/532)" "F635 Median - B635""F532 Median - B532" "F635 Mean - B635" "F532 Mean - B532" "Flags"
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
30
Analysis Choices
• Mean or median foreground intensity
• Background corrected or not
• Log transform (base 2, e, or 10) or glog transform
• Log is compatible only with no background correction
• Glog is best with background correction
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
31
Block 1
Column 1
Row 1
Name NM_006182
ID discoidin domain receptor family, member
X 2575
Y 2565
Dia. 85
DDR1
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
32
F635 Median 48
F635 Mean 54
F635 SD 23
B635 Median 34
B635 Mean 36
B635 SD 11
% > B635+1SD 52
% > B635+2SD 36
F635 % Sat. 0
F532 Median 109
F532 Mean 113
F532 SD 26
B532 Median 35
B532 Mean 36
B532 SD 7
% > B532+1SD 100
% > B532+2SD 100
F532 % Sat. 0
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
33
Issues with Two-Color Arrays
• Chips have different overall intensities, so normalization across chips is needed.
• The overall intensity on the red channel may be greater or less than on the green channel, so normalization across dyes is needed.
• The red/green difference is can be different at different intensity levels
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
34
Array normalization
• Array normalization is meant to increase the precision of comparisons by adjusting for variations that cover entire arrays
• Without normalization, the analysis would be valid, but possibly less sensitive
• However, a poor normalization method will be worse than none at all.
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
35
Possible normalization methods
• We can equalize the mean or median intensity by adding or multiplying a correction term
• We can use different normalizations at different intensity levels (intensity-based normalization) for example by lowess or quantiles
• We can normalize for other things such as print tips
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
36
Group 1 Group 2
Array 1 Array 2 Array 3 Array 4
Gene 1 1100 900 425 550
Gene 2 110 95 85 110
Gene 3 80 65 55 80
Example for Normalization
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
37
. list Array Group Gene Expression
+---------------------------------+ | Array Group Gene Expres~n | |---------------------------------| 1. | 1 1 1 1100 | 2. | 2 1 1 900 | 3. | 3 2 1 425 | 4. | 4 2 1 550 | 5. | 1 1 2 110 | |---------------------------------| 6. | 2 1 2 95 | 7. | 3 2 2 85 | 8. | 4 2 2 110 | 9. | 1 1 3 80 | 10. | 2 1 3 65 | |---------------------------------| 11. | 3 2 3 55 | 12. | 4 2 3 80 | +---------------------------------+
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
38
. sort Gene
. by Gene: anova Expression Group
---------------------------------------------------------------------------------> Gene = 1
Number of obs = 4 R-squared = 0.9042 Root MSE = 117.925 Adj R-squared = 0.8564
Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 262656.25 1 262656.25 18.89 0.0491 | Group | 262656.25 1 262656.25 18.89 0.0491 | Residual | 27812.5 2 13906.25 -----------+---------------------------------------------------- Total | 290468.75 3 96822.9167
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
39
-> Gene = 2
Number of obs = 4 R-squared = 0.0556 Root MSE = 14.5774 Adj R-squared = -0.4167
Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 25 1 25 0.12 0.7643 | Group | 25 1 25 0.12 0.7643 | Residual | 425 2 212.5 -----------+---------------------------------------------------- Total | 450 3 150 --------------------------------------------------------------------------------> Gene = 3 Number of obs = 4 R-squared = 0.0556 Root MSE = 14.5774 Adj R-squared = -0.4167
Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 25 1 25 0.12 0.7643 | Group | 25 1 25 0.12 0.7643 | Residual | 425 2 212.5 -----------+---------------------------------------------------- Total | 450 3 150
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
40
Group 1 Group 2
Array 1 Array 2 Array 3 Array 4
Gene 1 975 851 541 608
Gene 2 -15 46 201 168
Gene 3 -45 16 171 138
Additive Normalization by Means
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
41
. mean Expression
. ereturn list
scalars: e(df_r) = 11 e(N_over) = 1 e(N) = 12 e(k_eq) = 1 e(k_eform) = 0
macros: e(cmd) : "mean" e(title) : "Mean estimation" e(estat_cmd) : "estat_vce_only" e(varlist) : "Expression" e(predict) : "_no_predict" e(properties) : "b V"
matrices: e(b) : 1 x 1 e(V) : 1 x 1 e(_N) : 1 x 1 e(error) : 1 x 1
functions: e(sample)
. matrix ExpMeanMat = e(b)
. matlist ExpMeanMat
| Express~n -------------+----------- y1 | 304.5833
. scalar ExpMean = ExpMeanMat[1,1]
. display ExpMean304.58333. anova Expression Array. predict ArrayMean. generate NormExp1=Expression-ArrayMean +ExpMean
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
42
. list Array Group Gene Expression ArrayMean NormExp1
+--------------------------------------------------------+ | Array Group Gene Expres~n ArrayM~n NormExp1 | |--------------------------------------------------------| 1. | 1 1 1 1100 430 974.5833 | 2. | 2 1 1 900 353.3333 851.25 | 3. | 3 2 1 425 188.3333 541.25 | 4. | 4 2 1 550 246.6667 607.9167 | 5. | 1 1 2 110 430 -15.41667 | |--------------------------------------------------------| 6. | 2 1 2 95 353.3333 46.24999 | 7. | 3 2 2 85 188.3333 201.25 | 8. | 4 2 2 110 246.6667 167.9167 | 9. | 1 1 3 80 430 -45.41667 | 10. | 2 1 3 65 353.3333 16.24999 | |--------------------------------------------------------| 11. | 3 2 3 55 188.3333 171.25 | 12. | 4 2 3 80 246.6667 137.9167 | +--------------------------------------------------------+
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
43
. by Gene: anova NormExp1 Group
-------------------------------------------------------------------------------------> Gene = 1
Number of obs = 4 R-squared = 0.9209 Root MSE = 70.0991 Adj R-squared = 0.8814
Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 114469.431 1 114469.431 23.30 0.0403 | Group | 114469.431 1 114469.431 23.30 0.0403 | Residual | 9827.77662 2 4913.88831 -----------+---------------------------------------------------- Total | 124297.207 3 41432.4024
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
44
-> Gene = 2
Number of obs = 4 R-squared = 0.9209 Root MSE = 35.0496 Adj R-squared = 0.8814
Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 28617.3614 1 28617.3614 23.30 0.0403 | Group | 28617.3614 1 28617.3614 23.30 0.0403 | Residual | 2456.9441 2 1228.47205 -----------+---------------------------------------------------- Total | 31074.3055 3 10358.1018
--------------------------------------------------------------------------------------> Gene = 3
Number of obs = 4 R-squared = 0.9209 Root MSE = 35.0496 Adj R-squared = 0.8814
Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 28617.3612 1 28617.3612 23.30 0.0403 | Group | 28617.3612 1 28617.3612 23.30 0.0403 | Residual | 2456.94427 2 1228.47214 -----------+---------------------------------------------------- Total | 31074.3055 3 10358.1018
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
45
Group 1 Group 2
Array 1 Array 2 Array 3 Array 4
Gene 1 779 776 687 679
Gene 2 78 82 137 136
Gene 3 57 56 89 99
Multiplicative Normalization by Means
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
46
. generate NormExp2 = Expression*ExpMean/ArrayMean
. list Array Group Gene Expression ArrayMean NormExp2
+-------------------------------------------------------+ | Array Group Gene Expres~n ArrayM~n NormExp2 | |-------------------------------------------------------| 1. | 1 1 1 1100 430 779.1667 | 2. | 2 1 1 900 353.3333 775.8254 | 3. | 3 2 1 425 188.3333 687.3341 | 4. | 4 2 1 550 246.6667 679.1385 | 5. | 1 1 2 110 430 77.91666 | |-------------------------------------------------------| 6. | 2 1 2 95 353.3333 81.89268 | 7. | 3 2 2 85 188.3333 137.4668 | 8. | 4 2 2 110 246.6667 135.8277 | 9. | 1 1 3 80 430 56.66667 | 10. | 2 1 3 65 353.3333 56.03184 | |-------------------------------------------------------| 11. | 3 2 3 55 188.3333 88.94912 | 12. | 4 2 3 80 246.6667 98.78378 | +-------------------------------------------------------+
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
47
. by Gene: anova NormExp2 Group
---------------------------------------------------------------------------------> Gene = 1
Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 8884.90342 1 8884.90342 453.70 0.0022
---------------------------------------------------------------------------------> Gene = 2
Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 3219.72043 1 3219.72043 696.33 0.0014
---------------------------------------------------------------------------------> Gene = 3
Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 1407.54019 1 1407.54019 57.97 0.0168
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
48
Group 1 Group 2
Array 1 Array 2 Array 3 Array 4
Gene 1 1025 971 512 512
Gene 2 102 102 102 102
Gene 3 75 70 66 74
Multiplicative Normalization by Medians
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
49
. sort Array
. table Array, contents(p50 Expression)
------------------------- Array | med(Expres~n)----------+-------------- 1 | 110 2 | 95 3 | 85 4 | 110-------------------------
. input ArrayMed
ArrayMed 1. 110 2. 110 3. 110 4. 95 5. 95 6. 95 7. 85 8. 85 9. 85 10. 110 11. 110 12. 110
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
50
. summarize Expression, detail
Expression------------------------------------------------------------- Percentiles Smallest 1% 55 55 5% 55 6510% 65 80 Obs 1225% 80 80 Sum of Wgt. 12
50% 102.5 Mean 304.5833 Largest Std. Dev. 363.114475% 487.5 42590% 900 550 Variance 131852.195% 1100 900 Skewness 1.27795499% 1100 1100 Kurtosis 3.132949
. generate NormExp3 = Expression*102.5/ArrayMed
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
51
-> Gene = 1
Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 235735.794 1 235735.794 324.00 0.0031
-------------------------------------------------------------------------------------> Gene = 2
Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 0 1 0
-------------------------------------------------------------------------------------> Gene = 3
Source | Partial SS df MS F Prob > F -----------+---------------------------------------------------- Model | 3.6253006 1 3.6253006 0.17 0.7228
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
52
Intensity-based normalization
• Normalize by means, medians, etc., but do so only in groups of genes with similar expression levels.
• lowess is a procedure that produces a running estimate of the middle, like a robustified mean
• If we subtract the lowess of each array and add the average of the lowess’s, we get the lowess normalization
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
53
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
54
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
55
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
56
Fitting a model to genes
• We can fit a model to the data of each gene after the whole arrays have been background corrected, transformed, and normalized
• Each gene is then test for whether there is differential expression
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
57
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
58
Multiplicity Adjustments
• If we test thousands of genes and pick all the ones which are significant at the 5% level, we will get hundreds of false positives.
• Multiplicity adjustments winnow this down so that the number of false positives is smaller
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
59
Types of Multiplicity Adjustments
• The Bonferroni correction aims to detect no significant genes at all if there are truly none, and guarantees that the chance that any will be detected is less than .05 under these conditions
• Generally, this is too conservative• Less conservative versions include
methods due to Holm, Hochberg, and Benjamini and Hochberg (FDR)
November 9, 2006 EPP 245 Statistical Analysis of Laboratory Data
60