Microarray Design with an Illumina focus Andy Lynch 23/07/08

Overview

• The BeadArray Technology

• Sources of variance

• Bead-level data

• Prior information

• Specific experiment types

• Reasons for choosing ‘sub-optimal’ designs

• Closing thoughts

The Technology

The Bead

Each silica bead is 3 microns in diameter

700,000 copies of the same probe sequence are covalently attached to each bead for hybridisation & decoding

BEAD AACGTATACGACTATCGTGTACAGTATAGC

bases used to identify the bead-type

50 bases that target the RNA (for example) of interest

UUGCAUAUGCUGAUAGCACAUGUCAUAUCG

Complementary RNA with dye attached
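The pairing drawn on this slide can be checked base by base. The following is a minimal Python sketch; the function name `rna_complement` is illustrative only and does not come from any Illumina software. It pairs the bases in the left-to-right order used on the slide (it does not reverse the sequence).

```python
# Map each DNA base on the probe to the RNA base that pairs with it.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def rna_complement(dna: str) -> str:
    """Return the base-by-base RNA complement, in the same order as the input."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in dna)


probe = "AACGTATACGACTATCGTGTACAGTATAGC"  # sequence shown attached to the bead
target = rna_complement(probe)            # labelled RNA expected to hybridise
print(target)
```

The printed string matches the RNA sequence shown beneath the probe on the slide.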

Human expression beadchips

Human expression beadchips

HumanRef-8
– 8 parallel arrays on the chip
– Each array has ~24,000 'high-quality' RefSeq-derived probes
– Approx. 30 copies of each bead type

HumanWG-6 V1
– 6 parallel arrays on the chip, each consisting of 2 parallel strips
– Strip 1 has the ~24,000 RefSeq-derived probes
– Strip 2 has ~24,000 other probes (some RefSeq-derived)
– Approx. 30 copies of each bead type

Human expression beadchips

HumanWG-6 V2, V3
– 6 parallel arrays on the chip, each consisting of 2 parallel strips
– Each strip has ~48,000 probes
– Approx. 30 copies of each bead type

HumanHT-12
– 12 parallel arrays on the chip, each consisting of 1 strip
– Each strip has ~48,000 probes*
– Fewer copies (?~15) of each bead type

Control beads

Many negative controls (~1000, depending on chip-type), each with replicates

Some house-keeping, biotin, and “high stringency” controls

Labelling controls (may not be used)

Some perfect-match/mis-match pairs (useless in HumanWG-6 V3)

Some general hybridization controls

SAMS

SAMS

Each array on the end of a fibre-optic cable

96 arrays in a module

Each array has about 1500 probe-types, with about 30 replicates of each

Used for specialist probe panels; can be custom made

Often used for two-colour work

Used for genotyping, allele specific expression, methylation, expression (esp with poor quality RNA) and microRNAs

The process

Beads are allocated at random to the wells
– Presumed independently

Address sequences are used to identify the beads
– Some beads will fail to be identified
– Presume this is independent of bead-type

Array rejected if not all bead-types are present in suitable numbers
– Applies to HumanWG-6 and HumanRef-8
– At least 5 replicates on the array?
– Seems to have at least one bead on each strip of two-strip arrays

Sample hybridized to array

Can either return “bead-level” intensities/locations or Illumina summaries

Illumina Summaries

For each bead type…

…on the original (i.e. non-logged) scale…

… outliers are removed (>3 MAD from the median)
… number of beads is reported
… mean intensity
… s.e. of intensity
… p-value for comparison with negative controls
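The per-bead-type summary described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Illumina's implementation: the function name `summarise_bead_type` is hypothetical, the scaling of the MAD by 1.4826 is an assumption (pass `mad_scale=1.0` for the raw MAD), and the detection p-value against the negative controls is not reproduced.

```python
import math
import statistics


def summarise_bead_type(intensities, n_mads=3.0, mad_scale=1.4826):
    """Summarise the beads of one bead type on the original (unlogged) scale.

    mad_scale=1.4826 is an assumption of this sketch, not a documented setting.
    """
    med = statistics.median(intensities)
    mad = mad_scale * statistics.median(abs(x - med) for x in intensities)
    # Drop beads lying more than n_mads MADs from the median.
    kept = [x for x in intensities if abs(x - med) <= n_mads * mad]
    n = len(kept)
    mean = statistics.fmean(kept)
    se = statistics.stdev(kept) / math.sqrt(n) if n > 1 else float("nan")
    return {"n_beads": n, "mean": mean, "se": se}


# Nine hypothetical bead intensities, one of them an obvious outlier.
beads = [100, 104, 98, 101, 97, 103, 99, 102, 500]
summary = summarise_bead_type(beads)
print(summary)
```

In this made-up example the bead with intensity 500 is discarded, so the summary is based on the remaining eight beads.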

Illumina Summaries

For two-colour platforms we may wish to then calculate…

… log-ratio (log(R/G))
… beta (R/(R+G))
… sum (R+G)
… theta (2*arctan(R/G)/π)

However we can’t get very good estimates of the confidence in these values since the covariance of the red and the green signals is not reported in the summary information.
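A small Python sketch may make the covariance point concrete. The function names are illustrative, base-2 logs are an assumption (the slide only says "log"), and the intensities are made up. With bead-level pairs the standard error of the mean log-ratio can be computed directly; with only per-channel variances the covariance term has to be ignored.

```python
import math
import statistics


def two_colour_summaries(r, g):
    """Log-ratio, beta, sum and theta for one red/green intensity pair."""
    return {
        "log_ratio": math.log2(r / g),  # base 2 is an assumption of this sketch
        "beta": r / (r + g),
        "sum": r + g,
        "theta": 2 * math.atan(r / g) / math.pi,
    }


def se_mean_log_ratio(red, green):
    """SE of the mean log-ratio from paired bead-level values (covariance included)."""
    diffs = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    return statistics.stdev(diffs) / math.sqrt(len(diffs))


def se_ignoring_covariance(red, green):
    """SE built only from the two per-channel variances, as summary data would force."""
    var_r = statistics.variance([math.log2(r) for r in red])
    var_g = statistics.variance([math.log2(g) for g in green])
    return math.sqrt((var_r + var_g) / len(red))


# Hypothetical beads whose red and green signals move together exactly.
red = [1000, 2000, 4000, 8000]
green = [500, 1000, 2000, 4000]
print(two_colour_summaries(1000, 1000))
print(se_mean_log_ratio(red, green), se_ignoring_covariance(red, green))
```

Because the two channels are perfectly correlated in this toy example, the paired standard error is essentially zero while the version that ignores the covariance is large.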

Substructures

Array

Strip

Strip segment

The strips that make up one or half of one array themselves consist of 9 sub-sections (segments)

Probably shouldn't treat an array as 18 technical replicates, but need to be aware of the issue

Substructures

The 96 arrays in a SAM are arranged in a 12x8 layout

Each individual array consists of an approximate hexagon of 49,777 beads arranged in 547 hexagons of 91 beads

[Figure: hexagonal array of 547 sub-units, each of 91 beads; dimensions labelled 14, 14 and 27]
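The bead counts quoted for a SAM array are consistent with nested hexagons, which can be checked with a short Python sketch. The reading of the figure labels is an assumption: 14 is taken as the number of sub-units along each edge of the large hexagon and 27 as its widest row, and an edge length of 6 beads for each sub-unit is inferred from the count of 91 rather than stated on the slide.

```python
def centred_hexagonal(side: int) -> int:
    """Number of points in a hexagon with `side` points along each edge."""
    return 3 * side * (side - 1) + 1


beads_per_subunit = centred_hexagonal(6)    # assumed edge of 6 beads
subunits_per_array = centred_hexagonal(14)  # assumed edge of 14 sub-units
beads_per_array = beads_per_subunit * subunits_per_array
widest_row = 2 * 14 - 1                     # sub-units across the middle

print(beads_per_subunit, subunits_per_array, beads_per_array, widest_row)
```

Under these assumptions the four numbers reproduce the 91, 547, 49,777 and 27 given on the slide.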

Sources of variation

Differences between probes

• Not all probes are equally well designed
• There are thermodynamic differences between probes
• The additional probes on the HumanWG-6 arrays are a priori less likely to see expression
• Some probes contain SNPs, mismatches, splice junctions etc.
• Some probes target the 3’ end of a gene, some the 5’ end
• Some probes have multiple matches in the transcriptome, others have no good match

Sources of variation

• Variation enters at many levels

bead < probe < strip < array < chip

• Random numbers of beads mean that some arrays provide more evidence than others

Sources of variation

• Differences between chips (as expected)

• Gradients within chips (widely reported)
– known that there is a between-array gradient
– also a perpendicular (along-array) gradient in many chips (not observable with summary data)

• Quality of final array on chip has been questioned on occasion

• Differences between strips
– not surprising given the gradient
– not observable with summary data

Negative control intensities

Positive control intensities

Bead-level data

Bead-Level Data

• As an alternative to the summary data
– can obtain bead-level data,
– or the raw images and a list of bead locations and identities

• Need to adjust the scanner settings to achieve this

• The beadarray Bioconductor package is available to handle the data

Bead-Level Advantages

• Can perform better quality control
• Can rescue arrays/strips that might otherwise need to be discarded

Bead-Level Advantages

• Can separate the two strips
– Either normalize them while combining
– Or take two technical replicates

• Can analyse the data on the scale of our choice
– Usually log
– Includes outlier removal

• For two-colour arrays, can calculate standard errors of beta, theta etc.

Eliciting priors

Eliciting prior information

• Default ‘LIMMA’ analysis returning the log-odds of being differentially expressed essentially assumes the same prior probability for every probe

• Certainly with the HumanWG-6 the RefSeq and non-RefSeq probes would have different a priori odds

• May wish to elicit more specific priors, but can’t get 48,000!

• Priors by pathway?
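One way to act on group-specific priors is to shift the reported log-odds by the change in prior log-odds, since posterior odds are the prior odds multiplied by a likelihood term. The Python sketch below treats that likelihood term as fixed, which is only an approximation because the prior proportion can also enter the fitting itself. The function names and the example prior values are hypothetical; 0.01 is assumed as the proportion used in the original analysis (limma's documented default for `eBayes`).

```python
import math


def logit(p: float) -> float:
    """Natural-log odds of a probability."""
    return math.log(p / (1 - p))


def shift_log_odds(log_odds: float, assumed_prior: float, new_prior: float) -> float:
    """Re-express posterior log-odds under a different prior probability of DE.

    Assumes the likelihood contribution is unchanged by the choice of prior.
    """
    return log_odds - logit(assumed_prior) + logit(new_prior)


# Hypothetical priors: RefSeq probes more likely to be DE than the others.
reported = 0.0  # log-odds reported under a common prior of 0.01
refseq = shift_log_odds(reported, assumed_prior=0.01, new_prior=0.05)
non_refseq = shift_log_odds(reported, assumed_prior=0.01, new_prior=0.002)
print(refseq, non_refseq)
```

The same shift could be applied per pathway, which needs far fewer elicited values than one prior per probe.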

Eliciting prior information

• While we are about it, can try to gauge
– Which contrasts are more important?
– Which ‘treatments’ are expected to be similar?

Summary

• Not all arrays will provide equal amounts of evidence
– Numbers of beads will vary from chip to chip

• Some 'arrays' may provide no evidence for certain probe types
– In HumanHT-12 this is a 'feature'
– In HumanWG-6 V2/V3 may result from treating the two strips as technical replicates
– May result from excising part of the array in quality control

• Block designs required
– may need to consider blocks of 6, 8, or 12

• Need to know if we will have raw or summarized data
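Treating each chip as a block of 6, 8 or 12 arrays, a balanced allocation can be sketched as follows. This is a minimal illustration, not a recommended design procedure: the function name `blocked_layout`, the sample labels and the round-robin dealing are all assumptions of the sketch, and it ignores practical constraints such as sample availability.

```python
import random
from collections import Counter, defaultdict


def blocked_layout(samples, arrays_per_chip, seed=0):
    """Deal (sample_id, treatment) pairs onto chips, spreading treatments evenly.

    Returns a list of chips, each a list of sample ids in array order.
    """
    rng = random.Random(seed)
    by_treatment = defaultdict(list)
    for sample_id, treatment in samples:
        by_treatment[treatment].append(sample_id)
    for ids in by_treatment.values():
        rng.shuffle(ids)  # random order within each treatment
    queues = list(by_treatment.values())
    dealt = []
    while any(queues):  # round-robin across treatments
        for queue in queues:
            if queue:
                dealt.append(queue.pop())
    chips = [dealt[i:i + arrays_per_chip] for i in range(0, len(dealt), arrays_per_chip)]
    for chip in chips:
        rng.shuffle(chip)  # randomise array position within the chip
    return chips


# Hypothetical experiment: 3 treatments x 4 replicates on 6-array chips.
samples = [(f"{t}{i}", t) for t in "ABC" for i in range(1, 5)]
chips = blocked_layout(samples, arrays_per_chip=6)
for number, chip in enumerate(chips, start=1):
    print(number, chip, Counter(label[0] for label in chip))
```

Each chip receives two arrays per treatment here, so a chip effect is not confounded with any treatment.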

First design question

• If using Illumina for expression, which array to use?

• The 6 has extra probes (but these are just as likely to hinder) and is expensive

• The 8 only has good quality probes, is cheaper, but lacks some probes on the 6

• The 12 is cheapest, but risks having no or few beads for some probes
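The risk of having no or few beads can be roughed out with a Poisson model. This Python sketch rests on assumptions that are not in the slides: bead counts per bead type are treated as Poisson with the quoted mean number of copies, and "few" is taken as fewer than 5 beads, borrowing the tentative threshold mentioned for array rejection. The ~15 copies figure is itself uncertain in the source.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean); returns 0.0 for k < 0."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


def prob_fewer_than(threshold: int, mean_copies: float) -> float:
    """Probability that a bead type has fewer than `threshold` beads on an array."""
    return poisson_cdf(threshold - 1, mean_copies)


N_BEAD_TYPES = 48_000  # approximate probe count quoted for the larger arrays

for mean_copies in (30.0, 15.0):
    p_none = prob_fewer_than(1, mean_copies)  # no beads at all
    p_few = prob_fewer_than(5, mean_copies)   # fewer than 5 beads
    print(f"mean {mean_copies}: P(no beads) = {p_none:.2e}, "
          f"P(<5 beads) = {p_few:.2e}, "
          f"expected bead types with <5 beads = {N_BEAD_TYPES * p_few:.2f}")
```

Under this model, halving the mean number of copies raises the chance of a poorly represented bead type by several orders of magnitude.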

Some specific types of experiment

Platform Comparison Studies

• E.g. MAQC (Nature Biotechnology, 2006, 24, 1140-1150)

• How do you decide on the number of arrays to compare?
• How do you choose an analysis method that isn’t biased towards one of the platforms?

Platform Evaluation

• How do we determine absolutely the performance of a platform?

• Titration series? (e.g. BMC Bioinformatics, 2006, 7, 511)
– What levels of dilution?

• Spiked-in probes? (e.g. Affymetrix Latin Square data for expression algorithm assessment 2001)
– How many and at what levels?

Logical experiments

• Often want to find genes that show up with one treatment but not another

• Extreme example is identification of siRNA off-targets, as in Nature Methods (2006) 3, 199-204

• They had 4 siRNAs with the same target and replicates for each.

• The question is: which genes are differentially expressed by only one siRNA?

• Need to weigh up number of alternative treatments, FPR, FNR, and number of biological replicates
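The trade-off between the number of treatments, FPR and FNR can be explored with a small calculation. This Python sketch assumes that calls for different siRNAs are independent and that every siRNA shares the same FPR and FNR; the function name and the rates used are illustrative, not taken from the cited study.

```python
import math


def prob_exactly_one_call(truly_de, fpr, fnr):
    """Probability that a gene is called DE for exactly one treatment.

    truly_de: one boolean per treatment, True where the gene really responds.
    """
    p_call = [(1 - fnr) if de else fpr for de in truly_de]
    total = 0.0
    for i, p_i in enumerate(p_call):
        others = math.prod(1 - p_j for j, p_j in enumerate(p_call) if j != i)
        total += p_i * others
    return total


fpr, fnr = 0.05, 0.20  # hypothetical per-treatment error rates
unaffected = prob_exactly_one_call([False] * 4, fpr, fnr)
on_target = prob_exactly_one_call([True] * 4, fpr, fnr)
off_target = prob_exactly_one_call([True, False, False, False], fpr, fnr)
print(unaffected, on_target, off_target)
```

With these made-up rates a completely unaffected gene looks like a single-siRNA hit about 17% of the time, which shows why the per-treatment FPR has to fall as more treatments are compared.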

Time Series

• Choice of time points

• Replicate the same time points or intervening ones?

• Control series?
– Same time points?

• Cell cycle?

Reasons for departing from the theoretically optimal design

Robustness

• Quite common to design experiments to be robust to losing a single array

• Now, may need to be robust to losing a chip

• In SAM experiments, may need to be robust to losing the edge rows and columns.

• Can cause tension if there is a shortage of samples for some treatments

Validation

• May want to sacrifice the ability to estimate our quantity of interest in order to be able to evaluate performance

• For classifications such as CNV calls might want to include a series of many replicates

• Can estimate false calling rates by analysis of the consistency of calls within the replicates
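As an illustration of how replicate consistency can yield an error rate, the Python sketch below uses a deliberately simple model. The assumptions are loud ones: calls are binary (e.g. CNV present or absent), each replicate miscalls independently with the same probability e regardless of the truth, so two replicates disagree with probability d = 2e(1 - e). The function names and the example calls are hypothetical.

```python
import math
from itertools import combinations


def pairwise_discordance(calls):
    """Fraction of disagreeing calls over all pairs of replicates.

    calls: list of equal-length call vectors, one per replicate.
    """
    pairs = list(combinations(calls, 2))
    disagreements = sum(a != b for x, y in pairs for a, b in zip(x, y))
    comparisons = sum(len(x) for x, _ in pairs)
    return disagreements / comparisons


def error_rate_from_discordance(d: float) -> float:
    """Solve d = 2e(1 - e) for the smaller root e."""
    if not 0 <= d <= 0.5:
        raise ValueError("discordance must lie between 0 and 0.5 under this model")
    return (1 - math.sqrt(1 - 2 * d)) / 2


# Three hypothetical replicates called at four loci.
replicate_calls = [[1, 0, 1, 1], [1, 0, 0, 1], [1, 0, 1, 1]]
d = pairwise_discordance(replicate_calls)
print(d, error_rate_from_discordance(d))
```

Separate false-positive and false-negative rates would need a richer model than this symmetric one.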

Validation

• Some genomic information (SNPs, CNVs etc.) we expect to be inherited at a certain rate.

• Inclusion of pedigrees can allow estimation of inheritance rates

• Discrepancies between the expected and observed rates can allow for estimation of the false calling rates

Validation

• The gold standard of validation is to use a lower-throughput, high-performance technology such as RT-PCR

• Expensive to do, can only validate a small subset of probes

• Need to choose which ones

• Need to decide how many

• The more validation assays we anticipate running, the fewer microarrays we can have

Other…

• May wish to include arrays that
– allow for ongoing QC of the microarray facility
– gain information to facilitate planning future experiments
– ‘complete’ the data set for future data mining

Closing thoughts

Thoughts

• If we are concerned about the block effects, we might want to construct log-ratios within chips

• Can even split the two strips

• If we could successfully control for block effects and batch effects then sequential designs would potentially play a role
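The idea of constructing log-ratios within chips can be illustrated with a toy Python sketch. It assumes the chip (block) effect is additive on the log scale and that each chip carries a reference array; the function name, the array labels and the numbers are all hypothetical.

```python
def within_chip_log_ratios(chips, reference="ref"):
    """Subtract the reference array's log2 intensity from the others on the same chip.

    chips: {chip_id: {array_label: log2_intensity}} for a single probe.
    """
    return {
        chip_id: {
            label: value - arrays[reference]
            for label, value in arrays.items()
            if label != reference
        }
        for chip_id, arrays in chips.items()
    }


# Same biology on both chips, but chip effects of +0.5 and -0.3 on the log2 scale.
chips = {
    "chip1": {"ref": 8.0 + 0.5, "treated": 9.0 + 0.5},
    "chip2": {"ref": 8.0 - 0.3, "treated": 9.0 - 0.3},
}
ratios = within_chip_log_ratios(chips)
print(ratios)
```

The additive chip effect cancels, so both chips give a log-ratio of about 1 despite their different raw intensities.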

[Figure: three diagrams, each showing six nodes labelled A to F]

Acknowledgements

Thanks to:

Mark Dunning, Matt Ritchie, Nat Thorne for slides

Illumina for some of the pictures

Ian Mills, Charlie Massie, Mahesh Iddawela for some of the illustrative data