
Low-Level Analysis and QC: Regional Biases
Mark Reimers, NCI




Outline

• Regional biases on spotted arrays
  - Relation to background
  - Measures of bias
• Affy technical variation measures
  - Dynamic range
  - RNA degradation
• Regional biases on Affymetrix arrays
  - Using bias.display and affyPLM for QC


The Quality Issue

• Frequent outliers in experiments
• Lack of agreement between labs
• The hybridization process is complex and cannot be observed directly
• Many factors cannot be optimized for all reactions
• Statistical QC tools aim to make subtle but pervasive effects visible


What Are Regional Biases?

• Regions where all genes give consistently higher readings in one dye than other regions do, or than the same region does on other slides
• Most spots in images are relatively dark
  - A biased region may not appear brighter in either dye
  - Biases are not obvious by image inspection
• Barazsi et al (2003) and Qian et al (2003) identified high correlation between nearby probes in the Spellman cell-cycle data and other data sets
• Workman et al (2002) and Colantuoni et al (SNOMAD, 2003) identified regional biases in cDNA arrays by fitting loess surfaces to ratios across each slide


Visualizing Bias by Ratios


Displaying the ratio for each spot at constant brightness makes biases easier to see

Some slides show bias toward one color in some areas


A Common Standard

• Expression ratios vary from spot to spot
  - This makes patterns harder to see
• A series of experiments on a single tissue often uses a common reference
  - Construct average ratios (tissue-typical ratios?)
• More informative image: spot ratios compared to the typical ratio for that spot across all slides


Common Reference Highlights Difference

Red/Green ratios show variation

Ratios of each slide's ratios to the ratios on the standard show less variation


Visualizing Bias Using a Standard

Ratio of ratios shows much clearer concentration of red spots on some slides

Note non-random but highly irregular concentration of red


Bias and Background

• We observe that local background contributes to bias
• Does subtracting background remove bias?
• Local off-spot background may not be the best estimate of spot background (non-specific hybridization)

Panels: Spots | BG subtracted


Bias and Background (2)

Raw spot ratios show a mild bias relative to the average. After subtracting a high green background in the center, a red bias results.
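To see how subtracting a high green background can create a red bias, here is a small arithmetic sketch. The intensities are invented for illustration, not taken from the slides, and the function name is hypothetical. The same background shift moves a dim spot's log ratio far more than a bright spot's, which matters because most spots are relatively dark.

```python
import math

def log2_ratio(red, green, red_bg=0.0, green_bg=0.0):
    """log2(red/green) after subtracting local background from each channel."""
    return math.log2((red - red_bg) / (green - green_bg))

# Hypothetical spots with equal red and green signal, and a local
# background that is much higher in green (100 red vs 400 green).
raw = log2_ratio(1000, 1000)
dim = log2_ratio(1000, 1000, red_bg=100, green_bg=400)       # 900 / 600
bright = log2_ratio(10000, 10000, red_bg=100, green_bg=400)  # 9900 / 9600
print(f"raw: {raw:.3f}  dim: {dim:.3f}  bright: {bright:.3f}")
```

The balanced dim spot ends up looking 1.5-fold red after background subtraction, while the bright spot barely moves.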


Other Bias Patterns

This spotted oligo array shows strong biases at the beginning and end of each print-tip group

The background shows a milder version of this effect

Subtracting background removes some regional biases while adding bias in other regions

Panels: Processed | Raw Spot | Background


How to Measure Regional Biases?

• Correlation between neighboring probes:

  $$ r = \operatorname{Cor}\left( r_{i,j},\ \tfrac{1}{4}\left( r_{i-1,j} + r_{i+1,j} + r_{i,j-1} + r_{i,j+1} \right) \right) $$

  where $r_{i,j}$ is the log ratio relative to the standard at row $i$, column $j$
• Red-green ratios: r ~ 0.05-0.1
• Ratio to average: r ~ 0.1-0.3
• For some slides r > 0.5
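A minimal NumPy sketch of the neighbor-correlation measure above. It assumes the log ratios relative to the standard are already arranged in a 2-D array matching the spot layout, with no missing values; edge spots, which lack four neighbors, are simply skipped. The function name is hypothetical.

```python
import numpy as np

def neighbor_correlation(r):
    """Correlation between each interior spot's log ratio r[i, j] and the
    mean of its four nearest neighbors (up, down, left, right).

    r: 2-D array of log ratios relative to the standard, in slide layout.
    """
    r = np.asarray(r, dtype=float)
    center = r[1:-1, 1:-1]
    neighbors = (r[:-2, 1:-1]    # row i-1
                 + r[2:, 1:-1]   # row i+1
                 + r[1:-1, :-2]  # column j-1
                 + r[1:-1, 2:]   # column j+1
                 ) / 4.0
    return np.corrcoef(center.ravel(), neighbors.ravel())[0, 1]
```

Spatially independent noise gives a value near 0, while a smooth gradient across the slide drives it toward 1; the ranges quoted above give a rough sense of what is typical.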


Regional Bias Affects Analysis

• A major source of false positives for single slides
  - In some slides, half of the apparently most up-regulated genes come from 10% of the slide area
• In replicated experimental samples, regional bias increases variance, producing false negatives
• In clinical samples, regional bias seriously distorts exploratory procedures such as clustering


Visualizing Other QC Measures

A heat plot of signal/SD ratios shows clearly that some slides and regions are better than others

One persistently bad region in a batch was printed poorly

S/N ratio

Low S/N implies less reliable ratios


Prospects for Normalization

• Try to fit a smooth (loess) surface to the ratios to estimate bias
  - Workman (2002) finds modest (20%) improvements in replicates’ variance
  - Colantuoni (2003) finds moderate improvements
  - Qian et al (2004) find that SNOMAD does not remove a majority of the correlation between neighboring probes
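The cited work fits loess surfaces. As a simpler stand-in, the sketch below estimates a smooth spatial trend with a separable Gaussian kernel smoother and subtracts it. This is not loess and not the SNOMAD implementation; the bandwidth (in spots) and the function names are assumptions. It removes broad gradients, but a sharp-edged or irregular region is only partly absorbed by any smooth surface.

```python
import numpy as np

def _gaussian_smooth(values, bandwidth):
    """Separable Gaussian smoothing of a 2-D grid, one axis at a time."""
    # Keep the kernel no longer than the grid so mode="same" preserves shape.
    half = min(int(3 * bandwidth), (min(values.shape) - 1) // 2)
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / bandwidth) ** 2)

    def smooth_1d(v):
        return np.convolve(v, kernel, mode="same")

    out = np.apply_along_axis(smooth_1d, 0, values)
    return np.apply_along_axis(smooth_1d, 1, out)

def smooth_bias_surface(log_ratios, bandwidth=3.0):
    """Estimate a smooth spatial trend in the log ratios.

    Dividing by the smoothed weights keeps the estimate from being pulled
    toward zero near the slide edges, where part of the kernel falls off.
    """
    log_ratios = np.asarray(log_ratios, dtype=float)
    weights = _gaussian_smooth(np.ones_like(log_ratios), bandwidth)
    return _gaussian_smooth(log_ratios, bandwidth) / weights

def remove_smooth_bias(log_ratios, bandwidth=3.0):
    """Subtract the estimated smooth surface from the log ratios."""
    log_ratios = np.asarray(log_ratios, dtype=float)
    return log_ratios - smooth_bias_surface(log_ratios, bandwidth)
```

A larger bandwidth follows only broad gradients; a smaller one tracks local patches but starts absorbing genuine spot-to-spot differences.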


Prospects for Normalization (2)

• Are ratios described well by a smooth gradient?
  - Irregular regions are common
  - Short-range effects
• Poor prospects for normalization by smoothing


Regional Bias on Affy Chips


Current Quality Measures

• RNA quality
  - Gel or BioAnalyzer
• Affymetrix Microarray Suite:
  - 3’/5’ ratios: process of reverse transcription
  - Scaling factor: labeling efficiency (and total RNA)
  - Per cent present calls
  - PM/MM ratios: specificity of hybridization, which varies with the stringency of the wash solution


Types of Problems Left Undetected

• Local artifacts: scratches, smudges
• Regional bias: large regions shifted
• Hybridization differences causing differences in dynamic range
• Small differences in RNA degradation


Three Variables

• RNA quality
  - RNA degrades rapidly in intact samples
  - cRNA production may be variable
• Hybridization conditions
  - Temperature, salinity
• Defects or uneven conditions on the chip
  - Bubbles spend more time in some places, leading to regional biases


RNA Degradation Plot

• MAS5.0 displays 3’/5’ ratios for selected genes
• The degradation plot displays the relative signal at each probe position, from the 5’ to the 3’ end of the probe-set sequence
• AffyRNAdeg function in the affy package of Bioconductor
• Home-crafted plotting function
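A sketch of the idea behind a degradation plot, not a port of AffyRNAdeg: average the log2 PM intensity at each probe position across probe sets, after centering each probe set so its expression level does not dominate, then summarize the trend by a least-squares slope. The input layout (one array of PM intensities per probe set, ordered 5’ to 3’) and the function name are assumptions. Because reverse transcription is usually primed from the poly-A tail at the 3’ end, a steeper rise toward the 3’ end is usually read as more degradation.

```python
import numpy as np

def degradation_profile(probe_sets, n_positions=None):
    """Mean centered log2 PM intensity at each probe position (5' -> 3').

    probe_sets: list of 1-D arrays of PM intensities, each ordered 5' to 3'.
    Probe sets are truncated to the shortest length unless n_positions is
    given. Returns (profile, slope of profile against position).
    """
    if n_positions is None:
        n_positions = min(len(ps) for ps in probe_sets)
    logs = np.log2(np.array([np.asarray(ps[:n_positions], dtype=float)
                             for ps in probe_sets]))
    # Center each probe set so highly and weakly expressed genes weigh equally.
    logs -= logs.mean(axis=1, keepdims=True)
    profile = logs.mean(axis=0)
    slope = np.polyfit(np.arange(n_positions), profile, 1)[0]
    return profile, slope
```

Comparing slopes across chips processed the same way is more informative than any single slope on its own.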


Amplified RNA Degradation Plot

• Doubly amplified cRNA
• Fairly even
• No great discrepancies


Hybridization Conditions

• Variation in the thermodynamics of hybridization affects:
  - Background
  - Ratios of PM to MM
  - Specificity of hybridization
  - Distribution of signals from probes
• Each of these can be investigated
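One way to investigate PM-to-MM ratios per chip, sketched in NumPy: the median log2(PM/MM) and the fraction of probe pairs in which PM exceeds MM. The (probes × chips) layout, both summary choices, and the function name are assumptions, not MAS5 definitions. A chip whose values sit well below those of comparable chips may have had less specific hybridization.

```python
import numpy as np

def pm_mm_summary(pm, mm):
    """Per-chip summaries of perfect-match vs mismatch intensities.

    pm, mm: arrays of shape (n_probes, n_chips) with paired intensities.
    Returns (median log2(PM/MM) per chip, fraction of pairs with PM > MM).
    """
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    log_ratio = np.log2(pm) - np.log2(mm)
    return np.median(log_ratio, axis=0), (pm > mm).mean(axis=0)
```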


Visualizing Probe Distribution

• Either as a signal distribution (a log scale works best) or as ratios
• Ratios:
  - Construct a reference standard: average each probe over all chips (20% trimmed mean)
  - A log scale works best: subtract the log standard from the log probe signals
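A sketch of the reference-standard construction: a per-probe 20% trimmed mean across chips, the log ratio of every probe on every chip to that standard, and per-chip quantiles of those ratios as a compact view of each chip's probe distribution. It assumes the input is already log2-transformed and trims on that scale; the function names are hypothetical. Trimming drops floor(0.2 × number of chips) values from each end before averaging, the usual trimmed-mean convention (as in R's mean(x, trim = 0.2)).

```python
import numpy as np

def trimmed_mean_standard(log_intensities, trim=0.2):
    """'Virtual' standard chip: per-probe trimmed mean across chips.

    log_intensities: array (n_probes, n_chips) of log2 probe intensities.
    """
    x = np.sort(np.asarray(log_intensities, dtype=float), axis=1)
    k = int(trim * x.shape[1])  # values dropped from each end
    return x[:, k:x.shape[1] - k].mean(axis=1)

def probe_log_ratios(log_intensities, trim=0.2):
    """Log ratio of every probe on every chip to the standard chip."""
    x = np.asarray(log_intensities, dtype=float)
    return x - trimmed_mean_standard(x, trim)[:, None]

def ratio_quantiles(log_ratios, probs=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """Per-chip quantiles of the probe log ratios (one column per chip)."""
    return np.quantile(log_ratios, probs, axis=0)
```

Because the extremes are discarded, a probe that is very high on one chip barely moves the standard, so that chip's ratio for the probe stands out instead of being absorbed.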


Effects of Distribution Changes

Panels: MDS Plot of Chips | Distribution of Probe Ratios

90122, 90123, 90124, 97444 are replicates
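The slide does not say what the MDS plot was computed from. The sketch below assumes each chip is described by a feature vector (for instance, quantiles of its probe log ratios) and applies classical (Torgerson) MDS to the Euclidean distances between chips; the function name is hypothetical. Replicates such as 90122, 90123, 90124 and 97444 would be expected to land near one another.

```python
import numpy as np

def classical_mds(features, n_components=2):
    """Classical (Torgerson) MDS of chips.

    features: array (n_chips, n_features). Returns (n_chips, n_components)
    coordinates whose distances approximate those between feature rows.
    """
    X = np.asarray(features, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n                      # centering matrix
    B = -0.5 * J @ d2 @ J                                     # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]          # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))
```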


Local Artifacts, Regional Bias

• Workman et al (2003) identified artifacts by displaying the raw data image on a log2 scale
  - Not many scars visible: are the chips that good?
  - Running means of (log2) intensity show little bias
• Dynamic range: neighboring probes vary 10X to 100X
  - No obvious reference
• Need to compensate for the large dynamic range


Visualizing Artifacts by Ratio

• Construct a standard (virtual) chip: trimmed (20%) mean of each probe across all chips
  - Roughly estimates the ‘typical’ level
  - Robust: genes highly expressed in only a few samples don’t affect it
• Compute the ratio of each probe on any chip to the corresponding probe on the standard chip


Visualizing Artifacts, Bias


Image of raw data on a log2 scale shows striations but no obvious artifacts

Image of ratios of probes to standard shows a smudge

Non-coding probes


Background and Scale

• For each region: fit a regression line of the probes on this chip against the corresponding probes on the standard
• Intercept and slope may be interpreted as local minimum intensity (background) and sensitivity (scale factor)

Figure annotations: slope ~ 1.4; reference line y = x; background ~ +10
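A sketch of the per-region fit, assuming chip and standard are raw (unlogged) intensities in chip layout and that regions are square tiles of a fixed, hypothetical size; partial tiles at the right and bottom edges are ignored. It uses ordinary least squares via numpy.polyfit, though a robust line fit may be preferable in practice. In the slide's example, such a tile would return a slope near 1.4 and an intercept near +10.

```python
import numpy as np

def regional_bg_scale(chip, standard, block=50):
    """Fit chip = intercept + slope * standard within each block x block tile.

    chip, standard: 2-D arrays of raw probe intensities in chip layout.
    Returns (intercepts, slopes), one value per tile: the intercept is read
    as local background and the slope as local scale.
    """
    chip = np.asarray(chip, dtype=float)
    standard = np.asarray(standard, dtype=float)
    n_r, n_c = chip.shape[0] // block, chip.shape[1] // block
    intercepts = np.empty((n_r, n_c))
    slopes = np.empty((n_r, n_c))
    for a in range(n_r):
        for b in range(n_c):
            rows = slice(a * block, (a + 1) * block)
            cols = slice(b * block, (b + 1) * block)
            slope, intercept = np.polyfit(standard[rows, cols].ravel(),
                                          chip[rows, cols].ravel(), 1)
            intercepts[a, b], slopes[a, b] = intercept, slope
    return intercepts, slopes
```

Displaying the two resulting arrays as images gives separate background and scale maps for the chip.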


Visualizing Bias as BG and Scale



A Good Chip

Probe ratio image shows small (<5%) elevated region

Background plot shows this artifact mostly in background


An Acceptable Chip

Less than 10% of chip area affected in both background and scale


A Bad Chip

Half of this chip shows strong biases in background


Quantifying Bias

• Compute the correlation over the chip between probe log-intensities and the averages of their 4 nearest neighbors
• A typical ‘good’ Affy chip has a correlation of ratios of ~0.2
• Some chips have correlations near 0.8
• Horizontal correlation > vertical correlation
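To check whether horizontal correlation exceeds vertical correlation, the neighbor correlation can be split by direction. This sketch computes lag-one correlations of a probe log-ratio image with each probe's right-hand neighbor and with the probe directly below; the function name is hypothetical and the input is assumed to be in chip layout with no missing values.

```python
import numpy as np

def directional_correlations(log_ratios):
    """Lag-one spatial correlations of a probe log-ratio image.

    Returns (horizontal, vertical): correlation of each probe with its
    right-hand neighbor in the same row, and with the probe directly below.
    """
    r = np.asarray(log_ratios, dtype=float)
    horizontal = np.corrcoef(r[:, :-1].ravel(), r[:, 1:].ravel())[0, 1]
    vertical = np.corrcoef(r[:-1, :].ravel(), r[1:, :].ravel())[0, 1]
    return horizontal, vertical
```

An image with horizontal streaks, where each row shares an offset, produces a high horizontal and a near-zero vertical correlation.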


Does Bias Affect Measures?

• Affymetrix distributes probes across the chip: is this robust?
• Experiment: distort a chip in software
  - 10,000 probes raised 2X
• 4% of genes distorted by > 0.2 in MAS5 (log2 scale)
• 0.2% show distortions > 0.2 by RMA (log2 scale)

Panels: MAS5 | RMA


Bias Affects Measures (II)

Experiment: 50% of probes raised 2X


Consequences for Analysis

• A study with 41 chips founders on quality
• Six groups, color coded in the plot at right
• Several chips seem very atypical for their groups


QC by affyPLM

• Robust Multi-array Average (RMA) fits a linear model to each probe set
• Figure: high residuals shown in green
• High residuals show regional patterns
• Mean residuals are a global indicator of quality
• Available in the affyPLM package at www.bioconductor.org
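RMA summarizes each probe set with Tukey's median polish of log2 intensities (probe effect plus chip effect); affyPLM fits a similar probe-level model by robust regression. The Python sketch below is a simplified stand-in, not affyPLM code: it runs median polish on one probe set's (probes × chips) matrix and reports each chip's mean absolute residual, the kind of global quality indicator described above. The function names are hypothetical.

```python
import numpy as np

def median_polish(y, max_iter=10, tol=1e-8):
    """Tukey median polish of a (probes x chips) matrix of log2 intensities.

    Fits y[i, j] = overall + probe[i] + chip[j] + resid[i, j] and returns
    (overall, probe, chip, resid).
    """
    resid = np.array(y, dtype=float)
    overall = 0.0
    probe = np.zeros(resid.shape[0])
    chip = np.zeros(resid.shape[1])
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row (probe) medians into the probe effects.
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        probe += row_med
        delta = np.median(chip)
        chip -= delta
        overall += delta
        # Sweep column (chip) medians into the chip effects.
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        chip += col_med
        delta = np.median(probe)
        probe -= delta
        overall += delta
        # Stop once the total absolute residual stops changing.
        new_sum = np.abs(resid).sum()
        if abs(new_sum - old_sum) <= tol * new_sum:
            break
        old_sum = new_sum
    return overall, probe, chip, resid

def chip_mean_abs_residual(y):
    """Per-chip mean absolute residual from median polish (higher = worse fit)."""
    *_, resid = median_polish(y)
    return np.abs(resid).mean(axis=0)
```

Averaging these per-chip values over many probe sets, or mapping individual residuals back to probe locations, gives the global and regional views the slide describes.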


Current Affy Pipeline

• Construct a standard chip
  - If there are few samples, add samples of similar tissues
• Compute ratios of probes to the standard
• Compute correlations of ratios
• Examine images
• Decide to accept or reject