Page 1: Image Analysis Class web site:  Statistics for Microarrays

Image Analysis

Class web site:

http://statwww.epfl.ch/davison/teaching/Microarrays/ETHZ/

Statistics for Microarrays

Page 2: Image Analysis Class web site:  Statistics for Microarrays

Workflow (diagram):

Biological question (differentially expressed genes, sample class prediction, etc.)
  → Experimental design
  → Microarray experiment
  → Image analysis: 16-bit TIFF files → (Rfg, Rbg), (Gfg, Gbg)
  → Normalization: R, G
  → Estimation, Testing, Clustering, Discrimination
  → Biological verification and interpretation

Page 3: Image Analysis Class web site:  Statistics for Microarrays

Microarray Experiment

Page 4: Image Analysis Class web site:  Statistics for Microarrays

Scanner

Laser

PMT

Dye

Glass Slide

Objective Lens

Detector lens

Pinhole

Beam-splitter

Page 5: Image Analysis Class web site:  Statistics for Microarrays

Scanner Process

Laser (excitation) → Dye → Photons → PMT (amplification) → Electrons → A/D converter (filtering, time-space averaging) → Signal

Page 6: Image Analysis Class web site:  Statistics for Microarrays

Yeast genome on a chip

Page 7: Image Analysis Class web site:  Statistics for Microarrays

Quantification of Expression

For each spot on the slide, calculate

Red intensity = Rfg - Rbg

(fg = foreground, bg = background) and

Green intensity = Gfg - Gbg

and combine them in the log (base 2) ratio

Log2(Red/Green)
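The calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the packages discussed here: the function name `log_ratio` and the choice to return `None` when a background-corrected intensity is not positive are assumptions made for the example.

```python
import math

def log_ratio(r_fg, r_bg, g_fg, g_bg):
    """Return log2(Red/Green) for one spot, or None if it is undefined."""
    red = r_fg - r_bg      # background-corrected red intensity
    green = g_fg - g_bg    # background-corrected green intensity
    if red <= 0 or green <= 0:
        # Assumption: the slides do not say how to treat non-positive values.
        return None
    return math.log2(red / green)

print(log_ratio(1200, 200, 700, 200))  # red = 1000, green = 500 -> 1.0
```

A log ratio of 1 means the red channel is twice as intense as the green channel; -1 would mean half.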

Page 8: Image Analysis Class web site:  Statistics for Microarrays

Practical Problems 1

Comet Tails

• Likely caused by insufficiently rapid immersion of the slides in the succinic anhydride blocking solution

Page 9: Image Analysis Class web site:  Statistics for Microarrays

Practical Problems 2

Blotches, unequal spot sizes, overlapping spots

Page 10: Image Analysis Class web site:  Statistics for Microarrays

Practical Problems 3

High Background

• 2 likely causes:

– Insufficient blocking

– Precipitation of the labeled probe

Weak Signals

Page 11: Image Analysis Class web site:  Statistics for Microarrays

Practical Problems 4

Spot overlap

• Likely cause: too much rehydration during post-processing.

Page 12: Image Analysis Class web site:  Statistics for Microarrays

Practical Problems 5

Dust

Page 13: Image Analysis Class web site:  Statistics for Microarrays

Images from scanner

• Resolution
– standard 10 µm [currently, max 5 µm]
– 100 µm spot on chip = 10 pixels in diameter

• Image format
– TIFF (tagged image file format), 16 bit (65,536 levels of gray)
– 1 cm x 1 cm image at 16 bit = 2 MB (uncompressed)
– other formats exist, e.g. SCN (used at Stanford)

• Separate image for each fluorescent sample
– channel 1, channel 2, etc.
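The figures on this slide follow from simple arithmetic, which the sketch below reproduces. The function name `image_size_bytes` is hypothetical, and the calculation assumes square pixels and no file header or compression.

```python
def image_size_bytes(width_cm, height_cm, pixel_um, bits_per_pixel):
    """Uncompressed size of a scan, ignoring any file header."""
    pixels_x = round(width_cm * 10_000 / pixel_um)   # 1 cm = 10,000 micrometres
    pixels_y = round(height_cm * 10_000 / pixel_um)
    return pixels_x * pixels_y * bits_per_pixel // 8

print(image_size_bytes(1, 1, 10, 16))  # 1000 x 1000 pixels x 2 bytes = 2000000
print(image_size_bytes(1, 1, 5, 16))   # halving the pixel size quadruples it: 8000000
print(100 / 10)                        # 100 micrometre spot at 10 micrometres/pixel = 10.0 pixels
```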

Page 14: Image Analysis Class web site:  Statistics for Microarrays

Images in analysis software

• The two 16-bit images (Cy3, Cy5) are compressed into 8-bit images

• Display fluorescence intensities for both wavelengths using a 24-bit RGB overlay image

• RGB image:
– Blue values (B) are set to 0
– Red values (R) are used for Cy5 intensities
– Green values (G) are used for Cy3 intensities

• Qualitative representation of results
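A minimal sketch of how such an overlay could be built is shown below. The slides do not say how the 16-bit values are compressed to 8 bits, so the linear rescaling in `to_8bit` (dropping the low 8 bits) is an assumption, as are the function names; real software may use a different transformation.

```python
def to_8bit(value16):
    """Rescale a 16-bit intensity (0..65535) to 8 bits (0..255) by dropping the low byte."""
    return value16 >> 8

def overlay(cy5, cy3):
    """Build an RGB image (rows of (R, G, B) tuples) from two 16-bit images."""
    return [
        [(to_8bit(r), to_8bit(g), 0) for r, g in zip(row5, row3)]  # B is always 0
        for row5, row3 in zip(cy5, cy3)
    ]

# Three pixels: both channels bright, Cy5 only, Cy3 only.
cy5 = [[65535, 65535, 0]]
cy3 = [[65535, 0, 65535]]
print(overlay(cy5, cy3))  # [[(255, 255, 0), (255, 0, 0), (0, 255, 0)]] = yellow, red, green
```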

Page 15: Image Analysis Class web site:  Statistics for Microarrays

Images : examples

Cy3

Cy5

Spot color    Signal strength      Gene expression
yellow        Control = Treated    unchanged
red           Control < Treated    induced
green         Control > Treated    repressed

Pseudo-color overlay

Page 16: Image Analysis Class web site:  Statistics for Microarrays

Steps in Image Processing

• Addressing (or Gridding)
– Assigning coordinates to each spot

• Segmentation
– Classification of pixels as either foreground (signal) or background

• Information Extraction
– Foreground fluorescence intensity pairs (R, G)
– Background intensities
– Quality measures

Page 17: Image Analysis Class web site:  Statistics for Microarrays

Addressing

This is the process of assigning coordinates to each of the spots.

Automating this part of the procedure permits high-throughput analysis.

4 by 4 grids, 19 by 21 spots per grid

Page 18: Image Analysis Class web site:  Statistics for Microarrays

Addressing

4 by 4 grids

Within the same batch of print runs, estimate translation of grids

Other problems:
– Misregistration
– Rotation
– Skew in the array

Page 19: Image Analysis Class web site:  Statistics for Microarrays

Addressing — Registration

Page 20: Image Analysis Class web site:  Statistics for Microarrays

Problems in automatic addressing

– Misregistration of the red and green channels
– Rotation of the array in the image
– Skew in the array

Page 21: Image Analysis Class web site:  Statistics for Microarrays

Addressing (I)

• Basic structure of images known (determined by the arrayer)

• Parameters to address spot positions
– Separation between rows and columns of grids
– Individual translation of grids
– Separation between rows and columns of spots within each grid
– Small individual translation of spots
– Overall position of the array in the image

ScanAlyze
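The addressing parameters on this slide can be turned into nominal spot coordinates as in the sketch below. All names (`spot_centers`, `grid_sep`, `spot_sep`, `grid_shift`) and the pixel values are hypothetical, the layout is assumed to be unrotated and unskewed, and the small per-spot translations are left out for brevity. Reading "19 by 21 spots per grid" as 19 rows by 21 columns is also an assumption.

```python
def spot_centers(origin, grid_shape, grid_sep, spot_shape, spot_sep, grid_shift=None):
    """Return {(grid_row, grid_col, spot_row, spot_col): (x, y)} in pixel units.

    origin     -- overall position of the array in the image (x, y)
    grid_sep   -- separation between columns and rows of grids (dx, dy)
    spot_sep   -- separation between columns and rows of spots within a grid (dx, dy)
    grid_shift -- optional individual translation per grid: {(grid_row, grid_col): (dx, dy)}
    """
    grid_shift = grid_shift or {}
    x0, y0 = origin
    centers = {}
    for gr in range(grid_shape[0]):
        for gc in range(grid_shape[1]):
            dx, dy = grid_shift.get((gr, gc), (0.0, 0.0))
            gx = x0 + gc * grid_sep[0] + dx   # top-left spot of this grid
            gy = y0 + gr * grid_sep[1] + dy
            for sr in range(spot_shape[0]):
                for sc in range(spot_shape[1]):
                    centers[(gr, gc, sr, sc)] = (gx + sc * spot_sep[0],
                                                 gy + sr * spot_sep[1])
    return centers

centers = spot_centers(origin=(50.0, 50.0), grid_shape=(4, 4), grid_sep=(400.0, 380.0),
                       spot_shape=(19, 21), spot_sep=(18.0, 18.0),
                       grid_shift={(0, 1): (3.0, -2.0)})
print(len(centers))            # 4 * 4 * 19 * 21 = 6384 spots
print(centers[(0, 1, 0, 0)])   # (453.0, 48.0): second grid, shifted by (3, -2)
```

Automatic gridding amounts to estimating these parameters from the image; manual gridding lets the user adjust them.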

Page 22: Image Analysis Class web site:  Statistics for Microarrays

Addressing (II)

• The measurement process depends on the addressing procedure

• Addressing accuracy can be enhanced by allowing user intervention (at the cost of time)

• Most software systems now provide for both manual and automatic gridding procedures

Page 23: Image Analysis Class web site:  Statistics for Microarrays

Segmentation

• Classification of pixels as foreground or background, so that fluorescence intensities can be calculated for each spot as a measure of transcript abundance

• Production of a spot mask: the set of foreground pixels for each spot

Page 24: Image Analysis Class web site:  Statistics for Microarrays

Segmentation Methods

• Fixed circles

• Adaptive circles

• Adaptive shape
– Edge detection
– Seeded Region Growing (R. Adams and L. Bischof, 1994): regions grow outwards from seed points preferentially according to the difference between a pixel’s value and the running mean of values in an adjoining region

• Histogram methods

Page 25: Image Analysis Class web site:  Statistics for Microarrays

Segmentation Methods in some programs

Fixed circle      ScanAlyze, GenePix, QuantArray
Adaptive circle   GenePix, Dapple, SignalViewer (uses ellipse)
Adaptive shape    Spot, region growing and watershed
Histogram         ImaGene, QuantArray, DeArray and adaptive thresholding

Page 26: Image Analysis Class web site:  Statistics for Microarrays

Fixed circle segmentation

• Fits a circle with a constant diameter to all spots in the image

• Easy to implement

• The spots should be of the same shape and size

May not be good for this example
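Fixed circle segmentation is simple enough to sketch directly. The code below is an illustration only, not the implementation used by any package named in these slides; the function names and the convention that a center is given as (x, y) in pixel units are assumptions.

```python
def circle_mask(center, radius, shape):
    """Set of (row, col) pixels lying inside a circle; shape is (rows, cols)."""
    cx, cy = center
    rows, cols = shape
    return {(r, c)
            for r in range(rows) for c in range(cols)
            if (c - cx) ** 2 + (r - cy) ** 2 <= radius ** 2}

def fixed_circle_masks(centers, radius, shape):
    """Apply the same radius to every spot center, as fixed circle segmentation does."""
    return [circle_mask(center, radius, shape) for center in centers]

mask = circle_mask(center=(2, 2), radius=1, shape=(5, 5))
print(sorted(mask))  # [(1, 2), (2, 1), (2, 2), (2, 3), (3, 2)]
```

Because the radius never changes, a spot smaller than the circle contributes background pixels to its foreground estimate, and a larger spot loses signal.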

Page 27: Image Analysis Class web site:  Statistics for Microarrays

Adaptive circle segmentation

• The circle diameter is estimated separately for each spot

• Dapple finds spots by detecting edges of spots (second derivative)

• Problematic if spots are oval

Page 28: Image Analysis Class web site:  Statistics for Microarrays

Limitation of circular segmentation

– Small spot
– Not circular

Results from SRG

Page 29: Image Analysis Class web site:  Statistics for Microarrays

Limitation of fixed circles

SRG Fixed Circle

Page 30: Image Analysis Class web site:  Statistics for Microarrays

Adaptive shape segmentation

• Specification of starting points or seeds
– Bonus: already know geometry of array

• Regions grow outwards from the seed points preferentially according to the difference between a pixel’s value and the running mean of values in an adjoining region
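The region-growing idea can be illustrated with a simplified sketch. It follows the description above (unlabelled pixels bordering a region are absorbed in order of their difference from that region's running mean) but it is not the implementation used in Spot, and the seed positions, labels and 4-neighbour connectivity are assumptions chosen for the example.

```python
import heapq

def seeded_region_growing(image, seeds):
    """image: list of rows of numbers; seeds: {label: [(row, col), ...]}.
    Returns a grid of labels with the same shape as image."""
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    total, count = {}, {}   # running sum and pixel count per region
    heap = []               # entries: (difference, tie_breaker, row, col, label)
    order = 0

    def push_neighbours(r, c, label):
        nonlocal order
        mean = total[label] / count[label]
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] is None:
                heapq.heappush(heap, (abs(image[rr][cc] - mean), order, rr, cc, label))
                order += 1

    for label, pixels in seeds.items():
        total[label] = sum(image[r][c] for r, c in pixels)
        count[label] = len(pixels)
        for r, c in pixels:
            labels[r][c] = label
    for label, pixels in seeds.items():
        for r, c in pixels:
            push_neighbours(r, c, label)

    while heap:
        _, _, r, c, label = heapq.heappop(heap)
        if labels[r][c] is not None:
            continue                      # already claimed by some region
        labels[r][c] = label
        total[label] += image[r][c]       # update the running mean
        count[label] += 1
        push_neighbours(r, c, label)
    return labels

image = [
    [10, 10, 10, 10, 10],
    [10, 90, 95, 10, 10],
    [10, 92, 99, 94, 10],
    [10, 10, 91, 10, 10],
    [10, 10, 10, 10, 10],
]
labels = seeded_region_growing(image, {"fg": [(2, 2)], "bg": [(0, 0)]})
print(sum(row.count("fg") for row in labels))  # 6: the irregular bright region
```

The foreground region takes the shape of the bright pixels rather than a circle, which is the point of adaptive shape segmentation.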

Page 31: Image Analysis Class web site:  Statistics for Microarrays

Seeds

Page 32: Image Analysis Class web site:  Statistics for Microarrays

Histogram segmentation

• Choose target mask larger than any spot

• Fg and bg intensities determined from the histogram of pixel values for pixels within the masked area

• Example: QuantArray
– Background: mean between 5th and 20th percentile
– Foreground: mean between 80th and 95th percentile

• May not work well when a large target mask is set to compensate for variation in spot size

Bkgd Foreground
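The percentile rule quoted for QuantArray can be sketched as follows. This is not QuantArray's code: the function names are hypothetical, and the way percentiles are mapped to positions in the sorted pixel list is an assumption.

```python
def percentile_band_mean(values, lo_pct, hi_pct):
    """Mean of the sorted values lying between two percentiles."""
    ordered = sorted(values)
    n = len(ordered)
    band = ordered[int(n * lo_pct / 100):int(n * hi_pct / 100)]
    if not band:
        raise ValueError("too few pixels in the target mask")
    return sum(band) / len(band)

def histogram_segment(pixels):
    """Return (foreground, background) for the pixels inside one target mask."""
    background = percentile_band_mean(pixels, 5, 20)
    foreground = percentile_band_mean(pixels, 80, 95)
    return foreground, background

pixels = list(range(100))           # toy mask: pixel values 0..99
print(histogram_segment(pixels))    # (87.0, 12.0)
```

If the target mask is much larger than the spot, fewer than 20% of its pixels may be true foreground, and the 80th to 95th percentile band then mixes in background pixels.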

Page 33: Image Analysis Class web site:  Statistics for Microarrays

Information Extraction

• Spot intensities
– Mean of pixel intensities
– Median of pixel intensities
– Pixel variation (e.g. IQR)

• Background values
– None
– Local
– Constant (global)
– Morphological opening

• Quality information

Take the average

Page 34: Image Analysis Class web site:  Statistics for Microarrays

Background intensity

• The measured fluorescence intensity includes a contribution of non-specific hybridization and other chemicals on the glass

• Fluorescence from regions not occupied by DNA may differ from that of regions occupied by DNA, so one solution is to use local negative controls (spotted DNA that should not hybridize)

Page 35: Image Analysis Class web site:  Statistics for Microarrays

BG: None

• Do not consider the background

– Probably not accurate in many cases, but may be better than some forms of local background determination

Page 36: Image Analysis Class web site:  Statistics for Microarrays

BG: Local

• Focusing on small regions surrounding the spot mask

• Median of pixel values in this region

• Most software packages implement such an approach

ImaGene, Spot, GenePix, ScanAlyze

• By not considering the pixels immediately surrounding the spots, the background estimate is less sensitive to the performance of the segmentation procedure
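A local background estimate of this kind can be sketched as a median over a square ring around the spot. The ring geometry (`inner` and `outer` distances, measured in pixels from the spot center) is a hypothetical choice; each package defines its own region, and a real implementation would also exclude pixels belonging to neighbouring spots.

```python
from statistics import median

def local_background(image, center, inner, outer):
    """Median of pixels whose distance d from center (the larger of the row and
    column offsets) satisfies inner < d <= outer. Pixels with d <= inner, i.e.
    the spot and its immediate surroundings, are skipped."""
    r0, c0 = center
    values = []
    for r in range(max(0, r0 - outer), min(len(image), r0 + outer + 1)):
        for c in range(max(0, c0 - outer), min(len(image[0]), c0 + outer + 1)):
            d = max(abs(r - r0), abs(c - c0))
            if inner < d <= outer:
                values.append(image[r][c])
    return median(values)

# 7 x 7 patch: background 20, a 3 x 3 spot of 200 in the middle, one dust pixel.
image = [[20] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 200
image[0][0] = 500

print(local_background(image, center=(3, 3), inner=2, outer=3))  # 20.0
```

Using the median rather than the mean keeps the single dust pixel from inflating the estimate.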

Page 37: Image Analysis Class web site:  Statistics for Microarrays

BG: Constant

• Global method which subtracts a constant background for all spots

• Some evidence that the binding of fluorescent dyes to ‘negative control spots’ is lower than the binding to the glass slide

– More meaningful to estimate background based on a set of negative control spots
– If no negative control spots: approximation of the average background = third percentile of all the spot foreground values
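The fallback rule for arrays without negative control spots can be sketched as below. The nearest-rank definition of the percentile and the function names are assumptions; the slide only specifies "third percentile of all the spot foreground values".

```python
import math

def constant_background(foregrounds, percentile=3):
    """Nearest-rank percentile of the spot foreground values."""
    ordered = sorted(foregrounds)
    index = max(0, math.ceil(len(ordered) * percentile / 100) - 1)
    return ordered[index]

def subtract_constant(foregrounds):
    """Subtract the same background estimate from every spot."""
    bg = constant_background(foregrounds)
    return [f - bg for f in foregrounds]

foregrounds = list(range(1, 101))          # toy data: 100 spots with values 1..100
print(constant_background(foregrounds))    # 3
print(subtract_constant(foregrounds)[-1])  # 100 - 3 = 97
```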

Page 38: Image Analysis Class web site:  Statistics for Microarrays

BG: Morphological opening

• Non-linear filtering, used in Spot

• Use a square structuring element with side length at least twice as large as the spot separation distance

• Compute local minimum filter, then compute local maximum filter

– This removes all spots and generates an image that is an estimate of the background for the entire slide

• For individual spots, the background is estimated by sampling this background image at the nominal center of the spot

• Lower, less variable bg estimate
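The two filtering passes described above can be sketched in plain Python. This is an illustration of morphological opening with a square window, not the code used in Spot; the function names are hypothetical, windows are simply clipped at the image border (an assumption), and the toy window only needs to be wider than the single spot in the example.

```python
def window_filter(image, half, func):
    """Apply func (min or max) over a (2*half + 1)-pixel square window, clipped at edges."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            row.append(func(window))
        out.append(row)
    return out

def opening_background(image, half):
    """Local minimum filter followed by local maximum filter."""
    eroded = window_filter(image, half, min)   # removes anything narrower than the window
    return window_filter(eroded, half, max)    # restores the level of what remains

# 9 x 9 patch: background 20 with a 3 x 3 spot of 200.
image = [[20] * 9 for _ in range(9)]
for r in range(3, 6):
    for c in range(3, 6):
        image[r][c] = 200

background = opening_background(image, half=2)   # 5 x 5 window, wider than the spot
print(background[4][4])   # 20: background sampled at the nominal spot center
```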

Page 39: Image Analysis Class web site:  Statistics for Microarrays

Background matters

From Spot From GenePix

Page 40: Image Analysis Class web site:  Statistics for Microarrays

Quality Measurements

• Array
– Correlation between spot intensities
– Percentage of spots with no signals
– Distribution of spot signal area

• Spot
– Signal / Noise ratio
– Variation in pixel intensities
– Identification of “bad spots” (spots with no signal)

• Ratio (2 spots combined)
– Circularity

• Flag or weight spots based on these (or other appropriate) criteria

Page 41: Image Analysis Class web site:  Statistics for Microarrays

Quality of Array

Distribution of areas
– Judge by eye
– Look at variation (e.g. SD)

            mean   median   SD
Cy3 area    57     56       20.67
Cy5 area    59     57       24.34

Page 42: Image Analysis Class web site:  Statistics for Microarrays

Summary

• The choice of background correction method often has a larger impact on the log-intensity ratios than the segmentation method used

• The morphological opening method provides a better estimate of background than other methods
– Low within- and between-slide variability of the log2 R/G

• Background adjustment has a larger impact on low intensity spots

Spot, GenePix

ScanAlyze

M = log2 R/G

A = log2 √(R•G)
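The two plot axes defined here can be computed per spot as in the short sketch below, where R and G are taken to be the background-corrected red and green intensities. The function name `ma_values` is hypothetical, and both intensities are assumed to be positive.

```python
import math

def ma_values(r, g):
    """Return (M, A) with M = log2(R/G) and A = log2(sqrt(R*G))."""
    m = math.log2(r / g)
    a = 0.5 * math.log2(r * g)   # log2 of the geometric mean of R and G
    return m, a

print(ma_values(1024, 256))  # (2.0, 9.0)
```

A is the average log intensity of the spot, so plotting M against A shows how the log ratio behaves for low and high intensity spots.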

Page 43: Image Analysis Class web site:  Statistics for Microarrays

Selected references

• Yang, Y. H., Buckley, M. J., Dudoit, S. and Speed, T. P. (2001), ‘Comparisons of methods for image analysis on cDNA microarray data’. Technical report #584, Department of Statistics, University of California, Berkeley. http://www.stat.berkeley.edu/users/terry/zarray/Html/papersindex.html

• Yang, Y. H., Buckley, M. J. and Speed, T. P. (2001), ‘Analysis of cDNA microarray images’. Briefings in Bioinformatics, 2 (4), 341-349. Excellent review in concise format!

Page 44: Image Analysis Class web site:  Statistics for Microarrays

Acknowledgments

Terry Speed
Michael Buckley
Sandrine Dudoit
Natalie Roberts
Ben Bolstad
Brian Stevenson

CSIRO Image Analysis Group
Ryan Lagerstorm
Richard Beare
Hugues Talbot
Kevin Cheong

Matt Callow (LBL)

Percy Luu (USB)
Dave Lin (USB)
Vivian Pang (USB)
Elva Diaz (USB)