Gene Array Analysis

Page 1: Gene Array Analysis

Gene Array Analysis

Statistical genetics - Class 10

Gene array description

Normalization

Data Analysis

Multiple measurements

Page 2: Gene Array Analysis

What is a gene array

Gene arrays are solid supports upon which a collection of gene-specific nucleic acids has been placed at defined locations, either by spotting or direct synthesis.

In array analysis, a nucleic acid-containing sample is labeled and then allowed to hybridize with the gene-specific targets on the array.

Based on the amount of probe hybridized to each target spot, information is gained about the specific nucleic acid composition of the sample.

The major advantage of gene arrays is that they can provide information on thousands of targets in a single experiment.

Page 3: Gene Array Analysis

Nomenclature

Many terms exist for naming gene arrays, including: biochip, DNA chip, GeneChip (a registered trademark of Affymetrix, Inc.), DNA array, microarray, and macroarray.

Microarray and macroarray may be used to differentiate between spot size or the number of spots on the support.

Glass Support

Page 4: Gene Array Analysis

Experiment

A typical gene array experiment involves:

1. Isolating RNA from the samples to be compared
2. Converting the RNA samples to labeled cDNA via reverse transcription; this step may be combined with aRNA amplification
3. Hybridizing the labeled cDNA to identical membrane or glass slide arrays
4. Removing the unhybridized cDNA
5. Detecting and quantitating the hybridized cDNA
6. Comparing the quantitative data from the various samples

Page 5: Gene Array Analysis

General Picture

Page 6: Gene Array Analysis

Choosing Cell Populations

The goal of comparative cDNA hybridization is to compare gene transcription in two or more different kinds of cells. For example:

Tissue-specific Genes - Cells from two different tissues (say, cardiac muscle and prostate epithelium) are specialized for performing different functions in an organism. Although we can recognize cells from different tissues by their phenotypes, it is not known just what makes one cell function as smooth muscle, another as a neuron, and still another as prostate.

Ultimately, a cell's role is determined by the proteins it produces, which in turn depend on its expressed genes. Comparative hybridization experiments can reveal genes which are preferentially expressed in specific tissues.

Page 7: Gene Array Analysis

Choosing Cell Populations

Genetic disease is often caused by genes which are inappropriately transcribed -- either too much or too little -- or which are missing altogether.

Such defects are especially common in cancers, which can occur when regulatory genes are deleted, inactivated, or become constitutively active.

Unlike some genetic diseases (e.g. cystic fibrosis) in which a single defective gene is always responsible, cancers which appear clinically similar can be genetically heterogeneous.

For example, prostate cancer (prostatic adenocarcinoma) may be caused by several different, independent regulatory gene defects even in a single patient.

Page 8: Gene Array Analysis

Choosing Cell Populations

Cell Cycle Variations - Cells undergo DNA replication, mitosis, and eventually death. These activities require quite different gene products, such as DNA polymerases for genome replication or microtubule spindle proteins for mitosis. A cell's genes encode the "programs" for these activities, and gene transcription is required to execute those programs. Comparative hybridization can be used to distinguish genes that are expressed at different times in the cell cycle. In this way, the pathways responsible for controlling basic life processes can be uncovered.

Page 9: Gene Array Analysis

mRNA Extraction

Genes which code for protein are transcribed into messenger RNA's (mRNA's) in the cell nucleus. The mRNA's in turn are translated into proteins by ribosomes in the cytoplasm. The transcription level of a gene is taken to be the amount of its corresponding mRNA present in the cell. Comparative hybridization experiments compare the amounts of many different mRNA's in two cell populations.

Page 10: Gene Array Analysis

mRNA Extraction

To prepare mRNA for use in a microarray assay, it must be purified from total cellular contents. mRNA accounts for only about 3% of all RNA in a cell.

Common mRNA isolation methods take advantage of the fact that most mRNA's have a poly-adenine (poly(A)) tail. These poly(A)+ mRNA's can be purified by capturing them using complementary oligodeoxythymidine (oligo(dT)) molecules bound to a solid support.

Page 11: Gene Array Analysis

Reverse transcription

Captured mRNA's are still difficult to work with because they are prone to being destroyed.

The environment is full of RNA-digesting enzymes, so free RNA is quickly degraded. To prevent the experimental samples from being lost, they are reverse-transcribed back into more stable DNA form. The products of this reaction are called complementary DNA's (cDNA's) because their sequences are the complements of the original mRNA sequences.

Page 12: Gene Array Analysis

Reverse transcription

A problem with cDNA production is that not all mRNA's are reverse-transcribed with the same efficiency. This fact leads to reverse transcription bias, which can change the relative amounts of different cDNA's measured by the microarray assay.

Reverse transcription bias is not a problem when comparing the same mRNA across two cell populations, because the same efficiency applies to both, unless it causes the mRNA not to be reverse-transcribed at all.

However, the bias does prohibit quantitative comparison between different mRNA's on one array.
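To see why the bias cancels in one comparison but not in the other, here is a small sketch in Python. The mRNA amounts and the per-gene reverse transcription efficiencies are invented for illustration, and the efficiencies are assumed to be the same in both samples.

```python
# Hypothetical true mRNA amounts (arbitrary units) in two cell populations.
true_mrna = {
    "sample_1": {"gene_A": 100.0, "gene_B": 100.0},
    "sample_2": {"gene_A": 200.0, "gene_B": 50.0},
}

# Assumed gene-specific reverse transcription efficiencies,
# identical in both samples (this is the key assumption).
rt_efficiency = {"gene_A": 0.9, "gene_B": 0.3}


def measured_cdna(sample):
    """cDNA produced from each gene = true mRNA amount x efficiency."""
    return {gene: amount * rt_efficiency[gene]
            for gene, amount in true_mrna[sample].items()}


cdna_1 = measured_cdna("sample_1")
cdna_2 = measured_cdna("sample_2")

# Same gene across samples: the efficiency appears in numerator and
# denominator, so the measured ratio equals the true ratio.
across_samples = {gene: cdna_2[gene] / cdna_1[gene] for gene in cdna_1}

# Different genes within one sample: the efficiencies do not cancel,
# so the measured ratio differs from the true ratio (which is 1.0 here).
within_sample_1 = cdna_1["gene_A"] / cdna_1["gene_B"]

print(across_samples)
print(within_sample_1)
```

The across-sample ratios recover the true fold changes, while the within-sample ratio of gene A to gene B is distorted by the ratio of the two efficiencies.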

Page 13: Gene Array Analysis

Fluorescent labeling of cDNA's

In order to detect cDNA's bound to the microarray, we must label them with a reporter molecule that identifies their presence. The reporters currently used in comparative hybridization to microarrays are fluorescent dyes (fluors).

A differently-colored fluor is used for each sample so that we can tell the two samples apart on the array. The labeled cDNA samples are called probes because they are used to probe the collection of spots on the array.

Fluors do not show their colors unless stimulated with a specific frequency of light by a laser. Even then, the colors are not directly observed; rather, the wavelength of the emitted light is used to tune a detector which measures the fluorescence.

Page 14: Gene Array Analysis

Normalization

The number of fluor molecules which label each cDNA depends on its length and possibly its sequence composition, both of which are often unknown.

This is one more reason that fluorescent intensities for different cDNA's cannot be quantitatively compared. However, identical cDNA's from the two probes are still comparable as long as the same number of label molecules are added to the same DNA sequence in each probe.

Page 15: Gene Array Analysis

Normalization

To equalize the total concentrations of the two cDNA probes before applying them to the array, the probe solutions are diluted to have the same overall fluorescent intensity.

This procedure makes two possibly unjustified assumptions:

1. that the total amount of mRNA in each cell type being tested is identical
2. that each fluor emits the same amount of light relative to its concentration.
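The same idea can be applied numerically to scanned intensities rather than to the probe solutions. The following is a minimal sketch, not the procedure from the slides: it rescales one channel so that both channels have the same total intensity. The function name and the example intensities are hypothetical.

```python
import numpy as np


def balance_total_intensity(cy3, cy5):
    """Rescale the Cy5 channel so both channels have equal total intensity.

    This relies on the same two assumptions listed above: equal total
    mRNA in both samples and equal light output per unit of each fluor.
    """
    cy3 = np.asarray(cy3, dtype=float)
    cy5 = np.asarray(cy5, dtype=float)
    scale = cy3.sum() / cy5.sum()
    return cy3, cy5 * scale, scale


# Hypothetical spot intensities for four spots.
cy3_raw = [120.0, 800.0, 45.0, 3000.0]
cy5_raw = [260.0, 1500.0, 100.0, 6100.0]

cy3_bal, cy5_bal, scale = balance_total_intensity(cy3_raw, cy5_raw)
print(scale, cy5_bal)
```

A single global scale factor cannot correct intensity-dependent differences between the channels.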

Page 16: Gene Array Analysis

Hybridization to a DNA Microarray

The two cDNA probes are tested by hybridizing them to a DNA microarray.

The array holds hundreds or thousands of spots, each of which contains a different DNA sequence.

In this way, every spot on an array is an independent assay for the presence of a different cDNA. There is enough DNA on each spot that both probes can hybridize to it at once without interference.

Microarrays are made from a collection of purified DNA's. A drop of each type of DNA in solution is placed onto a specially-prepared glass microscope slide by an arraying machine. The arraying machine can quickly produce a regular grid of thousands of spots in a square about 2 cm on a side.

Page 17: Gene Array Analysis

Scanning the Hybridized Array

Once the cDNA probes have been hybridized to the array and any loose probe has been washed off, the array must be scanned to determine how much of each probe is bound to each spot.

The probes are tagged with fluorescent reporter molecules which emit detectable light when stimulated by a laser.

The emitted light is captured by a detector, usually a charge-coupled device (CCD).

Spots with more bound probe will have more reporters and will therefore fluoresce more intensely.

The scanner also records light from a few molecules that hybridized either to the wrong spot or nonspecifically to the glass slide. This extra light becomes the background of the scanned array image.

Page 18: Gene Array Analysis

Affymetrix arrays

• 10^7 copies per oligo in a 24 x 24 µm square

• Use 20 pairs of different 25-mers per gene

• Perfect match and mismatch

Page 19: Gene Array Analysis

Data Analysis

Normalization

Detection of outliers

Clustering

Multiple measurements

Page 20: Gene Array Analysis

False color images of spotted array

Overlay of two scans of the slide

Compares the two samples

Green = less relative expression

Red = more relative expression

Yellow = equal expression

Dimmer colors = lower expression levels.
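One way such an overlay could be built is to place one scan in the red channel and the other in the green channel of an RGB image. The sketch below assumes two already-aligned intensity arrays; the function name and the pixel values are hypothetical, and real imaging software may scale intensities differently.

```python
import numpy as np


def false_color_overlay(red_scan, green_scan):
    """Combine two aligned scans into one RGB image with values in [0, 1]."""
    red = np.asarray(red_scan, dtype=float)
    green = np.asarray(green_scan, dtype=float)
    peak = max(red.max(), green.max())
    if peak == 0:
        peak = 1.0  # avoid division by zero for an all-dark image
    rgb = np.zeros(red.shape + (3,))
    rgb[..., 0] = red / peak    # red channel
    rgb[..., 1] = green / peak  # green channel; blue stays at 0
    return rgb


# Hypothetical 2 x 2 image: a red spot, a green spot,
# a bright yellow spot and a dim yellow spot.
red_scan = np.array([[900.0, 100.0], [500.0, 50.0]])
green_scan = np.array([[100.0, 900.0], [500.0, 50.0]])

overlay = false_color_overlay(red_scan, green_scan)
print(overlay)
```

Equal red and green values mix to yellow, and low intensities in both channels give a dim pixel, which matches the color key on this slide.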

Page 21: Gene Array Analysis

Normalizing two-color arrays

[Figure: two panels, before and after normalization]

The signals for the two colors are rarely “balanced”.

Page 22: Gene Array Analysis

Normalization

[Figure: Cy5 signal (log2) plotted against Cy3 signal (log2)]

Page 23: Gene Array Analysis

Normalization by iterative linear regression

1. Fit a line (y = mx + b) to the data set
2. Set aside outliers (residuals > 2 x s.e.)
3. Repeat until r^2 changes by < 0.001
4. Then apply the slope and intercept to the original dataset

D Finkelstein et al. http://www.camda.duke.edu/CAMDA00/abstracts.asp
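The iterative procedure on this slide can be sketched as follows. This is an interpretation, not the authors' code: "s.e." is taken to be the residual standard error of the current fit, and "apply slope and intercept" is taken to mean rescaling the Cy5 values so that the fitted line becomes y = x. The function name, `max_iter`, and the simulated data are hypothetical.

```python
import numpy as np


def iterative_linear_normalization(cy3_log2, cy5_log2, tol=0.001, max_iter=50):
    """Fit Cy5 = m * Cy3 + b repeatedly, setting aside outliers each round."""
    x = np.asarray(cy3_log2, dtype=float)
    y = np.asarray(cy5_log2, dtype=float)
    keep = np.ones(x.size, dtype=bool)  # spots currently used in the fit
    previous_r2 = None
    for _ in range(max_iter):
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        residuals = y - (slope * x + intercept)
        ss_res = np.sum(residuals[keep] ** 2)
        ss_tot = np.sum((y[keep] - y[keep].mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        # Stop when r^2 changes by less than the tolerance.
        if previous_r2 is not None and abs(r2 - previous_r2) < tol:
            break
        previous_r2 = r2
        # Assumed meaning of "s.e.": residual standard error of this fit.
        s_e = np.sqrt(ss_res / (keep.sum() - 2))
        new_keep = keep & (np.abs(residuals) <= 2.0 * s_e)
        if new_keep.sum() < 3:
            break  # too few spots left to fit a line
        keep = new_keep
    # Apply the final line to ALL spots, including the set-aside outliers.
    cy5_normalized = (y - intercept) / slope
    return cy5_normalized, slope, intercept


# Simulated log2 signals: a linear dye bias plus a few strong outliers.
rng = np.random.default_rng(0)
cy3 = rng.uniform(6.0, 14.0, 500)
cy5 = 0.9 * cy3 + 1.2 + rng.normal(0.0, 0.2, 500)
cy5[:25] += 3.0  # hypothetical differentially expressed spots

cy5_norm, m, b = iterative_linear_normalization(cy3, cy5)
print(m, b)
```

Because the outliers are only set aside during fitting, they are still present in the normalized output and remain available for later analysis.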

Page 24: Gene Array Analysis

Normalization (Linear)

[Figure: Cy5 signal (log2) plotted against Cy3 signal (log2)]

Page 25: Gene Array Analysis

Normalization (Linear)

[Figure: Cy5 signal (log2) plotted against Cy3 signal (log2)]

Page 26: Gene Array Analysis

Normalization (Curvilinear)

[Figure: ratio {log2 (Cy5 / Cy3)} plotted against average signal {log2 (Cy3 + Cy5)/2}, with the Loess function fit line; the ratio axis is marked at 0]

G Tseng et al., NAR 2001

Page 27: Gene Array Analysis

LOESS function

To use LOESS, the user must specify the degree, d, of the local polynomial to be fit to the data, and the fraction of the data, q, to be used in each fit. In this case, the simplest possible initial function specification is d=1 and q=1. While it is relatively easy to understand how the degree of the local polynomial affects the simplicity of the initial model, it is not as easy to determine how the smoothing parameter affects the function.

Page 28: Gene Array Analysis

LOESS function

The weight function gives the most weight to the data points nearest the point of estimation and the least weight to the data points that are furthest away. The use of the weights is based on the idea that points near each other in the explanatory variable space are more likely to be related to each other in a simple way than points that are further apart. Following this logic, points that are likely to follow the local model best influence the local model parameter estimates the most. Points that are less likely to actually conform to the local model have less influence on the local model parameter estimates. The traditional weight function used for LOESS is the tri-cube weight function, w(x) = (1 - |x|^3)^3 for |x| < 1 and w(x) = 0 for |x| >= 1.
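A minimal LOESS sketch in Python may make the roles of d, q and the weights concrete. It uses the standard tri-cube weight, (1 - |u|^3)^3 for |u| < 1 and 0 otherwise, where u is the distance to the point of estimation divided by the distance to the furthest point in the local window. The function names and the simulated data are hypothetical, robustness iterations are omitted, and the x values are assumed to be mostly distinct.

```python
import numpy as np


def tricube(u):
    """Tri-cube weight: (1 - |u|^3)^3 inside (-1, 1), zero outside."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u < 1.0, (1.0 - u ** 3) ** 3, 0.0)


def loess(x, y, q=0.4, degree=1):
    """Evaluate a local polynomial fit of the given degree at every x.

    q is the fraction of the data used in each local fit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = min(n, max(degree + 2, int(np.ceil(q * n))))  # points per window
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = max(np.sort(dist)[k - 1], np.finfo(float).eps)  # window half-width
        w = tricube(dist / h)
        # np.polyfit multiplies residuals by w before squaring,
        # so pass sqrt(w) to get weighted least squares with weights w.
        coeffs = np.polyfit(x - x[i], y, degree, w=np.sqrt(w))
        fitted[i] = coeffs[-1]  # value of the local polynomial at x[i]
    return fitted


# Simulated data: log ratio with an intensity-dependent curved bias.
rng = np.random.default_rng(1)
avg_signal = rng.uniform(6.0, 14.0, 300)
log_ratio = 0.05 * (avg_signal - 10.0) ** 2 - 0.3 + rng.normal(0.0, 0.1, 300)

trend = loess(avg_signal, log_ratio, q=0.4, degree=1)
log_ratio_normalized = log_ratio - trend  # curvilinear normalization
print(np.std(log_ratio), np.std(log_ratio_normalized))
```

With q = 1 every local fit uses all the data and the curve is smoothest. Smaller q follows the data more closely but gives a noisier fit.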

Page 29: Gene Array Analysis
Page 30: Gene Array Analysis

Image Analysis

2 images per array

Super-imposing

Grid on image

Clone Id    Ratio
1           1.5
2           0.8
…           …

Page 31: Gene Array Analysis

Gene Ratios

Gene expression levels determined by intrinsic properties of each gene

[Figure: Gene A and Gene B on an expression level axis running from low to high]

Page 32: Gene Array Analysis

Statistical Analysis

Differences in ratios may be due to random variation or to meaningful changes

Hypothesis testing, with H0: no systematic differences between ratios

Page 33: Gene Array Analysis

Most Basic Statistical Analysis

Assumptions:

• ‘red’ and ‘green’ intensities at a given gene ~ i.i.N.d (independent, identically normally distributed) with common variance

• constant coefficient of variation over the whole gene set

Page 34: Gene Array Analysis

Statistical Analysis

with $T_k = R_k / G_k$,

$$f_{T_k}(t) = \frac{(1+t)\sqrt{1+t^2}}{c\,(1+t^2)^2\sqrt{2\pi}} \exp\left\{-\frac{(t-1)^2}{2c^2(1+t^2)}\right\}$$

with c: coefficient of variation, estimated from data

According to Chen et al. 1997 (J Biomedical Optics, 2(4):364)
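As an illustration of how this density could be used for hypothesis testing, the sketch below evaluates the ratio density from Chen et al. for a given coefficient of variation c and finds, by numerical integration, the two ratio values that cut off a tail area of alpha/2 on each side. The function names, the value c = 0.2, the integration grid and the class labels are assumptions made for the example.

```python
import numpy as np


def ratio_density(t, c):
    """Density of T = R / G under H0 with coefficient of variation c."""
    t = np.asarray(t, dtype=float)
    front = (1.0 + t) * np.sqrt(1.0 + t ** 2) / (
        c * (1.0 + t ** 2) ** 2 * np.sqrt(2.0 * np.pi))
    return front * np.exp(-(t - 1.0) ** 2 / (2.0 * c ** 2 * (1.0 + t ** 2)))


def ratio_thresholds(c, alpha=0.05, t_max=20.0, n_grid=200001):
    """Lower and upper ratio cut-offs leaving alpha/2 in each tail."""
    t = np.linspace(1e-3, t_max, n_grid)
    f = ratio_density(t, c)
    # Cumulative trapezoidal integration of the density.
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(t))))
    cdf /= cdf[-1]  # renormalize over the truncated grid
    lower = t[np.searchsorted(cdf, alpha / 2.0)]
    upper = t[np.searchsorted(cdf, 1.0 - alpha / 2.0)]
    return float(lower), float(upper)


def classify(ratio, lower, upper):
    """Assign a measured ratio to one of three classes."""
    if ratio < lower:
        return "under-expressed"
    if ratio > upper:
        return "over-expressed"
    return "unchanged"


lower, upper = ratio_thresholds(c=0.2, alpha=0.05)  # c = 0.2 is hypothetical
print(lower, upper)
print(classify(3.0, lower, upper), classify(1.1, lower, upper))
```

In practice c would be estimated from the data, as noted on the slide, and the cut-offs widen as c grows.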

Page 35: Gene Array Analysis

Statistical Analysis

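Classification with hypothesis testing

[Figure: distribution of the ratio with a rejection region of area α/2 in each tail; lower tail under-expressed, upper tail over-expressed]

3 classes of genes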

Page 36: Gene Array Analysis
Page 37: Gene Array Analysis

Fold Change Graphs

How many times did the expression of this gene change in the treated tissue versus the control?

• comparison analysis requires experiment vs control

• does not apply to absolute analysis

• parameter value in one vs another

• Avg diff (perfect match vs mismatch)
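A simplified sketch of a fold change based on average difference is shown below. It takes the average difference to be the mean of (perfect match - mismatch) over the probe pairs of one gene and the fold change to be the plain ratio of treated to control. Vendor software uses more elaborate rules, so treat this only as an illustration. The function names and the intensities are hypothetical.

```python
import numpy as np


def average_difference(perfect_match, mismatch):
    """Mean of (PM - MM) over the probe pairs of one gene."""
    pm = np.asarray(perfect_match, dtype=float)
    mm = np.asarray(mismatch, dtype=float)
    return float(np.mean(pm - mm))


def fold_change(treated_avg_diff, control_avg_diff):
    """Plain ratio treated / control; None when either value is not positive."""
    if treated_avg_diff <= 0 or control_avg_diff <= 0:
        return None  # a ratio is not meaningful for non-positive signals
    return treated_avg_diff / control_avg_diff


# Hypothetical intensities for 20 probe pairs of one gene.
pm_control = np.tile([280.0, 320.0], 10)
mm_control = np.full(20, 100.0)
pm_treated = np.tile([650.0, 750.0], 10)
mm_treated = np.full(20, 100.0)

control = average_difference(pm_control, mm_control)
treated = average_difference(pm_treated, mm_treated)
change = fold_change(treated, control)
print(control, treated, change)
```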

Page 38: Gene Array Analysis

Fold Change of Average Difference

Page 39: Gene Array Analysis

Noise and Repeats

>90% 2 to 3 fold

Multiplicative noise

Repeat experiments

Log scale

dist(4,2)=dist(2,1)

log – log plot
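The identity dist(4,2)=dist(2,1) holds when distance is measured between logarithms. The short sketch below checks it with log2 and also shows one way repeat experiments could be combined on the log scale. The repeat ratios are hypothetical, and averaging log ratios is a common choice rather than something prescribed by the slides.

```python
import math


def log_distance(a, b):
    """Distance between two expression values on the log2 scale."""
    return abs(math.log2(a) - math.log2(b))


# A 2-fold change is the same distance wherever it occurs.
print(log_distance(4, 2), log_distance(2, 1))

# Hypothetical ratios from two repeats: 2-fold up and 2-fold down.
repeat_ratios = [2.0, 0.5]

arithmetic_mean = sum(repeat_ratios) / len(repeat_ratios)
mean_log_ratio = sum(math.log2(r) for r in repeat_ratios) / len(repeat_ratios)
geometric_mean = 2.0 ** mean_log_ratio

print(arithmetic_mean)  # biased upward on the raw scale
print(geometric_mean)   # the up and down changes cancel on the log scale
```

Under multiplicative noise, a 2-fold increase and a 2-fold decrease are equally large errors, which the log scale represents symmetrically and the raw scale does not.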