Biology and Cells

• All living organisms consist of cells.

• Humans have trillions of cells. Yeast has one cell.

• Cells are of many different types (blood, skin, nerve), but all arose from a single cell (the fertilized egg).

• Each cell contains a complete copy of the genome (the program for making the organism), encoded in DNA.

DNA

• DNA molecules are long double-stranded chains; 4 types of bases are attached to the backbone: adenine (A), guanine (G), cytosine (C), and thymine (T). A pairs with T, C with G.

• A gene is a segment of DNA that specifies how to make a protein.

• Human DNA has about 30-35,000 genes.

• Rice has about 50-60,000, but shorter genes.
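
The pairing rule above (A with T, C with G) is enough to derive one strand of the double helix from the other. A minimal sketch, assuming strands are given as plain strings of A, C, G and T; the function names are illustrative only:

```python
# Watson-Crick pairing from the slide: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the complementary strand read in the opposite direction."""
    return complement(strand)[::-1]

print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Taking the reverse complement twice returns the original strand, which is a quick sanity check on the pairing table.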

Exons and Introns: Data and Logic?

• exons are coding DNA (translated into a protein), which are only about 2% of human genome

• introns are non-coding DNA, which provide structural integrity and regulatory (control) functions

• exons can be thought of as program data, while introns provide the program logic

• Humans have much more control structure than rice

Gene Expression

• Cells are different because of differential gene expression.

• About 40% of human genes are expressed at one time.

• A gene is expressed by transcribing DNA into single-stranded mRNA

• mRNA is later translated into a protein

• Microarrays measure the level of mRNA expression

Gene Expression Measurement

• mRNA expression represents dynamic aspects of the cell

• mRNA expression can be measured with the latest technology

• mRNA is isolated and labeled with a fluorescent protein

• mRNA is hybridized to the target; the level of hybridization corresponds to the light emission, which is measured with a laser

Molecular Biology Overview

[Figure: cell, nucleus, chromosome, gene (DNA), gene (mRNA, single strand), and protein. Graphics courtesy of the National Human Genome Research Institute.]

Gene Expression Microarrays

The main types of gene expression microarrays:

• Short oligonucleotide arrays (Affymetrix)

• cDNA or spotted arrays (Brown/Botstein)

• Long oligonucleotide arrays (Agilent Inkjet)

• Fiber-optic arrays

• ...

DNA Chip Microarrays

• Put a large number (~100K) of cDNA sequences or synthetic DNA oligomers onto a glass slide in known locations on a grid.

• Label an RNA sample and hybridize (label 2 RNA samples with 2 different colors of fluorescent dye: control vs. experimental).

• Mix the two labeled RNAs and hybridize to the chip.

• Measure the amount of RNA bound to each square in the grid.

• Make comparisons
– Cancerous vs. normal tissue
– Treated vs. untreated
– Time course

Spot your own Chip (plans available for free from Pat Brown’s website)

Robot spotter

Ordinary glass microscope slide

cDNA Spotted Microarrays

Affymetrix “Gene chip” system

• Uses 25 base oligos synthesized in place on a chip (20 pairs of oligos for each gene)

• RNA labeled and scanned in a single “color”
– one sample per chip

• Can have as many as 20,000 genes on a chip

• Arrays get smaller every year (more genes)

• Chips are expensive

• Proprietary system: “black box” software, can only use their chips

Affymetrix Microarrays

[Figure: raw image of a chip, with dimension labels 50 µm and 1.28 cm]

• ~10^7 oligonucleotides: half perfectly match the mRNA (PM), half have one mismatch (MM)

• Raw gene expression is the intensity difference: PM - MM
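
The PM - MM difference can be sketched in a few lines. The slides only state that raw expression is the intensity difference; averaging that difference over a gene's probe pairs is an assumption made here for illustration, and the function name and intensities are hypothetical:

```python
from statistics import mean

def raw_expression(pm, mm):
    """Average PM - MM difference over the probe pairs of one gene.

    Averaging over probe pairs is an illustrative assumption; the slide
    only defines the per-pair difference PM - MM.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal-length, non-empty PM and MM lists")
    return mean(p - m for p, m in zip(pm, mm))

# Hypothetical intensities for 4 probe pairs (the slides mention 20 pairs per gene).
pm = [1200.0, 950.0, 1100.0, 1010.0]
mm = [300.0, 400.0, 250.0, 310.0]
print(raw_expression(pm, mm))  # 750.0
```

Note that the result can be negative when mismatch probes are brighter than perfect-match probes.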

Data Acquisition

• Scan the arrays

• Quantitate each spot

• Subtract background

• Normalize

• Export a table of fluorescent intensities for each gene in the array

Normalization

• Can control for many of the experimental sources of variability (systematic, not random or gene specific)

• Bring each image to the same average brightness

• Can use simple math or fancy
– divide by the mean (whole chip or by sectors)
– LOESS (locally weighted regression)

• No sure biological standards
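
The "divide by the mean" option can be sketched as follows. This is a minimal whole-chip version under the assumption that each chip is a flat list of intensities; the `target` parameter and the sample values are hypothetical:

```python
from statistics import mean

def normalize_to_mean(intensities, target=1.0):
    """Scale one chip so that its mean intensity equals `target`."""
    chip_mean = mean(intensities)
    if chip_mean == 0:
        raise ValueError("cannot normalize a chip with zero mean")
    return [x * target / chip_mean for x in intensities]

chip_a = [100.0, 200.0, 300.0]   # hypothetical dim scan
chip_b = [200.0, 400.0, 600.0]   # same pattern, twice as bright
print(normalize_to_mean(chip_a))  # [0.5, 1.0, 1.5]
print(normalize_to_mean(chip_b))  # [0.5, 1.0, 1.5]
```

After scaling, both chips have the same average brightness, so the remaining differences are not due to overall scan intensity. Normalizing by sectors would apply the same function to each sector's intensities separately.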

Multiple Comparisons

• In a microarray experiment, each gene (each probe or probe set) is really a separate experiment

• Yet if you treat each gene as an independent comparison, you will always find some with significant differences by chance alone
– (the tails of a normal distribution)
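
A small simulation makes the point concrete. The sketch below assumes 10,000 genes with no real effect, so each p-value is uniform on [0, 1). The Bonferroni threshold is shown only as one common correction; it is not taken from the slides:

```python
import random

def count_false_positives(n_genes, alpha, seed=0):
    """Simulate p-values for genes with no real effect and count p < alpha."""
    rng = random.Random(seed)
    # Under the null hypothesis, p-values are uniform on [0, 1).
    p_values = [rng.random() for _ in range(n_genes)]
    naive = sum(p < alpha for p in p_values)
    # Bonferroni: divide the threshold by the number of tests (an assumption here).
    bonferroni = sum(p < alpha / n_genes for p in p_values)
    return naive, bonferroni

n_genes, alpha = 10_000, 0.05
naive, bonferroni = count_false_positives(n_genes, alpha)
print("expected false positives:", n_genes * alpha)  # 500.0
print("naive threshold hits:", naive)
print("Bonferroni threshold hits:", bonferroni)
```

With a per-gene threshold of 0.05, roughly 500 of the 10,000 null genes appear "significant", which is why gene-by-gene testing needs a correction.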

Microarray Potential Applications

• Biological discovery
– new and better molecular diagnostics
– new molecular targets for therapy
– finding and refining biological pathways

• Recent examples
– molecular diagnosis of leukemia, breast cancer, ...
– appropriate treatment for genetic signature
– potential new drug targets

Microarray Data Analysis Types

• Gene Selection
– find genes for therapeutic targets
– avoid false positives (FDA approval ?)

• Classification (Supervised)
– identify disease (biomarker study)
– predict outcome / select best treatment

• Clustering (Unsupervised)
– find new biological classes / refine existing ones
– understand regulatory relationships / pathways
– exploration

• …

Microarray Data Mining Challenges

• Too few records (samples), usually < 100

• Too many columns (genes), usually > 1,000

• Too many columns are likely to lead to false positives

• For exploration, a large set of all relevant genes is desired

• For diagnostics or identification of therapeutic targets, the smallest set of genes is needed

• The model needs to be explainable to biologists

Data Preparation Issues

• Thresholding: usually min 20, max 16,000
– For older Affy chips (new Affy chips do not have negative values)

• Filtering: remove genes with insufficient variation
– e.g. MaxVal - MinVal < 500 and MaxVal/MinVal < 5
– biological reasons
– feature reduction for algorithmic reasons

• For clustering, normalize each gene (sample) separately to Mean = 0, Std. Dev = 1
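
The three preparation steps can be sketched as small functions. This assumes each gene is a list of intensities across samples; the function names are illustrative, and the use of the population standard deviation (`pstdev`) rather than the sample standard deviation is an arbitrary choice here:

```python
from statistics import mean, pstdev

def threshold(values, lo=20.0, hi=16000.0):
    """Clip intensities to [lo, hi] (the usual min 20, max 16,000)."""
    return [min(max(v, lo), hi) for v in values]

def has_enough_variation(values, min_diff=500.0, min_ratio=5.0):
    """Drop a gene when MaxVal - MinVal < 500 and MaxVal/MinVal < 5.

    Call this after threshold(), so that MinVal is positive.
    """
    max_val, min_val = max(values), min(values)
    return not (max_val - min_val < min_diff and max_val / min_val < min_ratio)

def standardize(values):
    """Rescale one gene to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical genes measured on four samples.
genes = {"g1": [-5.0, 30.0, 25.0, 40.0], "g2": [100.0, 900.0, 20000.0, 50.0]}
clipped = {name: threshold(vals) for name, vals in genes.items()}
kept = {name: vals for name, vals in clipped.items() if has_enough_variation(vals)}
print(sorted(kept))  # ['g2']
```

Gene `g1` is removed because, after thresholding, it varies by only 20 units and a factor of 2. A constant gene would make `standardize` divide by zero, which is one more reason to filter before standardizing.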

Normalization issues

Within-slide
– What genes to use
– Location
– Scale

Paired-slides (dye swap)
– Self-normalization

Between slides

[Figure: filter-array workflow]

• Control RNA sample and test RNA sample

• Reverse transcription into radio-labelled cDNA probes

• Hybridization to microarray filters

• Use a Phosphor Imager laser scanner to obtain the density of each spot on the filter.

• Compare densities at each spot to determine if treatment changes gene expression. Compile a subset of differentially expressed genes.

Gene    Control    Test
A       1X         3X
:       :          :
Z       1X         0.5X
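
The comparison step can be sketched as a ratio test per gene. The two-fold cutoff and gene B are assumptions added for illustration; the slides do not state a threshold:

```python
def differentially_expressed(control, test, fold=2.0):
    """Return {gene: test/control ratio} for genes changed at least `fold`-fold."""
    selected = {}
    for gene, c in control.items():
        ratio = test[gene] / c
        # Keep genes that went up by `fold` or down by 1/`fold`.
        if ratio >= fold or ratio <= 1.0 / fold:
            selected[gene] = ratio
    return selected

control = {"A": 1.0, "B": 1.0, "Z": 1.0}
test = {"A": 3.0, "B": 1.1, "Z": 0.5}   # gene B is hypothetical
print(differentially_expressed(control, test))  # {'A': 3.0, 'Z': 0.5}
```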

Normalization continued

• Intensity-dependent normalization (Yang, YH, 2002)
– Do an M-A plot to check the data distribution, where $M = \log_2(T/C)$ and $A = \log_2\sqrt{T \cdot C}$ (T = test, C = control)
– Use the lowess function in R to perform the normalization: $\log_2(T/C) \rightarrow \log_2(T/C) - c(A) = \log_2\big(T / (k(A) \cdot C)\big)$, where $c(A)$ is the lowess fit to the M-A plot
– Transform the data by $M' = M - c(A)$.
– The method is locally nonparametric and is robust to a small number of differentially expressed genes.
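
The idea of subtracting a smooth fit c(A) from M can be sketched without R. The code below is a simplified stand-in for lowess: a tricube-weighted local linear fit with no robustness iterations, so it will not reproduce R's `lowess` output exactly. The function names and the `frac` parameter are assumptions for illustration:

```python
import math

def local_linear_fit(a_values, m_values, frac=0.5):
    """Tricube-weighted local linear fit of M on A, evaluated at every A.

    Simplified relative to lowess: there are no robustness iterations.
    """
    n = len(a_values)
    k = max(2, math.ceil(frac * n))            # points in each local window
    fitted = []
    for a0 in a_values:
        dists = sorted(abs(a - a0) for a in a_values)
        h = dists[k - 1] or 1e-12               # window half-width
        w = [(1 - min(abs(a - a0) / h, 1.0) ** 3) ** 3 for a in a_values]
        sw = sum(w)
        mean_a = sum(wi * a for wi, a in zip(w, a_values)) / sw
        mean_m = sum(wi * m for wi, m in zip(w, m_values)) / sw
        var_a = sum(wi * (a - mean_a) ** 2 for wi, a in zip(w, a_values))
        cov = sum(wi * (a - mean_a) * (m - mean_m)
                  for wi, a, m in zip(w, a_values, m_values))
        slope = cov / var_a if var_a > 0 else 0.0
        fitted.append(mean_m + slope * (a0 - mean_a))
    return fitted

def normalize_ma(test, control, frac=0.5):
    """Return M' = M - c(A) for paired test and control intensities."""
    m = [math.log2(t / c) for t, c in zip(test, control)]
    a = [0.5 * math.log2(t * c) for t, c in zip(test, control)]
    c_of_a = local_linear_fit(a, m, frac)
    return [mi - ci for mi, ci in zip(m, c_of_a)]

# Hypothetical data: the test channel is uniformly twice as bright.
control = [100.0 * (i + 1) for i in range(10)]
test = [2.0 * c for c in control]
print(max(abs(v) for v in normalize_ma(test, control)) < 1e-9)  # True
```

A constant two-fold bias gives M = 1 everywhere, so the fit is c(A) = 1 and the normalized values are zero. On real data the bias usually varies with A, which is what the local fit is meant to capture.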

(R,G) → (M,A) Transformation

“Observed” data {(R,G)}

R = red channel signal

G = green channel signal

(background corrected or not)

Transformed data {(M,A)}

$M = \log_2(R/G)$ (ratio)

$A = \log_2 (R \cdot G)^{1/2} = \tfrac{1}{2} \log_2(R \cdot G)$ (intensity)

Inverse: $R = (2^{2A+M})^{1/2}$, $G = (2^{2A-M})^{1/2}$
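
The transformation and its inverse can be checked with a short round trip. A minimal sketch, assuming positive intensities; the function names are illustrative:

```python
import math

def to_ma(r, g):
    """Map red/green intensities to (M, A): log-ratio and mean log-intensity."""
    m = math.log2(r / g)
    a = 0.5 * math.log2(r * g)
    return m, a

def to_rg(m, a):
    """Invert the transformation: R = 2^((2A+M)/2), G = 2^((2A-M)/2)."""
    r = 2 ** ((2 * a + m) / 2)
    g = 2 ** ((2 * a - m) / 2)
    return r, g

m, a = to_ma(1024.0, 256.0)
print(m, a)          # 2.0 9.0
print(to_rg(m, a))   # (1024.0, 256.0)
```

Here R is four times G, so M = log2(4) = 2, and A = (10 + 8) / 2 = 9.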

Normalization

• Regression normalization:
– Fit the linear regression model: $y_i = \alpha + \beta x_i + \varepsilon_i$
– Assumption: all the genes on the array have the same variance (homogeneity)
– Test the significance of the intercept $\alpha$. Fit a linear regression without $\alpha$ if it is insignificant.
– Transform the treatment data: $y_i' = (y_i - \hat{\alpha}) / \hat{\beta}$
– Problems:
  • the assumption may not hold
  • nonlinear trend (the third replicate of the RL95 data has a slight quadratic trend)
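
Regression normalization can be sketched with ordinary least squares. The code assumes the model y = alpha + beta * x with x the control log intensities and y the treatment log intensities, and it assumes the transformation y' = (y - alpha) / beta. It always keeps the intercept, so the significance test on the intercept is omitted. All names and data are hypothetical:

```python
def fit_line(x, y):
    """Ordinary least squares estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

def regression_normalize(x, y):
    """Transform the treatment data so that it lines up with the control data."""
    alpha, beta = fit_line(x, y)
    return [(yi - alpha) / beta for yi in y]

# Hypothetical log intensities: treatment is a shifted, rescaled copy of control.
control = [2.0, 3.0, 4.0, 5.0, 6.0]
treatment = [0.5 + 1.2 * xi for xi in control]
normalized = regression_normalize(control, treatment)
print([round(v, 6) for v in normalized])  # [2.0, 3.0, 4.0, 5.0, 6.0]
```

When the relationship really is linear, the normalized treatment values fall on the line y = x. A quadratic trend, as noted for the third replicate, would leave curvature that this method cannot remove.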

Scatter plot of log intensity before and after regression normalization

[Figure: six scatter plots of log(bap) against log(dmso), one pair per replicate (1 to 3). Three panels are titled "scatter plot of DMSO vs BAP" (before normalization) and three are titled "scatter plot after norm". x-axes: log(dmso1), log(dmso2), log(dmso3); y-axes: log(bap1), log(bap2), log(bap3).]