Thanks to the Lipper Center for Computational Genetics


Government and private grant agencies: NHLBI, NSF, ONR, DOE, DARPA, HHMI, Armenise

Corporate collaborators & sponsors: Affymetrix, GTC, Mosaic, Aventis, Dupont, Cistran

CHI Macroresults through Microarrays 3

George Church 1-May-02

Array quantitation for modeling mutations affecting RNA, protein interactions & cell proliferation.

gggatttagctcagttgggagagcgccagactgaa gatttg gaggtcctgtgttcgatccacagaattcgcacca

Post-300 genomes & 3D structures

DNA RNA Protein: in vivo & in vitro interactions

Metabolites

Replication rate

Environment

Biosystems Measures & Models

Microbes; Cancer & stem cells; Darwinian in vitro replication; Small multicellular organisms

RNAi; Insertions; SNPs

Functional Genomics Challenges
• Systems dynamics and optimality modeling.
• Multiple genetic domains per gene: high density readout of whole genome mutant phenotypes.
• Multiple RNAs & regulatory proteins per gene.
• Many causative genes & haplotypes per disease.

• Polony RNA exon-typing
• Multiplex in situ RNA & protein analyses
• Automated differentiation
• Homologous recombination genome engineering

Human Red Blood Cell ODE model: 200 measured parameters

[Figure: metabolic network of the red blood cell model. Node labels: GLCe, GLCi, G6P, F6P, FDP, GA3P, DHAP, 1,3 DPG, 2,3 DPG, 3PG, 2PG, PEP, PYR, LACi, LACe; GL6P, GO6P, RU5P, R5P, X5P, S7P, E4P; ADO, INO, AMP, IMP, ADE, HYPX, PRPP, R1P, ADOe, INOe, ADEe; cofactor pairs ATP/ADP, NADH/NAD, NADPH/NADP, 2 GSH/GSSG; ions and buffers K+, Na+, Cl-, HCO3-, pH.]

Jamshidi, Edwards, Fahland, Church, Palsson, B.O. (2001) Bioinformatics 17: 286. (http://atlas.med.harvard.edu/gmc/rbc.html)

Modeling suboptimality: Segre, Edwards, Vitkup

[Figure: Sauer data and FBA fluxes comparison (LP, wt). Wild type, C-limited, 0.4; CC = 0.97. Scatter plot of calculated flux against observed flux in wt, both axes 0 to 200, points labeled 1 to 18.]

Calculated & Observed Fluxes in wt
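The CC = 0.97 quoted for the wild-type comparison is presumably a correlation coefficient between calculated and observed fluxes. A minimal sketch of that kind of calculation, assuming a Pearson correlation; the function name `pearson_cc` and the flux values are invented for illustration and are not the Sauer data:

```python
import math

def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical fluxes (arbitrary units), NOT the measured values from the slide.
observed = [10.0, 35.0, 60.0, 95.0, 120.0, 180.0]
calculated = [12.0, 30.0, 66.0, 90.0, 131.0, 170.0]

cc = pearson_cc(observed, calculated)
print(f"CC = {cc:.2f}")
```

A value near 1 means the model fluxes track the measured ones closely; it says nothing about which individual fluxes deviate.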

Replication rate of a whole-genome set of mutants

Badarinarayana, et al. (2001) Nature Biotech. 19: 1060

Replication rate challenge met: multiple homologous domains

 

[Figure: homologous domains of thrA (probes 1, 2, 3), metL (probes 1, 2, 3) and lysC (probes 1, 2). Selective disadvantage in minimal media, values shown per probe: 1.1, 6.7, 1.8, 1.8, 10.4.]

Multiple mutations per gene

Correlation between two selection experiments

Badarinarayana, et al. (2001) Nature Biotech. 19: 1060

Comparison of selection data with Flux Balance Optimization predictions on 488 genes

Prediction             Number of genes   Negatively selected   Not negatively selected
Essential              143               80                    63
Reduced growth rate    46                24                    22
Non-essential          299               119                   180

P-value (Chi Square) = 0.004

Annotations: Novel duplicates? Position effects, toxin accumulation, non-opt?
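The chi-square P-value can be checked directly from the six counts in the table. A minimal sketch in plain Python, assuming a standard Pearson chi-square test of independence on the 3 x 2 table; the helper name `chi_square` is illustrative:

```python
import math

# Rows: predicted essential, reduced growth rate, non-essential.
# Columns: negatively selected, not negatively selected (counts from the slide).
observed = [
    [80, 63],
    [24, 22],
    [119, 180],
]

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

stat, dof = chi_square(observed)
# With exactly 2 degrees of freedom the chi-square survival function is exp(-x/2),
# so no statistics library is needed for this particular table.
assert dof == 2
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.3f}")
```

Under these assumptions the statistic comes out near 11.0 and the P-value rounds to 0.004, in line with the value on the slide.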


RNA quantitation issues

• Small fold changes in RNA are important. Example: 1.5-fold in trisomies.
• Cross-hybridizing RNAs. Alternative RNAs, gene families.
• Mixed tissues. In situ hybridization has low multiplex.
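The 1.5-fold figure for trisomies follows from gene dosage: three copies instead of two. A small sketch of the arithmetic, assuming RNA level scales linearly with copy number (a simplification) and using a 2-fold cutoff purely as an illustrative threshold:

```python
import math

def expected_fold_change(copies_test, copies_reference=2):
    """Expected RNA fold change if expression scales linearly with gene copy number."""
    return copies_test / copies_reference

fold = expected_fold_change(3)   # trisomy: 3 copies vs. the usual 2
log2_fold = math.log2(fold)

# An illustrative 2-fold cutoff (|log2 ratio| >= 1) would miss this change.
passes_two_fold_cutoff = abs(log2_fold) >= 1.0
print(fold, round(log2_fold, 3), passes_two_fold_cutoff)
```

The point of the example is that a dosage effect of this size sits well inside the range that a coarse fold-change filter discards.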

Gene Expression database. Aach, Rindone, Church (2000) Genome Research 10: 431-445.

• Microarrays (1): experiment vs. control per ORF; R/G ratios; R, G values; quality indicators
• Affymetrix (2): PM and MM 25-mers per ORF; averaged PM-MM; “presence”; feature statistics
• Lynx-MPSS (3), SAGE (4): counts of 14-mer sequence tags for each ORF

1 DeRisi, et al., Science 278:680-686 (1997)
2 Lockhart, et al., Nat Biotech 14:1675-1680 (1996)
3 Brenner et al., Massively Parallel Signature Sequencing, Nat Biotechnol. 18:630-4 (2000)
4 Velculescu, et al., Serial Analysis of Gene Expression, Science 270:484-487 (1995)
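The platform-specific quantities named above reduce to simple per-ORF summaries. A hedged sketch of two of them, the log2 R/G ratio for a two-color spot and the averaged PM-MM difference for a probe set; the function names and all intensities are hypothetical and do not reflect the database's actual schema:

```python
import math

def log2_ratio(red, green):
    """log2 of the R/G intensity ratio for one two-color microarray spot."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)

def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs belonging to one ORF."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need matching, non-empty PM and MM lists")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)

# Hypothetical intensities, for illustration only.
spot = log2_ratio(red=1500.0, green=1000.0)
pm = [820.0, 640.0, 910.0, 700.0]   # perfect-match 25-mers
mm = [210.0, 190.0, 260.0, 240.0]   # paired mismatch 25-mers
avg_diff = average_difference(pm, mm)
print(round(spot, 3), avg_diff)
```

Keeping the raw R and G values alongside the ratio, as the slide lists, lets low-intensity spots be filtered before ratios are compared.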

agactagcag

RNA Cluster Analyses: Cell Cycle

[Figure: cluster "Replication & DNA synthesis (2)". Panels: expression profiles (s.d. from mean) with MCB and SCB motifs; number of ORFs per cluster (clusters 1 to 30); number of sites against distance from ATG (b.p.).]

MIPS functional category (total ORFs)    ORFs within functional category (k)    -Log10 P-value
DNA synthesis and replication (82)       23                                     16
Cell cycle control and mitosis (312)     30                                     8
Recombination and DNA repair (84)        11                                     5
Nuclear organization (720)               40                                     4

N = 186

Tavazoie, et al. 1999 Nature Genetics 22:281.
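Enrichment P-values of this kind are commonly computed as a hypergeometric tail: the chance of seeing at least k category members in a cluster of N ORFs drawn from the genome. A minimal sketch using only the standard library; the genome-wide ORF count of 6220 is an assumption (it is not given on the slide), and `hypergeom_tail` is an illustrative name:

```python
from math import comb, log10

def hypergeom_tail(genome_orfs, category_orfs, cluster_size, overlap):
    """P(X >= overlap) when cluster_size ORFs are drawn without replacement."""
    upper = min(category_orfs, cluster_size)
    numerator = sum(
        comb(category_orfs, i) * comb(genome_orfs - category_orfs, cluster_size - i)
        for i in range(overlap, upper + 1)
    )
    # Exact integer arithmetic up to the final division avoids underflow.
    return numerator / comb(genome_orfs, cluster_size)

GENOME_ORFS = 6220   # ASSUMPTION: approximate yeast ORF count, not stated on the slide

# First table row: 23 of the 82 "DNA synthesis and replication" ORFs in a cluster of N = 186.
p = hypergeom_tail(GENOME_ORFS, category_orfs=82, cluster_size=186, overlap=23)
print(f"-log10(P) = {-log10(p):.1f}")
```

How closely the result matches the tabulated value depends on the assumed genome size and on whether the original analysis applied any correction for multiple testing.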

Combining mouse knockouts with RNA array analysis (homeobox gene Crx-/-)

Livesey, Furukawa, Steffen, Church, Cepko (2000) Current Biol. 10:301.


Combinatorial arrays for binding constants

ds-DNA array; combinatorial DNA-binding protein domains; Human/Mouse EGR1
HMS: Martha Bulyk, Xiaohua Wang, Martin Steffen. MRC: Yen Choo

[Figure: phage (pIII, pVIII) displaying combinatorial DNA-binding protein domains, bound to a ds-DNA array and detected with antibodies and phycoerythrin-2º IgG. Martha Bulyk et al.]

Interactions of Adjacent Basepairs in EGR1 Zinc Finger DNA Recognition
Isalan et al., Biochemistry (‘98) 37:12026-12033

Wildtype EGR1 Microarray
[Figure: array layout with high [DNA] (+) ctrl sequence for wt binding, alignment oligos, etc.]

Zinc finger variants: Wildtype RSDHLTT, RGPDLAR, REDVLIR, LRHNLET, KASNLVS
Binding constants shown: TGG 2.8 nM; GCG 16 nM; 2.5 nM; TAT 5.7 nM; AAT 240 nM
AAA, AAT, ACT, AGA, AGC, AGT, CAT, CCT, CGA, CTT, TTC, TTT

Motifs weight all 64 Ka(app)
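Binding constants in nM can be compared on an energy scale, which is how position weights for a motif are often derived. A hedged sketch that converts two of the dissociation constants shown (2.8 nM and 240 nM) into an apparent association constant and a relative free energy. The pairing of these two values is chosen only to illustrate the arithmetic, since the extracted slide does not make clear which protein variant each constant belongs to; the temperature is also an assumption:

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
TEMP_K = 298.15     # ASSUMPTION: 25 C; the slide does not state a temperature

def ka_per_molar(kd_nM):
    """Apparent association constant (1/M) from a dissociation constant in nM."""
    return 1.0 / (kd_nM * 1e-9)

def ddg_kcal(kd_nM, kd_ref_nM):
    """Binding free energy relative to a reference site: RT * ln(Kd / Kd_ref)."""
    return R_KCAL * TEMP_K * math.log(kd_nM / kd_ref_nM)

ddg = ddg_kcal(240.0, 2.8)   # weaker site relative to a stronger one
print(f"Ka = {ka_per_molar(2.8):.3g} per M, ddG = {ddg:.2f} kcal/mol")
```

Repeating this for all 64 triplets at one position gives a full energy table, from which a weight matrix can be read off.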


Common diseases: billions of “new” alleles plus millions of balanced polymorphisms

• 60 new mutations per generation * 5,000 generations since the major bottleneck(s) that set up the linkage patterns (= 300,000 per genome)

• Each of the 3 Gbp in the genome exists in all SNP forms (A, C, G, T), with 600,000 of each SNP on earth (spread over the common haplotypes). The population frequency will be <0.01%. (Aach et al., 2001 Nature 409: 856)

• Functional genomics (FG) may provide better leads for therapies & diagnostics. (Accuracy goal 1 ppb?)
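The numbers in the first two bullets can be reproduced with back-of-envelope arithmetic. A sketch of one way to get there; the world population of 6 billion is an assumption that happens to reproduce the 600,000 figure and is not stated on the slide:

```python
NEW_MUTATIONS_PER_GENERATION = 60
GENERATIONS_SINCE_BOTTLENECK = 5_000
GENOME_BP = 3_000_000_000
WORLD_POPULATION = 6_000_000_000   # ASSUMPTION: roughly the 2002 population

# New alleles accumulated in one genome since the bottleneck.
new_alleles_per_genome = NEW_MUTATIONS_PER_GENERATION * GENERATIONS_SINCE_BOTTLENECK

# Spread evenly over the genome and summed over everyone alive:
# expected number of people carrying a new allele at any given base pair.
carriers_per_site = new_alleles_per_genome * WORLD_POPULATION // GENOME_BP

# Population frequency of such an allele, as a fraction of people.
frequency = carriers_per_site / WORLD_POPULATION

print(new_alleles_per_genome, carriers_per_site, f"{frequency:.2%}")
```

Splitting the carriers at a site among the three possible alternative bases pushes each individual allele below the 0.01% mark, consistent with the "<0.01%" on the slide.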

Projected costs affect our view of what is possible.

In 1985, the dawn of the genome project, $10 per bp would have been $30B per genome. In 2002, Perlegen or Lynx: $3M (10^3 bits/$, 4 logs)

In 2001, the cost of video data collection? 10^13 bits/$

Genotyping & functional genomics demand will probably be as high as permitted by costs.
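The cost comparison is easier to follow when put in common units. A short sketch of the arithmetic, assuming a 3 Gbp genome and 2 bits of information per base pair (four possible bases); both assumptions are conventional but are not spelled out on the slide:

```python
import math

GENOME_BP = 3e9
BITS_PER_BP = 2   # ASSUMPTION: 4 bases -> log2(4) = 2 bits per base pair

cost_1985 = 10 * GENOME_BP    # $10 per bp
cost_2002 = 3e6               # $3M per genome

genome_bits = GENOME_BP * BITS_PER_BP
bits_per_dollar_2002 = genome_bits / cost_2002
logs_improvement = math.log10(cost_1985 / cost_2002)

video_bits_per_dollar = 1e13  # figure quoted for video data collection in 2001
remaining_gap_logs = math.log10(video_bits_per_dollar / bits_per_dollar_2002)

print(f"1985: ${cost_1985:.0e} per genome")
print(f"2002: {bits_per_dollar_2002:.0f} bits/$, {logs_improvement:.0f} logs cheaper than 1985")
print(f"gap to video: about {remaining_gap_logs:.1f} logs")
```

On these assumptions sequencing in 2002 delivers on the order of 10^3 bits per dollar, leaving roughly ten orders of magnitude between it and the video figure.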

Why lower-cost, high quality “sequencing”?
• Environmental, food, & biodiversity monitoring
• Human genome haplotyping
• RNA splicing & editing
• Immune B & T cell receptor spectra

& How?
• Femtoliter (10^-15) scale & low-cost scanners
• Polymerase DNA colonies (polonies)
• Fluorescent in situ sequencing (FISSEQ)

Mitra & Church Nucleic Acids Res. 27: e34

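[Figure: polony amplification. A single molecule from the library carries primer sites A’ and B; in the 1st round of PCR the primer is extended by polymerase, and further rounds build up a colony of copies around the original molecule.]

Primer A has 5’ immobilizing (Acrydite) modification.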

Sequence polonies by sequential, fluorescent single-base extensions

1. Remove 1 strand of DNA.
2. Hybridize Universal Primer.
3. Add Red (Cy3) dTTP.
4. Wash; Scan Red Channel
5. Add Green (FITC) dCTP
6. Wash; Scan Green Channel

[Figure: two polonies with universal primer B’ hybridized at site B. After the dTTP step, the polony with template AGT.. shows an incorporated T and the polony with template GCG.. shows none; after the dCTP step they show TC and C respectively.]
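The add, wash, scan cycle can be mimicked in a few lines to see how per-cycle signals translate into sequence. A toy sketch, not the published FISSEQ protocol: `fisseq_read` is a hypothetical helper, the template is given in the order the polymerase meets it, and the model assumes every matching position (including runs of the same base) is extended within a cycle:

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def fisseq_read(template, additions):
    """Simulate single-nucleotide additions on one polony.

    template  -- template bases downstream of the primer, in extension order
    additions -- nucleotide added in each cycle, e.g. ["T", "C", ...]
    Returns the synthesized strand and a (nucleotide, bases incorporated) list.
    """
    position = 0
    synthesized = []
    signals = []
    for nucleotide in additions:
        incorporated = 0
        # Extend while the next template base pairs with the added nucleotide.
        while position < len(template) and COMPLEMENT[template[position]] == nucleotide:
            synthesized.append(nucleotide)
            position += 1
            incorporated += 1
        signals.append((nucleotide, incorporated))
    return "".join(synthesized), signals

# The two polonies from the slide, after one dTTP cycle and one dCTP cycle.
read_1, signals_1 = fisseq_read("AGT", ["T", "C"])
read_2, signals_2 = fisseq_read("GCG", ["T", "C"])
print(read_1, signals_1)
print(read_2, signals_2)
```

For the two example templates this gives TC and C, matching the incorporation pattern described for the red and green scans.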

Polony Template

3’ P’

P5’ A ATA CAA TTCACACAGGAAACAGCTATGA CATT CTATTGTTAAAGTGTGTCCTTTGTCGATACTGGTA…5’

Mean intensity (FITC (C), CY3 (T)): 58, 0.5; 40, 6.5; 0.3, 48; 0.4, 43

Primer Extension 26 cycles, 34 Nucleotides


RNA Exon typing

• Single molecules of RNA dispersed.

• Multiplex polonies spanning all likely variable exons

• Sequential probing of each exon.
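Because each polony derives from a single RNA molecule, probing the exons one after another yields a presence/absence pattern per molecule, and counting patterns gives isoform abundances. A minimal sketch of that bookkeeping; the exon names, the probe results, and the helper `exon_type` are all hypothetical:

```python
from collections import Counter

EXONS = ["e1", "e2", "e3"]   # hypothetical variable exons, probed in this order

def exon_type(probe_signals):
    """Encode one polony's per-exon probe results as a string such as '101'."""
    return "".join("1" if probe_signals[exon] else "0" for exon in EXONS)

# Hypothetical readout: one dict per polony, True where the exon probe lit up.
polonies = [
    {"e1": True, "e2": True, "e3": True},
    {"e1": True, "e2": False, "e3": True},
    {"e1": True, "e2": False, "e3": True},
    {"e1": True, "e2": True, "e3": False},
]

isoform_counts = Counter(exon_type(p) for p in polonies)
for pattern, count in isoform_counts.most_common():
    print(pattern, count)
```

Unlike bulk measurements, this keeps exon combinations linked per molecule, so an isoform that skips e2 is distinguished from a mixture of molecules that each lack a different exon.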

Functional Genomics Challenges
• Systems dynamics and optimality modeling.
• Multiple genetic domains per gene: high density readout of whole genome mutant phenotypes.
• Multiple RNAs & regulatory proteins per gene.
• Many causative genes & haplotypes per disease.

• Polony RNA exon-typing
• Multiplex in situ RNA & protein analyses
• Automated differentiation
• Homologous recombination genome engineering

For more information:

arep.med.harvard.edu
