GtL Workshop B: Experimental Technology Development and Integration Tue at 2 PM
Co-Chairs: George Church, Harvard Medical School; Ham Smith, Institute for Biological Energy Alternatives
As we attempt to understand, protect, and/or engineer environmental microbial communities, we need to ask what sorts of data would most benefit our models and how to obtain them cost-effectively. For this session, let us answer: what small (or large) technological steps are we taking toward these specific challenges?
The framework for the discussions will be the following questions:
• What are the most useful technologies for our tasks/goals now and for the future?
• What are the major technological gaps that will need to be addressed to reach the GTL goals?
• To what extent will the technologies be developed by others?
• How can technologies best be used to complement each other and strengthen the resulting research/insights?
• How do we promote the kind of synergistic interactions among the practitioners?
We would like to invite you to bring one viewgraph to share with the participants on your views about technologies needed to meet these challenges.
Specific challenges:
(1) Microscopic methods capable of tracing the chain of a small genome?
(2) Quantitation of “all” peptide states (either in single cells or populations)?
(3) Sequencing at Mbp per $?
(4) Automated designed genome engineering?
Discussion leaders:
(1) Joachim Frank (Wadsworth Center, NY Dept of Health) on cryo-electron microscopy
(1) Hoi-Ying Holman (Berkeley Lab) on FTIR imaging
(1) Steve Colson (PNNL) on optical imaging
(2) Bob Hettich or Greg Hurst (ORNL) and Dick Smith (PNNL) on mass spectrometry
(3) George Church (HMS) on polony sequencing
(4) Ham Smith (IBEA) on genome synthesis
Biosystems Integrating Measures & Models
[Diagram labels: DNA, RNA, proteins, metabolites, interactions, replication rate, environment; microbes, cancer & stem cells, Darwinian optima, in vitro replication, small multicellular organisms; RNAi, insertions, SNPs]
Improving Models & Measures
Why model?
“Killer Applications”: Share, Search, Merge, Check, Design
The issue is not speed, but integration.
Cost per 99.99% bp: including reagents, personnel, equipment/5 yr, overhead/sq. m
• Sub-mm scale: 1 µm^3 = 1 femtoliter (10^-15 L)
• Instruments: $2-50K per CPU
Why improve measurements?
Human genomes: (6 billion)^2 ≈ 10^19 bp
Immune & cancer genome changes: >10^10 bp per time point
RNA ends & splicing: in situ 10^12 bits/mm^3
Biodiversity: environmental & lab evolution
Compact storage: 10^5 now to 10^17 bits/mm^3 eventually
& How? ($1K per genome, 10^8-10^13 bits/$)
Projected costs determine when biosystems data overdetermination is feasible.
In 1984, pre-HGP (X, pBR322, etc.): 0.1 bp/$, which would have been $30B per human genome.
In 2002 (de novo full vs. resequencing), ABI/Perlegen/Lynx: $300M vs. $3M, i.e. 10^3 bp/$ (a 4-log improvement).
Other data I/O (e.g. video): 10^13 bits/$
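The cost figures above follow from simple division, and a short script makes the arithmetic explicit. This is only a sketch of that arithmetic: the 3 x 10^9 bp haploid genome size is an assumption chosen because it reproduces the slide's $30B and $3M figures, and the function names are illustrative.

```python
import math

# Assumed haploid human genome size; it reproduces the slide's $30B and $3M figures.
HUMAN_GENOME_BP = 3e9


def cost_per_genome(bp_per_dollar, genome_bp=HUMAN_GENOME_BP):
    """Dollars needed to sequence one genome at a given throughput per dollar."""
    return genome_bp / bp_per_dollar


def log10_improvement(old_bp_per_dollar, new_bp_per_dollar):
    """Number of orders of magnitude gained between two bp/$ figures."""
    return math.log10(new_bp_per_dollar / old_bp_per_dollar)


cost_1984 = cost_per_genome(0.1)       # about $30B at 0.1 bp/$
cost_2002 = cost_per_genome(1e3)       # about $3M at 10^3 bp/$ (resequencing)
gain_1984_to_2002 = log10_improvement(0.1, 1e3)   # about 4 logs

# Throughput per dollar that a $1K genome would require, and the gap from 2002.
bp_per_dollar_for_1k = HUMAN_GENOME_BP / 1000.0
remaining_logs = log10_improvement(1e3, bp_per_dollar_for_1k)

print(f"1984: ${cost_1984:,.0f}   2002: ${cost_2002:,.0f}")
print(f"1984 to 2002: {gain_1984_to_2002:.1f} logs; still needed: {remaining_logs:.1f} logs")
```

Under the same assumption, a $1K genome needs roughly 3 x 10^6 bp/$, about three and a half further orders of magnitude beyond the 2002 resequencing figure.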
Why single molecules?
Integration from cells/genomes/RNAs to data
Geometric constraints: who’s “in cis” on a molecule, complex, or cell, e.g. DNA haplotypes & RNA splice-forms
Polymerase colonies (polonies) along a DNA or RNA molecule
[Figure labels: single molecule from library (ends A, B); 1st round of PCR; primer is extended by polymerase (A’, B)]
Polymerase colony (polony) PCR in a gel
Primer A has 5’ immobilizing Acrydite
Mitra & Church Nucleic Acids Res. 27: e34
• Hybridize universal primer
• Add red (Cy3) dTTP. Wash.
• Add green (FITC) dCTP
• Wash; scan
[Figure: two template molecules with primer B’ annealed at site B (3’ to 5’), each shown with its extended bases]
Sequence polonies by sequential, fluorescent single-base extensions
$1K per diploid human sequence
Input: Buccal cells, blood, or forensic samples. Output: Prioritized list of deviant bps (e.g. non-conservative).
Raw data rate: 16 pixels/bp, 1 Mpixel per 6 sec/CPU = 24 CPU days.
Amortization: 5 yr for camera/CPU/transport @ $50K total = $200 per 10^11 bp
Overhead: $200/sq ft/yr * 40 sq ft (400 cu ft) = $40
Reagents: 20 µm per (5 µm) polony and 40 bp reads means 10000 cm^2 area; 800 ml of fluor dNTP at $100/mg = $40; 5 ml PCR reactions = $200
Disposables: 500 slides = $50
Electricity: 2 kW * 24 hr * 24 days * $0.13/kWh = $150
Labor for repair: 10% of instrument cost = $10
Labor for operation: slide PCR, slide dips, scans, etc. = $20
R&D: initially NIH grants (roughly 10%).
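As a sanity check on the budget above, the listed dollar amounts can be summed and two of the derived quantities recomputed. This sketch uses only the figures stated on the slide. The dictionary keys are illustrative labels, and R&D is left out because the slide gives it as a percentage, not a dollar amount.

```python
# Dollar amounts exactly as listed on the slide; the key names are illustrative.
line_items = {
    "amortization": 200,
    "overhead": 40,
    "fluor_dntp": 40,
    "pcr_reactions": 200,
    "disposables": 50,
    "electricity": 150,
    "repair_labor": 10,
    "operation_labor": 20,
}
total_cost = sum(line_items.values())

# Electricity: 2 kW running 24 hr/day for 24 days at $0.13 per kWh.
electricity_cost = 2 * 24 * 24 * 0.13

# Imaging throughput implied by 24 CPU-days at 1 Mpixel per 6 s and 16 pixels/bp.
seconds = 24 * 24 * 3600
pixels_imaged = seconds / 6 * 1e6
bp_imaged = pixels_imaged / 16

print(f"Sum of listed items: ${total_cost}")
print(f"Electricity recomputed: ${electricity_cost:.2f}")
print(f"Base pairs imaged in 24 CPU-days: {bp_imaged:.3g}")
```

The listed items sum to $710, which stays under the $1K target, and the electricity line recomputes to about $150 as stated.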
Inexpensive, off-the-shelf equipment
MJR in situ cycler: $10K
Automated slide fluidics: $4K
Microarray scanner: $26K+
Human haplotype: CFTR gene, 45 kbp
Rob Mitra, Vincent Butty, Jay Shendure, Ben Williams
Quantitative removal of Fluorophores
Rob Mitra
Template ST30: 3' TCACGAGT 5'
Bases added: (C) A G T (C) (A) G (T) C (A) (G) T C A
Template:       3' TCACGAGT 5'
Extended read:  5' AGTGCTCA 3'
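The ST30 example can be reproduced with a small simulation of cyclic single-base additions. This is a sketch, not the authors' software. It assumes the parenthesized bases are additions that produced no extension, and that bases are offered in the repeating order C, A, G, T shown on the slide. The function name and return format are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_extension(template_3to5, cycle="CAGT"):
    """Simulate cyclic single-base additions on a template written 3'->5'.

    Returns the extended read (5'->3') and a log of (base offered, bases incorporated).
    """
    # The strand being synthesized pairs position by position with the template.
    target = "".join(COMPLEMENT[b] for b in template_3to5)
    if any(b not in cycle for b in target):
        raise ValueError("cycle must contain every base needed by the template")
    pos, step, read, log = 0, 0, [], []
    while pos < len(target):
        base = cycle[step % len(cycle)]
        incorporated = 0
        # A matching base extends the primer; a homopolymer run extends several at once.
        while pos < len(target) and target[pos] == base:
            pos += 1
            incorporated += 1
        log.append((base, incorporated))
        read.append(base * incorporated)
        step += 1
    return "".join(read), log


read, log = sequence_by_extension("TCACGAGT")
skipped = "".join(base for base, n in log if n == 0)
print(read)      # extended strand, 5'->3'
print(len(log))  # number of base additions performed
print(skipped)   # additions that gave no extension
```

For the ST30 template this yields the read AGTGCTCA in 14 additions, with the six non-extending additions C, C, A, T, A, G matching the parenthesized bases on the slide.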
Sequencing multiple polonies
Rob Mitra
Multiple Image Alignment
Metric based on optimal coincidence of high intensity noise pixels over a matrix of local offsets (0.4 pixel precision)
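A coincidence-based alignment metric of this kind can be illustrated with a toy search over integer offsets. This is a minimal sketch, not the method used for the slide: it scores whole-pixel shifts only (the 0.4 pixel precision quoted above would need sub-pixel interpolation), and the threshold, image sizes, and function names are all assumptions.

```python
def make_image(rows, cols, bright_points, bright=255, background=10):
    """Build a toy image as a list of rows with a few bright pixels."""
    image = [[background] * cols for _ in range(rows)]
    for r, c in bright_points:
        image[r][c] = bright
    return image


def bright_pixels(image, threshold):
    """Coordinates of pixels at or above the intensity threshold."""
    return {(r, c) for r, row in enumerate(image)
            for c, value in enumerate(row) if value >= threshold}


def best_offset(reference, moved, threshold, max_shift):
    """Find the (row, col) shift of `moved` that maximizes bright-pixel coincidence."""
    ref_px = bright_pixels(reference, threshold)
    mov_px = bright_pixels(moved, threshold)
    best, best_score = (0, 0), -1
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            # Count moved pixels that land on a reference pixel after undoing the shift.
            score = sum(1 for (r, c) in mov_px if (r - dr, c - dc) in ref_px)
            if score > best_score:
                best, best_score = (dr, dc), score
    return best, best_score


ref = make_image(8, 8, [(1, 1), (2, 4), (4, 2)])
moved = make_image(8, 8, [(2, 3), (3, 6), (5, 4)])  # same pattern shifted by (+1, +2)
offset, score = best_offset(ref, moved, threshold=200, max_shift=3)
print(offset, score)
```

Applying the same search separately to local tiles of the image would give a matrix of local offsets, one per tile.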
Shendure
Polony exclusion principle & single-pixel sequences
Mitra & Shendure
Polony Flavors
1. Replica Plating of DNA images [Mitra et al. NAR 1999]
2. Long Range Haplotyping [Mitra et al. PNAS 2003]
3. Allelic mRNA Quantitation (HEP) [Mitra et al. 2003]
4. Alternative Splicing Combinatorics [Zhu et al. 2003]
5. Precise SNP-mutant & mRNA ratios [Merrill et al. 2003]
6. Fluor in situ Sequencing (FISSEQ 1) [Mitra et al. 2003]
7. Multiplex Genotyping (ApoE, Hyman, Shendure & Williams)
8. In situ / single-cell extensions of the above (Zhu & Williams)
Synthetic Mini-genomes
• 90 kbp genome? All 3D structures known.
• Comprehensive functional data too.
• 100X faster replication (10 sec doubling) & selection to evolve widgets & systems?
• Utility of mirror-image & other unnatural polymers.
• Chassis & power supply
A 90 kbp mini-genome
Columns: SP | (3D) | Stoichiometry | Mge# | Bp | Min | access# | Gene | L.end | R.end | orientation | len2 | Sequence
Total 144 107 89,498 74,310 2853
16S 1 y 1418 1418 3968 rrsB 4164238 4165779 > 124 aaattgaagagtttgatcatggctcagattgaacgctggcggcaggcctaacacatgcaagtcgaacggtaacaggaagaagcttgcttctttgctgacgagtggcggacgggtgagtaatgtctgggaaactgcctgatggagggggataactactggaaacggtagctaataccgcataacgtcgcaagaccaaagagggggaccttcgggcctcttgccatcggatgtgcccagatgggattagctagtagg
23S 1 y 2903 2903 3970 rrlB 4166220 4169123 > 1 ggttaagcgactaagcgtacacggtggatgccctggcagtcagaggcgatgaaggacgtgctaatctgcgataagcgtcggtaaggtgatatgaaccgttataaccggcgatttccgaatggggaaacccagtgtgtttcgacacactatcattaactgaatccataggttaatgaggcgaaccgggggaactgaaacatctaagtaccccgaggaaaagaaatcaaccgagattcccccagtagcggcgagcga
5S 1 120 120 3971 rrfB 4169216 4169335 > 0 tgcctggcggcagtagcgcggtggtcccacctgaccccatgccgaactcagaagtgaaacgccgtagcgccgatggtagtgtggggtctccccatgcgagagtagggaactgccaggcat
10sb (RNaseP) 375 375 3123 rnpB 3268233 3267857 < 2 gaagctgaccagacagtcgccgcttcgtcgtcgtcctcttcgggggagacgggcggaggggaggaaagtccgggctccatagggcagggtgccaggtaacgcctgggggggaaacccacgaccagtgcaacagagagcaaaccgccgatggcccgcgcaagcgggatcaggtaagggtgaaagggtgcggtaagagcgcaccgcgcggctggtaacagtccgtggcacggtaaactccacccggagcaaggccaa
tRNAs 20-46 y 3136 1364 3939 eg. gltT 4165951 4166026 > gtccccttcgtctagaggcccaggacaccgccctttcacggcggtaacaggggttcgaatcccctaggggacgcca
Cca (no) ? 1236 3056 cca 3199532 3200770 > 3 gtgaagatttatctggtcggtggtgctgttcgggatgcattgttagggctaccggtcaaagacagagattgggtggtggtcggcagtacgccacaggagatgctcgacgcgggctaccagcaggtaggccgcgattttcctgtgtttctgcatccgcaaacgcatgaagagtatgcgctggcacgtaccgaacggaaatccggttccggttacaccggttttacttgctatgccgcaccggatgtcacgctggaa
TrmA (22?) ? 1098 3965 trmA 4159749 4160849 < 3 atgacccccgaacaccttccaacagaacagtatgaagcgcagttagccgaaaaagtggtacgtttgcaaagtatgatggcaccgttttctgacctggttccggaagtgtttcgctcgccggtcagtcattaccggatgcgcgcggagttccgcatctggcacgatggcgatgacctgtatcacatcattttcgatcaacaaaccaaaagccgcatccgcgtggatagcttccccgccgccagtgaacttatcaac
BstNBI (no) 1815 AF329098 1 1815 > 0 atggctaaaaaagttaattggtatgtttcttgttcacctagaagtccagaaaaaattcagcctgagttaaaagtactagcaaattttgagggaagttattggaaaggggtaaaagggtataaagcacaagaggcatttgctaaagaacttgctgctttaccacaattcttaggtactacttataaaaaagaagctgcattttctactcgagacagagtggcaccaatgaaaacttatggtttcgtatttgtagat
Tri1 ? AP001918 traI 92673 97943 > atgatgagtattgcgcaggtcagatcggccggaagtgccgggaactattataccgacaaggataattactatgtgctgggcagcatgggagaacgctgggccggcaggggggctgaacagctggggctgcagggcagtgtcgataaggatgtttttacccgtcttctggagggcaggctgccggacggagcggatctaagccgcatgcaggatggcagtaacaggcatcgtcccggctacgatctgaccttctcc
Flp no 1272 NC_001398 5573 523 > 0 atgccacaatttggtatattatgtaaaacaccacctaaggtgcttgttcgtcagtttgtggaaaggtttgaaagaccttcaggtgagaaaatagcattatgtgctgctgaactaacctatttatgttggatgattacacataacggaacagcaatcaagagagccacattcatgagctataatactatcataagcaattcgctgagtttcgatattgtcaataaatcactccagtttaaatacaagacgcaaaaa
GFP no 717 AF302837 27 743 > 0 atgagtaaaggagaagaacttttcactggagttgtcccaattcttgttgaattagatggcgatgttaatgggcaaaaattctctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactgggaagctacctgttccatggccaacacttgtcactactttcgcgtatggtcttcaatgctttgcgagatacccagatcatatgaaacagcatgactttttcaag
Rnpa (36%) 357 357 3704 rnpA 3882122 3882481 > 3 gtggttaagctcgcatttcccagggagttacgcttgttaactcccagtcaattcacattcgtcttccagcagccacaacgggctggcacgccgcaaattaccattctcggccgcctgaattcgctggggcatccccgtatcggtcttacagtcgccaagaaaaacgttcgacgcgcccatgaacgcaatcggattaaacgtctgacgcgtgaaagcttccgtctgcgccaacatgaactcccggctatggatttc
BstPol multiprot 2631 2631 U93028 95 2728 > 3 atgagattgaagaaaaaactcgtcttaattgatggcaacagtgtggcataccgcgccttttttgccttgccacttttgcataacgacaaaggcattcatacgaatgcggtttacgggtttacgatgatgttgaacaaaattttggcggaagaacaaccgacccatttacttgtagcgtttgacgccggaaaaacgacgttccggcatgaaacgtttcaagagtataaaggcggacggcaacaaacgcccccggaa
Rpol_Bpt7 multiprot 2649 2649 NC_001604 3171 5822 > 2 atgaacacgattaacatcgctaagaacgacttctctgacatcgaactggctgctatcccgttcaacactctggctgaccattacggtgagcgtttagctcgcgaacagttggcccttgagcatgagtcttacgagatgggtgaagcacgcttccgcaagatgtttgagcgtcaacttaaagctggtgaggttgcggataacgctgccgccaagcctctcatcactaccctactccctaagatgattgcacgcatc
EFTu 451 1179 1179 3339 tufA 3467782 3468966 < 6 gtgtctaaagaaaaatttgaacgtacaaaaccgcacgttaacgttggtactatcggccacgttgaccacggtaaaactactctgaccgctgcaatcaccaccgtactggctaaaacctacggcggtgctgctcgtgcattcgaccagatcgataacgcgccggaagaaaaagctcgtggtatcaccatcaacacttctcacgttgaatacgacaccccgacccgtcactacgcacacgtagactgcccggggcac
EFG (59%) 89 2109 2109 3340 fusA 3469037 3471151 < 6 atggctcgtacaacacccatcgcacgctaccgtaacatcggtatcagtgcgcacatcgacgccggtaaaaccactactaccgaacgtattctgttctacaccggtgtaaaccataaaatcggtgaagttcatgacggcgctgcaaccatggactggatggagcaggagcaggaacgtggtattaccatcacttccgctgcgactactgcattctggtctggtatggctaagcagtatgagccgcatcgcatcaac
EFTs 433 846 846 170 tsf 190857 191708 > 6 atggctgaaattaccgcatccctggtaaaagagctgcgtgagcgtactggcgcaggcatgatggattgcaaaaaagcactgactgaagctaacggcgacatcgagctggcaatcgaaaacatgcgtaagtccggtgctattaaagcagcgaaaaaagcaggcaacgttgctgctgacggcgtgatcaaaaccaaaatcgacggcaactacggcatcattctggaagttaactgccagactgacttcgttgcaaaa
EFP (no) 26 561 561 4147 efp 4373277 4373843 > 6 atggcaacgtactatagcaacgattttcgtgctggtcttaaaatcatgttagacggcgaaccttacgcggttgaagcgagtgaattcgtaaaaccgggtaaaggccaggcatttgctcgcgttaaactgcgtcgtctgctgaccggtactcgcgtagaaaaaaccttcaaatctactgattccgctgaaggcgctgatgttgtcgatatgaacctgacttacctgtacaacgacggtgagttctggcacttcatg
IF1 173 213 213 884 infA 925448 925666 < 6 atggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctga
IF2 (25%) 142 2682 2682 3168 infB 3310983 3313655 < -9 atgacagatgtaacgattaaaacgctggccgcagagcgacagacctccgtggaacgcctggtacagcaatttgctgatgcaggtatccggaagtctgctgacgactctgtgtctgcacaagagaaacagactttgattgaccacctgaatcagaaaaattcaggcccggacaaattgacgctgcaacgtaaaacacgcagcacccttaacattcctggtaccggtggaaaaagcaaatcggtacaaatcgaagtc
IF3 (~50%) 196 540 540 1718 infC 1798120 1798662 < 3 attaaaggcggaaaacgagttcaaacggcgcgccctaaccgtatcaatggcgaaattcgcgcccaggaagttcgcttaacaggtctggaaggcgagcagcttggtattgtgagtctgagagaagctctggagaaagcagaagaagccggagtagacttagtcgagatcagccctaacgccgagccgccggtttgtcgtataatggattacggcaaattcctctatgaaaagagcaagtcttctaaggaacagaag
RF1 (no) 258 1080 1211 prfA 1264235 1265317 > 3 atgaagccttctatcgttgccaaactggaagccctgcatgaacgccatgaagaagttcaggcgttgctgggtgacgcgcaaactatcgccgaccaggaacgttttcgcgcattatcacgcgaatatgcgcagttaagtgatgtttcgcgctgttttaccgactggcaacaggttcaggaagatatcgaaaccgcacagatgatgctcgatgatcctgaaatgcgtgagatggcgcaggatgaactgcgcgaagct
RRF 435 555 555 172 frr 192872 193429 > 3 gtgattagcgatatcagaaaagatgctgaagtacgcatggacaaatgcgtagaagcgttcaaaacccaaatcagcaaaatacgcacgggtcgtgcttctcccagcctgctggatggcattgtcgtggaatattacggcacgccgacgccgctgcgtcagctggcaagcgtaacggtagaagattcccgtacactgaaaatcaacgtgtttgatcgttcaatgtctccggccgttgaaaaagcgattatggcgtcc
RL1 (~50%) 1 82 699 699 3984 rplA 4176457 4177161 > 6 atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacagtacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaaagcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgtggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaaggtgcaaacgctgaa
RL2 1 154 816 816 3317 rplB 3448180 3449001 < 6 atggcagttgttaaatgtaaaccgacatctccgggtcgtcgccacgtagttaaagtggttaaccctgagctgcacaagggcaaaccttttgctccgttgctggaaaaaaacagcaaatccggtggtcgtaacaacaatggccgtatcaccactcgtcatatcggtggtggccacaagcaggcttaccgtattgttgacttcaaacgcaacaaagacggtatcccggcagttgttgaacgtcttgagtacgatccg
The in vitro assembly (& 3D structure) of the prokaryotic ribosomes is known. (e.g. Nomura et al.; Noller et al.)
[Gel image: lanes M and 1-21; panels: DNA template, RNA transcript]
All 30S-Ribosomal-protein DNAs & mRNAs synthesized in vitro
Tian & Church
His-tagged ribosomal proteins synthesized in vitro
RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 as original constructs.
RS1 required deletion of a feedback motif in the mRNA.
RS-3, 7, 8, 11, 14, 18, 19, 20 are still weakly expressed.
Note that S1, S4, S7, S8, S20, L1, L4, L10 are known to repress their own translation (and are likely titrated by rRNA).
Tian & Church
[Diagram labels: Set of N coordinates (x, y, z); Matrix of distances; SVD (singular value decomposition); Euclidean metric; pdb file (viewed with RasMol); Matlab visualization]
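One plausible reading of these labels is that a matrix of pairwise distances is decomposed to recover Euclidean coordinates, which can then be written out for viewing. The slide does not spell out the procedure, so the sketch below is an assumption: it uses classical multidimensional scaling, double-centers the squared distances, and extracts the leading components by power iteration as a dependency-free stand-in for SVD (for a symmetric positive semidefinite matrix the two decompositions coincide). All function names and the example points are illustrative.

```python
import math


def distance_matrix(points):
    """Pairwise Euclidean distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]


def gram_from_distances(D):
    """Double-center the squared distances to get the Gram (inner-product) matrix."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpairs(B, k, iters=500):
    """Leading k eigenpairs of a symmetric PSD matrix by power iteration and deflation."""
    n = len(B)
    B = [row[:] for row in B]
    pairs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(n)]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):          # deflate: remove the component just found
            for j in range(n):
                B[i][j] -= lam * v[i] * v[j]
    return pairs


def embed(D, dims=3):
    """Coordinates (up to rotation/reflection) whose distances reproduce D."""
    pairs = top_eigenpairs(gram_from_distances(D), dims)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(D))]


points = [(0, 0, 0), (4, 0, 0), (0, 2, 0), (0, 0, 1), (4, 2, 1)]
D = distance_matrix(points)
coords = embed(D)
D_rebuilt = distance_matrix(coords)
max_error = max(abs(D[i][j] - D_rebuilt[i][j])
                for i in range(len(D)) for j in range(len(D)))
print(f"largest distance error after embedding: {max_error:.2e}")
```

The recovered coordinates match the originals only up to rotation, reflection, and translation, which is why the check compares distances and not coordinates.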
Representations of the Chromosome
Bidirectional replication; paired fork; origin
Blue: left replicated segment (yelgr = high gene #)
Red: right (i.e. middle) segment
Aqua: unduplicated segment of the circular genome
Avoidance of entanglement throughout cell cycle