The analysis of all transcripts within a cell is of essential importance. Molecular biology provides many approaches to clone RNA transcripts into cDNA. Large cDNA collections are in the public domain to serve the research community. Today, however, new high-speed sequencing methods allow a much deeper view into transcriptomes than possible by classical cloning.
Approaches to cDNA Cloning and Analysis
Dr. Matthias Harbers
Chief Scientist DNAFORM Inc.
Co-assigned Scientist at the RIKEN Omics Center
© Matthias Harbers 2008
2
Classical View on the Utilization of Genomic Information
Genomic DNA: storage of information
Coding mRNA: transport of information
Protein: tools to operate “functions”
[Figure: in the nucleus, transcription factors act at the promoter of a “gene” and RNA polymerase II transcribes the genomic DNA from the transcript start site; the mRNA, carrying a 7-methylguanosine cap (m7G cap) and a poly(A) tail, is translated at the ribosome in the cytoplasm.]
This view was developed in the 1950s and 1960s.
3
The Classical View Has Been Challenged by New Developments
Discovery/Project | Importance | Year
Discovery of reverse transcriptases | DNA can be synthesized from RNA templates | 1970
Discovery of ligase and restriction endonucleases | Establishing DNA recombination, DNA cloning, and preparation of DNA libraries | 1960s and 70s
DNA sequencing | Chain-termination method (“Sanger Sequencing”) | 1977
Human Genome Project | Move to sequencing entire genomes | 1990 to 2003
Expressed sequence tags (ESTs) | First attempt at gene discovery and expression profiling | 1991
IMAGE Project | Program to create cDNA collections from key organisms | 1993 to 2007
ENCODE Project | Functional elements in human genome | Since 2003
4
Topics of the Presentation
Approaches to cDNA cloning
Special topics related to cDNA cloning
Large-scale cDNA cloning projects
Small RNA (sRNA) cloning
Tag-based approaches
Next-Generation Sequencing
Where do we go from here?
5
Approaches to cDNA cloning
Capped and polyadenylated mRNA
1st strand cDNA synthesis: commonly oligo(dT) priming
Priming of 2nd strand cDNA synthesis: 5’-linker ligation or tailing reaction
2nd strand synthesis (option to amplify by PCR)
Digestion with cloning enzyme(s): methylation can protect against internal cleavage within the cDNA
Ligation into phage or plasmid vector (a plasmid with the cDNA insert may be excised from the phage vector)
[Figure: capped, polyadenylated mRNA (5’ to 3’) is copied into cDNA from an oligo(dT) primer, adaptors are added, and the cDNA is inserted into a phage or plasmid vector.]
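The oligo(dT)-primed first-strand step can be illustrated with a few lines of Python. This is only a toy sketch on sequence strings: the function name, the minimum poly(A) length, and the example sequence are assumptions made for illustration, and the enzymology is not modeled.

```python
# Toy model of oligo(dT)-primed first-strand cDNA synthesis.
# Names and the min_poly_a threshold are illustrative assumptions.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna, min_poly_a=5):
    """Return the first-strand cDNA (written 5' to 3') for an mRNA given 5' to 3'.

    Returns None when the mRNA has no poly(A) tail for the oligo(dT)
    primer to anneal to.
    """
    if not mrna.endswith("A" * min_poly_a):
        return None
    # The cDNA is the reverse complement of the mRNA, written as DNA.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))

print(first_strand_cdna("GAUUACAAAAAA"))  # starts with the oligo(dT) stretch
print(first_strand_cdna("GAUUACG"))       # no poly(A) tail, so no priming
```

The sketch also shows why oligo(dT) priming misses non-polyadenylated RNA: without the tail there is nothing for the primer to bind.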
6
Special Topics Related to cDNA Cloning
Synthesis of very long cDNAs (>10,000 bp, not further discussed)
Full-length cDNA cloning (important to obtain functional cDNAs)
Normalization (key to gene discovery in large-scale projects)
Cloning vectors and applications (not further discussed)
Subtractive cloning (not further discussed)
Expression cloning (not further discussed)
Addressing splicing (left out of large-scale projects)
Ref.: Harbers M: The current status of cDNA cloning, Genomics. 2008 Mar;91(3):232-42.
7
Use of cDNA Libraries
Isolation of individual target genes in research laboratories
Transcriptome Analysis and Genome Projects
Large-scale random clone picking
End-sequencing to build transcript catalogs
Full-length sequencing of selected clones
Creation of sequence data bases
Creation of cDNA collections
Ref.: Carninci P et al.: Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia. Genome Res. 2003 Jun;13(6B):1273-89.
8
Benefits of Large-Scale cDNA Cloning Projects
Improved cDNA cloning technology
Gene regulation: promoter identification, expression profiling
Genomics: gene discovery, mapping, sequence data, clone collections
Proteomics: functional studies on proteins
RNAi: knock down
SNP analysis: location in promoter or exon, functional studies
Noncoding RNA: sense-antisense pairs
Public sequence databases and clone collections are essential tools for research!
9
The mRNA Pool of a Cell
5 to 10 transcripts: up to 20% of mRNA
500 to 2,000 transcripts: 40 to 60% of mRNA
10,000 to 20,000 transcripts: <20% of mRNA
(Old numbers estimated from reassociation and hybridization studies.)
Discovery of rarely expressed genes is a difficult task!
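To get a feeling for why rare transcripts are hard to find, one can estimate how many random clones must be picked to see a given transcript at least once. The sketch below assumes independent random picking from an unbiased library; the midpoint of 15,000 rare transcripts and the 99% confidence level are assumptions chosen for illustration, not values from the slide.

```python
import math

def clones_needed(fraction, confidence=0.99):
    """Clones to pick so that a transcript making up `fraction` of the mRNA
    pool is seen at least once with probability `confidence`."""
    if not 0 < fraction < 1 or not 0 < confidence < 1:
        raise ValueError("fraction and confidence must lie strictly between 0 and 1")
    # P(at least one hit in N picks) = 1 - (1 - fraction)**N >= confidence
    return math.ceil(math.log(1 - confidence) / math.log(1 - fraction))

# Abundant class: about 20% of the mRNA spread over roughly 10 transcripts.
abundant_fraction = 0.20 / 10
# Rare class: about 20% of the mRNA spread over roughly 15,000 transcripts
# (assumed midpoint of the 10,000 to 20,000 range).
rare_fraction = 0.20 / 15_000

print("abundant transcript:", clones_needed(abundant_fraction), "clones")
print("rare transcript:", clones_needed(rare_fraction), "clones")
```

Under these assumptions an abundant transcript turns up within a few hundred clones, while a single rare transcript needs several hundred thousand. That gap is the reason normalization matters in large-scale projects.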
10
[Figure: number of non-redundant clones plotted against the number of libraries (Lib. 1, Lib. 2, Lib. 3 + Driver 1, Lib. 4 + Driver 2), prepared with no driver, Driver 1, or Driver 2.]
[Figure: example gel of pancreas cDNA without and with normalization/subtraction, next to Hind III size markers (9.4, 6.6, 4.4, 2.2, 2.0, and 0.5 kbp); bands from highly expressed genes are marked.]
Normalization of cDNA Libraries
During a normalization step, a cDNA pool is hybridized against an aliquot of the original mRNA sample or against the same cDNA pool. Because hybridization kinetics depend on concentration, the number of clones representing highly expressed genes is reduced, yielding a more even distribution of different cDNAs in the library.
Combine normalization and subtraction for higher gene discovery.
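The concentration dependence behind normalization can be sketched with ideal second-order reassociation kinetics, where the fraction of a species that is still single-stranded after time t is 1 / (1 + k * C0 * t). The rate constant, the time, and the three abundance values below are arbitrary assumptions in arbitrary units; real protocols are not this clean.

```python
def single_stranded_fraction(c0, k, t):
    """Fraction of a species still single-stranded after hybridization time t,
    assuming ideal second-order kinetics (an idealization)."""
    return 1.0 / (1.0 + k * c0 * t)

def normalize(abundances, k=1.0, t=10_000.0):
    """Keep only the material that stayed single-stranded and rescale to 1."""
    remaining = {
        gene: c0 * single_stranded_fraction(c0, k, t)
        for gene, c0 in abundances.items()
    }
    total = sum(remaining.values())
    return {gene: amount / total for gene, amount in remaining.items()}

# Hypothetical starting pool (fractions of total cDNA).
pool = {"abundant": 0.20, "medium": 0.01, "rare": 0.0001}
normalized = normalize(pool)

print("before:", pool["abundant"] / pool["rare"])              # abundant-to-rare ratio
print("after: ", normalized["abundant"] / normalized["rare"])  # much closer to 1
```

Abundant species find their partners quickly and are removed as double strands, so the single-stranded fraction that goes into the library is far more even than the starting pool.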
11
Full-Length cDNA Cloning
“Cap Trapper” Method
Key steps: biotinylation of the cap structure and RNase I treatment.
[Figure: first-strand cDNA is synthesized on the capped mRNA; the cap is biotinylated in a chemical reaction; RNase I digestion follows; hybrids that retain the biotinylated cap are recovered on beads.]
“Oligo Capping” Method
Key steps: replacement of the cap structure by an RNA oligonucleotide.
[Figure: mRNA is treated with phosphatase and then with pyrophosphatase; RNA ligase attaches an RNA adaptor in place of the cap; cDNA is then synthesized from a primer.]
12
Examples for Large-Scale cDNA Cloning Projects
Project | Organisms | URL
IMAGE Consortium | Human, mouse, rat, zebrafish, fugu, Xenopus (X. laevis and X. tropicalis), cow, and primate | http://image.llnl.gov/
Mammalian Gene Collection (MGC) | Human, mouse, rat, cow, others | http://mgc.nci.nih.gov/
Tokyo University | Human | http://cdna.hgc.jp/
RIKEN FANTOM | Mouse | http://fantom3.gsc.riken.go.jp/
Rice full-length cDNA Consortium | Rice | http://cdna01.dna.affrc.go.jp/cDNA/
RIKEN Arabidopsis | Arabidopsis | http://www.brc.riken.jp/lab/epd/Eng/news/071015.shtml
ORF Consortium | Human (some mouse clones) | http://www.orfeomecollaboration.org
These projects target the cloning and full-length sequencing of “one representative” cDNA clone for each gene. This reduces cost, but it entirely ignores splicing events.
13
Pre-mRNA is Spliced into mRNA
Large-scale cloning projects do not cover splice variants.
But maybe 75% of all signal transducers are regulated by splicing!
14
Capturing Alternatively Spliced Exons in mRNA
Sense strand: Sample 1
Antisense strand: Sample 2
Cut double-stranded regions
Capture single-stranded regions
Ref.: Watahiki A et al.: Libraries enriched for alternatively spliced exons reveal splicing patterns in melanocytes and melanomas. Nature Methods 2004 Dec 1(3): 233-9.
15
The Discovery of small RNAs
Classical cloning protocols removed all cDNA fragments of less than 500 bp (to avoid linker contamination; size cutoff of cloning vectors).
Proteins of less than 100 amino acids were commonly not annotated.
However, small RNAs have important functions!
Small RNAs are non-coding RNAs (ncRNAs), often derived from maturation processes in the cell that include digestion steps by RNases.
Most prominent example: microRNAs (miRNAs) carry sequences that are reverse complementary to regions of mRNA transcripts. They are around 21-23 nucleotides long after maturation and can alter the expression/translation of one or several target genes through RNA interference.
And we are still finding many more new RNA species!
Ref.: Kawaji H, Hayashizaki Y. Exploration of small RNAs. PLoS Genet. 2008 Jan;4(1):e22.
16
Small RNA (sRNA) Cloning
Short RNA
Modify 3’ end: C-tailing or adaptor ligation
Modify 5’ end: here by adaptor ligation
1st strand cDNA synthesis
2nd strand synthesis and PCR
Sequence analysis: direct sequencing of DNA fragments (option to ligate into plasmid vector)
[Figure: a short RNA (5’ to 3’) receives a C-tail at its 3’ end and an adaptor at its 5’ end, is copied into cDNA from an oligo(dG) primer, amplified, and optionally ligated into a plasmid.]
Key steps: modification of the 5’ and 3’ ends of the RNA for PCR amplification. Selection by size range. Commonly only sequenced. No cloning needed, as short cDNAs can be chemically synthesized.
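On the data side, the two key steps (end modification and size selection) reappear as adaptor trimming and length filtering of the sequenced reads. The sketch below is a simplified illustration: the adaptor sequences, the 18 to 30 nt window, and the function names are hypothetical, and real reads would need mismatch-tolerant matching.

```python
def trim_adaptors(read, five_prime, three_prime):
    """Return the insert between the two adaptors, or None if either is missing.

    Exact matching only; sequencing errors are ignored in this sketch.
    """
    if not read.startswith(five_prime):
        return None
    end = read.find(three_prime, len(five_prime))
    if end == -1:
        return None
    return read[len(five_prime):end]

def select_small_rnas(reads, five_prime, three_prime, min_len=18, max_len=30):
    """Trim adaptors and keep inserts inside the chosen size range."""
    inserts = (trim_adaptors(read, five_prime, three_prime) for read in reads)
    return [s for s in inserts if s is not None and min_len <= len(s) <= max_len]

# Hypothetical adaptor sequences and reads, for illustration only.
FIVE_PRIME = "ACGTAC"
THREE_PRIME = "TGGAATTC"
reads = [
    FIVE_PRIME + "TAGCTTATCAGACTGATGTTGA" + THREE_PRIME,  # 22-nt insert, kept
    FIVE_PRIME + "ACGT" + THREE_PRIME,                     # too short, dropped
    "GGGGGGTAGCTTATCAGACTGATGTTGA",                        # no adaptors, dropped
]
print(select_small_rnas(reads, FIVE_PRIME, THREE_PRIME))
```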
17
Tag-Based Approaches
Gene discovery cannot be done by standard methods used in expression profiling such as microarray or PCR.
Unsupervised approaches are needed for gene discovery that do not require sequence information for probe design.
The first approach to gene discovery was sequencing of 3’ ends of cDNA clones (EST sequencing). It requires one read per clone.
Gene identification does not require sequences of 500 to 800 bp; much shorter sequences of some 20 bp or less are sufficient.
Use long sequencing reads to cover many short fragments in one run.
New protocols to isolate short fragments from RNA.
Tag-based approaches in expression profiling and gene discovery.
Ref.: Harbers M and Carninci P: Tag-based approaches for transcriptome research and genome annotation. Nature Methods 2005 Jul 2(7): 495-502.
18
Tag-Based Approaches
[Figure: positions of the different tag types along a capped, polyadenylated mRNA.]
5’ end, cap selection: CAGE, 5’ SAGE
Anchoring enzyme sites: SAGE (5’ related), SAGE (3’ related), MPSS, DGE
3’ end, remove poly(A): 3’ SAGE
Paired-end tags (PETs)
RNA-Seq or other shotgun approaches
19
Serial Analysis of Gene Expression (SAGE) / Digital Gene Expression (DGE)
Ref.: Velculescu VE et al. Serial analysis of gene expression. Science. 1995 Oct 20;270(5235):484-7.
1st strand cDNA synthesis with biotinylated oligo(dT) primer (commonly starting from mRNA)
Preparation of double-stranded cDNA, digestion with anchoring enzyme, and capture on beads
Adaptor ligation and digestion with Mme I (20 bp) or EcoP15I (27 bp)
Formation of “Di-Tags” (Di-Tags can be used for direct sequencing (DGE))
Concatenation and cloning into plasmid vector (classic sequencing of concatemers)
Very well established, with rich reference/annotation information.
Digital expression profiling by “tag counting”.
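“Tag counting” can be sketched in a few lines. The example assumes an anchoring enzyme that cuts at CATG (as NlaIII does; the slide does not name the enzyme) and takes the 20 bases downstream of the 3’-most site as the tag. How the 20 bp are counted relative to the site, and all function names, are assumptions for illustration.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumption: anchoring enzyme site, e.g. NlaIII

def sage_tag(cdna, tag_length=20):
    """Return the tag next to the 3'-most anchoring site, or None.

    `cdna` is the sense strand, 5' to 3', without the poly(A) tail.
    """
    pos = cdna.rfind(ANCHOR_SITE)
    if pos == -1:
        return None  # transcript without a site cannot be detected
    start = pos + len(ANCHOR_SITE)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None

def count_tags(cdnas, tag_length=20):
    """Digital expression profile: number of occurrences of each tag."""
    tags = (sage_tag(cdna, tag_length) for cdna in cdnas)
    return Counter(tag for tag in tags if tag is not None)

def tags_per_million(counts):
    """Scale raw tag counts to tags per million for comparison across libraries."""
    total = sum(counts.values())
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}

transcript_a = "GGGCATGAAACATG" + "ACGTACGTACGTACGTACGT" + "TTT"
transcript_b = "CCCATG" + "GGGGGAAAAACCCCCTTTTT" + "AC"
counts = count_tags([transcript_a] * 3 + [transcript_b])
print(tags_per_million(counts))
```

The sketch also makes one limitation visible: a transcript without an anchoring enzyme site yields no tag at all.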
20
Cap Analysis Gene Expression (CAGE)
Ref.: Kodzius R et al.: Cap analysis of gene expression: transcription start site mapping and expression profiling. Nature Methods 2006 Mar 3(3): 211-222.
1st strand cDNA synthesis (covering poly(A)-minus mRNA and long mRNA)
5’-end selection on beads by Cap Trapper (less bias due to chemical modification of the cap)
Ligation of adaptor I and 2nd strand synthesis
Digestion with Mme I (20 bp) or EcoP15I (27 bp)
Isolation of CAGE tags
3’-end ligation of adaptor II
[Figure: capped mRNA is copied into cDNA from an N N N N N N primer, cap-selected on beads, and processed into a tag flanked by adaptor I and adaptor II (5’ to 3’).]
Commonly starting from 50 µg total RNA.
Preferably used for direct sequencing (>4,000,000 tags per run).
21
Cap Analysis Gene Expression (CAGE)
[Figure: transcription factors (TF1, TF2, TF3) responding to signals 1 to 3 act upstream of the transcription start site (TSS) of exon 1; CAGE tags mark the TSS on the genome and are shown together with ChIP, tiling array/RNA-Seq, microarray, SAGE, and RACE data along the capped, polyadenylated mRNA (exons 2 to 5).]
CAGE tags experimentally link transcripts to their promoters.
CAGE tags integrate information based on genome annotations.
CAGE tags can be linked to whole genome tiling arrays and RNA-Seq data.
CAGE tags can be linked to Chromatin IP/ChIP-Seq data.
CAGE tags correlate with open chromatin.
CAGE tags provide primer information for cloning new transcripts.
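Linking tags to promoters usually starts by grouping the mapped 5’ ends of CAGE tags into clusters of nearby start sites. The following is a minimal single-linkage sketch; the 20 bp gap, the tuple layout, and the function name are assumptions, not the procedure of any particular CAGE pipeline.

```python
from collections import defaultdict

def cluster_tss(tags, max_gap=20):
    """Group mapped tag 5' ends into clusters of neighbouring start sites.

    `tags` is an iterable of (chromosome, strand, position) tuples.
    Returns (chromosome, strand, start, end, tag_count) tuples.
    The max_gap of 20 bp is an arbitrary illustrative choice.
    """
    by_strand = defaultdict(list)
    for chrom, strand, pos in tags:
        by_strand[(chrom, strand)].append(pos)

    clusters = []
    for (chrom, strand), positions in sorted(by_strand.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev <= max_gap:
                count += 1  # still inside the current cluster
            else:
                clusters.append((chrom, strand, start, prev, count))
                start, count = pos, 1  # open a new cluster
            prev = pos
        clusters.append((chrom, strand, start, prev, count))
    return clusters

tags = [
    ("chr1", "+", 100), ("chr1", "+", 105), ("chr1", "+", 110),
    ("chr1", "+", 500), ("chr1", "-", 105),
]
for cluster in cluster_tss(tags):
    print(cluster)
```

The tag count per cluster doubles as an expression measure for that start site, which is what makes CAGE useful for both mapping and profiling.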
22
Classical DNA Sequencing by Chain-Termination Method
[Figure: a primer annealed to the DNA template is extended by DNA polymerase in a dNTP/ddNTP mix, with one reaction per nucleotide (A, G, C, T); the DNA fragments from the primer extension reactions are analyzed by gel electrophoresis or on a capillary sequencer.]
For over 30 years the most important method in molecular biology.
Now challenged by emerging new sequencing technologies: Next-Generation Sequencing.
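The logic of reading a sequence from chain-terminated fragments can be shown with an idealized model in which every position yields exactly one terminated fragment in the matching ddNTP reaction. Function names and the example strand are illustrative assumptions; signal strength, gel resolution, and read-length limits are not modeled.

```python
def termination_fragments(new_strand):
    """Fragment lengths produced in each of the four ddNTP reactions.

    `new_strand` is the strand synthesized from the primer, 5' to 3'.
    A fragment of length n ends up in the lane of the base at position n.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes):
    """Read the sequence from the shortest to the longest fragment."""
    base_by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_by_length[n] for n in sorted(base_by_length))

lanes = termination_fragments("TGGTTGGTACCACGTT")
print(lanes)
print(read_gel(lanes))
```

Sorting by fragment length is what the gel or capillary does physically; the lane (or dye colour) supplies the identity of the last base.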
23
Next-Generation Sequencing
Platform | Throughput | Read length | Run time | Method
Roche 454 Sequencing | 100 Mb per run | 250 bp | 7 h | Emulsion PCR and pyrosequencing
Illumina (Solexa) | 1300 Mb per run | 32-40 bp | 4 days | Bridge PCR and sequencing-by-synthesis
ABI SOLiD | 3000 Mb per run | 35 bp | 5 days | Emulsion PCR and ligation-based sequencing
Helicos | 25 to 90 Mb per h | up to 55 bp | (not given) | Single-molecule detection
Driven by the goal of the “$1000 genome”, different companies are moving to provide new sequencing technologies based on “sequencing by synthesis” or “ligation-based sequencing”. Other approaches may use hybridization methods or physical means in the future.
Ref.: Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 2008 Mar;24(3):133-41. Epub 2008 Feb 11.
Ref.: von Bubnoff A. Next-generation sequencing: the race is on. Cell. 2008 Mar 7;132(5):721-3.
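For tag-based methods the number of reads matters more than the number of bases. A rough conversion from the per-run figures on this slide is reads = bases per run / read length. The helper name is an assumption, and the result ignores quality filtering and unmappable reads.

```python
def reads_per_run(megabases, read_length_bp):
    """Approximate number of reads in one run (no quality filtering)."""
    return round(megabases * 1_000_000 / read_length_bp)

# (platform, Mb per run, read length in bp) taken from the slide's table;
# for Illumina both ends of the 32-40 bp range are shown.
platforms = [
    ("Roche 454", 100, 250),
    ("Illumina (Solexa), 40 bp reads", 1300, 40),
    ("Illumina (Solexa), 32 bp reads", 1300, 32),
    ("ABI SOLiD", 3000, 35),
]
for name, megabases, read_length in platforms:
    print(f"{name}: about {reads_per_run(megabases, read_length):,} reads per run")
```

By this estimate the short-read platforms deliver tens of millions of reads per run against a few hundred thousand for the long-read platform, which is why short tags of about 20 bp pair well with them.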
24
Example for Ligation-Based Sequencing: ABI SOLiD System
DNA fragments having adaptor sequences: genomic DNA, tag sequencing
Project-specific data analysis: mapping to genome
Reference information
Images are the courtesy of ABI and were kindly provided by ABI Japan.
27
Example for Sequencing-by-Synthesis: Illumina 1G System
Images are the courtesy of Illumina and were kindly provided by Illumina Japan.
DNA per run: 0.1 to 1 µg
Addition of 2 adaptors
Add to flow cell
Preparation of clusters
28
Cycle 1:
Addition of the sequencing reagent
One-base extension reaction
Removal of non-incorporated bases
Detection of the fluorescence signal
Removal of the fluorescence label
Cycle 2: repetition of the above reactions
Cycles 3, 4, 5, ...: repetition of the above reactions
[Figure: clusters on the flow cell, each extended by one labeled base (A, C, G, or T) per cycle.]
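The cycle structure can be mimicked in a toy simulation in which each cluster gains exactly one base per cycle. This is an idealized sketch: function and variable names are assumptions, templates are written in the order in which the polymerase reads them, and errors such as phasing or incomplete label removal are ignored.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_by_synthesis(templates, cycles):
    """Simulate reads from clusters, one base per cluster per cycle.

    `templates` holds one template string per cluster, written in the
    order in which the polymerase encounters the bases.
    """
    reads = ["" for _ in templates]
    for cycle in range(cycles):
        for i, template in enumerate(templates):
            if cycle >= len(template):
                continue  # template exhausted, cluster gives no signal
            # One-base extension with a labeled, reversibly terminated base.
            incorporated = COMPLEMENT[template[cycle]]
            # Imaging: the fluorescence colour identifies the base, after
            # which the label and the terminator are removed.
            reads[i] += incorporated
    return reads

clusters = ["TGCA", "GGTT"]
print(sequence_by_synthesis(clusters, cycles=3))
```

Because every cluster advances by one base per cycle, read length equals the number of cycles, while the number of reads is set by the number of clusters on the flow cell.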
29
[Figure: images of clusters on the flow cell, with scale bars of 100 µm and 20 µm.]
40,000,000 clusters on a flow cell
30
Where do we go from here?
Next-Generation Sequencing will push the genome sequencing field for re-sequencing and de novo sequencing (“1000 Genomes Project”).
Metagenomics (Environmental Genomics, Ecogenomics, or Community Genomics): direct analysis of genetic materials obtained from environmental samples.
Expression profiling: SAGE (DGE), CAGE, PET, RNA-Seq.
Analytical applications to identify functional regions/elements in genomes: ChIP-Seq, open chromatin, SNPs, splicing, and others to come.
Analytical applications in mutation screens.
Analytical applications for detection of infectious agents.
31
Transcriptome Analysis: The Dominance of Noncoding RNA
Genome sequencing and annotation did not tell us about the real extent of gene expression!
Tiling array experiments and deep sequencing by next-generation sequencing methods indicate that >90% of the genome is expressed.
Maybe 40 to 50% of the mRNA is not polyadenylated, and we have not analyzed it yet.
Most of the transcripts are potentially noncoding RNAs with unknown (regulatory?) functions.
The definition of a “gene” may no longer hold, with many different transcripts derived from the same loci.
We do not understand the “hidden layers” regulating the utilization of genomic information.
Ref.: Mattick JS. Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms. Bioessays. 2003;25:930-939.
32
Example for RNA-Seq in Schizosaccharomyces pombe (fission yeast)
Illumina 1G sequencer; average read length 39.1 bases; fragments from poly(A) mRNA.
> 23 million reads (~60 genome lengths) from proliferating cells.
> 99 million reads (~190 genome lengths) from five different stages.
Covering ~94% of the nuclear and > 99% of the mitochondrial genome.
Confirmed expression from intergenic regions by RT-PCR.
Control experiments using whole genome tiling arrays (25-mer probes at 20 nt intervals) confirmed the identification of novel transcripts (26 out of 453 may encode short proteins).
Ref.: Wilhelm BT et al. Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature. 2008 Jun 26;453(7199):1239-43. Epub 2008 May 18.
Ref.: Graveley BR. Molecular biology: power sequencing. Nature. 2008 Jun 26;453(7199):1197-8.
Recent publications on the use of RNA-Seq include S. pombe, S. cerevisiae, Arabidopsis, mouse tissues, mouse stem cells, and HeLa S3.
33
Examples for Genome Size (haploid)
Genome | Length in bp | Estimated gene number
Phi-X 174 | 5,386 | 10
Human mitochondrion | 16,569 | 37
E. coli | 4,639,221 | 4,377
Saccharomyces cerevisiae | 12,495,682 | 5,770
Caenorhabditis elegans | 100,258,171 | 19,427
Arabidopsis thaliana | 115,409,949 | ~28,000
Drosophila melanogaster | 122,653,977 | 13,379
Humans | 3.3 x 10^9 | ~20,500
Amphibians | 10^9 to 10^11 | ?
Values taken from: http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/G/GenomeSizes.html (as of July 2007)
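The table becomes more telling when the gene numbers are related to genome length. The short calculation below uses the values from the table (approximate entries taken at face value); the helper name is an assumption.

```python
# (length in bp, estimated gene number) from the table; "~" values are
# used as given, and amphibians are left out for lack of a gene number.
GENOMES = {
    "Phi-X 174": (5_386, 10),
    "Human mitochondrion": (16_569, 37),
    "E. coli": (4_639_221, 4_377),
    "Saccharomyces cerevisiae": (12_495_682, 5_770),
    "Caenorhabditis elegans": (100_258_171, 19_427),
    "Arabidopsis thaliana": (115_409_949, 28_000),
    "Drosophila melanogaster": (122_653_977, 13_379),
    "Humans": (3_300_000_000, 20_500),
}

def genes_per_megabase(length_bp, genes):
    """Gene density as genes per million base pairs."""
    return genes / (length_bp / 1_000_000)

for name, (length_bp, genes) in GENOMES.items():
    print(f"{name}: {genes_per_megabase(length_bp, genes):.1f} genes per Mb")
```

Gene density drops by more than two orders of magnitude from E. coli to humans, so gene number alone says little about how much of a large genome is transcribed.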
34
Where are our limitations?
Mammalian genome size and transcriptome complexity: enrichment of fragments (e.g. using microarrays), normalization, and longer reads are required.
Thus far uneven representation requires use of more than one method.
Requirements for starting materials (target is to analyze single cells).
No unified cDNA library method: using different methods depending on RNA length.
Very large data files and lack of computational analysis tools.
What is transcriptional noise?
Research dominated by “detection” rather than “functional analysis”.
Ref.: Struhl K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nat Struct Mol Biol. 2007 Feb;14(2):103-5.
35
Present Strategies for Transcriptome Analysis
Interest has shifted to next-generation sequencing to profile transcriptional activities.
We cannot predict the ends of transcripts, and therefore tag-based approaches to identify start sites and termination sites are needed.
Identification of transcription start sites in combination with other information is driving “gene network studies” and “systems biology”.
RNA-Seq provides new means for the identification of splice sites and expressed mutations.
We do not clone all those new transcripts, but there will be a need for resources for the functional analysis of new transcripts.
We are more than ever falling short on the functional analysis of new transcripts. Thus far we have not even analyzed all coding transcripts!
It is an exciting time to work on transcriptome analysis offering many challenges and rewards!
36
Contact:
Dr. Matthias Harbers
DNAFORM Inc.
Leading Venture Plaza-2, 75-1, Ono-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0046 Japan
E-mail: matthias.harbers@dnaform.jp
Phone: +81-(0)45-510-0607
FAX: +81-(0) 45-510-0608
URL: http://www.dnaform.jp