Approaches to cDNA Cloning and Analysis

Dr. Matthias Harbers

Chief Scientist, DNAFORM Inc.

Co-assigned Scientist at the RIKEN Omics Center

Classical View on the Utilization of Genomic Information

[Figure: Flow of genomic information in the cell. Nucleus: genomic DNA (storage of information) with promoter, “gene”, and transcript start site; transcription factors; transcription by RNA polymerase II. Cytoplasm: coding mRNA (transport of information) carrying a 5’ cap (7-methylguanosine cap or m7G cap) and a poly(A) tail; translation at the ribosome; protein (tools to operate “functions”).]

Developed in the 1950s and 1960s.

The Classical View Has Been Challenged by New Developments

Discovery/Project | Importance | Year
Discovery of reverse transcriptases | DNA can be synthesized from RNA templates |
Discovery of ligase and restriction endonucleases | Establishing DNA recombination, DNA cloning, and preparation of DNA libraries | 1960s and 70s
DNA sequencing | Chain-termination method (“Sanger sequencing”) |
Human Genome Project | Move to sequencing entire genomes | 1990 to 2003
Expressed sequence tags (ESTs) | First attempt at gene discovery and expression profiling |
IMAGE Project | Program to create cDNA collections from key organisms | 1993 to 2007
ENCODE Project | Functional elements in human genome | Since 2003

Topics of the Presentation

Approaches to cDNA cloning
Special topics related to cDNA cloning
Large-scale cDNA cloning projects
Small RNA (sRNA) cloning
Tag-based approaches
Next-Generation Sequencing
Where do we go from here?

Approaches to cDNA cloning

Starting material: capped and polyadenylated mRNA (5’ cap, 3’ poly(A) tail)
1st strand cDNA synthesis: commonly oligo(dT) priming
Priming of 2nd strand cDNA synthesis: 5’-linker ligation or tailing reaction
2nd strand synthesis (option to amplify by PCR)
Digestion with cloning enzyme(s): methylation can protect against internal cleavage within the cDNA
Ligation into phage or plasmid vector (a plasmid with the cDNA insert may be excised from the phage vector)
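The first step above can be illustrated with a minimal sketch. It assumes an idealized, error-free reverse transcription that starts at the poly(A) tail and runs to the 5’ end of the template; the function name and the minimum tail length are hypothetical choices made for illustration only.

```python
# Minimal sketch: first-strand cDNA as the reverse complement of the mRNA.
# Assumption: idealized, full-length reverse transcription primed by oligo(dT).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def first_strand_cdna(mrna: str, min_tail: int = 5) -> str:
    """Return the first-strand cDNA (written 5'->3') for a polyadenylated mRNA."""
    mrna = mrna.upper()
    # Length of the 3' poly(A) tail that the oligo(dT) primer can anneal to.
    tail = len(mrna) - len(mrna.rstrip("A"))
    if tail < min_tail:
        raise ValueError("no poly(A) tail long enough for oligo(dT) priming")
    # The enzyme copies the template from its 3' end, so the product read
    # 5'->3' is the reverse complement of the mRNA, written as DNA.
    return "".join(COMPLEMENT[base] for base in reversed(mrna))


print(first_strand_cdna("GGCAUGCCUAAAAAA"))
```

The product begins with the T stretch contributed by the primer, which is why oligo(dT)-primed libraries are anchored at the 3’ end of the transcript.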

Special Topics Related to cDNA Cloning

Synthesis of very long cDNAs (>10,000 bp, not further discussed)
Full-length cDNA cloning (important to obtain functional cDNAs)
Normalization (key to gene discovery in large-scale projects)
Cloning vectors and applications (not further discussed)
Subtractive cloning (not further discussed)
Expression cloning (not further discussed)
Addressing splicing (left out of large-scale projects)

Ref.: Harbers M: The current status of cDNA cloning. Genomics. 2008 Mar;91(3):232-42.

Use of cDNA Libraries

Isolation of individual target genes in research laboratories

Transcriptome Analysis and Genome Projects

Large-scale random clone picking

End-sequencing to build transcript catalogs

Full-length sequencing of selected clones

Creation of sequence databases

Creation of cDNA collections

Ref.: Carninci P et al.: Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia. Genome Res. 2003 Jun;13(6B):1273-89.

Benefits of Large-Scale cDNA Cloning Projects

Improved cDNA cloning technology
Gene regulation: promoter identification, expression profiling
Genomics: gene discovery, mapping, sequence data, clone collections
Proteomics: functional studies on proteins
RNAi: knock down
SNP analysis: location in promoter or exon, functional studies
Noncoding RNA: sense-antisense pairs

Public sequence databases and clone collections are essential tools for research!

The mRNA Pool of a Cell

500 to 2,000 transcripts: 40 to 60% of mRNA

5 to 10 transcripts: up to 20% of mRNA

10,000 to 20,000 transcripts: <20% of mRNA

Discovery of rarely expressed genes is a difficult task!

(Old numbers estimated from reassociation and hybridization studies)
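To make the difficulty concrete, the sketch below estimates how many random clones must be picked to see a given transcript at least once. It uses the standard sampling formula N = ln(1 - P) / ln(1 - f), which is not part of the slides, and it assumes that clone frequencies in the library mirror mRNA abundance. The per-transcript fractions are hypothetical values derived from the rough abundance classes above.

```python
import math


def clones_needed(fraction: float, probability: float = 0.99) -> int:
    """Clones to pick so that a transcript making up `fraction` of the library
    is seen at least once with the given probability: N = ln(1-P) / ln(1-f)."""
    if not (0.0 < fraction < 1.0 and 0.0 < probability < 1.0):
        raise ValueError("fraction and probability must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


# Hypothetical abundant transcript: 10 species sharing 20% of the mRNA.
abundant_fraction = 0.20 / 10
# Hypothetical rare transcript: 15,000 species sharing 20% of the mRNA.
rare_fraction = 0.20 / 15_000

print("abundant:", clones_needed(abundant_fraction))
print("rare:", clones_needed(rare_fraction))
```

Under these assumptions an abundant transcript turns up within a few hundred clones, while a rare one needs several hundred thousand, which is the motivation for normalization.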

Normalization of cDNA Libraries

During a normalization step, a cDNA pool is hybridized against an aliquot of the original mRNA sample or against the same cDNA pool. Because hybridization kinetics depend on concentration, abundant cDNAs hybridize faster and are removed with the hybridized fraction. The number of clones representing highly expressed genes is thereby reduced, resulting in a more equal distribution of different cDNAs in the library.

[Figure: Number of libraries without normalization/subtraction (Lib. 1, Lib. 2; no driver) and with normalization/subtraction (Lib. 3 + Driver 1, Lib. 4 + Driver 2); highly expressed genes are marked.]

[Figure: Example gel of pancreas cDNA with Hind III size markers at 9.4, 6.6, 4.4, 2.2, 2.0, and 0.5 kbp.]

Combine normalization and subtraction for higher gene discovery.
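The concentration dependence described above can be sketched with the textbook second-order reassociation model, in which the single-stranded amount left after time t is C0 / (1 + k * C0 * t). This model, the rate constant, the time, and the starting abundances are all assumptions chosen for illustration; they are not values from the slides.

```python
from typing import Dict


def single_stranded_after(c0: float, k: float, t: float) -> float:
    """Single-stranded amount left after time t under ideal second-order kinetics."""
    return c0 / (1.0 + k * c0 * t)


def normalize(abundances: Dict[str, float], k: float, t: float) -> Dict[str, float]:
    """Relative shares of each species in the single-stranded fraction that is kept."""
    remaining = {name: single_stranded_after(c0, k, t) for name, c0 in abundances.items()}
    total = sum(remaining.values())
    return {name: amount / total for name, amount in remaining.items()}


# Hypothetical starting abundances (arbitrary units) spanning four orders of magnitude.
pool = {"abundant": 0.20, "medium": 0.001, "rare": 0.00001}

for name, share in normalize(pool, k=1.0, t=1e5).items():
    print(f"{name}: {share:.3f}")
```

For long hybridization times the remaining amount approaches 1 / (k * t) regardless of the starting concentration, so the abundant and rare species end up at comparable levels in this idealized model.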

Full-Length cDNA Cloning

“Cap Trapper” method
Key steps: biotinylation of the cap structure and RNase I treatment.
Figure labels: capped mRNA, oligo(dT)-primed cDNA, chemical reaction (biotin), RNase I digestion, recovery on beads.

“Oligo Capping” method
Key steps: replacement of the cap structure by an RNA oligonucleotide.
Figure labels: phosphatase, pyrophosphatase, RNA ligase, adaptor, primer, cDNA.

Examples for Large-Scale cDNA Cloning Projects

Project | Organisms | URL
IMAGE Consortium | Human, mouse, rat, zebrafish, fugu, Xenopus (X. laevis and X. tropicalis), cow, and primate | http://image.llnl.gov/
Mammalian Gene Collection (MGC) | Human, mouse, rat, cow, others | http://mgc.nci.nih.gov/
Tokyo University | Human | http://cdna.hgc.jp/
RIKEN FANTOM | Mouse | http://fantom3.gsc.riken.go.jp/
Rice full-length cDNA Consortium | Rice | http://cdna01.dna.affrc.go.jp/cDNA/
RIKEN Arabidopsis | Arabidopsis | http://www.brc.riken.jp/lab/epd/Eng/news/071015.shtml
ORF Consortium | Human (some mouse clones) | http://www.orfeomecollaboration.org

These projects target the cloning and full-length sequencing of “one representative” cDNA clone for each gene. This reduces cost, but it entirely ignores splicing events.

Pre-mRNA is Spliced into mRNA

Large-scale cloning projects do not cover splice variants. But maybe 75% of all signal transducers are regulated by splicing!

Capturing Alternatively Spliced Exons in mRNA

Sense strand: Sample 1
Antisense strand: Sample 2
Cut double-stranded regions
Capture single-stranded regions

Ref.: Watahiki A et al.: Libraries enriched for alternatively spliced exons reveal splicing patterns in melanocytes and melanomas. Nature Methods 2004 Dec 1(3): 233-9.

The Discovery of Small RNAs

Classical cloning protocols removed all cDNA fragments of less than 500 bp (to avoid linker contamination; size cutoff of cloning vectors).

Proteins of less than 100 amino acids were commonly not annotated.

However, small RNAs have important functions!

Small RNAs are non-coding RNAs (ncRNAs), often derived from maturation processes in the cell that include digestion steps by RNases.

Most prominent example: microRNAs (miRNAs) are reverse complementary to sequences in other mRNA transcripts. They are around 21-23 nucleotides long after maturation and can alter the expression/translation of one or several target genes through RNA interference.

And we are still finding many more new RNA species!

Ref.: Kawaji H, Hayashizaki Y. Exploration of small RNAs. PLoS Genet. 2008 Jan;4(1):e22.

Small RNA (sRNA) Cloning

Starting material: short RNA
Modify 3’ end: C-tailing or adaptor ligation
Modify 5’ end: here by adaptor ligation
1st strand cDNA synthesis
2nd strand synthesis and PCR
Sequence analysis: direct sequencing of DNA fragments (option to ligate into plasmid vector)

Key steps: modification of the 5’ and 3’ ends of the RNA for PCR amplification. Selection by size range. Commonly only sequenced. No cloning needed, as short cDNAs can be chemically synthesized.
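Once such fragments are sequenced directly, the adaptor has to be removed and the size selection repeated in software. The sketch below shows one simple way this could look; the adaptor sequence, the 18 to 30 nt window, and the function name are hypothetical, and exact string matching stands in for the error-tolerant matching a real pipeline would need.

```python
from typing import Iterable, List


def trim_and_select(reads: Iterable[str], adaptor3: str,
                    min_len: int = 18, max_len: int = 30) -> List[str]:
    """Cut each read at the 3' adaptor and keep inserts inside the size window."""
    kept = []
    for read in reads:
        cut = read.find(adaptor3)
        if cut == -1:
            continue  # no adaptor found: insert longer than the read, or an artifact
        insert = read[:cut]
        if min_len <= len(insert) <= max_len:
            kept.append(insert)
    return kept


ADAPTOR = "TTTTGGGGCC"       # hypothetical 3' adaptor sequence
small_rna = "ACGT" * 5 + "AC"  # hypothetical 22-nt insert
reads = [small_rna + ADAPTOR, "ACGT" + ADAPTOR, "ACGT" * 9]
print(trim_and_select(reads, ADAPTOR))
```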

Tag-Based Approaches

Gene discovery cannot be done by standard methods used in expression profiling, such as microarrays or PCR.

Unsupervised approaches that do not require sequence information for probe design are needed for gene discovery.

The first approach to gene discovery was sequencing of the 3’ ends of cDNA clones (EST sequencing). It requires one read per clone.

Gene identification does not require sequences of 500 to 800 bp; much shorter sequences of some 20 bp or less are sufficient.

Use long sequencing reads to cover many short fragments in one run.

New protocols to isolate short fragments from RNA.

Tag-based approaches in expression profiling and gene discovery.

Ref.: Harbers M and Carninci P: Tag-based approaches for transcriptome research and genome annotation. Nature Methods 2005 Jul 2(7): 495-502.
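The argument that roughly 20 bp suffice, and that one long read can carry many tags, comes down to simple arithmetic. The sketch below works it through under the simplifying assumption of a random genome sequence; real genomes contain repeats, so some tags will still map to more than one place. The function names are illustrative, and the genome length of 3.3 × 10^9 bp is the human value quoted elsewhere in this presentation.

```python
def possible_tags(length: int) -> int:
    """Number of distinct sequences of the given length over A, C, G, T."""
    return 4 ** length


def expected_random_hits(length: int, genome_bp: float) -> float:
    """Expected chance matches of one tag in a random genome, counting both strands."""
    return 2 * genome_bp / possible_tags(length)


def tags_per_read(read_bp: int, tag_bp: int) -> int:
    """How many tags fit into one read when they are concatenated without spacers."""
    return read_bp // tag_bp


HUMAN_GENOME_BP = 3.3e9

print(possible_tags(20))
print(expected_random_hits(20, HUMAN_GENOME_BP))
print(tags_per_read(800, 20))
```

A 20-bp tag has far more possible sequences than there are positions in the genome, so a chance match is unlikely, and a single long read can in principle report on dozens of transcripts instead of one.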

Tag-Based Approaches

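[Figure: Position of tags along a capped and polyadenylated mRNA. 5’ end (cap selection): CAGE, 5’ SAGE. Anchoring enzyme sites: SAGE (5’ related), SAGE (3’ related), MPSS, DGE. 3’ end (remove poly(A)): 3’ SAGE. Both ends: paired-end tags (PETs). Whole transcript: RNA-Seq or other shotgun approaches.]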

Serial Analysis of Gene Expression (SAGE) (Digital Gene Expression (DGE))

Ref.: Velculescu VE et al. Serial analysis of gene expression. Science. 1995 Oct 20;270(5235):368-9, 371.

1st strand cDNA synthesis with biotinylated primer (commonly starting from mRNA)
Preparation of double-stranded cDNA and digestion with anchoring enzyme
Adaptor ligation and digestion with Mme I (20 bp) or EcoP15I (27 bp)
Formation of “di-tags” (di-tags can be used for direct sequencing (DGE))
Concatenation and cloning into plasmid vector (classic sequencing of concatemers)

Very well established, with rich reference/annotation information.
Digital expression profiling by “tag counting”.
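“Tag counting” can be sketched in a few lines. The example assumes that the tag is the stretch immediately downstream of the 3’-most anchoring enzyme site in each cDNA; the recognition sequence CATG is an assumption chosen for illustration, as the slides do not name the anchoring enzyme, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def extract_tag(cdna: str, anchor: str = "CATG", tag_len: int = 20) -> Optional[str]:
    """Return the tag next to the 3'-most anchoring site, or None if there is none."""
    cdna = cdna.upper()
    site = cdna.rfind(anchor)
    if site == -1:
        return None  # transcript without an anchoring site yields no tag
    start = site + len(anchor)
    tag = cdna[start:start + tag_len]
    # Discard sites too close to the 3' end to give a full-length tag.
    return tag if len(tag) == tag_len else None


def count_tags(cdnas: Iterable[str], anchor: str = "CATG", tag_len: int = 20) -> Counter:
    """Digital expression profile: number of occurrences of each tag."""
    tags = (extract_tag(cdna, anchor, tag_len) for cdna in cdnas)
    return Counter(tag for tag in tags if tag is not None)


# Toy sequences with a short tag length so the result is easy to inspect.
print(count_tags(["CATGACGT", "TTCATGACGT", "CATGTTTT"], tag_len=4))
```

The counts are the expression measurement; transcripts that lack the anchoring site are invisible to this scheme, which is one reason different tag methods give uneven representation.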

Cap Analysis Gene Expression (CAGE)

Ref.: Kodzius R et al.: Cap analysis of gene expression: transcription start site mapping and expression profiling. Nature Methods 2006 Mar 3(3): 211-222.

1st strand cDNA synthesis with random primers (N N N N N N) (covering poly(A)-minus mRNA and long mRNA)
5’-end selection on beads by Cap Trapper (less bias due to chemical modification of the cap)
Adaptor I ligation and 2nd strand synthesis
Digestion with Mme I (20 bp) or EcoP15I (27 bp)
Isolation of CAGE tags
3’-end adaptor ligation (Adaptor II)

Commonly starting from 50 µg total RNA.

Preferably used for direct sequencing (>4,000,000 tags per run).

Cap Analysis Gene Expression (CAGE)

[Figure: CAGE tags mark the transcription start site (TSS) at exon 1 of an mRNA mapped to the genome (exons 2 to 5 downstream). They are shown together with signals 1 to 3 acting through transcription factors TF1 to TF3 (ChIP) and with tiling array/RNA-Seq, microarray, SAGE, and RACE data.]

CAGE tags experimentally link transcripts to their promoters.
CAGE tags integrate information based on genome annotations.
CAGE tags can be linked to whole genome tiling arrays and RNA-Seq data.
CAGE tags can be linked to Chromatin IP/ChIP-Seq data.
CAGE tags correlate with open chromatin.
CAGE tags provide primer information for cloning new transcripts.
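Linking tags to promoters starts with grouping mapped tag 5’ ends into candidate start-site clusters. The following is a minimal sketch of that grouping step only; the input format, the 20 bp gap threshold, and the function name are hypothetical, and published CAGE pipelines use more elaborate clustering.

```python
from typing import Iterable, List, Tuple

Tag = Tuple[str, str, int]                 # (chromosome, strand, 5' position of the tag)
Cluster = Tuple[str, str, int, int, int]   # (chromosome, strand, start, end, tag count)


def cluster_tss(tags: Iterable[Tag], max_gap: int = 20) -> List[Cluster]:
    """Merge tag start positions on the same chromosome and strand that lie
    within max_gap bp of the previous position into one cluster."""
    clusters: List[list] = []
    for chrom, strand, pos in sorted(tags):
        last = clusters[-1] if clusters else None
        if last and last[0] == chrom and last[1] == strand and pos - last[3] <= max_gap:
            last[3] = pos   # extend the cluster to the new position
            last[4] += 1    # one more supporting tag
        else:
            clusters.append([chrom, strand, pos, pos, 1])
    return [tuple(cluster) for cluster in clusters]


mapped = [("chr1", "+", 100), ("chr1", "+", 105), ("chr1", "+", 300),
          ("chr1", "-", 102), ("chr2", "+", 100)]
for cluster in cluster_tss(mapped):
    print(cluster)
```

The tag count per cluster serves as the expression level of that start site, and the cluster coordinates give the region upstream of which to look for the promoter.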

Classical DNA Sequencing by Chain-Termination Method

[Figure: Chain-termination sequencing. A primer is extended on a DNA template by DNA polymerase in a dNTP/ddNTP mix, with one reaction per nucleotide (A, G, C, T). The DNA fragments from the primer extension reactions are analyzed by gel electrophoresis or on a capillary sequencer.]

For over 30 years the most important method in molecular biology.

Challenged by emerging new sequencing technologies: Next-Generation Sequencing.
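The logic of reading a sequence from four termination reactions can be shown with a small simulation. It is an idealized sketch: every position is assumed to be terminated at least once and fragment lengths are assumed to be resolved perfectly. The function names are hypothetical.

```python
from typing import Dict, List


def chain_termination_fragments(synthesized: str) -> Dict[str, List[int]]:
    """For each ddNTP reaction, list the lengths of the terminated fragments.
    `synthesized` is the strand extended from the primer, written 5'->3'."""
    lanes: Dict[str, List[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized.upper(), start=1):
        lanes[base].append(length)  # a fragment of this length ends in `base`
    return lanes


def read_gel(lanes: Dict[str, List[int]]) -> str:
    """Read the sequence by ordering all fragments from shortest to longest."""
    base_at_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(base_at_length[length] for length in sorted(base_at_length))


lanes = chain_termination_fragments("TGGTTGGTACCACGTT")
print(lanes)
print(read_gel(lanes))
```

Each fragment length occurs in exactly one of the four reactions, so sorting by length and noting the reaction recovers the synthesized strand base by base.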

Next-Generation Sequencing

Platform | Mb per run / read length / run time | Method
Roche 454 Sequencing | 100 Mb / 250 bp / 7 h per run | Emulsion PCR and pyrosequencing
Illumina (Solexa) | 1300 Mb / 32-40 bp / 4 days per run | Bridge PCR and sequencing-by-synthesis
ABI SOLiD | 3000 Mb / 35 bp / 5 days per run | Emulsion PCR and ligation-based sequencing
Helicos | 25 to 90 Mb per h / up to 55 bp | Single-molecule detection

Driven by the “$1000 genome”, different companies are on the move to provide new sequencing technologies based on “sequencing by synthesis” or “ligation-based sequencing”. Other approaches may use hybridization methods or physical means in the future.

Ref.: Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 2008 Mar;24(3):133-41. Epub 2008 Feb 11.
Ref.: von Bubnoff A. Next-generation sequencing: the race is on. Cell. 2008 Mar 7;132(5):721-3.
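For tag-based methods the number of reads per run matters more than the number of bases. A rough conversion from the table values is sketched below; the read length of 36 bp used for Illumina is an assumed midpoint of the quoted 32-40 bp range, and the results are order-of-magnitude estimates only.

```python
def reads_per_run(mb_per_run: float, read_bp: float) -> float:
    """Approximate number of reads per run from yield (Mb) and read length (bp)."""
    return mb_per_run * 1_000_000 / read_bp


# (yield in Mb per run, read length in bp), taken from the table above.
PLATFORMS = {
    "Roche 454": (100, 250),
    "Illumina (Solexa)": (1300, 36),  # 36 bp is an assumed midpoint of 32-40 bp
    "ABI SOLiD": (3000, 35),
}

for name, (megabases, read_bp) in PLATFORMS.items():
    print(f"{name}: about {reads_per_run(megabases, read_bp):,.0f} reads per run")
```

Short-read platforms deliver tens of millions of reads per run, which suits counting short tags, whereas the longer 454 reads yield far fewer individual reads.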

Example for Ligation-Based Sequencing: ABI SOLiD System

[Figure: SOLiD workflow. DNA fragments having adaptor sequences (genomic DNA, tag sequencing); project-specific data analysis: mapping to genome; reference information.]

Images are the courtesy of ABI and were kindly provided by ABI Japan.

Example for Sequencing-by-Synthesis: Illumina 1G System

Images are the courtesy of Illumina and were kindly provided by Illumina Japan.

DNA per run: 0.1 to 1 µg
Addition of 2 adaptors
Add to flow cell
Preparation of clusters (40,000,000 clusters on a flow cell)
Cycle 1: addition of the sequencing reagent; one-base extension reaction; removal of non-incorporated bases; detection of the fluorescence signal; removal of the fluorescence label
Cycle 2: repetition of the above reactions
Cycles 3, 4, 5, ...: repetition of the above reactions

Where do we go from here?

Next-Generation Sequencing will push the genome sequencing field for re-sequencing and de novo sequencing (“1000 Genomes Project”).

Metagenomics (Environmental Genomics, Ecogenomics, or Community Genomics): direct analysis of genetic materials obtained from environmental samples.

Expression profiling: SAGE (DGE), CAGE, PET, RNA-Seq.

Analytical applications to identify functional regions/elements in genomes: ChIP-Seq, open chromatin, SNPs, splicing, others to come.

Analytical applications in mutation screens.

Analytical applications for detection of infectious agents.

Ref.: Mattick JS. Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms. Bioessays. 2003;25:930-939.

Transcriptome Analysis: The Dominance of Noncoding RNA

Genome sequencing and annotation did not tell us about the real extent of gene expression!

Tiling array experiments and deep sequencing by next-generation sequencing methods indicate that >90% of the genome is expressed.

Maybe 40 to 50% of the mRNA is not polyadenylated, and we have not analyzed it yet.

Most of the transcripts are potentially noncoding RNAs with unknown (regulatory?) functions.

The definition of a “gene” may no longer hold, with many different transcripts derived from the same loci.

We do not understand the “hidden layers” regulating the utilization of genomic information.

Example for RNA-Seq in Yeast: Schizosaccharomyces pombe (Fission Yeast)

Illumina 1G sequencer; average read length 39.1 bases; fragments from poly(A) mRNA.

>23 million reads (~60× genome length) from proliferating cells.

>99 million reads (~190× genome length) from five different stages.

Covering ~94% of the nuclear and >99% of the mitochondrial genome.

Confirmed expression from intergenic regions by RT-PCR.

Control experiments using whole-genome tiling arrays (25-mers at 20 nt intervals) confirmed the identification of novel transcripts (26 out of 453 may encode short proteins).

Ref.: Wilhelm BT et al.: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature. 2008 Jun 26;453(7199):1239-43. Epub 2008 May 18.
Ref.: Graveley BR. Molecular biology: power sequencing. Nature. 2008 Jun 26;453(7199):1197-8.

Recent publications on the use of RNA-Seq include S. pombe, S. cerevisiae, Arabidopsis, mouse tissues, mouse stem cells, and HeLa S3.
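Sequencing depth expressed as multiples of the genome length follows from reads × read length / genome length. The sketch below shows the calculation; the genome length of 12.5 Mb used in the example is an assumption and not a figure from the slides, and because only mapped reads count toward coverage, a naive estimate from the total read number is an upper bound rather than a reproduction of the published values.

```python
def fold_coverage(reads: float, mean_read_bp: float, genome_bp: float) -> float:
    """Sequenced bases expressed as multiples of the genome length."""
    return reads * mean_read_bp / genome_bp


ASSUMED_GENOME_BP = 12.5e6  # assumed genome length, for illustration only

for label, reads in [("proliferating cells", 23e6), ("five stages", 99e6)]:
    depth = fold_coverage(reads, 39.1, ASSUMED_GENOME_BP)
    print(f"{label}: up to about {depth:.0f}x genome length")
```

The same function applies to any of the genomes discussed in this presentation; for a mammalian genome the same number of reads corresponds to well under one genome length, which is why genome size is a limiting factor.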

Examples for Genome Size (haploid)

Genome | Length in bp | Estimated gene number
Phi-X 174 | 5,386 | 10
Human mitochondrion | 16,569 | 37
E. coli | 4,639,221 | 4,377
Saccharomyces cerevisiae | 12,495,682 | 5,770
Caenorhabditis elegans | 100,258,171 | 19,427
Arabidopsis thaliana | 115,409,949 | ~28,000
Drosophila melanogaster | 122,653,977 | 13,379
Humans | 3.3 × 10^9 | ~20,500
Amphibians | 10^9 to 10^11 | ?

Values taken from: http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/G/GenomeSizes.html (as of July 2007)

Where are our limitations?

Mammalian genome size and transcriptome complexity: enrichment of fragments (e.g. using microarrays), normalization, and longer reads are required.

Thus far uneven representation requires use of more than one method.

Requirements for starting materials (target is to analyze single cells).

No unified cDNA library method: using different methods depending on RNA length.

Very large data files and lack of computational analysis tools.

What is transcriptional noise?

Research dominated by “detection” rather than “functional analysis”.

Ref.: Struhl K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nat Struct Mol Biol. 2007 Feb;14(2):103-5.

Present Strategies for Transcriptome Analysis

Interest has shifted to next-generation sequencing to profile transcriptional activities.

We cannot predict the ends of transcripts, and therefore tag-based approaches to identify start sites and termination sites are needed.

Identification of transcription start sites in combination with other information is driving “gene network studies” and “systems biology”.

RNA-Seq provides new means for the identification of splice sites and expressed mutations.

We do not clone all those new transcripts, but there will be a need for resources for the functional analysis of new transcripts.

We are more than ever falling short on the functional analysis of new transcripts. Thus far we have not even analyzed all coding transcripts!

It is an exciting time to work on transcriptome analysis offering many challenges and rewards!

Contact:

Dr. Matthias Harbers

DNAFORM Inc.

Leading Venture Plaza-2, 75-1, Ono-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0046 Japan

E-mail: matthias.harbers@dnaform.jp

Phone: +81-(0)45-510-0607

FAX: +81-(0)45-510-0608

URL: http://www.dnaform.jp
