16S RIBOSOMAL DNA SEQUENCE ANALYSIS

Abdulrahman Mohammed

School of Public Health & Zoonoses

GADVASU


INTRODUCTION

• The rRNA genes are among the most conserved (least variable) DNA in all cells. Portions of the rDNA sequence from distantly related organisms are remarkably similar. This means that sequences from distantly related organisms can be precisely aligned, making the true differences easy to measure. For this reason, genes that encode the rRNA (rDNA) have been used extensively to determine taxonomy and phylogeny (evolutionary relationships), and to estimate rates of species divergence among bacteria. Thus, comparison of 16S rDNA sequences can show evolutionary relatedness among microorganisms.

• Carl Woese, who proposed the three-domain system of classification (Archaea, Bacteria, and Eucarya) based on such sequence information, pioneered this work.


Note on terminology

• Several pieces of RNA are important for proper ribosome function.

• This RNA is not translated into protein; the ribosomal RNA itself is the active component.

• Thus we can refer to the “rRNA gene” or “rDNA” to designate the DNA in the genome that produces the ribosomal RNA.


Ribosomal RNA


Universal phylogenetic tree as determined from comparative ribosomal RNA sequencing.


• Although the three domains of living organisms were originally defined by ribosomal RNA sequencing, subsequent studies have shown that they differ in many other ways

• Large public databases are available for comparison.

• The Ribosomal Database Project currently contains >1.5 million rRNA sequences.


Detailed phylogenetic tree of the major lineages (phyla) of Bacteria based on 16S ribosomal RNA sequence comparisons


RIBOSOMAL RNA

• To infer relationships that span the diversity of known life, it is necessary to look at genes conserved through the billions of years of evolutionary divergence.

• Examples of genes in this category are those that encode the ribosomal RNAs (rRNAs).

• In Bacteria, Archaea, mitochondria, and chloroplasts, the small ribosomal subunit contains the 16S rRNA (where the S in 16S represents Svedberg units), although mitochondrial rRNAs are often smaller (in mammals the small-subunit rRNA is 12S). The large ribosomal subunit contains two rRNA species (the 5S and 23S rRNAs).


• Most prokaryotes have three rRNAs, called the 5S, 16S and 23S rRNAs. Bacterial 16S, 23S, and 5S rRNA genes are typically organized as a co-transcribed operon. There may be one or more copies of the operon dispersed in the genome (for example, E. coli has seven).

• Archaea contain either a single rDNA operon or multiple copies of the operon.

• Although rRNA targets were studied originally, most researchers now target the corresponding ribosomal DNA (rDNA) because DNA is more stable and easier to analyse.


Secondary structure of small subunit ribosomal RNA


Types

• In prokaryotes: 23S, 5S, 16S

• In eukaryotes: 28S, 5.8S, 5S, 18S


rRNA gene sequencing


Figure: Use of primers to copy the 16S rDNA gene in bacteria. Panels: bacterium with ribosomes; ribosome synthesizing a protein; small subunit ribosomal RNA, with stems (sites that rarely mutate and are conserved) and loops (sites that are more free to mutate and evolve faster); the 16S rDNA gene (5’ to 3’), which codes for the SSU rRNA, showing conserved and variable regions, the forward (F) and reverse (R) primer sites, and the DNA copied by PCR. (Campbell & Reece, 6th Ed.)

Atomic structure of the small subunit of a ribosome. The rRNA, shown in orange, helps match the mRNA (codon) to the tRNA (anticodon).


Ribosomal RNAs in Prokaryotes:

NAME   SIZE (NUCLEOTIDES, APPROX.)   LOCATION
5S     ~120                          Large subunit of ribosome
16S    ~1500                         Small subunit of ribosome
23S    ~2900                         Large subunit of ribosome


• The 16S rDNA sequence has hypervariable regions, where sequences have diverged over evolutionary time.

• Strongly conserved regions often flank these hypervariable regions.

• Primers are designed to bind to conserved regions and amplify variable regions.

• The DNA sequence of the 16S rDNA gene has been determined for an extremely large number of species. In fact, there is no other gene that has been as well characterized in as many species.

• Sequences from tens of thousands of clinical and environmental isolates are available over the Internet through the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov) and the Ribosomal Database Project (http://rdp.cme.msu.edu/).

• These sites also provide search algorithms to compare new sequences against their databases.
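The idea that primers anneal to conserved sites flanking a variable region can be made concrete with a small in silico PCR. This is a minimal sketch, assuming exact (mismatch-free) annealing and supporting IUPAC degenerate bases in the primers; the template and the primers `ACGYGA`/`AACCTG` are made-up toy sequences, not real 16S primers, and all function names are hypothetical.

```python
# IUPAC nucleotide codes: each symbol maps to the bases it can pair with.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement, including IUPAC degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def matches_at(template, primer, pos):
    """True if every primer symbol is compatible with the template base at pos + i."""
    return all(template[pos + i] in IUPAC[p] for i, p in enumerate(primer))


def find_sites(template, primer):
    """All start positions where the primer matches the template exactly."""
    return [i for i in range(len(template) - len(primer) + 1)
            if matches_at(template, primer, i)]


def in_silico_pcr(template, forward, reverse):
    """Amplicon from the first forward site to the nearest downstream site of the
    reverse primer (found as its reverse complement on the top strand), or None."""
    template = template.upper()
    rc_reverse = reverse_complement(reverse)
    for f in find_sites(template, forward.upper()):
        for r in find_sites(template, rc_reverse):
            if r >= f + len(forward):
                return template[f:r + len(rc_reverse)]
    return None


# Toy example: the Y in the forward primer matches the T in the template.
template = "TTTTACGTGACCCCGGGGCAGGTTAAAA"
print(in_silico_pcr(template, "ACGYGA", "AACCTG"))  # ACGTGACCCCGGGGCAGGTT
```

The returned amplicon includes both primer sites, as a real PCR product would.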


Why is the small subunit rRNA gene so useful?

Conserved in parts, highly variable in other parts; thus it is a very good phylogenetic marker

VERY large database of sequences

Cells have many ribosomes, so the rRNA can be targeted directly with probes (e.g. FISH); the rRNA gene is also the target of fingerprinting methods such as TRFLP for community analysis

16S rRNA gene sequencing is now the gold standard for community analysis


Which hypervariable regions to sequence?

Region   Position    Length (bp)
V1       69-99       30
V2       137-242     105
V3       338-533     195
V4       576-682     106
V5       822-879     57
V6       967-1046    79
V7       1117-1173   56
V8       1243-1294   51
V9       1435-1465   30

E. coli 16S SSU rRNA hypervariable regions
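The coordinates in this table can be used to check which hypervariable regions a given amplicon spans. A minimal sketch, assuming the amplicon boundaries are already expressed in E. coli numbering; the amplicon 515 to 806 used below is purely illustrative, and the function names are hypothetical.

```python
# E. coli 16S hypervariable regions as (start, end), copied from the table.
V_REGIONS = {
    "V1": (69, 99), "V2": (137, 242), "V3": (338, 533),
    "V4": (576, 682), "V5": (822, 879), "V6": (967, 1046),
    "V7": (1117, 1173), "V8": (1243, 1294), "V9": (1435, 1465),
}


def region_lengths():
    """Length column of the table, computed as end - start."""
    return {name: end - start for name, (start, end) in V_REGIONS.items()}


def regions_covered(amp_start, amp_end):
    """Split regions into those lying entirely inside the amplicon and
    those it only partly overlaps."""
    full, partial = [], []
    for name, (start, end) in V_REGIONS.items():
        if amp_start <= start and end <= amp_end:
            full.append(name)
        elif start <= amp_end and end >= amp_start:
            partial.append(name)
    return full, partial


print(regions_covered(515, 806))  # (['V4'], ['V3'])
```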


454-based 16S amplicon sequencing


RFLP Fingerprinting Analysis

• RFLP = restriction fragment length polymorphism.

• RFLP analysis involves cutting DNA into fragments using one restriction enzyme or a set of restriction enzymes.

• For chromosomal DNA, the RFLP fragments are separated by gel electrophoresis, transferred to a membrane, and probed with a gene probe.

• One advantage of this fingerprinting technique is that all bands are bright (from chromosomal DNA) because they are detected by a gene probe. AP-PCR, ERIC-PCR, and REP-PCR all produce bands of variable brightness and can also produce ghost bands.

• For PCR products, a simple fragment pattern can be seen directly on a gel. This is used to confirm the PCR product or to distinguish between different isolates based on restriction cutting of the 16S rDNA sequence (“ribotyping”). It has also been developed into a diversity measurement technique called “TRFLP”.
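The restriction-cutting step used to compare 16S PCR products between isolates can be simulated in silico. A minimal sketch, assuming a linear, completely digested product and exact recognition sites; MspI (which cuts C^CGG) serves only as an example enzyme, and the two "isolate" sequences are made-up.

```python
import re


def digest(seq, site, cut_offset):
    """Fragment lengths after cutting a linear sequence at every occurrence of
    a recognition site; cut_offset is the cut position within the site on the
    top strand (1 for C^CGG)."""
    seq = seq.upper()
    # A zero-width lookahead also finds overlapping occurrences of the site.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site.upper())})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Two toy isolates give different band patterns with the same enzyme.
print(digest("AACCGGTTTTCCGGAA", "CCGG", 1))  # [3, 8, 5]
print(digest("AACCGGTTTTTTTTAA", "CCGG", 1))  # [3, 13]
```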


TRFLP Analysis

• TRFLP = terminal restriction fragment length polymorphism analysis.

• A way to separate multiple PCR products of the same size. These products can be generated by a 16S rRNA PCR of community DNA.

• The PCR is performed as usual with two primers, but one is fluorescently labeled.

• The PCR products are then cut using a restriction enzyme.

• Only the fluorescently labeled (terminal) PCR fragments are detected.

• TRFLP steps:

1. Extract DNA.

2. Perform 16S rRNA PCR using a fluorescently labeled primer.

3. Choose a restriction enzyme for TRFLP that will give the greatest diversity in restriction product size.
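Step 3 can be framed as a small optimisation: for each candidate enzyme, compute the terminal restriction fragment (T-RF) length of every reference amplicon and keep the enzyme that yields the most distinct lengths. A sketch under simplifying assumptions: the labeled primer sits at the 5’ end of each given sequence, sites are exact, and the reference amplicons are toy sequences; HaeIII (GG^CC) and MspI (C^CGG) serve only as examples, and the function names are hypothetical.

```python
import re


def terminal_fragment(seq, site, cut_offset):
    """Length of the 5'-terminal (labeled) fragment after cutting at the first
    occurrence of the site; the full length if the enzyme does not cut."""
    m = re.search(re.escape(site.upper()), seq.upper())
    return m.start() + cut_offset if m else len(seq)


def best_enzyme(amplicons, enzymes):
    """Name of the enzyme giving the most distinct T-RF lengths.
    enzymes maps name -> (recognition site, cut offset)."""
    def n_distinct(item):
        site, offset = item[1]
        return len({terminal_fragment(s, site, offset) for s in amplicons})
    name, _ = max(enzymes.items(), key=n_distinct)
    return name


amplicons = ["AAGGCCAAAACCGGAA", "AAAAAAGGCCAACCGG", "CCGGAAGGCCAAAAAA"]
enzymes = {"HaeIII": ("GGCC", 2), "MspI": ("CCGG", 1)}
print(best_enzyme(amplicons, enzymes))  # MspI (3 distinct T-RFs versus 2)
```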


Figure: automated DNA analyzer and gel electrophoresis analysis; TRFLP profile of relative abundance (0 to 0.10) versus fragment length (0 to 700).


TRFLP (Terminal Restriction Fragment Length Polymorphism)

• The mixed population is amplified using a 16S primer with a fluorescent tag

• The PCR product is cut with a restriction endonuclease that recognizes a 4-bp site (a “4-bp cutter”)

• Different sequences will usually give terminal fragments of different lengths

• The sample is injected into a capillary sequencer to sort fragments by size

Figure: fluorescence units (FU) versus fragment size after cutting with a 4-bp restriction enzyme (RE).
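The profile plotted as relative abundance against fragment length can be sketched the same way: each sequence in the mixed population adds its copy count to the peak at its terminal fragment length. This is a minimal sketch with made-up sequences and counts, again assuming the labeled primer is at the 5’ end; note that the last two toy sequences differ yet fall into the same 8-nt peak, so different sequences do not always give different fragments.

```python
import re
from collections import Counter


def terminal_fragment(seq, site, cut_offset):
    """Length of the 5'-terminal (labeled) fragment; full length if uncut."""
    m = re.search(re.escape(site.upper()), seq.upper())
    return m.start() + cut_offset if m else len(seq)


def trf_profile(community, site, cut_offset):
    """Relative abundance of each T-RF length, sorted by length.
    community maps an amplicon sequence to its (hypothetical) copy count."""
    peaks = Counter()
    for seq, count in community.items():
        peaks[terminal_fragment(seq, site, cut_offset)] += count
    total = sum(peaks.values())
    return {size: n / total for size, n in sorted(peaks.items())}


community = {"AAGGCCAAAA": 6, "AAAAAAGGCCAA": 3, "AAAAAAGGCCTT": 1}
print(trf_profile(community, "GGCC", 2))  # {4: 0.6, 8: 0.4}
```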


TRFLP (cont.)

Advantages

Very sensitive

Fast, easy and cheap

Disadvantages

Cannot excise bands to obtain sequence data

Requires a capillary sequencer

Small peaks can sometimes be hard to distinguish from noise


Southern Blot Hybridization

• Southern blot hybridization (SBH) is a method named after its developer, E. M. Southern (1975), that facilitates detection of a DNA fragment of interest among hundreds of other fragments generated by restriction endonuclease analysis (REA).

• It allows restriction digestion electrophoresis patterns to become interpretable.

• Restriction DNA fragments separated in an agarose gel are transferred (blotted) onto a piece of nitrocellulose or nylon membrane.

• The membrane is then exposed to a DNA probe that has been labeled with a molecule that facilitates visual detection of a selected target DNA fragment.

• The probe, which is a piece of single-stranded DNA, specifically binds (hybridizes) to its complementary DNA sequence embedded in the membrane under appropriate conditions.

• When the SBH typing method uses ribosomal operon genes (rrn) found among restriction-digested fragments in a membrane as the target, it is called ribotyping.
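Ribotyping by Southern blot can be mimicked by digesting a sequence and keeping only the fragments that contain the probe sequence on either strand, since only those would be revealed on the membrane. A minimal sketch: hybridization is reduced to exact substring matching (real probes tolerate some mismatches and can straddle cut sites), and the genome, probe and function names are hypothetical.

```python
import re


def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def digest_fragments(seq, site, cut_offset):
    """Fragments of a linear sequence cut at every occurrence of the site."""
    seq = seq.upper()
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site.upper())})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def hybridizing_bands(genome, site, cut_offset, probe):
    """Sizes of fragments containing the probe or its reverse complement,
    i.e. the bands a labeled probe would reveal on the blot."""
    targets = (probe.upper(), reverse_complement(probe))
    return sorted(len(f) for f in digest_fragments(genome, site, cut_offset)
                  if any(t in f for t in targets))


# Toy "genome" with two probe targets (one on each strand) and HaeIII-like sites.
genome = "GTTACGAC" + "GGCC" + "T" * 12 + "GGCC" + "ATCGTAATAA"
print(hybridizing_bands(genome, "GGCC", 2, "TTACGA"))  # [10, 12]
```

The unlabeled middle fragment (16 nt) is present in the digest but produces no band, which is why probing makes a complex digest interpretable.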


GeneChip microarrays consist of small DNA fragments (also referred to as probes), chemically synthesized at specific locations on a coated quartz surface. By extracting, amplifying, and labeling nucleic acids from experimental samples, and then hybridizing those prepared samples to the array, the amount of label can be monitored at each feature, enabling either the precise identification of hundreds of thousands of target sequences (DNA Analysis) or the simultaneous relative quantitation of tens of thousands of different RNA transcripts, representing gene activity (Expression Analysis).

Microarrays

Microarrays are constructed using probes for a known nucleic acid sequence or for a series of targets (nucleic acid sequences whose abundance is being detected).

The intensity and color of each spot provide information on the specific gene from the tested sample.


DNA extraction

PCR

Gel electrophoresis

Bacteria identification

ACAGATGTCTTGTAATCCGGCCGTTGGTGGCATAGGGAAAGGACATTTAGTGAAAGAAATTGATGCGATGGGTGGATCGATGGCTTATGCTATCGATCAATCAGGAATTCAATTTAGAGTACTTAATAGTAGCAAAGGAGCTGCTGTTAGAGCAACACGTGCTCAGGCAGATAAAATATTATATCGTCAAGCAATACGTAGTATTCTTGAATATCAAAAATTTTTGTTGGTTATTCA

DNA sequencing

Bioinformatics


Secondary structure of 16S rRNA in E. coli

Molecular Phylogenetics

Step 1. Select a DNA region that is homologous, that is, similar across species because of common ancestry.

Ribosomal RNA (rRNA)

Ideal gene for phylogenetic studies because it:

• is an essential gene that is present in all organisms.

• is a common target for sequencing studies; large database for comparisons.

• contains sites that are relatively conserved (stems) and sites that are more free to vary (loops).


2. Amplify and sequence this region across isolates.

ACAGATGTCTTGTAATCCGGCCGTTGGTGGCATAGGGAAAGGACATTTAGTGAAAGAAATTGATGCGATGGGTGGATCGATGGCTTATGCTATCGATCAATCAGGAATTCAATTTAGAGTACTTAATAGTAGCAAAGGAGCTGCTGTTAGAGCAACACGTGCTCAGGCAGATAAAATATTATATCGTCAAGCAATACGT

ACAGATGTCTTGTAATCCGGCCGTTGGTGGCATAGGGAAAGGACATTTAGTGAAAGAAATTGATGCGATGGGTGGATCGATGGCTTATGCTATCGATCAATCAGGAATTCAATTTAGAGTACTTAATAGTAGCAAAGGAGCTGCTGTTAGAGCAACACGTGCTCAGGCAGATAAAATATTATATCGTCAAGCAATACGT

ACAGATGTCTTGTAATCCGGCCGTTGGTGGCATAGGGAAAGGACATTTAGTGAAAGAAATTGATGGTFTGGGTGGATCGATGGCTTATGCTATCGATCAATCAGGAATTCAATTTAGAGTACTTAATAGTAGCAAAGGAGCTGCTGTTAGAGCAACACGTGCTCAGGCAGATAAAATATTATATCGTCAAGCAATACGT

ACAGATGTCTTGTAATCCGGCCGTTGGTGGCATAGGGAAAGGACATTTAGTGAAAGAAATTGATGCGATGGGTGGATCGATGGCTTATGCTATCGATCAATTTAGAATTCAATTTAGAGTACTTAATAGTAGCAAAGGAGCTGCTGTTAGAGCAACACGTGCTCAGGCAGATAAAATATTATATCGTCAAGCAATACGT

Sequence the PCR product

PCR
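Once the isolates have been sequenced and aligned, the informative positions are the columns where they disagree. A minimal sketch that lists variable positions in a set of equal-length, already aligned sequences; the toy input is made-up, the function name is hypothetical, and the same function could be run on the aligned reads above.

```python
def variable_sites(aligned):
    """1-based positions where the aligned sequences do not all share the same
    character, mapped to that column read top to bottom."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    sites = {}
    for i, column in enumerate(zip(*aligned), start=1):
        if len(set(column)) > 1:
            sites[i] = "".join(column)
    return sites


print(variable_sites(["ACGT", "ACGA", "TCGT"]))  # {1: 'AAT', 4: 'TAT'}
```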


3. Sequence alignment is crucial for inferring how DNA sites have changed.

Poor alignment: this implies that species “I” is divergent from the others, but this is not the case.

Good alignment: species “I” has probably experienced a deletion event at position #6 or #7.
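The alignment step can be illustrated with the classic Needleman-Wunsch global alignment, which places gaps where they maximise the total score. This is a minimal sketch with an assumed scoring scheme (match +1, mismatch -1, linear gap -2); real 16S pipelines use dedicated multiple-sequence aligners.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.
    Returns (score, aligned_a, aligned_b), with '-' marking gaps."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell along one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGTACGT", "ACGACGT"))  # (5, 'ACGTACGT', 'ACG-ACGT')
```

When the deleted base sits in a run of identical bases (for example, aligning ATGG with ATG), both gap positions give the same score, which is why a deletion can only be placed at “position #6 or #7”; the traceback simply reports one of the tied placements.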


4. Estimate relationships based on extent of DNA similarity.

Taxa: G, B, C, D, A, J, F, E, K, H, I

ATGTTGGCAGTCCGATGTAAGC
ATGTTGGCAGTCCGATGTAAGC
ATGTTGGCAGTCCGATGTAACC
ACGGTAGCAGTCTGATGTATCC
ACGGTAGCAGTCTGATGTATCC
ACGGTAGCAGTCTGATGTATCC
CTGCTGGTAGTCGTTTGTAACC
CTGCTGGTAGTCGTTTGTAACC
CTGCTGGCAGTCGGTTGTAACC
ATGCTGGCAGTCGGGTGTAACC
ATGGTGGCAGTCGGGTGTCACC

Colored letters in the figure = different from top sequence (taxon G)

Molecular phylogeny of taxa A-K.

At variable DNA positions, related groups will tend to share the same nucleotide.

The sheer number of characters is helpful to distinguish the ‘phylogenetic signal’ from noise.
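Estimating relationships from the extent of DNA similarity can be sketched as computing pairwise p-distances (the fraction of differing sites) and clustering on them. UPGMA (average-linkage clustering) is used here only because it is short; the slides do not say which tree-building method produced the figure, and the toy taxa and function names are hypothetical.

```python
def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """UPGMA on p-distances. seqs maps taxon -> aligned sequence;
    returns the tree topology as nested tuples (branch lengths omitted)."""
    names = list(seqs)
    tree = {n: n for n in names}   # cluster id -> subtree
    size = {n: 1 for n in names}   # cluster id -> number of taxa
    dist = {(a, b): p_distance(seqs[a], seqs[b])
            for i, a in enumerate(names) for b in names[i + 1:]}

    def d(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    active, step = names[:], 0
    while len(active) > 1:
        x, y = min(dist, key=dist.get)   # closest pair of clusters
        new = f"_cluster{step}"
        step += 1
        tree[new] = (tree[x], tree[y])
        size[new] = size[x] + size[y]
        active = [c for c in active if c not in (x, y)]
        for c in active:                 # size-weighted average distance
            dist[(c, new)] = (size[x] * d(x, c) + size[y] * d(y, c)) / size[new]
        dist = {k: v for k, v in dist.items() if x not in k and y not in k}
        active.append(new)
    return tree[active[0]]


print(upgma({"A": "AAAA", "B": "AAAT", "C": "TTTT"}))  # ('C', ('A', 'B'))
```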


Example: Molecular phylogenies have revealed unexpected features of bacterial evolution.

For instance, an endosymbiotic lifestyle has evolved several times independently.

Moran and Wernegreen (2000)



How does this organism fit into the world of available sequence data?

ACAGATGTCTTGTAATCCGGCCGTTGGTGGCATAGGGAAAGGACATTTAGTGAAAGAAATTGATGCGATGGGTGGATCGATGGCTTATGCTATCGATCAATCAGGAATTCAATTTAGAGTACTTAATAGTAGCAAAGGAGCTGCTGTTAGAGCAACACGTGCTCAGGCAGATAAAATATTATATCGTCAAGCAATACGT

PCR, then sequence the PCR product

“BLAST” the sequence against GenBank

GenBank = NIH genetic sequence database containing all publicly available DNA sequences. As of 2004: > 44 billion bp and > 40 million sequences

BLAST output: lists the sequences that are most similar to yours
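BLAST itself finds short word matches and extends them into scored local alignments reported with E-values. The toy ranking below is not BLAST, only a much cruder stand-in that shows the idea of listing the reference sequences most similar to a query: each reference is scored by the fraction of the query's k-mers it shares. The value of k, the reference names and sequences, and the function names are arbitrary choices for illustration.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_kmers(query, references, k=8):
    """(name, score) pairs, best first; score = fraction of query k-mers
    also present in the reference."""
    q = kmers(query, k)
    if not q:
        raise ValueError("query is shorter than k")
    scores = [(name, len(q & kmers(ref, k)) / len(q))
              for name, ref in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


refs = {"ref1": "ACGTACGTTT", "ref2": "GGGGGGGGGG"}
print(rank_by_kmers("ACGTACGT", refs, k=4))  # [('ref1', 1.0), ('ref2', 0.0)]
```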



ACAGATGTCTTGTAATCCGGCCGTTGGTGGCATAGGGAAAGGACATTTAGTGAAAGAAATTGATGCGATGGGTGGATCGATGGCTTATGCTATCGATCAATCAGGAATTCAATTTAGAGTACTTAATAGTAGCAAAGGAGCTGCTGTTAGAGCAACACGTGCTCAGGCAGATAAAATATTATATCGTCAAGCAATACGT

Is the bacterium really Wolbachia?

PCR and sequence a gene of interest (e.g., 16S rDNA), then “BLAST” the sequence against GenBank.

BLAST results: Wolbachia sp. 1, Wolbachia sp. 2, Wolbachia sp. 3, …

YES!


Some Databases

• National Center for Biotechnology Information (www.ncbi.nlm.nih.gov)

• Ribosomal Database Project II (http://rdp.cme.msu.edu/html/)

• Ribosomal Differentiation of Medical Microorganisms (www.ridom.com)

• MicroSeq 16S 500 Library (Applied Biosystems)

• GenBank

• Mayo Database


Guidelines for interpretation of 16S rRNA gene sequence-based results for identification of medically important aerobic Gram-positive bacteria (Woo et al., 2009)

Full-length and 527-bp 16S rRNA gene sequencing and the MicroSeq databases were used to identify medically important aerobic Gram-positive bacteria. Overall, full-length and 527-bp 16S rRNA gene sequencing can confidently identify 24% and 40% of medically important Gram-positive cocci (GPC), and 21% and 34% of medically important Gram-positive rods (GPR), respectively, to the species level, whereas the full-MicroSeq and 500-MicroSeq databases can confidently identify 15% and 34% of medically important GPC and 14% and 25% of medically important GPR to the species level. Among staphylococci, streptococci, enterococci, mycobacteria, corynebacteria, nocardia, and members of Bacillus and related taxa (Paenibacillus, Brevibacillus, Geobacillus and Virgibacillus), the methods and databases are least useful for identification of staphylococci and nocardia.


Only 0–2% and 2–13% of staphylococci, and 0% and 0–10% of nocardia, can be confidently and doubtfully identified, respectively. However, these methods and databases are most useful for identification of Bacillus and related taxa, with 36–56% and 11–14% of Bacillus and related taxa confidently and doubtfully identified, respectively. A total of 15 medically important GPC and 18 medically important GPR that should be confidently identified by full 16S rRNA gene sequencing are not included in the full-MicroSeq database. A total of 9 medically important GPC and 21 medically important GPR that should be confidently identified by 527 bp 16S rRNA gene sequencing are not included in the 500-MicroSeq database. 16S rRNA gene sequence results of Gram-positive bacteria should be interpreted together with the results of basic phenotypic tests. Additional biochemical tests or sequencing of additional gene loci are often required for definitive identification. To improve the usefulness of the MicroSeq databases, bacterial species that can be confidently identified by 16S rRNA gene sequencing but are not found in the MicroSeq databases should be included.


Definitions

“A bacterium species is defined as ‘confidently identified by 16S rRNA gene sequencing’ if there is >3% difference between the 16S rRNA gene sequence of the species and those of other medically important bacteria species. A bacterium species is defined as ‘not confidently identified by 16S rRNA gene sequencing’ if there is <2% difference between the 16S rRNA gene sequence of the species and that of one or more medically important aerobic Gram-positive bacterium species. A bacterium species is defined as ‘only doubtfully identified by 16S rRNA gene sequencing’ if there is 2–3% difference between the 16S rRNA gene sequence of the species and that of one or more medically important aerobic Gram-positive bacterium species.” (Woo et al., 2009)
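These definitions translate into a simple rule on the smallest percentage difference between a species' 16S rRNA gene sequence and those of the other species. A minimal sketch, assuming equal-length aligned sequences and treating the 2–3% band as inclusive at both ends; the sequences and function names are made-up for illustration.

```python
def percent_difference(a, b):
    """Percentage of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def woo_category(species_seq, other_species):
    """Category from the smallest % difference to any other species
    (other_species maps name -> aligned sequence)."""
    closest = min(percent_difference(species_seq, s) for s in other_species.values())
    if closest > 3:
        return "confidently identified"
    if closest < 2:
        return "not confidently identified"
    return "only doubtfully identified"


query = "A" * 100
print(woo_category(query, {"x": "A" * 96 + "TTTT"}))  # confidently identified
print(woo_category(query, {"y": "A" * 98 + "TT"}))    # only doubtfully identified
```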


THANK YOU FOR YOUR ATTENTION