
R-033 The RDP-II (Ribosomal Database Project): Ribosomal …rdp.cme.msu.edu/download/posters/ASMposter2003.pdf · 2010-09-22 · Bergey’s Manual of Systematic Bacteriology, Second



Abstract

User Input / Feedback

We need your opinions, comments, and suggestions regarding:

- tools you need
- our alignment strategy
- our classification system
- additional data subsets
- other?

Contact us at [email protected]

[Figure axes: Number of Species; % of Type Strains with 16S rRNA Sequences; Total Number of 16S rRNA Sequences; % Environmental Sequences. Legend: Type Strain Sequences; Environmental Sequences.]

Future Enhancements

Responding to New Methodologies and Your Needs

High Throughput rRNA Analysis Pipeline assists researchers in carrying out comparative studies on thousands of rRNA sequences for

- Analysis of microbial communities
- Comparison of microbial communities
- Discovery of novel microbes

Sequence Accumulation

What the Classifier does:

Assigns 16S rRNA sequences to the taxonomic hierarchy proposed in Bergey's Manual of Systematic Bacteriology, 2nd edition, using type strain sequences as the training set.

Algorithm

- Count the occurrences of all overlapping 8-base subsequences (words) for each genus from sequences in the training set.
- Calculate the combined probability of all non-overlapping words in the query sequence for every genus, and assign the sequence to the genus that has the highest probability.
- Repeat for all 8 "reading frames". Assign to the lowest taxonomic rank where all 8 agree.
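The three algorithm steps can be illustrated with a minimal Python sketch. It is not the RDP-II implementation: the function names, the pseudocount smoothing for words never seen in a genus, and the assumption that each genus name occurs in only one lineage are ours, since the poster does not give those details.

```python
import math
from collections import Counter, defaultdict

WORD_SIZE = 8  # 8-base words, as described on the poster


def overlapping_words(seq, k=WORD_SIZE):
    """All overlapping k-base words in seq (used for training)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def frame_words(seq, frame, k=WORD_SIZE):
    """Non-overlapping k-base words of seq, starting at offset `frame`."""
    return [seq[i:i + k] for i in range(frame, len(seq) - k + 1, k)]


def train(records):
    """records: iterable of (lineage, sequence); lineage is a tuple of
    rank names from phylum down to genus (hypothetical input layout)."""
    counts = defaultdict(Counter)
    lineages = {}
    for lineage, seq in records:
        genus = lineage[-1]
        counts[genus].update(overlapping_words(seq))
        lineages[genus] = lineage  # assumes genus names are unique
    return counts, lineages


def log_probability(words, word_counts, pseudocount=0.5):
    """Combined log probability of the words under one genus.
    The pseudocount smoothing is an assumption, not taken from the poster."""
    total = sum(word_counts.values()) + pseudocount * 4 ** WORD_SIZE
    return sum(math.log((word_counts[w] + pseudocount) / total) for w in words)


def lowest_agreed_rank(lineages):
    """Longest shared prefix of the lineages, i.e. the lowest rank on which
    every reading frame agrees."""
    agreed = []
    for names in zip(*lineages):
        if len(set(names)) != 1:
            break
        agreed.append(names[0])
    return tuple(agreed)


def classify(query, model):
    """Pick the most probable genus in each of the 8 reading frames, then
    report the lineage down to the lowest rank where all frames agree."""
    counts, lineages = model
    picks = []
    for frame in range(WORD_SIZE):
        words = frame_words(query, frame)
        best = max(counts, key=lambda g: log_probability(words, counts[g]))
        picks.append(lineages[best])
    return lowest_agreed_rank(picks)
```

Working in log space avoids numerical underflow when many word probabilities are multiplied; the ranking of genera is unchanged.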

Classification Accuracy
[based on exhaustive "leave-one-out" testing]

Rank      A      B     C
Phylum    99.4   0.4   0.2
Class     98.3   0.4   0.8
Order     96.5   0.7   1.1
Family    92.3   2.2   2.2
Genus     89.5   2.3   3.3

A) % correctly classified
B) % uncertain classification at this rank but correctly classified at higher ranks
C) % incorrectly classified at this rank but correctly classified at higher ranks
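A leave-one-out test of this kind can be sketched as follows. The harness is illustrative only: `train` and `classify` stand for any training and classification functions, and the mapping of results onto columns A, B, and C is our reading of the legend above, not a definition taken from the poster.

```python
from collections import Counter


def leave_one_out(records, train, classify):
    """records: list of (lineage, sequence), lineage being a tuple of rank names.
    Each record is held out in turn, the model is trained on the rest, and the
    held-out sequence is classified. Returns one {"A", "B", "C"} percentage
    dict per rank (assumed interpretation of the poster's columns)."""
    n_ranks = len(records[0][0])
    tallies = [Counter() for _ in range(n_ranks)]
    for i, (truth, seq) in enumerate(records):
        model = train(records[:i] + records[i + 1:])  # hold out record i
        predicted = classify(seq, model)  # may stop above the lowest rank
        for rank in range(n_ranks):
            if len(predicted) > rank:
                if predicted[:rank + 1] == truth[:rank + 1]:
                    tallies[rank]["A"] += 1  # correct at this rank
                elif predicted[:rank] == truth[:rank]:
                    tallies[rank]["C"] += 1  # wrong here, correct above
            elif predicted == truth[:len(predicted)]:
                tallies[rank]["B"] += 1      # no call here, correct above
    total = len(records)
    return [{k: 100.0 * t[k] / total for k in "ABC"} for t in tallies]


# Toy stand-ins, purely for demonstration: the "model" is the training list
# and a query is assigned the lineage of the first sequence that shares its
# first base.
def toy_train(records):
    return records


def toy_classify(seq, model):
    for lineage, other in model:
        if other[0] == seq[0]:
            return lineage
    return ()
```

Results that are wrong at a higher rank fall into none of the three columns, which would be consistent with the rows of the table not all summing to 100.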

Behind Preview Release 9: Harvesting, Alignment, and a Naïve Bayesian Classifier

Adding New Sequences:

- Monthly GenBank search
- Classify in the new Bergey’s Taxonomy (Garrity et al., 2002) using a new naïve Bayesian classifier
- Check and update nomenclature
- Align sequences with the new aligner
- Release to the public

Alignment Strategy:

- Stochastic context-free grammar (SCFG) based aligner (RNACAD; Brown, 2000)
- Directly incorporates secondary structure information
- Generates alignments comparable in quality to hand alignment (Cole et al., 2002)
- Reduces potential for human error and unintended bias
- Allows alignment updates to keep pace with the growing number of sequences

Citations:

Brown, M. P. S. Small subunit ribosomal RNA modeling using stochastic context-free grammars, Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, pp. 57-66 (2000).

Cole, J. R., T. G. Lilburn, R. J. Farris, P. R. Saxman, S. Chandra, B. Chai, S. Kulam, G. M. Garrity, T. M. Schmidt, and J. M. Tiedje. The RDP-II (Ribosomal Database Project). 102nd General Meeting of American Society for Microbiology, R-14, 2002. (http://rdp.cme.msu.edu/pubs/NAR/ASM2002.pdf)

Garrity, G. M., M. Winters, A. W. Kuo, and D. B. Searles, Taxonomic Outline of the Prokaryotes. Bergey’s Manual of Systematic Bacteriology, Second Edition. Release 2.0, January 2002. Springer-Verlag, New York.

Tool Enhancements

New Browser Options --

Choice of Taxonomy: RDP (Bergey's) or NCBI's

Sequence Subsets: Type Strain and/or ≥1200 bases (near full length)

Special Features: Turn off if you have a slow connection

Your Sequence Cart: Continue browsing where you left off
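The "and/or" combination of the two sequence subsets can be pictured as a simple filter. This is a sketch only: the record layout and function name are hypothetical and do not describe how the Hierarchy Browser is implemented.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    # Hypothetical record layout, for illustration only.
    name: str
    length: int           # number of bases
    is_type_strain: bool


def select_subset(records, type_strain_only=False, near_full_length=False,
                  min_length=1200):
    """Keep records passing every enabled filter; with both flags off,
    all records are returned."""
    return [
        r for r in records
        if (not type_strain_only or r.is_type_strain)
        and (not near_full_length or r.length >= min_length)
    ]
```

Enabling both flags keeps only type strain sequences of at least 1200 bases; enabling one applies that filter alone.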

Updated Sequence Match --

Search all near-full-length bacterial sequences, or just type strain sequences (shown).

More Hierarchy Features --

Hierarchy Browser with sequences selected for subalignment download.

The Ribosomal Database Project-II (RDP-II) provides data, tools, and services related to ribosomal RNA sequences to the research community. Through its website (http://rdp.cme.msu.edu), RDP-II offers aligned and annotated rRNA sequence data, analysis services, and phylogenetic inferences (trees) derived from these data. RDP-II has recently taken steps to improve the timeliness of data releases to better serve our users. In addition to the current (8.1) RDP release, we are offering a preview version of release 9. This new release is being updated monthly and, as of November 2002, contains over 57,000 aligned (eu)bacterial small subunit rRNA sequences. Key to this is the use of new stochastic context-free grammars to incorporate secondary structure information for automated alignment [1]. This automated alignment method makes it possible to update the alignment with the growing number of newly submitted rRNA sequences and should also reduce the potential for human error or unintended bias in the alignment process. To help users navigate through the large number of sequences, the preview release includes new versions of our Hierarchy Browser and Sequence Match programs. The new Hierarchy Browser displays sequences in a hierarchy consistent with the higher-level taxonomy proposed in the second edition of Bergey's Manual of Systematic Bacteriology [2]. It features the additional options of searching or browsing using NCBI's taxonomy as an alternative, of viewing only sequences from type strains (of high value since type strains connect phylogeny and taxonomy), and of hiding short partial sequences. The RDP-II email address for questions or comments is [email protected].

RDP-II is supported by a grant from DOE-OBER and the State of Michigan.

1. Brown, M. P. S. Small subunit ribosomal RNA modeling using stochastic context-free grammars, Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, pp. 57-66 (2000).

2. Garrity, G. M., M. Winters, A. W. Kuo, and D. B. Searles, Taxonomic Outline of the Prokaryotes. Bergey's Manual of Systematic Bacteriology, Second Edition. Release 2.0, January 2002. Springer-Verlag, New York.

The RDP-II (Ribosomal Database Project): Previewing a New Bacterial Alignment That Allows Regular Updates and the New Prokaryotic Taxonomy

R-033
http://rdp.cme.msu.edu

J. R. Cole, B. Chai, R. J. Farris, Q. Wang, S. Chandra, S. Kulam, D. M. McGarrell, G. M. Garrity, J. M. Tiedje
Center for Microbial Ecology, Michigan State University, East Lansing, MI 48824

Ribosomal Database Project (RDP-II)
2225A Biomedical - Physical Sciences Building
Michigan State University
East Lansing, MI 48824-4320
517-432-4998 / fax 517-353-8957
ATTN: J Cole

Number of Aligned Sequences Offered by RDP