Abstract
User Input / Feedback
We need your opinions, comments, and suggestions regarding:
- tools you need
- our alignment strategy
- our classification system
- additional data subsets
- other?
Contact us at [email protected]
[Figure: Number of species and % of type strains with 16S rRNA sequences; total number of 16S rRNA sequences and % environmental sequences. Series: type strain sequences, environmental sequences.]
Future Enhancements
Responding to New Methodologies and Your Needs
A High Throughput rRNA Analysis Pipeline to assist researchers in carrying out
comparative studies on thousands of rRNA sequences for:
- Analysis of microbial communities
- Comparison of microbial communities
- Discovery of novel microbes
Sequence Accumulation
What the Classifier does:
Assigns 16S rRNA sequences to the taxonomic hierarchy proposed in Bergey's Manual of
Systematic Bacteriology, 2nd edition, using type strain sequences as the training set.
Algorithm
- Calculate the occurrences of all overlapping 8-base subsequences (words) for each
genus from the sequences in the training set.
- Calculate the combined probability of all non-overlapping words in the query
sequence for every genus, and assign the sequence to the genus with the highest
probability.
- Repeat for all 8 "reading frames" (word start offsets). Assign the sequence to the
lowest taxonomic rank at which all 8 agree.
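The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the RDP-II code: the add-one (Laplace) smoothing of word probabilities, the `lineage` mapping from each genus to its higher ranks, and all function names are assumptions made for the example. The "combined probability" is computed as a sum of log probabilities, which ranks genera exactly as multiplying the probabilities would while avoiding numerical underflow.

```python
import math
from collections import Counter

K = 8  # word (subsequence) length used by the classifier described above


def overlapping_words(seq, k=K):
    """All overlapping k-base words; used for training."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def nonoverlapping_words(seq, frame, k=K):
    """Non-overlapping k-base words starting at offset `frame` (0..k-1)."""
    return [seq[i:i + k] for i in range(frame, len(seq) - k + 1, k)]


def train(training_set):
    """Count word occurrences per genus; training_set maps genus -> list of sequences."""
    model = {}
    for genus, seqs in training_set.items():
        counts = Counter()
        for seq in seqs:
            counts.update(overlapping_words(seq))
        model[genus] = (counts, sum(counts.values()))
    return model


def log_likelihood(model, genus, words):
    """Sum of log word probabilities (add-one smoothing over all 4**K words is an assumption)."""
    counts, total = model[genus]
    vocabulary = 4 ** K
    return sum(math.log((counts[w] + 1) / (total + vocabulary)) for w in words)


def agreed_prefix(paths):
    """Longest lineage prefix (phylum first) shared by every path."""
    agreed = []
    for values_at_rank in zip(*paths):
        if len(set(values_at_rank)) != 1:
            break
        agreed.append(values_at_rank[0])
    return tuple(agreed)


def classify(model, lineage, query):
    """Best genus per reading frame, then keep only the ranks on which all frames agree.

    Assumes len(query) >= 2 * K - 1 so that every frame yields at least one word.
    """
    picks = []
    for frame in range(K):
        words = nonoverlapping_words(query, frame)
        picks.append(max(model, key=lambda g: log_likelihood(model, g, words)))
    return agreed_prefix([lineage[g] for g in picks])


if __name__ == "__main__":
    # Hypothetical toy genera, far shorter than real 16S rRNA sequences.
    model = train({"GenusA": ["A" * 24], "GenusB": ["C" * 24]})
    lineage = {"GenusA": ("Phylum1", "ClassX", "GenusA"),
               "GenusB": ("Phylum1", "ClassY", "GenusB")}
    print(classify(model, lineage, "A" * 30))  # ('Phylum1', 'ClassX', 'GenusA')
```

Because the training set is limited to type strains, many genera are represented by only a few sequences, so words never seen for a genus must not drive its probability to zero; add-one smoothing is simply the most basic way to prevent that.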
Classification Accuracy
[based on exhaustive "leave-one-out" testing]
Rank      A      B      C
Phylum   99.4   0.4    0.2
Class    98.3   0.4    0.8
Order    96.5   0.7    1.1
Family   92.3   2.2    2.2
Genus    89.5   2.3    3.3

A) % correctly classified
B) % uncertain classification at this rank but correctly classified at higher ranks
C) % incorrectly classified at this rank but correctly classified at higher ranks
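The following sketch shows one way a single leave-one-out prediction could be binned into columns A, B, and C. The interpretation is an assumption: the classifier's output is treated as the lineage prefix on which all reading frames agreed, so a classification that stops above a given rank counts as "uncertain" at that rank. `toy_train` and `toy_classify` are hypothetical stand-ins for a real classifier.

```python
def rank_outcome(true_path, predicted_path, rank):
    """Bin one prediction at a 0-based rank (0 = phylum) into column A, B, or C.

    Assumption: predicted_path is the lineage prefix the classifier committed to,
    so a path that stops above `rank` is "uncertain" at that rank.
    """
    higher_correct = tuple(predicted_path[:rank]) == tuple(true_path[:rank])
    if len(predicted_path) > rank:
        if not higher_correct:
            return "other"  # already wrong at a higher rank
        return "A" if predicted_path[rank] == true_path[rank] else "C"
    # Classification stopped above this rank, i.e. uncertain here.
    if tuple(predicted_path) == tuple(true_path[:len(predicted_path)]):
        return "B"
    return "other"


def leave_one_out(records, train_fn, classify_fn, rank):
    """Percent of records in each outcome at `rank`, holding out one record at a time.

    records: list of (sequence, true_lineage) pairs.
    """
    tally = {"A": 0, "B": 0, "C": 0, "other": 0}
    for i, (seq, true_path) in enumerate(records):
        model = train_fn(records[:i] + records[i + 1:])  # train without record i
        tally[rank_outcome(true_path, classify_fn(model, seq), rank)] += 1
    return {key: 100.0 * n / len(records) for key, n in tally.items()}


# Hypothetical toy classifier: copy the lineage of the first training record
# whose sequence starts with the same base; report nothing if none does.
def toy_train(records):
    return records


def toy_classify(model, seq):
    for other_seq, path in model:
        if other_seq[0] == seq[0]:
            return path
    return ()


if __name__ == "__main__":
    records = [("AAAA", ("P1", "C1")), ("AAAT", ("P1", "C1")), ("CCCC", ("P1", "C2"))]
    print(leave_one_out(records, toy_train, toy_classify, rank=1))
```

Because no ranks lie above phylum, every sequence at that rank falls into A, B, or C under this reading. This is consistent with the Phylum row of the table summing to 100% (99.4 + 0.4 + 0.2).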
Behind Preview Release 9: Harvesting and Alignment, and a Naïve Bayesian Classifier
Adding New Sequences:
- Monthly GenBank search
- Classify into the new Bergey’s taxonomy (Garrity et al., 2002) using a new naïve Bayesian classifier
- Nomenclature checked and updated
- Align sequences with new aligner
- Release to the public
Alignment Strategy:
- Stochastic context-free grammar (SCFG)-based aligner (RNACAD; Brown, 2000)
- Directly incorporates secondary structure information
- Generates alignments comparable in quality to hand alignments (Cole et al., 2002)
- Reduces potential for human error and unintended bias
- Allows alignment updates to keep pace with growing number of sequences
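To illustrate what "directly incorporates secondary structure information" means, here is the smallest useful stochastic context-free grammar. It has a single nonterminal S with rules for a base pair (S -> a S b), an unpaired base (S -> a S), a bifurcation (S -> S S), and termination. Nested base pairs in a helix correspond directly to nested rule applications. This is a toy single-sequence parser with made-up probabilities, and it is not RNACAD, whose grammar models the full rRNA secondary structure rather than three generic rule types.

```python
import math
from functools import lru_cache

# Toy rule probabilities for the single nonterminal S (they sum to 1); assumptions for illustration.
RULES = {"pair": 0.3, "unpaired": 0.5, "bifurcate": 0.1, "end": 0.1}
UNPAIRED_EMISSION = 0.25  # each base equally likely when unpaired
PAIR_EMISSION = {("A", "U"): 0.2, ("U", "A"): 0.2, ("G", "C"): 0.2,
                 ("C", "G"): 0.2, ("G", "U"): 0.1, ("U", "G"): 0.1}


def best_structure(seq):
    """CYK (Viterbi) parse under S -> a S b | a S | S S | empty.

    Returns (log-probability, sorted list of (i, j) base pairs). Runs in cubic
    time, so it is only suitable for short toy sequences.
    """
    seq = seq.upper()

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best (log-probability, pairs) for the half-open interval seq[i:j].
        if i == j:
            return math.log(RULES["end"]), ()
        score, pairs = best(i + 1, j)  # S -> a S: seq[i] left unpaired
        candidates = [(score + math.log(RULES["unpaired"] * UNPAIRED_EMISSION), pairs)]
        emit = PAIR_EMISSION.get((seq[i], seq[j - 1]), 0.0)
        if j - i >= 2 and emit > 0:  # S -> a S b: seq[i] pairs with seq[j - 1]
            score, pairs = best(i + 1, j - 1)
            candidates.append((score + math.log(RULES["pair"] * emit), ((i, j - 1),) + pairs))
        for k in range(i + 1, j):  # S -> S S: split into two non-empty parts
            (s1, p1), (s2, p2) = best(i, k), best(k, j)
            candidates.append((s1 + s2 + math.log(RULES["bifurcate"]), p1 + p2))
        return max(candidates, key=lambda c: c[0])

    score, pairs = best(0, len(seq))
    return score, sorted(pairs)


if __name__ == "__main__":
    print(best_structure("GGGAAACCC")[1])  # [(0, 8), (1, 7), (2, 6)]
```

The most probable parse pairs the three G's with the three C's and leaves the AAA loop unpaired. Because base pairing is part of the grammar itself, structure-consistent pairings are favored without any separate structure-checking step.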
Citations:
Brown, M. P. S. Small subunit ribosomal RNA modeling using stochastic context-free grammars. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, pp. 57-66 (2000).
Cole, J. R., T. G. Lilburn, R. J. Farris, P. R. Saxman, S. Chandra, B. Chai, S. Kulam, G. M. Garrity, T. M. Schmidt, and J. M. Tiedje. The RDP-II (Ribosomal Database Project). 102nd General Meeting of the American Society for Microbiology, R-14, 2002. (http://rdp.cme.msu.edu/pubs/NAR/ASM2002.pdf)
Garrity, G. M., M. Winters, A. W. Kuo, and D. B. Searles. Taxonomic Outline of the Prokaryotes. Bergey's Manual of Systematic Bacteriology, Second Edition. Release 2.0, January 2002. Springer-Verlag, New York.
Tool Enhancements
New Browser Options --
Choice of Taxonomy: RDP (Bergey's) or NCBI's
Sequence Subsets: Type Strain and/or ≥1200 bases (near full length)
Special Features: Turn off if you have a slow connection
Your Sequence Cart: Continue browsing where you left off
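The Type Strain and ≥1200-base subset options behave like independent filters, so selecting both keeps only near-full-length type strain sequences. The following is a minimal sketch, assuming a hypothetical `SequenceRecord` layout that is not RDP-II's actual data model:

```python
from dataclasses import dataclass

NEAR_FULL_LENGTH = 1200  # bases; the threshold quoted above


@dataclass
class SequenceRecord:
    # Hypothetical record layout, used only for this illustration.
    accession: str
    length: int
    is_type_strain: bool


def select_subset(records, type_strain=False, near_full_length=False):
    """Apply the Type Strain and/or >=1200-base filters; with both off, keep everything."""
    return [r for r in records
            if (not type_strain or r.is_type_strain)
            and (not near_full_length or r.length >= NEAR_FULL_LENGTH)]


if __name__ == "__main__":
    records = [SequenceRecord("X1", 1450, True),
               SequenceRecord("X2", 900, True),
               SequenceRecord("X3", 1300, False)]
    print([r.accession for r in select_subset(records, type_strain=True, near_full_length=True)])
```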
Updated Sequence Match --
Search all near-full-length bacterial sequences, or just type strain sequences (shown).
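One simple way to rank database sequences against a query is by the short words they share. The scoring below, the fraction of the query's distinct 8-base words that also occur in each target, is an assumption chosen for illustration and is not claimed to be the Sequence Match scoring function. The word length is borrowed from the classifier description.

```python
K = 8  # word length; an assumption borrowed from the classifier description


def word_set(seq, k=K):
    """Distinct overlapping k-base words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(query, target, k=K):
    """Fraction of the query's distinct words also present in the target (hypothetical score)."""
    q = word_set(query, k)
    return len(q & word_set(target, k)) / len(q) if q else 0.0


def sequence_match(query, database, top=3):
    """Rank (name, sequence) pairs by shared-word similarity to the query, best first."""
    scored = [(similarity(query, seq), name) for name, seq in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:top]]


if __name__ == "__main__":
    database = [("s1", "ACGTACGTTTGACCA"), ("s2", "G" * 15)]
    print(sequence_match("ACGTACGTTTGA", database))
```

Restricting `database` to near-full-length or type strain sequences before ranking corresponds to the two search scopes described above.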
More Hierarchy Features --
Hierarchy Browser with sequences selected for subalignment download.
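At its core, a hierarchy view means counting sequences at every node of a lineage tree. The sketch below illustrates that idea; the function names are hypothetical and do not come from the RDP-II browser's code.

```python
from collections import defaultdict


def make_tree():
    # Each node stores a sequence count and its child taxa.
    return {"count": 0, "children": defaultdict(make_tree)}


def build_hierarchy(lineages):
    """Count sequences at every taxon along each (phylum, ..., genus) lineage."""
    root = make_tree()
    for lineage in lineages:
        node = root
        node["count"] += 1
        for taxon in lineage:
            node = node["children"][taxon]
            node["count"] += 1
    return root


def render(node, depth=0):
    """Indented text view: one taxon per line with its sequence count."""
    lines = []
    for name in sorted(node["children"]):
        child = node["children"][name]
        lines.append("  " * depth + f"{name} ({child['count']})")
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    tree = build_hierarchy([("P1", "C1", "G1"), ("P1", "C1", "G2"), ("P1", "C2", "G3")])
    print("\n".join(render(tree)))
```

Selecting sequences for a subalignment download would then amount to collecting the sequences stored under a chosen node.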
The Ribosomal Database Project - II (RDP-II) provides data, tools, and services
related to ribosomal RNA sequences to the research community. Through its website
(http://rdp.cme.msu.edu), RDP-II offers aligned and annotated rRNA sequence data,
analysis services, and phylogenetic inferences (trees) derived from these data. RDP-II
has recently taken steps to improve the timeliness of data releases to better serve our
users. In addition to the current (8.1) RDP release, we are offering a preview version of
release 9. This new release is being updated monthly and, as of November 2002,
contains over 57,000 aligned (eu)bacterial small subunit rRNA sequences. Key to this release is a
new aligner that uses stochastic context-free grammars to incorporate secondary structure
information into automated alignment.[1] This automated alignment method makes it possible to
update the alignment with the growing number of newly submitted rRNA sequences and
should also reduce the potential for human error or unintended bias in the alignment
process. To help users navigate through the large number of sequences, the preview
release includes new versions of our Hierarchy Browser and Sequence Match programs.
The new Hierarchy Browser displays sequences in a hierarchy consistent with the higher-level
taxonomy proposed in the second edition of Bergey's Manual of Systematic
Bacteriology.[2] It offers additional options for searching and browsing with NCBI's
taxonomy as an alternative, viewing only sequences from type strains (of high value
because type strains connect phylogeny and taxonomy), and hiding short partial
sequences. The RDP-II email address for questions or comments is [email protected].
RDP-II is supported by a grant from DOE-OBER and the State of Michigan.
[1] Brown, M. P. S. Small subunit ribosomal RNA modeling using stochastic context-free grammars. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, pp. 57-66 (2000).
[2] Garrity, G. M., M. Winters, A. W. Kuo, and D. B. Searles. Taxonomic Outline of the Prokaryotes. Bergey's Manual of Systematic Bacteriology, Second Edition. Release 2.0, January 2002. Springer-Verlag, New York.
The RDP-II (Ribosomal Database Project): Previewing a New Bacterial Alignment That Allows Regular Updates and the New Prokaryotic Taxonomy
R-033
http://rdp.cme.msu.edu
J. R. Cole, B. Chai, R. J. Farris, Q. Wang, S. Chandra, S. Kulam, D. M. McGarrell, G. M. Garrity, J. M. Tiedje
Center for Microbial Ecology, Michigan State University, East Lansing, MI 48824
Ribosomal Database Project (RDP-II)
2225A Biomedical - Physical Sciences Building
Michigan State University
East Lansing, MI 48824-4320
517-432-4998 / fax 517-353-8957
ATTN: J Cole
[Figure: Number of Aligned Sequences Offered by RDP]