Using Entrez The Life Sciences Search Engine


Searching NCBI Databases Efficiently

• Knowing how to retrieve the exact information you need in an efficient way is the fundamental and most important skill in Bioinformatics.

• Every NCBI database is designed for a specific purpose.

• A common mistake Bioinformatics novices make is searching for information in an inappropriate database.

• Entrez links among and within databases, making it easier to search for information.


What is Entrez?

• Entrez is an NCBI retrieval system designed for searching several linked databases.

• Entrez is a search tool for integrated access to the biological literature and sequence data.

• Entrez is extremely powerful, enabling the user to quickly move between the different specialized databases.


Entrez

• Entrez is divided into sites for nucleotide, protein, structure, genomes, OMIM, and more. You can use limits (such as RefSeq) to focus your Entrez search.

• When you conduct a search via Entrez, your query generates this screen, telling you the number of hits to your query.
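The hit count shown on the results screen can also be read programmatically. The following is a minimal sketch, assuming NCBI's E-utilities ESearch endpoint, which these slides do not cover. The helper names are hypothetical, and the sample XML reply is invented for illustration, not real data.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: the public E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(db, term, retmax=20):
    """Build an ESearch URL for one Entrez database (hypothetical helper)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})

def parse_hits(xml_text):
    """Return (hit count, list of record IDs) from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count"))
    ids = [elem.text for elem in root.findall("./IdList/Id")]
    return count, ids

# Illustrative reply only; the count and IDs are made up.
SAMPLE = """<eSearchResult><Count>3</Count><RetMax>2</RetMax>
<IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>"""

url = esearch_url("nucleotide", "glucose 6 phosphate dehydrogenase")
count, ids = parse_hits(SAMPLE)
print(url)
print(count, ids)
```

Parsing is kept separate from URL building so the sketch runs without network access; fetching the URL (for example with `urllib.request.urlopen`) would supply a real reply in place of `SAMPLE`.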


The Entrez System


The Big Picture

[Diagram: Entrez and the resources connected to it: LocusLink, Nucleotide, Protein, OMIM, PubMed, SNP, MGC, UCSC, GDB, e! (Ensembl), HGMD, UniGene, Homologene, MapViewer, Structure, 3D Domains, CDD, Books, PopSet, Genome, Taxonomy, ProbeSet, UniSTS]


Entrez and LocusLink

• Entrez doesn’t link to all the databases that contain sequences, however!

• LocusLink has its own groups of links to specialty databases, since it doesn’t cover all the genomes yet.


Entrez: Database Integration

[Diagram: Genomes, Taxonomy, PubMed abstracts, Nucleotide sequences, Protein sequences, and 3-D Structure, with links labeled Word weight, VAST, BLAST, and Phylogeny]


The (ever) Expanding Entrez System

[Diagram: Entrez with Journals, UniGene, PubMed, Nucleotide, Protein, SNP, Genome, Books, ProbeSet, OMIM, CDD, Taxonomy, 3D Domains, UniSTS, PopSet, and Structure]


Entrez Databases

• PubMed: Biomedical literature
• Books: Online textbooks
• Nucleotide: GenBank, EMBL, DDBJ, RefSeq, PDB
• Protein: [GenBank, EMBL, DDBJ], RefSeq, SWISS-PROT, PIR, PRF, PDB
• Genome: Complete genomes
• Taxonomy: Organisms in NCBI sequence databases
• Structure: MMDB, experimental 3D structures
• Domains: CDD, conserved protein domains
• 3D Domains: Compact 3D protein domains in MMDB
• OMIM: Online Mendelian Inheritance in Man
• SNP: Single nucleotide polymorphisms
• UniSTS: Sequence Tagged Site markers
• ProbeSet: Gene expression and microarray datasets
• PopSet: Population study datasets
• UniGene: Gene-based expressed sequence clusters


Nucleotide Database

• The Nucleotide database contains sequence data from GenBank, EMBL, and DDBJ, the members of the tripartite, international collaboration of sequence databases.

• EMBL is the European Molecular Biology Laboratory at Hinxton Hall, UK.

• DDBJ is the DNA Database of Japan in Mishima, Japan.

• Sequence data are also incorporated from the Genome Sequence Data Base (GSDB), Santa Fe, NM.

• Patent sequences are incorporated through arrangements with the U.S. Patent and Trademark Office (USPTO) and via the collaborating international databases from other international patent offices.


Entrez Nucleotides

Primary
• GenBank / EMBL / DDBJ 35,116,960

Derivative
• RefSeq 259,219
• Third Party Annotation 3,182
• PDB 4,703

Total 35,384,248


Database Searching with Entrez

• Using limits and field restriction to find plant g6pdh

• Linking and neighboring with g6pdh


Entrez Nucleotides

glucose 6 phosphate dehydrogenase

The G6PD enzyme catalyzes the oxidation of glucose-6-phosphate to 6-phosphogluconolactone (which is then hydrolyzed to 6-phosphogluconate), while reducing nicotinamide adenine dinucleotide phosphate (NADP+) to NADPH. In terms of electron transfer, glucose-6-phosphate loses two electrons and NADP+ gains two electrons to become NADPH. This is the first step in the pentose phosphate pathway. This pathway, or shunt, as it is sometimes called, produces the 5-carbon sugar ribose (as ribose 5-phosphate), which is an essential component of both DNA and RNA.


Limits Are Helpful

• Limits allow restriction of a search to a defined subset of the database.

• Limits can be set to restrict a search to a particular database field (e.g., the Author field).

• Limits can be set to search everything but a particular type of data (e.g., “exclude patent records”).

• Alternatively, limits can be set to search only a particular type of data (e.g., Genomic RNA/DNA) or to search only data from a particular source database (e.g., EMBL). Date limits and sequence length limits are also possible.

• The contents of each Entrez database differ, and therefore the Limits available for each database differ.
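The limits described above can also be written directly into a query string using bracketed field tags. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the filter terms `biomol_mrna` and `gbdiv_pat` are the Properties values assumed here to select mRNA records and patent records. Check the current Entrez help before relying on them.

```python
def fielded(term, field):
    """Attach an Entrez field tag to a term, e.g. 'green plants[Organism]'."""
    return f"{term}[{field}]"

def build_query(include, exclude=()):
    """AND together the included terms, then append NOT for each excluded term."""
    query = " AND ".join(include)
    for term in exclude:
        query += f" NOT {term}"
    return query

query = build_query(
    include=[
        fielded("glucose 6 phosphate dehydrogenase", "Title"),  # field restriction
        fielded("green plants", "Organism"),                    # organism limit
        fielded("biomol_mrna", "Properties"),                   # assumed mRNA filter
    ],
    exclude=[fielded("gbdiv_pat", "Properties")],               # assumed patent filter
)
print(query)
```

The resulting string can be pasted into the Entrez search box. Building it from parts makes it easy to add or drop one limit at a time, much as the Limits page does.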


glucose 6 phosphate dehydrogenase

Entrez Nucleotides: Limits & Preview/Index

Try using the Limits and Preview/Index functions to hone your search and find the plant G6PD genes.


glucose 6 phosphate dehydrogenase

Entrez Nucleotides: Limits

Fields: Accession, All Fields, Author Name, EC/RN Number, Feature key, Filter, Gene Name, Issue, Journal Name, Keyword, Modification Date, Organism, Page Number, Primary Accession, Properties, Protein Name, Publication Date, SeqID String, Sequence Length, Substance Name, Text Word, Title Word, Uid, Volume

Field Restriction

Exclude bulk sequences


glucose 6 phosphate dehydrogenase

Entrez Nucleotides: Limits

Title == Definition

Exclude Bulk Sequences

mRNA molecule type

Nuclear gene


Document Summaries: Limits


Adding Terms: Preview/Index

Fields: Accession, All Fields, Author Name, EC/RN Number, Feature key, Filter, Gene Name, Issue, Journal Name, Keyword, Modification Date, Organism, Page Number, Primary Accession, Properties, Protein Name, Publication Date, SeqID String, Sequence Length, Substance Name, Text Word, Title Word, Uid, Volume

green plants


Plant cytosolic g6pdh mRNAs


Database Neighbors and Interlinking

• What makes Entrez more powerful than many services is that most of its records are linked to other records, both within a given database (such as Nucleotide) and between databases.

• Links within a database are called “neighbors” (e.g., Nucleotide neighbors).


Links Between Databases

• Protein and Nucleotide neighbors are determined by performing similarity searches using the BLAST algorithm to compare the entry amino acid or DNA sequence to all other amino acid or DNA sequences in the database. We will discuss more about BLAST later.

• Nucleotide sequence records in the Nucleotide database are linked to the PubMed citation of the article in which the sequences were published.

• Protein sequence records are linked to the nucleotide sequence from which the protein was translated.
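The neighbor and cross-database links described above can be sketched as requests to NCBI's E-utilities ELink service. This is an assumption layered on the slides, which only show the web interface. The helper name and the record ID are placeholders, and whether a particular link type is currently served is not confirmed here.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities ELink endpoint.
ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def elink_url(dbfrom, db, uid):
    """Build an ELink URL from a record in `dbfrom` to related records in `db`."""
    params = {"dbfrom": dbfrom, "db": db, "id": uid, "cmd": "neighbor"}
    return ELINK + "?" + urlencode(params)

EXAMPLE_UID = "12345"  # placeholder ID, not a real record

# Same source and target database: neighbors within Nucleotide.
neighbors = elink_url("nucleotide", "nucleotide", EXAMPLE_UID)
# Different target database: links between databases.
to_pubmed = elink_url("nucleotide", "pubmed", EXAMPLE_UID)
to_nucleotide = elink_url("protein", "nucleotide", EXAMPLE_UID)

for url in (neighbors, to_pubmed, to_nucleotide):
    print(url)
```

The only difference between a "neighbor" request and a "link" request in this sketch is whether `dbfrom` and `db` name the same database, which mirrors the distinction the slides draw.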


Plant cytosolic g6pdh mRNAs

Formats: Summary, Brief, GenBank, ASN.1, FASTA, GI list

Links and neighbors (related records): LinkOut, PubMed Links, Protein Links, Nucleotide Neighbors, PopSet Links, Structure Links, Genome Links, Taxonomy Links, OMIM Links


LinkOut

• LinkOut is a feature of Entrez that is designed to provide users with links from PubMed and other Entrez databases to a wide variety of relevant web-accessible online resources:
– Full-text publications
– Other biological databases
– Consumer health information
– Research tools

• The goal is to facilitate access to relevant online resources beyond the Entrez system to extend, clarify, or supplement information found in the Entrez databases.


Protein Database

• The Protein database includes proteins from translated regions of DNA in GenBank as well as sequences from PIR

• The entry includes:
– The name of the protein
– How the protein sequence was derived
– An accession and a PID number
– The number of amino acids


Protein Entry

The entry also includes:

• Structural information for the protein (if known)
– Helices and β-sheets
– Domains
– Etc.

• The sequence of amino acids comprising the protein


Setting Protein Database search limits

• Choose Protein from the drop-down menu
– Can do a Boolean search
– Or can set LIMITS

• Fields (e.g., Author, Journal, etc.)

• Gene Location (genomic, mitochondrial etc)

• Segmented Sequence

• Only from (Database to check)

• Modification date


Linking Between Databases

• Sometimes you will pull up a record and you have no idea what organism the gene you are looking at is from.

• For example, in the following record, what is Medicago sativa?


Entrez GenBank / GenPept


Taxonomy to the Rescue

• Entrez lets you click a live link from the record and determine what organism Medicago sativa is.

• It is alfalfa.

• You can also tell what it is related to taxonomically, because sometimes the common name isn’t very useful either!


Taxonomy Link


Advanced Neighbors: BLink


What is BLink

• BLink = BLAST Link

• Someone has done a BLAST search already, and you can just retrieve it!

• BLink displays the graphical output of pre-computed blastp results against the protein non-redundant (nr) database.


This graphical output includes:

• Alignment of up to 200 BLAST hits on the query sequence

• Best hits to each organism

• List of known protein domains in the query sequence

• Filter hits by selecting the BLAST cutoff score

• Distribution of hits by taxonomic grouping

• Display of similar sequences with known 3D structure

• Filter hits by database and/or by taxonomic grouping

• Display a taxonomic tree of all organisms with similar sequences


PopSet Links

• The PopSet database contains aligned sequences submitted as a set resulting from a population, phylogenetic, or mutation study.

• These alignments describe such events as evolution and population variation.

• The PopSet database contains both nucleotide and protein sequence data.


Protein Neighbors->PopSet Links


Protein Neighbors->Genome Links


PopSet search results

• The results of a PopSet search

• The PopSet database includes alignments of genes from multiple organisms OR different gene families OR mutational analyses


PopSet Entry

• The PopSet entry includes:
– The title of the paper/study
– The length of the sequence(s) aligned
– The number of aligned sequences


PopSet Entry without alignment

• The PopSet entry without an alignment:
– Title of the study
– The number of sequences included
– Links to the sequences


Entrez Structures


Protein Structures can also be in databases

http://bmbiris.bmb.uga.edu/wampler/tutorial/prot0.html is a useful review tutorial.


Entrez links to structure databases

• The Structure database or Molecular Modeling Database (MMDB) contains experimental data from crystallographic and NMR structure determinations.

• The data for MMDB are obtained from the Protein Data Bank (PDB).

• The NCBI has cross-linked structural data to bibliographic information, to the sequence databases, and to the NCBI taxonomy.

• Use Cn3D, the NCBI 3D structure viewer, for easy interactive visualization of molecular structures from Entrez.


Structure Search results

• Protein structures are also stored in a database

• Search as before

• Your search results are similar


Structure Entry

• The Structure entry has links to the other databases

• It also lets you download a file to open with a structure viewer program


• Proteins with similar structures and functions have been identified in the databases


BLink: Advanced Protein Neighbors


BLink: Related Structures


Viewing Structure in Cn3D

• You can download Cn3D (a structure viewer program) from NCBI

• This will allow you to view the structures from the Structure database


Cn3D Text Window

• The Text window of Cn3D will align two or more proteins so you can compare the structure of multiple proteins


BLink: Human Homologue


Human RefSeqs: Genome Reagents


MMDB: Molecular Modeling Database

• Derived from experimentally determined PDB records

• Value added to PDB records including:
– Addition of explicit chemical graph information
– Validation
– Inclusion of Taxonomy, Citation, and other information
– Conversion to ASN.1 data description language

• Structure neighbors determined by Vector Alignment Search Tool (VAST)


Structure Summary

Cn3D viewer

Conserved Domains

3D Domain Neighbors

Structure Neighbors


Cn3D 4.1


Cn3D 4.1: Structural Alignment

Casein kinase S. pombe

Src Kinase H. sapiens

Conserved ATP binding site


Cn3D: Simple Homology Modeling

human

swordtail


Using Cn3D to model domains


Other services and databases from the NCBI

• LocusLink links to all possible information from NCBI and beyond for a few well-characterized model organisms.

• LocusLink is a great starting point: it collects key information on each gene/protein from major databases. It now covers 8 organisms.

• RefSeq provides a curated, optimal accession number for each DNA (NM_006744) or protein (NP_007635)
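RefSeq accession prefixes indicate what kind of molecule a record describes. The following is a minimal sketch of a prefix lookup. The function name is hypothetical, and the table covers only a few common prefixes rather than the full RefSeq scheme.

```python
# A small, deliberately incomplete table of RefSeq accession prefixes.
REFSEQ_PREFIXES = {
    "NM_": "mRNA",
    "NP_": "protein",
    "XM_": "predicted mRNA",
    "XP_": "predicted protein",
    "NC_": "complete genomic molecule",
}

def refseq_type(accession):
    """Return the molecule type implied by a RefSeq-style accession prefix."""
    prefix = accession[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "unrecognized prefix")

# The two accessions quoted on the slide above.
for acc in ("NM_006744", "NP_007635"):
    print(acc, "->", refseq_type(acc))
```

A lookup like this is handy when a result list mixes record types and only the curated mRNA or protein entries are wanted.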


LocusLink

• Results of a LocusLink search include:
– Locus ID
– Species
– Locus symbol
– Locus name
– Locus location
– Links

• Protein Database

• OMIM

• Reference Sequence

• Related GenBank Sequences

• Homologene Data

• UniGene

• Variation Data


LocusLink: Selected Higher Genomes

[Screenshot labels: OMIM, RefSeq, GenBank, dbSNP, UniGene, Full report, PubMed, HomoloGene, Map Viewer, Protein]


Protein Database

• The Protein database contains sequence data from the translated coding regions from DNA sequences in GenBank, EMBL, and DDBJ as well as protein sequences submitted to:
– Protein Information Resource (PIR)
– SWISS-PROT
– Protein Research Foundation (PRF)
– Protein Data Bank (PDB) (sequences from solved structures)


NCBI Protein Databases

• GenPept: GenBank, EMBL, DDBJ CDS translations

• RefSeq: mRNA based (NP_) and genome based (XP_)

• Swiss-Prot: curated, high-quality protein reviews

• PIR: Protein Information Resource, Georgetown University

• PRF: Protein Research Foundation

• PDB: Protein Data Bank, sequences from structures


Entrez Protein

• GenPept (GB, EMBL, DDBJ) 3,442,298
• RefSeq 856,191
• Third Party Annotation 3,834
• Swiss Prot 144,508
• PIR 282,821
• PRF 12,079

Total 3,442,298

BLAST nr 1,642,191


Protein Link

BLAST Link

Conserved Domains


Related Proteins: Redundancy

Redundant Sequences


Sequence from MutL structure

Related Proteins: Links


BLink: non-redundant relatives

Arabidopsis homolog

Conserved Domain


MLH1 Domain Structure: CDD

ATPase Domain Mismatch Repair Domain


MLH1: ATPase Domain


1BGQ: ATPase Domain in Cn3D

Yeast HSP90

ATP binding site helix


Variations Human MLH1


BLink

Finding structural models


Mapping Variation Onto Structure

Bacterial DNA mismatch repair proteins

Loads sequence alignment and structure in Cn3D


Mapping Variation Onto Structure

Conserved Asn

AsnIle

Ile – Val


NCBI Genome Databases

• The Genome database provides views for a variety of genomes, complete chromosomes, sequence maps with contigs, and integrated genetic and physical maps.


Microbial Genomes

ZWF


Genome search results

• Genome Search Results

• The Genome database includes full (and some partial) genomes from viruses to complex organisms


Genome Entry

• Genome entries include:
– Maps of the genome
– Links to the sequence
– The organism for the genome


Genes Database: All Genomes

Coming soon!


But wait! There’s more!

• There is even more at NCBI than I have covered here.

• This site map is also a guide to NCBI resources. Each link leads to a brief description of the resource on this page, then to the resource itself. http://www.ncbi.nlm.nih.gov/Sitemap/


There are many bioinformatics servers outside NCBI.

• Try ExPASy’s sequence retrieval system at http://www.expasy.ch/

• (ExPASy = Expert Protein Analysis System)

• Or try ENSEMBL at www.ensembl.org for a premier human genome web browser.