Biological Databases

November 30, 2006

Wailap V. Ng
Institute of Biotechnology in Medicine
Institute of Bioinformatics
National Yang Ming University
wvng@ym.edu.tw

Macromolecules Related to Bioinformatics

• DNA (Deoxyribonucleic acid)

• RNA (Ribonucleic acid)
  - mRNA (Messenger RNA)
  - rRNA (Ribosomal RNA)
  - tRNA (Transfer RNA)

• Proteins
  - Enzymes
  - Structural proteins
  - Regulatory proteins
  - Transporters

Nucleic acid and protein sequences store the essential bioinformation

DNA (A, C, G, T): A C G T G A A C C T

RNA (A, C, G, U): A C G U G A A C C U

Protein (20 amino acids): G A V L I S T C D M E N Q R K F H Y P W
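As a small illustration of the three alphabets above, the sketch below guesses whether a string is DNA, RNA, or protein by checking which letter set contains all of its characters. `guess_alphabet` is a hypothetical helper, not a library function. It ignores IUPAC ambiguity codes, and because A, C, G, and T are also valid amino acid letters, a short peptide built only from those letters would be reported as DNA.

```python
# Alphabets from the slide: DNA uses A, C, G, T; RNA uses A, C, G, U;
# proteins use the 20 standard one-letter amino acid codes.
DNA = set("ACGT")
RNA = set("ACGU")
PROTEIN = set("GAVLISTCDMENQRKFHYPW")


def guess_alphabet(seq: str) -> str:
    """Return 'DNA', 'RNA', 'protein', or 'unknown' (checked in that order)."""
    letters = set(seq.upper().replace(" ", ""))
    if letters <= DNA:
        return "DNA"
    if letters <= RNA:
        return "RNA"
    if letters <= PROTEIN:
        return "protein"
    return "unknown"


print(guess_alphabet("ACGTGAACCT"))            # DNA
print(guess_alphabet("ACGUGAACCU"))            # RNA
print(guess_alphabet("GAVLISTCDMENQRKFHYPW"))  # protein
```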

Central Dogma

DNA → mRNAs → Proteins

• Replication: DNA → DNA
• Transcription: DNA → mRNA
• Translation: mRNA → protein
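Transcription and translation can be sketched as string operations. The example below makes three assumptions:

• The input is the coding (sense) strand, so the mRNA is the same sequence with U in place of T.
• Translation runs in the first reading frame.
• It uses the standard genetic code (NCBI translation table 1), which most but not all organisms use.

The function names are illustrative only.

```python
from itertools import product

# Standard genetic code (NCBI table 1). itertools.product enumerates codons as
# UUU, UUC, UUA, UUG, UCU, ... (first base varies slowest), which matches the
# order of the amino acid string below; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCTTGGTAA")
print(mrna)             # AUGGCUUGGUAA
print(translate(mrna))  # MAW
```

Real transcripts carry untranslated sequence before the start codon, so in practice translation begins at the AUG rather than at position 0.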

Basic structure of a bacterial gene

Figure: promoter (P) and gene on the DNA (5’ → 3’); transcription yields the mRNA (5’ → 3’), and translation yields the protein (N-terminus → C-terminus).

A gene is a segment of DNA that codes for the sequence of a polypeptide (protein). Its coding region begins with an upstream start codon (ATG) and ends with a downstream stop codon (TAA in the figure).
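The definition above translates directly into a naive open reading frame (ORF) search: scan each of the three forward reading frames for ATG, then extend to the first in-frame stop codon. This is a sketch under simplifying assumptions, not a gene finder:

• It uses the three standard stop codons (TAA, TAG, TGA).
• It ignores the reverse strand and alternative start codons.
• It applies no minimum length.

`find_orfs` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str) -> list:
    """Return sorted (start, end) pairs, 0-based and end-exclusive, of forward-strand ORFs.

    Each ORF starts at ATG and ends just after the first in-frame stop codon.
    ATGs with no downstream in-frame stop codon are not reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14)]
```

Here dna[2:14] is ATGAAATTTTAG: start codon, three codons for M, K, F, then the stop codon TAG.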

Information in Biological Databases

• DNA and protein sequences
• Protein structures
• Expression data (microarray, SAGE, etc.)
• Biological pathways
• Subcellular location of proteins
• Protein-protein interactions
• Molecular medicine
• Literature
• etc.

International Union of Pure and Applied Chemistry (IUPAC) codes for nucleotides and amino acids

IUPAC nucleotide code   Base
A                       Adenine
C                       Cytosine
G                       Guanine
T (or U)                Thymine (or Uracil)
R                       A or G (purine)
Y                       C or T (pyrimidine)
S                       G or C
W                       A or T
K                       G or T
M                       A or C
B                       C or G or T
D                       A or G or T
H                       A or C or T
V                       A or C or G
N                       any base
. or -                  gap
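Ambiguity codes are what make IUPAC patterns useful for searching. The sketch below expands each code to its set of bases, as listed in the table above. It then reports every position where an unambiguous DNA sequence matches a pattern. The helper names and the example pattern are hypothetical. Gap symbols (. and -) are not handled, and U is treated as T.

```python
# Bases represented by each IUPAC nucleotide code (from the table above).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern: str, seq: str) -> bool:
    """True if an unambiguous sequence fits an IUPAC pattern of the same length."""
    pattern = pattern.upper().replace("U", "T")
    seq = seq.upper().replace("U", "T")
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern, seq))


def find_sites(pattern: str, seq: str) -> list:
    """0-based start positions of every window of seq that matches the pattern."""
    k = len(pattern)
    return [i for i in range(len(seq) - k + 1) if matches(pattern, seq[i:i + k])]


print(find_sites("RAATTY", "GAATTCTAATTTAAATTC"))  # [0, 12]
```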

IUPAC amino acid code   Three letter code   Amino acid
A                       Ala                 Alanine
C                       Cys                 Cysteine
D                       Asp                 Aspartic Acid
E                       Glu                 Glutamic Acid
F                       Phe                 Phenylalanine
G                       Gly                 Glycine
H                       His                 Histidine
I                       Ile                 Isoleucine
K                       Lys                 Lysine
L                       Leu                 Leucine
M                       Met                 Methionine
N                       Asn                 Asparagine
P                       Pro                 Proline
Q                       Gln                 Glutamine
R                       Arg                 Arginine
S                       Ser                 Serine
T                       Thr                 Threonine
V                       Val                 Valine
W                       Trp                 Tryptophan
Y                       Tyr                 Tyrosine
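Converting between the one-letter and three-letter codes in the table is a simple dictionary lookup. The function names in this minimal sketch are illustrative, and it covers only the 20 standard amino acids listed above:

```python
# One-letter -> three-letter codes for the 20 standard amino acids (table above).
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}
# Reverse lookup, keyed case-insensitively by the three-letter code.
ONE_LETTER = {three.upper(): one for one, three in THREE_LETTER.items()}


def to_three_letter(protein: str, sep: str = "-") -> str:
    """'MAW' -> 'Met-Ala-Trp'."""
    return sep.join(THREE_LETTER[aa] for aa in protein.upper())


def to_one_letter(residues: str, sep: str = "-") -> str:
    """'Met-Ala-Trp' -> 'MAW'."""
    return "".join(ONE_LETTER[code.upper()] for code in residues.split(sep))


print(to_three_letter("MAW"))        # Met-Ala-Trp
print(to_one_letter("Met-Ala-Trp"))  # MAW
```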

The Origins of Protein Sequence Databases

* Protein sequencing (Sanger and Tuppy, 1951)

• Atlas of Protein Sequence and Structure (Margaret Dayhoff and the National Biomedical Research Foundation (NBRF), 1965-1978)

• Protein Information Resource (PIR) (NBRF, 1984 - present)

• PIR-International Protein Sequence Database (NBRF, MIPS, and JIPID, 1988 – present)

The Origins of DNA Sequence Databases

* DNA double-helix structure (James Watson and Francis Crick, 1953)

* Recombinant DNA (Paul Berg et al., 1972)

* DNA sequencing (Maxam and Gilbert; Sanger, 1977)

• GenBank [Walter Goad et al., 1979 (prototype); 1982 -1992, LANL (Los Alamos National Lab.)]

• EMBL Data Library [1982 (1980) – present] – UK

• DDBJ [1986 (1984) – present] - Japan

http://www.infobiogen.fr/services/dbcat/ (Site closed)

Number of biological databases in 2005

Major Bioinformation Resources

• NCBI – National Institutes of Health

• EMBL – European Bioinformatics Institute

• DDBJ – National Institute of Genetics (Japan)

• ExPASy – Swiss Institute of Bioinformatics

• GenomeNet – Kyoto University

NCBI molecular databases

Nucleotide Sequence Databases Consist of the Following Sequences:

• DNA fragments

• cDNA: Expressed Sequence Tags (ESTs) and full-length cDNA sequences (partial and complete mRNAs)

• Genomes

Nucleic acid sequences provide the fundamental starting point for describing and understanding the structure, function, and development of genetically diverse organisms.

Common Sequence File Formats

• FASTA

• GenBank (DNA) or GenPept (Protein)

Each sequence has at least one unique identifier that lets you retrieve it from the public databases, e.g., accession number, gi_number, gene_ID, protein_ID, locus name, etc.

>gi|43500|emb|Y00534.1|HHGVPA Halobacterium halobium gvpA gene for major gas vesicle protein
AAGCTTTACACTCTCCGTACTTAGAAGTACGACTCATTACAGGAGACATAACGACTGGTGAAACCATACACATCCTTATGTGATGCCCGAGTATAGTTAGAGATGGGTTAATCCCAGATCACCAATGGCGCAACCAGATTCTTCAGGCTTGGCAGAAGTCCTTGATCGTGTACTAGACAAAGGTGTCGTTGTGGACGTGTGGGCTCGTGTGTCGCTTGTCGGCATCGAAATCCTGACCGTCGAGGCGCGGGTCGTCGCCGCCTCGGTGGACACCTTCCTCCACTACGCAGAAGAAATCGCCAAGATCGAACAAGCCGAACTTACCGCCGGCGCGAGGCGGCACCCGAGGCCTGACGCACAGGCCTCCCTTCGGCCGGCGTAAGGGAGGTGAATCGCTTGCAAACCATACTTTAACACCTTCTCGGGTAC

DNA sequence in FASTA format
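Reading FASTA takes only a few lines of code. A line starting with > opens a new record, and the lines that follow are concatenated into its sequence.

The sketch below also splits an NCBI-style defline of the form shown above: gi|number|database|accession|locus, then a free-text description. Both functions are hypothetical helpers. The defline splitter assumes exactly that five-field layout, which deflines from other sources do not necessarily follow. Libraries such as Biopython provide more robust parsers.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.replace(" ", ""))  # tolerate stray spaces
    if header is not None:
        yield header, "".join(chunks)


def split_ncbi_defline(header: str) -> dict:
    """Split 'gi|43500|emb|Y00534.1|HHGVPA description' into named fields."""
    ids, _, description = header.partition(" ")
    _, gi, db, accession, locus = ids.split("|")
    return {"gi": gi, "db": db, "accession": accession,
            "locus": locus, "description": description}


text = """>gi|43500|emb|Y00534.1|HHGVPA Halobacterium halobium gvpA gene for major gas vesicle protein
AAGCTTTACACTCT
CCGTACTTAG
"""
records = list(read_fasta(text.splitlines()))
header, seq = records[0]
print(split_ncbi_defline(header)["accession"], len(seq))  # Y00534.1 24
```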

DNA sequence in GenBank format

Nucleotide Sequence Databases

• GenBank – NCBI (National Center for Biotechnology Information)

http://www.ncbi.nlm.nih.gov/

• EMBL (European Molecular Biology Laboratory) – EBI

http://www.ebi.ac.uk/

• DDBJ (DNA Data Bank of Japan) – NIG (National Institute of Genetics)

http://www.ddbj.nig.ac.jp/

INSDC: International Nucleotide Sequence Database Collaboration

When did the collaboration start?

In February 1986, GenBank and EMBL began a collaborative effort (joined by DDBJ in 1987) to devise a common feature table format and common standards for annotation practice.

August 2005

National Center for Biotechnology Information (NCBI)

- Established in 1988

- Part of the National Library of Medicine, NIH, USA

- Creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information

- Host to the GenBank nucleotide sequence database since 1992 (GenBank was hosted at LANL from 1982 to 1992)

NCBI Nucleotide Databases

• GenBank - DNA sequences collected by the INSDC

• RefSeq - a comprehensive, integrated, non-redundant set of sequences for major research organisms

• dbEST - contains sequence data on "single-pass" cDNA sequences (Expressed Sequence Tags)

• UniGene - a non-redundant set of gene-oriented clusters automatically partitioned from GenBank sequences

• dbSTS - sequence & mapping data on short genomic landmark sequences or Sequence Tagged Sites (PCR primer pairs)

• UniSTS - a comprehensive db of STSs derived from STS-based maps and other experiments

NCBI Nucleotide Databases (continued)

• dbSNP – Single nucleotide polymorphism database

• dbGSS - Genome survey sequence database

• PopSet - a set of DNA sequences collected to analyze the evolutionary relatedness of a population

• TPA - Third party annotation sequences

• Nucleotide - the Entrez Nucleotide database of GenBank, RefSeq, and PDB sequences

• Trace Archive – Raw DNA sequence trace files

• HomoloGene – A system for automated detection of homologs among the annotated genes of several completely sequenced eukaryotic genomes

• http://www.ncbi.nlm.nih.gov/Database/

NCBI Nucleotide Databases (continued)

• MGC (Mammalian Gene Collection)

cDNA Sequence Related Databases

dbEST

UniGene

TIGR THC

Full-Length cDNA Sequences

What is dbEST?

dbEST (Nature Genetics 4:332-3;1993) is a division of GenBank that contains sequence data and other information on "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms.

DNA → (transcription) → mRNA → (reverse transcription) → cDNA → (DNA sequencing) → EST
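Reverse transcription produces a cDNA strand complementary to the mRNA. Written 5’ to 3’, this first strand is the reverse complement of the mRNA, with T in place of U. The second strand is the reverse complement of the first, which restores the mRNA sequence in DNA letters. A minimal sketch, assuming perfect Watson-Crick pairing and a complete copy of the mRNA (the helper name is hypothetical):

```python
# Watson-Crick partners; U (RNA) pairs with A, and cDNA is written with T.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse, so the result reads 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


mrna = "AUGGCUUGGUAA"
first_strand = reverse_complement(mrna)           # TTACCAAGCCAT
second_strand = reverse_complement(first_strand)  # ATGGCTTGGTAA
print(first_strand, second_strand)
```

An EST is a single-pass read from one end of such a cDNA clone. Depending on which end was read, it may correspond to either strand.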

cDNA sequencing is a powerful tool for quick identification of new genes

cDNA Sequence Related Databases

dbEST

UniGene

TIGR THC, Human Gene Index

Full-Length cDNA Sequences

Gene → (transcription) → mRNAs with poly(A) tails → (cDNA cloning and cDNA sequencing) → ESTs → (EST clustering) → UniGene
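EST clustering groups reads that come from the same transcript. The minimal sketch below uses single linkage: two ESTs join the same cluster if they share an identical k-mer (k = 8 here), and a union-find structure merges the clusters. It only illustrates the idea; UniGene's actual procedure is more elaborate and is not reproduced here. The EST names and sequences are made up.

```python
from collections import defaultdict


def cluster_ests(ests: dict, k: int = 8) -> list:
    """Single-linkage clusters of EST ids that share at least one identical k-mer."""
    parent = {name: name for name in ests}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner = {}  # k-mer -> first EST seen containing it
    for name, seq in ests.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_owner:
                union(first_owner[kmer], name)
            else:
                first_owner[kmer] = name

    clusters = defaultdict(list)
    for name in ests:
        clusters[find(name)].append(name)
    return sorted(sorted(members) for members in clusters.values())


ests = {
    "est1": "ACGTACGTTTGACCA",
    "est2": "TTTGACCAGGATCCA",
    "est3": "GGGGCCCCAAAATTTT",
}
print(cluster_ests(ests, k=8))  # [['est1', 'est2'], ['est3']]
```

Single linkage is simple but fragile. One shared repetitive element is enough to chain unrelated transcripts together, and this sketch ignores reverse-complement matches.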

Expression profile

cDNA Sequence Related Databases

dbEST

UniGene

THC (Tentative Human Consensus sequences) - The Institute for Genomic Research (www.tigr.org)

Full-Length cDNA Sequences

Gene → (transcription) → mRNAs with poly(A) tails → (cDNA cloning) → full-length cDNA clone → (DNA sequencing) → ESTs → (sequence assembly) → full-length cDNA sequence
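Assembling overlapping reads into a longer sequence can be illustrated with the classic greedy strategy. The procedure repeatedly merges the two sequences whose suffix and prefix overlap the most, until no overlap of at least a minimum length remains.

This sketch rests on strong assumptions: reads are error-free, all on the same strand, and free of repeats. Production assemblers handle sequencing errors, both orientations, and base-quality values. The function names and reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0 if below min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list, min_len: int = 5) -> list:
    """Greedily merge the best-overlapping pair until no overlap >= min_len is left."""
    unique = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads contained inside another read add no new sequence.
    contigs = [r for r in unique if not any(r != o and r in o for o in unique)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCTTGGT", "CTTGGTAACC", "TAACCGGATC"]
print(greedy_assemble(reads))  # ['ATGGCTTGGTAACCGGATC']
```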

http://hinv.ddbj.nig.ac.jp/

Protein Sequence Databases

Origins of Protein Sequences

Derived from:

• DNA fragment sequences

• mRNA sequences

• ESTs

• Genomes

Database name: full name and/or description

• NCBI Protein database: all protein sequences, translated from GenBank and imported from other protein databases

• PIR-PSD: Protein Information Resource Protein Sequence Database; has been merged into the UniProt knowledgebase (Georgetown University)

• PIR-NREF: PIR's Non-redundant Reference protein database (Georgetown University)

• PRF: Protein Research Foundation database of peptides: sequences, literature and unnatural amino acids (Japan)

• Swiss-Prot: now UniProt/Swiss-Prot, an expertly curated protein sequence database and section of the UniProt knowledgebase (Swiss Institute of Bioinformatics)

• TrEMBL: now UniProt/TrEMBL, computer-annotated translations of EMBL nucleotide sequence entries; section of the UniProt knowledgebase (SIB)

• UniProt: Universal protein knowledgebase, merging data from the Swiss-Prot, TrEMBL and PIR protein sequence databases (GU, SIB, EMBL)

• UniRef: UniProt non-redundant reference database; clustered sets of related sequences, including splice variants and isoforms (GU, SIB, EMBL)
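The simplest form of redundancy reduction stores each exactly identical sequence once, with cross-references to every entry that carries it. The minimal sketch below covers only that step. It does not attempt to cluster related but non-identical sequences, as UniRef does. The entry identifiers are hypothetical.

```python
from collections import defaultdict


def collapse_identical(records: dict) -> dict:
    """Map each distinct sequence to the sorted ids of all entries that carry it."""
    groups = defaultdict(list)
    for entry_id, seq in records.items():
        groups[seq.upper()].append(entry_id)
    return {seq: sorted(ids) for seq, ids in groups.items()}


records = {"dbA:001": "MAWK", "dbB:xyz": "mawk", "dbA:002": "MKF"}
nr = collapse_identical(records)
print(nr)  # {'MAWK': ['dbA:001', 'dbB:xyz'], 'MKF': ['dbA:002']}
```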

Protein sequence in FASTA format

Protein sequence in GenPept format – example 1

Protein sequence in GenPept format – example 2

Protein sequence in UniProt/Swiss-Prot format

Other Biological Databases

• Protein-Protein Interaction

• Gene Ontology

• Biological Pathways

• Protein structures

• Orthologs

• Gene expression

• Literature

Protein-Protein Interaction Databases

• Most proteins do not work alone in the cell

• The concept of ‘guilt by association’ can be used to infer the functions of previously uncharacterized proteins from those of their interaction partners
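‘Guilt by association’ can be sketched as neighbor counting. Collect the annotated interaction partners of an uncharacterized protein, then rank their functions by frequency (a majority vote). This is a simplified illustration of the idea, not the exact method of any study cited in these slides. The protein names and annotations are hypothetical, and ties are broken arbitrarily.

```python
from collections import Counter


def predict_functions(interactions, annotations, protein, top_n=1):
    """Rank candidate functions for `protein` by how often they occur among its partners."""
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    partners.discard(protein)  # ignore self-interactions
    counts = Counter(f for p in partners for f in annotations.get(p, ()))
    return [func for func, _ in counts.most_common(top_n)]


# Hypothetical network: P1 is uncharacterized, its partners are annotated.
interactions = [("P1", "P2"), ("P1", "P3"), ("P4", "P1"), ("P5", "P2")]
annotations = {
    "P2": {"membrane fusion"},
    "P3": {"membrane fusion", "lipid metabolism"},
    "P4": {"cell structure"},
}
print(predict_functions(interactions, annotations, "P1"))  # ['membrane fusion']
```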

Figure 1: (A) An interaction map of the yeast proteome assembled from published interactions. The map contains 1,548 proteins and 2,358 interactions. Proteins are colored according to their functional role as defined by the Yeast Protein Database: proteins involved in membrane fusion (blue), chromatin structure (gray), cell structure (green), lipid metabolism (yellow), and cytokinesis (red). For other maps with different functional groups highlighted, see <http://depts.washington.edu/sfields/>. On-line maps can also be zoomed and searched for protein names. (B) Section of part A showing the clustering of proteins involved in membrane fusion (blue), lipid metabolism (yellow), and cell structure (green).

Schwikowski et al. 2000. Nat. Biotech.

Protein interaction map of Drosophila melanogaster (Giot et al. Science 302:1727-36, 2003)

7,048 proteins & 20,405 interactions
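The protein and interaction counts quoted for the yeast and Drosophila maps also give a quick sense of network density. Consider an undirected network with N proteins and E interactions, assuming no self-interactions. Each interaction adds 1 to the degree of both of its partners, so the average number of partners per protein is 2E/N:

```latex
\langle k \rangle = \frac{2E}{N}, \qquad
\langle k \rangle_{\text{yeast}} = \frac{2 \times 2358}{1548} \approx 3.05, \qquad
\langle k \rangle_{\text{fly}} = \frac{2 \times 20405}{7048} \approx 5.79
```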

Integrated physical-interaction network. Nodes represent genes and are labeled with their corresponding gene names. Connections between nodes display physical interactions as recorded in the public databases, where a yellow arrow directed from one node to another represents a protein --> DNA interaction, and a blue line between nodes represents a protein-protein interaction. Global changes in mRNA expression (in this case, in response to a deletion of GAL4 in the presence of galactose) are visually superimposed on the network. The grayscale intensity of each node indicates the change in mRNA expression of the corresponding gene, where medium gray represents no change, darker or lighter shades represent an increase or decrease in expression, respectively, and node diameter scales with the overall magnitude of change. GAL4 is colored in red to signify that its expression level has been perturbed by external means. Highly interconnected groups of genes tend to have common biological function and are annotated accordingly (rectangular labels).

Ideker et al. Science 292:929, 2001

• Database of Interacting Proteins (DIP)

(http://dip.doe-mbi.ucla.edu/; UCLA)

• Biomolecular Interaction Network Database (BIND)

(http://bind.ca/; Mount Sinai Hospital, Canada)

• Human Protein Reference Database (HPRD)

(http://www.hprd.org/; Johns Hopkins University and the Institute of Bioinformatics)

• MIPS Mammalian Protein-Protein Interaction Database

(http://mips.gsf.de/proj/ppi/; Munich Information Center for Protein Sequences)

More databases are listed at http://mips.gsf.de/proj/ppi/

Database of Interacting Proteins (DIP)

The DIP™ database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. The data stored in DIP were curated both manually, by expert curators, and automatically, using computational approaches that draw on knowledge about protein-protein interaction networks extracted from the most reliable, core subset of the DIP data.

BIND Database

The MIPS Mammalian Protein-Protein Interaction Database is a collection of manually curated, high-quality PPI data collected from the scientific literature by expert curators. The curators took great care to include only data from individually performed experiments, since these usually provide the most reliable evidence for physical interactions.

Human Protein Reference Database

A centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome. All the information in HPRD has been manually extracted from the literature by expert biologists who read, interpret and analyze the published data.

Biological Pathway Databases

• KEGG (GenomeNet)

• BioCarta (via NCI CGAP)

• BioPAX (Biological Pathway Exchange)

* Ingenuity Systems

* GeneGo

ARGININE AND PROLINE METABOLISM

BioCarta Pathways: http://cgap.nci.nih.gov/Pathways/BioCarta_Pathways

http://www.biocarta.com/

http://www.biopax.org/

BioPAX Motivation

Figure: databases, applications, and users before BioPAX (>150 DBs and tools) and with BioPAX.

A common format will make data more accessible, promoting data sharing and distributed curation efforts.

Ingenuity Systems: analyze expression and other biological data in the context of pathways and networks

Ingenuity Systems – example

Gene Ontology (GO)

http://www.geneontology.org/

Molecular Pathology and Disease Information Databases

• GeneCards: human genes, proteins and diseases database
  http://www.genecards.org/

• OMIM: Online Mendelian Inheritance in Man
  http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM

Human disease database

GeneCards

OMIM

COG/KOG: Clusters of Orthologous Groups of proteins (COGs)

COGNITOR/KOGNITOR

SAGE Database

Serial Analysis of Gene Expression
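SAGE estimates expression by counting short sequence tags, each taken from a fixed position in a transcript: classically the 10 bases immediately 3' of the 3'-most CATG site recognized by the anchoring enzyme NlaIII. The Python sketch derives such tags in silico and tallies them, treating each input string as one sequenced transcript molecule so that tag counts stand in for abundance. The 10-base default, the function names and this in-silico setup are assumptions; processing real SAGE libraries also has to deal with ditags, linkers and sequencing errors.

```python
from collections import Counter
from typing import Iterable, Optional

ANCHOR = "CATG"  # NlaIII recognition site, the anchoring enzyme in classic SAGE


def sage_tag(transcript: str, tag_len: int = 10) -> Optional[str]:
    """Return the tag_len bases following the 3'-most CATG, or None if there is no full tag."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)          # 3'-most anchoring-enzyme site
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def count_tags(transcripts: Iterable[str], tag_len: int = 10) -> Counter:
    """Tally tags across transcripts; sequences without a usable tag are skipped."""
    counts: Counter = Counter()
    for t in transcripts:
        tag = sage_tag(t, tag_len)
        if tag is not None:
            counts[tag] += 1
    return counts
```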


GEO (Gene Expression Omnibus)

http://www.ncbi.nlm.nih.gov/geo/


• GPL: Platform descriptions (submitted by the manufacturer*)

• GSM: Raw/processed spot intensities from a single slide/chip (submitted by experimentalists)

• GSE: Grouping of slide/chip data, "a single experiment" (submitted by experimentalists)

• GDS: Grouping of experiments (curated by NCBI)

Entrez interfaces: Entrez GEO, Entrez GEO Datasets
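The four GEO record types form a hierarchy: a platform (GPL) describes the array, each sample (GSM) holds the measurements from one slide/chip run on a platform, a series (GSE) groups the samples of one experiment, and a dataset (GDS) is an NCBI-curated grouping on top of the submitted records. The Python sketch models that hierarchy with dataclasses, following the description above; the class names, fields and accessions are hypothetical, and real GEO records (for example in SOFT or MINiML format) carry many more fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Platform:            # GPL: description of the array/platform
    accession: str
    description: str


@dataclass
class Sample:              # GSM: measurements from a single slide/chip
    accession: str
    platform: str          # accession of the GPL this sample was run on
    values: Dict[str, float] = field(default_factory=dict)  # probe ID -> spot intensity


@dataclass
class Series:              # GSE: the samples making up one experiment
    accession: str
    samples: List[str] = field(default_factory=list)        # GSM accessions


@dataclass
class DataSet:             # GDS: NCBI-curated grouping of experiments
    accession: str
    series: List[str] = field(default_factory=list)         # GSE accessions


def samples_in_dataset(gds: DataSet, series_by_id: Dict[str, Series]) -> List[str]:
    """Walk GDS -> GSE -> GSM and return the sample accessions in order."""
    return [gsm for gse in gds.series for gsm in series_by_id[gse].samples]
```

Storing accessions rather than nested objects mirrors how GEO records refer to each other by accession number.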


Submit and update data

Query the database:
• gene identifiers
• field information
• sequence

Browse datasets

Download data

Redesigned with new features


From UniGene: Hs.194143


Sequence and Literature Search/Retrieval

• Entrez

• SRS

• ftp
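Entrez searches can also be scripted. Assuming NCBI's E-utilities (the programmatic interface to Entrez, not covered on these slides), a search is an ESearch request that returns matching record IDs, and a retrieval is an EFetch request for those IDs in a chosen format. The Python sketch only builds the request URLs, so it runs offline; the helper names are hypothetical, while db, term, retmax, id, rettype and retmode are E-utilities parameters.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL asking Entrez for the IDs of records in `db` that match the query `term`."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def efetch_url(db: str, ids: list, rettype: str = "fasta", retmode: str = "text") -> str:
    """URL retrieving full records for the given IDs in the requested format."""
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),  # E-utilities takes a comma-separated ID list
        "rettype": rettype,
        "retmode": retmode,
    }
    return EUTILS + "efetch.fcgi?" + urlencode(params)
```

For example, `esearch_url("pubmed", "BRCA1[gene] AND human[orgn]")` produces a URL that can then be opened with `urllib.request.urlopen`; NCBI asks scripted users to keep their request rate low.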


Major sequence databases accessible through the Internet

1. GenBank - National Center for Biotechnology Information (NCBI), USA http://www.ncbi.nih.gov/Entrez/

2. European Molecular Biology Laboratory (EMBL) - European Bioinformatics Institute http://www.ebi.ac.uk/embl/index.html

3. DNA DataBank of Japan (DDBJ) - Mishima, Japan http://www.ddbj.nig.ac.jp/

4. Protein Information Resource (PIR) - National Biomedical Research Foundation (NBRF), USA http://www-nbrf.georgetown.edu/pirwww/

5. SwissProt - Swiss Institute for Experimental Cancer Research http://www.expasy.org/cgi-bin/sprot-search-de

6. Sequence Retrieval System (SRS) - European Bioinformatics Institute http://srs6.ebi.ac.uk


Protein Structure Databases

• PDB (Protein Data Bank)

http://www.rcsb.org/pdb/

• Entrez Structure (NCBI)
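PDB entries have traditionally been distributed as fixed-column text files, in which each ATOM record stores one atom's name, residue, chain and x, y, z coordinates at set column positions. The Python sketch extracts the atoms with a given name (by default the alpha carbons, "CA") from such lines. It covers only the classic PDB text format (not mmCIF), ignores alternate locations and multiple models, and its names are hypothetical.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines: List[str], atom_name: str = "CA") -> List[Atom]:
    """Extract ATOM records with the given atom name using PDB fixed-column positions."""
    atoms = []
    for line in lines:
        if not line.startswith("ATOM"):      # skips HETATM, HEADER, REMARK, ...
            continue
        name = line[12:16].strip()           # columns 13-16: atom name
        if name != atom_name:
            continue
        atoms.append(Atom(
            name=name,
            res_name=line[17:20].strip(),    # columns 18-20: residue name
            chain=line[21],                  # column 22: chain identifier
            res_seq=int(line[22:26]),        # columns 23-26: residue number
            x=float(line[30:38]),            # columns 31-38
            y=float(line[38:46]),            # columns 39-46
            z=float(line[46:54]),            # columns 47-54
        ))
    return atoms
```

Checking the record type first matters: a calcium ion stored as a HETATM record is also named "CA" and would otherwise be mistaken for an alpha carbon.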


ftp ftp.ncbi.nih.gov

ftp ftp.expasy.org

ftp ftp.ebi.ac.uk

ftp ftp.ddbj.nig.ac.jp

Retrieve complete sets of data
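Complete data sets on these FTP servers are usually large compressed flat files, for example gzip-compressed FASTA. Once such a file has been downloaded, it can be streamed record by record instead of being loaded into memory at once. The Python sketch assumes gzip-compressed FASTA input; the function name and the file name in the usage line are hypothetical.

```python
import gzip
from typing import Iterator, Tuple


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file, one record at a time."""
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:    # "rt": decompress and decode to text
        for line in handle:
            line = line.strip()
            if not line:
                continue                      # tolerate blank lines between records
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)           # sequences may wrap over many lines
    if header is not None:
        yield header, "".join(chunks)
```

Usage: `for header, seq in read_fasta_gz("example.fa.gz"): print(header, len(seq))`. Because it is a generator, memory use stays proportional to the longest single record rather than the whole file.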


Retrieve Raw Sequencing Data from the NCBI Trace Archive Database


Literature Searches

Entrez PubMed (NCBI)

Entrez PubMed Central (NCBI)

SRS (EMBL-EBI)

GoPubMed (Ontology-based Literature Search)
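PubMed searches can also be run programmatically through Entrez's ESearch utility, whose XML response gives the total number of hits (Count) and the PubMed IDs on the current page (IdList). The Python sketch parses such a response with the standard library; it reads only those two elements, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_esearch(xml_text: str) -> Tuple[int, List[str]]:
    """Return (total hit count, IDs on this page) from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count", default="0"))   # first <Count> directly under the root
    ids = [el.text for el in root.findall("IdList/Id") if el.text]
    return count, ids
```

For a hand-written response such as `<eSearchResult><Count>57</Count><IdList><Id>100</Id><Id>200</Id></IdList></eSearchResult>` (made-up IDs), it returns the total of 57 together with the two IDs on the page; the Count can exceed the number of IDs returned, since results are paged.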


More biological databases can be found at Bioinformatics Web:

http://www.geocities.com/bioinformaticsweb/datalink.html
