
NCBI data, sliding window programs and dot plots


Page 1: NCBI data, sliding window programs and dot plots

NCBI data, sliding window programs and dot plots

Sept. 25, 2012

Learning objectives:

• Become familiar with OMIM and PubMed.
• Understand the difference between a research article and a review article.
• Understand the concept of sliding window programs.
• Understand the difference between identity, similarity and homology.
• Appreciate that proteins can be modular.

Workshop:

• Learn how to use OMIM and obtain DNA and protein sequences associated with diseases.
• Perform a sliding window analysis to compute %(G+C) as a function of position in a sequence.

Homework due Tuesday, Oct. 2nd.

Page 2: NCBI data, sliding window programs and dot plots

Primary public domain bioinformatics servers

Public domain bioinformatics facilities:

• European Bioinformatics Institute (EBI), United Kingdom: databases and analysis tools
• National Center for Biotechnology Information (NCBI), United States: databases and analysis tools
• GenomeNet (KEGG & DDBJ), Japan: databases and analysis tools

Page 3: NCBI data, sliding window programs and dot plots

NCBI ENTREZ

A platform that provides access to and links to databases with biological information

Databases reached through ENTREZ: PubMed, GenBank, Protein databases, Genomes, PopSet, Taxonomy, OMIM, MedLine

Page 4: NCBI data, sliding window programs and dot plots

NCBI ENTREZ

• GenBank: Database of all publicly available DNA sequences.
• Protein databases: Database of amino acid sequences from UniProt, Protein Research Foundation, PDB.
• Genomes: Database of genomes from organisms and viruses.
• PopSet: Database of DNA sequences that have been collected to analyze the evolutionary relatedness of a population.
• Taxonomy: Database of names of organisms with sequences in GenBank.
• OMIM: Database of human genes and genetic disorders.
• MedLine: Literature database.

Page 5: NCBI data, sliding window programs and dot plots

Literature Databases

• Medline/PubMed
• OMIM
• CSULA Library
• Bookshelf (from NCBI)
• Melvyl (Books at UC Libraries)

Other molecular life science databases

• Science Direct
• PubMed Central
• Free Medical Journals
• LinkOut Journals
• Wiley InterScience

Page 6: NCBI data, sliding window programs and dot plots

OMIM: Online Mendelian Inheritance in Man

• A catalog of human genes linked to diseases
• Victor A. McKusick at Johns Hopkins University
• A good place to start when you want to research a certain disease or biological molecule
• This database is cross-referenced to PubMed and other NCBI-based databases

Page 7: NCBI data, sliding window programs and dot plots

Sliding window

A sliding window gathers information about properties of nucleotides or amino acids.

[Figure: the sequence GCATATGCGCATATCCCGTCAATACCA shown three times, with the window shifted at each step; figure labels: 4, 5, 6]

A simple example is to calculate the %(G+C) content within a window. Then move the window one nucleotide and repeat the calculation.
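The %(G+C) calculation described above can be sketched in Python as follows. This is a minimal illustration, not the workshop's program: the function name `gc_percent_windows` and the window size of 10 are assumptions chosen for the example.

```python
def gc_percent_windows(seq, window):
    """Return %(G+C) for every window position, moving one nucleotide at a time."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    percents = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        gc_count = chunk.count("G") + chunk.count("C")
        percents.append(100.0 * gc_count / window)
    return percents


# Sequence taken from the slide; the window size of 10 is an arbitrary choice.
values = gc_percent_windows("GCATATGCGCATATCCCGTCAATACCA", 10)
print(values[:3])  # first three window positions: [60.0, 50.0, 40.0]
```

A sequence of length n scanned with a window of length w yields n - w + 1 values, so the 27-nucleotide example produces 18 points to plot against position.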

Page 8: NCBI data, sliding window programs and dot plots

Sliding window

If the window is too small, it is difficult to detect the trend of the measurement. If it is too large, you could miss meaningful data.

[Figure: two plots of %(G+C) versus sequence number, one with a large window size and one with a small window size]
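The effect of window size can be seen by looking at the two extremes. The sketch below is illustrative only; the helper `gc_percent_windows` and the chosen window sizes are assumptions, and the sequence is the short example used earlier in these slides.

```python
def gc_percent_windows(seq, window):
    """Return %(G+C) for each window position, stepping one nucleotide at a time."""
    seq = seq.upper()
    return [
        100.0 * sum(base in "GC" for base in seq[start:start + window]) / window
        for start in range(len(seq) - window + 1)
    ]


seq = "GCATATGCGCATATCCCGTCAATACCA"

# Window of 1: every value is 0 or 100, so no trend is visible.
# Window equal to the sequence length: a single average, so all local detail is lost.
for window in (1, 5, len(seq)):
    values = gc_percent_windows(seq, window)
    print(window, len(values), min(values), max(values))
```

An intermediate window size is a compromise between these extremes, and a reasonable choice depends on the length of the features being looked for.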

Page 9: NCBI data, sliding window programs and dot plots

Sliding window

Adapted from Zhao et al, BMC Genomics. 2007 Nov 7;8:403.

Page 10: NCBI data, sliding window programs and dot plots

Amino acid characteristics

Page 11: NCBI data, sliding window programs and dot plots

Amino acid   Hydropathy value
A             1.8
C             2.5
D            -3.5
E            -3.5
F             2.8
G            -0.4
H            -3.2
I             4.5
K            -3.9
L             3.8
M             1.9
N            -3.5
P            -1.6
Q            -3.5
R            -4.5
S            -0.8
T            -0.7
V             4.2
W            -0.9
Y            -1.3

Page 12: NCBI data, sliding window programs and dot plots

Four levels of protein structure

1) Primary: Linear sequence - AGHIPLLQ

2) Secondary: Initial folding patterns - AGHIPLLQ TTT

3) Tertiary: Complex folding patterns

4) Quaternary: Interactions between polypeptides

Page 13: NCBI data, sliding window programs and dot plots

Kyte-Doolittle Hydropathy

• A sliding window software program [J. Mol. Biol. 157:105-132 (1982)].

The seven known membrane-spanning regions are numbered 1-7 in red on the plot. Note that this particular software program averaged the hydropathy values in the window (http://www.vivo.colostate.edu/molkit/hydropathy/index.html). The original program by Kyte and Doolittle summed the hydropathy values.
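The difference between averaging and summing the hydropathy values in a window can be sketched as below. The hydropathy values are the ones tabulated earlier in these slides; the function name `hydropathy_profile`, its `average` flag, and the window size of 3 are assumptions made for illustration, and this is not the code of either program mentioned above.

```python
# Hydropathy values from the amino acid table earlier in the slides.
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def hydropathy_profile(protein, window, average=True):
    """Slide a window along the protein and average (or sum) the hydropathy values."""
    protein = protein.upper()
    profile = []
    for start in range(len(protein) - window + 1):
        total = sum(HYDROPATHY[aa] for aa in protein[start:start + window])
        profile.append(total / window if average else total)
    return profile


# Short peptide from the protein structure slide; window of 3 is an arbitrary choice.
peptide = "AGHIPLLQ"
print(hydropathy_profile(peptide, 3, average=True))
print(hydropathy_profile(peptide, 3, average=False))
```

Averaging and summing give plots of the same shape; only the vertical scale differs by a factor equal to the window size.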

Page 14: NCBI data, sliding window programs and dot plots

Dot Plot with window = 1

Window = 1

Note that 25% of the table will be filled due to random chance: there is a 1 in 4 chance of a match at each position.

[Figure: dot plot of ATGCCTAG (horizontal) against ATGCCTAG (vertical)]
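A dot plot with window = 1 can be sketched as a table of single-letter comparisons. The function name `dot_matrix` and the text rendering are assumptions made for this illustration.

```python
def dot_matrix(seq_a, seq_b):
    """matrix[row][col] is True where seq_b[row] equals seq_a[col] (window = 1)."""
    return [[a == b for a in seq_a] for b in seq_b]


seq = "ATGCCTAG"  # sequence from the slide, compared against itself
matrix = dot_matrix(seq, seq)

print("  " + seq)
for base, row in zip(seq, matrix):
    print(base, "".join("●" if hit else "." for hit in row))

# Fraction of the table that is filled with dots.
filled = sum(hit for row in matrix for hit in row) / (len(seq) ** 2)
print(filled)  # 0.25
```

For this sequence the filled fraction is exactly 25% because each of the four nucleotides occurs twice. For other sequences, 25% is the expectation when the four nucleotides are equally frequent, not a guaranteed value.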

Page 15: NCBI data, sliding window programs and dot plots

Dot Plot with window = 3

Window = 3

The larger the window, the more noise can be filtered.

What is the percent chance that you will get a match randomly? One in 4^3: (1/4)^3 × 100 = 1.56%.

[Figure: dot plot of ATGCCTAGGA (horizontal) against TGCCTAG (vertical) with window = 3]
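The window = 3 comparison and the random-match arithmetic can be sketched as follows. The function name `window_matches` is an assumption, and a dot is placed only when all three nucleotides in the window match exactly, which is one simple way to apply a window.

```python
def window_matches(seq_a, seq_b, window):
    """Return (i, j) pairs where seq_a[i:i+window] equals seq_b[j:j+window]."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                hits.append((i, j))
    return hits


# Sequences from the slide: horizontal ATGCCTAGGA, vertical TGCCTAG.
hits = window_matches("ATGCCTAGGA", "TGCCTAG", 3)
print(hits)  # [(1, 0), (2, 1), (3, 2), (4, 3), (5, 4)]

# Chance of a random match for a window of 3 with four equally likely nucleotides.
chance_percent = (1 / 4) ** 3 * 100
print(chance_percent)  # 1.5625
```

The hits fall on a single diagonal, which is the signature of a shared stretch of sequence, with no off-diagonal dots for this pair.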

Page 16: NCBI data, sliding window programs and dot plots

Do workshop #2

Answer questions 1-3

Page 17: NCBI data, sliding window programs and dot plots

Evolutionary Basis of Sequence Alignment

1. Identity: Quantity that describes how much two sequences are alike in the strictest terms.
2. Similarity: Quantity that relates how much two amino acid sequences are alike.
3. Homology: A conclusion drawn from data suggesting that two genes share a common evolutionary history.
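Identity and similarity are quantities and can be computed; homology is a conclusion and cannot. The sketch below computes both quantities for two already aligned sequences of equal length. The amino acid groupings in `SIMILAR_GROUPS` are a hypothetical, simplified classification chosen only for illustration (real programs use a scoring table), and the second peptide is invented for the example.

```python
# Hypothetical groups of chemically alike amino acids (illustrative assumption).
SIMILAR_GROUPS = ["AVLIM", "FYW", "KRH", "DE", "STNQ"]


def percent_identity(seq_a, seq_b):
    """Percent of aligned positions holding exactly the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    same = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * same / len(seq_a)


def percent_similarity(seq_a, seq_b):
    """Percent of aligned positions holding identical or same-group residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    def alike(a, b):
        return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)

    return 100.0 * sum(alike(a, b) for a, b in zip(seq_a, seq_b)) / len(seq_a)


# AGHIPLLQ is from the slides; AGHVPLLN is a made-up variant (I->V, Q->N).
print(percent_identity("AGHIPLLQ", "AGHVPLLN"))    # 75.0
print(percent_similarity("AGHIPLLQ", "AGHVPLLN"))  # 100.0
```

Similarity is always at least as large as identity under this definition, because every identical position also counts as similar.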

Page 18: NCBI data, sliding window programs and dot plots

Purpose of finding differences and similarities of amino acids in two proteins.

Infer structural information

Infer functional information

Infer evolutionary relationships

Page 19: NCBI data, sliding window programs and dot plots

Modular nature of proteins

Proteins possess local regions of similarity.

Proteins can be thought of as assemblies of modular domains.

Page 20: NCBI data, sliding window programs and dot plots

Two proteins that are similar in certain regions

• Tissue plasminogen activator (PLAT)
• Coagulation factor 12 (F12)

Baxevanis and Ouellette, Bioinformatics, Wiley-Interscience, New York, 2001

Page 21: NCBI data, sliding window programs and dot plots

The Dotter Program

• Program consists of three components:

• Sliding window

• A table that gives a score for each amino acid match

• A graph that converts the score to a dot of certain density (the higher the dot density the higher the score)
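The three components listed above can be sketched together in a few lines. This is not the Dotter program's code: the scoring function `toy_score`, the groups in `GROUPS`, and the text "shades" used for dot density are all hypothetical stand-ins chosen to keep the example small.

```python
# Hypothetical score table: 2 for identical residues, 1 for residues in the
# same illustrative group, 0 otherwise. Real programs use a full scoring matrix.
GROUPS = ["AVLIM", "FYW", "KRH", "DE", "STNQ"]


def toy_score(a, b):
    if a == b:
        return 2
    if any(a in group and b in group for group in GROUPS):
        return 1
    return 0


def window_scores(seq_a, seq_b, window):
    """Sliding window: sum the pair scores along each diagonal segment."""
    return [
        [
            sum(toy_score(seq_a[i + k], seq_b[j + k]) for k in range(window))
            for i in range(len(seq_a) - window + 1)
        ]
        for j in range(len(seq_b) - window + 1)
    ]


def render(scores, window):
    """Graph: map each score to a character, denser for higher scores."""
    shades = " .:*#"
    top = 2 * window  # highest possible score for one window
    return ["".join(shades[s * (len(shades) - 1) // top] for s in row) for row in scores]


scores = window_scores("AGHIPLLQ", "AGHVPLLN", 3)
for line in render(scores, 3):
    print(line)
```

Comparing a sequence with itself gives the maximum score at every position on the main diagonal, which is a quick sanity check for any implementation of this kind.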

Page 22: NCBI data, sliding window programs and dot plots
Page 23: NCBI data, sliding window programs and dot plots

Dot plot of sequence alignment highlighting Kringle domain alignments. Adapted from Baxevanis, Ouellette: Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 2nd Edition.