Bio-Informatics Lectures A Short Introduction

mcb7300 Bioinformatics Lab1 - Ohio University...PAM (Point Accepted Mutation): for evolutionary studies. For example in PAM1, 1 accepted point mutation per 100 amino acids is required


The History of Bioinformatics

Sanger Sequencing

PCR-like cycle sequencing (single-primer extension) in the presence of fluorescently labeled, chain-terminating dideoxynucleotides

Massively Parallel Sequencing

Illumina/Solexa

Roche/454, Emulsion PCR

Metzker, Nature Reviews Genetics 11:31-46 (2010)

Illumina/Solexa: Solid-Phase Amplification

http://www.genome.gov/sequencingcosts/

Growth of GenBank and WGS

~200 million sequences

1000 billion bases

http://www.ncbi.nlm.nih.gov/genbank/statistics

Growth of UniProtKB/TrEMBL

http://www.ebi.ac.uk/uniprot/TrEMBLstats

What Does the Sequence Information Tell Us?

Bio-Informatics

Scope of this lab

1. Be familiar with sequence databases and some online bioinformatics tools

Databases:
GenBank: http://www.ncbi.nlm.nih.gov
EMBL: http://www.ebi.ac.uk
DDBJ: http://www.ddbj.nig.ac.jp

Sequence Search and Retrieval: BLAST
Sequence Alignment: ClustalW2, MAFFT
Sequence Analysis and Domain Search: Pfam and SMART
Protein Structure and Prediction: PyMOL
Molecular Evolution: MEGA

More Tools to Discover on Your Own

http://www.ebi.ac.uk/services/all
http://www.expasy.org

Online Tools

Scope of this lab

2. Try Some Simple Programming (Stand-alone)

Basic UNIX Commands: cd, mkdir, mv, cp, rm, cat, ls, pwd, gunzip, unzip, tar

Perl: String, Array, Hash

R: Read a file, column, row, plot, hist, heat map

Beginning with a DNA Sequence

Proteins

The primary sequence, structure, and function of a protein are inter-related

N-terminus - MQIFVKTLTGKTITLEVESSDTIDNVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLADYNIQKESTLHLVLRLRGG - C-terminus

Database Sequence Similarity Searching

Definition: Applies computation, mathematical algorithms, and statistical inference to rapidly find sequences (hits) in a database that are similar to a target (query) sequence.

All similarity searching methods rely on the concept of alignment between sequences.

A similarity score is calculated from a distance: the number of DNA bases or amino acids that differ between two sequences.
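
As a minimal sketch of the distance described above, assuming the two sequences are already aligned and have the same length, the differing positions can simply be counted. The function names `mismatch_distance` and `percent_identity` and the example sequences are illustrative and not part of the lab materials.

```python
def mismatch_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length (align them first)")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Convert the mismatch count into a simple similarity measure."""
    matches = len(seq_a) - mismatch_distance(seq_a, seq_b)
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Illustrative sequences: they differ at positions 3 and 6.
    print(mismatch_distance("GATTACA", "GACTATA"))  # 2
```

This count ignores insertions and deletions, which is why real searches work on alignments with gaps rather than on position-by-position comparison.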

Edit Distance
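
The slide content for this topic did not survive extraction, so the following is only a sketch of the usual definition: the edit (Levenshtein) distance is the minimum number of single-character substitutions, insertions, and deletions needed to turn one sequence into another. The function name `edit_distance` and the unit cost for every operation are assumptions made for illustration.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, b in enumerate(t, start=1):
            substitution = prev[j - 1] + (0 if a == b else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))  # 1 (delete the C)
```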

Sequence Alignment and Dynamic Programming

Sequence Alignment Comparison and Substitution Matrix

Some popular scoring matrices are:

PAM (Point Accepted Mutation): for evolutionary studies. For example, PAM1 corresponds to 1 accepted point mutation per 100 amino acids.

BLOSUM (BLOcks SUbstitution Matrix): for finding common motifs. For example, BLOSUM62 is built from aligned blocks in which sequences sharing more than 62% identity are clustered together, so it reflects substitutions between sequences that are no more than 62% identical.

Experimentation has shown that the BLOSUM-62 matrix is among the best for detecting most weak protein similarities.
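
Higher-numbered PAM matrices are obtained by extrapolating from PAM1: the PAM1 mutation probability matrix is multiplied by itself n times to model n PAM units of divergence. The sketch below illustrates only that idea on a made-up two-letter alphabet; `TOY_PAM1` and its values are hypothetical and are not taken from any real PAM matrix.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Hypothetical 2-letter "PAM1": each letter mutates with probability 0.01.
TOY_PAM1 = [[0.99, 0.01],
            [0.01, 0.99]]

toy_pam250 = mat_pow(TOY_PAM1, 250)

if __name__ == "__main__":
    for row in toy_pam250:
        print([round(x, 4) for x in row])
```

Each row of the result still sums to 1, and after 250 steps the two letters are almost equally likely, which is why high PAM numbers suit distantly related sequences.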

Log-odds matrices
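
The slide body is missing here, so this is a hedged sketch of the general idea rather than the lecture's own derivation: a log-odds score compares how often two residues are observed aligned with how often they would be aligned by chance. The function name, the simplified chance term `freq_a * freq_b`, the half-bit scaling (scale of 2 with a base-2 logarithm, as used for BLOSUM62), and all frequencies below are assumptions for illustration.

```python
import math


def log_odds_score(pair_freq: float, freq_a: float, freq_b: float,
                   scale: float = 2.0) -> int:
    """Rounded, scaled log2 ratio of observed to chance-expected pair frequency."""
    expected = freq_a * freq_b  # simplified chance model (assumption)
    return round(scale * math.log2(pair_freq / expected))


if __name__ == "__main__":
    # Hypothetical frequencies: a pair seen 8x more often than chance scores +6,
    # a pair seen half as often as chance scores -2.
    print(log_odds_score(0.02, 0.05, 0.05))
    print(log_odds_score(0.00125, 0.05, 0.05))
```

Positive scores therefore mark substitutions seen more often than chance, and negative scores mark substitutions seen less often.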

Local and Global Alignments

Local: Smith-Waterman

Global: Needleman-Wunsch
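
To make the difference between the two algorithms concrete, here is a minimal score-only sketch of both, assuming a simple match/mismatch scheme and a linear gap penalty. The scoring values and function names are illustrative choices, and no traceback is performed, so only the best score is returned.

```python
def global_score(s: str, t: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> int:
    """Needleman-Wunsch: best score aligning all of s against all of t."""
    prev = [j * gap for j in range(len(t) + 1)]  # leading gaps are penalized
    for i, a in enumerate(s, start=1):
        curr = [i * gap]
        for j, b in enumerate(t, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(s: str, t: str, match: int = 1, mismatch: int = -1,
                gap: int = -2) -> int:
    """Smith-Waterman: best score over all pairs of substrings of s and t."""
    best = 0
    prev = [0] * (len(t) + 1)  # an alignment may start anywhere at no cost
    for a in s:
        curr = [0]
        for j, b in enumerate(t, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(global_score("ACGT", "AGT"))             # 1: three matches, one gap
    print(local_score("TTTACGTTTT", "GGGACGGGG"))  # 3: the shared "ACG"
```

The only structural differences are the zero floor and zero-initialized borders in the local version, which let an alignment start and end anywhere.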

BLAST/FASTA Search and k-Tuple Method
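
The k-tuple idea can be sketched as follows: index every word of length k in the database sequence, then look up each word of the query to find exact seed matches that a heuristic search would try to extend. This is a simplified illustration, not the actual BLAST or FASTA implementation (for example, protein BLAST also considers similar, non-identical words); the function names and the default k are assumptions.

```python
from collections import defaultdict


def index_kmers(seq: str, k: int) -> dict:
    """Map each k-letter word to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_hits(query: str, subject: str, k: int = 3) -> list:
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    index = index_kmers(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


if __name__ == "__main__":
    # Both seeds lie on the same diagonal (subject_pos - query_pos == 2),
    # which suggests a single region worth extending.
    print(seed_hits("ACGT", "TTACGTT"))  # [(0, 2), (1, 3)]
```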

Use proteins for database similarity searches when possible

Lab 1

Sequence Search and Retrieval: BLAST
Sequence Alignment: ClustalW2, MAFFT
Sequence Analysis and Domain Search: Pfam and SMART
Protein Structure and Prediction: PyMOL
Molecular Evolution: MEGA

Sequence Format - FASTA

>AT4G05320
ATGCAGATCTTTGTTAAGACTCTCACCGGAAAGACAATCACCCTCGAGGTGGAAAGCTCCGACACCATCGACAACGTTAAGGCCAAGATCCAGGATAAGGAGGGCATTCCTCCGGATCAGCAGAGGCTTATTTTCGCCGGCAAGCAGCTAGAGGATGGCCGTACGTTGGCTGATTACAATATCCAGAAGGAATCCACCCTCCACTTGGTCCTCAGGCTCCGTGGTGGTATGCAGATTTTCGTTAAAACCCTAACGGGAAAGACGATTACTCTTGAGGTGGAGAGTTCTGACACCATCGACAACGTCAAGGCCAAGATCCAAGACAAAGAGGGTATTCCTCCGGACCAGCAGAGGCTGATCTTCGCCGGAAAGCAGTTGGAGGATGGCAGAACTCTTGCTGACTACAATATCCAGAAGGAGTCCACCCTTCATCTTGTTCTCAGGCTCCGTGGTGGTATGCAGATTTTCGTTAAGACGTTGACTGGGAAAACTATCACTTTGGAGGTGGAGAGTTCTGACACCATTGATAACGTGAAAGCCAAGATCCAAGACAAAGAGGGTATTCCTCCGGACCAGCAGAGATTGATCTTCGCCGGAAAACAACTTGAAGATGGCAGAACTTTGGCCGACTACAACATTCAGAAGGAGTCCACACTCCACTTGGTCTTGCGTCTGCGTGGAGGTATGCAGATCTTCGTGAAGACTCTCACCGGAAAGACCATCACTTTGGAGGTGGAGAGTTCTGACACCATTGATAACGTGAAAGCCAAGATCCAGGACAAAGAGGGTATCCCACCGGACCAGCAGAGATTGATCTTCGCCGGAAAGCAACTTGAAGATGGAAGAACTTTGGCTGACTACAACATTCAGAAGGAGTCCACACTTCACTTGGTCTTGCGTCTGCGTGGAGGTATGCAGATCTTCGTGAAGACTCTCACCGGAAAGACTATCACTTTGGAGGTAGAGAGCTCTGACACCATTGACAACGTGAAGGCCAAGATCCAGGATAAGGAAGGAATCCCTCCGGACCAGCAGAGGTTGATCTTTGCCGGAAAACAATTGGAGGATGGTCGTACTTTGGCGGATTACAACATCCAGAAGGAGTCGACCCTTCACTTGGTGTTGCGTCTGCGTGGAGGTATGCAGATCTTCGTCAAGACTTTGACCGGAAAGACCATCACCCTTGAAGTGGAAAGCTCCGACACCATTGACAACGTCAAGGCCAAGATCCAGGACAAGGAAGGTATTCCTCCGGACCAGCAGCGTCTCATCTTCGCTGGAAAGCAGCTTGAGGATGGACGTACTTTGGCCGACTACAACATCCAGAAGGAGTCTACTCTTCACTTGGTCCTGCGTCTTCGTGGTGGTTTCTAA
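
A FASTA record is a header line starting with ">" followed by one or more sequence lines. The sketch below, which is illustrative rather than a replacement for the lab's tools, parses such text and translates a coding sequence with the standard genetic code. The helper names are hypothetical, and the example uses only the first 33 nucleotides of the AT4G05320 record above.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_fasta(text: str) -> dict:
    """Return {header: sequence}, joining multi-line sequences."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {header: "".join(parts) for header, parts in records.items()}


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    example = ">AT4G05320\nATGCAGATCTTTGTTAAGACTCTCACCGGAAAG\n"
    records = parse_fasta(example)
    print(translate(records["AT4G05320"]))  # MQIFVKTLTGK
```

The translated fragment matches the start of the protein sequence shown earlier in the lecture (MQIFVKTLTGK...).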

Lab 1 - BLAST

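E value: the expectation value, i.e. the number of hits with a score at least as good as the observed one that are expected to occur by chance in a database of this size. It is not itself a probability. The lower the E value, the more significant the hit.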
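
As a rough sketch of how an E value relates to an alignment score, the Karlin-Altschul statistics used by BLAST give E = m * n * 2^(-S'), where S' is the bit score and m and n are the query and database lengths. The function names are illustrative, and the sketch assumes raw lengths rather than the length-corrected "effective" search space that BLAST actually reports.

```python
import math


def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least bit_score."""
    return query_len * db_len * 2.0 ** (-bit_score)


def p_value(e_value: float) -> float:
    """Probability of at least one such chance hit (Poisson assumption)."""
    return -math.expm1(-e_value)


if __name__ == "__main__":
    # Hypothetical search: 300-residue query against 10^9 database residues.
    e = expect_value(bit_score=60.0, query_len=300, db_len=10**9)
    print(e, p_value(e))
```

Each additional bit of score halves the E value, and for small E the E value and the probability are nearly equal.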

Lab 1 - Domain Search

Lab 1 - Structure Visualization

PyMOL

Lab 1 - Phylogenetics

http://www.megasoftware.net

UPGMA (Unweighted Pair Group Method with Arithmetic Mean)

Maximum likelihood

Maximum parsimony

Neighbor joining

MrBayes: Bayesian Inference of Phylogeny
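
Of the methods listed, UPGMA is the simplest to sketch: repeatedly merge the two closest clusters and set the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. The version below is a minimal illustration that returns only the tree topology as a Newick-style string without branch lengths. It is not how MEGA implements the method, and the function name and toy distances are hypothetical.

```python
def upgma(names, dist):
    """names: list of labels; dist: {frozenset({a, b}): distance} for all pairs."""
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in clusters for y in clusters if x < y),
            key=lambda pair: d[frozenset(pair)],
        )
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Size-weighted average distance from the new cluster to the others.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


if __name__ == "__main__":
    # Hypothetical pairwise distances between four sequences.
    toy = {
        frozenset(("A", "B")): 2.0,
        frozenset(("A", "C")): 6.0,
        frozenset(("B", "C")): 6.0,
        frozenset(("A", "D")): 10.0,
        frozenset(("B", "D")): 10.0,
        frozenset(("C", "D")): 10.0,
    }
    print(upgma(["A", "B", "C", "D"], toy))  # (((A,B),C),D)
```

UPGMA assumes a constant rate of change along all lineages, which is one reason the other listed methods are often preferred for real data.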