DNA Sequence Analysis
Broad and Long-Term Objective
To characterize a single clone from an Emiliania huxleyi cDNA library using sequence analysis
Research Plan
Preparation of Competent Cells and Bacterial Transformation
Growth of Transformant and Plasmid MiniPrep
Cycle Sequencing
Sequence analysis
Today’s Laboratory Objectives
To learn how to characterize a DNA sequence using various web-based bioinformatics tools, including:
1. BLASTN: Has this piece of DNA been sequenced before? Does it look like anything already in GenBank at the nucleotide level?
2. BLASTX: Can we identify the putative function of the transcripts?
3. ORF Finder: What does the open reading frame look like? Do we have a full-length clone with an identifiable start and stop codon?
4. ClustalW: How does it compare with other sequences at either the nucleotide or amino acid level? What residues are conserved and thus likely to be important? And what residues are divergent?
BLAST Database Search Tool
BLAST (Basic Local Alignment Search Tool)
Available on the internet and downloadable
Quick and simple
http://www.ncbi.nlm.nih.gov/
The BLAST Family
Program | Query Sequence | Database Target | Notes
BLASTN | Nucleotide (both strands) | Nucleotide database | Optimized for speed, not accuracy; not good for distant homologues; DUST option (low complexity)
BLASTX | Nucleotide, translated in 6 frames | Protein database | Less sensitive to sequence errors and mismatches; useful for preliminary data/ESTs; DUST filter option
TBLASTX | Nucleotide, translated in 6 frames | Nucleotide database, translated in 6 frames | Good for ESTs and single-pass sequences; very slow
BLASTP | Protein | Protein database |
TBLASTN | Protein | Nucleotide database, translated in 6 frames | Proteins against nucleotides and ESTs
The BLAST Algorithm
Identify HSPs (High-scoring Segment Pairs)
Default word size: 11 bp or 3 aa
Perfect match
Slide the query and target sequences across each other until the maximum number of HSPs for that target is found
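The word-matching step can be sketched in a few lines of Python. This is only an illustration of finding perfect word matches between two nucleotide sequences, not NCBI's implementation; the function name find_seeds and the example sequences are made up for this sketch.

```python
from collections import defaultdict

def find_seeds(query, target, word_size=11):
    """Return (query_pos, target_pos) pairs where a word of length
    word_size matches perfectly between query and target."""
    # Index every word in the target by its start position.
    index = defaultdict(list)
    for j in range(len(target) - word_size + 1):
        index[target[j:j + word_size]].append(j)
    # Look up each query word in the index.
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds

# Hypothetical sequences that share the 12-base stretch ATGGCTAGCTAG.
query = "GGATCCATGGCTAGCTAGGA"
target = "TTTTATGGCTAGCTAGCCCC"
print(find_seeds(query, target))
```

Both seeds in this example have the same offset between query and target position, which is what lets neighboring word hits be merged into one longer segment.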
The BLAST Algorithm (continued)
Score the alignment
A scoring matrix such as BLOSUM62 or PAM is used
Gaps introduced between HSPs during sliding get a negative score
A match gets a positive score
The total alignment score is subjected to statistical analysis to calculate the significance of the score vs. chance
Repeat for every sequence in the target database
Return total results
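As a rough illustration of the scoring step, the sketch below adds up a score over the columns of an already-gapped pairwise alignment. The match, mismatch, and gap values are placeholders chosen for this example; a real protein search would look up each residue pair in a matrix such as BLOSUM62 instead.

```python
def score_alignment(aligned_query, aligned_target, match=1, mismatch=-1, gap=-2):
    """Sum a simple score over the columns of a gapped pairwise alignment.

    The default scores are illustrative assumptions, not BLAST defaults.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for q, t in zip(aligned_query, aligned_target):
        if q == "-" or t == "-":
            score += gap        # a gap in either sequence is penalized
        elif q == t:
            score += match      # identical residues score positively
        else:
            score += mismatch   # differing residues score negatively
    return score

# Seven matches, one mismatch, one gap: 7 - 1 - 2 = 4
print(score_alignment("ATGGC-TAG", "ATGACGTAG"))
```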
Paste Sequence here
Submit Search by Clicking Here
Execute Search by Clicking Format
BLASTX Results
Interpreting BLAST Results
•Length
•E-Value
•Bit Score
•Identities
•Positives
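To make the relationship between these values concrete, here is a small sketch. It assumes the standard relation between bit score S' and expected number of chance hits, E = m × n × 2^(−S'), where m is the query length and n is the database length; real BLAST uses edge-corrected "effective" lengths, so reported E-values will differ somewhat. The function names are invented for this example.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Simplification: uses raw lengths rather than BLAST's effective lengths.
    """
    return query_length * database_length * 2.0 ** (-bit_score)

def percent(count, alignment_length):
    """Express an Identities or Positives count as a percentage of Length."""
    return 100.0 * count / alignment_length

# A lower E-value means the hit is less likely to have arisen by chance.
print(expect_value(bit_score=10, query_length=4, database_length=256))
print(percent(45, 60))
```

Each additional bit of score halves the E-value, which is why bit scores can be compared across searches while E-values depend on the size of the database searched.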
NCBI’s ORF Finder and Open Reading Frames
Begin with an “ATG” start codon
End with a “TAA”, “TAG”, or “TGA” stop codon
Can occur in any of the six possible reading frames
Sense Strand: Frame +1 Frame +2 Frame +3
Antisense Strand: Frame -1 Frame -2 Frame -3
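The six reading frames can be illustrated with a short Python sketch that splits a sequence into codons for each frame. The helper names and the example sequence are made up for illustration; minus frames are read from the reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the antisense strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def six_frames(seq):
    """Map frame names (+1..+3, -1..-3) to their lists of complete codons."""
    seq = seq.upper()
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = [strand[i:i + 3]
                      for i in range(offset, len(strand) - 2, 3)]
            frames[f"{sign}{offset + 1}"] = codons
    return frames

for name, codons in six_frames("ATGAAATAG").items():
    print(name, " ".join(codons))
```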
ORF Finder Algorithm
Iterates over all frames:
Iterates to the end of the frame
Finds the first/next start codon
Continues to the next stop codon
Records the size and location of the ORF
Lists ORFs sorted by length in descending order
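The steps above can be sketched as follows. This is a minimal illustration of the procedure as listed, not NCBI's actual code; the function name find_orfs, the min_length parameter, and the example sequence are assumptions for this sketch. Positions are 0-based on the strand being read, so minus-frame positions refer to the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def find_orfs(seq, min_length=0):
    """Find ATG...stop ORFs in all six frames, longest first."""
    seq = seq.upper()
    strands = (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1]))
    orfs = []
    for sign, strand in strands:
        for offset in range(3):                      # iterate over all frames
            start = None
            for i in range(offset, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                        # first/next start codon
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start           # includes the stop codon
                    if length >= min_length:
                        orfs.append({"frame": f"{sign}{offset + 1}",
                                     "start": start,
                                     "end": i + 3,
                                     "length": length})
                    start = None                     # look for the next ORF
    return sorted(orfs, key=lambda orf: orf["length"], reverse=True)

# Hypothetical sequence with one ORF in frame +1 and one in frame +2.
for orf in find_orfs("ATGAAATTTCCCTAGCATGGGGTGA"):
    print(orf)
```

An ORF that runs off the end of the sequence without reaching a stop codon is not reported by this sketch, which is one way a partial (not full-length) clone would show up.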
www.ncbi.nlm.nih.gov/gorf/gorf.html
Graphical View
ORF Table
Minimum ORF Length: Can Redraw with lower cut-off
Clickable
Submit for BLAST
Selected ORF
ORF Length
ORF Translation
Multiple Sequence Alignment with ClustalW
Homologous residues in a set of sequences are aligned together in columns
Ideally, homology reflects structural and evolutionary conservation
The evolutionary history of a residue can be deduced from alignments of sequences from different organisms
http://www.ebi.ac.uk/clustalw/
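Once sequences are aligned in columns, conserved positions can be read off column by column. The sketch below takes sequences that are already aligned (with "-" for gaps) and marks fully conserved columns with "*", similar to the consensus line under a ClustalW alignment. It does not perform the alignment itself, and the three example sequences are invented.

```python
def conservation_line(aligned):
    """Return a string with '*' under columns where every sequence has
    the same residue, and a space elsewhere."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)

# Hypothetical pre-aligned protein fragments.
aligned = ["MKT-LV", "MKTALV", "MRT-LV"]
for seq in aligned:
    print(seq)
print(conservation_line(aligned))
```

Columns without a "*" are the divergent positions; ClustalW additionally marks columns with similar but non-identical residues, which this sketch leaves out.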
Alignment Editor
Pairwise Scores
Download file
Colored Alignment