Authors: Phan, V., Saha, S., Pandey, A., Wong, T-Y. Published in: Intl. Journal of Data Mining and Bioinformatics, Vol. 4, No. 4, 2010. Presented by: Khaled Monsoor, Bioinformatics Masters Program, The University of Memphis. Mail: [email protected]. Date: Nov 05, 2010

Maximizing hidden stop codon on gene design


DESCRIPTION

Khaled Monsoor's presentation on the paper "Maximizing hidden stop codon on gene design"


Page 1: Maximizing hidden stop codon on gene design

Authors: Phan, V., Saha, S., Pandey, A., Wong, T-Y

Published in: Intl. Journal of Data Mining and Bioinformatics

Vol. 4, No. 4, 2010

Presented by:

Khaled Monsoor

Bioinformatics Masters Program

The University of Memphis

Mail: [email protected]

Date: Nov 05, 2010

Page 2: Maximizing hidden stop codon on gene design

What ?

Why ?

How ?

Result ?

Conclusion

Synthetic gene design with a large number of hidden stops

Page 3: Maximizing hidden stop codon on gene design
Page 4: Maximizing hidden stop codon on gene design

Like him …

Sleeping is a waste of precious time

Page 5: Maximizing hidden stop codon on gene design

What are hidden stops in genes?

Can we “redesign” genes to include more hidden stops?

How can clever computer algorithms help us?

Page 6: Maximizing hidden stop codon on gene design

What ?

Why ?

How ?

Result ?

Conclusion

Synthetic gene design with a large number of hidden stops

Page 7: Maximizing hidden stop codon on gene design

It is now feasible to construct artificial genomes.

Researchers at the J. Craig Venter Institute chemically synthesized the complete genome of Mycoplasma genitalium (reported in 2008) and, in 2010, built a bacterial cell controlled by a synthetic Mycoplasma mycoides genome

…. To increase efficiency of protein synthesis in ‘designed’ genes ?

How to increase efficiency …

Hidden stops can protect against frame shifts by terminating the shifted translation early

Without hidden stops, frame shifts can produce very long non-functional proteins

Page 8: Maximizing hidden stop codon on gene design

Dictates what a protein is composed of

Has evolved through millions of years

A protein is a sequence of amino acids

Built from 20 (twenty) different amino acids


Page 9: Maximizing hidden stop codon on gene design
Page 10: Maximizing hidden stop codon on gene design

mRNA:

ATGTCCAAACCT

Protein:

M S K P
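The codon-by-codon reading on this slide can be sketched in a few lines. This is only an illustration, assuming the standard genetic code written in DNA letters (T instead of U); the names `CODON_TABLE` and `translate` are made up for this example.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a sequence three letters at a time, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate("ATGTCCAAACCT"))  # MSKP
```

Here AAA reads as K (lysine), so the four codons give M, S, K, P.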


Page 11: Maximizing hidden stop codon on gene design


CCT, CCC, CCA, CCG all represent P (Proline)

A mutation in the 3rd position of these codons does not change the amino acid

Page 12: Maximizing hidden stop codon on gene design

A deletion creates a frame shift, which changes the content of the entire downstream sequence

RNA: … CAT.CAT.CAT.CAT …

Protein: … HHHH … (chain of histidine)

Deletion of the 3rd character (T): CAC.ATC.ATC.AT

Protein: HII

... a completely different protein!
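The deletion example can be replayed as a small sketch, again assuming the standard genetic code in DNA letters; `translate` is an illustrative helper, not something from the paper.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate three letters at a time, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


original = "CATCATCATCAT"
shifted = original[:2] + original[3:]  # delete the 3rd character (T)

print(translate(original))  # HHHH
print(translate(shifted))   # HII
```

Every codon after the deletion point is read from a shifted position, which is why a single missing letter changes the rest of the protein.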


Page 13: Maximizing hidden stop codon on gene design

:-(

Page 14: Maximizing hidden stop codon on gene design

(start) (codon)^k (stop)

Start – ATG

Stop – TAA, TAG, TGA

Codon – any triplet not equal to TAA, TAG, or TGA

Example: ATG.ACC.AAT.CGG.TAA

The TGA spanning ATG.ACC (A-TGA-CC) is a stop codon, but hidden: it is out of frame
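A minimal sketch of the gene pattern and of finding hidden stops, assuming a "hidden stop" means any TAA, TAG, or TGA triplet that starts at a position outside the main reading frame. The function names are illustrative only.

```python
STOPS = {"TAA", "TAG", "TGA"}


def is_gene(dna):
    """Check the pattern (start) (codon)^k (stop) with no in-frame stop in between."""
    if len(dna) % 3 != 0 or len(dna) < 6:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return (codons[0] == "ATG"
            and codons[-1] in STOPS
            and not any(c in STOPS for c in codons[1:-1]))


def hidden_stops(dna):
    """List (position, triplet) for stop triplets in the +1 and +2 reading frames."""
    return [(i, dna[i:i + 3])
            for i in range(len(dna) - 2)
            if i % 3 != 0 and dna[i:i + 3] in STOPS]


gene = "ATG.ACC.AAT.CGG.TAA".replace(".", "")
print(is_gene(gene))       # True
print(hidden_stops(gene))  # [(1, 'TGA')]
```

Position 1 (counting from 0) is the T of ATG, so the hidden TGA straddles the first two codons.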

Page 15: Maximizing hidden stop codon on gene design

Hidden stops can protect against frame shifts by terminating the resulting translation early

Without hidden stops, frame shifts can produce very long non-functional proteins, which not only waste time, amino acid resources (money), and ATP (energy) but may also produce a deadly toxin


Ref: Seligmann and Pollock, DNA and Cell Biology, 2004

Page 16: Maximizing hidden stop codon on gene design

What ?

Why ?

How ?

Result ?

Conclusion

Synthetic gene design with a large number of hidden stops

Page 17: Maximizing hidden stop codon on gene design

• Design genes with the maximum number of hidden stops

• Constraints:

1. None,

2. matching GC content, and

3. matching codon usage


Page 18: Maximizing hidden stop codon on gene design


Page 19: Maximizing hidden stop codon on gene design

Consider the protein MSDSKED

Both sequences encode this protein:

1. ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA

2. ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA

Sequence (1) is better! It has 4 hidden stops, while sequence (2) has none!
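The comparison can be checked by counting stop triplets in each shifted reading frame. This is a small sketch; `hidden_stops_by_frame` is a name invented for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}


def hidden_stops_by_frame(dna):
    """Count stop triplets in the +1 and +2 frames (frame 0 is the coding frame)."""
    counts = {1: 0, 2: 0}
    for i in range(len(dna) - 2):
        if i % 3 != 0 and dna[i:i + 3] in STOPS:
            counts[i % 3] += 1
    return counts


seq1 = "ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA".replace(".", "")
seq2 = "ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA".replace(".", "")

print(hidden_stops_by_frame(seq1))  # {1: 1, 2: 3}
print(hidden_stops_by_frame(seq2))  # {1: 0, 2: 0}
```

In sequence (1) the choice of AGT for serine produces TGA, TAG, and TAA across codon boundaries; the TCC and TCG choices in sequence (2) produce none.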


Page 20: Maximizing hidden stop codon on gene design

Goal:

• Given a protein, design a DNA sequence that encodes the protein with the maximum number of hidden stops


Page 21: Maximizing hidden stop codon on gene design

Idea:

Optimal design of whole sequence is based on optimal design of partial sequences

$H(i, j)$ = the maximum number of hidden stops in a design of the first $i$ amino acids in which the $i$th amino acid, $A_i$, is coded by its $j$th codon


Page 22: Maximizing hidden stop codon on gene design

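This formula can be computed recursively (in linear time, $O(n)$, where $n$ is the number of amino acids)

$H(i, j) = \max_k \{ H(i-1, k) + I_{kj} \}$

The maximum is taken over all codons $k$ coding the previous amino acid, $A_{i-1}$

$I_{kj} = 1$ if joining the $k$th codon of $A_{i-1}$ to the $j$th codon of $A_i$ creates a hidden stop codon, and $0$ otherwise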
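The recurrence can be sketched as a small dynamic program. This is an illustrative reading of the slide, not the authors' code: it assumes the standard genetic code, takes $H(1, j) = 0$ as the base case, and lets the program also choose the final stop codon. The names `CODONS`, `junction`, and `design` are invented here.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STOPS = {"TAA", "TAG", "TGA"}

# Amino acid (or '*' for stop) -> list of codons that encode it.
CODONS = {}
for triplet, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(triplet))


def junction(prev, cur):
    """I_kj: 1 if joining codons prev+cur creates an out-of-frame stop, else 0."""
    six = prev + cur
    return int(six[1:4] in STOPS or six[2:5] in STOPS)


def design(protein):
    """Return (maximum number of hidden stops, DNA sequence) for a protein."""
    residues = protein + "*"  # assumption: the final stop codon is chosen too
    H = [{codon: 0 for codon in CODONS[residues[0]]}]  # base case H(1, j) = 0
    back = []
    for aa in residues[1:]:
        row, ptr = {}, {}
        for cur in CODONS[aa]:
            prev = max(H[-1], key=lambda k: H[-1][k] + junction(k, cur))
            row[cur] = H[-1][prev] + junction(prev, cur)
            ptr[cur] = prev
        H.append(row)
        back.append(ptr)
    # Trace back from the best final codon to recover one optimal sequence.
    codon = max(H[-1], key=H[-1].get)
    score, chosen = H[-1][codon], [codon]
    for ptr in reversed(back):
        codon = ptr[codon]
        chosen.append(codon)
    return score, "".join(reversed(chosen))


score, dna = design("MSDSKED")
print(score, dna)  # the score is 4 for this protein
```

Each amino acid has at most six codons, so every step compares a bounded number of codon pairs, which is what keeps the running time linear in the protein length.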


Page 23: Maximizing hidden stop codon on gene design

Protein → DNA: this is a 1-to-many mapping

Back translation should:

1. Satisfy constraints imposed by host genomes,

2. Serve specific design purpose
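To get a feel for how large the 1-to-many mapping is, the following sketch counts the DNA sequences that encode a given protein under the standard genetic code (the final stop codon is not counted). `n_backtranslations` is a name made up for this illustration.

```python
from collections import Counter
from math import prod

# Standard genetic code in T, C, A, G codon order; counting letters gives the
# number of synonymous codons per amino acid.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
DEGENERACY = Counter(AMINO)


def n_backtranslations(protein):
    """Number of distinct coding sequences for the protein."""
    return prod(DEGENERACY[aa] for aa in protein)


print(n_backtranslations("MSDSKED"))  # 576
```

Even this seven-residue protein has 1 x 6 x 2 x 6 x 2 x 2 x 2 = 576 possible encodings, so a design criterion is needed to pick one.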


Page 24: Maximizing hidden stop codon on gene design
Page 25: Maximizing hidden stop codon on gene design

GC content = proportion of G and C bases in the sequence

GC content relates to the stability of DNA

Algorithm’s objectives: 1. maximize number of hidden stops, 2. then, match GC content of host genome
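As a rough sketch of the two-level objective, the code below computes GC content and ranks candidate sequences first by hidden stops and then by distance to a target GC content. The target value of 0.5 is a hypothetical placeholder, and the ranking is only one simple way to express "maximize, then match"; the slides do not say how the paper's algorithm does it.

```python
STOPS = {"TAA", "TAG", "TGA"}


def gc_content(dna):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in dna) / len(dna)


def hidden_stops(dna):
    """Number of stop triplets in the +1 and +2 reading frames."""
    return sum(i % 3 != 0 and dna[i:i + 3] in STOPS for i in range(len(dna) - 2))


def rank_key(dna, target_gc):
    """Sort key: more hidden stops first, then GC content closest to the target."""
    return (-hidden_stops(dna), abs(gc_content(dna) - target_gc))


seq1 = "ATGAGTGATAGTAAAGAAGACTAA"
seq2 = "ATGTCCGATTCGAAAGAAGACTAA"
TARGET_GC = 0.5  # hypothetical host GC content, for illustration only

print(round(gc_content(seq1), 3))  # 0.292
print(gc_content(seq2))            # 0.375
best = min([seq2, seq1], key=lambda s: rank_key(s, TARGET_GC))
print(best == seq1)                # True
```

Sequence 2 is closer to the target GC content, but sequence 1 still ranks first because hidden stops take priority.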


Page 26: Maximizing hidden stop codon on gene design
Page 27: Maximizing hidden stop codon on gene design

Algorithm:

Construct the sequence with maximum number of hidden stops

“Fit” this sequence to the required codon usage

Result:

Cannot both achieve the maximum number of hidden stops and match the codon usage

Still “better” than wild-type genes


Page 28: Maximizing hidden stop codon on gene design

For leucine, the codon CUG is used 51% of the time in E. coli.
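Codon usage of this kind can be measured as the share of each codon among all codons for the same amino acid. The sketch below assumes the standard genetic code in DNA letters (so CUG appears as CTG) and uses a toy four-codon sequence rather than real E. coli data; `codon_usage` is an illustrative name.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(dna):
    """Relative usage of each codon among the codons of the same amino acid."""
    usable = len(dna) - len(dna) % 3
    counts = Counter(dna[i:i + 3] for i in range(0, usable, 3))
    per_amino = Counter()
    for codon, n in counts.items():
        per_amino[CODON_TABLE[codon]] += n
    return {codon: n / per_amino[CODON_TABLE[codon]] for codon, n in counts.items()}


# Toy sequence: four leucine codons, three of them CTG.
print(codon_usage("CTGCTGTTACTG"))  # {'CTG': 0.75, 'TTA': 0.25}
```

A design that matches codon usage would aim for per-codon shares close to those measured on the host's own genes.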

Page 29: Maximizing hidden stop codon on gene design
Page 30: Maximizing hidden stop codon on gene design

What ?

Why ?

How ?

Result ?

Conclusion

Synthetic gene design with a large number of hidden stops

Page 31: Maximizing hidden stop codon on gene design

1. “Wild type” (genes from NCBI)

2. Random gene (constrained by codon usage of the “wild type”)

3. “Optimal” – design with no constraint (maximum number of hidden stop codons)

4. Constrained by GC content of wild type

5. Constrained by codon usage of wild type


Page 32: Maximizing hidden stop codon on gene design


Page 33: Maximizing hidden stop codon on gene design

[Figure; y-axis: Number of hidden stop codons]

Page 34: Maximizing hidden stop codon on gene design

What ?

Why ?

How ?

Result ?

Conclusion

Page 35: Maximizing hidden stop codon on gene design

While maintaining the GC content and codon usage of the wild types, the algorithms can propose genes with approximately 10% more hidden stops

With both constraints maintained, the distribution graphs of the ‘wild-type’ and ‘designed’ genes keep a similar shape, with a Pearson correlation of 98%

Page 36: Maximizing hidden stop codon on gene design

As a lagging grad student, I’ll try my best to answer

Page 37: Maximizing hidden stop codon on gene design

Thank you for attending his boring presentation … oh