Finding detailed relationships between proteins specific to phenotypes among microbial organisms Daniel Park Molecular Biology Institute, UCLA Yeates lab SoCalBSI August 24, 2006

Finding detailed relationships between proteins specific to phenotypes among microbial organisms

Daniel Park

Molecular Biology Institute, UCLA

Yeates lab

SoCalBSI

August 24, 2006

OUTLINE

• Phylogenetic profiles

• Ternary logic analysis

• Building COG & phenotype profiles

• Results of logic analysis

PHYLOGENETIC PROFILES

• Turning an earlier question on its side:

• From, “What proteins are found in a genome?”

• To, “What genomes contain a given protein?”
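The change of viewpoint above can be sketched as a simple inversion of a genome-to-proteins mapping. This is only an illustration: the function name `invert_to_profiles` and the toy genome and family names are hypothetical and do not come from the talk.

```python
from collections import defaultdict


def invert_to_profiles(genome_to_families):
    """Map each protein family to the set of genomes that contain it."""
    family_to_genomes = defaultdict(set)
    for genome, families in genome_to_families.items():
        for family in families:
            family_to_genomes[family].add(genome)
    return dict(family_to_genomes)


# Hypothetical toy data: which protein families occur in each genome.
genomes = {
    "genome_1": {"famA", "famB"},
    "genome_2": {"famB"},
    "genome_3": {"famA", "famB", "famC"},
}

# The phylogenetic profile of a family is the set of genomes containing it.
profiles = invert_to_profiles(genomes)
for family in sorted(profiles):
    print(family, sorted(profiles[family]))
```

Families whose profiles look alike across many genomes are the candidates for a functional relationship.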

VARIATIONS OF PHYLOGENETIC PROFILES

• Relationships between protein families

• Relationships between protein family profile and given target ‘phenotype’ profile

OUTLINE

• Phylogenetic profiles

• Ternary logic analysis

• Building COG & phenotype profiles

• Results of logic analysis

COMPLEXITY OF CELLULAR PROCESSES

HIGHER ORDER RELATIONSHIPS: TERNARY LOGIC ANALYSIS

8 LOGIC TYPES FOR PHYLOGENETIC PROFILE TRIPLETS

MEASURING MUTUAL INFORMATION BETWEEN TWO PROFILES

$$U(x \mid y) = \frac{H(x) + H(y) - H(x, y)}{H(x)}$$

where U is the uncertainty coefficient relating profiles x and y, and H is the Shannon entropy of the corresponding probability distribution.

Range of U: [0, 1]. Ex.: U = 0.88 means an 88% decrease in uncertainty.

A high value of U indicates high mutual information between x and y.
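As a minimal sketch, the uncertainty coefficient U(x|y) = [H(x) + H(y) - H(x,y)] / H(x) can be computed from two equal-length presence/absence profiles as follows. The function names are illustrative, entropies are estimated from observed frequencies, and returning 0 for a constant profile x is a convention assumed here, not something stated in the talk.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def uncertainty_coefficient(x, y):
    """U(x|y) = [H(x) + H(y) - H(x, y)] / H(x) for equal-length profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    h_x = entropy(x)
    if h_x == 0:
        # Assumed convention: a constant profile has no uncertainty to reduce.
        return 0.0
    h_y = entropy(y)
    h_xy = entropy(list(zip(x, y)))  # joint entropy over paired observations
    return (h_x + h_y - h_xy) / h_x


# Hypothetical toy profiles over four genomes (1 = present, 0 = absent).
same = uncertainty_coefficient([1, 0, 1, 0], [1, 0, 1, 0])
unrelated = uncertainty_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
print(same, unrelated)
```

Identical profiles give U = 1, and profiles that carry no information about each other give U = 0.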

MEASURING MUTUAL INFORMATION AMONG THREE PROFILES

U(c | f(a,b)), where f(a,b) is the logical combination of a and b

Constraints (x and y here are cutoff values, not profiles):

U(c|a) < x

U(c|b) < x

U(c|f(a,b)) > y
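The three constraints can be sketched as a filter over candidate logic functions. Everything named here is an assumption made for illustration: the cutoffs `low=0.3` and `high=0.6` are placeholder values, and `LOGIC_FUNCTIONS` lists only four example combinations, not the eight logic types used in the talk.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((k / n) * math.log2(k / n) for k in Counter(symbols).values())


def uncertainty_coefficient(x, y):
    """U(x|y) = [H(x) + H(y) - H(x, y)] / H(x); 0 if x is constant (assumed)."""
    h_x = entropy(x)
    if h_x == 0:
        return 0.0
    return (h_x + entropy(y) - entropy(list(zip(x, y)))) / h_x


# Illustrative subset of two-input logic functions (not the talk's full list).
LOGIC_FUNCTIONS = {
    "and(a, b)": lambda a, b: a and b,
    "or(a, b)": lambda a, b: a or b,
    "xor(a, b)": lambda a, b: a != b,
    "and(a, !b)": lambda a, b: a and not b,
}


def find_logic_relations(a, b, c, low, high):
    """Return (name, U(c|f(a,b))) for functions that satisfy all constraints."""
    # Neither a nor b alone may explain c well.
    if uncertainty_coefficient(c, a) >= low or uncertainty_coefficient(c, b) >= low:
        return []
    hits = []
    for name, f in LOGIC_FUNCTIONS.items():
        combined = [int(f(ai, bi)) for ai, bi in zip(a, b)]
        u = uncertainty_coefficient(c, combined)
        if u > high:  # the combination must explain c well
            hits.append((name, u))
    return hits


# Hypothetical toy profiles: c is present exactly when one of a, b is present.
a = [1, 1, 0, 0]
b = [1, 0, 1, 0]
c = [0, 1, 1, 0]
hits = find_logic_relations(a, b, c, low=0.3, high=0.6)
print(hits)
```

Checking the single-profile constraints first skips the loop over logic functions for triplets that a pairwise relationship already explains.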

OUTLINE

• Phylogenetic profiles

• Ternary logic analysis

• Building COG & phenotype profiles

• Results of logic analysis

COGs: CLUSTERS OF ORTHOLOGOUS GROUPS

Set of orthologous proteins from at least three different lineages

Cluster Functional group

COMBINATIONS OF COG PROFILES MATCHING A PHENOTYPE

ASSOCIATING MORE GENOMES WITH COGS

[Figure: bar chart, "No. of fully sequenced bacterial genomes over the last 9 years". x-axis: Years (1997, 2003, 2006). y-axis: No. of bacterial genomes (0 to 400). Recovered bar labels: 66, 354.]

BUILDING COG PROFILES

• 81,480 proteins

• 354 bacterial genomes

• 4,613 COGs
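One way to picture the profile-building step is to turn protein-to-COG assignments into one presence/absence vector per COG, with one position per genome. The record layout `(protein_id, genome, cog_id)`, the function name, and all identifiers in the toy data are hypothetical. The talk does not describe the actual pipeline.

```python
def build_cog_profiles(assignments, genomes):
    """Return {cog_id: [0/1, ...]} with one entry per genome, in `genomes` order.

    `assignments` is an iterable of (protein_id, genome, cog_id) records.
    """
    index = {genome: i for i, genome in enumerate(genomes)}
    profiles = {}
    for _protein_id, genome, cog_id in assignments:
        row = profiles.setdefault(cog_id, [0] * len(genomes))
        row[index[genome]] = 1  # at least one member of the COG is present
    return profiles


# Hypothetical toy input: three genomes, four classified proteins.
genome_order = ["genome_1", "genome_2", "genome_3"]
records = [
    ("prot_001", "genome_1", "COG_X"),
    ("prot_002", "genome_3", "COG_X"),
    ("prot_003", "genome_2", "COG_Y"),
    ("prot_004", "genome_2", "COG_Y"),  # a second member in the same genome
]
cog_profiles = build_cog_profiles(records, genome_order)
print(cog_profiles)
```

Applied to the numbers on the slide, this would give a matrix of 4,613 rows (COGs) by 354 columns (genomes).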

BUILDING PHENOTYPE PROFILES

http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi

OUTLINE

• Phylogenetic profiles

• Ternary logic analysis

• Building COG & phenotype profiles

• Results of logic analysis

Cumulative no. of protein triplets recovered at an uncertainty coefficient score greater than a given threshold

Frequency for each of the eight logic function types observed

CORRELATIONS WITH PHENOTYPES: TEMPERATURE RANGE

• For U > 0.8, one relationship between proteins was found:

Hyperthermophilicity = and( COG0432, !COG0225 )

U ( Hyp. | COG0432 ) = 0.26

U ( Hyp. | COG0225 ) = 0.29

U ( Hyp. | and( COG0432, !COG0225 ) ) = 0.71

[S] COG0432: Uncharacterized conserved protein

[O] COG0225: Peptide methionine sulfoxide reductase

LOGICAL COMBINATION OF COG PROFILES MATCHING A PHENOTYPE PROFILE

c = hyperthermophilicity

f = and( COG0432, !COG0225 )

a = COG0432 (Uncharacterized conserved protein)

b = !COG0225 (Peptide methionine sulfoxide reductase)

CONCLUSIONS

• There may be a correlation between the absence of methionine sulfoxide reductase and the presence of an uncharacterized conserved protein in hyperthermophiles.

CONCLUSIONS

– Classified ~80,000 proteins from 354 bacterial genomes into ~4,600 COGs

– Built COG and phenotype profile matrices for 354 fully sequenced bacterial genomes

– Found support for the biological significance of ternary relationships among COGs

– Support that some logic types are seen in biology more than others: 1 (and)

57 (xor)

FUTURE DIRECTIONS

• Build a richer database of phenotype profiles

• Investigate relationships at lower cutoffs

• Experimentally characterize the unknown COG0432 by crystallography

ACKNOWLEDGEMENTS

Todd Yeates

Matteo Pellegrini

Yeates lab

Morgan Beeby

Brian O’Connor

Rest of the lab

SoCalBSI 2006

Jamil Momand

Wendie Johnston

Sandra Sharp

Nancy Warter-Perez

Ronnie Cheng

Fellow participants