The Cobweb of Life Revealed by Genome-Scale Estimates of Horizontal Gene Transfer
Fan Ge, Li-San Wang, Junhyong Kim
Mourya Vardhan
Outline
• Controversy: the extent to which HGT affects the core genealogical history
• Examination of this controversy by assessing the extent of HGT among core orthologous genes
• A novel statistical method to assess the extent of HGT based on comparisons of tree topologies
Introduction
• Horizontal gene transfer (HGT) refers to the transfer of genes between organisms in a manner other than traditional reproduction.
• Whole-genome analyses of different prokaryotes have been interpreted as indicating rampant HGT
• There is an ongoing debate over how frequent HGT is and how strongly it affects phylogeny
• Inference of HGT from tree comparisons should be done under a proper statistical framework
Methodology for Assessing the Extent of HGT
• A new method that explicitly tests whether phylogenetic incongruence is due to horizontal transfer or to statistical error in tree estimation
• Used Clusters of Orthologous Groups (COGs) from the NCBI database
• Extracted the most reliable COGs
• Built a gene tree for every COG and integrated them to construct a whole-genome (W-G) tree
• Compared each gene tree with the W-G tree to infer significant HGT
• Extended this method to pairwise comparisons among gene trees to detect conflicts
High-Quality Gene Groups and the W-G Tree
• The COG entries were re-evaluated by redoing the sequence comparisons over 43 genomes
• This retained 297 high-quality COG entries out of 3,852
• To approximate the W-G tree, the authors used a median tree estimator
• The estimate used bootstrap support values obtained by bootstrap resampling
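The slides do not say which median tree estimator was used. One well-known fact helps picture the idea: for trees on a common taxon set, the majority-rule consensus is a median tree under the symmetric-difference distance, because keeping a split lowers the total distance to the input trees only when more than half of them contain it. The sketch below assumes each gene tree is already reduced to a set of splits over the same taxa and lets bootstrap support enter as optional per-tree weights; that weighting and the name `majority_rule_splits` are hypothetical illustrations, not the authors' estimator.

```python
from collections import Counter


def majority_rule_splits(split_sets, weights=None):
    """Return the splits carried by more than half of the (weighted) input trees.

    Assumptions (hypothetical, for illustration): every tree is a set of
    normalized splits (frozensets) over one common taxon set, and weights are
    non-negative. Under these assumptions the result is a median tree for the
    weighted symmetric-difference distance.
    """
    if weights is None:
        weights = [1.0] * len(split_sets)
    total = sum(weights)
    support = Counter()
    for splits, weight in zip(split_sets, weights):
        for split in splits:
            support[split] += weight
    # Keeping a split lowers the total distance only if its support exceeds half the weight.
    return {split for split, w in support.items() if w > total / 2}


# Hypothetical gene trees on taxa A to E, each written as its set of nontrivial
# splits (each split is stored as the side that does not contain taxon A).
gene_trees = [
    {frozenset("BC"), frozenset("DE")},
    {frozenset("BC"), frozenset("DE")},
    {frozenset("BD"), frozenset("CE")},
]
print(majority_rule_splits(gene_trees))
```

Splits with more than half of the total weight are always pairwise compatible, so the returned set always describes a valid (possibly unresolved) tree.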
Detection of HGT events
• HGT is inferred by comparing estimated gene trees against other gene trees, or against trees that represent the history of the genomes
• Discrepancies between trees may be caused by HGT or by other sources of error
• Distance metrics are used to quantify the discrepancies
• As an additional precaution, the paper explicitly tests whether the discrepancies are caused by HGT events
Comparison Metrics
• Maximum agreement subtree (MAST) [1]: when two trees differ only in the placement of some branches, they still share a common subtree; MAST finds the largest such shared subtree, and the MAST distance is the number of taxa that must be pruned to reach it
• Symmetric difference (SD) [2]: the number of branches (splits) present in one tree but not the other
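To make the two metrics concrete, here is a small brute-force sketch for unrooted trees written as nested Python tuples (a hypothetical input format). Each tree is represented by its nontrivial splits; SD counts splits found in exactly one tree, and the MAST distance is taken as the number of shared taxa that must be deleted before the restricted trees have identical splits. The exhaustive subset search is exponential and only suitable for a handful of taxa; it is an illustration, not how PAUP or the authors computed the metrics.

```python
from itertools import combinations


def _clades(tree, acc):
    """Append the leaf set of every node to acc and return the leaf set of `tree`."""
    if isinstance(tree, str):  # a leaf is a taxon label
        clade = frozenset([tree])
    else:
        clade = frozenset().union(*(_clades(child, acc) for child in tree))
    acc.append(clade)
    return clade


def taxa_of(tree):
    return _clades(tree, [])


def splits(tree, taxa):
    """Nontrivial splits of the unrooted tree restricted to `taxa`.

    Each split is stored as the side without a fixed reference taxon,
    so identical splits compare equal."""
    acc = []
    _clades(tree, acc)
    keep = frozenset(taxa)
    ref = min(keep)
    result = set()
    for clade in acc:
        side = clade & keep
        if ref in side:
            side = keep - side
        if 2 <= len(side) <= len(keep) - 2:  # skip splits with a side of fewer than two taxa
            result.add(side)
    return result


def sd_distance(t1, t2):
    """Symmetric difference: number of splits present in exactly one tree."""
    shared = taxa_of(t1) & taxa_of(t2)
    return len(splits(t1, shared) ^ splits(t2, shared))


def mast_distance(t1, t2):
    """Fewest shared taxa to delete so the restricted trees are identical (brute force)."""
    shared = sorted(taxa_of(t1) & taxa_of(t2))
    n = len(shared)
    for size in range(n, 3, -1):
        for subset in combinations(shared, size):
            if splits(t1, subset) == splits(t2, subset):
                return n - size
    return max(n - 3, 0)  # any three taxa form the same unrooted tree


# Hypothetical example: taxon A moves from one end of a caterpillar tree to the other.
t1 = ((((((("A", "B"), "C"), "D"), "E"), "F"), "G"), "H")
t2 = ((((((("B", "C"), "D"), "E"), "F"), "G"), "A"), "H")
print(sd_distance(t1, t2), mast_distance(t1, t2))  # 10 1
```

Moving the single taxon A changes every internal split (SD of 10), yet deleting A alone restores agreement (MAST distance of 1): the large-SD, low-MAST pattern that the slides associate with HGT.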
Interpretation of HGT Events
• Case 1: If both MAST and SD are low, the trees are most likely not different
• Case 2: If both metrics are large, the difference can be due to either HGT events or errors
• Case 3: If SD is large but MAST is low, the difference is most likely due to an HGT event
• Case 4: A large MAST with a low SD cannot occur, for algorithmic reasons
Figure: the SD and MAST scores between Gene Tree 1 and the W-G tree are 2 and 2, while those between Gene Tree 2 and the W-G tree are 8 and 2.
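The four cases amount to a simple decision rule once the two distances are on a common scale. The sketch below assumes both distances have already been normalized to the range 0 to 1, and it uses a hypothetical `cutoff` to separate "low" from "large"; the slides do not specify any such threshold.

```python
def interpret(sd_norm, mast_norm, cutoff=0.3):
    """Map normalized SD and MAST distances onto the four cases.

    `cutoff` is a hypothetical threshold separating "low" from "large".
    """
    sd_large = sd_norm > cutoff
    mast_large = mast_norm > cutoff
    if not sd_large and not mast_large:
        return "case 1: trees most likely not different"
    if sd_large and mast_large:
        return "case 2: HGT or error"
    if sd_large:
        return "case 3: most likely HGT"
    return "case 4: not expected (large MAST, low SD)"


# Hypothetical normalized distances: SD 10 out of 10 splits, MAST distance 1 out of 8 taxa.
print(interpret(10 / 10, 1 / 8))
```

Because the verdict flips as `cutoff` moves, a rule like this is only a reading aid; it does not replace a statistical test.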
The Hypothesis Test
• Test statistic Γ: the difference between the two metrics
• The null distribution of Γ was generated by bootstrapping the gene trees
• HGT was inferred when the observed Γ was significant, with a p-value below the 5% level
• In simulation studies applied to each COG, the test detected HGT events in a COG tree at the following rates (5% significance level):
  Simulated HGT events   Detection rate (%)
  1                      53.8
  2                      70
  3                      77.3
• ds: the SD metric
• dm: the MAST metric
• m, n: the numbers of branch splits in the two trees
• X: the number of taxa
• The metrics were calculated with the PAUP software
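The slide's formula for Γ did not survive extraction; only its symbols did. A natural reading of those symbols, which is assumed here rather than taken from the paper, is that each distance is scaled by tree size: SD by m + n (the most splits two trees can disagree on) and the MAST distance by the number of taxa X, giving Γ = ds/(m + n) − dm/X. The sketch also assumes the null values are Γ scores between bootstrap replicates of a gene tree and the original gene tree estimate, and it uses a one-sided p-value with the usual +1 correction. The function names are hypothetical.

```python
def gamma_stat(ds, dm, m, n, x):
    """Assumed form of the statistic: normalized SD minus normalized MAST distance.

    ds, dm: SD and MAST distances; m, n: numbers of splits in the two trees;
    x: number of taxa. The exact formula on the slide is not recoverable.
    """
    return ds / (m + n) - dm / x


def bootstrap_p_value(observed, null_values):
    """One-sided p-value: share of null values at least as large as the observed one.

    The +1 terms count the observed value itself, so p is never exactly zero.
    """
    exceed = sum(1 for g in null_values if g >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Hypothetical inputs: an eight-taxon comparison with SD = 10, MAST distance = 1,
# and 99 placeholder null values standing in for bootstrap replicates.
observed = gamma_stat(ds=10, dm=1, m=5, n=5, x=8)
p = bootstrap_p_value(observed, [0.0] * 99)
print(observed, p, p < 0.05)  # 0.875 0.01 True
```

Under this reading a large positive Γ corresponds to Case 3 (large SD, low MAST), which is why the test is one-sided.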
HGT Estimation via Comparisons between Each Gene Tree and the W-G Tree
• The hypothesis test was applied to each COG
• The number of COGs detected with HGT was not strongly sensitive to the choice of p-value cutoff
• At the 5% level, 33 of 297 COGs (11.1%) showed putative HGT
• These COGs are termed hCOGs
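Counting flagged COGs at several cutoffs is one way to relate detections to the p-value. The sketch below takes one p-value per COG (placeholder input, not the paper's data) and reports the count and percentage below each cutoff; with 33 of 297 COGs below 0.05 it reproduces the 11.1% figure. The name `flagged_fraction` is hypothetical.

```python
def flagged_fraction(p_values, alpha=0.05):
    """Return (count, percent) of COGs whose p-value is below alpha."""
    hits = sum(1 for p in p_values if p < alpha)
    return hits, round(100 * hits / len(p_values), 1)


# Placeholder p-values (not the paper's data): 33 small ones among 297 COGs.
p_values = [0.001] * 33 + [0.5] * 264
for alpha in (0.01, 0.05, 0.10):
    print(alpha, flagged_fraction(p_values, alpha))
```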
Figure: The relationship between the p-value and the number of COG entries detected with HGT
HGT Estimation via Comparisons among Gene Trees
• A problem with comparing each gene tree against the W-G tree is that the results are sensitive to the W-G tree estimate
• COG entries do not all share the same taxa, so each pair of gene trees is compared on the taxa they share
• If a COG is an hCOG, it should test as significantly different in all of its comparisons
• 14,004 pairs of gene trees sharing at least six taxa were compared
• At the 5% level, 1,764 of 14,004 pairs (12.6%) were significant
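Because COG entries cover different taxa, each pairwise comparison can only use the taxa two gene trees share. The sketch below, with hypothetical COG names and taxon sets, enumerates the pairs that meet the six-shared-taxa requirement; each surviving pair would then be compared with both trees restricted to the shared taxa.

```python
from itertools import combinations


def comparable_pairs(cog_taxa, min_shared=6):
    """Yield (cog_a, cog_b, shared_taxa) for COG pairs sharing at least min_shared taxa."""
    for a, b in combinations(sorted(cog_taxa), 2):
        shared = cog_taxa[a] & cog_taxa[b]
        if len(shared) >= min_shared:
            yield a, b, shared


# Hypothetical COG entries mapped to the taxa they contain.
cog_taxa = {
    "cogA": set("ABCDEFG"),
    "cogB": set("ABCDEFH"),
    "cogC": set("ABCXYZ"),
}
for a, b, shared in comparable_pairs(cog_taxa):
    print(a, b, sorted(shared))  # cogA cogB ['A', 'B', 'C', 'D', 'E', 'F']
```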
Identification of Transferred Branches in Gene Trees
• For each COG that tested positive for HGT, transferred branches were found by exhaustive enumeration of possible subtree matches
• All combinations of branch prunings were searched to find the “troublesome” branches
• If only one pruning makes the trees congruent, the pruned branch is inferred to be an HGT event
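The pruning search can be sketched by brute force: try deleting 0, 1, 2 and so on taxa, and collect every smallest deletion set after which the two trees agree. If exactly one such set exists, its taxa mark the candidate transferred branch. This simplification prunes arbitrary sets of taxa rather than whole branches and represents trees as nested Python tuples (a hypothetical format); all names are hypothetical, and the search is only practical for small trees.

```python
from itertools import combinations


def _clades(tree, acc):
    """Append the leaf set of every node to acc and return the leaf set of `tree`."""
    if isinstance(tree, str):
        clade = frozenset([tree])
    else:
        clade = frozenset().union(*(_clades(child, acc) for child in tree))
    acc.append(clade)
    return clade


def _splits(tree, keep):
    """Nontrivial unrooted splits of `tree` restricted to the taxon set `keep`."""
    acc = []
    _clades(tree, acc)
    ref = min(keep)
    result = set()
    for clade in acc:
        side = clade & keep
        if ref in side:  # store each split as the side without the reference taxon
            side = keep - side
        if 2 <= len(side) <= len(keep) - 2:
            result.add(side)
    return result


def minimal_prunings(t1, t2):
    """All smallest sets of shared taxa whose removal makes the two trees congruent."""
    shared = sorted(_clades(t1, []) & _clades(t2, []))
    for k in range(len(shared) - 2):  # with only 3 taxa left, any two trees agree
        hits = []
        for pruned in combinations(shared, k):
            keep = frozenset(shared) - frozenset(pruned)
            if _splits(t1, keep) == _splits(t2, keep):
                hits.append(set(pruned))
        if hits:
            return hits
    return []


# Hypothetical trees: taxon A has moved between the two ends of a caterpillar.
t1 = ((((((("A", "B"), "C"), "D"), "E"), "F"), "G"), "H")
t2 = ((((((("B", "C"), "D"), "E"), "F"), "G"), "A"), "H")
hits = minimal_prunings(t1, t2)
if len(hits) == 1:
    print("unique pruning, candidate transferred taxa:", sorted(hits[0]))
```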
Figure legend: branch colors by HGT rate
Color    HGT rate
Red      >4%
Yellow   3%–4%
Pink     2%–3%
Blue     1%–2%
Green    <1%
References
1. Goddard W, Kubicka E, Kubicki G, McMorris FR (1994) The agreement metric for labeled binary trees. Math Biosci 123: 215–226.
2. Robinson DF, Foulds LR (1981) Comparison of phylogenetic trees. Math Biosci 53: 131–147.
3. Conover WJ (1999) Practical nonparametric statistics, 3rd ed. New York: Wiley. 584 p.
4. Eisen JA (2000) Horizontal gene transfer among microbial genomes: New insights from complete genome analysis. Curr Opin Genet Dev 10: 606–611.
Thank You!