Abstract
Macrogenomic events, in which genes are gained and lost, play a pivotal role in microbial evolution. Nevertheless, probabilistic-evolutionary models describing such events, and methods for their robust inference, are considerably less developed than existing methodologies for analyzing site-specific sequence evolution. Here, we present a novel method for the inference of gains and losses of gene families. First, we develop probabilistic-evolutionary models describing the dynamics of gene-family content, which are more biologically realistic than previously suggested models. In our likelihood-based models, gains and losses are represented by transitions between presence and absence, given an underlying phylogeny. We employ a mixture-model approach in which we allow both the gain rate and the loss rate to vary among gene families. Second, we use these models together with the analytic implementation of stochastic mapping to infer branch-specific events. Our novel methodology allows us to infer and quantify horizontal gene transfer (HGT) events. This enables us to rank various gene families and lineages according to their propensity to undergo gains and losses. Applying our methodology to 4,873 gene families shows that: 1) the novel mixture models describe the observed variability in gene-family content among microbes significantly better than previous models; 2) the stochastic mapping approach enables accurate inference of gain and loss events, as assessed by simulations; 3) at least 34% of the gene families analyzed are inferred to have experienced HGT at least once during their evolution; and 4) gene families that were inferred to experience HGT are both enriched and depleted with respect to specific functional categories.
Year: 2009 PMID: 19808865 PMCID: PMC2822287 DOI: 10.1093/molbev/msp240
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure: Toy example. Shown is the computation of the posterior expectation of the number of gain events for the branch connecting nodes N1 and N2. The total expectation equals 0.53 and is computed as the weighted sum over four scenarios: N1 = 0 and N2 = 0, N1 = 0 and N2 = 1, N1 = 1 and N2 = 0, and N1 = 1 and N2 = 1. The gain and loss rates of the Q matrix used for this computation are 0.35 and 0.7, respectively, and π_ROOT = 0.5. For each scenario, the most plausible event leading to a gain event is depicted.
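The weighted sum described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation. It uses the standard closed-form transition probabilities of a two-state (absence/presence) continuous-time Markov chain and obtains the conditional expectation by numerical integration rather than by the analytic stochastic-mapping formulas. The branch length `T` and the endpoint weights `WEIGHTS` are hypothetical placeholders (the caption gives neither), so the printed value is not expected to reproduce 0.53. The root frequency is not needed for this per-branch step.

```python
import math

GAIN, LOSS = 0.35, 0.7  # gain and loss rates quoted in the caption


def transition_prob(a, b, t, gain=GAIN, loss=LOSS):
    """P(state b at time t | state a at time 0) for the 0/1 gain-loss chain."""
    total = gain + loss
    stationary_b = gain / total if b == 1 else loss / total
    same = 1.0 if a == b else 0.0
    return stationary_b + (same - stationary_b) * math.exp(-total * t)


def expected_gains(a, b, t, gain=GAIN, loss=LOSS, steps=2000):
    """E[number of 0->1 jumps on a branch of length t | endpoint states a, b].

    Midpoint-rule integration of P_a0(s) * gain * P_1b(t - s) over s in [0, t],
    divided by P_ab(t).
    """
    h = t / steps
    acc = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        acc += (transition_prob(a, 0, s, gain, loss) * gain
                * transition_prob(1, b, t - s, gain, loss))
    return acc * h / transition_prob(a, b, t, gain, loss)


def branch_expectation(endpoint_posterior, t):
    """Weighted sum over the four endpoint scenarios (N1, N2)."""
    return sum(w * expected_gains(a, b, t)
               for (a, b), w in endpoint_posterior.items())


# Hypothetical inputs: neither value is taken from the figure.
T = 0.5
WEIGHTS = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.2}

if __name__ == "__main__":
    for scenario in WEIGHTS:
        print(scenario, round(expected_gains(*scenario, T), 4))
    print("branch expectation:", round(branch_expectation(WEIGHTS, T), 4))
```

In a full analysis the scenario weights would be the joint posterior probabilities of the two node states given the phyletic pattern and the tree. Here they are simply supplied by hand.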
Comparison of Evolutionary Models Used for the Analysis of Phyletic Patterns.
| Model | Assumptions | MLE of Model Parameters | Maximum Log-Likelihood |
| M1 + | Rate ∼ | | −91,962.8 |
| M2 + | Rate ∼ | | −90,293.7 |
| MM1 | Gain ∼ | | −91,873.9 |
| MM2 | Gain ∼ | | −89,590.1 |
MLE denotes maximum likelihood estimate.
Figure: The empirical distributions of gain and loss rates. The empirical distributions of gain rates (red) and loss rates (blue) were computed for all 4,873 COG gene families. The bins denoted by the symbols “†” and “‡” represent the loss rate of the 63 gene families that are present in all species and the loss rate of the 288 gene families that are present only in the three eukaryotes, respectively.
Figure: ROC curve for the inference of gain events. The accuracy of the stochastic mapping method in inferring gain events for a given gene family along a specific branch was evaluated using simulations.
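As a rough illustration of how such a curve is built, the sketch below sweeps a threshold over per-branch scores and records the false-positive and true-positive rates. The scores stand in for inferred probabilities of a gain on each branch, and the labels for the simulated truth. All values in `SCORES` and `LABELS` are made up, and the function names are hypothetical. This is not the authors' evaluation code.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs as the decision threshold moves from high to low.

    scores: predicted probability of a gain event per (family, branch) pair.
    labels: 1 if a gain truly occurred in the simulation, else 0.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Treat tied scores as a single threshold step.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up example data, for illustration only.
SCORES = [0.95, 0.80, 0.60, 0.40, 0.20, 0.05]
LABELS = [1, 1, 0, 1, 0, 0]

if __name__ == "__main__":
    curve = roc_points(SCORES, LABELS)
    print(curve)
    print("AUC:", area_under_curve(curve))
```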
Percent of Transferable Gene Families in Functional Categories that Significantly Differ from the Background Percent of All Gene Families.
| Functional Categories | Transferable Gene Families Out of the Total Number of Gene Families in Each Functional Category (P value) |
| (A) | |
| | 38.93% (0.028) |
| | 37.19% (0.1)* |
| Carbohydrate transport and metabolism | 46.96% (0.011) |
| Replication, recombination, and repair | 46.63% (0.011) |
| General function prediction only | 39.03% (0.092)* |
| Energy production and conversion | 41.47% (0.1)* |
| Depleted functional categories | |
| (B) | |
| | 25.9% (0.00053) |
| | 24.9% (0.00023) |
| Translation, ribosomal structure, and biogenesis | 12.25% (3.72E−09) |
| Intracellular trafficking, secretion, and vesicular transport | 12.03% (1.66E−06) |
| Transcription | 18.61% (0.00015) |
| RNA processing and modification | 4.0% (0.011) |
| Cell motility | 17.71% (0.012) |
| Cell cycle control, cell division, and chromosome partitioning | 19.44% (0.06)* |
(A) Enriched categories (significantly higher than 34.23%). (B) Depleted functional categories (significantly lower than 34.23%). Supercategories are in bold and in upper-case letters. All P values were computed using Fisher's exact test.
* Functional categories for which the P value is not significant after FDR correction but is lower than or equal to 0.1.
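A minimal sketch of this kind of enrichment test is given below, using only the Python standard library. Several things are assumptions rather than facts from the source: the 2x2 layout (category versus all remaining families), the example counts, and the choice of the Benjamini-Hochberg procedure for the FDR step (the source does not name the procedure used). The two-sided P value follows the common convention of summing all tables that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: in category / not in category.
    Columns: transferable / not transferable.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x transferable families in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in (prob(x) for x in range(low, high + 1))
               if p <= p_observed * (1 + 1e-9))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: a category of 230 families, 108 of them transferable,
    # against a background of roughly 1,668 transferable families out of 4,873.
    in_cat_yes, in_cat_no = 108, 122
    rest_yes, rest_no = 1668 - 108, (4873 - 230) - (1668 - 108)
    print(fisher_exact_two_sided(in_cat_yes, in_cat_no, rest_yes, rest_no))
    print(benjamini_hochberg([0.011, 0.028, 0.092, 0.1]))
```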